idnits 2.17.1 

draft-ietf-idn-requirements-07.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate.  This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  == The page length should not exceed 58 lines per page, but there was 1
     longer page, the longest (page 1) being 643 lines


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack an Authors' Addresses Section.

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.

  ** There are 2 instances of too long lines in the document, the longest one
     being 1 character in excess of 72.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The document seems to lack the recommended RFC 2119 boilerplate, even if
     it appears to use RFC 2119 keywords. 

     (The document does seem to have the reference to RFC 2119 which the
     ID-Checklist requires).
  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (23 November 2001) is 8188 days in the past.  Is this
     intentional?

  -- Found something which looks like a code comment -- if you have code
     sections in the document, please surround them with '<CODE BEGINS>' and
     '<CODE ENDS>' lines.


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Looks like a reference, but probably isn't: '1' on line 340

  -- Looks like a reference, but probably isn't: '2' on line 346

  -- Looks like a reference, but probably isn't: '3' on line 350

  -- Looks like a reference, but probably isn't: '4' on line 356

  -- Looks like a reference, but probably isn't: '5' on line 360

  -- Looks like a reference, but probably isn't: '6' on line 365

  -- Looks like a reference, but probably isn't: '7' on line 371

  -- Looks like a reference, but probably isn't: '8' on line 379

  -- Looks like a reference, but probably isn't: '9' on line 384

  -- Looks like a reference, but probably isn't: '10' on line 388

  -- Looks like a reference, but probably isn't: '11' on line 392

  -- Looks like a reference, but probably isn't: '12' on line 398

  -- Looks like a reference, but probably isn't: '13' on line 403

  -- Looks like a reference, but probably isn't: '14' on line 408

  -- Looks like a reference, but probably isn't: '15' on line 411

  -- Looks like a reference, but probably isn't: '16' on line 415

  -- Looks like a reference, but probably isn't: '17' on line 421

  -- Looks like a reference, but probably isn't: '18' on line 426

  -- Looks like a reference, but probably isn't: '19' on line 444

  -- Looks like a reference, but probably isn't: '20' on line 451

  -- Looks like a reference, but probably isn't: '21' on line 454

  -- Looks like a reference, but probably isn't: '22' on line 461

  -- Looks like a reference, but probably isn't: '23' on line 468

  -- Looks like a reference, but probably isn't: '24' on line 472

  -- Looks like a reference, but probably isn't: '25' on line 476

  == Missing Reference: 'UTR15' is mentioned on line 477, but not defined

  -- Looks like a reference, but probably isn't: '26' on line 479

  -- Looks like a reference, but probably isn't: '27' on line 484

  -- Looks like a reference, but probably isn't: '28' on line 486

  -- Looks like a reference, but probably isn't: '29' on line 492

  -- Looks like a reference, but probably isn't: '30' on line 496

  == Unused Reference: 'RFC2119' is defined on line 539, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC2279' is defined on line 551, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC2825' is defined on line 560, but no explicit
     reference was found in the text

  == Unused Reference: 'IDNCOMP' is defined on line 567, but no explicit
     reference was found in the text

  == Unused Reference: 'UNICODE30' is defined on line 577, but no explicit
     reference was found in the text

  == Unused Reference: 'UAX15' is defined on line 586, but no explicit
     reference was found in the text

  -- Possible downref: Non-RFC (?) normative reference: ref. 'CHARREQ'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'DNSEXT'

  ** Downref: Normative reference to an Unknown state RFC: RFC  952

  ** Obsolete normative reference: RFC 2278 (Obsoleted by RFC 2978)

  ** Obsolete normative reference: RFC 2279 (Obsoleted by RFC 3629)

  ** Obsolete normative reference: RFC 2535 (Obsoleted by RFC 4033, RFC 4034,
     RFC 4035)

  ** Obsolete normative reference: RFC 2553 (Obsoleted by RFC 3493)

  ** Downref: Normative reference to an Informational RFC: RFC 2825

  ** Downref: Normative reference to an Informational RFC: RFC 2826

  == Outdated reference: A later version (-01) exists of
     draft-ietf-idn-compare-00

  -- Possible downref: Normative reference to a draft: ref. 'IDNCOMP' 

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO10646'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'UNICODE'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'UNICODE30'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'US-ASCII'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'UAX15'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'UTR17'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'UTR21'


     Summary: 12 errors (**), 0 flaws (~~), 11 warnings (==), 43 comments
     (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1	IETF IDN Working Group               Editors Zita Wenzel, James Seng
2	Internet Draft                       draft-ietf-idn-requirements-07.txt
3	23 May 2001                          Expires 23 November 2001

5	             Requirements of Internationalized Domain Names

7	Status of this Memo

9	This document is an Internet-Draft and is in full conformance with
10	all provisions of Section 10 of RFC2026.

12	Internet-Drafts are working documents of the Internet Engineering
13	Task Force (IETF), its areas, and its working groups. Note that
14	other groups may also distribute working documents as
15	Internet-Drafts.

17	Internet-Drafts are draft documents valid for a maximum of six
18	months and may be updated, replaced, or obsoleted by other
19	documents at any time. It is inappropriate to use Internet-
20	Drafts as reference material or to cite them other than as
21	"work in progress."

23	The list of current Internet-Drafts can be accessed at
24	http://www.ietf.org/ietf/1id-abstracts.txt

26	The list of Internet-Draft Shadow Directories can be accessed at
27	http://www.ietf.org/shadow.html.

29	Intended Scope

31	The intended scope of this document is to explore requirements for the
32	internationalization of domain names on the Internet. It is not
33	intended to document user requirements. It is recommended that
34	solutions not necessarily be within the DNS itself, but could be a layer
35	interjected between the application and the DNS. Proposals SHOULD
36	fulfill most, if not all, of the requirements. This document MAY be
37	updated based on clinical trials.

39	Abstract

41	This document describes the requirements for encoding international
42	characters into DNS names and records. This document is guidance for
43	developing protocols for internationalized domain names.

45	1. Introduction

47	At present, the encoding of Internet domain names is restricted to a
48	subset of 7-bit ASCII (ISO/IEC 646). HTML, XML, IMAP, FTP, and many
49	other text based items on the Internet have already been at least
50	partially internationalized. It is important for domain names to be
51	similarly internationalized or for an equivalent solution to be found.
52	This document assumes that the most effective solution involves putting
53	non-ASCII names inside some parts of the overall DNS system although
54	such assumption may not be the consensus of the IETF community.

56	This document is being discussed on the "idn" mailing list. To join the
57	list, send a message to <majordomo@ops.ietf.org> with the words
58	"subscribe idn" in the body of the message. Archives of the mailing
59	list can also be found at ftp://ops.ietf.org/pub/lists/idn*.

61	1.1 Definitions and Conventions

63	A language is a way that humans interact. In computerised form, a text
64	in a written language can be expressed as a string of characters.
65	The same set of characters can often be used for many written languages,
66	and many written languages can be expressed using different scripts.
67	The same characters are often shown with somewhat different glyphs
68	(shapes) for display of a text depending on the font used, the
69	automatic shaping applied, or the automatic formation of ligatures. In
70	addition, the same characters can be shown with somewhat different
71	glyphs (shapes) for display of a text depending on the language being
72	used, even within the same font or through automatic font change.

74	A character is a member of a set of elements used for organization,
75	control, or representation of textual data.

77	A graphic character is a character, other than a control function,
78	that has a visual representation normally handwritten, printed, or
79	displayed.

81	Characters mentioned in this document are identified by their position
82	in the Unicode [UNICODE] character set.  This character set is also
83	known as the UCS [ISO10646]. The notation U+12AB, for example, indicates
84	the character at position 12AB (hexadecimal) in the Unicode character
85	set.  Note that the use of this notation is not an indication of a
86	requirement to use Unicode.

88	Examples quoted in this document should be considered as a method to
89	further explain the meanings and principles adopted by the document. It
90	is not a requirement for the protocol to satisfy the examples.

92	Unicode Technical Report 17 [UTR17] defines a character encoding
93	model in several levels (much of the text below is quoted from
94	Unicode Technical Report 17 [UTR17]):

96	1. An abstract character repertoire (ACR) is defined as the set of
97	   abstract characters to be encoded, normally a familiar alphabet
98	   or symbol set. The word abstract just means that these objects
99	   are defined by convention (such as the 26 letters of the English
100	   alphabet, uppercase and lowercase forms). Examples: the ASCII
101	   repertoire, the Latin-15 repertoire, the JIS X 0208 repertoire,
102	   the UCS repertoire (of a particular version).

104	2. A coded character set (CCS) is defined to be a mapping from a
105	   set of abstract characters to the set of non-negative integers.
106	   This range of integers need not be contiguous. An abstract
107	   character is defined to be in a coded character set if the coded
108	   character set maps from it to an integer. That integer is said
109	   to be the code point for the abstract character. That abstract
110	   character is then an encoded character. Examples: ASCII, Latin-15,
111	   JIS X 0208, the UCS.

113	3. A character encoding form (CEF) is a mapping from the set of integers
114	   used in a CCS to the set of sequences of code units. A code unit
115	   is an integer occupying a specified binary width in a computer
116	   architecture, such as a septet, an octet, or a 16-bit unit. The
117	   encoding form enables character representation as actual data in
118	   a computer. The sequences of code units do not necessarily have the
119	   same length. Examples: ASCII, Latin-15, Shift-JIS, UTF-16, UTF-8.

121	4. A character encoding scheme (CES) is a mapping of code units into
122	   serialized octet sequences. Character encoding schemes are relevant
123	   to the issue of cross-platform persistent data involving code units
124	   wider than a byte, where byte-swapping may be required to put data
125	   into the byte polarity canonical for a particular platform.

127	   The CES may involve two or more CCS's, and may include code units
128	   (e.g. single shifts, SI/SO, or escape sequences) that are not part
129	   of the CCS per se, but which are defined by the character encoding
130	   architecture and which may require an external registry of particular
131	   values (as for the ISO 2022 escape sequences). In such a case, the
132	   CES is called a compound CES. (A CES that only involves a single
133	   CCS is called a simple CES.)

135	   Examples: ASCII, Latin-15, Shift-JIS, UTF-16BE, UTF-16LE, UTF-8.

137	5. The mapping from an abstract character repertoire (ACR) to a
138	   serialised sequence of octets is called a Character Map (CM). A simple
139	   character map thus implicitly includes a CCS, a CEF, and a CES,
140	   mapping from abstract characters to code units to octets. A compound
141	   character map includes a compound CES, and thus includes more than one
142	   CCS and CEF. In that case, the abstract character repertoire for the
143	   character map is the union of the repertoires covered by the coded
144	   character sets involved.

146	   Character Maps are the things that in the IAB architecture get IANA
147	   charset identifiers. A sequence of encoded characters must be
148	   unambiguously mapped onto a sequence of octets by the charset. The
149	   charset must be specified in all instances, as in Internet
150	   protocols, where textual content is treated as an ordered sequence
151	   of octets, and where the textual content must be reconstructible
152	   from that sequence of octets.  Charset names are registered by the
153	   IANA according to procedures documented in [RFC2278]. In many cases,
154	   the same name is used for both a character map and for a character
155	   encoding scheme, such as UTF-16BE. Typically this is done for simple
156	   character maps when such usage is clear from context.

158	6. A transfer encoding syntax (TES) is a reversible transform of encoded
159	   data which may (or may not) include textual data represented in
160	   one or more character encoding schemes.  Examples: 8bit,
161	   Quoted-Printable, BASE64, UTF-7 (defunct), (UTF-5, and RACE).
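
As a rough illustration of the levels above, the following Python
sketch walks one character through a CCS (code point), two CEF/CES
combinations, and a TES. The sample character U+00E9 and all variable
names are choices made for this example only; they do not come from
[UTR17] or from this document.

```python
import base64

# Abstract character: LATIN SMALL LETTER E WITH ACUTE (sample chosen
# for illustration only).
ch = "\u00e9"

# CCS: the coded character set maps the abstract character to an
# integer code point.
code_point = ord(ch)

# CEF + CES: code units serialized to octets.
utf8_octets = ch.encode("utf-8")          # two 8-bit code units
utf16be_octets = ch.encode("utf-16-be")   # one 16-bit unit, big-endian
utf16le_octets = ch.encode("utf-16-le")   # same unit, byte-swapped

# TES: a reversible transform applied to the already encoded data.
tes_base64 = base64.b64encode(utf8_octets)

print(f"U+{code_point:04X}", utf8_octets.hex(), utf16be_octets.hex(),
      utf16le_octets.hex(), tes_base64.decode("ascii"))
```

The two UTF-16 serializations differ only in byte order, which is the
distinction the CES level exists to capture.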

163	1.2 Description of the Domain Name System

165	The Domain Name System is defined by [RFC1034] and [RFC1035], with
166	clarifications, extensions and modifications given in [RFC1123],
167	[RFC1996], [RFC2181], and others. Of special importance here are the
168	security extensions described in [RFC2535] and companions.

170	Over the years, many different words have been used to describe the
171	components of resource naming on the Internet (e.g., URI, URN); to make
172	certain that the terms used in this document are well-defined and
173	unambiguous, their definitions are given here.

175	A master server for a zone holds the main copy of that zone. This copy
176	is sometimes stored in a zone file. A slave server for a zone holds a
177	complete copy of the records for that zone. Slave servers MAY be either
178	authorized by the zone owner (secondary servers) or unauthorized
179	(so-called "stealth secondaries"). Master and authorized slave servers
180	are listed in the NS records for the zone, and are termed
181	"authoritative" servers. In many contexts outside this document, the
182	term "primary" is used interchangeably with "master" and "secondary" is
183	used interchangeably with "slave".

185	A caching server holds temporary copies of DNS records; it uses records
186	to answer queries about domain names. Further explanation of these terms
187	can be found in [RFC1034] and [RFC1996].

189	DNS names can be represented in multiple forms, with different
190	properties for internationalization. The most important ones are:

192	- Domain name: The binary representation of a name used internally in
193	  the DNS protocol. This consists of a series of components of 1-63
194	  octets, with an overall length limited to 255 octets (including the
195	  length fields).

197	- Master file format domain name: This is a representation of the name
198	  as a sequence of characters in some character sets; the common
199	  convention (derived from [RFC1035] section 5.1) is to represent the
200	  octets of the name as ASCII characters where the octet is in the set
201	  corresponding to the ASCII values for [a-zA-Z0-9-], using an escape
202	  mechanism (\x or \NNN) where not, and separating the components of the
203	  name by the dot character (".").
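
The binary form described in the first item can be sketched as
follows. This is a minimal illustration, assuming an ASCII-only dotted
name with no escape sequences; the function name to_wire is
hypothetical and is not defined by [RFC1035] or by this document.

```python
def to_wire(name: str) -> bytes:
    """Encode a dotted ASCII name as length-prefixed labels.

    Illustrative only: escapes (\\x, \\NNN) are not handled.
    """
    wire = b""
    for label in name.rstrip(".").split("."):
        octets = label.encode("ascii")
        if not 1 <= len(octets) <= 63:
            raise ValueError(f"label must be 1-63 octets: {label!r}")
        wire += bytes([len(octets)]) + octets
    wire += b"\x00"  # the zero-length root label terminates the name
    if len(wire) > 255:
        raise ValueError("name exceeds 255 octets including lengths")
    return wire


print(to_wire("www.example.com"))
```

Each label costs one extra octet for its length field, which is why
the 255-octet limit is stated as including the length fields.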

205	The form specified for most protocols using the DNS is a limited form of
206	the master file format domain name. This limited form is defined in
207	[RFC1034] Section 3.5 and [RFC1123]. In most implementations of
208	applications today, domain names in the Internet have been limited to
209	the much more restricted forms used, e.g., in email.  Those names are
210	limited to the upper- and lower-case letters a-z (interpreted in a
211	case-independent fashion), the digits, and the hyphen-minus, all in
212	ASCII.
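
A check for this restricted form might look like the sketch below. The
name is_restricted_hostname is hypothetical, and the rule that a label
neither starts nor ends with a hyphen is taken from the host name
syntax of [RFC952] as relaxed by [RFC1123], not from the paragraph
above.

```python
import re

# One label: ASCII letters, digits, and hyphen-minus, 1-63 characters,
# with no leading or trailing hyphen.
_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_restricted_hostname(name: str) -> bool:
    """Return True if every dot-separated label uses the limited form."""
    return all(_LABEL.fullmatch(label) for label in name.split("."))


print(is_restricted_hostname("www.Example-1.com"))
print(is_restricted_hostname("b\u00fccher.example"))
```

A name containing any character outside this set fails the check,
which is exactly the gap an IDN solution has to fill.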

214	1.3 Definition of "hostname" and "Internationalized Domain Name"

216	In the DNS protocols, a name is referred to as a sequence of octets.
217	However, when discussing requirements for internationalized domain
218	names, what we are looking for are ways to represent characters that
219	are meaningful for humans.

221	In this document, this is referred to as a "hostname". While this term
222	has been used for many different purposes over the years, it is used
223	here in the sense of sequence of characters (not octets) representing a
224	domain name conforming to the limited hostname syntax [RFC952].

226	This document attempts to define the requirements for an
227	"Internationalized Domain Name" (IDN). This is defined as a sequence of
228	characters that can be used in the context of functions where a hostname
229	is used today, but contains one or more characters that are outside the
230	set of characters specified as legal characters for host names
231	[RFC1123].

233	1.4 A multilayer model of the DNS function

235	The DNS can be seen as a multilayer function:

237	- The bottom layer is where the packets are passed across the Internet
238	  in a DNS query and a DNS response. At this level, what matters is
239	  the format and meaning of bits and octets in a DNS packet.

241	- Above that is the "DNS service", created by an infrastructure of DNS
242	  servers and the NS records that point to them; this infrastructure is
243	  in turn pointed to by the root servers (listed in the "root cache file"
244	  on each DNS server, often called "named.cache"). It is at this level
245	  that the statement "the DNS has a single root" [RFC2826] makes
246	  sense, but still, what are being transferred are octets, not
247	  characters.

249	- Interfacing to the user is a service layer, often called "the resolver
250	  library", and often embedded in the operating system or system
251	  libraries of the client machines. It is at the top of this layer that
252	  the API calls commonly known as "gethostbyname" and "gethostbyaddr"
253	  reside.  These calls are modified to support IPv6 [RFC2553]. A
254	  conceptually similar layer exists in authoritative DNS servers,
255	  comprising the parts that generate "meaningful" strings in DNS files.
256	  Due to the popularity of the "master file" format, this layer often
257	  exists only in the administrative routines of the service maintainers.

259	- The users of this layer (resolver library) are the application programs
260	  that use the DNS, such as mailers, mail servers, Web clients, Web
261	  servers, Web caches, IRC clients, FTP clients, distributed file
262	  systems, distributed databases, and almost all other applications on
263	  TCP/IP.

265	Graphically, one can illustrate it like this:

267	+---------------+                            +---------------------+
268	| Application   |                            | (Base data)         |
269	+---------------+                            +---------------------+
270	      |  Application service interface                 |
271	      |  For ex. GethostbyXXXX interface               | (no standard)
272	+---------------+                            +---------------------+
273	| Resolver      |                            | Auth DNS server     |
274	+---------------+                            +---------------------+
275	      |     <-----   DNS service interface   ----->    |
276	+------------------------------------------------------------------+
277	|  DNS service                                                     |
278	|  +-----------------------+         +--------------------+        |
279	|  | Forwarding DNS server |         | Caching DNS server |        |
280	|  +-----------------------+         +--------------------+        |
281	|                                                                  |
282	|                 +-------------------------+                      |
283	|                 | Parent-zone DNS servers |                      |
284	|                 +-------------------------+                      |
285	|                                                                  |
286	|                 +-------------------------+                      |
287	|                 | Root DNS servers        |                      |
288	|                 +-------------------------+                      |
289	|                                                                  |
290	+------------------------------------------------------------------+

292	1.5 Service model of the DNS

294	The Domain Name Service is used for multiple purposes, each of which is
295	characterized by what it puts into the system (the query) and what it
296	expects as a result (the reply).

298	The most used ones in the current DNS are:

300	- Hostname-to-address service (A, AAAA, A6): Enter a hostname, and get
301	  back an IPv4 or IPv6 address.

303	- Hostname-to-Mail server service (MX): As above, but the expected
304	  return value is a hostname and a priority for SMTP servers.

306	- Address-to-hostname service (PTR): Enter an IPv4 or IPv6 address (in
307	  in-addr.arpa or ip6.arpa form respectively) and get back a hostname.

309	- Domain delegation service (NS): Enter a domain name and get back
310	  nameserver records (designated hosts which provide authoritative
311	  nameservice) for the domain.
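
For the address-to-hostname service, the query name is derived from
the address itself. The sketch below shows that derivation using the
Python standard library; the function name ptr_query_name and the
sample addresses (taken from documentation address ranges) are
assumptions made for this example.

```python
import ipaddress


def ptr_query_name(address: str) -> str:
    """Return the in-addr.arpa or ip6.arpa name used for a PTR query."""
    return ipaddress.ip_address(address).reverse_pointer


print(ptr_query_name("192.0.2.1"))
print(ptr_query_name("2001:db8::1"))
```

The result is itself an ASCII-only domain name; it is the hostname
returned by the PTR lookup that an IDN solution would have to
consider.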

313	New services are being defined, either as entirely new services (IPv6 to
314	hostname mapping using binary labels) or as embellishments to other
315	services (DNSSEC returning information about whether a given DNS service
316	is performed securely or not).

318	These services exist, conceptually, at the Application/Resolver
319	interface, NOT at the DNS-service interface. This document attempts to
320	set requirements for an equivalent of the "used services" given above,
321	where "hostname" is replaced by "Internationalized Domain Name". This
322	doesn't preclude the fact that IDN should work with any kind of DNS
323	queries.  IDN is a new service. Since existing protocols like SMTP or
324	HTTP use the old service, it is a matter of great concern how the new
325	and old services work together, and how other protocols can take
326	advantage of the new service.

328	2. General Requirements

330	These requirements address two concerns: The service offered to the
331	users (the application service), and the protocol extensions, if needed,
332	added to support this service.

334	In the requirements, we attempt to use the term "service" whenever a
335	requirement concerns the service, and "protocol" whenever a requirement
336	is believed to constrain the possible implementation.

338	2.1 Compatibility and Interoperability

340	[1] The DNS is essential to the entire Internet. Therefore, the service
341	MUST NOT damage present DNS protocol interoperability. It MUST make the
342	minimum number of changes to existing protocols on all layers of the
343	stack. It MUST continue to allow any system anywhere that implements
344	the IDN specification to resolve any internationalized domain name.

346	[2] The service MUST preserve the basic concept and facilities of domain
347	names as described in [RFC1034]. It MUST maintain a single, global,
348	universal, and consistent hierarchical namespace.

350	[3] The DNS protocol (the packet formats that go on the wire) MUST
351	NOT limit the codepoints that can be used.  A service defined on top of
352	the DNS, for instance the IDN-to-address function, MAY limit the
353	codepoints that can be used.  The service descriptions MUST describe
354	what limitations are imposed.

356	[4] The protocol MUST work for all features of DNS, IPv4, and
357	IPv6.  The protocol MUST NOT allow an IDN to be returned to a requestor
358	that requests the IP-to-(old)-domain-name mapping service.

360	[5] The same name resolution request MUST generate the same response,
361	regardless of the location or localization settings in the resolver, in
362	the master server, and in any slave servers involved in the resolution
363	process.

365	[6] The protocol MUST NOT require that the current DNS cache
366	servers be modified to support IDN.  If a cache server can have
367	additional functionality to support IDN better, this additional
368	functionality MUST NOT cause problems for resolving correctly
369	functioning current domain names.

371	[7] A caching server MUST NOT return data in response to a query that
372	would not have been returned if the same query had been presented to an
373	authoritative server. This applies fully for the cases when:

375	- The caching server does not know about IDN
376	- The caching server implements the whole specification
377	- The caching server implements a valid subset of the specification

379	[8] The service MAY modify the DNS protocol [RFC1035] and other related
380	work undertaken by the [DNSEXT] WG. However, these changes SHOULD be as
381	small as possible and any changes SHOULD be coordinated with the
382	[DNSEXT] WG.

384	[9] The protocol supporting the service SHOULD be as simple as possible
385	from the user's perspective. Ideally, users SHOULD NOT realize that IDN
386	was added on to the existing DNS.

388	[10] The best solution is one that maintains maximum feasible
389	compatibility with current DNS standards as long as it meets the other
390	requirements in this document.

392	[11] The protocol should handle with care new revisions of the CCS.
393	Undefined codepoints should not be allowed unless a new revision of
394	the protocol can handle them.  Protocol revisions should be tagged.

396	2.2 Internationalization

398	[12] Internationalized characters MUST be allowed to be represented and
399	used in DNS names and records. The protocol MUST specify what charset is
400	used when resolving domain names and how characters are encoded in DNS
401	records.

403	[13] Codepoints SHOULD be from the Universal Set as defined in
404	ISO-10646 or Unicode.  The specifics of versions MUST be defined in the
405	proposed solution.  If multiple charsets are allowed, each charset MUST
406	be tagged and conform to [RFC2277].

408	[14] The protocol MUST NOT reject any non-IDN characters (to be
409	defined) in any DNS queries or responses.

411	[15] The protocol SHOULD NOT invent a new CCS for the purpose of IDN
412	only and SHOULD use existing CES. The charset(s) chosen SHOULD also be
413	non-ambiguous.

415	[16] The protocol SHOULD NOT make any assumptions about the location
416	in a domain name where internationalization might appear.  In other
417	words, it SHOULD NOT differentiate between any part of a domain name
418	because this MAY impose restrictions on future internationalization
419	efforts.  For example, the TLDs can be internationalized.

421	[17] The protocol also SHOULD NOT make any localized restrictions in the
422	protocol. For example, an IDN implementation which only allows domain
423	names to use a single local script would immediately restrict
424	multinational organizations.

426	[18] While there is a wide range of devices that use the DNS and a wide
427	range of characteristics of international scripts and methods of
428	domain name input and display, IDN is only concerned with the
429	protocol. Therefore, there MUST be a single way of encoding an
430	internationalized domain name within the DNS.

432	2.3 Canonicalization

434	Matching is a complicated process for IDN. Canonicalization
435	of characters MUST follow precise and predictable rules to ensure
436	consistency. [CHARREQ] is RECOMMENDED as a guide on canonicalization.

438	The DNS has to match a host name in a request with a host name held
439	in one or more zones. It also needs to sort names into order. It is
440	expected that some sort of canonicalization algorithm will be used as
441	the first step of this process. This section discusses some of the
442	properties which will be REQUIRED of that algorithm.

444	[19] To achieve interoperability, canonicalization MUST be done at a
445	single well-defined place in the DNS resolution process.  The protocol
446	MUST specify canonicalization; it MUST specify exactly where in the
447	DNS that canonicalization happens and does not happen; it MUST specify
448	how additions to ISO 10646 will affect the stability of the DNS and
449	the amount of work done on the root DNS servers.

451	[20] The canonicalization algorithm MAY specify operations for case,
452	ligature, and punctuation folding.

454	[21] In order to retain backwards compatibility with the current DNS,
455	the service MUST retain the case-insensitive comparison for [US-ASCII]
456	as specified in [RFC1035]. For example, Latin capital letter A (U+0041)
457	MUST match Latin small letter a (U+0061). [UTR21] describes some of
458	the issues with case mapping. Case-insensitivity for non [US-ASCII]
459	MUST be discussed in the protocol proposal.
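
The backwards-compatible part of requirement [21] can be sketched as a
comparison that folds only the ASCII letters and leaves every other
character untouched. The names ascii_fold and labels_match are
hypothetical; how non-ASCII case is treated is left open here, as it
is in the requirement.

```python
def ascii_fold(label: str) -> str:
    """Lower-case only A-Z; every other character is left as is."""
    return "".join(
        chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in label
    )


def labels_match(a: str, b: str) -> bool:
    """Compare two labels case-insensitively for ASCII letters only."""
    return ascii_fold(a) == ascii_fold(b)


print(labels_match("A", "a"))
print(labels_match("\u00c9", "\u00e9"))
```

In this sketch U+00C9 and U+00E9 do not match, because only the ASCII
range is folded; a protocol proposal would have to state whether such
pairs are equivalent.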

461	[22] Case folding MUST be locale independent. If it were
462	locale-dependent, then different clients would get different results.
463	For example, Latin capital letter I (U+0049) case folded to lower case
464	in the Turkish context will become Latin small letter dotless i
465	(U+0131). But in the English context, it will become Latin small
466	letter i (U+0069).
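
The example in [22] can be made concrete with a short sketch. It
relies on the fact that Python's str.casefold() uses the Unicode
default case folding and does not consult the process locale; the
variable names are chosen for this illustration only.

```python
turkish_dotless_i = "\u0131"   # LATIN SMALL LETTER DOTLESS I

# Locale-independent folding of LATIN CAPITAL LETTER I (U+0049).
folded = "I".casefold()

print(folded == "\u0069")           # folds to LATIN SMALL LETTER I
print(folded == turkish_dotless_i)  # a Turkish-specific fold would differ
```

Because the folding does not depend on the client's settings, every
client that applies it derives the same name from the same input.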

468	[23] If other canonicalization is done, it MUST be done before the
469	domain name is resolved. Further, the canonicalization MUST be easily
470	upgradable as new languages and writing systems are added.

472	[24] Any conversion (case, ligature folding, punctuation folding, etc.)
473	from what the user enters into a client to what the client submits for
474	resolution MUST be done identically on any request from any client.

476	[25] If the charset can be normalized, then it SHOULD be normalized
477	before it is used in IDN. Normalization SHOULD follow [UAX15].

479	[26] The protocol SHOULD avoid inventing a new normalization form
480	provided a technically sufficient one is available.
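
To illustrate why [25] and [26] matter, the sketch below applies an
existing Unicode normalization form instead of inventing a new one.
The choice of NFC is an assumption made for this example only; this
document does not select a form. The function name is also
hypothetical.

```python
import unicodedata


def normalize_name(name: str, form: str = "NFC") -> str:
    """Apply an existing Unicode normalization form to a name."""
    return unicodedata.normalize(form, name)


if __name__ == "__main__":
    precomposed = "caf\u00e9"    # e with acute as one code point
    decomposed = "cafe\u0301"    # e followed by combining acute

    # Without normalization the two spellings do not compare equal.
    print(precomposed == decomposed)
    # After normalization they do.
    print(normalize_name(precomposed) == normalize_name(decomposed))
```

Both spellings look identical to a user, so comparing them without
normalization would make the same visible name resolve differently.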

482	2.4 Operational Issues

484	[27] Zone files SHOULD remain easily editable.

486	[28] An IDN-capable resolver or server SHALL NOT generate more traffic
487	than a non-IDN-capable resolver or server would when resolving an
488	ASCII-only domain name.  The amount of traffic generated when resolving
489	an IDN SHALL be similar to that generated when resolving an ASCII-only
490	name.

492	[29] The service SHOULD NOT add new centralized administration for the
493	DNS. A domain administrator SHOULD be able to create internationalized
494	names as easily as adding current domain names.

496	[30] The protocol MUST work with DNSSEC.  The protocol MAY break
497	language sort order.

499	3. Security Considerations

501	Any solution that meets the requirements in this document MUST NOT be
502	less secure than the current DNS. Specifically, the mapping of
503	internationalized host names to and from IP addresses MUST have the
504	same characteristics as the mapping of today's host names.

506	Specifying requirements for internationalized domain names does not
507	itself raise any new security issues. However, any change to the DNS may
508	affect the security of any protocol that relies on the DNS or on
509	DNS names. A thorough evaluation of those protocols for security
510	concerns will be needed when they are developed. In particular, IDNs
511	MUST be compatible with DNSSEC and, if multiple charsets or
512	representation forms are permitted, the implications of this for name
513	spoofing MUST be thoroughly understood.

515	4. References

517	[CHARREQ]   "Requirements for string identity matching and String
518	            Indexing", http://www.w3.org/TR/WD-charreq, July 1998,
519	            World Wide Web Consortium.

521	[DNSEXT]    "IETF DNS Extensions Working Group",
522	            namedroppers@ops.ietf.org, Olafur Gudmundsson, Randy Bush.

524	[RFC952]    "DoD Internet Host Table Specification", rfc952.txt,
525	            October 1985, K. Harrenstien, M.K. Stahl, E.J. Feinler.

527	[RFC1034]   "Domain Names - Concepts and Facilities", rfc1034.txt,
528	            November 1987, P. Mockapetris.

530	[RFC1035]   "Domain Names - Implementation and Specification",
531	            rfc1035.txt, November 1987, P. Mockapetris.

533	[RFC1123]   "Requirements for Internet Hosts -- Application and
534	            Support", rfc1123.txt, October 1989, R. Braden.

536	[RFC1996]   "A Mechanism for Prompt Notification of Zone Changes
537	            (DNS NOTIFY)", rfc1996.txt, August 1996, P. Vixie.

539	[RFC2119]   "Key words for use in RFCs to Indicate Requirement
540	            Levels", rfc2119.txt, March 1997, S. Bradner.

542	[RFC2181]   "Clarifications to the DNS Specification", rfc2181.txt,
543	            July 1997, R. Elz, R. Bush.

545	[RFC2277]   "IETF Policy on Character Sets and Languages",
546	            rfc2277.txt, January 1998, H. Alvestrand.

548	[RFC2278]   "IANA Charset Registration Procedures", rfc2278.txt,
549	            January 1998, N. Freed and J. Postel.

551	[RFC2279]   "UTF-8, a transformation format of ISO 10646",
552	            rfc2279.txt, January 1998, F. Yergeau.

554	[RFC2535]   "Domain Name System Security Extensions", rfc2535.txt,
555	            March 1999, D. Eastlake.

557	[RFC2553]   "Basic Socket Interface Extensions for IPv6", rfc2553.txt,
558	            March 1999, R. Gilligan et al.

560	[RFC2825]   "A Tangled Web: Issues of I18N, Domain Names, and the
561	            Other Internet protocols", rfc2825.txt, May 2000,
562	            L. Daigle et al.

564	[RFC2826]   "IAB Technical Comment on the Unique DNS Root",
565	            rfc2826.txt, May 2000, Internet Architecture Board.

567	[IDNCOMP]   "Comparison of Internationalized Domain Name Proposals",
568	            draft-ietf-idn-compare-00.txt, June 2000, P. Hoffman.

570	[ISO10646]  ISO/IEC 10646-1:2000 (note that an amendment 1 is in
571	            preparation), ISO/IEC 10646-2 (in preparation), plus
572	            corrigenda and amendments to these standards.

574	[UNICODE]   The Unicode Consortium, "The Unicode Standard". Described at
575	            http://www.unicode.org/unicode/standard/versions/.

577	[UNICODE30] The Unicode Consortium, "The Unicode Standard -- Version
578	            3.0", ISBN 0-201-61633-5. Same repertoire as ISO/IEC
579	            10646-1:2000. Described at http://www.unicode.org/unicode/
580	            standard/versions/Unicode3.0.html.

582	[US-ASCII]  Coded Character Set -- 7-bit American Standard Code for
583	            Information Interchange, ANSI X3.4-1986; also: ISO/IEC
584	            646 (IRV).

586	[UAX15]     "Unicode Normalization Forms", Unicode Standard Annex #15,
587	            http://www.unicode.org/unicode/reports/tr15/, 2000-08-31,
588	            M. Davis and M. Duerst, Unicode Consortium.

590	[UTR17]     "Character Encoding Model", Unicode Technical Report #17,
591	            http://www.unicode.org/unicode/reports/tr17/, 2000-08-31,
592	            K. Whistler and M. Davis, Unicode Consortium.

594	[UTR21]     "Case Mappings", Unicode Technical Report #21,
595	            http://www.unicode.org/unicode/reports/tr21/, 2000-09-12,
596	            M. Davis, Unicode Consortium.

598	5. Editors' Contact

600	Zita Wenzel, Ph.D.
601	Information Sciences Institute
602	University of Southern California
603	4676 Admiralty Way
604	Marina del Rey, CA
605	90292  USA
606	Tel: +1 310 448 8462
607	Fax: +1 310 823 6714
608	zita@isi.edu

610	James Seng
611	i-DNS.net International Pte Ltd.
612	8 Temasek Boulevard
613	#24-02 Suntec Tower 3
614	Singapore 038988
615	Tel: +65 248 6208
616	Fax: +65 248 6198
617	Email: jseng@pobox.org.sg

619	6. Acknowledgements

621	The editors gratefully acknowledge the contributions of:

623	Harald Tveit Alvestrand <Harald@Alvestrand.no>
624	Mark Andrews <Mark.Andrews@nominum.com>
625	RJ Atkinson <request not to have email>
626	Alan Barrett <apb@cequrux.com>
627	Marc Blanchet <blanchet@mailviagenie.qc.ca>
628	Randy Bush <randy@psg.com>
629	Andrew Draper <ADRAPER@altera.com>
630	Martin Duerst <duerst@w3.org>
631	Patrik Faltstrom <paf@swip.net>
632	Ned Freed <ned.freed@innosoft.com>
633	Olafur Gudmundsson <ogud@tislabs.com>
634	Paul Hoffman <phoffman@imc.org>
635	Simon Josefsson <jas+idn@pdc.kth.se>
636	Kent Karlsson <keka@im.se>
637	John Klensin <klensin+idn@jck.com>
638	Tan Juay Kwang <tanjk@i-dns.net>
639	Dongman Lee <dlee@icu.ac.kr>
640	Bill Manning <bmanning@ISI.EDU>
641	Dan Oscarsson <Dan.Oscarsson@trab.se>
642	J. William Semich <bill@mail.nic.nu>
643	Yoshiro Yoneda <yone@nic.ad.jp>