idnits 2.17.1 

draft-hoffman-idn-cidnuc-03.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate.  This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** Missing expiration date.  The document expiration date should appear on
     the first and last page.

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  == The page length should not exceed 58 lines per page, but there was 1
     longer page, the longest (page 1) being 722 lines


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an Authors' Addresses Section.

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.

  ** There are 2 instances of too long lines in the document, the longest one
     being 10 characters in excess of 72.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == Line 382 has weird spacing: '...   bits   char...'

  == The document seems to lack the recommended RFC 2119 boilerplate, even if
     it appears to use RFC 2119 keywords. 

     (The document does seem to have the reference to RFC 2119 which the
     ID-Checklist requires).
  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (March 10, 2000) is 8813 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: 'UTR15' is mentioned on line 645, but not defined

  == Missing Reference: 'UTR6' is mentioned on line 669, but not defined

  -- No information found for draft-ietf-idn-requirment - is the name correct?

  -- Possible downref: Normative reference to a draft: ref. 'IDNReq' 

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO10646'

  -- Possible downref: Normative reference to a draft: ref. 'Norm' 

  ** Obsolete normative reference: RFC 2278 (Obsoleted by RFC 2978)

  -- Possible downref: Non-RFC (?) normative reference: ref. 'Unicode3'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'UnicodeData'


     Summary: 6 errors (**), 0 flaws (~~), 6 warnings (==), 8 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1	Internet Draft                                     Paul Hoffman
2	draft-hoffman-idn-cidnuc-03.txt                    IMC & VPNC
3	March 10, 2000
4	Expires in six months

6	     Compatible Internationalized Domain Names Using Compression

8	Status of this memo

10	This document is an Internet-Draft and is in full conformance with all
11	provisions of Section 10 of RFC2026.

13	Internet-Drafts are working documents of the Internet Engineering Task
14	Force (IETF), its areas, and its working groups. Note that other
15	groups may also distribute working documents as Internet-Drafts.

17	Internet-Drafts are draft documents valid for a maximum of six months
18	and may be updated, replaced, or obsoleted by other documents at any
19	time. It is inappropriate to use Internet-Drafts as reference
20	material or to cite them other than as "work in progress."

22	     The list of current Internet-Drafts can be accessed at
23	     http://www.ietf.org/ietf/1id-abstracts.txt

25	     The list of Internet-Draft Shadow Directories can be accessed at
26	     http://www.ietf.org/shadow.html.

28	Abstract

30	This document describes a transformation method for representing non-
31	ASCII characters in domain names in a fashion that is completely
32	compatible with the current DNS. It meets the requirements for
33	internationalization of domain names.

35	Note: this protocol is quite experimental and should not be deployed in
36	the Internet until it reaches standards track in the IETF.

38	1. Introduction

40	There is a strong world-wide desire to use characters other than plain
41	ASCII in domain names. Domain names have become the equivalent of
42	business or product names for many services on the Internet, so there
43	is a need to make them usable by people whose native scripts are not
44	representable by ASCII. The requirements for internationalizing domain
45	names are described in [IDNReq].

47	The protocol in this document describes how to take almost any
48	character used in human writing and use it in a domain name in a way
49	that is completely compatible with the current DNS. The protocol
50	requires absolutely no changes to the DNS [STD13].

52	The protocol works for both entry and display of internationalized
53	characters. For domain name entry, a user enters the international
54	(that is, non-ASCII) characters of a domain name into a converter, and
55	that converter transforms the name entered into a DNS-compatible
56	format. Each internationalized domain part is tagged, and a single
57	domain name can mix internationalized parts with plain ASCII parts in
58	today's format. For domain name display, the display utility converts
59	each tagged domain part from its DNS-compatible format into the
60	internationalized characters and displays them inline with any non-
61	internationalized domain part. Users never have to see the converted
62	versions of the internationalized name parts.

64	In formal terms, this protocol describes a character encoding scheme of
65	the ISO 10646 [ISO10646] coded character set and the rules for using
66	that scheme in the DNS. As such, it could also be called a "charset" as
67	defined in [RFC2278].

69	The protocol has the following features:

71	- There are no changes to the DNS protocols or the way that domain
72	names are interpreted. There are also no changes to the DNS root
73	servers or to zone files. The protocol can start to be used in the DNS
74	today, with changes only in the non-protocol portions.

76	- There is exactly one way to convert internationalized domain parts to
77	and from DNS-compatible parts. Domain part uniqueness is preserved.
78	Domain parts that have no international characters are not changed.

80	- Essentially all characters written in all common (and many uncommon)
81	human scripts can be put in any name part.

83	- Transformed names can be entered and displayed without ASCII case
84	sensitivity.

86	- Names using this protocol can include more internationalized
87	characters than names using other ASCII-converted protocols that have
88	been suggested to date.

90	1.1 Terminology

92	The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
93	"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
94	document are to be interpreted as described in RFC 2119 [RFC2119].

96	Hexadecimal values are shown preceded with an "0x". For example,
97	"0xa1b5" indicates two octets, 0xa1 followed by 0xb5. Binary values are
98	shown preceded with an "0b". For example, a nine-bit value might be
99	shown as "0b101101111".

101	Examples in this document use the notation from the Unicode Standard
102	[Unicode3] as well as the ISO 10646 names. For example, the letter "a"
103	may be represented as either "U+0061" or "LATIN SMALL LETTER A".

105	This protocol converts strings with internationalized characters into
106	strings of US-ASCII that are acceptable as domain name parts in current
107	DNS host naming usage. The former are called "pre-converted" and the
108	latter are called "post-converted".

110	2. Domain Part Transformation

112	Any domain part that contains one or more non-ASCII characters is
113	transformed into a DNS-compatible name before it is passed to a DNS
114	resolver or other program that uses traditional domain names. This step
115	is usually done at the time a user enters a domain name into an
116	application. When a domain name is displayed to a user, the display
117	program can convert any domain part that is tagged as holding
118	internationalized characters into a displayable representation that
119	includes the internationalized characters.

121	It is important to note that the following sections contain many
122	normative statements with "MUST" and "MUST NOT". Any implementation
123	that does not follow these statements exactly is likely to cause damage
124	to the Internet by creating non-unique representations of domain names.

126	According to [STD13], domain parts must be case-insensitive, start with
127	a letter, and contain only letters, digits, and the hyphen character
128	("-"). This, of course, excludes any internationalized characters, as
129	well as many other characters in the ASCII character repertoire.
130	Further, domain name parts must be 63 octets or shorter in length.

132	2.1 Name tagging

134	Internationalized domain parts are converted to and from a display
135	representation that includes non-ASCII characters. Thus, a program that
136	converts from DNS-compatible name parts to viewable name parts must be
137	able to recognize name parts that need to be converted.

139	All post-converted name parts that contain internationalized characters
140	begin with the string "aq8". (Of course, because domain name parts are
141	case-insensitive, this might also be represented as "Aq8" or "aQ8" or
142	"AQ8".) The string "aq8" was chosen because it is extremely unlikely to
143	begin any domain part that existed before this specification. As a
144	historical note, in early March 2000, none of the second-level domain
145	parts in any of the .com, .edu, .net, and .org top-level domains began
146	with "aq8"; there are about 9,500 other strings of three legal
147	characters that have this property and could be used instead.

149	Note that a zone administrator can still choose to use "aq8" at the
150	beginning of a domain part even if that part does not contain
151	internationalized characters. Zone administrators SHOULD NOT create
152	domain part names that begin with "aq8" unless those names are post-
153	converted names. Creating domain part names that begin with "aq8" but
154	that are not post-converted names may cause display systems that
155	conform to this document to display the name parts in a possibly-
156	confusing fashion to users. However, creating such names will not cause
157	any DNS resolution problems; it will only cause display problems (and
158	possibly entry problems) for some users.

160	2.2 Converting an internationalized name to a domain name part

162	To convert a string of internationalized characters into a DNS-
163	compatible domain name part, the following steps MUST be performed in
164	the exact order of the subsections given here. Note that these steps
165	MUST be done by zone administrators who are creating internationalized
166	domain name parts in their zones and MUST be done by clients who are
167	resolving domain names.

169	The input name string consists of characters from the ISO 10646
170	character set in big-endian UTF-16 encoding. This is the pre-converted
171	string.

173	Characters outside the Basic Multilingual Plane (that is, characters
174	above U+FFFF) MUST be represented using surrogate pairs, as described
175	in the UTF-16 description in ISO 10646.
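As an illustration only, the following Python sketch shows how a string becomes the big-endian UTF-16 code units described above. The helper name to_utf16_units is hypothetical. The sketch assumes that Python's built-in "utf-16-be" codec produces the surrogate pairs for characters above U+FFFF.

```python
def to_utf16_units(text: str) -> list[int]:
    """Return the big-endian UTF-16 code units of text (hypothetical helper)."""
    data = text.encode("utf-16-be")  # strict codec: lone surrogates raise an error
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


# U+1D11E (MUSICAL SYMBOL G CLEF) is above U+FFFF, so it becomes 0xD834 0xDD1E.
print([f"0x{u:04X}" for u in to_utf16_units("a\U0001D11E")])
```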

177	The characters in Table 1 MUST NOT appear in pre-converted domain name
178	parts. The characters in this list have been chosen for many reasons,
179	mostly to avoid problems with displayed characters. The reasons
180	include:

182	- The character is a period

184	- The character is a separator (space, line, or paragraph)

186	- The character is a control character

188	- The character is a formatting character

190	- The character is a private-use character

192	Table 1: Characters illegal in domain names
193	   U+002E (FULL STOP)
194	   All characters in the Unicode Character Database [UnicodeData] whose
195	   General Category is any of:
196	      Zs (space separator)
197	      Zl (line separator)
198	      Zp (paragraph separator)
199	      Cc (control)
200	      Cf (format)
201	      Co (private use)
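A minimal Python sketch of a Table 1 membership test follows. It assumes that Python's unicodedata module supplies the General Category values. That module follows a much newer version of the Unicode Character Database than [UnicodeData] did in 2000, so a few characters may be classified differently. The names ILLEGAL_CATEGORIES and in_table_1 are hypothetical.

```python
import unicodedata

# General Categories listed in Table 1 (assumption: taken from unicodedata).
ILLEGAL_CATEGORIES = {"Zs", "Zl", "Zp", "Cc", "Cf", "Co"}


def in_table_1(ch: str) -> bool:
    """Return True if the single character ch is listed in Table 1."""
    return ch == "\u002e" or unicodedata.category(ch) in ILLEGAL_CATEGORIES


# FULL STOP, SPACE, ZERO WIDTH JOINER, a private-use character, e-acute, hyphen
for ch in [".", " ", "\u200d", "\ue000", "\u00e9", "-"]:
    print(f"U+{ord(ch):04X}", in_table_1(ch))
```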

203	Design note: The above list will probably change and will probably be
204	taken to a separate document so there can be more focused discussion on
205	it. For example, there appears to be a desire not to allow both
206	uppercase and lowercase, and some discussion of not allowing characters
207	that do not "normally" appear in "names". The above list could include
208	all of the characters of the not-chosen case and of type "punctuation"
209	and "symbol", minus those that "normally" appear in "names".

211	Design note: There is no reason to assume that this database must be
212	run by the Unicode Consortium. It is quite believable that, given the
213	importance of the database, it could be maintained by IANA for the
214	IETF, quite probably with the help of the Unicode Consortium.

216	2.2.1 Check for a name that cannot be transformed

218	An input string that is already a legitimate domain name part MUST NOT
219	be converted. Each character in the input string MUST be compared to
220	the following list of characters:

222	U+002D (HYPHEN-MINUS)
223	U+0030 through U+0039 (DIGIT ZERO, ...)
224	U+0041 through U+005A (LATIN CAPITAL LETTER A, ...)
225	U+0061 through U+007A (LATIN SMALL LETTER A, ...)

227	If all the characters in the input string are in the above set of
228	characters, the conversion MUST stop with an error. The input string
229	itself MUST be used as the domain name part.
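The check in this step can be sketched in Python as follows. The names LDH and is_already_a_name_part are hypothetical. The set contains exactly the characters listed above: the hyphen, the ASCII digits, and the ASCII letters in both cases.

```python
import string

# The characters listed in section 2.2.1: hyphen, digits, and ASCII letters.
LDH = frozenset(string.ascii_letters + string.digits + "-")


def is_already_a_name_part(name: str) -> bool:
    """True if every character of name is in the section 2.2.1 list."""
    return all(ch in LDH for ch in name)


print(is_already_a_name_part("example-1"))   # True: used unchanged
print(is_already_a_name_part("caf\u00e9"))   # False: continue with 2.2.2
```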

231	2.2.2 Check for illegal characters in the input string

233	Each character in the input string MUST be checked against Table 1. If
234	any character in the input string matches a character listed in Table
235	1, the conversion MUST stop with an error. The characters in Table 1
236	MUST NOT appear in any internationalized domain name part.

238	Further, each character in the input string MUST be checked to see if
239	it is part of a malformed surrogate pair. If any character is part of a
240	malformed surrogate pair, the conversion MUST stop with an error.
241	Malformed surrogate pairs MUST NOT appear in any internationalized
242	domain name part.
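The malformed-surrogate check can be sketched in Python over a list of UTF-16 code units. A high surrogate (0xD800 through 0xDBFF) must be immediately followed by a low surrogate (0xDC00 through 0xDFFF), and a low surrogate must never appear on its own. The function name has_malformed_surrogates is hypothetical.

```python
def has_malformed_surrogates(units: list[int]) -> bool:
    """Return True if the UTF-16 code units contain a malformed surrogate pair."""
    i = 0
    while i < len(units):
        unit = units[i]
        if 0xD800 <= unit <= 0xDBFF:
            # A high surrogate must be followed directly by a low surrogate.
            if i + 1 >= len(units) or not (0xDC00 <= units[i + 1] <= 0xDFFF):
                return True
            i += 2
        elif 0xDC00 <= unit <= 0xDFFF:
            # A low surrogate without a preceding high surrogate.
            return True
        else:
            i += 1
    return False


print(has_malformed_surrogates([0x0061, 0xD834, 0xDD1E]))   # False: a valid pair
print(has_malformed_surrogates([0xD834, 0x0061]))           # True: unpaired high surrogate
```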

244	2.2.3 Normalize the input string

246	The entire input string MUST be normalized using Normalization Form C
247	as described in [Norm]. The normalization MUST be applied to the entire
248	input string, not to substrings. The result of this step is the
249	normalized string.
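For illustration, the following Python sketch applies Normalization Form C to the whole input string. It assumes that Python's unicodedata.normalize agrees with [Norm] for common cases like this one. Python's tables track a much newer Unicode version than this draft. The function name normalize_name is hypothetical.

```python
import unicodedata


def normalize_name(name: str) -> str:
    """Apply Normalization Form C to the entire input string."""
    return unicodedata.normalize("NFC", name)


decomposed = "cafe\u0301"            # "e" followed by COMBINING ACUTE ACCENT
print(normalize_name(decomposed) == "caf\u00e9")   # True: composed to U+00E9
```

Normalizing the whole string matters because a combining mark can compose with the character before it. If a string were split between those two characters and each piece normalized separately, the result could differ.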

251	2.2.4 Compress the normalized string

253	The entire normalized string MUST be compressed using the compression
254	algorithm specified in section 2.4. The result of this step is the
255	compressed string.

257	2.2.5 Check the length of the compressed string

259	The compressed string MUST be 37 octets or shorter. If the compressed
260	string is 38 octets or longer, the conversion MUST stop with an error.
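The 37-octet limit follows from the 63-octet limit on domain name parts in [STD13]. Base32 produces one character per five bits, rounded up, and the "aq8" tag takes three more characters. The Python sketch below, with hypothetical names, performs that arithmetic. It also shows that one header octet plus 18 two-octet characters fills the 37 octets.

```python
MAX_PART_LENGTH = 63        # octet limit on a domain name part from [STD13]
PREFIX_LENGTH = len("aq8")


def base32_length(n_octets: int) -> int:
    """Number of Base32 characters produced for n_octets of input."""
    return (n_octets * 8 + 4) // 5     # ceil(8n / 5)


def max_compressed_octets() -> int:
    """Largest compressed string whose tagged encoding still fits."""
    n = 0
    while PREFIX_LENGTH + base32_length(n + 1) <= MAX_PART_LENGTH:
        n += 1
    return n


limit = max_compressed_octets()
print(limit, base32_length(limit), base32_length(limit + 1))   # 37 60 61
print("two-octet mode capacity:", (limit - 1) // 2)            # 18
```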

262	2.2.6 Encode the compressed string with Base32

264	The compressed string MUST be converted to a DNS-compatible encoding
265	using the Base32 encoding described in section 2.5. The result of this
266	step is the encoded string.

268	2.2.7 Prepend "aq8" to the encoded string and finish

270	Prepend the characters "aq8" to the encoded string. This is the domain
271	name part that can be used in DNS resolution.
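The steps of section 2.2 can be combined into one function. The following Python sketch is illustrative, not a reference implementation. All names are hypothetical. General Category and NFC data come from Python's unicodedata module, which reflects a newer Unicode version than this draft cites. Errors are reported by raising an exception.

```python
import string
import unicodedata

PREFIX = "aq8"
ALPHABET = "abcdefghijklmnopqrstuvwxyz234567"                 # Table 2, values 0-31
LDH = frozenset(string.ascii_letters + string.digits + "-")   # section 2.2.1
ILLEGAL_CATEGORIES = {"Zs", "Zl", "Zp", "Cc", "Cf", "Co"}     # Table 1


class ConversionError(ValueError):
    """Raised where a step says the conversion MUST stop with an error."""


def compress(utf16: bytes) -> bytes:
    """Section 2.4.1: one-octet mode if all upper octets match, else 0xD8 + input."""
    uppers = utf16[0::2]
    if len(set(uppers)) == 1:
        return uppers[:1] + utf16[1::2]
    return b"\xd8" + utf16


def base32_encode(data: bytes) -> str:
    """Section 2.5.1: five bits per character, last group padded with zeros."""
    bits = "".join(f"{octet:08b}" for octet in data)
    return "".join(ALPHABET[int(bits[i:i + 5].ljust(5, "0"), 2)]
                   for i in range(0, len(bits), 5))


def to_dns_part(name: str) -> str:
    # 2.2.1: a name that is already a legal domain name part is used unchanged.
    if all(ch in LDH for ch in name):
        return name
    # 2.2.2: reject Table 1 characters.
    for ch in name:
        if ch == "." or unicodedata.category(ch) in ILLEGAL_CATEGORIES:
            raise ConversionError(f"illegal character U+{ord(ch):04X}")
    # 2.2.2: a well-formed pair is one code point in a Python str, so this sketch
    # treats any surrogate code point as malformed; the strict codec rejects it.
    try:
        name.encode("utf-16-be")
    except UnicodeEncodeError as exc:
        raise ConversionError("malformed surrogate pair") from exc
    # 2.2.3 and 2.2.4: normalize the whole string, then compress its UTF-16BE form.
    compressed = compress(unicodedata.normalize("NFC", name).encode("utf-16-be"))
    # 2.2.5: enforce the 37-octet limit.
    if len(compressed) > 37:
        raise ConversionError("compressed string is longer than 37 octets")
    # 2.2.6 and 2.2.7: Base32-encode and prepend the tag.
    return PREFIX + base32_encode(compressed)


print(to_dns_part("caf\u00e9"))   # aq8abrwczxj
print(to_dns_part("a\u4e2d"))     # aq83aagctrn
```

Under these assumptions, "caf" followed by U+00E9 has the same upper octet (0x00) in every character. It is therefore compressed in one-octet mode to 0x00636166e9. The name "a" followed by U+4E2D has mixed upper octets, so it uses two-octet mode.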

273	2.3 Converting a domain name part to an internationalized name

275	The input string for conversion is a valid domain name part.

277	2.3.1 Strip the "aq8"

279	The input string MUST begin with "aq8", compared without regard to case.
280	If it does not, the conversion MUST stop, the displaying program MUST
281	NOT treat the domain name part as internationalized characters, and the
282	input string is the post-converted string. Otherwise, remove the first
283	three characters. The result of this step is the stripped string.
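A minimal Python sketch of this step follows. The function name strip_tag is hypothetical, and returning None stands for "stop and treat the part as not internationalized". The comparison ignores case because domain name parts are case-insensitive (section 2.1).

```python
from typing import Optional


def strip_tag(part: str) -> Optional[str]:
    """Return the stripped string, or None if part does not begin with aq8."""
    if part[:3].lower() != "aq8":
        return None
    return part[3:]


print(strip_tag("AQ8hitq7ey"))   # hitq7ey
print(strip_tag("example"))      # None
```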

285	2.3.2 Decode the stripped string with Base32

287	The entire stripped string MUST be changed to all lower-case letters
288	and then checked to see if it is valid Base32 output. If any of the
289	resulting characters are not in Table 2, the conversion MUST stop, the
290	displaying program MUST NOT treat the domain name part as
291	internationalized characters, and the input string is the post-
292	converted string. Otherwise, the entire resulting string MUST be
293	converted to a binary format using the Base32 decoding described in
294	section 2.5. The result of this step is the decoded string.
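The lower-casing and Table 2 check can be sketched in Python as below. The names are hypothetical, and None again stands for "do not treat the part as internationalized".

```python
from typing import Optional

BASE32_CHARS = frozenset("abcdefghijklmnopqrstuvwxyz234567")   # Table 2


def check_base32(stripped: str) -> Optional[str]:
    """Lower-case the stripped string and confirm every character is in Table 2."""
    lowered = stripped.lower()
    if any(ch not in BASE32_CHARS for ch in lowered):
        return None
    return lowered


print(check_base32("HITQ7EY"))   # hitq7ey
print(check_base32("hit-q"))     # None: "-" is not in Table 2
```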

296	2.3.3 Decompress the decoded string

298	The entire decoded string MUST be converted to ISO 10646 characters
299	using the decompression algorithm described in section 2.4. The result
300	of this step is the internationalized string.

302	2.3.4 Verify the internationalized string and finish

304	Each character in the internationalized string MUST be verified before
305	the string can be used. If the string consists only of the characters
306	listed in section 2.2.1, the conversion MUST stop and the input string
307	is the post-converted string. If any of the characters in the string
308	are listed in Table 1 (section 2.2.2), the conversion MUST stop and the
309	input string is the post-converted string.

311	The internationalized string MUST be checked for invalid surrogate
312	pairs, as described in ISO 10646. If an invalid surrogate pair is
313	found, the conversion MUST stop and the input string is the post-
314	converted string.

316	If no errors are found, the verified string is the pre-converted
317	string.
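The steps of section 2.3 can be combined into one Python sketch. Every failing step returns the input string unchanged. The names are hypothetical, and the sketch makes several assumptions that this section does not state:

- Leftover Base32 bits that do not fill an octet are discarded.
- An empty decoded string, or an odd-length two-octet payload, is treated as an error.
- Python's strict UTF-16 decoder is used to detect invalid surrogate pairs.

```python
import string
import unicodedata

ALPHABET = "abcdefghijklmnopqrstuvwxyz234567"                  # Table 2
LDH = frozenset(string.ascii_letters + string.digits + "-")    # section 2.2.1
ILLEGAL_CATEGORIES = {"Zs", "Zl", "Zp", "Cc", "Cf", "Co"}      # Table 1


def base32_decode(text: str) -> bytes:
    """Five bits per character; leftover bits short of an octet are dropped."""
    bits = "".join(f"{ALPHABET.index(ch):05b}" for ch in text)
    usable = len(bits) - len(bits) % 8
    return bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))


def decompress(data: bytes) -> bytes:
    """Section 2.4.2: undo one-octet or two-octet mode."""
    if data[0] == 0xD8:
        return data[1:]
    return b"".join(bytes([data[0], low]) for low in data[1:])


def to_display(part: str) -> str:
    """Return the pre-converted string, or part itself if any step fails."""
    # 2.3.1: the tag is compared without regard to case.
    if part[:3].lower() != "aq8":
        return part
    # 2.3.2: lower-case, check against Table 2, then decode.
    stripped = part[3:].lower()
    if any(ch not in ALPHABET for ch in stripped):
        return part
    decoded = base32_decode(stripped)
    if not decoded:                    # assumption: nothing to decompress is an error
        return part
    # 2.3.3: decompress; the strict decoder rejects odd lengths and bad surrogates.
    try:
        text = decompress(decoded).decode("utf-16-be")
    except UnicodeDecodeError:
        return part
    # 2.3.4: reject plain LDH results and Table 1 characters.
    if all(ch in LDH for ch in text):
        return part
    if any(ch == "." or unicodedata.category(ch) in ILLEGAL_CATEGORIES for ch in text):
        return part
    return text


for part in ["AQ8ABRWCZXJ", "aq83aagctrn", "aq8abqweyy", "example"]:
    print(part, "->", ascii(to_display(part)))
```

In this sketch, "aq8abqweyy" decodes to plain "abc". That result is rejected by the section 2.2.1 check, so the input is returned unchanged.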

319	2.4 Compression algorithm

321	The basic method for compression is to reduce a string of characters
322	that all have the same upper octet to one octet per character. If any
323	character in the string has an upper octet that differs from the
324	upper octet of the other characters, the output string contains all
325	the octets of the input string.

327	The compressed string always has a one-octet header. For one-octet
328	mode, the header octet is the upper octet shared by every character.
329	For two-octet mode, the header octet is 0xD8, an upper octet used only
330	by high surrogates. Design note: It is impossible to have a legal
331	stream of UTF-16 characters whose upper octets are all 0xD8, because a
332	character whose upper octet is 0xD8 must be followed by one whose upper
333	octet is in the range 0xDC through 0xDF.

335	Although the two-octet mode limits a name part to 18 UTF-16 code units,
336	this is still enough for almost all names in almost all scripts. This
337	limit is also close to the limits set by other encoding proposals.

339	Note that all name parts whose characters have the same upper octet
340	MUST be expressed in the one-octet mode. This requirement prevents a
341	single domain name part from having two encodings.

343	2.4.1 Compressing a string

345	Design note: No checking is done on the input to this algorithm. It is
346	assumed that all checking for valid ISO 10646 characters has already
347	been done by a previous step in the conversion process.

349	1) Read each character in the input stream, comparing the upper octet
350	of each. If all of the upper octets match, go to step 3.

352	2) Output 0xD8, followed by the entire input stream. Finish.

354	3) Output the upper octet of the first character. Output the lower
355	octet of each character in the input. Finish.
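The compression steps above can be sketched in Python as follows. The sketch assumes the input is the normalized string already encoded as big-endian UTF-16 bytes. The function name compress is hypothetical. As the design note says, no input checking is done.

```python
def compress(utf16: bytes) -> bytes:
    """Compress a big-endian UTF-16 string as described in section 2.4.1."""
    uppers = utf16[0::2]          # upper octet of each character
    lowers = utf16[1::2]          # lower octet of each character
    if len(set(uppers)) == 1:     # step 1: all upper octets match
        return uppers[:1] + lowers            # step 3: one-octet mode
    return b"\xd8" + utf16                    # step 2: two-octet mode


print(compress("abc".encode("utf-16-be")).hex())        # 00616263
print(compress("a\u4e2d".encode("utf-16-be")).hex())    # d800614e2d
```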

357	2.4.2 Decompressing a string

359	1) Read the first octet of the input string. If it is 0xD8, go to step
360	3.

362	2) Call the value of this first octet "upper". For each remaining octet
363	in the input, output "upper", then output the octet from the input.
364	Finish.

366	3) Read the rest of the input stream and put it in the output stream.
367	Finish.
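A matching Python sketch of decompression follows. It assumes a non-empty input and returns big-endian UTF-16 bytes. The function name decompress is hypothetical.

```python
def decompress(data: bytes) -> bytes:
    """Reverse the compression of section 2.4.1, following section 2.4.2."""
    header, rest = data[0], data[1:]
    if header == 0xD8:            # step 1 leads to step 3: two-octet mode
        return rest
    out = bytearray()             # step 2: one-octet mode
    for low in rest:
        out.extend((header, low))
    return bytes(out)


print(decompress(bytes.fromhex("03b1b2")).decode("utf-16-be") == "\u03b1\u03b2")  # True
```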

369	2.5 Base32

371	In order to encode non-ASCII characters in DNS-compatible domain parts,
372	they must be converted into legal characters. This is done with Base32
373	encoding, described here.

375	Table 2 shows the mapping between input bits and output characters in
376	Base32. Design note: the digits used in Base32 are "2" through "7"
377	instead of "0" through "5" in order to avoid digits "0" and "1". This
378	helps reduce errors for users who are entering a Base32 stream and may
379	mistake a "0" for an "O" or a "1" for an "l".

381	               Table 2: Base32 conversion
382	        bits   char  hex         bits   char  hex
383	        00000   a    0x61        10000   q    0x71
384	        00001   b    0x62        10001   r    0x72
385	        00010   c    0x63        10010   s    0x73
386	        00011   d    0x64        10011   t    0x74
387	        00100   e    0x65        10100   u    0x75
388	        00101   f    0x66        10101   v    0x76
389	        00110   g    0x67        10110   w    0x77
390	        00111   h    0x68        10111   x    0x78
391	        01000   i    0x69        11000   y    0x79
392	        01001   j    0x6a        11001   z    0x7a
393	        01010   k    0x6b        11010   2    0x32
394	        01011   l    0x6c        11011   3    0x33
395	        01100   m    0x6d        11100   4    0x34
396	        01101   n    0x6e        11101   5    0x35
397	        01110   o    0x6f        11110   6    0x36
398	        01111   p    0x70        11111   7    0x37

400	2.5.1 Encoding octets as Base32

402	The input is a stream of octets. However, the octets are then treated
403	as a stream of bits.

405	Design note: The assumption that the input is a stream of octets
406	(instead of a stream of bits) was made so that no padding was needed.
407	If you are reusing this encoding for a stream of bits, you must add a
408	padding mechanism in order to differentiate different lengths of input.

410	1) Set the read pointer to the beginning of the input bit stream.

412	2) Look at the five bits after the read pointer. If there are not five
413	bits, go to step 5.

415	3) Look up the value of the set of five bits in the bits column of
416	Table 2, and output the character from the char column (whose hex value
417	is in the hex column).

419	4) Move the read pointer five bits forward. If the read pointer is at
420	the end of the input bit stream (that is, there are no more bits in the
421	input), stop. Otherwise, go to step 2.

423	5) Append zero bits to the remaining bits until there are five bits.

425	6) Look up the value of the set of five bits in the bits column of
426	Table 2, and output the character from the char column (whose hex value
427	is in the hex column).
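The encoding steps can be sketched in Python as below. The string ALPHABET lists the Table 2 characters in order, so each character's index equals its five-bit value. The sketch assumes the padding bits in step 5 are zeros, which matches the example in section 2.5.3. The function name base32_encode is hypothetical.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz234567"   # Table 2: index = 5-bit value


def base32_encode(data: bytes) -> str:
    """Encode octets as Base32 following the steps of section 2.5.1."""
    bits = "".join(f"{octet:08b}" for octet in data)   # treat octets as a bit stream
    out = []
    for i in range(0, len(bits), 5):                   # steps 2-4: five bits at a time
        group = bits[i:i + 5].ljust(5, "0")            # step 5: pad the last group with zeros
        out.append(ALPHABET[int(group, 2)])            # steps 3 and 6: Table 2 lookup
    return "".join(out)


print(base32_encode(bytes.fromhex("3a270f93")))   # hitq7ey, as in section 2.5.3
```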

429	2.5.2 Decoding Base32 as octets

431	The input is octets in network byte order. The input octets MUST be
432	values from the second column in Table 2.

434	1) Set the read pointer to the beginning of the input octet stream.

436	2) Look up the character value of the octet in the char column (or hex
437	value in hex column) of Table 2, and output the five bits from the bits
438	column.

440	3) Move the read pointer one octet forward. If the read pointer is at
441	the end of the input octet stream (that is, there are no more octets in
442	the input), stop. Otherwise, go to step 2.
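Section 2.5.2 does not say what to do with bits left over when the number of decoded bits is not a multiple of eight. The encoder pads only the final group, so the following Python sketch assumes those leftover bits are padding and discards them. The function name base32_decode is hypothetical.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz234567"   # Table 2: index = 5-bit value


def base32_decode(text: str) -> bytes:
    """Decode Base32 characters back to octets following section 2.5.2."""
    bits = []
    for ch in text:                                # steps 1-3: one character at a time
        value = ALPHABET.find(ch)
        if value < 0:
            raise ValueError(f"not a Table 2 character: {ch!r}")
        bits.append(f"{value:05b}")                # step 2: output five bits
    stream = "".join(bits)
    usable = len(stream) - len(stream) % 8         # assumption: drop the encoder's padding bits
    return bytes(int(stream[i:i + 8], 2) for i in range(0, usable, 8))


print(base32_decode("hitq7ey").hex())   # 3a270f93
```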

444	2.5.3 Base32 example

446	Assume you want to encode the value 0x3a270f93. The bit string is:

448	3   a    2   7    0   f    9   3
449	00111010 00100111 00001111 10010011

451	Broken into chunks of five bits, this is:

453	00111 01000 10011 10000 11111 00100 11

455	The output of encoding is:

457	00111 01000 10011 10000 11111 00100 11000
458	  h     i     t     q     7     e     y
459	or "hitq7ey". The final two bits, 11, are padded with zeros to 11000.

461	3. Implementing User Interfaces

463	This section gives guidelines to creators of programs that allow entry
464	or display of domain names.

466	The use of internationalized domain name parts in user applications
467	should be as transparent to the user as possible. A user should be able
468	to enter and see internationalized domain names as the pre-converted
469	names if at all possible.

471	For instance, if the user is able to enter Chinese characters anywhere
472	in a program, he or she should also be able to enter Chinese characters
473	into any interface component that would take in a domain name, such as
474	a dialog box asking for a URL. Similarly, if any part of a program can
475	display Arabic characters, any domain name that has Arabic characters
476	in it should be able to be displayed with Arabic characters, not as the
477	ASCII transformation of those characters.

479	3.1 Name entry

481	In non-internationalized systems, the user enters a domain name and
482	that name is usually sent unchecked to a domain name resolver, which
483	returns an IPv4 address. With internationalized names, the user
484	application MUST convert the pre-converted name into a post-converted
485	name so that it is acceptable to resolvers.

487	Some users might have access to the post-converted format of an
488	internationalized name. Because of this, users SHOULD be able to enter
489	post-converted names directly into an interface component for domain
490	names. This capability should already be in the interface because the
491	post-converted names are already legal. It is important that interfaces
492	not prohibit the entry of long domain names. (Of course, they should
493	not be prohibiting them anyway.)

495	There are a wide variety of user input methods. Keyboard input methods
496	vary widely from script to script, and even within a single script,
497	there is often more than one method. Humorously, people who don't use
498	a particular script often cannot comprehend how someone who uses that
499	script can input it with a keyboard and will often declare such input
500	as impossible.

502	Regardless of the input method, any system that allows input of non-
503	ASCII characters SHOULD allow input of pre-converted domain names in
504	the same fashion.

506	3.2 Name display

508	As a user enters internationalized characters, they are often displayed
509	to the user at the same time. For instance, in a typical entry box for
510	a URL, the characters are displayed as they are entered. Such display
511	should, of course, also happen for internationalized characters in
512	domain names.

514	Choosing what to do with domain names in free text is more difficult
515	because not all scripts are easily displayable. For instance, assume
516	that you are reading a sentence on a page that says "You can reach the
517	company at" followed by a URL. If the domain name portion of the URL is
518	internationalized, each domain name part SHOULD be shown as a pre-
519	converted string if possible. If it is not possible (such as if no font
520	for the script is available), the domain name part SHOULD be shown as
521	post-converted characters.

523	A display program has two choices when displaying an internationalized
524	name part containing one or more characters that the program cannot
525	display. The first choice is to show a "replacement character" that
526	does not look like any other character in the display, without
527	replacing the underlying character. The second choice is to display the
528	post-converted name; this is admittedly ugly and does not give the user
529	any useful information other than "the text could not be displayed". In
530	general, the first option is the better choice. However, it is very
531	important that users still be able to copy a domain name part even if
532	it has characters that cannot be displayed. Thus, if a display system
533	chooses to display a "replacement character", the underlying character
534	MUST still be the undisplayable character.

536	Note that some domain name parts that start with "aq8" are not post-
537	converted parts. Such names may contain characters that are not in the
538	Base32 character set. In such cases, a display program SHOULD display
539	the name part without attempting to convert it to pre-converted
540	characters.

542	3.3 Zone files and registration

544	Historically, zone files have been maintained as US-ASCII text. For
545	stability reasons, this practice MUST continue. Thus, zone files MUST
546	only contain post-converted name parts. Zone administrators can use
547	tools to enter and view the internationalized parts of zone files.

549	Registrars for public name spaces such as .com have the same
550	requirements as any zone administrator. They MUST be sure not to
551	register names that are illegal. In the case of this protocol, the
552	registration can continue to be done with US-ASCII characters, but the
553	registrar MUST then check that the conversion to an internationalized
554	name does not result in an error.

556	3.4 Users and post-converted name parts

558	The Internet has a long history of trying to hide technical detail from
559	users only to have that detail exposed, often in a confusing fashion.
560	This protocol attempts to minimize the impact of such exposure.

562	Clearly, no user will be able to understand a post-converted name part.
563	However, they are unlikely to have any significant problems with them.
564	It is likely to become common lore that domain names that have
565	internationalized parts also have an all-text version that looks like
566	gibberish. These long names can be copied (by program or even by hand)
567	just like current domain names are.

569	4. Security Considerations

571	Much of the security of the Internet relies on the DNS. Thus, any
572	change to the characteristics of the DNS can change the security of
573	much of the Internet. For this reason, this protocol makes no changes
574	to the DNS itself.

576	Host names are used by users to connect to Internet servers. The
577	security of the Internet would be compromised if a user entering a
578	single internationalized name could be connected to different servers
579	based on different interpretations of the internationalized domain
580	name.

582	This protocol is designed so that every internationalized domain part
583	can be represented as one and only one DNS-compatible string. If there
584	is any way to follow the steps in this document and get two or more
585	different results, it is a severe and fatal error in the protocol.

587	5. References

589	[IDNReq] James Seng, "Requirements of Internationalized Domain Names",
590	draft-ietf-idn-requirements.

592	[ISO10646] ISO/IEC 10646-1:1993. International Standard -- Information
593	technology -- Universal Multiple-Octet Coded Character Set (UCS) --
594	Part 1: Architecture and Basic Multilingual Plane.  Five amendments and
595	a technical corrigendum have been published up to now. UTF-16 is
596	described in Annex Q, published as Amendment 1. 17 other amendments are
597	currently at various stages of standardization. [[[ THIS REFERENCE
598	NEEDS TO BE UPDATED AFTER DETERMINING ACCEPTABLE WORDING ]]]

600	[Norm] Mark Davis and Martin Duerst, "Character Normalization in IETF
601	Protocols", draft-duerst-i18n-norm.

603	[RFC2045] Ned Freed and Nathaniel Borenstein, "Multipurpose Internet
604	Mail Extensions (MIME) Part One: Format of Internet Message Bodies",
605	November 1996, RFC 2045.

607	[RFC2119] Scott Bradner, "Key words for use in RFCs to Indicate
608	Requirement Levels", March 1997, RFC 2119.

610	[RFC2278] Ned Freed and Jon Postel, "IANA Charset Registration
611	Procedures", January 1998, RFC 2278.

613	[STD13] Paul Mockapetris, "Domain names - implementation and
614	specification", November 1987, STD 13 (RFC 1035).

616	[Unicode3] The Unicode Consortium, "The Unicode Standard -- Version
617	3.0", ISBN 0-201-61633-5. Described at
618	<http://www.unicode.org/unicode/standard/versions/Unicode3.0.html>.

620	[UnicodeData] The Unicode Character Database,
621	<ftp://ftp.unicode.org/Public/UNIDATA/UnicodeData.txt>. The database
622	is described in
623	<ftp://ftp.unicode.org/Public/UNIDATA/UnicodeCharacterDatabase.html>.

625	A. Acknowledgements

627	Mark Davis contributed many ideas to the initial draft of this
628	document. Graham Klyne and Martin Duerst offered technical comments on
629	the algorithms used.

631	Base32 is quite obviously inspired by the tried-and-true Base64
632	Content-Transfer-Encoding described in [RFC2045].

634	B. Changes from Previous Versions of this Draft

636	B.1 Changes from -02 to -03

638	Throughout: changed "wg4" to "aq8".

640	2.2: Updated the first design note to indicate that the table
641	will probably be moved to its own draft.

643	2.2.3: Changed reference for normalization from [UTR15] to [Norm].

645	5: Updated the reference for [IDNReq]. Removed [UTR15] and replaced
646	it with [Norm].

648	B.2 Changes from -01 to -02

650	Throughout: Changed "ph6" to "wg4".

652	2.1: Updated count of unused three-letter prefixes.

654	2.3: Removed all the error states and clarified that any error in
655	conversion means that the input string is the post-converted
656	string.

658	2.4: Radically changed the compression scheme; the previous one
659	was far too cumbersome.

661	2.5: Renumbered Table 3 to Table 2.

663	2.5.1: Changed the second paragraph (should have been done in
664	the change to -01 to remove padding).

666	3.2: Clarified the paragraph emphasizing the need for users to be able
667	to copy names even if they are not displayable.

669	5: Removed reference to [UTR6].

671	A: Added Martin Duerst. Removed reference to the compression
672	algorithm because it has changed.

674	B.3 Changes from -00 to -01

676	Throughout: Changed references to the character set from Unicode
677	to ISO 10646, even though they are equivalent. Also changed
678	references to the rules for surrogate pairs to ISO 10646.

680	1.1: Clarified last paragraph.

682	2.2: Reworded the first design note to make excluding case stuff
683	more likely.

685	2.5: Removed the "8" padding in the Base32 algorithm because
686	it was superfluous.

688	2.5.1: Removed "in network byte order" from the first sentence
689	because it was redundant.

691	3.3: Made the first paragraph stronger.

693	5: Added reference to ISO 10646. This still needs work.

695	A: Added Graham Klyne.

697	C. IANA Considerations

699	There are no IANA considerations in the current draft. However, if
700	it is decided to have IANA maintain the character database, this
701	section will become much longer.

703	D. Author Contact Information

705	Paul Hoffman
706	Internet Mail Consortium and VPN Consortium
707	127 Segre Place
708	Santa Cruz, CA  95060 USA
709	paul.hoffman@imc.org and paul.hoffman@vpnc.org