idnits 2.17.1 

draft-duerst-dns-i18n-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in
     this document.

     Expected boilerplate is as follows today (2024-04-24) according to
     https://trustee.ietf.org/license-info :

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.a:
        This Internet-Draft is submitted in full conformance with the provisions
        of BCP 78 and BCP 79.

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2:
        Copyright (c) 2024 IETF Trust and the persons identified as the document
        authors.  All rights reserved.

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3:
        This document is subject to BCP 78 and the IETF Trust's Legal Provisions
        Relating to IETF Documents
        (https://trustee.ietf.org/license-info) in effect on the date of
        publication of this document.  Please review these documents
        carefully, as they describe your rights and restrictions with
        respect to this document.  Code Components extracted from this
        document must include Simplified BSD License text as described in
        Section 4.e of the Trust Legal Provisions and are provided
        without warranty as described in the Simplified BSD License.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** Missing expiration date.  The document expiration date should appear on
     the first and last page.

  ** The document seems to lack a 1id_guidelines paragraph about
     Internet-Drafts being working documents. 

  ** The document seems to lack a 1id_guidelines paragraph about 6 months
     document validity -- however, there's a paragraph with a matching
     beginning. Boilerplate error?

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     current Internet-Drafts. 

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     Shadow Directories. 

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  == There are 2 instances of lines with non-RFC2606-compliant FQDNs in the
     document.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (10 December 1996) is 9997 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Missing reference section? 'ASCII' on line 482 looks like a reference

  -- Missing reference section? 'ISO10646' on line 494 looks like a reference

  -- Missing reference section? 'RFC1522' on line 505 looks like a reference

  -- Missing reference section? 'Unicode' on line 523 looks like a reference

  -- Missing reference section? 'RFCIAB' on line 518 looks like a reference

  -- Missing reference section? 'RFC2044' on line 515 looks like a reference

  -- Missing reference section? 'RFC1642' on line 509 looks like a reference

  -- Missing reference section? 'HTML-I18N' on line 489 looks like a reference

  -- Missing reference section? 'Yer96' on line 526 looks like a reference

  -- Missing reference section? 'RFC1738' on line 512 looks like a reference

  -- Missing reference section? 'Dillon96' on line 485 looks like a reference

  -- Missing reference section? 'RFC1034' on line 499 looks like a reference

  -- Missing reference section? 'RFC1035' on line 502 looks like a reference


     Summary: 7 errors (**), 0 flaws (~~), 2 warnings (==), 15 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Internet Draft                                                M. Duerst
3	<draft-duerst-dns-i18n-00.txt>                     University of Zurich
4	Expires 10 June 1997                                   10 December 1996

6	                  Internationalization of Domain Names

8	Status of this Memo

10	   This document is an Internet-Draft.  Internet-Drafts are working doc-
11	   uments of the Internet Engineering Task Force (IETF), its areas, and
12	   its working groups. Note that other groups may also distribute work-
13	   ing documents as Internet-Drafts.

15	   Internet-Drafts are draft documents valid for a maximum of six
16	   months. Internet-Drafts may be updated, replaced, or obsoleted by
17	   other documents at any time.  It is not appropriate to use Internet-
18	   Drafts as reference material or to cite them other than as a "working
19	   draft" or "work in progress".

21	   To learn the current status of any Internet-Draft, please check the
22	   1id-abstracts.txt listing contained in the Internet-Drafts Shadow
23	   Directories on ds.internic.net (US East Coast), nic.nordu.net
24	   (Europe), ftp.isi.edu (US West Coast), or munnari.oz.au (Pacific
25	   Rim).

27	   Distribution of this document is unlimited.  Please send comments to
28	   the author at <mduerst@ifi.unizh.ch>.

30	Abstract

32	   Internet domain names are currently limited to a very restricted
33	   character set. This document proposes the introduction of a new
34	   "zero-level" domain (ZLD) to allow the use of arbitrary characters
35	   from the Universal Character Set (ISO 10646/Unicode) in domain names.
36	   The proposal is fully backwards compatible and does not need any
37	   changes to DNS.

39	Table of contents

41	   1. Introduction
42	     1.1 Motivation
43	     1.2 Notational Conventions
44	   2. The Hidden Zero Level Domain
45	   3. Encoding International Characters
46	     3.1 Encoding Requirements
47	     3.2 Encoding Definition
48	     3.3 Encoding Example
49	     3.4 Length Considerations
50	   4. Usage Considerations
51	     4.1 General Usage
52	     4.2 Usage Restrictions
53	     4.3 Domain Name Creation
54	     4.4 Usage in URLs
55	   5. Alternate Proposals
56	     5.1 The Dillon Proposal
57	     5.2 Using a Separate Lookup Service
58	   6. Generic Considerations
59	     6.1 Security Considerations
60	     6.2 Internationalization Considerations
61	   Acknowledgements
62	   Bibliography
63	   Author's Address

65	1. Introduction

67	1.1 Motivation

69	   The lower layers of the Internet do not discriminate against any
70	   language or script. On the application level, however, the historical
71	   dominance of the US and the ASCII character set [ASCII] as a lowest
72	   common denominator has led to limitations. The process of removing
73	   these limitations is called internationalization (abbreviated i18n).
74	   One example of the above-mentioned limitations is domain names
75	   [RFC1034, RFC1035], where only the letters of the basic Latin alphabet
76	   (case-insensitive), the decimal digits, and the hyphen are allowed.

78	   While such restrictions are convenient if a domain name is intended
79	   to be used by arbitrary people around the globe, there may be very
80	   good reasons for using aliases that are easier to remember or type
81	   in a local context. This is similar to traditional mail addresses,
82	   where both local scripts and conventions and the Latin script can be
83	   used.

85	   There are many good reasons for domain name i18n, and some arguments
86	   that are brought forward against such an extension. This document,
87	   however, does not discuss the pros and cons of domain name i18n. It
88	   proposes and discusses a solution and thereby eliminates one of the
89	   most often heard arguments against it, namely "it cannot be done".

91	   The solution proposed in this document consists of the introduction
92	   of a new "zero-level" domain building the root of a new domain
93	   branch, and an encoding of the Universal Character Set (UCS)
94	   [ISO10646] into the limited character set of domain names.

96	1.2 Notational Conventions

98	   In the domain name examples in this document, characters of the basic
99	   Latin alphabet (expressible in ASCII) are denoted with lower case
100	   letters. Upper case letters are used to represent characters outside
101	   ASCII, such as accented characters of the Latin alphabet, characters
102	   of other alphabets and syllabaries, ideographic characters, and vari-
103	   ous signs.

105	2. The Hidden Zero Level Domain

107	   The domain name system uses the domain "in-addr.arpa" to convert
108	   internet addresses back to domain names. One way to view this is to
109	   say that in-addr.arpa forms the root of a separate hierarchy.  This
110	   hierarchy has been made part of the main domain name hierarchy just
111	   for implementation convenience. While syntactically, in-addr.arpa is
112	   a second level domain (SLD), functionally it is a zero level domain
113	   (ZLD) in the same way as "." is a ZLD.

115	   For domain name i18n to work inside the tight restrictions of domain
116	   name syntax, one has to define an encoding that maps strings of UCS
117	   characters to strings of characters allowable in domain names, and a
118	   means to distinguish domain names that are the result of such an
119	   encoding from ordinary domain names.

121	   This document proposes to create a new ZLD to distinguish encoded
122	   i18n domain names from traditional domain names.  This domain would
123	   be hidden from the user in the same way as a user does not see in-
124	   addr.arpa.  This domain could be called "i18n.arpa" (although the use
125	   of arpa in this context is definitely not appropriate), simply
126	   "i18n", or even just "i". Below, we use "i" for brevity, while
127	   we leave the decision on the actual name to further discussion.

129	3. Encoding International Characters

130	3.1 Encoding Requirements

132	   Until quite recently, the thought of going beyond ASCII for something
133	   such as domain names failed because of the lack of a single encom-
134	   passing character set for the scripts and languages of the world.
135	   Tagging techniques such as those used in MIME headers [RFC1522] would
136	   be much too clumsy for domain names.

138	   The definition of ISO 10646 [ISO10646], codepoint by codepoint iden-
139	   tical with Unicode [Unicode], provides a single Universal Character
140	   Set (UCS).  A recent report [RFCIAB] clearly recommends basing the
141	   i18n of the Internet on these standards.

143	   An encoding for i18n domain names therefore has to take the charac-
144	   ters of ISO 10646/Unicode as a starting point.  The full four-byte
145	   (31 bit) form of UCS, called UCS4, should be used. A limitation to
146	   the two-byte form (UCS2), which can encode only the Basic
147	   Multilingual Plane (BMP), is too restrictive.

149	   For the mapping between UCS4 and the strongly limited character set
150	   of domain names, the following constraints have to be considered:

152	   -  The structure of domain names, and therefore the "dot", has to be
153	      preserved. Encoding is done for individual labels.

155	   -  Individual labels in domain names allow the basic Latin alphabet
156	      (monocase, 26 letters), the "-" inside the label, and the ten dec-
157	      imal digits in all but the initial position. The capacity per
158	      octet is therefore limited to somewhat above 5 bits.

160	   -  There is neither a need nor a possibility to preserve any
	      characters unchanged.

162	   -  Frequent characters (i.e. ASCII, alphabetic, UCS2, in that order)
163	      should be encoded relatively compactly. A variable-length encoding
164	      (similar to UTF-8) seems desirable.
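
   As a rough check of the capacity figure in the second constraint
   above: a position inside a label can hold one of 26 letters, 10
   digits, or the hyphen, i.e. at most 37 symbols. The following Python
   sketch computes the resulting upper bound per octet. The variable
   names are illustrative only and not part of the proposal.

```python
import math

# Symbols usable inside a label: 26 monocase letters, 10 digits, hyphen.
LETTERS, DIGITS, HYPHEN = 26, 10, 1
alphabet_size = LETTERS + DIGITS + HYPHEN

# Upper bound on the information carried by one octet of a label.
bits_per_octet = math.log2(alphabet_size)
print(f"{alphabet_size} symbols -> {bits_per_octet:.2f} bits per octet")
```

   The bound lies slightly above 5 bits, which is why an encoding that
   carries 5 bits per target letter wastes little of the available
   capacity.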

166	3.2 Encoding Definition

168	   Several encodings for UCS, so-called UCS Transformation Formats,
169	   exist already, namely UTF-8 [RFC2044], UTF-7 [RFC1642], and UTF-16
170	   [Unicode]. Unfortunately, none of them is suitable for our purposes.
171	   We therefore use the following encoding:

173	   -  To accommodate the slanted probability distribution of characters
174	      in UCS4, a variable-length encoding is used.

176	   -  Each target letter encodes 5 bits. Four bits are used as data
177	      bits, the fifth bit is used to indicate continuation of the vari-
178	      able-length encoding.

180	   -  Continuation is indicated by distinguishing the initial letter
181	      from the subsequent letters [alternative: distinguish leading
182	      letters from final. Pros? Cons?].

184	   -  Leading four-bit groups of binary value 0000 of UCS4 characters
185	      are discarded, except for the last TWO groups (i.e. the last
186	      octet).  This means that ASCII and Latin-1 characters need two
187	      target letters, the main alphabets up to and including Tibetan
188	      need three target letters, the rest of the characters in the BMP
189	      need four target letters, all except the last (private) plane in
190	      the UTF-16/Surrogates area [Unicode] need five target letters, and
191	      so on.

193	   -  The letters representing the various bit groups in the various
194	      positions are chosen according to the following table:

196	        Nibble Value   Initial   Subsequent
197	        Hex  Binary
198	        0    0000      G         0
199	        1    0001      H         1
200	        2    0010      I         2
201	        3    0011      J         3
202	        4    0100      K         4
203	        5    0101      L         5
204	        6    0110      M         6
205	        7    0111      N         7
206	        8    1000      O         8
207	        9    1001      P         9
208	        A    1010      Q         A
209	        B    1011      R         B
210	        C    1100      S         C
211	        D    1101      T         D
212	        E    1110      U         E
213	        F    1111      V         F
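
   The rules and the table above can be sketched in Python as follows.
   This is a minimal illustration, not a reference implementation; the
   names INITIAL, SUBSEQUENT, encode_char, and encode_label are chosen
   here for the example and do not appear in the proposal.

```python
INITIAL = "GHIJKLMNOPQRSTUV"     # "Initial" column, nibble values 0..F
SUBSEQUENT = "0123456789ABCDEF"  # "Subsequent" column, nibble values 0..F


def encode_char(code_point: int) -> str:
    """Encode one UCS4 code point as a run of target letters."""
    if not 0 <= code_point <= 0x7FFFFFFF:
        raise ValueError("not a 31-bit UCS4 code point")
    # Split into eight 4-bit groups, most significant group first.
    nibbles = [(code_point >> shift) & 0xF for shift in range(28, -1, -4)]
    # Discard leading zero groups, but always keep the last two (one octet).
    while len(nibbles) > 2 and nibbles[0] == 0:
        nibbles.pop(0)
    return INITIAL[nibbles[0]] + "".join(SUBSEQUENT[n] for n in nibbles[1:])


def encode_label(label: str) -> str:
    """Encode every character of one domain name label."""
    return "".join(encode_char(ord(ch)) for ch in label)


print(encode_label("fr"))            # M6N2
print(encode_label("\u60c5\u5831"))  # M0C5L831
```

   The first result is the form of "fr" used in Section 4.3, and the
   second is the first label of the example in Section 3.3.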

215	   [Should we try to eliminate "I" and "O" from initial? "I" might be
216	   eliminated because then an algorithm can more easily detect ".i". "O"
217	   could lead to some confusion with "0".  What other protocols are
218	   there that might be able to use a similar solution, but that might
219	   have other restrictions for the initial letters?]

221	   Please note that this solution has the following interesting proper-
222	   ties:

224	   -  For subsequent positions, there is an equivalence between the hex-
225	      adecimal value of the character code and the target letter used.
226	      This ensures easy conversion and checking.

228	   -  The absence of digits from the "initial" column, and the fact that
229	      the hyphen is not used, ensure that the resulting string conforms
230	      to domain name syntax.

232	   -  Raw sorting of encoded and unencoded domain names is equivalent.

234	   -  The boundaries of characters can always be detected easily.
235	      (While this is important for representations that are used inter-
236	      nally for text editing, it is actually not very important here,
237	      because tools for editing can be assumed to use a more straight-
238	      forward representation internally.)

240	   -  Unless control characters are allowed, the target string will
241	      never actually contain a G.
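
   The boundary property listed above makes decoding simple, because an
   initial letter (G to V) can never be mistaken for a subsequent letter
   (0 to 9, A to F). The following Python sketch illustrates this. The
   function names are hypothetical, and upper-casing the input is an
   assumption based on the case-insensitivity of existing domain names.

```python
import re

INITIAL = "GHIJKLMNOPQRSTUV"  # nibble values 0..F in initial position
# One character: an initial letter followed by 1 to 7 subsequent letters.
CHARACTER = re.compile(r"[G-V][0-9A-F]{1,7}")


def split_characters(encoded: str) -> list:
    """Cut an encoded label at the character boundaries."""
    encoded = encoded.upper()
    pieces = CHARACTER.findall(encoded)
    if "".join(pieces) != encoded:
        raise ValueError("not a valid encoded label")
    return pieces


def decode_label(encoded: str) -> str:
    """Turn an encoded label back into a string of UCS characters."""
    chars = []
    for piece in split_characters(encoded):
        value = INITIAL.index(piece[0])
        for letter in piece[1:]:
            # Subsequent letters are plain hexadecimal digits.
            value = (value << 4) | int(letter, 16)
        # chr() only accepts code points up to U+10FFFF, so larger
        # UCS4 values raise an error here.
        chars.append(chr(value))
    return "".join(chars)


print(split_characters("M771L927"))  # ['M771', 'L927']
print(decode_label("M6N2"))          # fr
```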

243	3.3 Encoding Example

245	   As an example, the current domain

247	        is.s.u-tokyo.ac.jp

249	   with the components standing for information science, science, the
250	   University of Tokyo, academic, and Japan, might in future be repre-
251	   sented by

253	        JOUHOU.RI.TOUDAI.GAKU.NIHON

255	   (a transliteration of the kanji that would probably be chosen to
256	   represent the same domain). Writing each character in U+HHHH notation
257	   as in [Unicode], this is

259	        U+60c5U+5831.U+7406.U+6771U+5927.U+5b66.U+65e5U+672c

261	   and will be translated by the software handling internationalized
262	   domain names, according to the above specifications, to
263	        M0C5L831.N406.M771L927.LB66.M5E5M72C.i
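
   The example can be reproduced with a few lines of Python. The sketch
   below relies on the fact that the subsequent letters are just the
   hexadecimal digits of the character code, so only the first digit
   needs to be mapped to the "initial" column. The function name
   encode_domain and its zld parameter are assumptions made for this
   illustration.

```python
def encode_domain(name: str, zld: str = "i") -> str:
    """Encode each label of an i18n domain name and append the ZLD."""
    encoded_labels = []
    for label in name.split("."):
        letters = []
        for ch in label:
            # At least two hex digits, so the last octet is always kept.
            digits = format(ord(ch), "02X")
            # The first digit is taken from the "initial" column (G..V).
            letters.append("GHIJKLMNOPQRSTUV"[int(digits[0], 16)] + digits[1:])
        encoded_labels.append("".join(letters))
    return ".".join(encoded_labels + [zld])


kanji = "\u60c5\u5831.\u7406.\u6771\u5927.\u5b66.\u65e5\u672c"
print(encode_domain(kanji))  # M0C5L831.N406.M771L927.LB66.M5E5M72C.i
```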

265	3.4 Length Considerations

267	   DNS allows for a maximum of 63 positions in each part, and for 255
268	   positions for the overall domain name including dots.  This allows up
269	   to 15 ideographs, or up to 21 letters e.g.  from the Hebrew or Arabic
270	   alphabet, in a label.  While this does not allow for the same margin
271	   as in the case of ASCII domain names, it should still be quite suffi-
272	   cient.  [Problems could only surface for languages that use very long
273	   words or terms and don't know any kind of abbreviations or similar
274	   shortening devices. Do these exist?]  DNS contains a compression
275	   scheme that avoids sending the same trailing portion of a domain name
276	   twice in the same transmission. Long domain names are therefore not
277	   that much of a concern.
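
   The per-label figures above follow directly from the number of
   target letters needed per character. A small Python check, with
   constant and function names chosen for illustration:

```python
MAX_LABEL = 63  # maximum number of positions in one label


def max_characters(letters_per_character: int) -> int:
    """Number of whole characters that fit into one label."""
    return MAX_LABEL // letters_per_character


# Ideographs in the BMP need four target letters each (e.g. U+60C5).
print(max_characters(4))  # 15
# Hebrew and Arabic letters need three target letters each (e.g. U+05D0).
print(max_characters(3))  # 21
# ASCII and Latin-1 characters need two target letters each.
print(max_characters(2))  # 31
```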

279	4. Usage Considerations

281	4.1 General Usage

283	   To implement this proposal, neither DNS servers nor resolvers need
284	   changes.  These programs will only deal with the encoded form of the
285	   domain name with the .i suffix. Software that wants to offer an
286	   internationalized user interface (for example a web browser) is
287	   responsible for the necessary conversions. It will analyze the domain
288	   name, call the resolver directly if the domain name conforms to the
289	   domain name syntax restrictions, and otherwise encode the name
290	   according to the specifications of Section 3.2 and append the .i
291	   suffix before calling the resolver.  New implementations of resolvers
292	   will of course offer a companion function to gethostbyname accepting
293	   an ISO 10646/Unicode string as input.
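
   The decision made by such software can be sketched in Python as
   follows. This is an illustration under stated assumptions: the names
   LDH_LABEL, is_traditional, and resolver_name are hypothetical, the
   syntax check follows the label rules summarized in Section 3.1, and
   names with a trailing dot are not handled.

```python
import re

# Traditional label: starts with a letter, ends with a letter or digit,
# hyphens only inside, at most 63 positions.
LDH_LABEL = re.compile(r"[A-Za-z]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_traditional(name: str) -> bool:
    """True if every label conforms to the restricted domain name syntax."""
    return all(LDH_LABEL.fullmatch(label) for label in name.split("."))


def encode_label(label: str) -> str:
    """Encode one label according to Section 3.2."""
    out = []
    for ch in label:
        digits = format(ord(ch), "02X")  # keeps at least the last octet
        out.append("GHIJKLMNOPQRSTUV"[int(digits[0], 16)] + digits[1:])
    return "".join(out)


def resolver_name(name: str, zld: str = "i") -> str:
    """Return the name to hand to an unmodified resolver."""
    if is_traditional(name):
        return name  # pass through unchanged
    return ".".join([encode_label(label) for label in name.split(".")] + [zld])


print(resolver_name("is.s.u-tokyo.ac.jp"))  # is.s.u-tokyo.ac.jp
print(resolver_name("caf\u00e9.fr"))        # M3M1M6U9.M6N2.i
```

   The returned string could then be passed to an existing lookup
   function such as gethostbyname without any change to the resolver.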

295	4.2 Usage Restrictions

297	   While this proposal in theory allows control characters such as BEL
298	   or NUL, or symbols such as arrows and smileys, in domain names, such
299	   characters should clearly be excluded. Whether this has to be
300	   explicitly specified, or whether the difficulty of typing these
301	   characters on any keyboard of the world will limit their use, has to
302	   be discussed.

304	   A related point is the question of equivalence. For historical
305	   reasons, ISO 10646/Unicode contain a considerable number of
306	   compatibility characters and allow more than one representation for
307	   characters with diacritics. To guarantee smooth interoperability in
308	   these and related cases, additional restrictions or the definition of
309	   some form of normalization seem necessary. However, this is a general
310	   problem affecting all areas where ISO 10646/Unicode is used in
311	   identifiers, and should therefore be addressed in a generic way.

313	   Equally related is the problem of case equivalence.  Users can very
314	   well distinguish between upper case and lower case.  Also, casing in
315	   an i18n context is not as straightforward as for ASCII, so that case
316	   equivalence is best avoided.  Problems therefore result not from the
317	   fact that case is distinguished for i18n domain names, but from the
318	   fact that existing domain names do not distinguish case. While it is
319	   impossible to distinguish between next.com and NeXT.com, the same two
320	   subdomains would easily be distinguishable if subordinate to an i18n
321	   domain.

323	   A problem that also has to be discussed and solved is bidirectional-
324	   ity.  Arabic and Hebrew characters are written right-to-left, and the
325	   mixture with other characters results in a divergence between logical
326	   and graphical sequence. See [HTML-I18N] for more explanations.  The
327	   proposal of [Yer96] for dealing with bidirectionality in URLs could
328	   probably be applied to domain names.

330	4.3 Domain Name Creation

332	   The ".i" ZLD should be created as such to allow the internationaliza-
333	   tion of domain names. Rules for creating subdomains inside ".i"
334	   should follow the established rules for the creation of functionally
335	   equivalent domains in the existing domain hierarchy, and should
336	   evolve in parallel.  However, the peculiarities of i18n domain names
337	   should be carefully considered:

339	   -  Depending on the script, reasonable lengths for domain name parts
340	      may differ greatly. For ideographic scripts, a part may often be
341	      only a one-letter code. Established rules for lengths may need
342	      adaptation.

344	   -  If the number of generic TLDs (.com, .edu, .org, .net) is kept
345	      low, then it may be feasible to restrict i18n TLDs to country
346	      TLDs.

348	   -  There are no ISO 3166 two-letter codes in scripts other than Latin.
349	      I18n domain names for countries will have to be designed from
350	      scratch.

352	   -  The names of some countries or regions may pose greater political
353	      problems when expressed in the native script than when expressed
354	      in 2-letter ISO 3166 codes.

356	   -  I18n country domain names should in principle only be created in
357	      those scripts that are used locally. There is probably little use
358	      in creating an Arabic domain name for China, for example.

360	   -  In those cases where domain names are open to a wide range of
361	      applicants, a special procedure for accepting applications should
362	      be used so that a reasonable-quality fit between ASCII domain
363	      names and i18n domain names results where desired.  This would
364	      probably be done by establishing a period of about a month for
365	      applications inside an i18n domain newly created as a parallel for
366	      an existing domain, and resolving the detected conflicts.

368	   -  It may be desirable to have internationalized subdomains in non-
369	      internationalized TLDs. As an example, many companies in France
370	      may want to register an accented version of their company name,
371	      while remaining under the .fr TLD. For this, .fr would have to be
372	      reregistered as .M6N2.i. Accented and other internationalized sub-
373	      domains would go below .M6N2.i, whereas unaccented ones would go
374	      below .fr in its plain form.

376	   -  To generalize the above case, one might require any domain name
377	      registry to register and manage a corresponding .i domain upon
378	      request, to allow registration of i18n domain names in arbitrary
379	      subdomains.

381	4.4 Usage in URLs

383	   According to current definitions, URLs encode sequences of octets
384	   into a sequence of characters from a character set that is almost as
385	   limited as the character set of domain names [RFC1738].  This is
386	   clearly not satisfactory for i18n.

388	   Internationalizing URLs, i.e. assigning character semantics to the
389	   encoded octets, can either be done separately for each part and/or
390	   scheme, or in a uniform way. Doing it separately has the serious
391	   disadvantage that software providing user interfaces for URLs in gen-
392	   eral would have to know about all the different i18n solutions of the
393	   different parts and schemes. Many of these solutions may not even be
394	   known yet.

396	   It is therefore definitely more advantageous to decide on a single
397	   and consistent solution for URL internationalization. The most valu-
398	   able candidate [Yer96], for many reasons, is UTF-8 [RFC2044], an
399	   ASCII-compatible encoding of UCS4.

401	   Therefore, a URL containing the domain name of the example of
402	   Section 3.3 should not be written as:

404	        ftp://M0C5L831.N406.M771L927.LB66.M5E5M72C.i

406	   (although this will also work) but rather

408	        ftp://%e6%83%85%e5%a0%b1.%e7%90%86.%e6%9d%b1%e5%a4%a7.
409	             %e5%ad%a6.%e6%97%a5%e6%9c%ac

411	   In this canonical form, the trailing .i is absent, and the octets can
412	   be reconstructed from the %HH-encoding and interpreted as UTF-8 by
413	   generic URL software. The software part dealing with domain names
414	   will carry out the conversion to the .i form.
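
   The %HH form shown above can be derived with standard URL tools. The
   following Python sketch uses urllib.parse from the standard library;
   the helper name url_host is an assumption made for this example.
   Note that quote() emits upper-case hexadecimal digits, which are
   equivalent to the lower-case digits used above.

```python
from urllib.parse import quote, unquote


def url_host(name: str) -> str:
    """Percent-encode the UTF-8 octets of each label; dots stay literal."""
    return ".".join(quote(label, safe="") for label in name.split("."))


kanji = "\u60c5\u5831.\u7406.\u6771\u5927.\u5b66.\u65e5\u672c"
host = url_host(kanji)
print("ftp://" + host.lower())

# Generic URL software can recover the characters without knowing
# anything about domain names.
print(unquote(host) == kanji)  # True
```

   Only the part of the software that deals with domain names then
   needs to apply the encoding of Section 3.2 and append the .i suffix.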

416	5. Alternate Proposals

418	5.1 The Dillon Proposal

420	   The proposal of Michael Dillon [Dillon96] is also based on encoding
421	   Unicode into the limited character set of domain names. The
422	   distinction is made for each part, using the hyphen in initial
423	   position. Because this does not fully conform to the syntax of
424	   existing domain names, it is questionable whether it is backwards
425	   compatible. On the other hand, this has the advantage that local i18n
426	   domain names can be installed easily without cooperation by the
427	   manager of the superdomain.

429	   A variable-length scheme with base 36 is used that can encode up to
430	   1610 characters, absolutely insufficient for Chinese or Japanese.
431	   Characters assumed not to be used in i18n domain names are excluded,
432	   i.e. only one case is allowed for basic Latin characters.  This means
433	   that large tables have to be worked out carefully to convert between
434	   ISO 10646/Unicode and the actual number that is encoded with base 36.

436	5.2 Using a Separate Lookup Service

438	   Instead of using a special encoding and burdening DNS with i18n, one
439	   could build and use a separate lookup service for i18n domain names.
440	   Instead of converting to UCS4 and encoding according to Section 3.2,
441	   and then calling the DNS resolver, a program would contact this new
442	   service when seeing a domain name with characters outside the allowed
443	   range.

445	   Such a solution has various problems. A separate service does not yet
446	   exist, whereas DNS is readily usable. Solving the problems of unique-
447	   ness, etc., again for this separate service creates a lot of work. On
448	   the other hand, there are no savings in terms of implementation
449	   costs. DNS also does not have a serious capacity problem that might
450	   be addressed by using a separate lookup service, nor is such a prob-
451	   lem created by i18n domain names.

453	6. Generic Considerations

455	6.1 Security Considerations

457	   This proposal is believed not to raise any security considerations
458	   beyond those of the current use of the domain name system.

460	6.2 Internationalization Considerations

462	   This proposal addresses internationalization as such. The main addi-
463	   tional consideration with respect to internationalization may be the
464	   indication of language. However, for concise identifiers such as
465	   domain names, language tagging would be too much of a burden and
466	   would create complex dependencies with semantics.

468	        NOTE -- This section is introduced based on a recommenda-
469	        tion in [RFCIAB]. A similar section addressing internation-
470	        alization should be included in all application level
471	        internet drafts and RFCs.

473	Acknowledgements

475	   I am grateful in particular to the following persons:

477	   Bert Bos, Lori Brownell, Michael Dillon, David Goldsmith, Larry
478	   Masinter, Keith Moore, and Francois Yergeau.

480	Bibliography

482	   [ASCII]        Coded Character Set -- 7-Bit American Standard Code
483	                  for Information Interchange, ANSI X3.4-1986.

485	   [Dillon96]     M. Dillon, "Multilingual Domain Names", Memra Software
486	                  Inc., November 1996 (circulated Dec. 6, 1996 on iahc-
487	                  discuss@iahc.org).

489	   [HTML-I18N]    F. Yergeau, G. Nicol, G. Adams, and M. Duerst, "Inter-
490	                  nationalization of the Hypertext Markup Language",
491	                  Work in progress (draft-ietf-html-i18n-05.txt), August
492	                  1996.

494	   [ISO10646]     ISO/IEC 10646-1:1993. International standard -- Infor-
495	                  mation technology -- Universal multiple-octet coded
496	                  character Set (UCS) -- Part 1: Architecture and basic
497	                  multilingual plane.

499	   [RFC1034]      P. Mockapetris, "Domain Names - Concepts and Facili-
500	                  ties", ISI, Nov. 1987.

502	   [RFC1035]      P. Mockapetris, "Domain Names - Implementation and
503	                  Specification", ISI, Nov. 1987.

505	   [RFC1522]      K. Moore, "MIME (Multipurpose Internet Mail Exten-
506	                  sions) Part Two: Message Header Extensions for Non-
507	                  ASCII Text", University of Tennessee, September 1993.

509	   [RFC1642]      D. Goldsmith, M. Davis, "UTF-7: A Mail-safe Transfor-
510	                  mation Format of Unicode", Taligent Inc., July 1994.

512	   [RFC1738]      T. Berners-Lee, L. Masinter, and M. McCahill,
513	                  "Uniform Resource Locators (URL)", CERN, Dec. 1994.

515	   [RFC2044]      F. Yergeau, "UTF-8, A Transformation Format of Unicode
516	                  and ISO 10646", Alis Technologies, October 1996.

518	   [RFCIAB]       C. Weider, C. Preston, K. Simonsen, H. Alvestrand, R.
519	                  Atkinson, M. Crispin, P. Svanberg, "Report from the
520	                  IAB Character Set Workshop", October 1996 (currently
521	                  available as draft-weider-iab-char-wrkshop-00.txt).

523	   [Unicode]      The Unicode Consortium, "The Unicode Standard, Version
524	                  2.0", Addison-Wesley, Reading, MA, 1996.

526	   [Yer96]        F. Yergeau, "Internationalization of URLs", Alis Tech-
527	                  nologies,
528	                  <http://www.alis.com:8085/~yergeau/url-00.html>.

530	Author's Address

532	   Martin J. Duerst
533	   Multimedia-Laboratory
534	   Department of Computer Science
535	   University of Zurich
536	   Winterthurerstrasse 190
537	   CH-8057 Zurich
538	   Switzerland

540	   Tel: +41 1 257 43 16
541	   Fax: +41 1 363 00 35
542	   E-mail: mduerst@ifi.unizh.ch

544	     NOTE -- Please write the author's name with u-Umlaut wherever
545	     possible, e.g. in HTML as D&uuml;rst.