Internet Draft                                              M. Duerst
draft-duerst-dns-i18n-00.txt                     University of Zurich
Expires 10 June 1997                                 10 December 1996

                 Internationalization of Domain Names

Status of this Memo

   This document is an Internet-Draft. Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups. Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months. Internet-Drafts may be updated, replaced, or obsoleted by
   other documents at any time. It is not appropriate to use
   Internet-Drafts as reference material or to cite them other than as
   a "working draft" or "work in progress".

   To learn the current status of any Internet-Draft, please check the
   1id-abstracts.txt listing contained in the Internet-Drafts Shadow
   Directories on ds.internic.net (US East Coast), nic.nordu.net
   (Europe), ftp.isi.edu (US West Coast), or munnari.oz.au (Pacific
   Rim).

   Distribution of this document is unlimited. Please send comments to
   the author at the address given under "Author's Address".

Abstract

   Internet domain names are currently limited to a very restricted
   character set.
   This document proposes the introduction of a new "zero-level"
   domain (ZLD) to allow the use of arbitrary characters from the
   Universal Character Set (ISO 10646/Unicode) in domain names. The
   proposal is fully backwards compatible and does not need any
   changes to DNS.

Table of contents

   1. Introduction
      1.1 Motivation
      1.2 Notational Conventions
   2. The Hidden Zero Level Domain
   3. Encoding International Characters
      3.1 Encoding Requirements
      3.2 Encoding Definition
      3.3 Encoding Example
      3.4 Length Considerations
   4. Usage Considerations
      4.1 General Usage
      4.2 Usage Restrictions
      4.3 Domain Name Creation
      4.4 Usage in URLs
   5. Alternate Proposals
      5.1 The Dillon Proposal
      5.2 Using a Separate Lookup Service
   6. Generic Considerations
      6.1 Security Considerations
      6.2 Internationalization Considerations
   Acknowledgements
   Bibliography
   Author's Address

1. Introduction

1.1 Motivation

   The lower layers of the Internet do not discriminate against any
   language or script. On the application level, however, the
   historical dominance of the US and the ASCII character set [ASCII]
   as a lowest common denominator have led to limitations. The process
   of removing these limitations is called internationalization
   (abbreviated i18n). One example of these limitations is domain
   names [RFC1034, RFC1035], where only the letters of the basic Latin
   alphabet (case-insensitive), the decimal digits, and the hyphen are
   allowed.

   While such restrictions are convenient if a domain name is intended
   to be used by arbitrary people around the globe, there may be very
   good reasons for using aliases that are easier to remember or type
   in a local context. This is similar to traditional mail addresses,
   where both local scripts and conventions and the Latin script can
   be used.

   There are many good reasons for domain name i18n, and some
   arguments that are brought forward against such an extension. This
   document, however, does not discuss the pros and cons of domain
   name i18n. It proposes and discusses a solution and therefore
   eliminates one of the most often heard arguments against it, namely
   "it cannot be done".

   The solution proposed in this document consists of the introduction
   of a new "zero-level" domain building the root of a new domain
   branch, and an encoding of the Universal Character Set (UCS)
   [ISO10646] into the limited character set of domain names.

1.2 Notational Conventions

   In the domain name examples in this document, characters of the
   basic Latin alphabet (expressible in ASCII) are denoted with lower
   case letters. Upper case letters are used to represent characters
   outside ASCII, such as accented characters of the Latin alphabet,
   characters of other alphabets and syllabaries, ideographic
   characters, and various signs.

2. The Hidden Zero Level Domain

   The domain name system uses the domain "in-addr.arpa" to convert
   internet addresses back to domain names. One way to view this is to
   say that in-addr.arpa forms the root of a separate hierarchy. This
   hierarchy has been made part of the main domain name hierarchy just
   for implementation convenience. While syntactically, in-addr.arpa
   is a second level domain (SLD), functionally it is a zero level
   domain (ZLD) in the same way as "." is a ZLD.

   For domain name i18n to work inside the tight restrictions of
   domain name syntax, one has to define an encoding that maps strings
   of UCS characters to strings of characters allowable in domain
   names, and a means to distinguish domain names that are the result
   of such an encoding from ordinary domain names.

   This document proposes to create a new ZLD to distinguish encoded
   i18n domain names from traditional domain names. This domain would
   be hidden from the user in the same way as a user does not see
   in-addr.arpa. This domain could be called "i18n.arpa" (although the
   use of arpa in this context is definitely not appropriate), simply
   "i18n", or even just "i". Below, we use "i" for brevity, and leave
   the decision on the actual name to further discussion.

3. Encoding International Characters

3.1 Encoding Requirements

   Until quite recently, the thought of going beyond ASCII for
   something such as domain names failed because of the lack of a
   single encompassing character set for the scripts and languages of
   the world. Tagging techniques such as those used in MIME headers
   [RFC1522] would be much too clumsy for domain names.

   The definition of ISO 10646 [ISO10646], codepoint by codepoint
   identical with Unicode [Unicode], provides a single Universal
   Character Set (UCS). A recent report [RFCIAB] clearly recommends
   basing the i18n of the Internet on these standards.

   An encoding for i18n domain names therefore has to take the
   characters of ISO 10646/Unicode as a starting point. The full
   four-byte (31 bit) form of UCS, called UCS4, should be used. A
   limitation to the two-byte form (UCS2), which allows only for the
   encoding of the Basic Multilingual Plane (BMP), is too restrictive.

   For the mapping between UCS4 and the strongly limited character set
   of domain names, the following constraints have to be considered:

   - The structure of domain names, and therefore the "dot", has to be
     preserved. Encoding is done for individual labels.

   - Individual labels in domain names allow the basic Latin alphabet
     (monocase, 26 letters), the "-" inside the label, and the ten
     decimal digits in all but the initial position. The capacity per
     octet is therefore limited to somewhat above 5 bits.

   - There is neither a need nor a possibility to preserve any
     characters.

   - Frequent characters (i.e. ASCII, alphabetic, UCS2, in that order)
     should be encoded relatively compactly. A variable-length
     encoding (similar to UTF-8) seems desirable.

3.2 Encoding Definition

   Several encodings for UCS, so called UCS Transform Formats, exist
   already, namely UTF-8 [RFC2044], UTF-7 [RFC1642], and UTF-16
   [Unicode]. Unfortunately, none of them is suitable for our
   purposes. We therefore use the following encoding:

   - To accommodate the slanted probability distribution of characters
     in UCS4, a variable-length encoding is used.

   - Each target letter encodes 5 bits. Four bits are used as data
     bits, the fifth bit is used to indicate continuation of the
     variable-length encoding.

   - Continuation is indicated by distinguishing the initial letter
     from the subsequent letters [alternative: distinguish leading
     letters from final. Pros? Cons?].

   - Leading four-bit groups of binary value 0000 of UCS4 characters
     are discarded, except for the last TWO groups (i.e. the last
     octet). This means that ASCII and Latin-1 characters need two
     target letters, the main alphabets up to and including Tibetan
     need three target letters, the rest of the characters in the BMP
     need four target letters, all except the last (private) plane in
     the UTF-16/Surrogates area [Unicode] need five target letters,
     and so on.

   - The letters representing the various bit groups in the various
     positions are chosen according to the following table:

        Nibble Value      Initial     Subsequent
        Hex   Binary
         0     0000          G            0
         1     0001          H            1
         2     0010          I            2
         3     0011          J            3
         4     0100          K            4
         5     0101          L            5
         6     0110          M            6
         7     0111          N            7
         8     1000          O            8
         9     1001          P            9
         A     1010          Q            A
         B     1011          R            B
         C     1100          S            C
         D     1101          T            D
         E     1110          U            E
         F     1111          V            F

   [Should we try to eliminate "I" and "O" from initial? "I" might be
   eliminated because then an algorithm can more easily detect ".i".
   "O" could lead to some confusion with "0". What other protocols are
   there that might be able to use a similar solution, but that might
   have other restrictions for the initial letters?]

   Please note that this solution has the following interesting
   properties:

   - For subsequent positions, there is an equivalence between the
     hexadecimal value of the character code and the target letter
     used. This ensures easy conversion and checking.

   - The absence of digits from the "initial" column, and the fact
     that the hyphen is not used, ensure that the resulting string
     conforms to domain name syntax.

   - Raw sorting of encoded and unencoded domain names is equivalent.

   - The boundaries of characters can always be detected easily.
     (While this is important for representations that are used
     internally for text editing, it is actually not very important
     here, because tools for editing can be assumed to use a more
     straightforward representation internally.)

   - Unless control characters are allowed, the target string will
     never actually contain a G.

3.3 Encoding Example

   As an example, the current domain

      is.s.u-tokyo.ac.jp

   with the components standing for information science, science, the
   University of Tokyo, academic, and Japan, might in future be
   represented by

      JOUHOU.RI.TOUDAI.GAKU.NIHON

   (a transliteration of the kanji that would probably be chosen to
   represent the same domain). Writing each character in U+HHHH
   notation as in [Unicode], this is

      U+60c5U+5831.U+7406.U+6771U+5927.U+5b66.U+65e5U+672c

   and will be translated by the software handling internationalized
   domain names, according to the above specifications, to

      M0C5L831.N406.M771L927.LB66.M5E5M72C.i

3.4 Length Considerations

   DNS allows for a maximum of 63 positions in each part, and for 255
   positions for the overall domain name including dots. This allows
   up to 15 ideographs (at four target letters each), or up to 21
   letters e.g. from the Hebrew or Arabic alphabet (at three target
   letters each), in a label. While this does not allow for the same
   margin as in the case of ASCII domain names, it should still be
   quite sufficient. [Problems could only surface for languages that
   use very long words or terms and do not have any kind of
   abbreviations or similar shortening devices. Do these exist?] DNS
   contains a compression scheme that avoids sending the same trailing
   portion of a domain name twice in the same transmission. Long
   domain names are therefore not that much of a concern.

4. Usage Considerations

4.1 General Usage

   To implement this proposal, neither DNS servers nor resolvers need
   changes. These programs will only deal with the encoded form of the
   domain name with the .i suffix. Software that wants to offer an
   internationalized user interface (for example a web browser) is
   responsible for the necessary conversions.
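
   As a rough illustration of the conversion such software would have
   to perform, the encoding of Section 3.2 can be sketched in Python.
   The choice of Python and all function names below are assumptions
   made for this illustration only and are not part of the proposal.
   The sketch emits upper case target letters as in the table of
   Section 3.2 and accepts either case when decoding, since existing
   domain names do not distinguish case.

```python
# Illustrative sketch of the Section 3.2 encoding.
# All names here are hypothetical; the draft defines no API.

INITIAL = "GHIJKLMNOPQRSTUV"     # letter for the first nibble of a character
SUBSEQUENT = "0123456789ABCDEF"  # letter for every following nibble


def encode_char(ch: str) -> str:
    """Encode one UCS character as a run of target letters."""
    # Hex digits of the code point with leading zero nibbles dropped,
    # except that the last two nibbles (the last octet) are always kept.
    nibbles = [int(digit, 16) for digit in format(ord(ch), "02X")]
    return INITIAL[nibbles[0]] + "".join(SUBSEQUENT[n] for n in nibbles[1:])


def encode_label(label: str) -> str:
    """Encode one label (the text between two dots)."""
    return "".join(encode_char(ch) for ch in label)


def encode_domain(name: str, zld: str = "i") -> str:
    """Encode each label separately and append the zero-level domain."""
    labels = [encode_label(label) for label in name.split(".")]
    return ".".join(labels + [zld])


def decode_label(encoded: str) -> str:
    """Invert encode_label; an initial letter marks a character boundary."""
    chars, value = [], None
    for letter in encoded.upper():
        if letter in INITIAL:
            if value is not None:
                chars.append(chr(value))
            value = INITIAL.index(letter)
        elif letter in SUBSEQUENT and value is not None:
            value = value * 16 + SUBSEQUENT.index(letter)
        else:
            raise ValueError(f"not a valid encoded label: {encoded!r}")
    if value is not None:
        chars.append(chr(value))
    return "".join(chars)


# The example of Section 3.3, written with \u escapes.
example = "\u60c5\u5831.\u7406.\u6771\u5927.\u5b66.\u65e5\u672c"
print(encode_domain(example))  # M0C5L831.N406.M771L927.LB66.M5E5M72C.i
```

   Because the initial letters (G to V) and the subsequent letters
   (0 to 9, A to F) are disjoint, the decoder needs no length
   information to find character boundaries.
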
   Such software will analyze the domain name, call the resolver
   directly if the domain name conforms to the domain name syntax
   restrictions, and otherwise encode the name according to the
   specifications of Section 3.2 and append the .i suffix before
   calling the resolver. New implementations of resolvers will of
   course offer a companion function to gethostbyname accepting an
   ISO 10646/Unicode string as input.

4.2 Usage Restrictions

   While this proposal in theory allows control characters such as BEL
   or NUL, or symbols such as arrows and smileys, in domain names,
   such characters should clearly be excluded. Whether this has to be
   explicitly specified, or whether the difficulty of typing these
   characters on any keyboard of the world will limit their use, has
   to be discussed.

   A related point is the question of equivalence. For historical
   reasons, ISO 10646/Unicode contains a considerable number of
   compatibility characters and allows more than one representation
   for characters with diacritics. To guarantee smooth
   interoperability in these and related cases, additional
   restrictions or the definition of some form of normalization seem
   necessary. However, this is a general problem affecting all areas
   where ISO 10646/Unicode is used in identifiers, and should
   therefore be addressed in a generic way.

   Equally related is the problem of case equivalence. Users can very
   well distinguish between upper case and lower case. Also, casing in
   an i18n context is not as straightforward as for ASCII, so that
   case equivalence is best avoided. Problems therefore result not
   from the fact that case is distinguished for i18n domain names, but
   from the fact that existing domain names do not distinguish case.
   Where it is impossible to distinguish between next.com and
   NeXT.com, the same two subdomains would easily be distinguishable
   if subordinate to an i18n domain.
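
   The equivalence and case questions discussed above can be made
   concrete with a small Python sketch. It re-implements the label
   encoding of Section 3.2 under a hypothetical name and uses the NFC
   normalization form from Python's unicodedata module purely as one
   possible instance of "some form of normalization"; this document
   does not name a normalization form, so that choice is an
   assumption.

```python
import unicodedata

INITIAL = "GHIJKLMNOPQRSTUV"  # initial-letter column of the Section 3.2 table


def encode_label(label: str) -> str:
    """Hypothetical helper: Section 3.2 encoding of a single label."""
    out = []
    for ch in label:
        nibbles = format(ord(ch), "02X")  # keeps at least the last octet
        out.append(INITIAL[int(nibbles[0], 16)] + nibbles[1:])
    return "".join(out)


# Two representations of the same word: e-acute as one character, and
# as "e" followed by a combining acute accent.
precomposed = "caf\u00e9"
decomposed = "cafe\u0301"

raw_a = encode_label(precomposed)  # M3M1M6U9
raw_b = encode_label(decomposed)   # M3M1M6M5J01

# Without normalization the two spellings map to different labels.
print(raw_a == raw_b)  # False

# Normalizing before encoding (NFC is an assumption, see above) makes
# both spellings map to the same label.
norm_a = encode_label(unicodedata.normalize("NFC", precomposed))
norm_b = encode_label(unicodedata.normalize("NFC", decomposed))
print(norm_a == norm_b)  # True

# Case is preserved by the encoding: upper and lower case letters have
# different code points and therefore different target letters.
print(encode_label("NeXT"), encode_label("next"))  # KEM5L8L4 MEM5N8N4
```
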

   A problem that also has to be discussed and solved is
   bidirectionality. Arabic and Hebrew characters are written
   right-to-left, and the mixture with other characters results in a
   divergence between logical and graphical sequence. See [HTML-I18N]
   for more explanations. The proposal of [Yer96] for dealing with
   bidirectionality in URLs could probably be applied to domain names.

4.3 Domain Name Creation

   The ".i" ZLD should be created as such to allow the
   internationalization of domain names. Rules for creating subdomains
   inside ".i" should follow the established rules for the creation of
   functionally equivalent domains in the existing domain hierarchy,
   and should evolve in parallel. However, the peculiarities of i18n
   domain names should be carefully considered:

   - Depending on the script, reasonable lengths for domain name parts
     may differ greatly. For ideographic scripts, a part may often be
     only a one-character code. Established rules for lengths may need
     adaptation.

   - If the number of generic TLDs (.com, .edu, .org, .net) is kept
     low, then it may be feasible to restrict i18n TLDs to country
     TLDs.

   - There are no ISO 3166 two-letter codes in scripts other than
     Latin. I18n domain names for countries will have to be designed
     from scratch.

   - The names of some countries or regions may pose greater political
     problems when expressed in the native script than when expressed
     in two-letter ISO 3166 codes.

   - I18n country domain names should in principle only be created in
     those scripts that are used locally. There is probably little use
     in creating an Arabic domain name for China, for example.

   - In those cases where domain names are open to a wide range of
     applicants, a special procedure for accepting applications should
     be used so that a reasonable-quality fit between ASCII domain
     names and i18n domain names results where desired.
     This would probably be done by establishing a period of about a
     month for applications inside an i18n domain newly created as a
     parallel for an existing domain, and resolving the detected
     conflicts.

   - It may be desirable to have internationalized subdomains in
     non-internationalized TLDs. As an example, many companies in
     France may want to register an accented version of their company
     name, while remaining under the .fr TLD. For this, .fr would have
     to be reregistered as .M6N2.i. Accented and other
     internationalized subdomains would go below .M6N2.i, whereas
     unaccented ones would go below .fr in its plain form.

   - To generalize the above case, one might require any domain name
     registry to register and manage a corresponding .i domain upon
     request, to allow registration of i18n domain names in arbitrary
     subdomains.

4.4 Usage in URLs

   According to current definitions, URLs encode sequences of octets
   into a sequence of characters from a character set that is almost
   as limited as the character set of domain names [RFC1738]. This is
   clearly not satisfactory for i18n.

   Internationalizing URLs, i.e. assigning character semantics to the
   encoded octets, can either be done separately for each part and/or
   scheme, or in a uniform way. Doing it separately has the serious
   disadvantage that software providing user interfaces for URLs in
   general would have to know about all the different i18n solutions
   of the different parts and schemes. Many of these solutions may not
   even be known yet.

   It is therefore definitely more advantageous to decide on a single
   and consistent solution for URL internationalization. The most
   valuable candidate [Yer96], for many reasons, is UTF-8 [RFC2044],
   an ASCII-compatible encoding of UCS4.
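
   As an illustration of what UTF-8 combined with the %HH escaping of
   URLs means for a host name, the following Python sketch converts
   the domain name of Section 3.3 to its escaped form and back. The
   function names are hypothetical, and the rule for which octets are
   left unescaped (ASCII letters, digits, and the hyphen) is an
   assumption made for this sketch rather than something [Yer96] or
   this document specifies.

```python
from urllib.parse import unquote


def percent_encode_label(label: str) -> str:
    """Escape every UTF-8 octet of a label that is not a plain
    ASCII letter, digit, or hyphen as %hh (lower case hex)."""
    out = []
    for byte in label.encode("utf-8"):
        if byte < 0x80 and (chr(byte).isalnum() or byte == 0x2D):
            out.append(chr(byte))
        else:
            out.append(f"%{byte:02x}")
    return "".join(out)


def percent_encode_host(host: str) -> str:
    """Escape each label separately so that the dots are preserved."""
    return ".".join(percent_encode_label(label) for label in host.split("."))


def percent_decode_host(escaped: str) -> str:
    """Reconstruct the octets from %HH and interpret them as UTF-8."""
    return unquote(escaped, encoding="utf-8", errors="strict")


# The domain name of Section 3.3, written with \u escapes.
host = "\u60c5\u5831.\u7406.\u6771\u5927.\u5b66.\u65e5\u672c"
escaped = percent_encode_host(host)
print("ftp://" + escaped)
print(percent_decode_host(escaped) == host)  # True
```

   A name that already conforms to domain name syntax, such as
   is.s.u-tokyo.ac.jp, passes through percent_encode_host unchanged,
   which reflects the ASCII compatibility of UTF-8.
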

   Therefore, a URL containing the domain name of the example of
   Section 3.3 should not be written as:

      ftp://M0C5L831.N406.M771L927.LB66.M5E5M72C.i

   (although this will also work) but rather

      ftp://%e6%83%85%e5%a0%b1.%e7%90%86.%e6%9d%b1%e5%a4%a7.
      %e5%ad%a6.%e6%97%a5%e6%9c%ac

   In this canonical form, the trailing .i is absent, and the octets
   can be reconstructed from the %HH-encoding and interpreted as UTF-8
   by generic URL software. The software part dealing with domain
   names will carry out the conversion to the .i form.

5. Alternate Proposals

5.1 The Dillon Proposal

   The proposal of Michael Dillon [Dillon96] is also based on encoding
   Unicode into the limited character set of domain names. The
   distinction from ordinary domain names is made for each part
   separately, using the hyphen in initial position. Because this does
   not fully conform to the syntax of existing domain names, it is
   questionable whether it is backwards-compatible. On the other hand,
   this has the advantage that local i18n domain names can be
   installed easily without cooperation by the manager of the
   superdomain.

   A variable-length scheme with base 36 is used that can encode up to
   1610 characters, which is far from sufficient for Chinese or
   Japanese. Characters assumed not to be used in i18n domain names
   are excluded, i.e. only one case is allowed for basic Latin
   characters. This means that large tables have to be worked out
   carefully to convert between ISO 10646/Unicode and the actual
   number that is encoded with base 36.

5.2 Using a Separate Lookup Service

   Instead of using a special encoding and burdening DNS with i18n,
   one could build and use a separate lookup service for i18n domain
   names. Instead of converting to UCS4 and encoding according to
   Section 3.2, and then calling the DNS resolver, a program would
   contact this new service when seeing a domain name with characters
   outside the allowed range.

   Such a solution has various problems. A separate service does not
   yet exist, whereas DNS is readily usable. Solving the problems of
   uniqueness, etc., again for this separate service creates a lot of
   work. On the other hand, there are no savings in terms of
   implementation costs. DNS also does not have a serious capacity
   problem that might be addressed by using a separate lookup service,
   nor is such a problem created by i18n domain names.

6. Generic Considerations

6.1 Security Considerations

   This proposal is believed not to raise any security considerations
   beyond those of the current use of the domain name system.

6.2 Internationalization Considerations

   This proposal addresses internationalization as such. The main
   additional consideration with respect to internationalization may
   be the indication of language. However, for concise identifiers
   such as domain names, language tagging would be too much of a
   burden and would create complex dependencies with semantics.

      NOTE -- This section is introduced based on a recommendation in
      [RFCIAB]. A similar section addressing internationalization
      should be included in all application level internet drafts and
      RFCs.

Acknowledgements

   I am grateful in particular to the following persons:

   Bert Bos, Lori Brownell, Michael Dillon, David Goldsmith, Larry
   Masinter, Keith Moore, and Francois Yergeau.

Bibliography

   [ASCII]      Coded Character Set -- 7-Bit American Standard Code
                for Information Interchange, ANSI X3.4-1986.

   [Dillon96]   M. Dillon, "Multilingual Domain Names", Memra Software
                Inc., November 1996 (circulated Dec. 6, 1996 on
                iahc-discuss@iahc.org).

   [HTML-I18N]  F. Yergeau, G. Nicol, G. Adams, and M. Duerst,
                "Internationalization of the Hypertext Markup
                Language", Work in progress
                (draft-ietf-html-i18n-05.txt), August 1996.

   [ISO10646]   ISO/IEC 10646-1:1993.
                International standard -- Information technology --
                Universal multiple-octet coded character Set (UCS) --
                Part 1: Architecture and basic multilingual plane.

   [RFC1034]    P. Mockapetris, "Domain Names - Concepts and
                Facilities", ISI, Nov. 1987.

   [RFC1035]    P. Mockapetris, "Domain Names - Implementation and
                Specification", ISI, Nov. 1987.

   [RFC1522]    K. Moore, "MIME (Multipurpose Internet Mail
                Extensions) Part Two: Message Header Extensions for
                Non-ASCII Text", University of Tennessee, September
                1993.

   [RFC1642]    D. Goldsmith, M. Davis, "UTF-7: A Mail-safe
                Transformation Format of Unicode", Taligent Inc., July
                1994.

   [RFC1738]    T. Berners-Lee, L. Masinter, and M. McCahill, "Uniform
                Resource Locators (URL)", CERN, Dec. 1994.

   [RFC2044]    F. Yergeau, "UTF-8, A Transformation Format of Unicode
                and ISO 10646", Alis Technologies, October 1996.

   [RFCIAB]     C. Weider, C. Preston, K. Simonsen, H. Alvestrand, R.
                Atkinson, M. Crispin, P. Svanberg, "Report from the
                IAB Character Set Workshop", October 1996 (currently
                available as draft-weider-iab-char-wrkshop-00.txt).

   [Unicode]    The Unicode Consortium, "The Unicode Standard, Version
                2.0", Addison-Wesley, Reading, MA, 1996.

   [Yer96]      F. Yergeau, "Internationalization of URLs", Alis
                Technologies.

Author's Address

   Martin J. Duerst
   Multimedia-Laboratory
   Department of Computer Science
   University of Zurich
   Winterthurerstrasse 190
   CH-8057 Zurich
   Switzerland

   Tel: +41 1 257 43 16
   Fax: +41 1 363 00 35
   E-mail: mduerst@ifi.unizh.ch

   NOTE -- Please write the author's name with u-Umlaut wherever
   possible, e.g. in HTML as Dürst.