Network Working Group                                                 J.
Klensin
Internet-Draft                                             June 13, 2005
Expires: December 15, 2005


   National and Local Characters for DNS Top Level Domain (TLD) Names
                      draft-klensin-idn-tld-05.txt

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on December 15, 2005.

Copyright Notice

   Copyright (C) The Internet Society (2005).

Abstract

   In the context of work on internationalizing the Domain Name System
   (DNS), there have been extensive discussions about "multilingual" or
   "internationalized" top level domain names (TLDs), especially for
   countries whose predominant language is not written in a Roman-based
   script.  This document reviews some of the motivations for such
   domains, several suggestions that have been made to provide needed
   functionality, and the constraints that the DNS imposes.  It then
   suggests an alternative, local translation, that may solve a superset
   of the problem while avoiding protocol changes, serious deployment
   delays, and other difficulties.
   The suggestion utilizes a localization technique in applications to
   permit any TLD to be accessed using the vocabulary and characters of
   any language.  It is not restricted to language- or country-specific
   "multilingual" TLDs in the language(s) and script(s) of that country.

Table of Contents

   1.  Introduction
       1.1  Terminology
       1.2  Background on the "Multilingual Name" Problem
            1.2.1  Approaches to the Requirement
            1.2.2  Writing the Name of One's Country in its Own
                   Characters
            1.2.3  Countries with Multiple Languages and Countries
                   with Multiple Names
            1.2.4  Availability of Non-ASCII Characters in Programs
       1.3  Domain Name System Constraints
            1.3.1  Administrative Hierarchy
            1.3.2  Aliases
       1.4  Internationalization and Localization
   2.  Client-side Solutions
       2.1  IDNA and the Client
       2.2  Local Translation Tables for TLD Names
   3.  Advantages and Disadvantages of Local Translation
       3.1  Every TLD Appears in the Local Language and Character Set
       3.2  Unification of Country Code Domains
       3.3  User Understanding of Local and Global References
       3.4  Limits on Expansion of the Number of TLDs
       3.5  Standardization of the Translations
       3.6  Implications for Future New Domain Names
       3.7  Mapping for TLDs, Not Domain Names or Keywords
   4.
       Information Interchange, IDNs, Comparisons, and Translations
   5.  Internationalization Considerations
   6.  IANA Considerations
   7.  Security Considerations
   8.  Acknowledgments
   9.  References
   Author's Address
   Intellectual Property and Copyright Statements

1.  Introduction

1.1  Terminology

   This document assumes the conventional terminology used to discuss
   the domain name system (DNS) and its hierarchical arrangements.
   Terms such as "top level domain" (or just "TLD"), "subdomain",
   "subtree", and "zone file" are used without further explanation.  In
   addition, the term "ccTLD" is used to denote a "country code top
   level domain" and "gTLD" is used to denote a "generic top level
   domain" as described in [RFC1591] and in common usage.

1.2  Background on the "Multilingual Name" Problem

   People who share a language usually prefer to communicate in it,
   using whatever characters are normally used to write that language,
   rather than in some "foreign" one.  There have been standards for
   using mutually-agreed characters and languages in electronic mail
   message bodies and selected headers since the introduction of MIME in
   1992 [MIME] and the Web has permitted multilingual text since its
   inception, also using MIME.  Actual use of non-Roman-character
   content came even earlier, using private conventions.  However,
   domain names are exposed to users in email addresses and URLs.
   Corresponding arrangements, typically also exposing domain names, are
   made for other application protocols.
   The combination of exposed domain names with internationalization
   requirements led rapidly to demands to permit domain names in
   applications that used characters other than those of the very
   restrictive, ASCII-subset, "hostname" or "letter-digit-hyphen"
   ("LDH") conventions recommended in the DNS specifications [RFC1035].
   The effort to do this soon became known as "multilingual domain
   names".  That was actually a misnomer, since the DNS deals only with
   characters and identifier strings, and not, except by accident or
   local registration conventions, with what people usually think of as
   "names".  There has also been little interest in what would actually
   be a "multilingual name", i.e., a name that contains components from
   more than one language.  Instead, interest has focused on the use, in
   the context of the DNS, of strings that conform to specific
   individual languages.

1.2.1  Approaches to the Requirement

   If the requirement is seen, not as "modifying the DNS", but as
   "providing users with access to the DNS from a variety of languages
   and character sets", three sets of proposals have emerged in the IETF
   and elsewhere.  They are:

   1.  Perform processing in client software that recodes a user-visible
       string into an ASCII-compatible form that can safely be passed
       through the DNS protocols and stored in the DNS.  This is the
       approach used, for example, in the IETF's "IDNA" protocol
       [RFC3490].

   2.  Modify the DNS to be more hospitable to non-ASCII names and
       strings.  There have been a variety of proposals to do this,
       using several different techniques.  Some of these have been
       implemented on a proprietary basis by various vendors.
       None of them have gained acceptance in the IETF community,
       primarily because they would take a long time to deploy, would
       leave many problems unsolved, and have been shown to cause
       problems with deployed approaches that had not yet been upgraded.

   3.  Move the problem out of the DNS entirely, relying instead on a
       "directory" or "presentation" layer to handle
       internationalization.  The rationale for this approach is
       discussed in [RFC3467].

   This document proposes a fourth approach, applicable to the top level
   domains (TLDs) only (see Section 1.3.1 for a discussion of the
   special issues that make TLDs both problematic and a special
   opportunity).  That approach involves having the user interface of
   applications map non-ASCII names for TLDs to existing TLDs and could
   be used as an alternative or supplement to the strategies summarized
   above.

1.2.2  Writing the Name of One's Country in its Own Characters

   An early focus of the "multilingual domain name" efforts was
   expressed in statements such as "users in my country, in which ASCII
   is rarely used, should be able to write an entire domain name in
   their own character set".  In particular, since all top-level domain
   names, at present, follow the LDH rules, the modified naming rules
   discussed in [RFC1123], and the coding conventions specified in
   [RFC1591], all fully-qualified DNS names were effectively required to
   contain at least one ASCII label (the TLD name).  Some advocates for
   internationalized names have considered the presence of any ASCII
   labels inappropriate.  One should, instead, be able to write the name
   of the ccTLD for China in Chinese, the name of the ccTLD for Saudi
   Arabia in Arabic, the name for Spain in Spanish, and so on.

   That much could be accomplished, given updated applications, by using
   a new TLD name with IDNA encoding.
   Of course, adding such a TLD would raise new questions: what to do
   about gTLDs, how to handle countries with several official languages
   (perhaps even using different scripts), how name strings should be
   chosen, and whether there should be an attempt to coordinate the
   contents of the local-language TLD zone and the traditional ISO
   3166-coded one.  A few of these issues are addressed below.  But, if
   one examines (or even thinks about) user behavior and preferences, it
   is almost as important that one be able to write the name of the
   ccTLD for China in Arabic and that of Saudi Arabia in Chinese: true
   internationalization implies that, at least to the extent to which
   ambiguity and conflicts can be avoided, people should be able to use
   the languages and character sets they prefer.  For the same reasons
   that one would like to have all-Chinese domain names available in
   China, it is important to have the capability to have an apparent
   Chinese-language TLD for a domain whose second level and beyond are
   Chinese characters, even when the TLD itself serves predominantly
   non-Chinese-speaking registrants and users.

1.2.3  Countries with Multiple Languages and Countries with Multiple
       Names

   From a user interface standpoint, writing ccTLD names in local
   characters is a problem.  As discussed below in Section 1.3.2, the
   DNS itself does not easily permit a domain to be referred to by more
   than one name (or spelling or translation of a name).  Countries with
   more than one official language would require that the country name
   be represented in each of those languages.  And, just as it is
   important that a user in China be able to represent the name of the
   Chinese ccTLD in Chinese characters, she should be able to access a
   Chinese-language site in France using Chinese characters.
   That would require that she be able to write the name of the French
   ccTLD in those characters rather than in a form based on a Roman
   character set.

1.2.4  Availability of Non-ASCII Characters in Programs

   Over the years, computer users have gotten used to the fact that not
   every computer has a full set of characters available to every
   program.  An extreme example is an Arabic speaker using a public
   kiosk computer in an airport in the United States: there is only a
   small chance that the web browser there will be able to input and
   render Arabic correctly.  This has a direct effect on the
   multilingual TLD problem in that it is not possible to simply change
   the name of a ccTLD in the DNS to one of the given country's
   non-ASCII names without possibly preventing people from entering that
   name throughout the world.

1.3  Domain Name System Constraints

1.3.1  Administrative Hierarchy

   The domain name system is firmly rooted in the idea of an
   "administrative hierarchy", with the entity responsible for a given
   node of the hierarchy responsible for policies applicable to its
   subhierarchies (cf. [RFC1034], [RFC1035], and [RFC1591]).  The model
   works quite well for the domain and subdomains of a particular
   enterprise.  In an enterprise situation, the hierarchy can be
   organized to match the organizational structure; there are
   established ways to set policies; and there are, at least presumably,
   shared assumptions about overall goals and objectives among all
   registrants in the domain.  It is more problematic when a domain is
   shared by unrelated entities that lack common policy assumptions
   because it is difficult to reach agreement on rules that should apply
   to all of the entities and subdomains of such a domain.
   The unrelated entities situation always prevails for the labels
   registered in a TLD (second-level names) except in those TLDs for
   which the second level is structural (e.g., the .CO, .AC, .GOV
   conventions in many ccTLDs or in the historical geographical
   organization of .US [RFC1480]), in which case it exists for the
   labels within that structural level.

   TLDs may, but need not, have consistent registration policies for
   those second (or third) level names.  Countries (or ccTLD
   administrators) have often adopted rules about what entities may
   register in their ccTLDs, and what forms the names may take.  RFC
   1591 outlined registration norms for most of the then-extant gTLDs,
   even though those norms have been largely ignored in recent years.
   Some recent "sponsored" and purpose-specific domains are based on
   quite specific rules about appropriate registrations.  Homogeneous
   registration rules for the root are, by contrast, impossible: almost
   by definition, the subdomains registered in the root (TLDs) are
   diverse and no single policy about types and formats of names
   applying to all root subdomains is feasible.

1.3.2  Aliases

   In an environment different from the DNS, a rational way to permit
   assigning local-language names to a country code (or other) domain
   would be to set up an alias for the name, or to use some sort of "see
   instead" reference.  But the DNS does not have facilities for either.
   Instead, it supports a "CNAME" record, whose label can refer only to
   a particular label and not to a subtree.  For example, if A.B.C is a
   fully-qualified name, then a CNAME reference in B.C from X to A would
   make X.B.C appear to have the same values as A.B.C.  However, a CNAME
   reference from Y to C in the root would not make A.B.Y referenceable
   (or even defined) at all.  A second record type, DNAME [RFC2672], can
   provide an alias for a portion of the tree.
   But many believe that it is problematic technically.  At a minimum,
   it can cause synchronization issues when references across zones
   occur, and its use has been discouraged within the IETF except as a
   means of enabling a transition from one domain to another.  Even if
   the design of yet another alias-type record type were contemplated,
   DNS technical constraints of query-response integrity and DNSSEC zone
   signing (cf. [RFC2535] and its successors) make it extremely unlikely
   that one could be defined that would meet the desired requirements
   for "see instead" or true synonym references.

1.4  Internationalization and Localization

   It has often been observed that, while many people talk about
   "internationalization", they often really mean, and want,
   "localization".  "Internationalization", in this context, suggests
   making something globally accessible while incorporating a
   broad-range "universal" character set and conventions appropriate to
   all languages and cultures.  "Localization", by contrast, involves
   having things work well in a particular locality or for a broad range
   of localities, although aspects of the style of operation might
   differ for each locality.  Anything that actually involves the DNS
   must be global, and hence internationalized, since the DNS cannot
   meaningfully support different responses or query and matching models
   based, e.g., on the location of the user making a query.  While the
   DNS cannot support localization internally, many of the features
   discussed earlier in this section are much more easily thought about
   in local terms -- whether localized to a geographical area, users of
   a language, or using some other criteria -- than in global ones.

2.
Client-side Solutions

   Traditionally, the IETF avoided becoming involved in standardization
   for actions that take place strictly on individual hosts on the
   network, instead confining itself to behavior that is observable "on
   the wire", i.e., in protocols between network hosts.  Exceptions to
   this general principle have been made when different clients were
   required to utilize data or interpret values in compatible ways to
   preserve interoperability: the standards for email and web body
   formats, and IDNA itself, are examples of these exceptions.
   Regardless of what is required to be standardized, it is almost never
   required, and often unwise, that a user interface present "on the
   wire" formats to the user, at least by default (debugging options
   that show the wire formats are common and often quite useful).
   However, in most cases when the presentation format and the wire
   format differ, the client program must take precautions that the wire
   format can be reconstructed from user input, or to keep the wire
   format, while hidden, bound to the presentation mechanism so that it
   can be reconstructed.  While it is rarely a goal in itself, it is
   often necessary that the user be at least vaguely aware that the wire
   ("real") format is different from the presentation one and that the
   wire format be available for debugging.

   In fact, the DNS itself is an excellent example of the difference
   between the wire format and the user presentation format.  Most
   Internet users do not realize that the wire format for DNS queries
   and responses does not include the "." character.  Instead, each
   label is represented by a length in bytes of the label, followed by
   the label itself.

2.1  IDNA and the Client

   As mentioned above, IDNA itself is entirely a client-side protocol.
   It works by performing some mappings and then encoding labels to be
   placed into the DNS in a special format called "punycode" [RFC3492].
   When labels in that format are encountered, they are transformed, by
   the client, back into internationalized (normally Unicode [ISO10646])
   characters.  In the context of this document, the important
   observation about IDNA is that any application program that supports
   it is already doing considerable transformation work in the client;
   it is not simply presenting the "on the wire" formats to the user.
   It is also the case that, if an application implementation makes
   different mappings than those called for by IDNA, that is likely to
   be detected only when and if users complain about unexpected
   behavior: as long as the punycode strings sent to it are valid, the
   server cannot tell what mappings were applied to develop those
   strings.

2.2  Local Translation Tables for TLD Names

   We suggest that, in addition to maintaining the code and tables
   required to support IDNA, authors of application programs may want to
   maintain a table that contains a list of TLDs and locally-desirable
   names for each one.  For ccTLDs, these might be the names (or
   locally-standard abbreviations) by which the relevant countries are
   known locally (whether in ASCII characters or others).  With some
   care on the part of the application designer (e.g., to ensure that
   local forms do not conflict with the actual TLD names), a particular
   TLD name input from the user could be either in local or standard
   form without special tagging or problems.  When DNS names are
   received by these client programs, the TLD labels would be mapped to
   local form before IDNA is applied to the rest of the name; when names
   are received from users, local TLD names would be mapped to the
   global ones before applying IDNA or being used in other DNS
   processing.

3.
Advantages and Disadvantages of Local Translation

3.1  Every TLD Appears in the Local Language and Character Set

   The notion of a top-level domain whose name matches, e.g., the name
   that is used for a country in that country or the name of a language
   in that language is, as mentioned above, immediately appealing.  But
   most of the reasons for it argue equally strongly for other TLDs
   being accessible from that language.  A user in Korea who can access
   the national ccTLD in the Korean language and character set has every
   reason to expect that both generic top level domains and domains
   associated with other countries would be similarly accessible,
   especially if the second-level domains bear Korean names.  A user
   native to Spain or Portugal, or in Latin America, would presumably
   have similar expectations, but would expect to use Spanish or
   Portuguese names, not Korean ones.

   That level of local optimization is not realistic -- some would argue
   not possible -- with the DNS since it would ultimately require that
   every top level domain be replicated for each of the world's
   languages.  That replication process would involve not just the top
   level domain itself: in principle, all of its subtrees would need to
   be completely replicated as well.  Perhaps, in practice, not all
   subtrees would require replication, but only those for which a
   language variation or translation was significant.  But, while that
   restriction would change the scale of the problem, it would not alter
   its basic nature.  The administrative hierarchy characteristics of
   the DNS (see Section 1.3.1) turn the replication process into an
   administrative nightmare: every administrator of a second-level
   domain in the world would be forced to maintain dozens, probably
   hundreds, of similar zone files for the replicates of the domain.
   Even if only the zones relevant to a particular country or language
   were replicated, the administrative and tracking problems to bind
   these to the appropriate top-level domain and keep all of the
   replicas synchronized would be extremely difficult at best.  And
   many administrators of third- and fourth-level domains, and beyond,
   would be faced with similar problems.

   By contrast, dealing with the names of TLDs as a localization
   problem, using local translation, is fairly simple although it places
   some burden of understanding on the user (see Section 4).  Each
   function represented by a TLD -- a country, generic registrations, or
   purpose-specific registrations -- could be represented in the local
   language and character set as needed.  And, for countries with many
   languages -- or users living, working in, or visiting countries where
   their language was not dominant -- "local" could be defined in terms
   of the needs or wishes of each particular user.

   An additional benefit is that, if two countries called themselves by
   the same name in their local languages -- if, e.g., Western Slobbovia
   and Eastern Slobbovia both called themselves "Slobland" -- local
   conventions could be followed as long as users understood that only
   internal forms (in this case, the ISO 3166-based ccTLD name) could be
   exported outside the country (see Section 3.3).

   Note that this proposal is to allow mapping of native-language
   strings to existing TLDs.  It would almost certainly be ill-advised
   to stretch this idea too far and try to map strings that local users
   would be unlikely to guess into TLDs.  For example, there are
   probably no languages in which the country known in English as
   "Finland" is called "FI".  Thus, one would not want to create a
   mapping from two characters that look or sound like a Roman "F" and a
   Roman "I" to the ccTLD ".fi".
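
   As an illustration only, the input-side mapping that this section
   relies on (local TLD names are mapped to the global ones before IDNA
   is applied) might be sketched in Python as follows.  The table
   entries, the set of known TLDs, and the names check_table and
   to_dns_form are assumptions made for this sketch and are not part of
   any specification.  The sketch uses Python's built-in "idna" codec,
   which implements the IDNA of [RFC3490].

```python
# Hypothetical per-locale table: local TLD name -> TLD label in the DNS.
# The entries are illustrative assumptions, not standardized translations.
LOCAL_TO_GLOBAL = {
    "españa": "es",
    "francia": "fr",
    "alemania": "de",
}

# TLD labels that actually exist in the DNS (a tiny illustrative subset).
KNOWN_TLDS = {"es", "fr", "de", "com", "org"}


def check_table(table, known_tlds):
    """Reject local forms that collide with real TLDs or map to unknown ones."""
    for local, target in table.items():
        if local in known_tlds:
            raise ValueError(f"local form {local!r} conflicts with an actual TLD")
        if target not in known_tlds:
            raise ValueError(f"{local!r} maps to unknown TLD {target!r}")


def to_dns_form(name, table=LOCAL_TO_GLOBAL):
    """Map the TLD of a user-entered name to its global form, then apply IDNA."""
    head, _, tld = name.rstrip(".").rpartition(".")
    # A TLD with no table entry is assumed to be in standard form already.
    tld = table.get(tld.lower(), tld)
    full = f"{head}.{tld}" if head else tld
    # Python's "idna" codec converts each label to its ASCII-compatible form.
    return full.encode("idna").decode("ascii")


check_table(LOCAL_TO_GLOBAL, KNOWN_TLDS)
print(to_dns_form("münchen.alemania"))
print(to_dns_form("example.com"))
```

   With this table, "münchen.alemania" becomes "xn--mnchen-3ya.de",
   while a name that already uses the standard TLD form, such as
   "example.com", passes through unchanged.  The check_table step
   reflects the care suggested in Section 2.2 that local forms not
   conflict with actual TLD names.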

3.2  Unification of Country Code Domains

   It follows from some of the comments above that, while there appears
   to be some immediate appeal in having (at least) two domains for
   each country, one using the ISO 3166-1 code [ISO3166] and another one
   using a name based on the national name in the national language,
   such a situation would create considerable problems for registrants
   in both domains.  For registrants maintaining enterprise or
   organizational subdomains, ease of administration of a single family
   of zone files will usually make a registration in a single top-level
   domain preferable to replicated sets of them, at least as long as
   their functional requirements (such as local-language access) are met
   by the unified structure.  For those registrants with no interest in
   any Internet function or protocols other than use of the
   HTTP/HTTPS-based web, this problem can be dealt with at the
   applications level by the use of redirects but, in the general case,
   that is not a feasible solution.

   For countries with multiple national languages that are considered
   equal and legally equivalent, the advantages of a translation-based
   approach, rather than multiple registrations and replicated trees,
   would be even more significant.  Actually installing and maintaining
   a separate TLD for each language would be an administrative
   nightmare, especially if it was intended that the associated zones be
   kept synchronized.  The oft-suggested proposal to adopt an "exactly
   one extra domain for each country" rule would essentially require
   some of the multiple-official-language countries to violate their own
   constitutions.  Conversely, having multiple domains for a given
   country, based on the number of official languages and without any
   expectation of synchronization, would give some countries an
   additional allocation of TLDs that others would certainly consider
   unfair.

   Of course, having replicated domains might be popular with some
   registries and registrars, since replication would almost inevitably
   increase the total number of domains to be registered.  Helping that
   group of registries and registrars, while hurting Internet users by
   adding administrative overhead and confusion, is not a goal of this
   document.

3.3  User Understanding of Local and Global References

   While the IDNA tables (actually Nameprep [RFC3491] and Stringprep
   [RFC3454]) must be identical globally for IDNA to work reliably, the
   tables for mapping between local names and TLD names could be locally
   determined, and differ from one locale to another, as long as users
   understood that international interchange of names required using the
   standard forms.  That understanding puts some additional burden of
   learning on users although part of it could be assisted by software
   (see Section 4).

   In any event, at least in the foreseeable future, it is likely that
   DNS names being passed among users in different countries, or using
   different languages, will be forced to be in punycode form to
   guarantee compatibility, since those users would not, in general,
   have the ability to read each other's scripts or have appropriate
   input facilities (keyboards, etc.) for them.  So the marginal
   knowledge or effort needed to put TLD names into standard form and
   transmit them that way would actually be fairly small.

3.4  Limits on Expansion of the Number of TLDs

   The concept of using local translation does have one side effect
   which some portions of the Internet community might consider
   undesirable.  The size and complexity of translation tables, and
   maintaining those tables, will be, to a considerable extent, a
   function of the number of top-level domains of interest, the
   frequency with which new domains are added, and the number of domains
   that are added at a time.
A country or other locale that wished to 507 maintain a complete set of translations (i.e., so that every TLD had 508 a representation in the local language) would presumably find setting 509 up a table for the current collection of a few hundred domains to be 510 a task that would take some days. If the number of TLDs were 511 relatively stable, with a relatively small number being added at 512 infrequent intervals, the updates could probably be dealt with on an 513 ad hoc basis. But, if large numbers of domains were added 514 frequently, or if the total number of TLDs became very large, 515 maintaining the table might require dedicated staff if each new TLD 516 is to be accommodated. Worse, updating the tables stored on client 517 machines might require update and synchronization protocols and all 518 of the complexities that tend to go with them (see [RFC3696] for a 519 discussion of some related issues in applications). 521 In practice, there will be little requirement to translate every TLD 522 into a local language. There are already existing TLDs for which 523 there are no obvious translations in many languages (most notably, 524 ".arpa") or where the translation will be far from obvious to typical 525 users (for example, ".int" and ".aero"). Of course, these could be 526 translated by function: ".arpa" to the local term for 527 "infrastructure", ".int" with "international" or "international 528 organization", ".aero" with "aeronautical" or "airlines", and so on, 529 but it is not clear whether doing so would have significant value. 530 For almost every language, there are dozens of ccTLDs for which there 531 are no translations of the country names into the local language that 532 would be known by anyone other than geographers. If new TLDs are 533 added, there might not be a strong need (or even capability) to have 534 language-specific equivalents for each.
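As an illustration only, the partial-coverage behavior described in Section 3.4 might be sketched as follows. The table contents, the German-style local labels, and the function names to_dns_form and to_local_form are assumptions made for this example; nothing in this document specifies them. The sketch translates only the rightmost label and lets any TLD without a local equivalent pass through unchanged.

```python
# Hypothetical local table: presentation-form label -> TLD label as it
# appears in the DNS. Entries are illustrative, not proposed translations.
LOCAL_TLD_TABLE = {
    "infrastruktur": "arpa",
    "international": "int",
}


def to_dns_form(name, table=LOCAL_TLD_TABLE):
    """Translate only the rightmost label; untranslated TLDs pass through."""
    labels = name.rstrip(".").split(".")
    labels[-1] = table.get(labels[-1].lower(), labels[-1])
    return ".".join(labels)


def to_local_form(name, table=LOCAL_TLD_TABLE):
    """Inverse mapping for display; TLDs with no local form are shown as-is."""
    reverse = {dns: local for local, dns in table.items()}
    labels = name.rstrip(".").split(".")
    labels[-1] = reverse.get(labels[-1].lower(), labels[-1])
    return ".".join(labels)
```

Because lookups fall back to the original label, a locale could start with a handful of entries and add more on an ad hoc basis. A newly created TLD simply appears in its DNS form until someone chooses to translate it.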
536 3.5 Standardization of the Translations 538 An immediate question when proposals such as this one are considered 539 is whether the names for the various TLDs that do not match the 540 strings that are actually in the DNS should be standardized and, if 541 so, by what mechanism. Standardization would promote communication 542 within a country or among people sharing a language. However, it is 543 likely to be very difficult to reach appropriate international 544 agreements to which wide conformance could be expected. Exceptions 545 might arise within particular countries or language groups but, even 546 then, there might be advantages to users being able to specify 547 additional synonymous names that are easy for them to remember. As 548 with IDNA-based IDNs, users who wish to transmit information about 549 domain names to people whose exact capabilities and software are 550 unknown, and to do so with minimal risk of confusion, will probably 551 confine themselves to the names that actually appear in the DNS, 552 i.e., the "punycode" representations. 554 In any event, neither standardization nor uniform use of either the 555 system outlined here or of a specific collection of names is required 556 to make the system work for those who would find it useful. 557 Similarly, mechanisms for country-wide coordination, and examination 558 of the appropriateness or inappropriateness of such mechanisms, are 559 out of scope for this document. 561 3.6 Implications for Future New Domain Names 563 Applications that implement the proposal in this document are likely 564 to make the subsequent creation and acceptance of new IDNA-based TLDs 565 significantly more difficult. If this proposal becomes widely 566 adopted, local language names mapped as it suggests will be generally 567 expected by users of those languages to mean the same as a current 568 TLD.
Creating a new, stand-alone IDNA-based TLD will then require 569 more deliberation and care to avoid conflicts and, when executed, 570 will require all the application software that maps the name to the 571 existing TLD to change the mapping tables. 573 For several reasons, this problem may not be as serious in practice 574 as it might first appear. For ccTLDs allocated according to the ISO 575 3166-1 list, there will presumably be no problem at all: not only are 576 the 3166-1 alpha-2 codes strictly in ASCII, but general trends, such 577 as those embodied in ICANN's "GAC Recommendations", against using 578 country names or codes for any purpose not associated with those 579 specific countries make conflicts with internationalized names 580 extremely unlikely. Because the DNS does not currently have a usable 581 aliasing function (see Section 1.3.2), it is likely that new IDNA- 582 based TLDs will be allocated only after there is considerable 583 opportunity for countries and other individual entities to identify 584 any problems they see with proposed new names. 586 3.7 Mapping for TLDs, Not Domain Names or Keywords 588 It should be clear to anyone who has read this far that the mapping 589 described in this document is limited to TLDs, not full domain names 590 or keywords. In particular, nothing here should be construed as 591 applying to anything other than TLDs, due at least in part to the 592 limitations described in Section 3.1. Further, this document is only 593 about the domain name system (DNS), not about any keyword system. 594 The interactions between particular keyword systems and the proposals 595 here are left as a (possibly very difficult) exercise for the reader 596 or implementer of such systems.
However, for the subset of such 597 systems whose intent is to entirely hide DNS names or URIs from the 598 user, their output would presumably be the LDH names that actually 599 appeared in the DNS, i.e., in punycode form for IDNA names and 600 without any application processing of the type contemplated here. 602 4. Information Interchange, IDNs, Comparisons, and Translations 604 This specification is based on a pair of fairly explicit assumptions. 605 The first is that the greatest and most important impact and value of 606 any internationalization or localization technique is to permit users 607 who share a language or culture to communicate with others who also 608 share that language or culture. Communication among users from 609 different cultures, using different languages or different scripts is 610 inherently more difficult, and still more difficult if they cannot 611 easily identify languages and scripts in common. The reasons for 612 those difficulties are age-old issues in language translation and 613 differences among languages and scripts, not problems associated with 614 the DNS or IDNs, however they are represented. That is the second 615 assumption: when communication across language or cultural groups is 616 required, the users who need to do it --typically a much smaller 617 number than those communicating within the same language and 618 culture-- are going to need to rely on commonly-understood languages 619 and scripts and will need to exert somewhat more care and effort than 620 within their own groups. 622 As outlined in the sections above, the suggestions made in this 623 document could clearly be turned into major problems by misuse or 624 misunderstanding. For example, if two applications on the same host 625 used different translation tables, a situation could easily result 626 that would be very confusing to the user. However, in some cases, 627 this would be only slightly worse than some of the alternatives.
For 628 example, if, on a given system, IDNs are expressed in native script 629 but ASCII TLD names are used, cutting and pasting from one 630 application to another may not work as expected, at least unless both 631 applications and the underlying operating system are all Unicode- 632 based and use the same encoding model for Unicode. Some applications 633 writers have already discovered, even without significant use of 634 IDNs, that they need to support separate "copy string" and "copy link 635 location", and the corresponding "paste" operations. Any use of IDNs 636 or Internationalized Resource Identifiers (IRIs, see [RFC3987]) may 637 require similar operations, or extensions to those operations, to 638 force strings into internal ("punycode" or URI) form on the copy 639 operation and to translate them back on paste. Were that done, the 640 appropriate translations could be performed as part of the same 641 process. If this author's hypothesis is correct -- that these 642 operations are likely to be required on many systems whether this 643 proposal is adopted or not -- then the additional translation 644 operations are likely to be invisible to the user. 646 In particular, precisely because the translated names proposed here 647 are part of a presentation form, rather than the internal form names, 648 they are inappropriate in a number of circumstances in which a 649 globally-unique, internal-form name is actually required. It would 650 be a poor, indeed dangerous, idea to use these names in security 651 contexts such as names in certificates, access lists, or other 652 contexts in which accurate comparisons are necessary. 654 A more general issue exists when DNS or IRI references are 655 transferred among users whose systems may be localized for different 656 languages or conventions. 
In general, a user in one part of the 657 world will not actually know how another user's systems are set up, 658 precisely what software is being used, etc., nor should users be 659 expected or forced to learn that information. But, if the user 660 transmitting an internationalized reference doesn't know that the 661 receiving system supports the same characters and fonts, and that the 662 receiving user is prepared to deal with them, the prudent user will 663 transmit the internal form of the reference in addition to, or even 664 instead of, the native-character form. And, of course, if the 665 reference is transmitted on paper, on a sign, in some coded character 666 set other than Unicode, or even as an image, rather than as a Unicode 667 string, the importance of supplementing it with the internal form 668 becomes even greater. The addition of a translation 669 requirement for TLD labels makes availability of internal forms in 670 interchange significantly more important, but does not actually 671 change the requirement to do so. 673 It may be helpful to note that, in a different networking model than 674 that used in the Internet, both this proposal and IDNA itself are 675 essentially "presentation layer" approaches rather than constructions 676 that can be expected to work well in interchange. 678 5. Internationalization Considerations 680 This entire specification addresses issues in internationalization 681 and especially the boundaries between internationalization and 682 localization and between network protocols and client/user 683 interface actions. 685 6. IANA Considerations 687 This specification does not contemplate any IANA registrations or 688 other actions. 690 7. Security Considerations 692 IDNA provides a client-based mechanism for presenting Unicode names 693 in applications while passing only ASCII-based names on the wire.
As 694 such, it constitutes a major step along the path of introducing a 695 client-based presentation layer into the Internet. Client-based 696 presentation layer transformations introduce risks from non- 697 conforming tables that can change meaning without external 698 protection. For example, if a mapping table normally maps A onto C 699 and that table is altered by an attacker so that A maps onto D 700 instead, much mischief can be committed. On the other hand, these 701 are not the usual sort of network attacks: they may be thought of as 702 falling into the "users can always cause harm to themselves" 703 category. The local translation model outlined here does not 704 significantly increase the risks over those associated with IDNA, but 705 may provide some new avenues for exploiting them. 707 Both this approach and IDNA rely on having updated programs present 708 information to the user in a very different form than the one in 709 which it is transmitted on the wire. Unless the internal (wire) form 710 is always used in interchange, or at least made available when DNS 711 names are exchanged, there are possibilities for ambiguity and 712 confusion about references. As with IDNA itself, if only the "wire" 713 form is presented, the user will perceive that nothing of value has 714 been done, i.e., that no internationalization or localization has 715 occurred. So presentation of the "wire" form to eliminate the 716 potential ambiguities is unlikely to be considered an acceptable 717 solution, regardless of its security advantages. 719 If the translation tables associated with the technique suggested 720 here are obtained from a server, or translations are obtained from a 721 remote machine using some protocol, the mechanisms used should ensure 722 that the values received are authentic, i.e., that neither they, nor 723 the query for them, have been intercepted and tampered with in any 724 way. 726 8. 
Acknowledgments 728 This document was inspired by a number of conversations in ICANN, 729 IETF, MINC, and private contexts about the future evolution and 730 internationalization of top level domains. Unknown to the author, 731 but unsurprisingly (the general concept should be obvious to anyone 732 even slightly skilled in the relevant technologies), the concept has 733 been apparently developed independently in other groups but, as far 734 as this author knows, not written up for general comment. 735 Discussions within, and about, the ICANN IDN Committee were 736 particularly helpful, although several of the participants in that 737 committee may be surprised about where those discussions led. Email 738 correspondence with several people after the first version of this 739 document was posted, notably Richard Hill, Paul Hoffman, Lee 740 XiaoDong, and Soobok Lee, led to considerable clarification in the 741 subsequent versions. The author is particularly grateful to Paul 742 Hoffman for extensive comments and additional text for the third 743 version and to Patrik Faltstrom, Joel Halpern, Sam Hartman, and Russ 744 Housley for suggestions incorporated into the final one. 746 The first draft version of this document was posted on October 21, 747 2002. 749 9. References 751 [ISO10646] 752 International Organization for Standardization, 753 "Information Technology - Universal Multiple-octet coded 754 Character Set (UCS) - Part 1: Architecture and Basic 755 Multilingual Plane", ISO Standard 10646-1, May 1993. 757 [ISO3166] International Organization for Standardization, "Codes for 758 the representation of names of countries and their 759 subdivisions -- Part 1: Country codes", ISO Standard 3166- 760 1:1997, 1997. 762 [MIME] Borenstein, N. and N. Freed, "MIME (Multipurpose Internet 763 Mail Extensions): Mechanisms for Specifying and Describing 764 the Format of Internet Message Bodies", RFC 1341, 765 June 1992. 767 Updated and replaced by Freed, N. and N.
Borenstein, 768 "Multipurpose Internet Mail Extensions (MIME) Part One: 769 Format of Internet Message Bodies", RFC 2045, November 770 1996. Also, Moore, K., "Representation of Non-ASCII Text 771 in Internet Message Headers", RFC 1342, June 1992. 772 Updated and replaced by Moore, K., "MIME (Multipurpose 773 Internet Mail Extensions) Part Three: Message Header 774 Extensions for Non-ASCII Text", RFC 2047, November 1996. 776 [RFC1034] Mockapetris, P., "Domain names - concepts and facilities", 777 STD 13, RFC 1034, November 1987. 779 [RFC1035] Mockapetris, P., "Domain names - implementation and 780 specification", STD 13, RFC 1035, November 1987. 782 [RFC1123] Braden, R., "Requirements for Internet Hosts - Application 783 and Support", STD 3, RFC 1123, October 1989. 785 [RFC1480] Cooper, A. and J. Postel, "The US Domain", RFC 1480, 786 June 1993. 788 [RFC1591] Postel, J., "Domain Name System Structure and Delegation", 789 RFC 1591, March 1994. 791 [RFC2535] Eastlake, D., "Domain Name System Security Extensions", 792 RFC 2535, March 1999. 794 [RFC2672] Crawford, M., "Non-Terminal DNS Name Redirection", 795 RFC 2672, August 1999. 797 [RFC3454] Hoffman, P. and M. Blanchet, "Preparation of 798 Internationalized Strings ("stringprep")", RFC 3454, 799 December 2002. 801 [RFC3467] Klensin, J., "Role of the Domain Name System (DNS)", 802 RFC 3467, February 2003. 804 [RFC3490] Faltstrom, P., Hoffman, P., and A. Costello, 805 "Internationalizing Domain Names in Applications (IDNA)", 806 RFC 3490, March 2003. 808 [RFC3491] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep 809 Profile for Internationalized Domain Names (IDN)", 810 RFC 3491, March 2003. 812 [RFC3492] Costello, A., "Punycode: A Bootstring encoding of Unicode 813 for Internationalized Domain Names in Applications 814 (IDNA)", RFC 3492, March 2003. 816 [RFC3696] Klensin, J., "Application Techniques for Checking and 817 Transformation of Names", RFC 3696, February 2004. 819 [RFC3987] Duerst, M. and M.
Suignard, "Internationalized Resource 820 Identifiers (IRIs)", RFC 3987, January 2005. 822 Author's Address 824 John C Klensin 825 1770 Massachusetts Ave, #322 826 Cambridge, MA 02140 827 USA 829 Phone: +1 617 491 5735 830 Email: john-ietf@jck.com 832 Intellectual Property Statement 834 The IETF takes no position regarding the validity or scope of any 835 Intellectual Property Rights or other rights that might be claimed to 836 pertain to the implementation or use of the technology described in 837 this document or the extent to which any license under such rights 838 might or might not be available; nor does it represent that it has 839 made any independent effort to identify any such rights. Information 840 on the procedures with respect to rights in RFC documents can be 841 found in BCP 78 and BCP 79. 843 Copies of IPR disclosures made to the IETF Secretariat and any 844 assurances of licenses to be made available, or the result of an 845 attempt made to obtain a general license or permission for the use of 846 such proprietary rights by implementers or users of this 847 specification can be obtained from the IETF on-line IPR repository at 848 http://www.ietf.org/ipr. 850 The IETF invites any interested party to bring to its attention any 851 copyrights, patents or patent applications, or other proprietary 852 rights that may cover technology that may be required to implement 853 this standard. Please address the information to the IETF at 854 ietf-ipr@ietf.org. 
856 Disclaimer of Validity 858 This document and the information contained herein are provided on an 859 "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS 860 OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET 861 ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, 862 INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE 863 INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED 864 WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. 866 Copyright Statement 868 Copyright (C) The Internet Society (2005). This document is subject 869 to the rights, licenses and restrictions contained in BCP 78, and 870 except as set forth therein, the authors retain all their rights. 872 Acknowledgment 874 Funding for the RFC Editor function is currently provided by the 875 Internet Society.