idnits 2.17.1 draft-klensin-reg-guidelines-08.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** It looks like you're using RFC 3978 boilerplate. You should update this to the boilerplate described in the IETF Trust License Policy document (see https://trustee.ietf.org/license-info), which is required now. -- Found old boilerplate from RFC 3978, Section 5.1 on line 15. -- Found old boilerplate from RFC 3978, Section 5.5 on line 1214. -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 1191. -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 1198. -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 1204. ** This document has an original RFC 3978 Section 5.4 Copyright Line, instead of the newer IETF Trust Copyright according to RFC 4748. ** This document has an original RFC 3978 Section 5.5 Disclaimer, instead of the newer disclaimer which includes the IETF Trust according to RFC 4748. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- == No 'Intended status' indicated for this document; assuming Proposed Standard Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack separate sections for Informative/Normative References. All references will be assumed normative when checking for downward references. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not match the current year == Line 810 has weird spacing: '...strants would...' 
-- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (May 17, 2005) is 6891 days in the past. Is this intentional? -- Found something which looks like a code comment -- if you have code sections in the document, please surround them with '' and '' lines. Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Unused Reference: 'I-D.hoffman-registration' is defined on line 1167, but no explicit reference was found in the text ** Obsolete normative reference: RFC 3490 (Obsoleted by RFC 5890, RFC 5891) ** Obsolete normative reference: RFC 3491 (Obsoleted by RFC 5891) ** Downref: Normative reference to an Informational RFC: RFC 3743 -- Possible downref: Non-RFC (?) normative reference: ref. 'IESG-IDN' -- Possible downref: Non-RFC (?) normative reference: ref. 'ICANN-IDN' -- Possible downref: Non-RFC (?) normative reference: ref. 'IANA-language-registry' ** Downref: Normative reference to an Unknown state RFC: RFC 952 ** Obsolete normative reference: RFC 3066 (Obsoleted by RFC 4646, RFC 4647) -- Possible downref: Non-RFC (?) normative reference: ref. 'Unicode' -- Possible downref: Non-RFC (?) normative reference: ref. 'Unicode32' -- Possible downref: Non-RFC (?) normative reference: ref. 'Drucker' ** Obsolete normative reference: RFC 3536 (Obsoleted by RFC 6365) -- Possible downref: Normative reference to a draft: ref. 
'I-D.hoffman-registration' Summary: 10 errors (**), 0 flaws (~~), 4 warnings (==), 15 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group J. Klensin 3 Internet-Draft May 17, 2005 4 Expires: November 18, 2005 6 Suggested Practices for Registration of Internationalized Domain Names 7 (IDN) 8 draft-klensin-reg-guidelines-08.txt 10 Status of this Memo 12 By submitting this Internet-Draft, each author represents that any 13 applicable patent or other IPR claims of which he or she is aware 14 have been or will be disclosed, and any of which he or she becomes 15 aware will be disclosed, in accordance with Section 6 of BCP 79. 17 Internet-Drafts are working documents of the Internet Engineering 18 Task Force (IETF), its areas, and its working groups. Note that 19 other groups may also distribute working documents as Internet- 20 Drafts. 22 Internet-Drafts are draft documents valid for a maximum of six months 23 and may be updated, replaced, or obsoleted by other documents at any 24 time. It is inappropriate to use Internet-Drafts as reference 25 material or to cite them other than as "work in progress." 27 The list of current Internet-Drafts can be accessed at 28 http://www.ietf.org/ietf/1id-abstracts.txt. 30 The list of Internet-Draft Shadow Directories can be accessed at 31 http://www.ietf.org/shadow.html. 33 This Internet-Draft will expire on November 18, 2005. 35 Copyright Notice 37 Copyright (C) The Internet Society (2005). 39 Abstract 41 This document explores the issues in registration of 42 internationalized domain names (IDNs). The basic IDN definition 43 potentially allows a very large number of possible characters in 44 domain names, and this richness may lead to serious user confusion 45 about similar-looking names. 
To avoid this confusion, it is 46 necessary for the IDN registration process to impose rules that 47 disallow some otherwise-valid name combinations. This document 48 suggests a set of mechanisms that registries might use to define and 49 implement such rules, including adaptation of methods developed for 50 Chinese, Japanese, and Korean domain names to other languages and 51 scripts.
53 Table of Contents
55 1. Introduction
56 1.1 Background
57 1.2 The Nature and Status of these Recommendations
58 1.3 Terminology
59 1.3.1 Languages and Scripts
60 1.3.2 Characters, Variants, Registrations, and Other 61 Issues
62 1.3.3 Confusion, Fraud, and Cybersquatting
63 1.4 A Review of the JET Guidelines
64 1.4.1 JET Model
65 1.4.2 Reserved Names and Label Packages
66 1.5 Languages, Scripts, and Variants
67 1.5.1 Languages and Scripts
68 1.5.2 Variant Selection
69 1.6 Variants are not a Universal Remedy
70 1.7 Reservations and Exclusions
71 1.7.1 Sequence Exclusions for Valid Characters
72 1.7.2 Character Pairing Issues
73 1.8 The Registration Bundle
74 1.8.1 Definitions and Structure
75 1.8.2 Application of the Registration Bundle
76 2. Some Implications of This Approach
77 3. Required Modifications to JET Model Needed Under Some of 78 the Models Above
79 4. Conclusions and Recommendations About the General Approach
80 5. A Model Table Format
81 6. A Model Label Registration Procedure: "CreateBundle"
82 6.1 Description of the CreateBundle Mechanism
83 6.2 The "no-variants" Case
84 6.3 CreateBundle and Nameprep Mapping
85 7. IANA Considerations
86 8. Internationalization Considerations
87 9. Security Considerations
88 10. Acknowledgements
89 11. References
90 Author's Address
91 Intellectual Property and Copyright Statements
93 1. Introduction 95 1.1 Background 97 The IDNA (Internationalized Domain Names in Applications) 98 specification [RFC3490] defines the basic model for encoding non- 99 ASCII strings in the DNS, and additional specifications ([RFC3491], 100 [RFC3492]) define the mechanisms and tables needed to support it. As 101 work on these specifications neared completion, it became apparent 102 that it would be desirable for registries to impose additional 103 restrictions on the names that could actually be registered (e.g., 104 see [IESG-IDN] and [ICANN-IDN]) as a means of reducing potential 105 confusion among characters that were similar in some way. This 106 document explores these IDN (internationalized domain name) registration 107 issues and suggests a set of mechanisms that IDN registries might 108 use. Registration restrictions are part of a long tradition.
For 109 example, while the original DNS specifications [RFC1035] permitted 110 any string of octets to be used in a DNS label, they also recommended 111 the use of a much more restricted subset, one that was derived from 112 the much older "hostname" rules [RFC0952] and defined by the "LDH" 113 convention (for the three permitted types of characters: letters, 114 digits, and the hyphen). Enforcement of those restricted rules in 115 registrations was the responsibility of the registry or domain 116 administrator. They were not embedded in the DNS protocol itself, 117 although some application protocols, notably those concerned with 118 electronic mail, did impose and then enforce similar rules. 120 If there are no constraints on registration in a zone, people can 121 register names containing characters that increase the risk of misunderstandings, 122 cybersquatting, and other forms of confusion. That a similar 123 situation existed even before the introduction of IDNA is exemplified 124 by domain names such as example.com and examp1e.com (note that the 125 latter domain contains the digit "1" instead of the letter "l"). 127 For non-ASCII names (so-called "internationalized domain names" or 128 "IDNs"), the problem was more complicated than that which led to the 129 LDH (hostname) rules. In the earlier situation, all protocols, 130 hosts, and DNS zones used ASCII exclusively in practice, so the LDH 131 restriction could reasonably be applied uniformly across the 132 Internet. With the introduction of a very large character 133 repertoire, and with different geographical and political locations 134 and languages having requirements for different collections of 135 characters, the optimal registration restrictions became not a 136 global matter but one that differs among areas and, 137 hence, among DNS zones. 139 For some human languages, there are characters and/or strings that 140 have equivalent or near-equivalent usages. 
If someone is allowed to 141 register a name with such a character or string, the registry might 142 want to automatically associate all of the names that have the same 143 meaning with the registered name. The registry might also decide 144 whether the names that are associated with, or generated by, one 145 registration should, as a group or individually, go into the zone or 146 be blocked from registration by different parties. 148 To date, the best-developed system for handling registration 149 restrictions for IDNs is the JET Guidelines for Chinese, Japanese, 150 and Korean [RFC3743], the so-called "CJK" languages. That system is 151 limited to those languages and, in particular, to their common script 152 base. Those languages are also the best-known and most widely-used 153 ones in the world whose writing systems are constructed on 154 "ideographic" or "pictographic" principles. This document explores 155 the principles behind the JET Guidelines. It then examines some of 156 the issues that might arise in trying to adapt them to alphabetic 157 languages, i.e., ones whose characters primarily represent sounds, 158 rather than meanings. 160 This document describes five things: 162 1. The general background and considerations for non-ASCII scripts 163 in names. Just as the JET Guidelines contain some suggestions 164 that may not be applicable to alphabetic scripts, some of the 165 suggestions here, especially the more specific ones, may be 166 applicable to some scripts and not others. 168 2. Suggested practices for describing character variants. 170 3. A method for using a zone's character variants to determine which 171 names should be associated with a registration. 173 4. A format for publishing a zone's table of character variants. 174 Such tables are referred to below simply as "the table". 176 5. A model algorithm for name registration given the presence of 177 language tables. 
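As an illustration of the LDH convention mentioned earlier in this section, the following minimal Python sketch checks whether a single label consists only of letters, digits, and the hyphen. The function name is_ldh_label is hypothetical. The 63-character length limit and the rule that a label may not begin or end with a hyphen are assumptions added for this sketch, drawn from common DNS and hostname practice rather than from this document.

```python
import re

# Assumed rule set for this sketch: ASCII letters, digits, and hyphen only;
# no leading or trailing hyphen; between 1 and 63 characters in total.
_LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_ldh_label(label: str) -> bool:
    """Return True if `label` satisfies the assumed LDH rules above."""
    return _LDH_LABEL.fullmatch(label) is not None


# Both labels from the text pass, even though they are easily confused.
print(is_ldh_label("example"), is_ldh_label("examp1e"))
# A label containing U+00FC (u with diaeresis) is rejected.
print(is_ldh_label("m\u00fcnchen"))
```

Note that "example" and "examp1e" are both accepted: a character-set restriction of this kind limits the repertoire but does nothing, by itself, about similar-looking names.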
179 1.2 The Nature and Status of these Recommendations 181 This document makes recommendations for consideration by registries 182 and, where relevant, those who coordinate them and use their 183 services. None of the recommendations are intended to be normative. 184 Indeed, the intent of the document is to illustrate a framework from 185 which variations to meet the needs of particular registries and their 186 processing of particular languages can be developed. Of course, if 187 registries make similar decisions and utilize similar tools, doing so may 188 reduce costs and confusion -- both between registries and for users 189 and registrars who have relationships with more than one domain. 191 1.3 Terminology 193 1.3.1 Languages and Scripts 195 This document uses the term "language" in what may be, to many 196 readers, an odd way. Neither this specification, nor IDNA, nor the 197 DNS is directly concerned with natural language; each is concerned only with the 198 characters that make up a given label. In some respects, the term 199 "script", as used in the character coding community, might be more 200 appropriate. However, different subsets of the same script may be 201 used with different languages and the same language may be written 202 using different characters (or even completely different scripts) in 203 different locations, so that term is not precisely correct either. 
204 Long-standing confusion has also resulted from the fact that most 205 scripts are, informally at least, named after one of the languages 206 written in them: "Chinese" describes both a language and a collection 207 of characters also used in writing Japanese, Korean, and, at least 208 historically, some other languages; "Latin" describes a 209 language, the characters used to write that language, and, often, 210 characters used to write a number of contemporary languages that are 211 derived from or similar to those used to write Latin; the script used 212 to write the Arabic language is called "Arabic" but is also used 213 (typically with some additions or deletions) to write a number of 214 other languages, and so on. Situations in which a script has a 215 clearly-defined name independent of the name of a language are the 216 exception, rather than the rule; examples include Hangul, used to 217 write Korean; Katakana and Hiragana, used to write Japanese; and a 218 few others. Some scholars have historically used "Roman" or 219 "Roman-derived" in an attempt to distinguish between a script and the 220 Latin language. 222 The term "language" is hence used in this document in the informal 223 sense of a written language and is defined, for this purpose, by the 224 characters used to write it. In this context, a "language" is 225 defined by the combination of a code (see Section 1.4.1) and an 226 authority that has chosen to use that code and establish a character- 227 listing for it. Authorities are normally TLD registries (see 228 Section 7 and [IANA-language-registry]), but it is expected that they 229 will find appropriate experts and that advice from language and 230 script experts selected by international neutral bodies will also 231 become part of the registration system. 
In addition, as discussed 232 below in Section 7, registries may conclude that the best interests 233 of registrants, stakeholders, and the Internet community would be 234 served by constructing "language tables" that mix scripts and 235 characters in ways that conform to no known language. Conventions 236 should be developed for such registrations that do not misleadingly 237 reflect specific language codes. 239 1.3.2 Characters, Variants, Registrations, and Other Issues 241 1. Characters in this document are given as their Unicode code points 242 in U+xxxx format, with their official names, or both. 244 2. The following terms are used in this document. 246 * A "string" is a sequence of one or more characters. 248 * This document discusses characters that may have equivalent or 249 near-equivalent characters or strings. The "base character" 250 is the character that has zero or more equivalents. In the 251 JET Guidelines, base characters are referred to as "valid 252 characters". In a table with variants, as described in 253 Section 5, the base characters occupy the first column. 254 Normally (and always if the recommendation of Section 6.3 is 255 adopted) the base characters will be the characters that 256 appear in registration requests from registrants; all other 257 characters will be considered to make the registration attempt 258 invalid. 260 * The "variant(s)" are the character(s) and/or string(s) that 261 are treated as equivalent to the base character. Note that 262 these might not be true equivalent characters: a particular 263 original character may be a base character with a mapping to a 264 particular variant character, but that variant character may 265 not have a mapping to the original base character and, indeed, 266 the variant character may not appear in the base character 267 list, and hence may not be valid for use in a registration. 
268 Usually, characters or strings to be designated as variants 269 are considered either equivalent or sufficiently similar (by 270 some registry-specific definition) that confusion between them 271 and the base character might occur. 273 * The "base registration" is the single name that the registrant 274 requested from the registry. The JET Guidelines use the term 275 "label string" for this name. 277 * A label (or "name") is described as "registered" if it is 278 actually entered into a domain (i.e., a zone file) by the 279 registry, so that it can be accessed and resolved using 280 standard DNS tools. The JET Guidelines describe a 281 "registered" label as "activated". However, some domains use 282 a slightly different registration logic in which a name can be 283 registered with the registrar, if one is involved, and with 284 the registry but not actually entered into the zone file until 285 an additional activation or delegation step occurs. This 286 document does not make that distinction, but is compatible 287 with it. 289 As specified in the IDNA Standard, the name actually placed in 290 the zone file is always the internal ("punycode") form. There 291 is no provision for actually entering any other form of an IDN 292 into the DNS. It remains controversial, with different 293 registrars and registries having adopted different policies, 294 as to whether the registration, as submitted by the 295 registrant, is in the form of 296 The native-script name, either in UTF-8 or in some coding 297 specified by the registrar. 298 The internal-form ("punycode") name. 299 Both forms of the name together, so that the registrar and 300 registry can verify the intended translation. 301 If some variant system is used, it is almost certain to be 302 necessary that the native-script form of the requested name be 303 available to the registry. 
305 * A "registration bundle" is the set of all labels that comes 306 from expanding the base characters for a single name into 307 their variants. The presence of a label in a registration 308 bundle does not imply that it is registered. In the JET 309 Guidelines, a registration bundle is called an "IDN Package". 311 * A "reserved label" is a label in a registration bundle that is 312 not actually registered. 314 * A "registry" is the administrative authority for a DNS zone. 315 That is, the registry is the body that enforces, and typically 316 makes, policies that are used in a particular zone in the DNS. 318 * "Coded Character Set" ("CCS") is a term for a list of 319 characters and the code positions assigned to them. ASCII and 320 Unicode are CCSs. 322 * A "language" is something spoken by humans, independent of how 323 it is written or coded. ISO Standard 639 and IETF BCP 47 (RFC 324 3066) [RFC3066] list and define codes for identifying 325 languages. 327 * A "script" is a collection of characters (glyphs, independent 328 of coding) that are used together, typically to represent one 329 or more languages. Note that the script for one language may 330 heavily overlap the script for another. This does not imply 331 that they have identical scripts. 333 * "Charset" is an IETF-invented term to describe, more or less, 334 the combination of a script, a CCS that encodes that script, 335 and rules for serializing the bytes when those are stored on a 336 computer or transmitted over the network. 338 The last four of these definitions are redundant with, but 339 deliberately somewhat less precise than, the definitions in 340 [RFC3536], which also provides sources. The two sets of definitions 341 are intended to be consistent. 
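The definition of a "registration bundle" given in the list above can be sketched in Python as a Cartesian product over the alternatives for each base character. This is a minimal sketch under stated assumptions, not the procedure defined by this document or by the JET Guidelines: the function name expand_bundle, the variable names, the representation of a table as a dict from base character to a list of variant strings, and the sample table contents are all hypothetical.

```python
from itertools import product


def expand_bundle(base_registration, table):
    """Return the set of labels obtained by substituting variants.

    `table` maps each base character to a list of variant strings
    (possibly empty). A character that is not a base character makes
    the registration attempt invalid, so ValueError is raised.
    """
    choices = []
    for ch in base_registration:
        if ch not in table:
            raise ValueError(f"U+{ord(ch):04X} is not a base character")
        # The base character itself is always one of the alternatives.
        choices.append([ch] + list(table[ch]))
    return {"".join(combo) for combo in product(*choices)}


# Hypothetical table in which U+00F6 and U+00F8 are variants of each other.
example_table = {
    "b": [], "r": [], "d": [],
    "\u00f6": ["\u00f8"],
    "\u00f8": ["\u00f6"],
}
bundle = expand_bundle("br\u00f6d", example_table)
# Labels other than the base registration; the zone's policy decides
# whether each of these is registered or only reserved.
others = bundle - {"br\u00f6d"}
```

The size of a bundle is the product of the number of alternatives at each position, so a registry using a table with many variants per character may want to bound label length or bundle size; that consideration is an observation about this sketch, not a recommendation from the text.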
343 1.3.3 Confusion, Fraud, and Cybersquatting 345 The term "confusion" is used very generically in this document to 346 cover the entire range from accidental user misperception of the 347 relationship between characters with some characteristic in common 348 (typically appearance, sound, or meaning) to cybersquatting and 349 [other] deliberate fraudulent attempts to exploit those relationships 350 or others based on the nature of the characters. 352 1.4 A Review of the JET Guidelines 354 1.4.1 JET Model 356 In the JET Guidelines model, a prospective registrant approaches the 357 registry for a zone (perhaps through an intermediate registrar) with 358 a candidate base registration -- a proposed name to be registered -- 359 and a list of languages in which that name is to be interpreted. The 360 languages are defined according to the fairly high-resolution coding 361 of [RFC3066] -- Chinese as used on the mainland of the People's 362 Republic of China ("zh-cn") can, at registry option, consist of a 363 somewhat different list of characters (code points) and be 364 represented by a separate table compared to Chinese as used in Taiwan 365 ("zh-tw"). 367 The design of the JET Guidelines took one important constraint as a 368 basis: IDNA was treated as a firm standard. A procedure that 369 modified some portion of the IDNA functions, or was a variant on 370 them, was considered a violation of those standards and should not be 371 encouraged (or, probably, even permitted). 373 Each registry is expected to construct (or obtain) a table for each 374 language it considers relevant and appropriate. These tables list, 375 for the particular zone, the characters permitted for that language. 376 If a character does not appear as a "valid code point" (called a 377 "base character" in the rest of this document) in that table, then a 378 name containing it cannot be registered. 
If multiple languages are 379 listed for the registration, then the character must appear in the 380 tables for each of those languages. 382 The tables may also contain columns that specify alternate or variant 383 forms of the valid character. If these variants appear, they are 384 used to synthesize labels that are alternatives to the original one. 385 These labels are all reserved and can be registered or "activated" 386 (placed into the DNS) only by the action or request of the original 387 registrant; some (the "preferred variant labels") are typically 388 registered automatically. The zone is expected to establish 389 appropriate policies for situations in which the variant forms of one 390 label conflict with already-reserved or already-registered labels. 392 Most of these concepts were introduced because of concerns about 393 specific issues with CJK characters, beginning from the requirement 394 that the use of Simplified Chinese by some registrants and 395 Traditional Chinese by others not be permitted to create confusion or 396 opportunities for fraud. While they may be applicable to registry 397 tables constructed for alphabetic scripts, the translation should be 398 done with care, since many analogies are not exact. 400 Some of the important issues are discussed in the sections that 401 follow. The JET model may be considered as a carefully-worked-out, 402 but specialized to CJK characters, variation on the model and method 403 presented by the rest of this document. Other languages or scripts 404 may require other variations. 406 1.4.2 Reserved Names and Label Packages 408 A basic assumption of the JET model is that, if the evolution of 409 specific characters or the properties of Unicode ([Unicode], 410 [Unicode32]) or IDNA cause two strings to appear similar enough to 411 cause confusion, either or both should be registered by the same 412 party or one of them should become unregisterable. 
The definition of 413 "appear similar enough" will differ for different cultures and 414 circumstances -- and hence DNS zones -- but the principle is fairly 415 general. In the JET model, all of the "variant" strings are 416 identified, some are registered into the DNS automatically, and 417 others are simply reserved and can be registered, if at all, only by 418 the original registrant. Other zones might find other policies 419 appropriate. For example, a zone might conclude that having similar 420 strings registered in the DNS was undesirable. If so, the list of 421 variant labels would be used only to build a list of names that would 422 be reserved and prohibited from being registered. 424 1.5 Languages, Scripts, and Variants 426 1.5.1 Languages and Scripts 428 Conversations about scripts -- collections of characters associated 429 with particular languages -- are common when discussing character 430 sets and codes. But the boundaries between one script and another 431 are not well-defined. The Unicode Standard ([Unicode], [Unicode32]), 432 for example, does not define them at all, even though it is 433 structured in terms of usually-related blocks of characters. The 434 issue is complicated by the common origin of most alphabetic scripts 435 in contemporary use in the world today (see, for example, [Drucker]). 436 Because of that history, certain characters (or, more precisely, 437 symbols representing characters) appear in the scripts associated 438 with multiple languages, sometimes with very different sounds or 439 meanings. This differs from the CJK situation in which, if a 440 character appears in more than one of the relevant languages, it will 441 usually have the same interpretation in each one. For the subset of 442 characters that actually are ideographs or pictographs, pronunciation 443 is expected to vary widely while meaning is preserved. 
At least in 444 part because of that similarity of meaning, it made sense in the JET 445 case to permit a registration to specify multiple languages, to 446 verify that the characters in the label string (the requested "Base 447 registration") were valid for each, and then to generate variant 448 labels using each language in turn. For many alphabetic languages, 449 it may be more sensible to prohibit the label string submitted for 450 registration from being associated with more than one language. 451 Indeed, "one label, one language" has been suggested as an important 452 barrier against common sources of "look-alike" confusion. For 453 example, the imposition of that rule in a zone would prevent the 454 insertion of a few Greek or Cyrillic characters with shapes identical 455 to the Latin ones into what was otherwise a Latin-based string. For 456 a particular table, the list of valid characters may be thought of as 457 the script associated with the relevant language, with the 458 understanding that the table design does not prevent the same 459 character from appearing in the tables for multiple languages. 461 Indeed, this notion of a locally, and specifically-identified, script 462 can be turned around: while the tables are referred to as "language 463 tables", they are associated with languages only insofar as thinking 464 about the character structure and word forms associated with a given 465 language helps to inform the construction of a table. A country like 466 Finland, for example, might select among 468 o One table each for Finnish, Swedish, and English characters and 469 conventions, permitting a string to be registered in one, two, or 470 all three languages (although a three-language registration would 471 necessarily prohibit any characters that did not appear in all 472 three languages since the label would make little sense 473 otherwise). 475 o One table each, but with a "one label, one language" rule for the 476 zone. 
478 o A combined table based on the observation that all three writing 479 systems were based on Roman characters and that the possibilities 480 for confusion that were of interest to the registry would not be 481 reduced by "language" differentiation. This option raises an 482 interesting issue about language labeling as described in 483 Section 1.4.1, see the discussion in Section 7, below. 485 Regardless of what decisions were made about those languages and 486 scripts, if they also decided to permit registrations of labels 487 containing Cyrillic characters, they might have a separate table for 488 them. That table might contain some Roman-derived characters (either 489 as base characters or as variants) just as some CJK tables do. See 490 also Section 2, below. 492 As the JET Guidelines stress, no tables or systems of this type -- 493 even if identified with a language as a means of defining or 494 describing the table -- can assure linguistic or even syntactic 495 correctness of labels with regard to that language. That level of 496 assurance may not be possible without human intervention or at least 497 dictionary lookups of complete proposed labels. It may even not be 498 desirable to attempt that level of correctness (see Section 2). 500 Of course, if any language-based tests or constraints, including "one 501 label, one language", are to be applied to limit the associated 502 sources of confusion, each zone must have a table for each language 503 in which it expects to accept registrations; the notion of a single 504 combined table for the zone is, in the general case, simply 505 unworkable. One could use a single table for the zone if the intent 506 were to impose only minimal restrictions, e.g., to force alphabetic 507 and numeric characters only and exclude symbols and punctuation. 
508 That type of restriction might be useful in eliminating some 509 problems, such as those of unreadable labels, but would be unlikely 510 to be very helpful with, e.g., confusion caused by similar-looking 511 characters. 513 1.5.2 Variant Selection 515 The area of character variants is rife with difficulties (and perhaps 516 opportunities). There is no universal agreement about which base 517 characters have variants, or if they do, what those variants are. 518 For example, in some regions of the world and in some languages, 519 LATIN SMALL LETTER O WITH DIAERESIS (U+00F6) and LATIN SMALL LETTER O 520 WITH STROKE (U+00F8) are variants of each other, while in other 521 regions, most people would think that LATIN SMALL LETTER O WITH 522 STROKE has no variants. In some cases, the list of variants is 523 difficult to enumerate. For example, it required several years for 524 the Chinese language community to create variant tables for use with 525 IDNA, and it remains, at the time of this writing, questionable how 526 widely those tables will be accepted among users of Chinese from 527 areas of the world other than those represented by the groups that 528 created them. 530 Thus, the first thing a registry should ask is whether or not any of 531 the characters that they want to permit to be used have variants. If 532 not, the registry's work is much simpler. This is not to say that a 533 registry should ignore variants if they exist: adding variants after 534 a registry has started to take registrations will be nearly as 535 difficult administratively as removing characters from the list of 536 acceptable characters. That is, if a registry later decides that two 537 characters are variants of each other, and there are actively-used 538 names in the zones that differ only on the new variants, the registry 539 might have to transfer ownership of one of the names to a different 540 owner, using some process that is certain to be controversial. 
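The administrative difficulty described above, in which two characters are declared variants of each other after names have already been registered, can be illustrated with a small Python sketch that finds existing names differing only in the newly paired characters. The function name find_new_conflicts and the sample names are hypothetical, and the sketch assumes a single pair of single-character variants.

```python
from collections import defaultdict


def find_new_conflicts(registered_names, char_a, char_b):
    """Group registered names that become equivalent once char_a and
    char_b are treated as variants of each other."""
    groups = defaultdict(list)
    for name in registered_names:
        # Fold char_b onto char_a so that names differing only in
        # these two characters end up under the same key.
        key = name.replace(char_b, char_a)
        groups[key].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Hypothetical zone contents: two names differ only in U+00F6 vs. U+00F8.
existing = ["s\u00f6der", "s\u00f8der", "norr"]
conflicts = find_new_conflicts(existing, "\u00f6", "\u00f8")
```

Each group returned is a set of names that would fall into a single bundle under the new table, and for which the registry would need some policy if the names are held by different parties.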
542 This situation is likely to be much easier for areas and zones that 543 use characters that previously did not occur in the DNS at all than 544 it will be for zones in which non-English labels have been registered 545 in ASCII characters for some time, presumably because the language of 546 interest uses additional "Latin" characters with some conventions 547 when only ASCII is available. In the former case, the rules and 548 conventions can be established before any registrations occur. In 549 the latter, there may be conflicts or opportunities for confusion 550 between existing registrations and now-permitted Roman-based 551 characters that do not appear in ASCII. For example, a domain name 552 might exist today that uses the name of a city in Canada spelled as 553 "Montreal". If the zone in which it occurs changes its rules to 554 permit the use of the character LATIN SMALL LETTER E WITH ACUTE 555 (U+00E9), does the name of the city, spelled (correctly) using that 556 character, conflict with the existing domain name registration? 557 Certainly, if both are permitted, and permitted to be registered by 558 separate parties, there are many opportunities for confusion. 560 Of course, zone managers should inform all current registrants when 561 the registration policy for the zone changes. This includes the time 562 at which IDN characters are allowed in the zone for the first time, when 563 additional characters are permitted later, and, if it is necessary to 564 change character variant tables, when that occurs. 566 In many languages there are two variants for a character, but one 567 variant is strongly preferred. A registry might only allow the base 568 registration in the preferred form, or it might allow any form for 569 the base registration. If the variant tables are created carefully, 570 the resulting bundles will be the same, but some registries will give 571 special status to the base registration such as its appearance in 572 "Whois" databases.
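The "Montreal" question raised above can be checked mechanically before a zone changes its rules. The following is a minimal sketch only, not something taken from the JET Guidelines. It assumes a hypothetical per-zone mapping (variant_of) that folds each newly permitted character onto the character already in use, and it treats two labels as conflicting when they fold to the same string. The function names and the mapping are assumptions made for illustration, and the sketch handles single-character variants only.

```python
def fold(label, variant_of):
    """Replace each character by its representative (itself if unmapped)."""
    return "".join(variant_of.get(ch, ch) for ch in label)


def conflicting_registrations(existing, candidate, variant_of):
    """Return existing labels that differ from candidate only by variants."""
    key = fold(candidate, variant_of)
    return sorted(
        name for name in existing
        if name != candidate and fold(name, variant_of) == key
    )


# Hypothetical policy change: U+00E9 becomes a permitted character and is
# declared a variant of "e" (an assumption made only for this example).
variant_of = {"\u00e9": "e"}
existing = {"montreal", "quebec"}
print(conflicting_registrations(existing, "montr\u00e9al", variant_of))
# prints ['montreal']
```

A registry could run a check of this kind over the whole zone before announcing a policy change, which is also the point at which current registrants would be informed.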
574 1.6 Variants are not a Universal Remedy 576 It is worth stressing that there are many obvious opportunities for 577 confusion that variant systems, by virtue of being based on 578 processing of individual characters, cannot address. For example, if 579 a language can be written with more than one script, or 580 transliterations of the language into another script are common, 581 variant models are insufficient to prevent conflicting registration 582 of the related forms. Avoiding those types of problems would require 583 different mechanisms, perhaps based on phonetic or natural language 584 processing techniques for the entire proposed base registration. 586 1.7 Reservations and Exclusions 588 1.7.1 Sequence Exclusions for Valid Characters 590 The JET Guidelines are based on processing only single characters. 591 Any processing of pairs or longer sequences of characters is left to 592 what that document describes as "additional processing" -- procedures 593 specifically permitted by the Guidelines but defined by a registry in 594 addition to the variant table processing specified in the Guidelines 595 themselves. A different zone, with different needs, could use a 596 modified version of the table structure, or different types of 597 additional processing, to prohibit, as well as accept, particular 598 sequences of characters by marking them as invalid. Other 599 modifications or extensions might be designed to prevent certain 600 letters from appearing at the beginning or end of labels. The use of 601 regular expressions in the "valid characters" column might be one way 602 to implement these types of restrictions, but there has been no 603 experience so far with that approach. 605 In particular, in some scripts derived from Roman characters, 606 sequences that have historically been typographically represented by 607 single "ligature" or "digraph" characters may also be represented by 608 the separate characters (e.g., "ae" for U+00E6 or "ij" for U+0133).
609 If it is desired to either prohibit these, or to treat them as 610 variants, some extensions to the single-character JET model may be 611 needed (as may be some careful thinking about IDNA (especially 612 nameprep), since some of these combinations are excluded there). 614 1.7.2 Character Pairing Issues 616 Some character pairings -- the use of a character form (glyph) in one 617 language and a different form with the same properties in a related 618 one -- closely approximate the issues with mapping between 619 Traditional and Simplified Chinese although the history is different. 620 For example, it might be useful to have "o" with a stroke (U+00F8) as 621 a variant for "o" with diaeresis above it (U+00F6) (and the 622 equivalent upper-case pair) in a Swedish table, and vice versa in a 623 Norwegian one, or to prohibit one of these characters entirely in 624 each table. In a German table, U+00F8 would presumably be 625 prohibited, while U+00F6 might have "oe" as a variant. Obviously, if 626 the relevant language of registration is unknown, this type of 627 variant matching cannot be applied in any sensible way. 629 1.8 The Registration Bundle 631 1.8.1 Definitions and Structure 633 As one of its critical innovations, the JET model defines an "IDN 634 package", known in this document as a "registration bundle", which 635 consists of the primary registered string (which is used as the name 636 of the bundle), the information about the language table(s) used, the 637 variant labels for that string, and indications of which of those 638 labels are registered in the relevant zone file ("activated" in the 639 JET terminology). Registration bundles are also atomic -- one can 640 not add or remove variant labels from one without unregistering the 641 entire package. 
A label exists in only one registration bundle at a 642 time; if a new label is registered that would generate a variant that 643 matches one that appears in an existing package, that variant simply 644 is not included in the second package. A subsequent deregistration 645 of the first package does not cause the variant to be added to the 646 second. While it might be possible to change this in other models, 647 the JET conclusion was that other options would be far too complex to 648 implement and operate and would cause many new types of name 649 conflicts. 651 1.8.2 Application of the Registration Bundle 653 A registry has three options for how to handle the case where the 654 registration bundle is non-trivial, i.e., contains more than one 655 label. The policy options are: 657 o Register and resolve all labels in the zone, making the zone 658 information identical to that of the registered label. This 659 option will cause end users to be able to find names with variants 660 more easily, but will result in larger zone files. For some 661 language tables, the zone file could become so large that it could 662 negatively affect the ability of the registry to perform name 663 resolution. If the base registration contains several characters 664 that have equivalents, the owner could end up having to take care 665 of large numbers of zones. For instance, if DIGIT ONE is a 666 variant of LATIN SMALL LETTER L, the owner of the domain name all- 667 lollypops.example.com will have to manage 32 zones. If the intent 668 is to keep the contents of those zones identical, the owner may 669 then face a significant administrative problem. If other concerns 670 dictate short times to live and absolute consistency of DNS 671 responses, the challenges may be nearly impossible. 673 o Block all labels other than the registered label so they cannot be 674 registered in the future. 
This option does not increase the size 675 of the zone file and provides maximum safety against false 676 positives, but it may cause end users to not be able to find names 677 with variants that they would expect. If the base registration 678 contains characters that have equivalents, Internet users who do 679 not know what base characters were used in the registration will 680 not know what character to type in to get a DNS response. For 681 instance, if DIGIT ONE is a variant of LATIN SMALL LETTER L, and 682 LATIN SMALL LETTER L is a variant of DIGIT ONE, the user who sees 683 "pale.example.com" will not know whether to type a "1" or a "l" 684 after the "pa" in the first label. 686 o Resolve some labels and block some other labels. This option is 687 likely to cause the most confusion with users because including 688 some variants will cause a name to be found, but using other 689 variants will cause the name to be not found. For example, even 690 if people understood that DIGIT ONE and LATIN SMALL LETTER L were 691 variants, a typical DNS user wouldn't know which character to type 692 because they wouldn't know whether this pair were used to register 693 or block the labels. However, this option can be used to balance 694 the desires of the name owner (that every possible attempt to 695 enter their name will work) with the desires of the zone 696 administrator (to make the zone more manageable and possibly to be 697 compensated for greater amounts of work needed for a single 698 registration). For many circumstances, it may be the most 699 attractive option. 701 In all cases, at least the registered label should appear in the 702 zone. It would be almost impossible to describe to name owners why 703 the name that they asked for is not in the zone, but some other name 704 that they now control is. By implication, if the requested label is 705 already registered, the entire registration request must be rejected. 707 2. 
Some Implications of This Approach 709 Historically, DNS labels were considered to be arbitrary identifier 710 strings, without any inherent meaning. Even in ASCII, there was no 711 requirement that labels form words. Labels that could not possibly 712 represent words in any Romance or Germanic language (the languages 713 that have been written in "Latin" scripts since medieval times or 714 earlier) have actually been quite common. In general, in those 715 languages, words contain at least one vowel and do not have embedded 716 numbers. As a result, a string such as "bc345df" cannot possibly be 717 a "word" in these languages. More generally, the more one moves 718 toward "language"-based registry restrictions, the less it is going 719 to be possible to construct labels out of fanciful strings. While 720 fanciful strings are terrible candidates for "words", they may make 721 very good identifiers. To take a trivial example using only ASCII 722 characters, "rtr32w", "rtr32x", and "rtr32z" might be very good DNS 723 labels for a particular zone and application. However, given the 724 embedded digits and lack of vowels, they, like the "bc345df" example 725 given above, would fail even the most superficial of tests for valid 726 English (or German or French (etc.)) word forms. 728 It is worth noting that several DNS experts have suggested that a 729 number of problems could be solved by prohibiting meaningful names in 730 labels, requiring instead that the labels be random or nonsense 731 strings. If methods similar to those discussed in this document were 732 used to force identifiers to be closer to meaningful words in real 733 languages, the result would be directly contradictory to those 734 "random name" approaches. 
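The "most superficial of tests" described earlier in this section can be made concrete. This is a hedged sketch only: the vowel set and the function name are assumptions chosen for illustration and are not taken from any registry's actual policy. It shows how even a trivial language-style test rejects strings that may be perfectly good DNS identifiers.

```python
VOWELS = set("aeiouy")  # assumed vowel set for English-like word forms


def looks_like_word(label):
    """Superficial test: at least one vowel and no digits anywhere."""
    has_vowel = any(ch in VOWELS for ch in label.lower())
    has_digit = any(ch.isdigit() for ch in label)
    return has_vowel and not has_digit


# The labels from the text above all fail; an ordinary word passes.
for label in ("bc345df", "rtr32w", "rtr32x", "rtr32z", "example"):
    print(label, looks_like_word(label))
```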
736 Interestingly, if one were trying to develop an "only words" system, 737 a rather different -- but very restrictive -- model could be 738 developed using lookups in a dictionary for the relevant language and 739 a listing of valid business names for the relevant area. If a string 740 did not appear in either, it would not be permitted to be registered. 741 Models that require a prior national business listing (or 742 registration) that is identical to the proposed domain name label 743 have historically been used to restrict registrations in some 744 country-code top level domains, so this is not a new idea. On the 745 other hand, if look-alike characters are a concern, even that type of 746 rule (or restriction) would still not avoid the need to consider 747 character variants. 749 Consequently, registries applying the principles outlined in this 750 document should be careful not to apply more severe restrictions than 751 are reasonable and appropriate while, at the same time, being aware 752 of how difficult it usually is to add restrictions at a later time. 754 3. Required Modifications to JET Model Needed Under Some of the Models 755 Above 757 The JET model was designed for CJK characters. The discussion above 758 implies that some extensions to it may be needed to handle the 759 characteristics of various alphabetic scripts and the decisions that 760 might be made about them in different zones. Those extensions might 761 include facilities to process: 763 o Two-character (or more) sequences, such as ligatures and 764 typographic spelling conventions, as variants. 766 o Regular expressions or some other mechanism for dealing with 767 string positions of characters (e.g., characters that must, or 768 must not, appear at the beginning or end of strings). 770 o Delimiter breaks to permit multiple languages to be used, 771 separately, within the same label. 
E.g., is it possible to define 772 a label as consisting of two or more components, each in a 773 different language, with some particular delimiter used to define 774 the boundaries of the components? 776 4. Conclusions and Recommendations About the General Approach 778 After examining the implications of the potential use of the full 779 range of characters permitted by IDNA in DNS labels, multiple groups, 780 including the IESG [IESG-IDN] and ICANN [ICANN-IDN], have concluded that 781 some restrictions are needed to prevent many forms of user confusion 782 about the actual structure of a name or the word, phrase, or term 783 that it appears to spell out. The best way to approach such 784 restrictions appears to be to draw on the language and culture of the 785 community of registrants and users in the relevant zone: if 786 particular characters are likely to be surprising or unintelligible 787 to both of those groups, it is probably wise to not permit them to be 788 used in registrations. Registration restrictions can be carried much 789 further than restricting permitted characters to a selected Unicode 790 subset. The idea of a reserved "bundle" of related labels permits 791 probably-confusing combinations or sets of characters to be bound 792 together, under the control of a single registrant. While that 793 registrant might still use the package in a way that confused his or 794 her own users (the approach outlined here will not prevent either 795 ill-thought-out ideas or stupidity), the possibility of turning 796 potential confusion into a hostile attack would be considerably 797 reduced. 799 At the same time, excessive restrictions may make DNS identifiers 800 less useful for their original, intended purpose: identifying 801 particular hosts and similar resources on the network in an orderly 802 way.
Registries creating rules and policies about what can be 803 registered in particular zones -- whether those are based on the JET 804 Guidelines or the suggestions in this document -- should balance the 805 need for restrictions against the need for flexibility in 806 constructing identifiers. 808 The discussion above provides many options that could be selected, 809 defined, and applied in different ways in different registries 810 (zones). Registrars and registrants would almost certainly prefer 811 systems in which they can predict, at least to a first-order 812 approximation, the implications of a particular potential 813 registration to ones in which they cannot. Predictability of that 814 sort probably requires more standards, and less flexibility, than the 815 model itself might suggest. 817 5. A Model Table Format 819 The format of the table is meant to be machine-readable but not 820 human-readable. It is fairly trivial to convert the table into one 821 that can be read by people. 823 Each character in the table is given in the "U+" notation for Unicode 824 characters. The lines of the table are terminated with either a 825 carriage return character (ASCII 0x0D), a linefeed character (ASCII 826 0x0A), or a sequence of carriage return followed by linefeed (ASCII 827 0x0D 0x0A). The order of the lines in the table may or may not 828 matter, depending on how the table is constructed. 830 Comment lines in the table begin with a "#" character (ASCII 831 0x23). 833 Each non-comment line in the table starts with the character that is 834 allowed in the registry and expected to be used in registrations, 835 which is also called the "base character". If the base character has 836 any variants, the base character is followed by a vertical bar 837 character ("|", ASCII 0x7C) and the variant string. If the base 838 character has more than one variant, the variants are separated by a 839 colon (":", ASCII 0x3A).
Strings are given with a hyphen ("-", ASCII 840 0x2D) between each character. Comments begin with a "#" (ASCII 841 0x23) and may be preceded by spaces (" ", ASCII 0x20). 843 The following is an example of how a table might look. The entries 844 in this table are purposely silly and should not be used by any 845 registry as the basis for choosing variants. For the example, assume 846 that the registry: 848 o allows the FOR ALL character (U+2200) with no variants 850 o allows the COMPLEMENT character (U+2201) which has a single 851 variant of LATIN CAPITAL LETTER C (U+0043) 853 o allows the PROPORTION character (U+2237) which has one variant 854 which is the string COLON (U+003A) COLON (U+003A) 856 o allows the PARTIAL DIFFERENTIAL character (U+2202) which has two 857 variants: LATIN SMALL LETTER D (U+0064) and GREEK SMALL LETTER 858 DELTA (U+03B4) 860 The table contents (after any required header information, see [IANA-language-registry] 861 and the discussion in Section 7 below) would look 862 like:
864 # An example of a table
865 U+2200
866 U+2201|U+0043
867 U+2237|U+003A-U+003A # Note that the variant is a string
868 U+2202|U+0064:U+03B4 # Two variants for the same character
870 Implementers of table processors should remember that there are tens 871 of thousands of characters whose codepoints are greater than 0xFFFF. 872 Thus, any program that assumes that each character in the table is 873 represented in exactly six octets ("U", "+", and four octets 874 representing the character value) will fail with tables that use 875 characters whose value is greater than 0xFFFF. 877 6. A Model Label Registration Procedure: "CreateBundle" 879 This procedure has three inputs: 881 1. the proposed base registration 883 2. the language for the proposed base registration 885 3.
the processing table associated with that language 887 The output of the process is either failure (the base registration 888 cannot be registered at all), or a registration bundle that contains 889 one or more labels (always including the base registration). As 890 described earlier, the registration bundle should be stored with its 891 date of creation so that issues with overlapping elements between 892 bundles can later be resolved on a first-come, first-served basis. 894 There are two steps to processing the registration: 896 1. Check whether the proposed base registration exists in any 897 bundle. If it does, stop immediately with a failure. 899 2. Process the base registration with the mechanism described as 900 "CreateBundle" in Section 6.1, below. 902 Note that the process must be executed only once. The process must 903 not be performed on any output of the process, only on the proposed 904 base registration. 906 6.1 Description of the CreateBundle Mechanism 908 The CreateBundle mechanism determines whether a registration bundle 909 can be created and, if so, populates that bundle with valid labels. 911 During the processing, a "temporary bundle" contains partial labels, 912 that is, labels that are being built and are not complete labels. 913 The partial labels in the temporary bundle consist of strings. 915 The steps are: 917 1. Split the base registration into individual characters, called 918 "candidate characters". Compare every candidate character 919 against the base characters in the table. If any candidate 920 character does not exist in the set of base characters, the 921 system must stop and not register any names (that is, it must not 922 register either the base registration or any labels that would 923 have come from character variants). 925 2. Perform the steps in IDNA's ToASCII sequence for the base 926 registration.
If ToASCII fails for the base registration, the 927 system must stop and not register any label (that is, it must not 928 register either the base registration or labels that might have 929 been created from variants of characters contained in it). If 930 ToASCII succeeds, place the base registration into the 931 registration bundle. 933 3. For every candidate character in the base registration, do the 934 following: 935 1. Create the set of characters that consists of the candidate 936 character and any variants. 937 2. For each character in the set from the previous step, 938 duplicate the temporary bundle that resulted from the 939 previous candidate character, and add the new character to 940 the end of each partial label. 941 4. The temporary bundle now contains zero or more labels that 942 consist of Unicode characters. For every label in the temporary 943 bundle, do the following: 945 Process the label with ToASCII to see if ToASCII succeeds. If it 946 does, add the label to the registration bundle. Otherwise, do 947 not process this label from the temporary bundle any further; it 948 will not go into the registration bundle. 950 The result of the processing outlined above is the registration 951 bundle with the base registration and possibly other labels. 953 6.2 The "no-variants" Case 955 It is clear that, for many scripts, registries will choose to create 956 tables without variants, either because variants are clearly not 957 necessary or because they are determined to cause more confusion and 958 overhead than is justified by the circumstances. For those 959 situations the table model of Section 5 becomes a trivial listing of 960 base characters and only the first two steps of CreateBundle 961 (verifying that all candidate characters are in the base ("valid") 962 character list and verifying that the resulting characters will 963 succeed in the ToASCII operation) are applicable.
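The table format of Section 5 and the CreateBundle steps of Section 6.1 can be sketched together. The following is a minimal illustration under stated assumptions, not a reference implementation. The function names are invented for this example. Python's built-in IDNA (RFC 3490) support, encodings.idna.ToASCII, stands in for the ToASCII check. itertools.product replaces the step-by-step duplication of the temporary bundle and enumerates the same set of combinations. The optional "registered" argument is an assumption that models labels already held in other bundles. The no-variants case is the same procedure with empty variant lists.

```python
import encodings.idna
import itertools


def decode(field):
    """Turn "U+003A-U+003A" into "::"; works for values above 0xFFFF too."""
    return "".join(chr(int(cp.strip()[2:], 16)) for cp in field.split("-"))


def parse_table(text):
    """Parse a Section 5 style table into {base character: [variants]}."""
    table = {}
    for raw in text.splitlines():  # handles CR, LF, and CRLF line endings
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        base, _, variants = line.partition("|")
        table[decode(base)] = (
            [decode(v) for v in variants.split(":")] if variants else []
        )
    return table


def to_ascii_ok(label):
    """True if IDNA ToASCII succeeds for this single label."""
    try:
        encodings.idna.ToASCII(label)
        return True
    except UnicodeError:
        return False


def create_bundle(base, table, registered=frozenset()):
    """Return the registration bundle as a list (base first), or None."""
    if base in registered:  # the name already exists in some bundle
        return None
    if any(ch not in table for ch in base):  # step 1: base characters only
        return None
    if not to_ascii_ok(base):  # step 2: ToASCII on the base registration
        return None
    bundle = [base]
    choices = [[ch] + table[ch] for ch in base]  # step 3: char plus variants
    for parts in itertools.product(*choices):
        label = "".join(parts)
        if label in bundle or label in registered:
            continue  # never list a label twice or take one from another bundle
        if to_ascii_ok(label):  # step 4: keep only labels that pass ToASCII
            bundle.append(label)
    return bundle


# Hypothetical table in which DIGIT ONE is a variant of LATIN SMALL LETTER L.
demo = parse_table(
    "# letters used in the example label\n"
    "U+0061\nU+006F\nU+0070\nU+0073\nU+0079\nU+002D\n"
    "U+006C|U+0031  # l has the variant 1\n"
)
print(len(create_bundle("all-lollypops", demo)))  # five l's: 2**5 = 32
```

Run against the "all-lollypops" label from Section 1.8.2, the sketch produces a bundle of 32 labels, which matches the count given there.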
Even the second of 964 those steps becomes pro forma if the advice in the next subsection is 965 followed. 967 6.3 CreateBundle and Nameprep Mapping 969 One of the functions of Nameprep, and IDNA more generally, is to map 970 a large number of Unicode characters (code points) into a smaller 971 number to avoid a different, but overlapping, set of confusion 972 problems. For example, when a non-ASCII script makes distinctions 973 between "upper case" and "lower case", nameprep maps the upper case 974 characters to the lower case ones in order to simulate the DNS 975 protocol's rule that ASCII characters are interpreted in a case- 976 insensitive way. Unicode also contains many code points that are 977 typographic variants on each other and the Unicode standard 978 explicitly identifies them that way, e.g., forms with different 979 widths and code points that designate font variations for 980 mathematical uses, and Nameprep maps these onto base characters. 982 While having these mapping functions available during lookup may be 983 quite helpful to users who type equivalent forms, registrations are 984 probably best performed in terms of the IDNA base characters only, 985 i.e., those characters that nameprep will not change. This will have 986 two advantages. 988 o Registrants will never find themselves in the rather confusing 989 position of having submitted one string for registration and 990 finding a different string in the registry database (which could 991 otherwise occur even if the relevant language table does not 992 contain variants). 994 o Those who are interested in what characters are permitted by a 995 given registry will only need to examine the relevant tables, 996 rather than simulating the IDNA algorithm to determine the result 997 of processing particular characters. 999 7. IANA Considerations 1001 Under ICANN (not IETF) direction and management, the IANA has created 1002 a registry for language variant tables. 
The authoritative 1003 documentation for that registry is in [IANA-language-registry]. 1004 Since the registry exists and is being managed under ICANN direction, 1005 the material that follows is a review of the theory of this registry, 1006 rather than new instructions for IANA. 1008 As described above and suggested in the JET Guidelines, the 1009 registration rules generally require only that 1011 o The application be submitted or endorsed by a TLD registry, to 1012 ensure that someone cares about the particular table. 1014 o The table be identified by the submitting (or endorsing) registry, 1015 a language designation consistent with [RFC3066] or other language 1016 or script designation acceptable to IANA for this purpose, a 1017 version number, and a date. 1019 o Characters listed in the table be identified by Unicode code 1020 points, as discussed above. 1022 o The table format may correspond to that identified in [RFC3743], 1023 or in Section 5 above, or may be some variation on those themes 1024 appropriate to the local processing model (with or without 1025 variants). 1027 This raises some issues that will need to be worked out as 1028 experiences accumulate. For example, more standardization of table 1029 formats would be desirable to make processing by the same computer 1030 tools for different registries and languages possible. But it seems 1031 premature at this time due to differences in languages, processing, 1032 and requirements and lack of experience with them. Similarly, if a 1033 registry concludes that it should use a table that contains 1034 characters from several scripts, it is not clear how such a table 1035 should be designated. If it is identified with a language code 1036 (either according to [RFC3066] or an independent one registered with 1037 IANA) that is likely to just introduce more confusion, especially 1038 given other Internet uses of the language codes. 
It appears that 1039 some other convention will be needed for those cases and should be 1040 developed (if it has not already been established by the time this 1041 document is published). 1043 8. Internationalization Considerations 1045 This document specifies a model mechanism for registering 1046 Internationalized Domain Names (IDNs) that can be used to reduce 1047 confusion among similar-appearing names. The proposal is designed to 1048 facilitate internationalization while permitting a balance between 1049 internationalization concerns and concerns about keeping the Internet 1050 global and domain name system references unique in the perception of 1051 the user as well as in practice. 1053 9. Security Considerations 1055 Registration of labels in the DNS that contain essentially 1056 unrestricted sequences of arbitrary Unicode characters may introduce 1057 several opportunities for either attacks or simple confusion. Some 1058 of these risks, such as confusion about which character (of several 1059 that look alike) is actually intended, may be associated with the 1060 presentation form of DNS names. Others may be linked to databases 1061 associated with the DNS, e.g., with the difficulty of finding an 1062 entry in a "Whois file" when it is not clear how to enter, or search 1063 for, the characters that make up a name. This document discusses a 1064 family of restrictions on the names that can be registered. 1065 Restrictions of the type described can be imposed by a DNS zone 1066 ("registry"). The document also describes some possible tools for 1067 implementing such restrictions. 1069 While the increased number and types of characters made available by 1070 Unicode considerably increase the scale of the potential problems, 1071 the problems addressed by this document are not new.
No plausible 1072 set of restrictions will eliminate all problems and sources of 1073 confusion: for example, it has often been pointed out that, even in 1074 ASCII, the characters digit-one ("1") and lower case L ("l") can 1075 easily be confused in some display fonts. But, to the degree to 1076 which security may be aided by sensible risk reduction, these 1077 techniques may be helpful. 1079 10. Acknowledgements 1081 Discussions in the process of developing the JET Guidelines were 1082 vital in developing this document and all of the JET participants are 1083 consequently acknowledged. Attempts to explain some of the issues 1084 uncovered there to, and feedback from, Vint Cerf, Wendy Rickard, and 1085 members of the ICANN IDN Committee were also helpful in the thinking 1086 leading up to this document. 1088 An effort by Paul Hoffman to create a generic specification for 1089 registration restrictions of this type helped to inspire this 1090 document, which takes a somewhat different, more language-oriented, 1091 approach than his initial draft. While the initial version of that 1092 draft indicated that multiple languages (or multiple language tables) 1093 for a single zone were infeasible, more recent versions [I-D.hoffman- 1094 registration] shifted to inclusion of language-based approaches. The 1095 current version of this document incorporates considerable text, and 1096 even more ideas, from those drafts, with Paul Hoffman's generous 1097 permission. 1099 Feedback from several registry operators (of both country code and 1100 generic TLDs) including Edmon Chung and Ram Mohan of Afilias, and 1101 from ICANN and IANA staff (notably Tina Dam and Theresa Swinehart) 1102 about issues encountered in registering tables and designing IDN 1103 implementations resulted in the addition of significant clarifying 1104 text to the current version of the document. 1106 The opinions expressed here are, of course, the sole responsibility 1107 of the author. 
Some of those whose ideas and comments are reflected 1108 in this document may disagree with the conclusions the author has 1109 drawn from them. 1111 11. References 1113 [RFC3490] Faltstrom, P., Hoffman, P., and A. Costello, 1114 "Internationalizing Domain Names in Applications (IDNA)", 1115 RFC 3490, March 2003. 1117 [RFC3491] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep 1118 Profile for Internationalized Domain Names (IDN)", 1119 RFC 3491, March 2003. 1121 [RFC3492] Costello, A., "Punycode: A Bootstring encoding of Unicode 1122 for Internationalized Domain Names in Applications 1123 (IDNA)", RFC 3492, March 2003. 1125 [RFC1035] Mockapetris, P., "Domain names - implementation and 1126 specification", RFC 1035, STD 13, November 1987. 1128 [RFC3743] Konishi, K., Huang, K., Qian, H., and Y. Ko, "Joint 1129 Engineering Team (JET) Guidelines for Internationalized 1130 Domain Names (IDN) Registration and Administration for 1131 Chinese, Japanese, and Korean", RFC 3743, April 2004. 1133 [IESG-IDN] 1134 Internet Engineering Steering Group, IETF, "IESG Statement 1135 on IDN", IESG Statement available from 1136 http://www.ietf.org/IESG/STATEMENTS/IDNstatement.txt, 1137 February 2003. 1139 [ICANN-IDN] 1140 Internet Corporation for Assigned Names and Numbers, 1141 "Guidelines for the Implementation of Internationalized 1142 Domain Names, Version 1.0", June 2003. 1144 [IANA-language-registry] 1145 Internet Assigned Numbers Authority, "IDN Language Table 1146 Registry", April 2004. 1148 [RFC0952] Harrenstien, K., Stahl, M., and E. Feinler, "DoD Internet 1149 host table specification", RFC 952, October 1985. 1151 [RFC3066] Alvestrand, H., "Tags for the Identification of 1152 Languages", BCP 47, RFC 3066, January 2001. 1154 [Unicode] The Unicode Consortium, "The Unicode Standard -- Version 1155 3.0", January 2000. 1157 [Unicode32] 1158 The Unicode Consortium, "Unicode Standard Annex #28: 1159 Unicode 3.2", March 2002.
1161 [Drucker] Drucker, J., "The Alphabetic Labyrinth: The Letters in
1162           History and Imagination", 1995.

1164 [RFC3536] Hoffman, P., "Terminology Used in Internationalization in
1165           the IETF", RFC 3536, May 2003.

1167 [I-D.hoffman-registration]
1168           Hoffman, P., "A Method for Registering Internationalized
1169           Domain Names", draft-hoffman-idn-reg-02.txt (work in
1170           progress), October 2003.

1172 Author's Address

1174 John C Klensin
1175 1770 Massachusetts Ave, #322
1176 Cambridge, MA 02140
1177 USA

1179 Phone: +1 617 491 5735
1180 Email: john-ietf@jck.com

1182 Intellectual Property Statement

1184 The IETF takes no position regarding the validity or scope of any
1185 Intellectual Property Rights or other rights that might be claimed to
1186 pertain to the implementation or use of the technology described in
1187 this document or the extent to which any license under such rights
1188 might or might not be available; nor does it represent that it has
1189 made any independent effort to identify any such rights. Information
1190 on the procedures with respect to rights in RFC documents can be
1191 found in BCP 78 and BCP 79.

1193 Copies of IPR disclosures made to the IETF Secretariat and any
1194 assurances of licenses to be made available, or the result of an
1195 attempt made to obtain a general license or permission for the use of
1196 such proprietary rights by implementers or users of this
1197 specification can be obtained from the IETF on-line IPR repository at
1198 http://www.ietf.org/ipr.

1200 The IETF invites any interested party to bring to its attention any
1201 copyrights, patents or patent applications, or other proprietary
1202 rights that may cover technology that may be required to implement
1203 this standard. Please address the information to the IETF at
1204 ietf-ipr@ietf.org.
1206 Disclaimer of Validity

1208 This document and the information contained herein are provided on an
1209 "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
1210 OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
1211 ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
1212 INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
1213 INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
1214 WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

1216 Copyright Statement

1218 Copyright (C) The Internet Society (2005). This document is subject
1219 to the rights, licenses and restrictions contained in BCP 78, and
1220 except as set forth therein, the authors retain all their rights.

1222 Acknowledgment

1224 Funding for the RFC Editor function is currently provided by the
1225 Internet Society.