idnits 2.17.1 draft-jseng-idn-admin-05.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in this document. Expected boilerplate is as follows today (2024-03-29) according to https://trustee.ietf.org/license-info : IETF Trust Legal Provisions of 28-dec-2009, Section 6.a: This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2: Copyright (c) 2024 IETF Trust and the persons identified as the document authors. All rights reserved. IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3: This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** The document seems to lack a 1id_guidelines paragraph about Internet-Drafts being working documents. ** The document seems to lack a 1id_guidelines paragraph about 6 months document validity. ** The document seems to lack a 1id_guidelines paragraph about the list of current Internet-Drafts. ** The document seems to lack a 1id_guidelines paragraph about the list of Shadow Directories. 
== No 'Intended status' indicated for this document; assuming Proposed Standard == The page length should not exceed 58 lines per page, but there was 15 longer pages, the longest (page 1) being 59 lines Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) Miscellaneous warnings: ---------------------------------------------------------------------------- -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- Couldn't find a document date in the document -- date freshness check skipped. -- Found something which looks like a code comment -- if you have code sections in the document, please surround them with '<CODE BEGINS>' and '<CODE ENDS>' lines. 
Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'IDN-WG' is mentioned on line 1324, but not defined == Missing Reference: 'STRINGPREP' is mentioned on line 124, but not defined == Missing Reference: 'ISO639' is mentioned on line 1332, but not defined == Missing Reference: 'WIPO-UDRP' is mentioned on line 394, but not defined == Missing Reference: 'IESG-IDN' is mentioned on line 1328, but not defined == Unused Reference: 'RFC3066' is defined on line 1285, but no explicit reference was found in the text == Unused Reference: 'UNIHAN' is defined on line 1310, but no explicit reference was found in the text ** Obsolete normative reference: RFC 2234 (ref. 'ABNF') (Obsoleted by RFC 4234) ** Obsolete normative reference: RFC 3066 (Obsoleted by RFC 4646, RFC 4647) ** Obsolete normative reference: RFC 3490 (ref. 'IDNA') (Obsoleted by RFC 5890, RFC 5891) ** Obsolete normative reference: RFC 3491 (ref. 'NAMEPREP') (Obsoleted by RFC 5891) -- Possible downref: Non-RFC (?) normative reference: ref. 'IS10646' -- Possible downref: Non-RFC (?) normative reference: ref. 'UNIHAN' -- Possible downref: Non-RFC (?) normative reference: ref. 'UNICODE' -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO7098' Summary: 10 errors (**), 0 flaws (~~), 9 warnings (==), 7 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 1 INTERNET DRAFT Editors: James SENG 2 draft-jseng-idn-admin-05.txt John C KLENSIN, Wendy RICKARD 3 17 October 2003 Authors: K. KONISHI 4 Expires April 2004 K. HUANG, H. QIAN, Y. 
KO 6 Internationalized Domain Names Registration and Administration 7 Guideline for Chinese, Japanese, and Korean 9 Status of This Memo 11 This document is an Internet Draft and is in full conformance 12 with all provisions of Section 10 of RFC2026 except that the 13 right to produce derivative works is not granted. 15 Internet Drafts are working documents of the Internet 16 Engineering Task Force (IETF), its areas, and its working 17 groups. Note that other groups may also distribute working 18 documents as Internet Drafts. 20 Internet Drafts are draft documents valid for a maximum of 21 six months and may be updated, replaced, or rendered obsolete by 22 other documents at any time. It is inappropriate to use Internet 23 Drafts as reference material or to cite them other than as 24 "works in progress." 26 The list of current Internet Drafts can be accessed at 27 http://www.ietf.org/ietf/1id-abstracts.txt. 29 The list of Internet Draft Shadow Directories can be accessed at 30 http://www.ietf.org/shadow.html. 32 Abstract 34 Achieving internationalized access to domain names raises many complex 35 issues. These are associated not only with basic protocol design--such 36 as how names are represented on the network, compared, and converted to 37 appropriate forms--but also with issues and options for deployment, 38 transition, registration, and administration. 40 The IETF Standards for Internationalized Domain Names, known as "IDNA", 41 focus on access to domain names in a range of scripts that is broader 42 in scope than the original ASCII. The development process made it clear 43 that use of characters with similar appearances and/or interpretations 44 created potential for confusion, as well as difficulties in deployment 45 and transition. The conclusion was that, while those issues were 46 important, they could best be addressed administratively rather than 47 through restrictions embedded in the protocols. 
This document defines a 48 set of guidelines for applying restrictions of that type for CJK scripts 49 and the zones that use them and, perhaps, the beginning of a framework 50 for thinking about other zones, languages, and scripts. 52 Table of Contents 54 1. Introduction 56 2. Definitions, Context, and Notation 57 2.1. Definitions and Context 58 2.2. Notation for Ideographs and Other Non-ASCII CJK Characters 60 3. Scope of the Administrative Guidelines 61 3.1. Principles Underlying These Guidelines 62 3.2. Registration of IDL 63 3.2.1. Using the Language Variant Table 64 3.2.2. IDL Package 65 3.2.3. Procedure for Registering IDLs 66 3.3. Deletion and Transfer of IDL and IDL Package 67 3.4. Activation and Deactivation of IDL Variants 68 3.4.1. Activation Algorithm 69 3.4.2. Deactivation Algorithm 70 3.5. Managing Changes in Language Associations 71 3.6. Managing Changes to Language Variant Tables 73 4. Examples of Guideline Use in Zones 75 5. Syntax Description for the Language Variant Table 76 5.1 ABNF Syntax 77 5.2. Comments and Explanation of Syntax 79 6. Security Considerations 81 7. Index to Terminology 83 8. Acknowledgments 85 9. Authors' Addresses 87 10. Normative References 89 11. Nonnormative References 91 1. Introduction 93 Domain names form the fundamental naming architecture of the Internet. 94 Countless Internet protocols and applications rely on them, not just for 95 stability and continuity, but also to avoid ambiguity. They were 96 designed to be identifiers without any language context. However, as 97 domain names have become visible to end users through Web URLs and e- 98 mail addresses, the strings in domain-name labels are being increasingly 99 interpreted as names, words, or phrases. It is likely that users will do 100 the same with languages of differing character sets--such as Chinese, 101 Japanese and Korean (CJK)--in which many words or concepts are 102 represented using short sequences of characters. 
104 The introduction of what are called Internationalized Domain Names (IDN) 105 amplifies both the difficulty of putting names into identifiers and the 106 confusion that exists between scripts and languages. Character symbols 107 that appear (or actually are) identical, or that have similar or 108 identical semantics, but that are assigned different code points, 109 further increase the potential for confusion. DNS internationalization 110 also affects a number of Internet protocols and applications and creates 111 additional layers of complexity in terms of technical administration and 112 services. Given the added complications of using a much broader range of 113 characters than the original small ASCII subset, precautions are 114 necessary in the deployment of IDNs in order to minimize confusion and 115 fraud. 117 The IETF IDN Working Group [IDN-WG] addressed the problem of handling 118 the encoding and decoding of Unicode strings into and out of Domain Name 119 System (DNS) labels with the goal that its solution would not put the 120 operational DNS at any risk. Its work resulted in one primary protocol 121 and three supporting ones, respectively: 123 1. Internationalizing Host Names in Applications [IDNA] 124 2. Preparation of Internationalized Strings [STRINGPREP] 125 3. A Stringprep Profile for Internationalized Domain Names 126 [NAMEPREP] 127 4. Punycode [PUNYCODE] 129 IDNA--which calls on the others--normalizes and transforms strings that 130 are intended to be used as IDNs. In combination, the four provide the 131 minimum functions required for internationalization, such as performing 132 case mappings, eliminating character differences that would cause severe 133 problems, and specifying matching (equality). They also convert between 134 the resulting Unicode code points and an ASCII-based form that is more 135 suitable for storing in actual DNS labels. 
In this way, the IDNA 136 transformations improve a user's chances of getting to the correct IDN. 138 In addressing the issues around differing character sets, a primary 139 consideration and administrative challenge involves region-specific 140 definitions, interpretations, and the semantics of strings to be used in 141 IDNs. A Unicode string may have a specific meaning as a name, word, or 142 phrase in a particular language but that meaning could vary depending on 143 the country, region, culture, or other context in which the string is 144 used. It might also have different interpretations in different 145 languages that share some or all of the same characters. Therefore, 146 individual zones and zone administrators may find it necessary to impose 147 restrictions and procedures to reduce the likelihood of confusion--and 148 instabilities of reference--within their own environments. 150 Over the centuries, the evolution of CJK characters--and the differences 151 in their use in different languages and even in different regions where 152 the same language is spoken--has given rise to the idea of "variants", 153 wherein one conceptual character can be identified with several 154 different Code Points in character sets for computer use. This document 155 provides a framework for handling such variants while minimizing the 156 possibility of serious user confusion in the obtaining or use of domain 157 names. However, the concept of variants is complex and may require many 158 different layers of solution; this guideline offers only one of the 159 solution components. It is not sufficient by itself to solve the whole 160 problem, even with zone-specific tables as described below. 162 Additionally, because of local language or writing-system differences, 163 it is impossible to create universally accepted definitions for which 164 potential variants are the same and which are not the same. 
It is even 165 more difficult to define a technical algorithm to generate variants that 166 are linguistically accurate--that is, that the variant forms produced 167 make as much sense in the language as the originally specified forms. It 168 is also possible that variants generated may have no meaning in the 169 associated language or languages. The intention is not to generate 170 meaningful "words" but to generate similar variants to be reserved. So 171 even though the method described in this document may not always be 172 linguistically accurate--or need to be--it increases the chances of 173 getting the right variants while accepting the inherent limitations of 174 the DNS and the complexities of human language. 176 This document outlines a model for such conventions for zones in which 177 labels that contain CJK characters are to be registered and a system for 178 implementing that model. It provides a mechanism that allows each zone 179 to define its own local rules for permitted characters and sequences and 180 the handling of IDNs and their variants. 182 The document is an effort of the Joint Engineering Team (JET), a group 183 composed of members of CNNIC, TWNIC, KRNIC, and JPNIC as well as other 184 individual experts. It offers guidelines for zone administrators-- 185 including but not limited to registry operators and registrars--and 186 information for all domain name holders on the administration of domain 187 names that contain characters drawn from Chinese, Japanese, and Korean 188 scripts. Other language groups are encouraged to develop their own 189 guidelines as needed, based on these guidelines if that is helpful. 191 2. Definitions, Context, and Notation 193 2.1. Definitions and Context 195 This document uses a number of special terms. In this section, 196 definitions and explanations are grouped topically. Some readers may 197 prefer to skip over this material, returning, perhaps via the index to 198 terminology in section 7, when needed. 
200 2.1.1. IDN: The term "IDN" has a number of different uses: (a) as an 201 abbreviation for "Internationalized Domain Name"; (b) as a fully 202 qualified domain name that contains at least one label that contains 203 characters not appearing in ASCII, specifically not in the subset of 204 ASCII recommended for domain names (the so-called "hostname" or "LDH" 205 subset, see RFC1035 [STD13]); (c) as a label of a domain name that 206 contains at least one character beyond ASCII; (d) as a Unicode string to 207 be processed by Nameprep; (e) as a string that is an output from 208 Nameprep; (f) as a string that is the result of processing through both 209 Nameprep and conversion into Punycode; (g) as the abbreviation of an IDN 210 (more properly, IDL) Package, in the terminology of this document; (h) 211 as the abbreviation of the IETF IDN Working Group; (i) as the 212 abbreviation of the ICANN IDN Committee; and (j) as standing for other 213 IDN activities in other companies/organizations. 215 Because of the potential confusion, this document uses the term "IDN" as 216 an abbreviation for Internationalized Domain Name and, specifically, in 217 the second sense described in (b) above. It uses "IDL," defined 218 immediately below, to refer to Internationalized Domain Labels. 220 2.1.2. IDL: This document provides a guideline to be applied on a per- 221 zone basis, one label at a time. Therefore, the term "Internationalized 222 Domain Label" or "IDL" will be used instead of the more general term 223 "IDN" or its equivalents. The processing specifications of this document 224 may be applied, in some zones, to ASCII characters also, if those 225 characters are specified as valid in a Language Variant Table (see 226 below). Hence, in some zones, an IDL may contain or consist entirely of 227 "LDH" characters. 229 2.1.3. FQDN: A fully qualified domain name, one that explicitly contains 230 all labels, including a Top-Level Domain (TLD) name. 
In this context, a 231 TLD name is one whose label appears in a nameserver record in the root 232 zone. The term "Domain Name Label" refers to any label of a FQDN. 234 2.1.4. Registration: In this document, the term "registration" refers to 235 the process by which a potential domain name holder requests that a 236 label be placed in the DNS either as an individual name within a domain 237 or as a subdomain delegation from another domain name holder. In the 238 case of a successful registration, the label or delegation records are 239 placed in the relevant zone file, or, more specifically, they are 240 "activated" or made "active" and additional IDLs may be reserved as part 241 of an "IDL Package" (see below). The guidelines presented here are 242 recommended for all zones--at any hierarchy level--in which CJK 243 characters are to appear and not just domains at the first or second 244 level. 246 2.1.5. RFC3066: A system, widely used in the Internet, for coding and 247 representing names of languages. It is based on an International 248 Organization for Standardization (ISO) standard for coding language 249 names [ISO639], but expands it to provide additional precision. 251 2.1.6. ISO/IEC 10646: The international standard universal multiple- 252 octet coded character set ("UCS") [IS10646]. The Code Point definitions 253 of this standard are identical to those of corresponding versions of the 254 Unicode standard (see below). Consequently, the characters and their 255 coding are often referred to as "Unicode characters." 257 2.1.7. Unicode Character: The term "Unicode character" is used here in 258 reference to characters chosen from the Unicode Standard Version 3.2 259 [UNICODE] (and hence from ISO/IEC 10646). In this document, the 260 characters are identified by their positions, or "Code Points." The 261 notation U+12AB, for example, indicates the character at the position 262 12AB (hexadecimal) in the Unicode 3.2 table. 
For characters in positions 263 above FFFF--i.e., requiring more than sixteen bits to represent--a five- 264 to eight-character string is used, such as U+112AB for the character in 265 position 12AB of plane 1. 267 2.1.8. Unicode String: "Unicode string" refers to a string of Unicode 268 characters. The Unicode string is identified by the sequence of the 269 Unicode characters regardless of the encoding scheme. 271 2.1.9. CJK Characters: CJK characters are characters commonly used in 272 the Chinese, Japanese, or Korean languages, including but not limited to 273 those defined in the Unicode Standard as ASCII (U+0020 to U+007F), Han 274 ideographs (U+3400 to U+9FAF and U+20000 to U+2A6DF), Bopomofo (U+3100 275 to U+312F and U+31A0 to U+31BF), Kana (U+3040 to U+30FF), Jamo (U+1100 276 to U+11FF and U+3130 to U+318F), Hangul (U+AC00 to U+D7AF and U+3130 to 277 U+318F), and the respective compatibility forms. The particular 278 characters that are permitted in a given zone are specified in the 279 Language Variant Table(s) for that zone. 281 2.1.10. Label String: A generic term referring to a string of characters 282 that is a candidate for registration in the DNS or such a string, once 283 registered. A label string may or may not be valid according to the 284 rules of this specification and may even be invalid for IDNA use. The 285 term "label", by itself, refers to a string that has been validated and 286 may be formatted to appear in a DNS zone file. 288 2.1.11. Language Variant Table: The key mechanisms of this specification 289 utilize a three-column table, called a Language Variant Table, for each 290 language permitted to be registered in the zone. Those columns are 291 known, respectively, as "Valid Code Point", "Preferred Variant", and 292 "Character Variant", which are defined separately below. The Language 293 Variant Tables are critical to the success of the guideline described in 294 this document. 
However, the principles to be used to generate the tables 295 are not within the scope of this document and should be worked out by 296 each registry separately (perhaps by adopting or adapting the work of 297 some other registry). In this document, "Table" and "Variant Table" are 298 used as short forms for Language Variant Table. 300 2.1.12. Valid Code Point: In a Language Variant Table, the list of Code 301 Points that is permitted for that language. Any other Code Points, or 302 any string containing them, will be rejected by this specification. The 303 Valid Code Point list appears as the first column of the Language 304 Variant Table. 306 2.1.13. Preferred Variant: In a Language Variant Table, a list of Code 307 Points corresponding to each Valid Code Point and providing possible 308 substitutions for it. These substitutions are "preferred" in the sense 309 that the variant labels generated using them are normally registered in 310 the zone file, or "activated." The Preferred Code Points appear in 311 column 2 of the Language Variant Table. "Preferred Code Point" is used 312 interchangeably with this term. 314 2.1.14. Character Variant: In a Language Variant Table, a second list of 315 Code Points corresponding to each Valid Code Point and providing 316 possible substitutions for it. Unlike the Preferred Variants, 317 substitutions based on Character Variants are normally reserved but not 318 actually registered (or "activated"). Character Variants appear in 319 column 3 of the Language Variant Table. The term "Code Point Variants" 320 is used interchangeably with this term. 322 2.1.15. Preferred Variant Label: A label generated by use of Preferred 323 Variants (or Preferred Code Points). 325 2.1.16. Character Variant Label: A label generated by use of Character 326 Variants. 328 2.1.17. Zone Variant: A Preferred or Character Variant Label that is 329 actually to be entered (registered) into the DNS--that is, into the zone 330 file for the relevant zone. 
Zone Variants are also referred to as Zone 331 Variant Labels or Active (or Activated) Labels. 333 2.1.18. IDL Package: A collection of IDLs as determined by these 334 Guidelines. All labels in the package are "reserved", meaning they 335 cannot be registered by anyone other than the holder of the Package. 336 These reserved IDLs may be "activated", meaning they are actually 337 entered into a zone file as a "Zone Variant". The IDL Package also 338 contains identification of the language(s) associated with the 339 registration process. The IDL and its variant labels form a single, 340 atomic unit. 342 2.2 Notation for Ideographs and Other Non-ASCII CJK Characters. 344 For purposes of clarity, particularly in regard to examples, Han 345 ideographs appear in several places in this document. However, they do 346 not appear in the ASCII version of this document. For the convenience of 347 readers of the ASCII version--and some readers not familiar with 348 recognizing and distinguishing Chinese characters--most uses of these 349 characters will be associated with both their Unicode Code Points and an 350 "asterisk tag" with its corresponding Chinese Romanization [ISO7098], 351 with the tone mark represented by a number from 1 to 4. Those tags have 352 no meaning outside this document; they are a quick visual and reading 353 reference to help facilitate the combinations and transformations of 354 characters in the guideline and table excerpts. 356 3. Scope of the Administrative Guidelines 358 Zone administrators are responsible for the administration of the domain 359 name labels under their control. A zone administrator might be 360 responsible for a large zone, such as a top-level domain (TLD)--whether 361 generic or country code--or a smaller one, such as a typical second- or 362 third-level domain. A large zone is often more complex than its smaller 363 counterpart. 
However, actual technical administrative tasks--such as 364 addition, deletion, delegation, and transfer of zones between domain 365 name holders--are similar for all zones. 367 This document provides guidelines for the ways CJK characters should be 368 handled within a zone, for how language issues should be considered and 369 incorporated, and for how Domain Name Labels containing CJK characters 370 should be administered (including registration, deletion, and transfer 371 of labels). 373 Other IDN policies--such as the creation of new top-level domains 374 (TLDs), the cost structure for registrations, and how the processes 375 described here get allocated between registrar and registry if the zone 376 makes that distinction--also are outside the scope of this document. 378 Technical implementation issues are not discussed here either. For 379 example, deciding which guidelines should be implemented as registry 380 actions and which should be registrar actions is left to zone 381 administrators, with the possibility that it will differ from zone to 382 zone. 384 3.1. Principles Underlying These Guidelines 386 In many places, in the event of a dispute over rights to a name (or, 387 more accurately, DNS label string), this document assumes "first-come, 388 first-served" (FCFS) as a resolution policy even though FCFS is not 389 listed below as one of the principles for this document. If policies are 390 already in place governing priorities and "rights", one can use the 391 guidelines here by replacing uses of FCFS in this document with policies 392 specific to the zone. Some of the guidelines here may not be applicable 393 to other policies for determining rights to labels. Still other 394 alternatives--such as use of UDRP [WIPO-UDRP] or mutual exclusion--might 395 have little impact on other aspects of these guidelines. 
397 (a) Although some Unicode strings may be pure identifiers made up of an 398 assortment of characters from many languages and scripts, IDLs are 399 likely to be "words" or "names" or "phrases" that have specific meaning 400 in a language. While a zone administration might or might not require 401 "meaning" as a registration criterion, meaning could prove to be a 402 useful tool for avoiding user confusion. 404 Each IDL to be registered should be associated administratively 405 with one or more languages. 407 Language associations should either be predetermined by the zone 408 administrator and applied to the entire zone or be chosen by the 409 registrants on a per-IDL basis. The latter may be necessary for some 410 zones, but it will make administration more difficult and will increase 411 the likelihood of conflicts in variant forms. 413 A given zone might have multiple languages associated with it or it may 414 have no language specified at all. Omitting specification of a language 415 may provide additional opportunities for user confusion and is therefore 416 NOT recommended. 418 (b) Each language uses only a subset of Unicode characters. Therefore, 419 if an IDL is associated with a language, it is not permitted to contain 420 any Unicode character that is not within the valid subset for that 421 language. 423 Each IDL to be registered must be verified against the valid 424 subset of Unicode for the language(s) associated with the IDL. 425 That subset is specified by the list of characters appearing in 426 the first column of the language and zone-specific tables as 427 described later in this document. 429 If the IDL fails this test for any of its associated languages, the IDL 430 is not valid for registration. 432 Note that this verification is not necessarily linguistically accurate, 433 because some languages have special rules. 
For example, some languages 434 impose restrictions on the order in which particular combinations of 435 characters may appear. Characters that are valid for the language--and 436 hence permitted by this specification--might still not form valid words 437 or even strings in the language. 439 (c) When an IDL is associated with a language, it may have Character 440 Variants that depend on that language associated with it in addition to 441 any Preferred Variants. These variants are potential sources of 442 confusion with the Code Points in the original label string. 443 Consequently, the labels generated from them should be unavailable to 444 registrants of other names, words, or phrases. 446 During registration, all labels generated from the Character 447 Variants for the associated language(s) of the IDL should be 448 reserved. 450 IDL reservations of the type described here normally do not appear in 451 the distributed DNS zone file. In other words, these reserved IDLs may 452 not resolve. Domain name holders could request that these reserved IDLs 453 be placed in the zone file and made active and resolvable. 455 Zones will need to establish local policies about how they are to be 456 made active. Specifically, many zones, especially at the top level, have 457 prohibited or restricted the use of "CNAME"s--DNS aliases--especially 458 CNAMEs that point to nameserver delegation records (NS records). And 459 long-term use of long-term aliases for domain hierarchies, rather than 460 single names ("DNAME records"), is considered problematic because of the 461 recursion they can introduce into DNS lookups. 463 (d) When an IDL is a "name", "word", or "phrase", it will have Character 464 Variants depending on the associated language. Furthermore, one or more 465 of those Character Variants will be used more often than others for 466 linguistic, political, or other reasons. 
These more commonly used 467 variants are distinguished from ordinary Character Variants and are 468 known as Preferred Variant(s) for the particular language. 470 To increase the likelihood of correct and predictable resolution 471 of the IDN by end users, all labels generated from the Preferred 472 Variants for the associated language(s) should be resolvable. 474 In other words, the Preferred Variant Labels should appear in the 475 distributed DNS zone file. 477 (e) IDLs associated with one or more languages may have a large number 478 of Character Variant Labels or Preferred Variant Labels. Some of these 479 labels may include combinations of characters that are meaningless or 480 invalid linguistically. It may therefore be appropriate for a zone to 481 adopt procedures that include only linguistically-acceptable labels in 482 the IDL Package. 484 A zone administrator may impose additional rules and other 485 processing activities to limit the number of Character Variant 486 Labels or Preferred Variant Labels that are actually reserved or 487 registered. 489 These additional rules and other processing activities are based on 490 policies and/or procedures imposed on a per-zone basis and therefore are 491 not within the scope of this document. Such policies or procedures might 492 be used, for example, to restrict the number of Preferred Variant Labels 493 actually reserved or to prevent certain words from being registered at 494 all. 496 (f) There are some Character Variant Labels and Preferred Variant Labels 497 that are associated with each IDL. These labels are considered 498 "equivalent" to one another. To avoid confusion, they all should be 499 assigned to a single domain name holder. 501 The IDL and its variant labels should be grouped together into a 502 single atomic unit, known in this document as an "IDL Package". 
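The grouping described in principle (f) and in definition 2.1.18 can be pictured as a small data structure. The Python sketch below is illustrative only and is not part of these guidelines: the class name IDLPackage, its field names, the method names, and the placeholder labels and language tag are all assumptions made for the example. It assumes a single holder per package and treats "active" labels as a subset of the "reserved" labels.

```python
from dataclasses import dataclass, field


@dataclass
class IDLPackage:
    """Hypothetical model of an IDL Package held by one domain name holder."""

    holder: str
    languages: tuple          # language tags associated with the registration
    reserved: frozenset       # every label in the package (IDL plus variants)
    active: set = field(default_factory=set)  # subset entered in the zone file

    def activate(self, label: str) -> None:
        # Only a label that already belongs to the package can be activated.
        if label not in self.reserved:
            raise ValueError(f"{label!r} is not part of this package")
        self.active.add(label)

    def transfer(self, new_holder: str) -> None:
        # The package is one unit: every label changes holder together.
        self.holder = new_holder

    def holder_of(self, label: str) -> str:
        # Any label in the package resolves to the single package holder.
        if label not in self.reserved:
            raise KeyError(label)
        return self.holder


# Placeholder labels; a real package would hold the IDL and its variant labels.
pkg = IDLPackage(
    holder="holder-A",
    languages=("zh-CN",),
    reserved=frozenset({"label-1", "label-2", "label-3"}),
)
pkg.activate("label-1")
pkg.transfer("holder-B")
```

Because the holder is stored once on the package rather than once per label, this sketch has no way to assign equivalent labels to different holders. That is one possible way to reflect the single-holder recommendation; a zone could equally enforce it by policy checks in a different data model.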

The IDL Package is created upon registration and is atomic: transfer and
deletion of an IDL is performed on the IDL Package as a whole. That is,
an IDL within the IDL Package may not be transferred or deleted
individually; any re-registration, transfer, or other action that
affects the IDL should also affect the other variants.

The name-conflict resolution policy associated with this zone could
result in a conflict with the principle of IDL Package atomicity. In
such a case, the policy must be defined to make the precedence clear.

3.2. Registration of IDL

To conform to the principles described in 3.1, this document introduces
two concepts: the Language Variant Table and the IDL Package. These are
described in the next two subsections, followed by a description of the
algorithm that is used to interpret the table and generate variant
labels.

3.2.1. Using the Language Variant Table

Each language used in a zone should have its own Language Variant Table.
The table consists of a header section that identifies references and
version information, followed by a section with three columns and one
row for each Code Point that is valid for the language.

   (1) The first column contains the subset of Unicode characters that
       is valid to be registered ("Valid Code Point"). This is used to
       verify the IDL to be registered (see 3.1b). In the registration
       procedure described later, this column is used as an index to
       examine characters that appear in a proposed IDL to be
       processed. The collection of Valid Code Points in the table for a
       particular language can be thought of as defining the script for
       that language, although the normal definition of a script would
       not include, for example, ASCII characters with CJK ones.

   (2) The second column contains the Preferred Variant(s) of the
       corresponding Unicode character in column one ("Valid Code
       Point"). These variant characters are used to generate the
       Preferred Variant Labels for the IDL. Those labels should be
       resolvable (see 3.1d). Under normal circumstances, all of those
       Preferred Variant Labels will be activated in the relevant zone
       file so that they will resolve when the DNS is queried for them.

   (3) The third column contains the Character Variant(s) for the
       corresponding Valid Code Point. These are used to generate the
       Character Variant Labels of the IDL, which are then to be
       reserved (see 3.1c). Registration--or activation--of labels
       generated from Character Variants will normally be a registrant
       decision, subject to local policy.

Each entry in a column consists of one or more Code Points, each
expressed as its numeric value in the Unicode table and optionally
followed by a parenthetical reference. The first column--or Valid Code
Point--may have only one Code Point specified in a given row. The other
columns may have more than one.

Any row may be terminated with an optional comment, starting with "#".

The formal syntax of the table and more-precise definitions of some of
its organization appear in Section 5.

The Language Variant Table should be provided by a relevant group,
organization, or body. However, the question of who is relevant or has
the authority to create this table and the rules that define it is
beyond the scope of this document.

3.2.2. IDL Package

The IDL Package is created on successful registration and consists of:

   (1) the IDL registered

   (2) the language(s) associated with the IDL

   (3) the version of the associated Language Variant Table(s)

   (4) the reserved IDLs

   (5) active IDLs--that is, "Zone Variant Labels" that are to appear in
       the DNS zone file

3.2.3. Procedure for Registering IDLs

An explanation follows each step.

Step 1. IN <= IDL to be registered and
        {L} <= Set of languages associated with IN

Start the process with the label string (prospective IDL) to be
registered and the associated language(s) as input.

Step 2. Generate the Nameprep-processed version of the IN, applying
        all mappings and canonicalization required by IDNA.

The prospective IDL is processed by using Nameprep to apply the
normalizations and exclusions globally required to use IDNA. If the
Nameprep processing fails, then the IDL is invalid and the registration
process must stop.

Step 2.1. NP(IN) <= Nameprep processed IN
Step 2.2. Check availability of NP(IN).
          If not available, route to conflict policy.

The Nameprep-processed IDL is then checked against the contents of the
zone file and previously created IDL Packages. If it is already
registered or reserved, then a conflict exists that must be resolved by
applying whatever policy is applicable for the zone. For example, if
FCFS is used, the registration process terminates unless the conflict
resolution policy provides another alternative.

Step 3. Process each language.
        For each language {AL} in {L}

Step 3 goes through all languages associated with the proposed IDL and
checks each character (after Nameprep has been applied) for validity in
each of them. It then applies the Preferred Variants (column 2 values)
and the Character Variants (column 3 values) to generate candidate
labels.

Step 3.1. Check validity of NP(IN) in AL. If failed, stop processing.

In step 3.1, IDL validation is done by checking that every Code Point in
the Nameprep-processed IDL is a Code Point allowed by the "Valid Code
Point" column of the Language Variant Table for the language. This is
then repeated for any other languages (and hence, Language Variant
Tables) specified in the registration.
If one or more Code Points are not valid, the registration process
terminates.

Step 3.2. PV(IN,AL) <= Set of available Nameprep-processed Preferred
          Variants of NP(IN) in AL

Step 3.2 generates the list of Preferred Variant Labels of the IDL by
doing a combination (see Step 3.2A below) of all possible variants
listed in the "Preferred Variant(s)" column for each Code Point in the
Nameprep-processed IDL. The generated Preferred Variant Labels must be
processed through Nameprep. If the Nameprep processing fails for any
Preferred Variant Label (this is unlikely to occur if the Preferred
Variants [Code Points] are processed through Nameprep before being
placed in the table), then that variant label will be removed from the
list. The remaining Preferred Variant Labels in the list are then
checked to see whether they are already registered or reserved. If any
are registered or reserved, then the conflict resolution policy will
apply. In general, this will not prevent the originally requested IDL
from being registered unless the policy prevents such registration. For
example, if FCFS is applied, then the conflicting variants will be
removed from the list, but the originally requested IDL and any
remaining variants will be registered (see steps 5 and 8 below).

Step 3.2A Generating variant labels from Variant Code Points.

Steps 3.2 and 3.3 require that the Preferred Variants and Character
Variants be combined with the original IDL to form sets of variant
labels. Conceptually, one starts with the original, Nameprep-processed,
IDL and examines each of its characters in turn. If a character is
encountered for which there is a corresponding Preferred Variant or
Character Variant, a new variant label is produced with the Variant Code
Point substituted for the original one.
If variant labels already exist as the result of the processing of
characters that appeared earlier in the original IDL, then the
substitutions are made in them as well, resulting in additional
generated variant labels. This operation is repeated separately for the
Preferred Variants (in Step 3.2) and Character Variants (in Step 3.3).
Of course, equivalent results could be achieved by processing the
original IDL's characters in order, building the Preferred Variant Label
set and Character Variant Label set in parallel.

This process will sometimes generate a very large number of labels. For
example, if only two of the characters in the original IDL are
associated with Preferred Variants and if the first of those characters
has three Preferred Variants and the second has two, one ends up with 12
variant labels to be placed in the IDL Package and, normally, in the
zone file. Repeating the process for Character Variants, if any exist,
would further increase the number of labels. And if more than one
language is specified for the original IDL, then repetition of the
process for additional languages (see step 4, below) might further
increase the size of the set.

For illustrative purposes, the "combination" process could be achieved
by a recursive function similar to the following pseudocode:

   Function Combination(Str)
     F <= first code point of Str
     SStr <= Substring of Str, without the first code point
     NSC <= {}
     If SStr is empty then
       For each V in (Variants of code point F)
         NSC = NSC set-union (the string with the code point V)
       End of Loop
     Else
       SubCom = Combination(SStr)
       For each V in (Variants of code point F)
         For each SC in SubCom
           NSC = NSC set-union (the string with the
                 first code point V followed by the string SC)
         End of Loop
       End of Loop
     Endif

     Return NSC

Step 3.3. CV(IN,AL) <= Set of available Nameprep-processed Character
          Variants of NP(IN) in AL

This step generates the list of Character Variant Labels by doing a
combination (see Step 3.2A above) of all the possible variants listed in
the "Character Variant(s)" column for each Code Point in the
Nameprep-processed original IDL. As with the Preferred Variant Labels,
the generated Character Variant Labels must be processed by, and
acceptable to, Nameprep. If the Nameprep processing fails for a
Character Variant Label, then that variant label will be removed from
the list. The remaining Character Variant Labels are then checked to be
sure they are not registered or reserved. If one or more are, then the
conflict resolution policy is applied. As with Preferred Variant Labels,
a conflict that is resolved in favor of the earlier registrant does not,
in general, prevent the IDL from being registered, nor the remaining
variants from being reserved in step 6 below.

Step 3.4. End of Loop

Step 4. Let PVall be the set-union of all PV(IN,AL)

Step 4 generates the Preferred Variant Labels for all languages. In this
step, and again in step 6 below, the zone administrator may impose
additional rules and processing activities to restrict the number of
Preferred (tentatively to be reserved and activated) and Character
(tentatively to be reserved) Label Variants. These additional rules and
processing activities are zone policy specific and therefore are not
specified in this document.

Step 5. {ZV} <= PVall set-union NP(IN)

Step 5 generates the initial Zone Variants. The set includes all
Preferred Variants for all languages and the original Nameprep-processed
IDL. Unless excluded by further processing, these Zone Variants will be
activated--that is, placed into the DNS zone. Note that the "set-union"
operation will eliminate any duplicates.

Step 6. Let CVall be the set-union of all CV(IN,AL), set-minus {ZV}

Step 6 generates the Reserved Label Variants (the Character Variant
Label set). These labels are normally reserved but not activated. The
set includes all Character Variant Labels for all languages, but not the
Zone Variants defined in the previous step. The set-union and set-minus
operations eliminate any duplicates.

Step 7. Create IDL Package for IN using IN, {L}, {ZV} and CVall

In Step 7, the "IDL Package" is created using the original IDL, the
associated language(s), the Zone Variant Labels, and the Reserved
Variant Labels. If zone-specific additional processing or filtering is
to be applied to eliminate linguistically inappropriate or other forms,
it should be applied before the IDL Package is actually assembled.

Step 8. Put {ZV} into zone file

The activated IDLs are converted via ToASCII with UseSTD13ASCIIRules
[IDNA] before being placed into the zone file. This conversion puts the
IDLs into the actual IDNA ("Punycode") form used in zone files; up to
this point, the IDLs have been carried in Unicode form. If ToASCII fails
for any of the activated IDLs, that IDL must not be placed into the zone
file. If the IDL is a subdomain name, it will be delegated.

3.3. Deletion and Transfer of IDL and IDL Package

In traditional domain administration, every Domain Name Label is
independent of all other Domain Name Labels. Registration, deletion, and
transfer of labels is done on a per-label basis. However, with the
guidelines discussed here, each IDL is associated with specific
languages, with all label variants--both active (zone) and
reserved--together in an IDL Package. This quite deliberately prohibits
labels that contain sufficient mixtures of characters from different
scripts to make them impossible as words in any given language.
If a zone chooses not to impose that restriction--that is, to permit
labels to be constructed by picking characters from several different
languages and scripts--then the guidelines described here would be
inappropriate.

As stated earlier, the IDL Package should be treated as a single atomic
unit and all variants of the IDL should belong to a single domain-name
holder. If the local policy related to the handling of disagreements
requires a particular IDL to be transferred or deleted independently of
the IDL Package, the conflict policy would take precedence. In such an
event, the conflict policy should include a transfer or delete procedure
that takes the nature of IDL Packages into consideration.

When an IDL Package is deleted, all of the Zone and Reserved Label
Variants again become available. The deletion of one IDL Package does
not change any other IDL Packages.

3.4. Activation and Deactivation of IDL variants

Because there are active (registered) IDLs and inactive (reserved but
not registered) IDLs within an IDL Package, processes are required to
activate or deactivate IDL variants within an IDL Package.

3.4.1. Activation Algorithm

Step 1. IN <= IDL to be activated and PA <= IDL Package

Start with the IDL to be activated and the IDL Package of which it is a
member.

Step 2. NP(IN) <= Nameprep processed IN

Process the IDL through Nameprep. This step should never cause a
problem, or even a change, since all labels that become part of the IDL
Package are processed through Nameprep in Step 3.2 or 3.3 of the
Registration procedure (section 3.2.3).

Step 3. If NP(IN) not in {RV} then stop

Verify that the Nameprep-processed version of the IDL appears as a
still-unactivated label in the IDL Package, i.e., in the list of
Reserved Label Variants, {RV}.
It might be a useful "sanity check" to also verify that it does not
already appear in the zone file.

Step 4. {RV} <= {RV} set-minus NP(IN) and {ZV} <= {ZV} set-union NP(IN)

Within the IDL Package, remove the Nameprep-processed version of the IDL
from the list of Reserved Label Variants and add it to the list of
active (zone) label variants.

Step 5. Put {ZV} into the zone file

Actually register (activate) the Zone Variant Labels.

3.4.2. Deactivation Algorithm

Step 1. IN <= IDL to be deactivated and PA <= IDL Package

As with activation, start with the IDL to be deactivated and the IDL
Package of which it is a member.

Step 2. NP(IN) <= Nameprep processed IN

Get the Nameprep-processed version of the name (see discussion in the
previous section).

Step 3. If NP(IN) not in {ZV} then stop

Verify that the Nameprep-processed version of the IDL appears as an
activated (zone) label variant in the IDL Package. It might be a useful
"sanity check" at this point to also verify that it actually appears in
the zone file.

Step 4. {RV} <= {RV} set-union NP(IN) and {ZV} <= {ZV} set-minus NP(IN)

Within the IDL Package, remove the Nameprep-processed version of the IDL
from the list of Active (Zone) Label Variants and add it to the list of
Reserved (but inactive) Label Variants.

Step 5. Put {ZV} into the zone file

3.5. Managing Changes in Language Associations

Since the IDL Package is an atomic unit and the associated list of
variants must not be changed after creation, this document does not
include a mechanism for adding and deleting language associations within
the IDL Package. Instead, it recommends deleting the IDL Package
entirely, followed by a registration with the new set of languages. Zone
administrators may find it desirable to devise procedures that prevent
other parties from capturing the labels in the IDL Package during these
operations.
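
As a non-normative illustration of Steps 3 through 7 of section 3.2.3
and of the activation and deactivation algorithms in section 3.4, the
following Python sketch may help. It makes several simplifying
assumptions that are not part of this document: labels are tuples of
code points that have already been through Nameprep; a Language Variant
Table is held as a hypothetical in-memory dictionary (here filled with
a few rows from the illustrative "ja" and "ko" tables of Section 4);
every variant is a single code point; and availability checks, conflict
policy, ToASCII conversion, and zone file updates are left out. The
names JA, KO, register, activate, and deactivate are invented for the
example.

```python
from itertools import product

# Hypothetical table form: valid code point -> (preferred, character variants).
JA = {
    0x5718: ([0x5718], [0x56E3]),
    0x60F3: ([0x60F3], []),
    0x806F: ([0x806F], [0x8068]),
    0x96C6: ([0x96C6], []),
}
KO = {
    0x5718: ([0x5718], [0x56E3]),
    0x60F3: ([0x60F3], []),
    0x806F: ([0x806F], [0x8068]),
    0x96C6: ([0x96C6], []),
}


def combinations(label, variants_of):
    """Step 3.2A: all labels formed by choosing one variant per position."""
    return set(product(*(variants_of(cp) for cp in label)))


def register(label, tables):
    """Steps 3-7 for a Nameprep-processed label; None if Step 3.1 fails."""
    pv_all, cv_all = set(), set()
    for table in tables:
        if any(cp not in table for cp in label):
            return None  # Step 3.1: a code point is not valid in this table
        pv_all |= combinations(label, lambda cp: table[cp][0])
        # A code point is assumed to be one of its own Character Variants.
        cv_all |= combinations(label, lambda cp: [cp] + table[cp][1])
    zone = pv_all | {label}       # Step 5: {ZV}
    reserved = cv_all - zone      # Step 6: CVall, called {RV} in section 3.4
    return {"idl": label, "zone": zone, "reserved": reserved}


def activate(package, label):
    """Section 3.4.1, Steps 3-4: move a reserved label into {ZV}."""
    if label not in package["reserved"]:
        return False
    package["reserved"].remove(label)
    package["zone"].add(label)
    return True


def deactivate(package, label):
    """Section 3.4.2, Steps 3-4: move an active label back into {RV}."""
    if label not in package["zone"]:
        return False
    package["zone"].remove(label)
    package["reserved"].add(label)
    return True


example7 = register((0x806F, 0x60F3, 0x96C6, 0x5718), [JA, KO])
```

With these inputs the sketch yields the same sets as Example 7 in
Section 4: one Zone Variant (the IDL itself) and three reserved labels.
Keeping both sets inside one package object mirrors the atomicity
principle: activation and deactivation only move labels between the two
sets and never add or remove variants.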

3.6. Managing Changes to the Language Variant Tables

Language Variant Tables are subject to changes over time, and these
changes may or may not be backward compatible. It is possible that
updated Language Variant Tables may produce a different set of Preferred
Variants and Reserved Variants.

In order to preserve the atomicity of the IDL Package, when the Language
Variant Table is changed, IDL Packages created using the previous
version of the Language Variant Table must not be updated or affected.

4. Examples of Guideline Use in Zones

To provide a meaningful example, some Language Variant Tables must be
defined. Assume, then, for the purpose of giving examples, that the
following four Language Variant Tables are defined:

Note: these tables are not a representation of the actual tables, and
they do not contain sufficient entries to be used in any actual
implementation.

a) Language Variant Table for zh-cn and zh-sg

Reference 1 CP936 (commonly known as GBK)
Reference 2 zVariant, zTradVariant, zSimpVariant in Unihan.txt
Reference 3 List of Simplified character Table (Simplified column)
Reference 4 zSimpVariant in Unihan.txt
Reference 5 variant that exists in GB2312, common simplified hanzi

Version 1 20020701 # July 2002

56E2(1);56E2(5);5718(2)          # sphere, ball, circle; mass, lump
5718(1);56E2(4);56E2(2),56E3(2)  # sphere, ball, circle; mass, lump
60F3(1);60F3(5);                 # think, speculate, plan, consider
654E(1);6559(5);6559(2)          # teach
6559(1);6559(5);654E(2)          # teach, class
6DF8(1);6E05(5);6E05(2)          # clear
6E05(1);6E05(5);6DF8(2)          # clear, pure, clean; peaceful
771E(1);771F(5);771F(2)          # real, actual, true, genuine
771F(1);771F(5);771E(2)          # real, actual, true, genuine
8054(1);8054(3);806F(2)          # connect, join; associate, ally
806F(1);8054(3);8054(2),8068(2)  # connect, join; associate, ally
96C6(1);96C6(5);                 # assemble, collect together

b) Language Variant Table for zh-tw

Reference 1 CP950 (commonly known as BIG5)
Reference 2 zVariant, zTradVariant, zSimpVariant in Unihan.txt
Reference 3 List of Simplified Character Table (Traditional column)
Reference 4 zTradVariant in Unihan.txt

Version 1 20020701 # July 2002

5718(1);5718(4);56E2(2),56E3(2)  # sphere, ball, circle; mass, lump
60F3(1);60F3(1);                 # think, speculate, plan, consider
6559(1);6559(1);654E(2)          # teach, class
6E05(1);6E05(1);6DF8(2)          # clear, pure, clean; peaceful
771F(1);771F(1);771E(2)          # real, actual, true, genuine
806F(1);806F(3);8054(2),8068(2)  # connect, join; associate, ally
96C6(1);96C6(1);                 # assemble, collect together

c) Language Variant Table for ja

Reference 1 CP932 (commonly known as Shift-JIS)
Reference 2 zVariant in Unihan.txt
Reference 3 variant that exists in JIS X0208, commonly used Kanji

Version 1 20020701 # July 2002

5718(1);5718(3);56E3(2)          # sphere, ball, circle; mass, lump
60F3(1);60F3(3);                 # think, speculate, plan, consider
654E(1);6559(3);6559(2)          # teach
6559(1);6559(3);654E(2)          # teach, class
6DF8(1);6E05(3);6E05(2)          # clear
6E05(1);6E05(3);6DF8(2)          # clear, pure, clean; peaceful
771E(1);771E(1);771F(2)          # real, actual, true, genuine
771F(1);771F(1);771E(2)          # real, actual, true, genuine
806F(1);806F(1);8068(2)          # connect, join; associate, ally
96C6(1);96C6(3);                 # assemble, collect together

d) Language Variant Table for ko

Reference 1 CP949 (commonly known as EUC-KR)
Reference 2 zVariant and K-source in Unihan.txt

Version 1 20020701 # July 2002

5718(1);5718(1);56E3(2)          # sphere, ball, circle; mass, lump
60F3(1);60F3(1);                 # think, speculate, plan, consider
654E(1);654E(1);6559(2)          # teach
6DF8(1);6DF8(1);6E05(2)          # clear
771E(1);771E(1);771F(2)          # real, actual, true, genuine
806F(1);806F(1);8068(2)          # connect, join; associate, ally
96C6(1);96C6(1);                 # assemble, collect together

Example 1: IDL = (U+6E05 U+771F U+6559) *qing2 zhen1 jiao4*
           {L} = {zh-cn, zh-sg, zh-tw}

NP(IN)       = (U+6E05 U+771F U+6559)
PV(IN,zh-cn) = (U+6E05 U+771F U+6559)
PV(IN,zh-sg) = (U+6E05 U+771F U+6559)
PV(IN,zh-tw) = (U+6E05 U+771F U+6559)
{ZV}         = {(U+6E05 U+771F U+6559)}
CVall        = {(U+6E05 U+771E U+6559),
                (U+6E05 U+771E U+654E),
                (U+6E05 U+771F U+654E),
                (U+6DF8 U+771E U+6559),
                (U+6DF8 U+771E U+654E),
                (U+6DF8 U+771F U+6559),
                (U+6DF8 U+771F U+654E)}

Example 2: IDL = (U+6E05 U+771F U+6559) *qing2 zhen1 jiao4*
           {L} = {ja}

NP(IN)       = (U+6E05 U+771F U+6559)
PV(IN,ja)    = (U+6E05 U+771F U+6559)
{ZV}         = {(U+6E05 U+771F U+6559)}

CVall        = {(U+6E05 U+771E U+6559),
                (U+6E05 U+771E U+654E),
                (U+6E05 U+771F U+654E),
                (U+6DF8 U+771E U+6559),
                (U+6DF8 U+771E U+654E),
                (U+6DF8 U+771F U+6559),
                (U+6DF8 U+771F U+654E)}

Example 3: IDL = (U+6E05 U+771F U+6559) *qing2 zhen1 jiao4*
           {L} = {zh-cn, zh-sg, zh-tw, ja, ko}

NP(IN)       = (U+6E05 U+771F U+6559) *qing2 zhen1 jiao4*
Invalid registration because U+6E05 is invalid in L = ko

Example 4: IDL = (U+806F U+60F3 U+96C6 U+5718)
           *lian2 xiang3 ji2 tuan2*
           {L} = {zh-cn, zh-sg, zh-tw}

NP(IN)       = (U+806F U+60F3 U+96C6 U+5718)
PV(IN,zh-cn) = (U+8054 U+60F3 U+96C6 U+56E2)
PV(IN,zh-sg) = (U+8054 U+60F3 U+96C6 U+56E2)
PV(IN,zh-tw) = (U+806F U+60F3 U+96C6 U+5718)
{ZV}         = {(U+8054 U+60F3 U+96C6 U+56E2),
                (U+806F U+60F3 U+96C6 U+5718)}
CVall        = {(U+8054 U+60F3 U+96C6 U+56E3),
                (U+8054 U+60F3 U+96C6 U+5718),
                (U+806F U+60F3 U+96C6 U+56E2),
                (U+806F U+60F3 U+96C6 U+56E3),
                (U+8068 U+60F3 U+96C6 U+56E2),
                (U+8068 U+60F3 U+96C6 U+56E3),
                (U+8068 U+60F3 U+96C6 U+5718)}

Example 5: IDL = (U+8054 U+60F3 U+96C6 U+56E2)
           *lian2 xiang3 ji2 tuan2*
           {L} = {zh-cn, zh-sg}

NP(IN)       = (U+8054 U+60F3 U+96C6 U+56E2)
PV(IN,zh-cn) = (U+8054 U+60F3 U+96C6 U+56E2)
PV(IN,zh-sg) = (U+8054 U+60F3 U+96C6 U+56E2)
{ZV}         = {(U+8054 U+60F3 U+96C6 U+56E2)}
CVall        = {(U+8054 U+60F3 U+96C6 U+56E3),
                (U+8054 U+60F3 U+96C6 U+5718),
                (U+806F U+60F3 U+96C6 U+56E2),
                (U+806F U+60F3 U+96C6 U+56E3),
                (U+806F U+60F3 U+96C6 U+5718),
                (U+8068 U+60F3 U+96C6 U+56E2),
                (U+8068 U+60F3 U+96C6 U+56E3),
                (U+8068 U+60F3 U+96C6 U+5718)}

Example 6: IDL = (U+8054 U+60F3 U+96C6 U+56E2)
           *lian2 xiang3 ji2 tuan2*
           {L} = {zh-cn, zh-sg, zh-tw}

NP(IN)       = (U+8054 U+60F3 U+96C6 U+56E2)
Invalid registration because U+8054 is invalid in L = zh-tw

Example 7: IDL = (U+806F U+60F3 U+96C6 U+5718)
           *lian2 xiang3 ji2 tuan2*
           {L} = {ja,ko}

NP(IN)       = (U+806F U+60F3 U+96C6 U+5718)
PV(IN,ja)    = (U+806F U+60F3 U+96C6 U+5718)
PV(IN,ko)    = (U+806F U+60F3 U+96C6 U+5718)
{ZV}         = {(U+806F U+60F3 U+96C6 U+5718)}
CVall        = {(U+806F U+60F3 U+96C6 U+56E3),
                (U+8068 U+60F3 U+96C6 U+5718),
                (U+8068 U+60F3 U+96C6 U+56E3)}

5. Syntax Description for the Language Variant Table

The formal syntax for the Language Variant Table is as follows, using
the IETF "ABNF" metalanguage [ABNF]. Some comments on this syntax appear
immediately after it.

5.1. ABNF Syntax

LanguageVariantTable = 1*ReferenceLine VersionLine 1*EntryLine
ReferenceLine = "Reference" SP RefNo SP RefDescription [ Comment ] CRLF
RefNo = 1*DIGIT
RefDescription = *[VCHAR]
VersionLine = "Version" SP VersionNo SP VersionDate [ Comment ] CRLF
VersionNo = 1*DIGIT
VersionDate = YYYYMMDD
EntryLine = ( VariantEntry / Comment ) CRLF

VariantEntry = ValidCodePoint ";"
               PreferredVariant ";" CharacterVariant [ Comment ]
ValidCodePoint = CodePoint
RefList = RefNo 0*( "," RefNo )
PreferredVariant = CodePointSet 0*( "," CodePointSet )
CharacterVariant = CodePointSet 0*( "," CodePointSet )
CodePointSet = CodePoint 0*( SP CodePoint )
CodePoint = 4*8HEXDIG [ "(" RefList ")" ]
Comment = "#" *VCHAR

YYYYMMDD is an integer, in alphabetic form, representing a date, where
YYYY is the 4-digit year, MM is the 2-digit month, and DD is the 2-digit
day.

5.2. Comments and Explanation of Syntax

Any lines starting with, or portions of lines after, the hash symbol
("#") are treated as comments. Comments have no significance in the
processing of the tables; nor are there any syntax requirements between
the hash symbol and the end of the line. Blank lines in the tables are
ignored completely.

Every language should have its own Language Variant Table provided by a
relevant group, organization, or other body. That table will normally be
based on some established standard or standards. The group that defines
a Language Variant Table should document references to the appropriate
standards at the beginning of the table, tagged with the word
"Reference" followed by an integer (the reference number) followed by
the description of the reference.
For example:

Reference 1 CP936 (commonly known as GBK)
Reference 2 zVariant, zTradVariant, zSimpVariant in Unihan.txt
Reference 3 List of Simplified Character Table (Simplified column)
Reference 4 zSimpVariant in Unihan.txt
Reference 5 Variant that exists in GB2312, common simplified Hanzi

Each Language Variant Table must have a version number and its release
date. This is tagged with the word "Version" followed by an integer then
followed by the date in the format YYYYMMDD, where YYYY is the 4-digit
year, MM is the 2-digit month, and DD is the 2-digit day of the
publication date of the table.

Version 1 20020701 # July 2002 Version 1

The table has three columns, separated by semicolons: "Valid Code
Point"; "Preferred Variant(s)"; and "Character Variant(s)".

The "Valid Code Point" column lists the subset of Unicode characters
that are valid to be registered.

There can be more than one Preferred Variant; hence there could be
multiple entries in the "Preferred Variant(s)" column. If the "Preferred
Variant(s)" column is empty, then there is no corresponding Preferred
Variant; in other words, the Preferred Variant is null. Unless local
policy dictates otherwise, the procedures above will result in only
those labels that reflect the valid code point being activated
(registered) into the zone file.

The "Character Variant(s)" column contains all Character Variants of the
Code Point. Since the Code Point is always a variant of itself, to avoid
redundancy, the Code Point is assumed to be part of the "Character
Variant(s)" and need not be repeated in the "Character Variant(s)"
column.

If a variant in the "Preferred Variant(s)" or the "Character
Variant(s)" column is composed of a sequence of Code Points, then the
Code Points in the sequence are listed separated by spaces.
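
The row layout described in this section can be illustrated with a
small, non-normative Python sketch. It assumes that columns are split on
semicolons, variants on commas, and the code points of a sequence on
spaces, and that code point numbers are written in hexadecimal as in the
sample tables of Section 4. The function names parse_code_point,
parse_column, and parse_row are invented for this example, and reference
numbers are parsed but discarded for variants to keep the result small.

```python
import re

# Commas inside a parenthesized reference list must not split variants.
_VARIANT_SEPARATOR = re.compile(r",(?![^()]*\))")


def parse_code_point(token):
    """Split a token such as '56E2(1,5)' into (0x56E2, [1, 5])."""
    number, _, refs = token.partition("(")
    ref_list = [int(r) for r in refs.rstrip(")").split(",")] if refs else []
    return int(number, 16), ref_list


def parse_column(text):
    """Return a list of variants, each a tuple of code points."""
    text = text.strip()
    if not text:
        return []  # an empty column means there is no variant
    variants = []
    for variant in _VARIANT_SEPARATOR.split(text):
        variants.append(
            tuple(parse_code_point(tok)[0] for tok in variant.split())
        )
    return variants


def parse_row(line):
    """Parse one entry line; return None for blank or comment-only lines."""
    body = line.split("#", 1)[0].strip()  # drop any trailing comment
    if not body:
        return None
    valid, preferred, character = body.split(";")
    code_point, _refs = parse_code_point(valid.strip())
    return code_point, parse_column(preferred), parse_column(character)


row = parse_row("5718(1);56E2(4);56E2(2),56E3(2)  # sphere, ball, circle")
```

Here row holds the Valid Code Point U+5718, one Preferred Variant
(U+56E2), and two Character Variants (U+56E2 and U+56E3). A row with a
number of columns other than three raises an error, which is one simple
way for a table loader to reject malformed input.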

If there are multiple variants in the "Preferred Variant(s)" or the
"Character Variant(s)" column, then each variant is separated by a
comma.

Any Code Point listed in the "Preferred Variant(s)" column must be
allowed by the rules for the relevant language to be registered.
However, this is not a requirement for the entries in the "Character
Variant(s)" column; it is possible that some of those entries may not be
allowed to be registered.

Every Code Point in the table should have a corresponding reference
number (associated with the references) specified to justify the entry.
The reference number is placed in parentheses after the Code Point. If
there is more than one reference, then the numbers are placed within a
single set of parentheses and separated by commas.

6. Security Considerations

As discussed in the Introduction, substantially unrestricted use of
international (non-ASCII) characters in domain name labels may cause
user confusion and invite various types of attacks. In particular, in
the case of CJK languages, an attacker has an opportunity to divert or
confuse users as a result of different characters (or, more
specifically, assigned code points) with identical or similar semantics.
These Guidelines provide a partial remedy for those risks by supplying a
framework for prohibiting inappropriate characters from being registered
at all and for permitting "variant" characters to be grouped together
and reserved, so that they can only be registered in the DNS by the same
owner. However, the system they suggest is no better or worse than the
per-zone and per-language tables whose format and use this document
specifies. Specific tables, and any additional local processing, will
reflect per-zone decisions about the balance between risk and
flexibility of registrations.
And, of course, errors in the construction of those tables may
significantly reduce the quality of protection provided.

7. Index to Terminology

As a convenience to the reader, this section lists all of the special
terminology used in this document, with a pointer to the section in
which it is defined.

   Activated Label                  2.1.17
   Activation                       2.1.4
   Active Label                     2.1.17
   Character Variant                2.1.14
   Character Variant Label          2.1.16
   CJK Characters                   2.1.9
   Code point                       2.1.7
   Code Point Variant               2.1.14
   FQDN                             2.1.3
   Hostname                         2.1.1
   IDL                              2.1.2
   IDL Package                      2.1.18
   IDN                              2.1.1
   Internationalized Domain Label   2.1.2
   ISO/IEC 10646                    2.1.6
   Label String                     2.1.10
   Language name codes              2.1.5
   Language Variant Table           2.1.11
   LDH Subset                       2.1.1
   Preferred Code Point             2.1.13
   Preferred Variant                2.1.13
   Preferred Variant Label          2.1.15
   Registration                     2.1.4
   Reserved                         2.1.18
   RFC3066                          2.1.5
   Table                            2.1.11
   UCS                              2.1.6
   Unicode Character                2.1.7
   Unicode String                   2.1.8
   Valid Code Point                 2.1.12
   Variant Table                    2.1.11
   Zone Variant                     2.1.17

8. Acknowledgments

The authors gratefully acknowledge the contributions of:

- V. CHEN, N. HSU, H. HOTTA, S. TASHIRO, Y. YONEYA, and other Joint
  Engineering Team members at the JET meeting in Bangkok, Thailand.

- Yves Arrouye, an observer at the JET meeting in Bangkok, for his
  contribution on the IDL Package.

- Those who commented on, and made suggestions about, earlier versions,
  including Harald ALVESTRAND, Erin CHEN, Patrik FALTSTROM, Paul
  HOFFMAN, Soobok LEE, LEE Xiaodong, MAO Wei, Erik NORDMARK, and L.M.
  TSENG.

9. Authors' Addresses

James SENG
180 Lompang Road
#22-07 Singapore 670180
Phone: +65 9638-7085
E-mail: jseng@pobox.org.sg

Kazunori KONISHI
JPNIC
Kokusai-Kougyou-Kanda Bldg 6F
2-3-4 Uchi-Kanda, Chiyoda-ku
Tokyo 101-0047
Japan
Phone: +81 49-278-7313
E-mail: konishi@jp.apan.net

Kenny HUANG
TWNIC
3F, 16, Kang Hwa Street, Taipei
Taiwan
Phone: +886-2-2658-6510
E-mail: huangk@alum.sinica.edu

QIAN Hualin
CNNIC
No.6 Branch-box of No.349 Mailbox, Beijing 100080
People's Republic of China
E-mail: Hlqian@cnnic.net.cn

KO YangWoo
PeaceNet
Yangchun P.O. Box 81 Seoul 158-600
Korea
E-mail: yw@mrko.pe.kr

John C KLENSIN
1770 Massachusetts Avenue, No. 322
Cambridge, MA 02140
U.S.A.
E-mail: Klensin+ietf@jck.com

Wendy RICKARD
The Rickard Group
16 Seminary Ave
Hopewell, NJ 08525
USA
E-mail: rickard@rickardgroup.com

10. Normative References

[ABNF]       Crocker, D. and P. Overell, eds., Augmented BNF for Syntax
             Specifications: ABNF, RFC 2234, November 1997.

[STD13]      Mockapetris, P., "Domain names--concepts and facilities"
             (RFC 1034) and "Domain names--implementation and
             specification" (RFC 1035), STD 13, November 1987.

[RFC3066]    Alvestrand, H., Tags for the Identification of Languages,
             RFC 3066, January 2001.

[IDNA]       Faltstrom, P., Hoffman, P., and A.M. Costello,
             Internationalizing Domain Names in Applications (IDNA),
             RFC 3490, March 2003.

[PUNYCODE]   Costello, A.M., Punycode: A Bootstring encoding of Unicode
             for Internationalized Domain Names in Applications (IDNA),
             RFC 3492, March 2003.

[STRINGPREP] Hoffman, P. and M. Blanchet, Preparation of
             Internationalized Strings ("stringprep"), RFC 3454,
             December 2002.

[NAMEPREP]   Hoffman, P. and M.
             Blanchet, Nameprep: A Stringprep Profile for
             Internationalized Domain Names, RFC 3491, March 2003.

[IS10646]    A product of ISO/IEC JTC1/SC2/WG2, Work Item JTC1.02.18
             (ISO/IEC 10646). It is a multipart standard: Part 1,
             published as ISO/IEC 10646-1:2000(E), covers the
             Architecture and Basic Multilingual Plane, and Part 2,
             published as ISO/IEC 10646-2:2001(E), covers the
             supplementary (additional) planes.

[UNIHAN]     Unicode Han Database, Unicode Consortium,
             ftp://ftp.unicode.org/Public/UNIDATA/Unihan.txt.

[UNICODE]    The Unicode Consortium, "The Unicode Standard--Version
             3.0," ISBN 0-201-61633-5. Unicode Standard Annex #28
             (http://www.unicode.org/unicode/reports/tr28/) defines
             Version 3.2 of the Unicode Standard, which is definitive
             for IDNA and this document.

[ISO7098]    ISO 7098:1991, Information and documentation--Romanization
             of Chinese, ISO/TC46/SC2.

11. Nonnormative References

[IDN-WG]     IETF Internationalized Domain Names Working Group, now
             concluded, idn@ops.ietf.org, James Seng, Marc Blanchet,
             co-chairs, http://www.i-d-n.net/.

[IESG-IDN]   Internet Engineering Steering Group, IETF, "IESG Statement
             on IDN", 11 February 2003,
             http://www.ietf.org/IESG/STATEMENTS/IDNstatement.txt.

[ISO639]     "ISO 639:1988 (E/F)--Code for the representation of names
             of languages"--International Organization for
             Standardization, 1st edition, 1988-04-01.