INTERNET DRAFT                                      Editors: James SENG
draft-jseng-idn-admin-03.txt                 John C KLENSIN, Wendy RICKARD
16 June 2003                                          Authors: K. KONISHI
Expires December 2003                            K. HUANG, H. QIAN, Y. KO

Internationalized Domain Names Registration and Administration
Guideline for Chinese, Japanese, and Korean

Status of This Memo

This document is an Internet Draft and is in full conformance with all provisions of Section 10 of RFC2026 except that the right to produce derivative works is not granted.

Internet Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet Drafts.

Internet Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or rendered obsolete by other documents at any time. It is inappropriate to use Internet Drafts as reference material or to cite them other than as "works in progress."

The list of current Internet Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

Abstract

Achieving internationalized access to domain names raises many complex issues. These are associated not only with basic protocol design--such as how names are represented on the network, compared, and converted to appropriate forms--but also with issues and options for deployment, transition, registration, and administration.

The IETF Internationalized Domain Name (IDN) Working Group focused its efforts on the development of a standards-track specification for access to domain names in a range of scripts that is broader in scope than the original ASCII. During its efforts, it became clear that the existence of characters with similar appearances and/or interpretations created potential for confusion, as well as difficulties in deployment and transition, and that those issues could best be addressed administratively rather than through restrictions embedded in the protocols.
This document is an effort of the Joint Engineering Team (JET), a group composed of members of CNNIC, TWNIC, KRNIC, and JPNIC as well as other individual experts. It offers guidelines for zone administrators -- including but not limited to registry operators and registrars -- and information for all domain name holders on the administration of domain names that contain characters drawn from Chinese, Japanese, and Korean scripts. Other language groups are encouraged to develop their own guidelines as needed, basing them on these guidelines if that is helpful.

Table of Contents

1. Introduction

2. Definitions, Context, and Notation
   2.1. Definitions and Context
   2.2. Notation for Ideographs and Other Non-ASCII CJK Characters

3. Scope of the Administrative Guidelines
   3.1. Principles Underlying These Guidelines
   3.2. Registration of IDL
      3.2.1. Using the Language Variant Table
      3.2.2. IDL Package
      3.2.3. Procedure for Registering IDLs
   3.3. Deletion and Transfer of IDL and IDL Package
   3.4. Activation and Deactivation of IDL Variants
      3.4.1. Activation Algorithm
      3.4.2. Deactivation Algorithm
   3.5. Managing Changes in Language Associations
   3.6. Managing Changes to Language Variant Tables

4. Examples of Guideline Use in Zones

5. Syntax Description for the Language Variant Table
   5.1. ABNF Syntax
   5.2. Comments and Explanation of Syntax

6. Security Considerations

7. Index to Terminology

8. Acknowledgments

9. Authors' Addresses

10. Normative References

11. Nonnormative References

1. Introduction

Domain names form the fundamental naming architecture of the Internet. Countless Internet protocols and applications rely on them, not just for stability and continuity, but also to avoid ambiguity. They were designed to be identifiers without any language context.
However, as domain names have become visible to end users through Web URLs and e-mail addresses, the strings in domain-name labels are increasingly being interpreted as names, words, or phrases. Users are likely to do the same with labels in languages that use other character sets--such as Chinese, Japanese, and Korean (CJK)--in which many words or concepts are represented using short sequences of characters.

The introduction of what are called Internationalized Domain Names (IDN) amplifies both the difficulty of putting names into identifiers and the confusion that exists between scripts and languages. It also affects a number of Internet protocols and applications and creates additional layers of complexity in terms of technical administration and services. Given the added complications of using a much broader range of characters than the original small ASCII subset, precautions are necessary in the deployment of IDNs in order to minimize confusion and fraud.

The IETF IDN Working Group [IDN-WG] addressed the problem of handling the encoding and decoding of Unicode strings into and out of Domain Name System (DNS) labels with the goal that its solution would not put the operational DNS at any risk. Its work resulted in one primary protocol and three supporting ones:

   1. Internationalizing Domain Names in Applications [IDNA]
   2. Preparation of Internationalized Strings [STRINGPREP]
   3. A Stringprep Profile for Internationalized Domain Names [NAMEPREP]
   4. Punycode [PUNYCODE]

IDNA--which calls on the others--normalizes and transforms strings that are intended to be used as IDNs. In combination, the four provide the minimum functions required for internationalization, such as performing case mappings, eliminating character differences that would cause severe problems, and specifying matching (equality).
They also convert between the resulting Unicode code points and an ASCII-based form that is more suitable for storing in actual DNS labels. In this way, the IDNA transformations improve a user's chances of getting to the correct IDN.

In addressing the issues raised by differing character sets, a primary consideration and administrative challenge involves region-specific definitions, interpretations, and the semantics of strings to be used in IDNs. A Unicode string may have a specific meaning as a name, word, or phrase in a particular language, but that meaning could vary depending on the country, region, culture, or other context in which the string is used. It might also have different interpretations in different languages that share some or all of the same characters. Therefore, individual zones and zone administrators may find it necessary to impose restrictions and procedures to reduce the likelihood of confusion--and instabilities of reference--within their own environments.

Over the centuries, the evolution of CJK characters--and the differences in their use in different languages and even in different regions where the same language is spoken--has given rise to the idea of "variants", wherein one conceptual character can be identified with several different Code Points in character sets for computer use. This document provides a framework for handling such variants while minimizing the possibility of serious user confusion in the obtaining or use of domain names. However, the concept of variants is complex and may require many different layers of solution; this guideline offers only one of the solution components. It is not sufficient by itself to solve the whole problem, even with zone-specific tables as described below.
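To see why variants need administrative handling, consider the simplified and traditional spellings of the same word. They differ in one Code Point, so IDNA turns them into two unrelated ASCII labels, and the DNS has no way of knowing they are "the same". The minimal sketch below assumes Python's standard-library `encodings.idna` module, which implements the IDNA, Nameprep, and Punycode specifications listed above through its `ToASCII` and `ToUnicode` functions. The choice of example word is illustrative only.

```python
from encodings.idna import ToASCII, ToUnicode

# Simplified and traditional spellings of the same word ("China"),
# differing only in their second code point (U+56FD vs. U+570B).
simplified = "\u4e2d\u56fd"
traditional = "\u4e2d\u570b"

for label in (simplified, traditional):
    ace = ToASCII(label)             # Nameprep, then Punycode, plus "xn--"
    assert ToUnicode(ace) == label   # ToUnicode reverses the conversion
    code_points = " ".join(f"U+{ord(c):04X}" for c in label)
    print(code_points, ace)

# Prints:
# U+4E2D U+56FD b'xn--fiqs8s'
# U+4E2D U+570B b'xn--fiqz9s'
```

Because the two ASCII forms share nothing, any equivalence between them has to come from zone policy, which is what the Language Variant Tables described below provide.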
Additionally, because of local language or writing-system differences, it is impossible to create universally accepted definitions of which potential variants are the same and which are not. It is even more difficult to define a technical algorithm to generate variants that are linguistically accurate--that is, variant forms that make as much sense in the language as the originally specified forms. It is also possible that the generated variants will have no meaning in the associated language or languages. The intention is not to generate meaningful "words" but to generate similar variants to be reserved. So even though the method described in this document may not always be linguistically accurate--or need to be--it increases the chances of getting the right variants while accepting the inherent limitations of the DNS and the complexities of human language.

This document outlines a model for such conventions for zones in which labels that contain CJK characters are to be registered, and a system for implementing that model. It provides a mechanism that allows each zone to define its own local rules for permitted characters and sequences and for the handling of IDNs and their variants.

2. Definitions, Context, and Notation

2.1. Definitions and Context

This document uses a number of special terms. In this section, definitions and explanations are grouped topically. Some readers may prefer to skip over this material, returning to it when needed, perhaps via the index to terminology in Section 7.

2.1.1. IDN: The term "IDN" has a number of different uses: (a) as an abbreviation for "Internationalized Domain Name"; (b) as a fully qualified domain name that contains at least one label that contains characters not appearing in ASCII, specifically not in the subset of ASCII recommended for domain names (the so-called "hostname" or "LDH" subset, see RFC1035 [STD13]); (c) as a label of a domain name that contains at least one character beyond ASCII; (d) as a Unicode string to be processed by Nameprep; (e) as a string that is an output from Nameprep; (f) as a string that is the result of processing through both Nameprep and conversion into Punycode; (g) as the abbreviation of an IDN (more properly, IDL) Package, in the terminology of this document; (h) as the abbreviation of the IETF IDN Working Group; (i) as the abbreviation of the ICANN IDN Committee; and (j) as standing for other IDN activities in other companies/organizations.

Because of the potential confusion, this document uses the term "IDN" as an abbreviation for Internationalized Domain Name and, specifically, in the second sense described in (b) above. It uses "IDL," defined immediately below, to refer to Internationalized Domain Labels.

2.1.2. IDL: This document provides a guideline to be applied on a per-zone basis, one label at a time. Therefore, the term "Internationalized Domain Label" or "IDL" will be used instead of the more general term "IDN" or its equivalents. The processing specifications of this document may be applied, in some zones, to ASCII characters also, if those characters are specified as valid in a Language Variant Table (see below). Hence, in some zones, an IDL may contain or consist entirely of "LDH" characters.

2.1.3. FQDN: A fully qualified domain name, one that explicitly contains all labels, including a Top-Level Domain (TLD) name.
In this context, a TLD name is one whose label appears in a nameserver record in the root zone. The term "Domain Name Label" refers to any label of an FQDN.

2.1.4. Registration: In this document, the term "registration" refers to the process by which a potential domain name holder requests that a label be placed in the DNS either as an individual name within a domain or as a subdomain delegation from another domain name holder. In the case of a successful registration, the label or delegation records are placed in the relevant zone file--or, more specifically, they are "activated" or made "active"--and additional IDLs may be reserved as part of an "IDL Package" (see below). The guidelines presented here are recommended for all zones--at any hierarchy level--in which CJK characters are to appear, not just domains at the first or second level.

2.1.5. RFC3066: A system [RFC3066], widely used in the Internet, for coding and representing names of languages. It is based on an International Organization for Standardization (ISO) standard for coding language names [ISO639], but expands it to provide additional precision.

2.1.6. ISO/IEC 10646: The international standard universal multiple-octet coded character set ("UCS") [IS10646]. The Code Point definitions of this standard are identical to those of corresponding versions of the Unicode standard (see below). Consequently, the characters and their coding are often referred to as "Unicode characters."

2.1.7. Unicode Character: The term "Unicode character" is used here in reference to characters chosen from the Unicode Standard Version 3.2 [UNICODE] (and hence from ISO/IEC 10646). In this document, the characters are identified by their positions, or "Code Points." The notation U+12AB, for example, indicates the character at the position 12AB (hexadecimal) in the Unicode 3.2 table.
For characters in positions above FFFF--i.e., requiring more than sixteen bits to represent--a five- to eight-character string is used, such as U+112AB for the character in position 12AB of plane 1.

2.1.8. Unicode String: "Unicode string" refers to a string of Unicode characters. The Unicode string is identified by the sequence of the Unicode characters regardless of the encoding scheme.

2.1.9. CJK Characters: CJK characters are characters commonly used in the Chinese, Japanese, or Korean languages, including but not limited to those defined in the Unicode Standard as ASCII (U+0020 to U+007F), Han ideographs (U+3400 to U+9FAF and U+20000 to U+2A6DF), Bopomofo (U+3100 to U+312F and U+31A0 to U+31BF), Kana (U+3040 to U+30FF), Jamo (U+1100 to U+11FF and U+3130 to U+318F), Hangul (U+AC00 to U+D7AF and U+3130 to U+318F), and the respective compatibility forms. The particular characters that are permitted in a given zone are specified in the Language Variant Table(s) for that zone.

2.1.10. Label String: A generic term referring to a string of characters that is a candidate for registration in the DNS or such a string, once registered. A label string may or may not be valid according to the rules of this specification and may even be invalid for IDNA use. The term "label", by itself, refers to a string that has been validated and may be formatted to appear in a DNS zone file.

2.1.11. Language Variant Table: The key mechanisms of this specification utilize a three-column table, called a Language Variant Table, for each language permitted to be registered in the zone. Those columns are known, respectively, as "Valid Code Point", "Preferred Variant", and "Character Variant", which are defined separately below. The Language Variant Tables are critical to the success of the guideline described in this document.
However, the principles to be used to generate the tables are not within the scope of this document and should be worked out by each registry separately (perhaps by adopting or adapting the work of some other registry). In this document, "Table" and "Variant Table" are used as short forms for Language Variant Table.

2.1.12. Valid Code Point: In a Language Variant Table, the list of Code Points that is permitted for that language. Any other Code Points, or any string containing them, will be rejected by this specification. The Valid Code Point list appears as the first column of the Language Variant Table.

2.1.13. Preferred Variant: In a Language Variant Table, a list of Code Points corresponding to each Valid Code Point and providing possible substitutions for it. These substitutions are "preferred" in the sense that the variant labels generated using them are normally registered in the zone file, or "activated." The Preferred Code Points appear in column 2 of the Language Variant Table. "Preferred Code Point" is used interchangeably with this term.

2.1.14. Character Variant: In a Language Variant Table, a second list of Code Points corresponding to each Valid Code Point and providing possible substitutions for it. Unlike the Preferred Variants, substitutions based on Character Variants are normally reserved but not actually registered (or "activated"). Character Variants appear in column 3 of the Language Variant Table. The term "Code Point Variants" is used interchangeably with this term.

2.1.15. Preferred Variant Label: A label generated by use of Preferred Variants (or Preferred Code Points).

2.1.16. Character Variant Label: A label generated by use of Character Variants.

2.1.17. Zone Variant: A Preferred or Character Variant Label that is actually to be entered (registered) into the DNS--that is, into the zone file for the relevant zone.
Zone Variants are also referred to as Zone Variant Labels or Active (or Activated) Labels.

2.1.18. IDL Package: A collection of IDLs as determined by these Guidelines. All labels in the package are "reserved", meaning they cannot be registered by anyone other than the holder of the Package. These reserved IDLs may be "activated", meaning they are actually entered into a zone file as a "Zone Variant". The IDL Package also contains identification of the language(s) associated with the registration process. The IDL and its variant labels form a single, atomic unit.

2.2. Notation for Ideographs and Other Non-ASCII CJK Characters

For purposes of clarity, particularly in regard to examples, Han ideographs appear in several places in this document. However, they do not appear in the ASCII version of this document. For the convenience of readers of the ASCII version--and some readers not familiar with recognizing and distinguishing Chinese characters--most uses of these characters will be associated with both their Unicode Code Points and an "asterisk tag" with its corresponding Chinese Romanization [ISO7098], with the tone mark represented by a number from 1 to 4. Those tags have no meaning outside this document; they are a quick visual and reading reference to help facilitate the combinations and transformations of characters in the guideline and table excerpts.

3. Scope of the Administrative Guidelines

Zone administrators are responsible for the administration of the domain name labels under their control. A zone administrator might be responsible for a large zone, such as a top-level domain (TLD)--whether generic or country code--or a smaller one, such as a typical second- or third-level domain. A large zone is often more complex than its smaller counterpart.
However, actual technical administrative tasks--such as addition, deletion, delegation, and transfer of zones between domain name holders--are similar for all zones.

This document provides guidelines for the ways CJK characters should be handled within a zone, for how language issues should be considered and incorporated, and for how Domain Name Labels containing CJK characters should be administered (including registration, deletion, and transfer of labels). It does not provide any guidance for the handling of non-CJK characters or languages in zones.

Other IDN policies--such as the creation of new top-level domains (TLDs), the cost structure for registrations, and how the processes described here get allocated between registrar and registry if the zone makes that distinction--also are outside the scope of this document.

Technical implementation issues are not discussed here either. For example, deciding which guidelines should be implemented as registry actions and which should be registrar actions is left to zone administrators, with the possibility that it will differ from zone to zone.

3.1. Principles Underlying These Guidelines

In many places, in the event of a dispute over rights to a name (or, more accurately, DNS label string), this document assumes "first-come, first-served" (FCFS) as a resolution policy even though FCFS is not listed below as one of the principles for this document. If policies are already in place governing priorities and "rights", one can use the guidelines here by replacing uses of FCFS in this document with policies specific to the zone. Some of the guidelines here may not be applicable to other policies for determining rights to labels. Still other alternatives--such as use of UDRP [WIPO-UDRP] or mutual exclusion--might have little impact on other aspects of these guidelines.
(a) Although some Unicode strings may be pure identifiers made up of an assortment of characters from many languages and scripts, IDLs are likely to be "words" or "names" or "phrases" that have specific meaning in a language. While a zone administration might or might not require "meaning" as a registration criterion, meaning could prove to be a useful tool for avoiding user confusion.

    Each IDL to be registered should be associated administratively
    with one or more languages.

Language associations should either be predetermined by the zone administrator and applied to the entire zone or be chosen by the registrants on a per-IDL basis. The latter may be necessary for some zones, but it will make administration more difficult and will increase the likelihood of conflicts in variant forms.

    A given zone might have multiple languages associated with it, or
    it may have no language specified at all. Omitting specification
    of a language may provide additional opportunities for user
    confusion and is therefore NOT recommended.

(b) Each language uses only a subset of Unicode characters. Therefore, if an IDL is associated with a language, it is not permitted to contain any Unicode character that is not within the valid subset for that language.

    Each IDL to be registered must be verified against the valid subset
    of Unicode for the language(s) associated with the IDL. That subset
    is specified by the list of characters appearing in the first column
    of the language and zone-specific tables as described later in this
    document.

If the IDL fails this test for any of its associated languages, the IDL is not valid for registration.

Note that this verification is not necessarily linguistically accurate, because some languages have special rules.
For example, some languages impose restrictions on the order in which particular combinations of characters may appear. Characters that are valid for the language--and hence permitted by this specification--might still not form valid words or even strings in the language.

(c) When an IDL is associated with a language, it may have Character Variants, which depend on that language, in addition to any Preferred Variants. These variants are potential sources of confusion with the Code Points in the original label string. Consequently, the labels generated from them should be unavailable to registrants of other names, words, or phrases.

    During registration, all labels generated from the Character
    Variants for the associated language(s) of the IDL should be
    reserved.

IDL reservations of the type described here normally do not appear in the distributed DNS zone file. In other words, these reserved IDLs may not resolve. Domain name holders could request that these reserved IDLs be placed in the zone file and made active and resolvable.

Zones will need to establish local policies about how reserved IDLs are to be made active. Specifically, many zones, especially at the top level, have prohibited or restricted the use of "CNAME"s--DNS aliases--especially CNAMEs that point to nameserver delegation records (NS records). Likewise, long-term aliases for entire domain hierarchies, rather than for single names ("DNAME records"), are considered problematic because of the recursion they can introduce into DNS lookups.

(d) When an IDL is a "name", "word", or "phrase", it will have Character Variants depending on the associated language. Furthermore, one or more of those Character Variants will be used more often than others for linguistic, political, or other reasons.
These more commonly used variants are distinguished from ordinary Character Variants and are known as Preferred Variant(s) for the particular language.

    To increase the likelihood of correct and predictable resolution of
    the IDN by end users, all labels generated from the Preferred
    Variants for the associated language(s) should be resolvable.

In other words, the Preferred Variant Labels should appear in the distributed DNS zone file.

(e) IDLs associated with one or more languages may have a large number of Character Variant Labels or Preferred Variant Labels. Some of these labels may include combinations of characters that are meaningless or invalid linguistically. It may therefore be appropriate for a zone to adopt procedures that include only linguistically acceptable labels in the IDL Package.

    A zone administrator may impose additional rules and other
    processing activities to limit the number of Character Variant
    Labels or Preferred Variant Labels that are actually reserved or
    registered.

These additional rules and other processing activities are based on policies and/or procedures imposed on a per-zone basis and therefore are not within the scope of this document. Such policies or procedures might be used, for example, to restrict the number of Preferred Variant Labels actually reserved or to prevent certain words from being registered at all.

(f) There are some Character Variant Labels and Preferred Variant Labels that are associated with each IDL. These labels are considered "equivalent" to one another. To avoid confusion, they all should be assigned to a single domain name holder.

    The IDL and its variant labels should be grouped together into a
    single atomic unit, known in this document as an "IDL Package".
The IDL Package is created upon registration and is atomic: transfer and deletion of an IDL are performed on the IDL Package as a whole. That is, an IDL within the IDL Package may not be transferred or deleted individually; any re-registration, transfer, or other action that affects the IDL should also affect the other variants.

The name-conflict resolution policy associated with this zone could result in a conflict with the principle of IDL Package atomicity. In such a case, the policy must be defined to make the precedence clear.

3.2. Registration of IDL

To conform to the principles described in Section 3.1, this document introduces two concepts: the Language Variant Table and the IDL Package. These are described in the next two subsections, followed by a description of the algorithm that is used to interpret the table and generate variant labels.

3.2.1. Using the Language Variant Table

Each zone should have its own Language Variant Table for each language used in that zone. The table consists of a header section that identifies references and version information, followed by a section of three-column rows, one row for each Code Point that is valid for the language.

a) The first column contains the subset of Unicode characters that is valid to be registered ("Valid Code Point"). This is used to verify the IDL to be registered (see 3.1b). As in the registration procedure described later, this column is used as an index to examine characters that appear in a proposed IDL to be processed. The collection of Valid Code Points in the table for a particular language can be thought of as defining the script for that language, although the normal definition of a script would not include, for example, ASCII characters with CJK ones.

   b) The second column contains the Preferred Variant(s) of the
   corresponding Unicode character in column one ("Valid Code Point").
   These variant characters are used to generate the Preferred Variant
   Labels for the IDL. Those labels should be resolvable (see 3.1d).
   Under normal circumstances, all of those Preferred Variant Labels
   will be activated in the relevant zone file so that they will
   resolve when the DNS is queried for them.

   c) The third column contains the Character Variant(s) for the
   corresponding Valid Code Point. These are used to generate the
   Character Variant Labels of the IDL, which are then to be reserved
   (see 3.1c). Registration--or activation--of labels generated from
   Character Variants will normally be a registrant decision, subject
   to local policy.

   Each entry in a column consists of one or more Code Points, each
   expressed as a hexadecimal Unicode code point value (without the
   "U+" prefix) and optionally followed by a parenthetical reference.
   The first column (Valid Code Point) may contain only one Code Point
   in a given row. The other columns may contain more than one.

   Any row may be terminated with an optional comment, starting with
   "#".

   The formal syntax of the table and more precise definitions of some
   of its organization appear in Section 5.

   The Language Variant Table should be provided by a relevant group,
   organization, or body. However, the question of who is relevant or
   has the authority to create this table and the rules that define it
   is beyond the scope of this document.

3.2.2. IDL Package

   The IDL Package is created on successful registration and consists
   of:

   a) the IDL registered

   b) the language(s) associated with the IDL

   c) the reserved IDLs

   d) active IDLs--that is, "Zone Variant Labels" that are to appear in
   the DNS zone file

3.2.3. Procedure for Registering IDLs

   An explanation follows each step.

   Step 1. IN <= IDL to be registered and
           {L} <= Set of languages associated with IN

   Start the process with the label string (prospective IDL) to be
   registered and the associated language(s) as input.

   Step 2. Generate the Nameprep-processed version of IN, applying all
           mappings and canonicalization required by IDNA.

   The prospective IDL is processed by using Nameprep to apply the
   normalizations and exclusions globally required to use IDNA. If the
   Nameprep processing fails, then the IDL is invalid and the
   registration process must stop.

   Step 2.1. NP(IN) <= Nameprep processed IN
   Step 2.2. Check availability of NP(IN).
             If not available, route to conflict policy.

   The Nameprep-processed IDL is then checked against the contents of
   the zone file and previously created IDL Packages. If it is already
   registered or reserved, then a conflict exists that must be resolved
   by applying whatever policy is applicable for the zone. For example,
   if FCFS is used, the registration process terminates unless the
   conflict resolution policy provides another alternative.

   Step 3. Process each language.
           For each language AL in {L}

   Step 3 goes through all languages associated with the proposed IDL
   and checks each character (after Nameprep has been applied) for
   validity in each of them. It then applies the Preferred Variants
   (column 2 values) and the Character Variants (column 3 values) to
   generate candidate labels.

   Step 3.1. Check validity of NP(IN) in AL. If failed, stop processing.

   In Step 3.1, IDL validation is done by checking that every Code
   Point in the Nameprep-processed IDL is a Code Point allowed by the
   "Valid Code Point" column of the Language Variant Table for the
   language. This is then repeated for any other languages (and hence,
   Language Variant Tables) specified in the registration. If one or
   more Code Points are not valid, the registration process terminates.
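
   The checks in Steps 1 through 3.1 can be sketched in Python as
   follows. This is a non-normative illustration under stated
   assumptions: it relies on the standard library's
   encodings.idna.nameprep, which (per the Python documentation)
   assumes AllowUnassigned is true and is therefore more permissive
   than registration requires; the hypothetical VALID dictionary holds
   only the Valid Code Points of the sample tables in Section 4; the
   zone's existing registrations and reservations are modeled as a
   plain set; and a simple stop-on-conflict (FCFS-like) policy is
   assumed for Step 2.2. The names VALID, RegistrationError, and
   check_registrable are illustrative.

```python
from encodings.idna import nameprep

# Hypothetical: Valid Code Points (column 1) of the Section 4 sample
# tables, keyed by language tag. Not real registry data.
VALID = {
    "zh-cn": {0x56E2, 0x5718, 0x60F3, 0x654E, 0x6559, 0x6DF8,
              0x6E05, 0x771E, 0x771F, 0x8054, 0x806F, 0x96C6},
    "zh-tw": {0x5718, 0x60F3, 0x6559, 0x6E05, 0x771F, 0x806F, 0x96C6},
    "ja": {0x5718, 0x60F3, 0x654E, 0x6559, 0x6DF8, 0x6E05, 0x771E,
           0x771F, 0x806F, 0x96C6},
    "ko": {0x5718, 0x60F3, 0x654E, 0x6DF8, 0x771E, 0x806F, 0x96C6},
}
VALID["zh-sg"] = VALID["zh-cn"]  # zh-cn and zh-sg share one table


class RegistrationError(Exception):
    """Raised wherever the procedure says registration must stop."""


def check_registrable(label, languages, taken):
    """Steps 1-3.1: return NP(IN), or raise RegistrationError.

    label     -- IN, the prospective IDL (a Unicode string)
    languages -- {L}, the language tags requested with it
    taken     -- labels already registered or reserved in the zone
    """
    # Steps 2 and 2.1: Nameprep; failure means the IDL is invalid.
    try:
        np_in = nameprep(label)
    except UnicodeError as exc:
        raise RegistrationError(f"Nameprep failed: {exc}") from exc
    # Step 2.2: assumed FCFS-style policy, so any conflict just stops.
    if np_in in taken:
        raise RegistrationError("conflict: label already registered")
    # Steps 3 and 3.1: every code point must be valid in every language.
    for lang in languages:
        for ch in np_in:
            if ord(ch) not in VALID[lang]:
                raise RegistrationError(
                    f"U+{ord(ch):04X} is invalid in {lang}")
    return np_in
```

   With these sample tables, Example 3 of Section 4 is rejected at Step
   3.1: calling check_registrable on (U+6E05 U+771F U+6559) with the
   languages zh-cn, zh-sg, zh-tw, ja, and ko raises RegistrationError
   with the message "U+6E05 is invalid in ko".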

   Step 3.2. PV(IN,AL) <= Set of available Nameprep-processed Preferred
             Variants of NP(IN) in AL

   Step 3.2 generates the list of Preferred Variant Labels of the IDL
   by doing a combination (see Step 3.2A below) of all possible
   variants listed in the "Preferred Variant(s)" column for each Code
   Point in the Nameprep-processed IDL. The generated Preferred Variant
   Labels must be processed through Nameprep. If the Nameprep
   processing fails for any Preferred Variant Label (this is unlikely
   to occur if the Preferred Variants [Code Points] are processed
   through Nameprep before being placed in the table), then that
   variant label will be removed from the list. The remaining Preferred
   Variant Labels in the list are then checked to see whether they are
   already registered or reserved. If any are registered or reserved,
   then the conflict resolution policy will apply. In general, this
   will not prevent the originally requested IDL from being registered
   unless the policy prevents such registration. For example, if FCFS
   is applied, then the conflicting variants will be removed from the
   list, but the originally requested IDL and any remaining variants
   will be registered (see Steps 5 and 8 below).

   Step 3.2A. Generating variant labels from Variant Code Points.

   Steps 3.2 and 3.3 require that the Preferred Variants and Character
   Variants be combined with the original IDL to form sets of variant
   labels. Conceptually, one starts with the original
   Nameprep-processed IDL and examines each of its characters in turn.
   If a character is encountered for which there is a corresponding
   Preferred Variant or Character Variant, a new variant label is
   produced with the Variant Code Point substituted for the original
   one.
   If variant labels already exist as the result of the processing of
   characters that appeared earlier in the original IDL, then the
   substitutions are made in them as well, resulting in additional
   generated variant labels. This operation is repeated separately for
   the Preferred Variants (in Step 3.2) and Character Variants (in Step
   3.3). Of course, equivalent results could be achieved by processing
   the original IDL's characters in order, building the Preferred
   Variant Label set and Character Variant Label set in parallel.

   This process will sometimes generate a very large number of labels.
   For example, if only two of the characters in the original IDL are
   associated with Preferred Variants and if the first of those
   characters has three Preferred Variants and the second has two, one
   ends up with six (3 x 2) variant labels to be placed in the IDL
   Package and, normally, in the zone file. Repeating the process for
   Character Variants, if any exist, would further increase the number
   of labels. And if more than one language is specified for the
   original IDL, then repetition of the process for additional
   languages (see Step 4, below) might further increase the size of the
   set.

   For illustrative purposes, the "combination" process could be
   achieved by a recursive function similar to the following
   pseudocode:

      Function Combination(Str)
         F    <= first code point of Str
         SStr <= substring of Str, without the first code point
         NSC  <= {}

         If SStr is empty then
            For each V in (Variants of code point F)
               NSC <= NSC set-union {the string consisting of V}
            End of Loop
         Else
            SubCom <= Combination(SStr)
            For each V in (Variants of code point F)
               For each SC in SubCom
                  NSC <= NSC set-union {the string consisting of V
                                        followed by the string SC}
               End of Loop
            End of Loop
         Endif

         Return NSC

   When Combination is used for Character Variants (Step 3.3), the
   variants of a code point F include F itself, since every Code Point
   is implicitly its own Character Variant (see Section 5.2).

   Step 3.3. CV(IN,AL) <= Set of available Nameprep-processed Character
             Variants of NP(IN) in AL

   This step generates the list of Character Variant Labels by doing a
   combination (see Step 3.2A above) of all the possible variants
   listed in the "Character Variant(s)" column for each Code Point in
   the Nameprep-processed original IDL. As with the Preferred Variant
   Labels, the generated Character Variant Labels must be processed by,
   and acceptable to, Nameprep. If the Nameprep processing fails for a
   Character Variant Label, then that variant label will be removed
   from the list. The remaining Character Variant Labels are then
   checked to be sure they are not registered or reserved. If one or
   more are, then the conflict resolution policy is applied. As with
   Preferred Variant Labels, a conflict that is resolved in favor of
   the earlier registrant does not, in general, prevent the IDL from
   being registered, nor the remaining variants from being reserved in
   Step 6 below.

   Step 3.4. End of Loop

   Step 4. Let PVall be the set-union of all PV(IN,AL)

   Step 4 generates the Preferred Variant Labels for all languages. In
   this step, and again in Step 6 below, the zone administrator may
   impose additional rules and processing activities to restrict the
   number of Preferred Variant Labels (tentatively to be reserved and
   activated) and Character Variant Labels (tentatively to be
   reserved). These additional rules and processing activities are
   zone policy specific and therefore are not specified in this
   document.

   Step 5. {ZV} <= PVall set-union NP(IN)

   Step 5 generates the initial Zone Variants. The set includes all
   Preferred Variants for all languages and the original
   Nameprep-processed IDL. Unless excluded by further processing, these
   Zone Variants will be activated--that is, placed into the DNS zone.
   Note that the "set-union" operation will eliminate any duplicates.

   Step 6. Let CVall be the set-union of all CV(IN,AL), set-minus {ZV}

   Step 6 generates the Reserved Label Variants (the Character Variant
   Label set). These labels are normally reserved but not activated.
   The set includes all Character Variant Labels for all languages, but
   not the Zone Variants defined in the previous step. The set-union
   and set-minus operations eliminate any duplicates.

   Step 7. Create IDL Package for IN using IN, {L}, {ZV}, and CVall

   In Step 7, the "IDL Package" is created using the original IDL, the
   associated language(s), the Zone Variant Labels, and the Reserved
   Variant Labels (CVall, referred to as {RV} in Section 3.4). If
   zone-specific additional processing or filtering is to be applied to
   eliminate linguistically inappropriate or other forms, it should be
   applied before the IDL Package is actually assembled.

   Step 8. Put {ZV} into zone file

   The activated IDLs are converted via ToASCII with UseSTD3ASCIIRules
   [IDNA] before being placed into the zone file. This conversion
   results in the IDLs being in the actual IDNA ("Punycode") form used
   in zone files, while the IDLs have been carried in Unicode form up
   to this point. If ToASCII fails for any of the activated IDLs, that
   IDL must not be placed into the zone file. If the IDL is a subdomain
   name, it will be delegated.

3.3. Deletion and Transfer of IDL and IDL Package

   In traditional domain administration, every Domain Name Label is
   independent of all other Domain Name Labels. Registration, deletion,
   and transfer of labels are done on a per-label basis. However, with
   the guidelines discussed here, each IDL is associated with specific
   languages, with all label variants--both active (zone) and
   reserved--together in an IDL Package. This quite deliberately
   prohibits labels that contain sufficient mixtures of characters from
   different scripts to make them impossible as words in any given
   language.
   If a zone chooses not to impose that restriction--that is, to permit
   labels to be constructed by picking characters from several
   different languages and scripts--then the guidelines described here
   would be inappropriate.

   As stated earlier, the IDL Package should be treated as a single
   atomic unit and all variants of the IDL should belong to a single
   domain name holder. If the local policy related to the handling of
   disagreements requires a particular IDL to be transferred or deleted
   independently of the IDL Package, the conflict policy would take
   precedence. In such an event, the conflict policy should include a
   transfer or delete procedure that takes the nature of IDL Packages
   into consideration.

   When an IDL Package is deleted, all of the Zone and Reserved Label
   Variants again become available. The deletion of one IDL Package
   does not change any other IDL Packages.

3.4. Activation and Deactivation of IDL Variants

   Because there are active (registered) IDLs and inactive (reserved
   but not registered) IDLs within an IDL Package, processes are
   required to activate or deactivate IDL variants within an IDL
   Package.

3.4.1. Activation Algorithm

   Step 1. IN <= IDL to be activated and PA <= IDL Package

   Start with the IDL to be activated and the IDL Package of which it
   is a member.

   Step 2. NP(IN) <= Nameprep processed IN

   Process the IDL through Nameprep. This step should never cause a
   problem, or even a change, since all labels that become part of the
   IDL Package are processed through Nameprep in Step 3.2 or 3.3 of the
   registration procedure (Section 3.2.3).

   Step 3. If NP(IN) not in {RV} then stop

   Verify that the Nameprep-processed version of the IDL appears as a
   still-unactivated label in the IDL Package, i.e., in the list of
   Reserved Label Variants, {RV}.
   It might be a useful "sanity check" to also verify that it does not
   already appear in the zone file.

   Step 4. {RV} <= {RV} set-minus NP(IN) and
           {ZV} <= {ZV} set-union NP(IN)

   Within the IDL Package, remove the Nameprep-processed version of the
   IDL from the list of Reserved Label Variants and add it to the list
   of active (zone) label variants.

   Step 5. Put {ZV} into the zone file

   Actually register (activate) the Zone Variant Labels.

3.4.2. Deactivation Algorithm

   Step 1. IN <= IDL to be deactivated and PA <= IDL Package

   As with activation, start with the IDL to be deactivated and the IDL
   Package of which it is a member.

   Step 2. NP(IN) <= Nameprep processed IN

   Get the Nameprep-processed version of the name (see discussion in
   the previous section).

   Step 3. If NP(IN) not in {ZV} then stop

   Verify that the Nameprep-processed version of the IDL appears as an
   activated (zone) label variant in the IDL Package. It might be a
   useful "sanity check" at this point to also verify that it actually
   appears in the zone file.

   Step 4. {RV} <= {RV} set-union NP(IN) and
           {ZV} <= {ZV} set-minus NP(IN)

   Within the IDL Package, remove the Nameprep-processed version of the
   IDL from the list of Active (Zone) Label Variants and add it to the
   list of Reserved (but inactive) Label Variants.

   Step 5. Put {ZV} into the zone file

   Update the zone file from the revised {ZV}, so that the deactivated
   label is removed from the DNS.

3.5. Managing Changes in Language Associations

   Since the IDL Package is an atomic unit and the associated list of
   variants must not be changed after creation, this document does not
   include a mechanism for adding and deleting language associations
   within the IDL Package. Instead, it recommends deleting the IDL
   Package entirely, followed by a new registration with the new set of
   languages. Zone administrators may find it desirable to devise
   procedures that prevent other parties from capturing the labels in
   the IDL Package during these operations.
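
   The combination of Step 3.2A and Steps 3.2 through 7 of Section
   3.2.3 can be sketched in Python as follows. The sketch rests on
   several assumptions: the hypothetical TABLES dictionary encodes only
   the rows of the zh-cn/zh-sg and zh-tw sample tables (Section 4) that
   Examples 1 and 4 use; the recursive Combination function is replaced
   by itertools.product, which yields the same set of strings; each
   Code Point is treated as its own Character Variant, as Section 5.2
   specifies; and the Nameprep filtering, conflict checks, and
   zone-specific filtering of generated labels are omitted. The input
   is assumed to have already passed Steps 1 through 3.1.

```python
from itertools import product

# Hypothetical encoding of the sample-table rows used by Examples 1
# and 4: valid code point -> (Preferred Variants, Character Variants).
ZH_CN = {
    "\u5718": (["\u56e2"], ["\u56e2", "\u56e3"]),
    "\u60f3": (["\u60f3"], []),
    "\u6559": (["\u6559"], ["\u654e"]),
    "\u6e05": (["\u6e05"], ["\u6df8"]),
    "\u771f": (["\u771f"], ["\u771e"]),
    "\u806f": (["\u8054"], ["\u8054", "\u8068"]),
    "\u96c6": (["\u96c6"], []),
}
ZH_TW = {
    "\u5718": (["\u5718"], ["\u56e2", "\u56e3"]),
    "\u60f3": (["\u60f3"], []),
    "\u6559": (["\u6559"], ["\u654e"]),
    "\u6e05": (["\u6e05"], ["\u6df8"]),
    "\u771f": (["\u771f"], ["\u771e"]),
    "\u806f": (["\u806f"], ["\u8054", "\u8068"]),
    "\u96c6": (["\u96c6"], []),
}
TABLES = {"zh-cn": ZH_CN, "zh-sg": ZH_CN, "zh-tw": ZH_TW}


def combination(choices):
    """Step 3.2A: all strings made by picking one variant per position.

    choices is a list (one entry per code point of the label) of lists
    of variant strings; a position with no variants yields no labels.
    """
    return {"".join(parts) for parts in product(*choices)}


def register(np_in, languages):
    """Steps 3.2-7 for an NP(IN) that already passed Steps 1-3.1."""
    pv_all, cv_all = set(), set()
    for lang in languages:                        # Step 3 loop
        table = TABLES[lang]
        # Step 3.2: Preferred Variants (column 2) only.
        pv_all |= combination([table[ch][0] for ch in np_in])
        # Step 3.3: Character Variants (column 3) plus the code point
        # itself, which Section 5.2 says is implicitly a variant.
        cv_all |= combination([[ch] + table[ch][1] for ch in np_in])
    zone = pv_all | {np_in}                       # Steps 4 and 5: {ZV}
    reserved = cv_all - zone                      # Step 6: CVall
    return {"idl": np_in, "languages": list(languages),   # Step 7
            "zone": zone, "reserved": reserved}
```

   Run on Examples 1 and 4, register returns one Zone Variant with
   seven reserved labels, and two Zone Variants with seven reserved
   labels, respectively, matching the {ZV} and CVall sets listed in
   Section 4. Likewise, combination applied to one position with three
   variants and one with two returns 3 x 2 = 6 strings.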
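
   The activation and deactivation algorithms of Section 3.4 each move
   one label between the two label sets of an IDL Package. In the
   Python sketch below, the IDLPackage class, the activate, deactivate,
   and zone_records functions, and the choice to return zone-file
   labels rather than edit a real zone are illustrative assumptions.
   Step 2 uses the standard library's encodings.idna.nameprep, and
   Step 5 uses encodings.idna.ToASCII, which the Python documentation
   describes as implementing RFC 3490 ToASCII with UseSTD3ASCIIRules
   assumed false; a production system would also need the
   UseSTD3ASCIIRules checks called for in Step 8 of Section 3.2.3.

```python
from dataclasses import dataclass, field
from encodings.idna import ToASCII, nameprep


@dataclass
class IDLPackage:
    """Hypothetical in-memory IDL Package (Section 3.2.2)."""
    idl: str
    languages: list
    zone: set = field(default_factory=set)       # {ZV}: active labels
    reserved: set = field(default_factory=set)   # {RV}: reserved only


def zone_records(pkg):
    """Step 5: ACE ("xn--") forms of {ZV}, in a stable order."""
    return sorted(ToASCII(label).decode("ascii") for label in pkg.zone)


def activate(pkg, label):
    """Section 3.4.1: move NP(IN) from {RV} to {ZV}."""
    np_in = nameprep(label)                      # Step 2
    if np_in not in pkg.reserved:                # Step 3
        raise ValueError("not a reserved variant of this IDL Package")
    pkg.reserved.discard(np_in)                  # Step 4
    pkg.zone.add(np_in)
    return zone_records(pkg)                     # Step 5


def deactivate(pkg, label):
    """Section 3.4.2: move NP(IN) from {ZV} back to {RV}."""
    np_in = nameprep(label)                      # Step 2
    if np_in not in pkg.zone:                    # Step 3
        raise ValueError("not an active variant of this IDL Package")
    pkg.zone.discard(np_in)                      # Step 4
    pkg.reserved.add(np_in)
    return zone_records(pkg)                     # Step 5
```

   Keeping {ZV} and {RV} as disjoint sets inside one object mirrors the
   atomicity requirement: a label only ever changes state within its
   own IDL Package, and any label not in {RV} is refused at Step 3.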

3.6. Managing Changes to the Language Variant Tables

   Language Variant Tables are subject to change over time, and these
   changes may or may not be backward compatible. It is possible that
   updated Language Variant Tables may produce a different set of
   Preferred Variants and Reserved Variants.

   In order to preserve the atomicity of the IDL Package, when the
   Language Variant Table is changed, IDL Packages created using the
   previous version of the Language Variant Table must not be updated
   or affected.

4. Examples of Guideline Use in Zones

   To provide a meaningful example, some Language Variant Tables must
   be defined. Assume, then, for the purpose of giving examples, that
   the following four Language Variant Tables are defined.

   Note: these tables are not a representation of the actual tables,
   and they do not contain sufficient entries to be used in any actual
   implementation.

   a) Language Variant Table for zh-cn and zh-sg

      Reference 1 CP936 (commonly known as GBK)
      Reference 2 zVariant, zTradVariant, zSimpVariant in Unihan.txt
      Reference 3 List of Simplified Character Table (Simplified column)
      Reference 4 zSimpVariant in Unihan.txt
      Reference 5 Variant that exists in GB2312, common simplified Hanzi

      Version 1 20020701 # July 2002

      56E2(1);56E2(5);5718(2),56E3(2) # sphere, ball, circle; mass, lump
      5718(1);56E2(4);56E2(2),56E3(2) # sphere, ball, circle; mass, lump
      60F3(1);60F3(5);                # think, speculate, plan, consider
      654E(1);6559(5);6559(2)         # teach
      6559(1);6559(5);654E(2)         # teach, class
      6DF8(1);6E05(5);6E05(2)         # clear
      6E05(1);6E05(5);6DF8(2)         # clear, pure, clean; peaceful
      771E(1);771F(5);771F(2)         # real, actual, true, genuine
      771F(1);771F(5);771E(2)         # real, actual, true, genuine
      8054(1);8054(3);806F(2),8068(2) # connect, join; associate, ally
      806F(1);8054(3);8054(2),8068(2) # connect, join; associate, ally
      96C6(1);96C6(5);                # assemble, collect together

   b) Language Variant Table for zh-tw

      Reference 1 CP950 (commonly known as BIG5)
      Reference 2 zVariant, zTradVariant, zSimpVariant in Unihan.txt
      Reference 3 List of Simplified Character Table (Traditional column)
      Reference 4 zTradVariant in Unihan.txt

      Version 1 20020701 # July 2002

      5718(1);5718(4);56E2(2),56E3(2) # sphere, ball, circle; mass, lump
      60F3(1);60F3(1);                # think, speculate, plan, consider
      6559(1);6559(1);654E(2)         # teach, class
      6E05(1);6E05(1);6DF8(2)         # clear, pure, clean; peaceful
      771F(1);771F(1);771E(2)         # real, actual, true, genuine
      806F(1);806F(3);8054(2),8068(2) # connect, join; associate, ally
      96C6(1);96C6(1);                # assemble, collect together

   c) Language Variant Table for ja

      Reference 1 CP932 (commonly known as Shift-JIS)
      Reference 2 zVariant in Unihan.txt
      Reference 3 Variant that exists in JIS X0208, commonly used Kanji

      Version 1 20020701 # July 2002

      5718(1);5718(3);56E3(2)         # sphere, ball, circle; mass, lump
      60F3(1);60F3(3);                # think, speculate, plan, consider
      654E(1);6559(3);6559(2)         # teach
      6559(1);6559(3);654E(2)         # teach, class
      6DF8(1);6E05(3);6E05(2)         # clear
      6E05(1);6E05(3);6DF8(2)         # clear, pure, clean; peaceful
      771E(1);771E(1);771F(2)         # real, actual, true, genuine
      771F(1);771F(1);771E(2)         # real, actual, true, genuine
      806F(1);806F(1);8068(2)         # connect, join; associate, ally
      96C6(1);96C6(3);                # assemble, collect together

   d) Language Variant Table for ko

      Reference 1 CP949 (commonly known as EUC-KR)
      Reference 2 zVariant and K-source in Unihan.txt

      Version 1 20020701 # July 2002

      5718(1);5718(1);56E3(2)         # sphere, ball, circle; mass, lump
      60F3(1);60F3(1);                # think, speculate, plan, consider
      654E(1);654E(1);6559(2)         # teach
      6DF8(1);6DF8(1);6E05(2)         # clear
      771E(1);771E(1);771F(2)         # real, actual, true, genuine
      806F(1);806F(1);8068(2)         # connect, join; associate, ally
      96C6(1);96C6(1);                # assemble, collect together

   Example 1: IDL = (U+6E05 U+771F U+6559) *qing2 zhen1 jiao4*
              {L} = {zh-cn, zh-sg, zh-tw}

      NP(IN)       = (U+6E05 U+771F U+6559)
      PV(IN,zh-cn) = (U+6E05 U+771F U+6559)
      PV(IN,zh-sg) = (U+6E05 U+771F U+6559)
      PV(IN,zh-tw) = (U+6E05 U+771F U+6559)
      {ZV}         = {(U+6E05 U+771F U+6559)}
      CVall        = {(U+6E05 U+771E U+6559),
                      (U+6E05 U+771E U+654E),
                      (U+6E05 U+771F U+654E),
                      (U+6DF8 U+771E U+6559),
                      (U+6DF8 U+771E U+654E),
                      (U+6DF8 U+771F U+6559),
                      (U+6DF8 U+771F U+654E)}

   Example 2: IDL = (U+6E05 U+771F U+6559) *qing2 zhen1 jiao4*
              {L} = {ja}

      NP(IN)       = (U+6E05 U+771F U+6559)
      PV(IN,ja)    = (U+6E05 U+771F U+6559)
      {ZV}         = {(U+6E05 U+771F U+6559)}
      CVall        = {(U+6E05 U+771E U+6559),
                      (U+6E05 U+771E U+654E),
                      (U+6E05 U+771F U+654E),
                      (U+6DF8 U+771E U+6559),
                      (U+6DF8 U+771E U+654E),
                      (U+6DF8 U+771F U+6559),
                      (U+6DF8 U+771F U+654E)}

   Example 3: IDL = (U+6E05 U+771F U+6559) *qing2 zhen1 jiao4*
              {L} = {zh-cn, zh-sg, zh-tw, ja, ko}

      NP(IN)       = (U+6E05 U+771F U+6559)
      Invalid registration because U+6E05 is invalid in L = ko

   Example 4: IDL = (U+806F U+60F3 U+96C6 U+5718)
                    *lian2 xiang3 ji2 tuan2*
              {L} = {zh-cn, zh-sg, zh-tw}

      NP(IN)       = (U+806F U+60F3 U+96C6 U+5718)
      PV(IN,zh-cn) = (U+8054 U+60F3 U+96C6 U+56E2)
      PV(IN,zh-sg) = (U+8054 U+60F3 U+96C6 U+56E2)
      PV(IN,zh-tw) = (U+806F U+60F3 U+96C6 U+5718)
      {ZV}         = {(U+8054 U+60F3 U+96C6 U+56E2),
                      (U+806F U+60F3 U+96C6 U+5718)}
      CVall        = {(U+8054 U+60F3 U+96C6 U+56E3),
                      (U+8054 U+60F3 U+96C6 U+5718),
                      (U+806F U+60F3 U+96C6 U+56E2),
                      (U+806F U+60F3 U+96C6 U+56E3),
                      (U+8068 U+60F3 U+96C6 U+56E2),
                      (U+8068 U+60F3 U+96C6 U+56E3),
                      (U+8068 U+60F3 U+96C6 U+5718)}

   Example 5: IDL = (U+8054 U+60F3 U+96C6 U+56E2)
                    *lian2 xiang3 ji2 tuan2*
              {L} = {zh-cn, zh-sg}

      NP(IN)       = (U+8054 U+60F3 U+96C6 U+56E2)
      PV(IN,zh-cn) = (U+8054 U+60F3 U+96C6 U+56E2)
      PV(IN,zh-sg) = (U+8054 U+60F3 U+96C6 U+56E2)
      {ZV}         = {(U+8054 U+60F3 U+96C6 U+56E2)}
      CVall        = {(U+8054 U+60F3 U+96C6 U+56E3),
                      (U+8054 U+60F3 U+96C6 U+5718),
                      (U+806F U+60F3 U+96C6 U+56E2),
                      (U+806F U+60F3 U+96C6 U+56E3),
                      (U+806F U+60F3 U+96C6 U+5718),
                      (U+8068 U+60F3 U+96C6 U+56E2),
                      (U+8068 U+60F3 U+96C6 U+56E3),
                      (U+8068 U+60F3 U+96C6 U+5718)}

   Example 6: IDL = (U+8054 U+60F3 U+96C6 U+56E2)
                    *lian2 xiang3 ji2 tuan2*
              {L} = {zh-cn, zh-sg, zh-tw}

      NP(IN)       = (U+8054 U+60F3 U+96C6 U+56E2)
      Invalid registration because U+8054 is invalid in L = zh-tw

   Example 7: IDL = (U+806F U+60F3 U+96C6 U+5718)
                    *lian2 xiang3 ji2 tuan2*
              {L} = {ja, ko}

      NP(IN)       = (U+806F U+60F3 U+96C6 U+5718)
      PV(IN,ja)    = (U+806F U+60F3 U+96C6 U+5718)
      PV(IN,ko)    = (U+806F U+60F3 U+96C6 U+5718)
      {ZV}         = {(U+806F U+60F3 U+96C6 U+5718)}
      CVall        = {(U+806F U+60F3 U+96C6 U+56E3),
                      (U+8068 U+60F3 U+96C6 U+5718),
                      (U+8068 U+60F3 U+96C6 U+56E3)}

5. Syntax Description for the Language Variant Table

   The formal syntax for the Language Variant Table is as follows,
   using the IETF "ABNF" metalanguage [ABNF]. Some comments on this
   syntax appear immediately after it.

5.1. ABNF Syntax

   LanguageVariantTable = 1*ReferenceLine VersionLine 1*EntryLine
   ReferenceLine    = "Reference" SP RefNo SP RefDescription
                      [ Comment ] CRLF
   RefNo            = 1*DIGIT
   RefDescription   = *( %x21-22 / %x24-7E / WSP )
                      ; visible characters except "#", and blanks
   VersionLine      = "Version" SP VersionNo SP VersionDate
                      [ Comment ] CRLF
   VersionNo        = 1*DIGIT
   VersionDate      = YYYYMMDD
   YYYYMMDD         = 8DIGIT
   EntryLine        = [ VariantEntry ] [ Comment ] CRLF
   VariantEntry     = ValidCodePoint ";" PreferredVariant ";"
                      CharacterVariant
   ValidCodePoint   = CodePoint
   PreferredVariant = [ CodePointSet *( "," CodePointSet ) ]
   CharacterVariant = [ CodePointSet *( "," CodePointSet ) ]
   CodePointSet     = CodePoint *( SP CodePoint )
   CodePoint        = 4*8HEXDIG [ "(" RefList ")" ]
   RefList          = RefNo *( "," RefNo )
   Comment          = *WSP "#" *( VCHAR / WSP )

   YYYYMMDD is a string of eight decimal digits representing a date,
   where YYYY is the 4-digit year, MM is the 2-digit month, and DD is
   the 2-digit day.

5.2. Comments and Explanation of Syntax

   Any lines starting with, or portions of lines after, the hash symbol
   ("#") are treated as comments. Comments have no significance in the
   processing of the tables; nor are there any syntax requirements
   between the hash symbol and the end of the line. Blank lines in the
   tables are ignored completely.

   Every language should have its own Language Variant Table provided
   by a relevant group, organization, or other body. That table will
   normally be based on some established standard or standards. The
   group that defines a Language Variant Table should document
   references to the appropriate standards at the beginning of the
   table, tagged with the word "Reference" followed by an integer (the
   reference number) followed by the description of the reference.
   For example:

      Reference 1 CP936 (commonly known as GBK)
      Reference 2 zVariant, zTradVariant, zSimpVariant in Unihan.txt
      Reference 3 List of Simplified Character Table (Simplified column)
      Reference 4 zSimpVariant in Unihan.txt
      Reference 5 Variant that exists in GB2312, common simplified Hanzi

   Each Language Variant Table must have a version number and its
   release date. This is tagged with the word "Version" followed by an
   integer, then followed by the date in the format YYYYMMDD, where
   YYYY is the 4-digit year, MM is the 2-digit month, and DD is the
   2-digit day of the publication date of the table.

      Version 1 20020701 # July 2002 Version 1

   The table has three columns, separated by semicolons: "Valid Code
   Point"; "Preferred Variant(s)"; and "Character Variant(s)".

   The "Valid Code Point" is the subset of Unicode characters that are
   valid to be registered.

   There can be more than one Preferred Variant; hence there could be
   multiple entries in the "Preferred Variant(s)" column. If the
   "Preferred Variant(s)" column is empty, then there is no
   corresponding Preferred Variant; in other words, the Preferred
   Variant is null. Unless local policy dictates otherwise, the
   procedures above will result in only those labels that reflect the
   valid code point being activated (registered) into the zone file.

   The "Character Variant(s)" column contains all Character Variants of
   the Code Point. Since the Code Point is always a variant of itself,
   to avoid redundancy, the Code Point is assumed to be part of the
   "Character Variant(s)" and need not be repeated in the "Character
   Variant(s)" column.

   If a variant in the "Preferred Variant(s)" or the "Character
   Variant(s)" column is composed of a sequence of Code Points, then
   the Code Points of the sequence are listed separated by spaces.
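
   A reader for a single table row, following Sections 5.1 and 5.2,
   can be sketched in Python as follows. The function name parse_entry
   and its return shape (the valid character, a list of Preferred
   Variants, and a list of Character Variants, each variant a possibly
   multi-character string) are illustrative assumptions. The sketch
   drops comments, strips the parenthesized reference lists before
   splitting on commas (so that a reference list such as "(2,3)" is not
   mistaken for a variant separator), and does not check header lines
   or every detail of the ABNF.

```python
import re

# A parenthesized reference list such as "(1)" or "(2,3)".
REFS = re.compile(r"\(\d+(?:,\d+)*\)")
# A bare code point: 4 to 8 hexadecimal digits.
HEX = re.compile(r"[0-9A-Fa-f]{4,8}")


def _code_point(token):
    if not HEX.fullmatch(token):
        raise ValueError(f"bad code point: {token!r}")
    return chr(int(token, 16))


def _variants(column):
    """Comma-separated variants; each is a space-separated sequence."""
    column = REFS.sub("", column).strip()
    if not column:
        return []                   # empty column: no variants
    variants = []
    for variant in column.split(","):
        tokens = variant.split()
        if not tokens:
            raise ValueError(f"empty variant in {column!r}")
        variants.append("".join(_code_point(t) for t in tokens))
    return variants


def parse_entry(line):
    """Return (valid, preferred, character), or None for a line that is
    blank or holds only a comment."""
    body = line.split("#", 1)[0].strip()   # drop any trailing comment
    if not body:
        return None
    fields = body.split(";")
    if len(fields) != 3:
        raise ValueError(f"expected three columns: {line!r}")
    valid = REFS.sub("", fields[0]).strip()
    return _code_point(valid), _variants(fields[1]), _variants(fields[2])
```

   Applied to the zh-cn row "806F(1);8054(3);8054(2),8068(2)", this
   sketch returns U+806F, the Preferred Variant list [U+8054], and the
   Character Variant list [U+8054, U+8068]; a blank or comment-only
   line yields None, consistent with Section 5.2.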

   If there are multiple variants in the "Preferred Variant(s)" or the
   "Character Variant(s)" column, then each variant is separated by a
   comma.

   Any Code Point listed in the "Preferred Variant(s)" column must be
   allowed by the rules for the relevant language to be registered.
   However, this is not a requirement for the entries in the "Character
   Variant(s)" column; it is possible that some of those entries may
   not be allowed to be registered.

   Every Code Point in the table should have a corresponding reference
   number (associated with the references) specified to justify the
   entry. The reference number is placed in parentheses after the Code
   Point. If there is more than one reference, then the numbers are
   placed within a single set of parentheses and separated by commas.

6. Security Considerations

   As discussed in the Introduction, substantially unrestricted use of
   international (non-ASCII) characters in domain name labels may cause
   user confusion and invite various types of attacks. In particular,
   in the case of CJK languages, an attacker has an opportunity to
   divert or confuse users as a result of different characters (or,
   more specifically, assigned code points) with identical or similar
   semantics. These guidelines provide a partial remedy for those risks
   by supplying a framework for prohibiting inappropriate characters
   from being registered at all and for permitting "variant" characters
   to be grouped together and reserved, so that they can only be
   registered in the DNS by the same owner. However, the system these
   guidelines suggest is no better or worse than the per-zone and
   per-language tables whose format and use this document specifies.
   Specific tables, and any additional local processing, will reflect
   per-zone decisions about the balance between risk and flexibility of
   registrations.
   And, of course, errors in construction of those tables may
   significantly reduce the quality of protection provided.

7. Index to Terminology

   As a convenience to the reader, this section lists all of the
   special terminology used in this document, with a pointer to the
   section in which it is defined.

      Activated Label                  2.1.17
      Activation                       2.1.4
      Active Label                     2.1.17
      Character Variant                2.1.14
      Character Variant Label          2.1.16
      CJK Characters                   2.1.9
      Code Point                       2.1.7
      Code Point Variant               2.1.14
      FQDN                             2.1.3
      Hostname                         2.1.1
      IDL                              2.1.2
      IDL Package                      2.1.18
      IDN                              2.1.1
      Internationalized Domain Label   2.1.2
      ISO/IEC 10646                    2.1.6
      Label String                     2.1.10
      Language name codes              2.1.5
      Language Variant Table           2.1.11
      LDH Subset                       2.1.1
      Preferred Code Point             2.1.13
      Preferred Variant                2.1.13
      Preferred Variant Label          2.1.15
      Registration                     2.1.4
      Reserved                         2.1.18
      RFC3066                          2.1.5
      Table                            2.1.11
      UCS                              2.1.6
      Unicode Character                2.1.7
      Unicode String                   2.1.8
      Valid Code Point                 2.1.12
      Variant Table                    2.1.11
      Zone Variant                     2.1.17

8. Acknowledgments

   The authors gratefully acknowledge the contributions of:

   - V. CHEN, N. HSU, H. HOTTA, S. TASHIRO, Y. YONEYA, and other Joint
     Engineering Team members at the JET meeting in Bangkok, Thailand.

   - Yves Arrouye, an observer at the JET meeting in Bangkok, for his
     contribution on the IDL Package.

   - Those who commented on, and made suggestions about, earlier
     versions, including Harald ALVESTRAND, Erin CHEN, Patrik
     FALTSTROM, Paul HOFFMAN, Soobok LEE, LEE Xiaodong, MAO Wei, Erik
     NORDMARK, and L.M. TSENG.

9. Authors' Addresses

   James SENG
   Infocomm Development Authority
   8 Temasek Boulevard
   #14-00 Suntec Tower Three
   Singapore 038988
   Phone: +65 9638-7085
   E-mail: jseng@pobox.org.sg

   Kazunori KONISHI
   JPNIC
   Kokusai-Kougyou-Kanda Bldg 6F
   2-3-4 Uchi-Kanda, Chiyoda-ku
   Tokyo 101-0047
   Japan
   Phone: +81 49-278-7313
   E-mail: konishi@jp.apan.net

   Kenny HUANG
   TWNIC
   3F, 16, Kang Hwa Street, Taipei
   Taiwan
   Phone: 886-2-2658-6510
   E-mail: huangk@alum.sinica.edu

   QIAN Hualin
   CNNIC
   No.6 Branch-box of No.349 Mailbox, Beijing 100080
   People's Republic of China
   E-mail: Hlqian@cnnic.net.cn

   KO YangWoo
   PeaceNet
   Yangchun P.O. Box 81 Seoul 158-600
   Korea
   E-mail: newcat@peacenet.or.kr

   John C KLENSIN
   1770 Massachusetts Avenue, No. 322
   Cambridge, MA 02140
   U.S.A.
   E-mail: Klensin+ietf@jck.com

   Wendy RICKARD
   The Rickard Group
   16 Seminary Ave
   Hopewell, NJ 08525
   USA
   E-mail: rickard@rickardgroup.com

10. Normative References

   [ABNF]       Crocker, D., Ed. and P. Overell, "Augmented BNF for
                Syntax Specifications: ABNF", RFC 2234, November 1997.

   [STD13]      Mockapetris, P., "Domain names--concepts and
                facilities", STD 13, RFC 1034, November 1987, and
                "Domain names--implementation and specification",
                STD 13, RFC 1035, November 1987.

   [RFC3066]    Alvestrand, H., "Tags for the Identification of
                Languages", RFC 3066, January 2001.

   [IDNA]       Faltstrom, P., Hoffman, P., and A. Costello,
                "Internationalizing Domain Names in Applications
                (IDNA)", RFC 3490, March 2003.

   [PUNYCODE]   Costello, A., "Punycode: A Bootstring encoding of
                Unicode for Internationalized Domain Names in
                Applications (IDNA)", RFC 3492, March 2003.

   [STRINGPREP] Hoffman, P. and M. Blanchet, "Preparation of
                Internationalized Strings ("stringprep")", RFC 3454,
                December 2002.

   [NAMEPREP]   Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep
                Profile for Internationalized Domain Names", RFC 3491,
                March 2003.

   [IS10646]    A product of ISO/IEC JTC1/SC2/WG2, Work Item JTC1.02.18
                (ISO/IEC 10646). It is a multipart standard: Part 1,
                published as ISO/IEC 10646-1:2000(E), covers the
                Architecture and Basic Multilingual Plane, and Part 2,
                published as ISO/IEC 10646-2:2001(E), covers the
                supplementary (additional) planes.

   [UNIHAN]     Unicode Consortium, "Unicode Han Database",
                ftp://ftp.unicode.org/Public/UNIDATA/Unihan.txt.

   [UNICODE]    The Unicode Consortium, "The Unicode Standard--Version
                3.0", ISBN 0-201-61633-5. Unicode Standard Annex #28
                (http://www.unicode.org/unicode/reports/tr28/) defines
                Version 3.2 of the Unicode Standard, which is
                definitive for IDNA and this document.

   [ISO7098]    ISO 7098:1991, "Information and
                documentation--Romanization of Chinese", ISO/TC46/SC2.

11. Non-normative References

   [IDN-WG]     IETF Internationalized Domain Names Working Group,
                idn@ops.ietf.org, James Seng, Marc Blanchet,
                http://www.i-d-n.net/.

   [IESG-IDN]   Internet Engineering Steering Group, "IESG Statement on
                IDN", IETF, 11 February 2003,
                http://www.ietf.org/IESG/STATEMENTS/IDNstatement.txt.

   [ISO639]     International Organization for Standardization, "ISO
                639:1988 (E/F)--Code for the representation of names of
                languages", 1st edition, 1988-04-01.