INTERNET DRAFT                                  Editors: James SENG
draft-jseng-idn-admin-01.txt                             John KLENSIN
18th Oct 2002                                     Authors: K. KONISHI
Expires 18th April 2003                    K. HUANG, H. QIAN, Y. KO

     Internationalized Domain Names Registration and Administration
               Guideline for Chinese, Japanese and Korean

Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026 except that the right to produce derivative works is not granted.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

Abstract

Achieving internationalized access to domain names raises many complex issues. These include not only issues associated with basic protocol design (i.e., how the names are represented on the network, compared, and converted to appropriate forms) but also issues and options for deployment, transition, registration and administration.

The IETF IDN working group focused on the development of a standards track specification for access to domain names in a broader range of scripts than the original ASCII. It became clear during its efforts that characters with similar appearances or interpretations create great potential for confusion, as well as difficulties in deployment and transition, and that those issues could best be addressed administratively rather than through restrictions embedded in the protocols.

This document provides guidelines for zone administrators (including but not limited to registry operators and registrars), and information for all domain name holders, on the administration of domain names that contain characters drawn from Chinese, Japanese and Korean scripts (CJK). Other language groups are encouraged to develop their own guidelines as needed, based on these guidelines if that is helpful.

Comments on this document can be sent to the authors at idn-admin@jdna.jp.

Table of Contents

   0. Pre-Note for ASCII-version of this document
   1. Introduction
   2. Definitions
   3. Administrative Framework
      3.1. Principles underlying these Guidelines
      3.2. Registration of IDL
         3.2.1. Language character variant table
         3.2.2. Formal syntax
         3.2.3. Registration Algorithm
      3.3. Deletion and Transfer of IDL and IDL Package
      3.4. Activation and De-activation of IDL variants
      3.5. Adding/Deleting language(s) association
      3.6. Versioning of the language character variant tables
   4. Example of Guideline Adoption
   i. Notes
   ii. Acknowledgements
   iii. Authors
   iv. Appendix A
   v. Normative References
   vi. Non-normative References
   vii. Other Issues

0. Pre-Note for ASCII-version of this document

In order to make meanings clear, especially in examples, Han ideographs are used in several places in this document. These ideographs cannot, of course, appear in the ASCII form of this document. So, for the convenience of readers of the ASCII format, and of readers not familiar with recognizing and distinguishing Chinese characters, each use of a particular character is accompanied by both its Unicode code point and an "asterisk tag" giving its Chinese Romanization [ISO7098], with the tone mark represented by a number from 1 to 4. Those tags have no meaning outside this document; they are intended simply to provide a quick visual and reading reference to facilitate the combinations and transformations of characters in the guideline and table excerpts. Appendix A provides the Romanization of the ideographs in Japanese (ISO 3602) and Korean (ISO 11941).

1. Introduction

Defining and specifying protocols for Internationalized Domain Names has been one of the most controversial tasks initiated by the IETF in recent years. Domain names are the fundamental naming architecture of the Internet; many Internet protocols and applications rely on the stability, continuity, and absence of ambiguity of the DNS.

The introduction of internationalized domain names (IDN) amplifies the difficulty of putting names into identifiers and the confusion between scripts and languages. It impacts many Internet protocols and applications and creates more complexity in technical administration and services.

While the IETF IDN working group [IDN-WG] focused on the technical problems of IDN, administrative guidelines are also important in order to reduce unnecessary user confusion and domain name disputes among domain name holders.

The IDN working group has completed working group last call for the following Internet-Drafts:

   1. Preparation of Internationalized Strings [STRINGPREP]
   2. Internationalizing Host Names In Applications [IDNA]
   3. Punycode version 0.3.3 [PUNYCODE]
   4. A Stringprep Profile for Internationalized Domain Names [NAMEPREP]

These drafts specify that the intersystem protocols that make up the domain name system infrastructure remain unchanged. Instead, they introduce internationalization (I18N) [Note1] in client software (particularly via the IDNA protocol) using an ASCII Compatible Encoding (ACE) known as Punycode.
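
The client-side processing just described can be exercised with Python's standard library, whose encodings.idna module implements the IDNA 2003 nameprep and ToASCII operations as later published in RFC 3491 and RFC 3490. The sketch below is for experimentation only: the helper client_forms and the sample labels are illustrative assumptions, and Python's implementation runs with AllowUnassigned on and UseSTD3ASCIIRules off, so it is not a registry-grade check. It also previews the point made in the following paragraphs: Nameprep folds the compatibility ideograph U+F967 into U+4E0D, but leaves the two spellings of "airplane" (U+98DB U+6A5F and U+98DE U+673A) as different labels.

```python
from encodings import idna  # IDNA 2003 helpers (RFC 3490/3491) in the stdlib


def client_forms(label: str) -> tuple[str, bytes]:
    """Return (Nameprep output, ACE label) for a single label (hypothetical helper)."""
    prepped = idna.nameprep(label)  # mapping, NFKC, prohibited/bidi checks
    ace = idna.ToASCII(label)       # Nameprep, then Punycode, then the "xn--" prefix
    return prepped, ace


if __name__ == "__main__":
    trad = "\u98DB\u6A5F"  # "airplane", U+98DB U+6A5F *fei1 ji1*
    simp = "\u98DE\u673A"  # "airplane", U+98DE U+673A *fei1 ji1*
    print(client_forms(trad), client_forms(simp))
    # Nameprep folds the compatibility ideograph U+F967 into U+4E0D ...
    print(idna.nameprep("\uF967") == "\u4E0D")
    # ... but does no localization: the two spellings get different ACE labels.
    print(client_forms(trad)[1] != client_forms(simp)[1])
```

Because both spellings survive Nameprep unchanged, any equivalence between them has to come from administrative policy, which is what the rest of this document addresses.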

The domain name protocols [STD13] also specify that characters are to be interpreted so that upper and lower case ASCII letters are considered equivalent. But with the introduction of Unicode characters beyond US-ASCII, and the possibility of representing a single character in multiple ways in ISO10646/Unicode [UNICODE], a normalization process known as Nameprep has been proposed to handle the more complex problems of character matching for those additional characters. Nameprep is also executed by client software as described in IDNA.

While Nameprep normalizes domain names so that users have an improved chance of getting the right domain name from information provided in other forms, as required for I18N, Nameprep does not handle any localization (L10N).

This becomes significant when a domain name holder attempts to use a Unicode string forming a "name", "word", or "phrase" that may have a certain meaning in a certain language or when used as a domain name. Such a Unicode string may have different variants in the context of the language or culture.

Generally, these localized variants in CJK can be classified into four categories, as described by Halpern et al. [C2C]: [Note2]

a. Character (or Code) variants

Character (or Code) variants refer to variants that are generated by character-by-character (or code-by-code) substitution.

An example in English would be "A" or "a" (U+0041 or U+0061). Two examples in Chinese would be U+98DB *fei1* or U+98DE *fei1*, and U+6A5F *ji1* or U+673A *ji1*.

Note that this does not mean that the choice between U+6A5F and U+673A is always symmetric like the one between "A" and "a" -- it is a choice only for Chinese, not for Japanese.

For some characters, the variant may simply be to drop them.
For example, points and vowel characters in Hebrew (U+05B0 to U+05C4) and Arabic (U+064B to U+0652) are optional; the variants for strings containing them are constructed by simply dropping those points and vowels.

Code variants may also occur when different code points are assigned to what visually or abstractly are the "same" character, possibly due to compatibility issues, typeface differences or script range. For example, LATIN CAPITAL LETTER A (U+0041) normally has an appearance identical to GREEK CAPITAL LETTER ALPHA (U+0391). CJK scripts have font variants for compatibility (either U+4E0D or U+F967 may be used) and "zVariant"s (e.g., U+5154 and U+514E).

The difficulty lies in defining which characters are the "same" and which are not.

b. Orthographic variants

Orthographic variants refer to variants that are generated by word-by-word substitution.

An example in English would be "color" and "colour".

It is possible for some of these orthographic variants to be generated by character variants. For example, "airplane" in Chinese may be either U+98DB U+6A5F *fei1 ji1* or U+98DE U+673A *fei1 ji1*.

Other orthographic variants may not be generated by character variants. For example, in Chinese, both U+767C *fa1* and U+9AEE *fa4* are related to U+53D1 *fa1 or fa4*, depending on the word. For "hair", U+5934 U+53D1 *tou2 fa4*, the variant should be U+982D U+9AEE *tou2 fa4*, not U+982D U+767C *tou2 fa1*.

c. Lexemic variants

Lexemic variants refer to variants that can be generated, when language is considered, by word-by-word substitution.

An example in English would be "cab", "taxi", or "taxicab".

An example in Chinese would be U+8CC7 U+8A0A *zi1 xun4* or U+4FE1 U+606F *xin4 xi1*.

Note that there is no relationship between U+8CC7 and U+4FE1 or between U+8A0A and U+606F; i.e., the sequence U+8CC7 U+606F *zi1 xi1* does not exist in Chinese.

d. Contextual variants

Contextual variants refer to variants that are generated by word-by-word substitutions with context considered.

In English, the word "plane" has different meanings and, depending on context, could be replaced by different equivalent words (synonyms): "airplane" in one sense, or other words for "plane" as a flat surface or as a device for smoothing wood. And, of course, "plain", which is pronounced the same way and is indistinguishable in speech-to-text contexts such as computer input systems for the visually impaired, is a different word entirely.

Similarly, the word U+6587 U+4EF6 *wen2 jian4* could mean either "document" (U+6587 U+4EF6 *wen2 jian4*) or "data file" (U+6A94 U+6848 *dang3 an4*), depending on context.

Although domain names were designed to be identifiers without any language context, users have not been prevented from using strings in domain names and interpreting them as "words" or "names". It is likely that users will do this with IDN as well. Therefore, given the added complications of using a much broader range of characters, precautions will be required when deploying IDN to minimize confusion and fraud.

The intention of these guidelines is to provide advice about the deployment of IDNs, with language consideration, but focusing only on the category of character variants, in order to increase the possibility of successful resolution and to reduce confusion while accepting inherent DNS limitations.

2. Definitions

Unless otherwise stated, the definitions of the terms used in this document are consistent with "Terminology Used in Internationalization in the IETF" [I18NTERMS].

"FQDN" refers to a fully-qualified domain name and "domain name label" refers to a label of a FQDN.

RFC3066 [RFC3066] defines a system for coding and representing languages.

ISO/IEC 10646 is a universal multiple-octet coded character set that is a product of ISO/IEC JTC1/SC2/WG2, Work Item JTC1.02.18 (ISO/IEC 10646). It is a multi-part standard: Part 1, published as ISO/IEC 10646-1:2000(E), covers the Architecture and Basic Multilingual Plane; Part 2, published as ISO/IEC 10646-2:2001(E), covers the supplementary (additional) planes.

The Unicode Consortium publishes "The Unicode Standard -- Version 3.0", ISBN 0-201-61633-5. In March 2002, the Unicode Consortium published Unicode Standard Annex #28. That annex defines Version 3.2 of The Unicode Standard, which is fully synchronized with ISO/IEC 10646-1:2000 (with Amendment 1).

The term "Unicode character" is used here to refer to characters chosen from The Unicode Standard Version 3.2 (and hence from ISO/IEC 10646). In this document, the characters are identified by their positions (or "code points"). The notation U+12AB, for example, indicates the character at the position 12AB (hexadecimal) in the Unicode 3.2 table.

Similarly, "Unicode string" refers to a string of Unicode characters. A Unicode string is identified by its sequence of Unicode characters, regardless of the encoding scheme.

The term "IDN" is often used to refer to many different things:

   (a) an abbreviation for "Internationalized Domain Name";
   (b) a fully-qualified domain name that contains at least one label that contains characters not appearing in ASCII;
   (c) a label of a domain name that contains at least one character beyond ASCII;
   (d) a Unicode string to be processed by Nameprep;
   (e) an IDN Package (in the context of this document);
   (f) a Nameprep-processed string;
   (g) a Nameprep- and Punycode-processed string;
   (h) the IETF IDN Working Group;
   (i) the ICANN IDN Committee;
   (j) other IDN activities in other companies/organizations, etc.

Because of the potential confusion, this document uses the term "IDN" only as an abbreviation for "Internationalized Domain Name".

Also, because this document provides guidelines to be applied on a per-zone basis, one label at a time, the term "Internationalized Domain Name Label" or "IDL" is used instead.

In this document, the term "registration" refers to the process by which a potential domain name holder requests that a label be placed in the DNS, either as an individual name within a domain or as a subdomain delegation from another domain name holder. A successful registration then leads to the label or delegation records being placed in the relevant zone file. The guidelines presented here are recommended for all zones, at any hierarchy level, in which CJK characters are to appear, not just domains at the first or second level.

CJK characters are characters commonly used in the Chinese, Japanese or Korean languages, including but not limited to ASCII (U+0020 to U+007F), Han Ideographs (U+3400 to U+9FAF and U+20000 to U+2A6DF), Bopomofo (U+3100 to U+312F and U+31A0 to U+31BF), Kana (U+3040 to U+30FF), Jamo (U+1100 to U+11FF and U+3130 to U+318F), Hangul (U+AC00 to U+D7AF and U+3130 to U+318F) and their respective compatibility forms.

3. Administrative Framework

Zone administrators are responsible for the administration of the domain name labels under their control. A zone administrator might be responsible for a large zone such as a Top Level Domain (TLD), generic or country code, or a smaller one such as a typical second or third level domain. A large zone is often more complex than a smaller one (sometimes it is just larger). However, the actual technical administrative tasks -- such as addition, deletion, delegation and transfer of zones between domain name holders -- are normally similar for all zones.

At the same time, different zones may have different policies and processes. For example, a pay-per-domain policy and registry/registrar model for .COM may not be applicable to such domains as .SG or .IBM.COM. The latter, for example, has very restrictive policies about who is permitted to have a domain name label under IBM.COM, the types of string that are permitted, and different procedures for obtaining those strings.

This document only provides guidelines for how CJK characters should be handled within a zone, how language issues should be considered and incorporated, and how domain name labels containing CJK characters should be administered (including registration, deletion and transfer of labels). It does not provide any guidance for the handling of non-CJK characters or languages in zones.

Other IDN policies, such as the creation of new TLDs or the cost structure for registrations, are outside the scope of this document. Such discussions should also be conducted in forums outside the IETF.

Technical implementation issues are not discussed here either. For example, the decision as to whether particular guidelines should be implemented as registry or registrar actions is left to zone administrators, possibly differing from zone to zone.

3.1. Principles underlying these Guidelines

In many places, this document assumes "First-Come-First-Served" (FCFS) as the conflict policy in the event of a dispute, although FCFS is not listed as one of the principles. If other policies dominate priorities and "rights", one can use these guidelines by replacing uses of FCFS in this document with the appropriate policy rules specific to the zone.
In other cases, some of these guidelines may not be applicable, although some alternatives for determining rights to labels -- such as use of UDRP or mutual exclusion -- might have little impact on other aspects of these guidelines.

(a) Each IDL to be registered should be associated with one or more languages.

Although some Unicode strings may be pure identifiers made up of an assortment of characters from many languages and scripts, IDLs are likely to be names or phrases that have a certain meaning in some language. While a zone administrator might or might not require "meaning" as a registration criterion, the possibility of meaning provides a useful tool when trying to avoid user confusion.

Zone administrators should administratively associate one or more languages with each IDL. These associations should either be pre-determined by the zone administrator and applied to the entire zone, or chosen by the registrants on a per-IDL basis. The latter may be necessary for some zones, but will make administration more difficult and will increase the likelihood of conflicts in variant forms.

A given zone might have multiple languages associated with it, or have no language specified at all, but doing so may provide additional opportunities for user confusion and is therefore not recommended.

The zone administrator must also verify the validity of the requested IDL by using information associated with the chosen language and possibly other rules as appropriate.

(b) When an IDL is registered, all of the character variants for the associated language(s) should be reserved for the registrant. Each language associated with the IDL will lead to different character variants.

IDL reservations of the type described here normally do not appear in the distributed DNS zone file. In other words, these reserved IDLs do not resolve.
Domain name holders could request that these reserved IDLs be placed in the zone file and made active and resolvable as, e.g., aliases or synonyms.

Since different languages may imply different sets of variants, the IDLs reserved for one IDL may overlap those reserved for another. In this case, the reserved IDLs should be bound to one registration or the other, or excluded from both, according to the applicable registration or dispute resolution policy for the zone.

(c) For a given base language, the IDL may have one or more recommended variants that should be suggested to the domain name holder for active registration as synonyms.

Some language rules may prefer certain variants over others. To increase the likelihood of correct and predictable resolution of the IDL by end users, the recommended variants should be active.

(d) The IDL and its reserved variants with the language(s) association must be atomic.

The IDL and its reserved variants for the associated language(s) are to be considered as a single unit -- an "IDL Package". For a given IDL, that IDL Package is defined by these guidelines and created upon registration.

The IDL Package is atomic: transfer and deletion of an IDL are performed on the IDL Package as a whole. An IDL, either active or reserved, within the IDL Package must not be transferred or deleted individually; i.e., any re-registration, transfer, or other action that impacts the IDL should also impact the reserved variants. Separate registration or other actions for the variants are not possible if these guidelines are to accomplish their purpose.

The conflict policy of the zone may result in violation of the IDL Package atomicity. In such a case, the conflict policy takes precedence.

3.2. Registration of IDL

Conforming to the principles described in Section 3.1, the registration of an IDL requires at least two components: the character variant tables for the language(s) and the registration algorithm.

3.2.1. Language character variant table

Any lines starting with, or portions of lines after, the hash symbol ("#") are treated as comments. Comments have no significance in the processing of the tables, nor are there any syntax requirements between the hash symbol and the end of the line. Blank lines in the tables are ignored completely.

Every language should have a character variant table provided by a relevant group (or organization or other body) and based on established standards. The group that defines a particular character variant table should document references to the appropriate standards at the beginning of the table, tagged with the word "Reference" followed by an integer (the reference number) followed by the description of the reference. For example,

   Reference 1 CP936 (commonly known as GBK)
   Reference 2 zVariant, zTradVariant, zSimpVariant in Unihan.txt
   Reference 3 List of Simplified character Table (Simplified column)
   Reference 4 zSimpVariant in Unihan.txt
   Reference 5 variant that exists in GB2312, common simplified hanzi

Each language character variant table must have a version number. This is tagged with the word "Version" followed by an integer, then followed by the date in the format YYYYMMDD, where YYYY is the 4-digit year, MM is the 2-digit month and DD is the 2-digit day of the publication date of the table. For example,

   Version 1 20020701    # July 2002 Version 1

The table has three fields, separated by semicolons. The fields are: "valid code point"; "recommended variant(s)"; and "character variant(s)".

Only code points listed in the "valid code point" field are allowed to be registered as part of an IDL associated with that language.

There can be one or more "recommended variant(s)" (i.e., entries in the "recommended variant(s)" column). If the "recommended variant(s)" column is empty, then there is no corresponding variant.

The "character variant(s)" column contains all variants of the code point, including but not limited to the code point itself and the "recommended variant(s)".

If a variant is composed of a sequence of code points, then the sequence of code points is listed, separated by spaces, in the "recommended variant(s)" or "character variant(s)" column.

If there are multiple variants, each variant must be separated by a comma in the "recommended variant(s)" or "character variant(s)" column.

Any code point listed in the "recommended variant(s)" column must be allowed, by the rules for the relevant language, to be registered. However, this is not a requirement for the entries in the "character variant(s)" column; some of those entries may not be allowed to be registered.

Every code point in the table should have a corresponding reference number (associated with the references) specified to justify the entry. The reference number is placed in parentheses after the code point. If there is more than one reference, then the numbers are placed within a single set of parentheses and separated by commas.

3.2.2. Formal syntax

This section uses the IETF "ABNF" metalanguage [ABNF].

   LanguageCharacterVariantTable = 1*ReferenceLine VersionLine 1*EntryLine
   ReferenceLine = "Reference" SP RefNo SP RefDescription [ Comment ] CRLF
   RefNo = 1*DIGIT
   RefDescription = *( VCHAR / SP )
   VersionLine = "Version" SP VersionNo SP VersionDate [ Comment ] CRLF
   VersionNo = 1*DIGIT
   VersionDate = 8DIGIT   ; YYYYMMDD
   EntryLine = ( VariantEntry / Comment ) CRLF
   VariantEntry = ValidCodePoint [ "(" RefList ")" ] ";" [ RecommendedVariant ]
                  ";" CharacterVariant [ Comment ]
   ValidCodePoint = CodePoint
   RefList = RefNo 0*( "," RefNo )
   RecommendedVariant = CodePointSet 0*( "," CodePointSet )
   CharacterVariant = CodePointSet 0*( "," CodePointSet )
   CodePointSet = CodePoint 0*( SP CodePoint )
   CodePoint = 4HEXDIG [ HEXDIG ] [ HEXDIG ]
   Comment = *WSP "#" *( VCHAR / WSP )

In VersionDate, YYYYMMDD is an integer representing a date, where YYYY is the 4-digit year, MM is the 2-digit month and DD is the 2-digit day.

3.2.3. Registration Algorithm

(An explanation of these steps follows them.)

   1. IN <= IDL to be registered and
      {L} <= Set of languages associated with IN
   2. {V} <= Set of version numbers of the language character
      variant tables derived from {L}
   3. NP(IN) <= Nameprep processed IN and
      check availability of NP(IN).
      If not available, route to conflict policy.
   4. For each AL in {L}
      4.1. Check validity of NP(IN) in AL. If failed, stop processing.
      4.2. PV(IN,AL) <= Set of available Nameprep processed recommended
           variants of NP(IN) in AL
      4.3. RV(IN,AL) <= Set of available Nameprep processed character
           variants of NP(IN) in AL
      4.4. End of Loop
   5. {PV} <= Set of all PV(IN,AL) with optional processing.
   6. {ZV} <= {PV} set-union {NP(IN)}
   7. {RV} <= Set of all RV(IN,AL) set-minus {ZV}
   8. Create IDL Package for IN using IN, {L}, {V}, {ZV} and {RV}
   9. Put {ZV} into zone file

Explanation

Step 1 takes the IDL to be registered and the associated language(s) as input to the process.

Step 2 extracts the set of version numbers of the tables for the associated language(s).

Step 3 performs Nameprep processing on the IDL. If the Nameprep-processed IDL is already registered or reserved, then the conflict policy is applied here. For example, if FCFS is used, the registration process stops here.

Step 4 goes through all languages associated with the proposed IDL, checks for validity in each language, and generates the recommended variants and the reserved variants.

In Step 4.1, IDL validation is done by checking that every code point in the Nameprep-processed IDL is a code point allowed by the "valid code point" column of the character variant table for the language. If one or more code points are invalid, the registration process must stop here.

Step 4.2 generates the list of recommended variants of the IDL by forming all possible combinations of the variants listed in the "recommended variant(s)" column for each code point in the Nameprep-processed IDL. Generated variants must be processed with Nameprep. If any of the recommended variants of the IDL is already registered or reserved, then the conflict policy is applied, although this does not prevent the IDL from being registered. For example, if FCFS is used, then the conflicting variant(s) are removed from the list.

Step 4.3 generates the list of reserved variants by forming all possible combinations of the variants listed in the "character variant(s)" column for each code point in the Nameprep-processed IDL. Generated variants must be Nameprep processed. If any of the variants are already registered or reserved, then the conflict policy applies here, although this does not prevent the IDL from being registered.
   For example, if FCFS is used, the conflicting variants are removed
   from the list.

   The "combination" in Step 4.2 and Step 4.3 can be achieved by a
   recursive function similar to the following pseudocode:

      Function Combination(Str)
         F <= first code point of Str
         SStr <= Substring of Str, without the first code point
         NSC <= {}

         If SStr is empty Then
            For each V in (Variants of code point F)
               NSC <= NSC set-union (the string consisting of the
                      code point V)
            End of Loop
         Else
            SubCom <= Combination(SStr)
            For each V in (Variants of code point F)
               For each SC in SubCom
                  NSC <= NSC set-union (the string with the
                         first code point V followed by the string SC)
               End of Loop
            End of Loop
         Endif

         Return NSC

   Step 5 generates the list of all recommended variants for all
   languages. Optionally, the algorithm may reduce this list by
   prompting the user to select among the recommended variants.

   Step 6 generates the list of variants, including the Nameprep
   processed IDL, that are to be activated, and Step 7 generates the
   list of reserved variants.

   Then, in Step 8, an "IDL Package" is created for the IDL, containing
   the original IDL, the associated language(s), the list of activated
   IDLs and the list of reserved variants. The version numbers of the
   language character variant tables are also stored in the IDL Package.

   Lastly, in Step 9, the activated IDLs are converted using ToASCII
   [IDNA] with UseSTD3ASCIIRules on and then put into the zone file. If
   the IDL is a subdomain name, it will be delegated. The activated IDLs
   may be delegated to a different domain name server, so long as they
   are owned by the same domain name holder.

3.3. Deletion and Transfer of IDL and IDL Package

   In normal domain administration, every domain name label is
   independent of all other domain name labels. Registration, deletion
   and transfer of domain name labels are done on a per-label basis.
   Depending on the zone's administrative policies, aliases (e.g.,
   "CNAME" entries) may be bound to particular labels, with rules about
   whether one can be changed without the other. Current policies in
   gTLDs generally prohibit registration of such aliases, in part to
   avoid needing to form and enforce policies about these change (or
   binding) rules.

   With internationalization, however, each IDL is bound to a list of
   variant IDLs (the list depending on the associated languages), and
   they are bound together in an IDL Package.

   Because all variants of the IDL should belong to a single domain name
   holder, the IDL Package should be treated as a single entity.
   Individual IDLs, either active or reserved, within the IDL Package
   must not be deleted or transferred independently of the other IDLs.
   Specifically, if an IDL is to be deleted or transferred, that action
   must be taken only as part of an action that affects the entire IDL
   Package.

   If the local conflict policy requires an IDL to be transferred or
   deleted independently of its IDL Package, the conflict policy takes
   precedence. In such an event, the conflict policy should be
   associated with a transfer or delete procedure that takes the IDL
   Package into consideration.

   When an IDL Package is deleted, all of its active and reserved
   variants become available again. Deleting an IDL Package does not
   change any other IDL Package, including IDL Packages that have
   variants conflicting with the variants in the deleted IDL Package.
   This is consistent with the atomicity and predictability of the IDL
   Package.

3.4. Activation and De-activation of IDL variants

   Because an IDL Package contains both active and inactive IDLs,
   processes are required to activate or de-activate IDL variants within
   it.

   The activation algorithm is described below:

   1. IN <= IDL to be activated & PA <= IDL Package
   2. NP(IN) <= Nameprep processed IN
   3. If NP(IN) not in {RV} then stop
   4. {RV} <= {RV} set-minus NP(IN) and {ZV} <= {ZV} set-union NP(IN)
   5. Put {ZV} into the zone file

   Similarly, the deactivation algorithm is:

   1. IN <= IDL to be deactivated & PA <= IDL Package
   2. NP(IN) <= Nameprep processed IN
   3. If NP(IN) not in {ZV} then stop
   4. {RV} <= {RV} set-union NP(IN) and {ZV} <= {ZV} set-minus NP(IN)
   5. Put {ZV} into the zone file

   In both algorithms, {RV} and {ZV} are the reserved and activated sets
   stored in PA.

3.5. Adding/Deleting language(s) association

   The list of variants is generated from the IDL and the tables for the
   associated languages. If the language associations change, the lists
   of variants have to be updated. On the other hand, the IDL Package is
   atomic and its list of variants must not be changed after creation.

   Therefore, this document recommends deleting the IDL Package and then
   registering again with the new set of languages, rather than
   attempting to add or delete language associations within the IDL
   Package. Zone administrators may find it desirable to devise
   procedures to prevent other parties from capturing the labels in the
   IDL Package during these operations.

3.6. Versioning of the language character variant tables

   Language character variant tables are subject to change over time,
   and the changes may or may not be backward compatible. Different
   versions of a language character variant table may therefore produce
   different sets of recommended variants and reserved variants.

   New IDL Packages should use the latest version of the language
   character variant tables.

   Existing IDL Packages created using a previous version of a language
   character variant table are not affected when a new version of that
   table is released.

4. Example of Guideline Adoption

   To provide a meaningful example, some language character variant
   tables have to be defined.
   Assume, then, that the following four language character variant
   tables are defined (note that these tables do not represent the
   actual tables and do not contain sufficient entries to be used in any
   actual implementation):

   a) Language character variant table for zh-cn and zh-sg

      Reference 1 CP936 (commonly known as GBK)
      Reference 2 zVariant, zTradVariant, zSimpVariant in Unihan.txt
      Reference 3 List of Simplified Character Table (Simplified column)
      Reference 4 zSimpVariant in Unihan.txt
      Reference 5 variant that exists in GB2312, common simplified hanzi

      Version 1 20020701 # July 2002

      56E2(1);56E2(5);5718(2)         # sphere, ball, circle; mass, lump
      5718(1);56E2(4);56E2(2),56E3(2) # sphere, ball, circle; mass, lump
      60F3(1);60F3(5);                # think, speculate, plan, consider
      654E(1);6559(5);6559(2)         # teach
      6559(1);6559(5);654E(2)         # teach, class
      6DF8(1);6E05(5);6E05(2)         # clear
      6E05(1);6E05(5);6DF8(2)         # clear, pure, clean; peaceful
      771E(1);771F(5);771F(2)         # real, actual, true, genuine
      771F(1);771F(5);771E(2)         # real, actual, true, genuine
      8054(1);8054(3);806F(2)         # connect, join; associate, ally
      806F(1);8054(3);8054(2),8068(2) # connect, join; associate, ally
      96C6(1);96C6(5);                # assemble, collect together

   b) Language character variant table for zh-tw

      Reference 1 CP950 (commonly known as BIG5)
      Reference 2 zVariant, zTradVariant, zSimpVariant in Unihan.txt
      Reference 3 List of Simplified Character Table (Traditional column)
      Reference 4 zTradVariant in Unihan.txt

      Version 1 20020701 # July 2002

      5718(1);5718(4);56E2(2),56E3(2) # sphere, ball, circle; mass, lump
      60F3(1);60F3(1);                # think, speculate, plan, consider
      6559(1);6559(1);654E(2)         # teach, class
      6E05(1);6E05(1);6DF8(2)         # clear, pure, clean; peaceful
      771F(1);771F(1);771E(2)         # real, actual, true, genuine
      806F(1);806F(3);8054(2),8068(2) # connect, join; associate, ally
      96C6(1);96C6(1);                # assemble, collect together

   c) Language character variant table for ja

      Reference 1 CP932 (commonly known as Shift-JIS)
      Reference 2 zVariant in Unihan.txt
      Reference 3 variant that exists in JIS X0208, commonly used Kanji

      Version 1 20020701 # July 2002

      5718(1);5718(3);56E3(2)         # sphere, ball, circle; mass, lump
      60F3(1);60F3(3);                # think, speculate, plan, consider
      654E(1);6559(3);6559(2)         # teach
      6559(1);6559(3);654E(2)         # teach, class
      6DF8(1);6E05(3);6E05(2)         # clear
      6E05(1);6E05(3);6DF8(2)         # clear, pure, clean; peaceful
      771E(1);771E(1);771F(2)         # real, actual, true, genuine
      771F(1);771F(1);771E(2)         # real, actual, true, genuine
      806F(1);806F(1);8068(2)         # connect, join; associate, ally
      96C6(1);96C6(3);                # assemble, collect together

   d) Language character variant table for ko

      Reference 1 CP949 (commonly known as EUC-KR)
      Reference 2 zVariant in Unihan.txt

      Version 1 20020701 # July 2002

      5718(1);56E2(1);56E3(2)         # sphere, ball, circle; mass, lump
      60F3(1);60F3(1);                # think, speculate, plan, consider
      654E(1);6559(1);6559(2)         # teach
      6DF8(1);6E05(1);6E05(2)         # clear
      771E(1);771F(1);771F(2)         # real, actual, true, genuine
      806F(1);8054(1);8068(2)         # connect, join; associate, ally
      96C6(1);96C6(1);                # assemble, collect together

   Example 1: IDL = (U+6E05 U+771F U+6559) *qing2 zhen1 jiao4*
              {L} = {zh-cn, zh-sg, zh-tw}

      NP(IN) = (U+6E05 U+771F U+6559)
      PV(IN,zh-cn) = (U+6E05 U+771F U+6559)
      PV(IN,zh-sg) = (U+6E05 U+771F U+6559)
      PV(IN,zh-tw) = (U+6E05 U+771F U+6559)
      {ZV} = {(U+6E05 U+771F U+6559)}
      {RV} = {(U+6E05 U+771E U+6559),
              (U+6E05 U+771E U+654E),
              (U+6E05 U+771F U+654E),
              (U+6DF8 U+771E U+6559),
              (U+6DF8 U+771E U+654E),
              (U+6DF8 U+771F U+6559),
              (U+6DF8 U+771F U+654E)}

   Example 2: IDL = (U+6E05 U+771F U+6559) *qing2 zhen1 jiao4*
              {L} = {ja}

      NP(IN) = (U+6E05 U+771F U+6559)
      PV(IN,ja) = (U+6E05 U+771F U+6559)
      {ZV} = {(U+6E05 U+771F U+6559)}
      {RV} = {(U+6E05 U+771E U+6559),
              (U+6E05 U+771E U+654E),
              (U+6E05 U+771F U+654E),
              (U+6DF8 U+771E U+6559),
              (U+6DF8 U+771E U+654E),
              (U+6DF8 U+771F U+6559),
              (U+6DF8 U+771F U+654E)}

   Example 3: IDL = (U+6E05 U+771F U+6559) *qing2 zhen1 jiao4*
              {L} = {zh-cn, zh-sg, zh-tw, ja, ko}

      NP(IN) = (U+6E05 U+771F U+6559)
      Invalid registration because U+6E05 is invalid in L = ko

   Example 4: IDL = (U+806F U+60F3 U+96C6 U+5718)
              *lian2 xiang3 ji2 tuan2*
              {L} = {zh-cn, zh-sg, zh-tw}

      NP(IN) = (U+806F U+60F3 U+96C6 U+5718)
      PV(IN,zh-cn) = (U+8054 U+60F3 U+96C6 U+56E2)
      PV(IN,zh-sg) = (U+8054 U+60F3 U+96C6 U+56E2)
      PV(IN,zh-tw) = (U+806F U+60F3 U+96C6 U+5718)
      {ZV} = {(U+8054 U+60F3 U+96C6 U+56E2),
              (U+806F U+60F3 U+96C6 U+5718)}
      {RV} = {(U+8054 U+60F3 U+96C6 U+56E3),
              (U+8054 U+60F3 U+96C6 U+5718),
              (U+806F U+60F3 U+96C6 U+56E2),
              (U+806F U+60F3 U+96C6 U+56E3),
              (U+8068 U+60F3 U+96C6 U+56E2),
              (U+8068 U+60F3 U+96C6 U+56E3),
              (U+8068 U+60F3 U+96C6 U+5718)}

   Example 5: IDL = (U+8054 U+60F3 U+96C6 U+56E2)
              *lian2 xiang3 ji2 tuan2*
              {L} = {zh-cn, zh-sg}

      NP(IN) = (U+8054 U+60F3 U+96C6 U+56E2)
      PV(IN,zh-cn) = (U+8054 U+60F3 U+96C6 U+56E2)
      PV(IN,zh-sg) = (U+8054 U+60F3 U+96C6 U+56E2)
      {ZV} = {(U+8054 U+60F3 U+96C6 U+56E2)}
      {RV} = {(U+8054 U+60F3 U+96C6 U+56E3),
              (U+8054 U+60F3 U+96C6 U+5718),
              (U+806F U+60F3 U+96C6 U+56E2),
              (U+806F U+60F3 U+96C6 U+56E3),
              (U+806F U+60F3 U+96C6 U+5718),
              (U+8068 U+60F3 U+96C6 U+56E2),
              (U+8068 U+60F3 U+96C6 U+56E3),
              (U+8068 U+60F3 U+96C6 U+5718)}

   Example 6: IDL = (U+8054 U+60F3 U+96C6 U+56E2)
              *lian2 xiang3 ji2 tuan2*
              {L} = {zh-cn, zh-sg, zh-tw}

      NP(IN) = (U+8054 U+60F3 U+96C6 U+56E2)
      Invalid registration because U+8054 is invalid in L = zh-tw

   Example 7: IDL = (U+806F U+60F3 U+96C6 U+5718)
              *lian2 xiang3 ji2 tuan2*
              {L} = {ja, ko}

      NP(IN) = (U+806F U+60F3 U+96C6 U+5718)
      PV(IN,ja) = (U+806F U+60F3 U+96C6 U+5718)
      PV(IN,ko) = (U+806F U+60F3 U+96C6 U+5718)
      {ZV} = {(U+806F U+60F3 U+96C6 U+5718)}
      {RV} = {(U+806F U+60F3 U+96C6 U+56E3),
              (U+8068 U+60F3 U+96C6 U+5718),
              (U+8068 U+60F3 U+96C6 U+56E3)}

i. Notes

   1. The terms "i18n" and "l10n", sometimes used in upper-case form
      (i.e., "I18N" and "L10N"), have become popular in international
      standards usage as abbreviations for "internationalization" and
      "localization", respectively. The abbreviations were derived by
      using the first and last letters of the words, with the number of
      characters that appear between them. For example, in
      "internationalization", there are 18 characters between the
      initial "i" and the terminal "n".

   2. Every human language is unique and, therefore, every linguistic
      and localization issue is also unique. It is difficult or
      impossible to make comparisons across multiple languages or to
      classify them into categories, and any cross-language analogies
      are, by their very nature, imperfect at best.

      For example, classifying Traditional Chinese/Simplified Chinese as
      upper/lower case makes as much sense as classifying TC/SC as
      "spelling variants" like "color" and "colour". Both comparisons
      are potentially useful, but neither is completely correct.

   3. The variants in CJK are very complex and require many different
      layers of solution. This guideline is one of the solution
      components, but it is not sufficient, by itself, to solve the
      whole problem.

ii. Acknowledgements

   The authors gratefully acknowledge the contributions of:

   V. CHEN, N. HSU, H. HOTTA, S. TASHIRO, Y. YONEYA and other Joint
   Engineering Team members at the JET meeting in Bangkok.

   Yves Arrouye, an observer at the JET meeting, for his contribution
   on the IDL Package.

   Soobok LEE
   L.M TSENG
   Patrik FALTSTROM
   Paul HOFFMAN
   Erin CHEN
   LEE Xiaodong
   Harald ALVESTRAND

iii. Author(s)

   James SENG
   PSB Certification
   3 Science Park Drive
   #03-12 PSB Annex
   Singapore 118233
   Phone: +65 6885-1657
   Email: jseng@pobox.org.sg

   Kazunori KONISHI
   JPNIC
   Kokusai-Kougyou-Kanda Bldg 6F
   2-3-4 Uchi-Kanda, Chiyoda-ku
   Tokyo 101-0047
   JAPAN
   Phone: +81 49-278-7313
   Email: konishi@jp.apan.net

   Kenny HUANG
   TWNIC
   3F, 16, Kang Hwa Street, Taipei
   Taiwan
   Phone: 886-2-2658-6510
   Email: huangk@alum.sinica.edu

   QIAN Hualin
   CNNIC
   No.6 Branch-box of No.349 Mailbox, Beijing 100080
   People's Republic of China
   Email: Hlqian@cnnic.net.cn

   KO YangWoo
   PeaceNet
   Yangchun P.O. Box 81 Seoul 158-600
   Korea
   Email: newcat@peacenet.or.kr

   John C KLENSIN
   1770 Massachusetts Ave, No. 322
   Cambridge, MA 02140
   USA
   Email: Klensin+ietf@jck.com

iv. Appendix A

   [How to read the Han Ideograph provided in this document. -- Will
   complete this section in next revision]

v. Normative References

   [ABNF]       Augmented BNF for Syntax Specifications: ABNF, RFC 2234,
                D. Crocker and P. Overell, Eds., November 1997.

   [I18NTERMS]  Terminology Used in Internationalization in the IETF,
                draft-hoffman-i18n-terms-07.txt, September 2002,
                Paul Hoffman, work in progress

   [RFC3066]    Tags for the Identification of Languages, RFC 3066,
                Jan 2001, H. Alvestrand

   [IDNA]       Internationalizing Domain Names in Applications,
                draft-ietf-idn-idna, Feb 2002, Patrik Faltstrom,
                Paul Hoffman, Adam M. Costello, work in progress

   [PUNYCODE]   Punycode: An encoding of Unicode for use with IDNA,
                draft-ietf-idn-punycode, Feb 2002, Adam M.
                Costello, work in progress

   [STRINGPREP] Preparation of Internationalized Strings,
                draft-hoffman-stringprep, Feb 2002, Paul Hoffman,
                Marc Blanchet, work in progress

   [NAMEPREP]   Nameprep: A Stringprep Profile for Internationalized
                Domain Names, draft-ietf-idn-nameprep, Feb 2002,
                Paul Hoffman, Marc Blanchet, work in progress

   [UNIHAN]     Unicode Han Database, Unicode Consortium,
                ftp://ftp.unicode.org/Public/UNIDATA/Unihan.txt

   [UNICODE]    The Unicode Consortium, "The Unicode Standard --
                Version 3.0", ISBN 0-201-61633-5. Unicode Standard
                Annex #28 (http://www.unicode.org/unicode/reports/tr28/)
                defines Version 3.2 of The Unicode Standard.

   [ISO7098]    ISO 7098:1991, Information and documentation --
                Romanization of Chinese, ISO/TC46/SC2.

vi. Non-normative References

   [IDN-WG]     IETF Internationalized Domain Names Working Group,
                idn@ops.ietf.org, James Seng, Marc Blanchet,
                http://www.i-d-n.net/

   [STD13]      Paul Mockapetris, "Domain names - concepts and
                facilities" (RFC 1034) and "Domain names -
                implementation and specification" (RFC 1035), STD 13,
                November 1987.

   [C2C]        Pitfalls and Complexities of Chinese to Chinese
                Conversion, http://www.cjk.org/cjk/c2c/c2c.pdf,
                Jack Halpern, Jouni Kerman

vii. Other Issues

   It is possible that many of the generated variants have no meaning
   in the associated language or languages. The intention is not to
   generate meaningful "words" but to generate similar variants to be
   reserved.

   The language character variant tables are critical to the success of
   the guideline. A badly designed table may either generate too many
   meaningless variants or fail to generate enough meaningful variants.
   The principles to be used to generate the tables are not within the
   scope of this document, nor are the tables themselves.
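   Because each generated label is one choice per code point, the size
   of a variant set is the product of the per-code-point candidate
   counts, which is why a permissive table quickly produces too many
   variants. The Python sketch below is illustrative only: it reads
   entry lines in the format given by the formal syntax, implements the
   Combination pseudocode of Section 3.2.3 with itertools.product, and
   counts the result. The names parse_entry, load_table, combination and
   variant_count are hypothetical, only single-code-point sets are
   handled, Nameprep and conflict checks are omitted, and it assumes, as
   the worked examples in Section 4 suggest, that a code point's
   candidates are the code point itself plus everything in its
   recommended and character variant columns.

```python
import re
from itertools import product
from math import prod

# Three entry lines copied from table a) (zh-cn/zh-sg) in Section 4.
ZH_CN_ROWS = [
    "6E05(1);6E05(5);6DF8(2)   # clear, pure, clean; peaceful",
    "771F(1);771F(5);771E(2)   # real, actual, true, genuine",
    "6559(1);6559(5);654E(2)   # teach, class",
]


def parse_entry(line):
    """Split one entry line into (valid code point, candidate list).

    The candidate list holds the valid code point followed by its
    recommended and character variants, in order and without duplicates.
    Reference lists such as "(5)" and the trailing comment are dropped.
    Only single-code-point sets are handled (assumption of this sketch);
    a multi-code-point set would make int() raise ValueError.
    """
    body = line.split("#", 1)[0].strip()
    candidates = []
    for column in body.split(";"):
        for item in column.split(","):
            item = re.sub(r"\(.*?\)", "", item).strip()
            if item:
                cp = int(item, 16)
                if cp not in candidates:
                    candidates.append(cp)
    return candidates[0], candidates


def load_table(lines):
    """Map each valid code point to its candidate list (hypothetical layout)."""
    return dict(parse_entry(line) for line in lines)


def combination(label, table):
    """Iterative equivalent of the recursive Combination pseudocode.

    Returns every label formed by choosing one candidate per code point;
    a code point missing from the table (i.e. invalid) raises KeyError.
    """
    return set(product(*(table[cp] for cp in label)))


def variant_count(label, table):
    """Number of labels combination() produces: the product of list sizes."""
    return prod(len(table[cp]) for cp in label)


if __name__ == "__main__":
    table = load_table(ZH_CN_ROWS)
    label = (0x6E05, 0x771F, 0x6559)                   # Example 1
    generated = combination(label, table)
    print(len(generated), len(generated - {label}))    # 8 7
    print(variant_count(label, table))                 # 8
```

   Run on the zh-cn rows, the sketch yields 8 labels for Example 1;
   removing the activated label leaves exactly the 7 reserved variants
   listed there. With three candidates per code point, a ten-code-point
   label would already produce 3**10 = 59049 labels.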
   This document recommends against registration of IDLs in a particular
   language until the language character variant table for that
   language is available.

   Outstanding Issues

   (1) Erin suggested (if I (JcK) correctly understood her) that, if
   multiple languages are associated with a given name, the recommended
   variant list for a given code point be treated as the intersection of
   the variant lists for each of the languages, not the union. As I
   understand the current algorithm, it effectively takes the union.
   Taking the intersection has the technical advantage that it would
   significantly reduce the number of variant strings that must be
   reserved. It also has the policy advantage of discouraging people
   from registering with multiple languages if they don't need to;
   otherwise, we will have everyone trying to register in all of the
   possibly-relevant languages, which would make this effort a good deal
   less effective than it might be.

   Taking the intersection is also consistent with a rule that appears
   to exist now. As shown in Example 3, if an attempt is made to
   register a name and associate it with multiple languages, it must be
   valid in all of those languages or the registration attempt will
   fail. So we intersect the validity criteria on a language basis, and
   should probably intersect the variants.

   But that is an algorithm change, since we have to extract the variant
   lists for each code point for each language, take the intersection,
   and then process against that, rather than against each language in
   turn.

   [JS - I disagree with taking the intersection of the sets. No doubt
   intersection would reduce the abuse of specifying multiple languages
   to enlarge the set of reserved variants, but our goal is precisely
   to reserve as many variants as possible for the domain name holder,
   not vice versa.

   Suppose we have a string ABC with variants ABD, ACD and ABF in
   Chinese, ABE and ACD in Japanese, and CBD and ACD in Korean.

   Assuming a registrant registers ABC in CJK, right now he will get the
   reserved set {ABD, ACD, ABF, ABE, CBD}.

   On the other hand, if we take the intersection, this set will be
   reduced to {ACD}, leaving other variants like ABF, ABE and CBD open
   to potential conflict. The only way he can protect against this
   confusion is to register ABF, ABE and CBD individually, which is
   exactly what we are trying to prevent.]

   [Further explanation by Erin:

   I'm sorry, maybe my previous suggestion was not clear enough.

   I mean that if multiple languages are associated with a given name,
   the range of valid code points should be the intersection of all the
   associated languages.

   But if multiple languages are associated with a given name, the
   recommended variants should be the union across the languages, and
   be put into the zone file. Likewise, the character variants should
   also be the union across the languages.]

   (2) A note went by indicating that the plan was to drop the Han
   characters from the IETF-submission version of this document. We can
   post I-Ds in PDF and publish RFCs in PDF and/or PostScript, as long as
   we provide ASCII. I find having the Han characters very useful, and
   trust that those of you who can read them find them even more so. So
   I would suggest that we hand off the pair of an ASCII document (with
   the Han characters removed) and a PDF document (that looks like the
   Word text we have been looking at) to the I-D editor. I've got full
   Acrobat here and can presumably produce the thing if needed.

   (3) We still need to sort out whether reserving a variant that may
   (in a current or future table) conflict with another character, with
   the possibility of activating it, is an invitation to cybersquatting
   and other abuses.
   That isn't clear, so let me try an illustration: suppose we have a
   character X, with variants A, B, and C, and a character Y, with
   variants D and C. Now, if Y is registered first, then its package
   includes {Y*, D, C}, using the symbol "*" to denote an active name.
   When X is registered, its package consists of {X*, A, B}. X's owner
   can't reserve or activate C, since it was reserved to Y. But much of
   the reason for doing all of this work was the concern that C can be
   confused with either Y or X. So doesn't this create an opportunity
   for Y's owner to threaten, or extort money from, X's owner by
   threatening to activate C?

   [JS -- The conflict of X & Y over C in this case could be resolved
   by existing conflict policy. The revised guideline now makes it
   possible to modify the IDL Package in the event of dispute.]

   That problem gets worse, I think, if Erin's suggestion in (1) is not
   adopted. And I continue to believe that the only solution that will
   work is to prevent anyone from activating C. Or, more generally, at
   any given time, there will be a set of language variant tables that
   will be considered valid by the administrator of a particular zone.
   The zone administrator would take the union of all of those tables,
   using the 'valid code point' as the key as usual, and then
   permanently reserve any character that appeared more than once in a
   variant column. Small matter of programming.

   (4) On page 9, in the paragraph starting with "The character
   variant(s) column contains ...":

   This seems to be saying that the code points listed in the third
   column will always be a proper superset of the union of the first
   and second columns. If that is correct, it violates a fundamental
   principle that I was taught about good programming and systems
   design: minimization of duplication of information, since such
   duplicates are error-prone.
   And, if I have not interpreted the intent correctly, the text needs
   to be fixed. Somehow.

   [JS -- Correct, it is duplicated. The duplication is bad from a
   system design view, but it makes the table 'complete' and easy to
   explain.]
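   The permanent-reservation rule proposed at the end of issue (3) can
   be sketched in a few lines of Python. This is one possible reading,
   not an agreed algorithm: each table is assumed to be a hypothetical
   mapping from a valid code point to the code points in its variant
   columns, and "appeared more than once in a variant column" is read as
   "listed as a variant of two or more different valid code points"
   across the union of the zone's tables, ignoring a code point's
   listing of itself. The function name permanently_reserved is likewise
   hypothetical.

```python
from collections import defaultdict


def permanently_reserved(tables):
    """Return the code points listed as a variant of more than one valid
    code point, taking the union of all tables keyed by valid code point.

    `tables` is an iterable of dicts (hypothetical layout) mapping a valid
    code point to an iterable of code points from its variant columns.
    """
    listed_under = defaultdict(set)  # variant -> valid code points listing it
    for table in tables:
        for valid_cp, variants in table.items():
            for variant in variants:
                if variant != valid_cp:  # a code point listing itself does not count
                    listed_under[variant].add(valid_cp)
    return {v for v, owners in listed_under.items() if len(owners) > 1}


if __name__ == "__main__":
    # The X/Y illustration from issue (3): C is a variant of both X and Y.
    table = {"X": ["A", "B", "C"], "Y": ["D", "C"]}
    print(permanently_reserved([table]))  # {'C'}
```

   Under this reading, C in the X/Y illustration would be held back
   permanently, so neither holder could activate it.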