idnits 2.17.1

draft-xdlee-cnnamestr-01.txt:

  ** The Abstract section seems to be numbered

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate.  This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  == The page length should not exceed 58 lines per page, but there was 1
     longer page, the longest (page 1) being 274 lines

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an Introduction section.

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack an Authors' Addresses Section.

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.

  ** There are 22 instances of too long lines in the document, the longest
     one being 1 character in excess of 72.

  ** There are 23 instances of lines with control characters in the document.

  ** The abstract seems to contain references ([DNSSEARCH]), which it
     shouldn't.  Please replace those with straight textual mentions of the
     documents in question.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not
     match the current year

  == The document seems to lack the recommended RFC 2119 boilerplate, even if
     it appears to use RFC 2119 keywords.

     (The document does seem to have the reference to RFC 2119 which the
     ID-Checklist requires).

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer.
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- Couldn't find a document date in the document -- date freshness check
     skipped.

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Unused Reference: 'STD13' is defined on line 259, but no explicit
     reference was found in the text

  == Unused Reference: 'ISO10646' is defined on line 265, but no explicit
     reference was found in the text

  == Unused Reference: 'Unicode3' is defined on line 269, but no explicit
     reference was found in the text

  -- Possible downref: Non-RFC (?) normative reference: ref. 'CTCC'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO10646'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'Unicode3'

  == Outdated reference: A later version (-06) exists of
     draft-klensin-dns-search-05

  -- Possible downref: Normative reference to a draft: ref. 'DNSSEARCH'

     Summary: 9 errors (**), 0 flaws (~~), 8 warnings (==), 6 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

Network Working Group                                      XiaoDong LEE
Internet-Draft                                              Kenny Huang
Expires: Nov 21, 2002                                         Erin Chen
                                                             Xiang DENG
                                                           YanFeng WANG

      Chinese Name String in Search-based access model for the DNS
                      draft-xdlee-cnnamestr-01.txt

Status of this Memo

This document is an Internet-Draft and is in full conformance with all
provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF), its areas, and its working groups. Note that other
groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

Copyright Notice
Copyright (C) The Internet Society (2001). All Rights Reserved.

Content
1. Abstract
2. Terminology
3. CNS equivalence
4. Requirements
5. Solution suggested
6. Encoding
7. Security Considerations
8. Authors' Addresses
9. Acknowledgements
10. References

1. Abstract
There is now considerable demand for internationalized, human-readable
Internet identifiers and names, and many systems based on DNS
technology have been built to meet it. John C. Klensin has proposed a
three-layer search-based access model for the DNS [DNSSEARCH]; this
paper only elaborates on some related problems mentioned in that
proposal.
In particular, it focuses on Traditional and Simplified Chinese
problems and some other special Chinese requirements.

The ultimate goal of any kind of search-based access system is to help
users access network resources in more natural ways, and what is
natural differs from one user group to another. For a system to be
valuable and human-friendly while respecting the language conventions
of Chinese users, it is very important that it deal with the
traditional and simplified Chinese equivalence problems.

2. Terminology
The key words "SHALL", "REQUIRED", "SHOULD", "RECOMMENDED", "MUST", and
"MAY" in this paper are to be interpreted as described in [RFC2119].

To describe the problem simply, we first define the following terms.

"TC" is an abbreviation for Traditional Chinese.

"SC" is an abbreviation for Simplified Chinese.

"CNS" is an acronym for Chinese Name String: a name string, the most
important facet mentioned in [DNSSEARCH], that contains at least one
Chinese character. For the scope of Chinese characters, please refer
to ISO/IEC 10646-1:2000(E) [second edition 2000-09-15]: if a character
is marked "C and G-Hanzi-T", it MUST be treated as a Chinese
character. This definition does not mean that the character is not
also a character of other countries that use HAN ideographs.

"TC-only CNS" is a CNS in which all characters are TC characters.

"SC-only CNS" is a CNS in which all characters are SC characters.

"Mixed-use TC and SC CNS" is a CNS that contains at least one
traditional and at least one simplified Chinese character.

3. CNS equivalence
The TC/SC equivalence problem is very complex and difficult to solve
perfectly; please refer to [CTCC]. Nevertheless, single-character
TC/SC equivalence falls into three main categories, and we should
solve each of them in turn. Once these three kinds of problems are
solved, most of the TC/SC problems will be solved, and the result will
be acceptable to most Chinese users.
a) One to one
   E.g. U+98A8 (TC, the wind) can be mapped to U+98CE (SC, the wind)
        U+5099 (TC, to prepare) can be mapped to U+5907 (SC, to prepare)
        U+908A (TC, a side) can be mapped to U+8FB9 (SC, a side)
b) One to many
   E.g. U+6FF1 (TC, the shore) can be mapped to U+6EE8,U+6D5C (SC, the
        shore)
        U+53C3 (TC, three, to take part in) can be mapped to U+53C2 (SC,
        to take part in) U+53C1 (SC, three)
        U+58DF (TC, a ridge or walkway in a field) can be mapped to
        U+5784,U+5785 (SC, a ridge or walkway in a field)
c) Many to one
   E.g. U+85F9,U+8B6A (TC, friendly) can be mapped to U+853C (SC,
        friendly)
        U+5225 (TC, to leave), U+5F46 (TC, awkward) can be mapped to
        U+522B (SC, to leave, awkward)
        U+93DF (TC, a shovel), U+5277 (TC, a shovel) can be mapped to
        U+94F2 (SC, a shovel)
The equivalence problem for a whole CNS is a combination of the three
categories above, so it is more complex than the single-character
case, but it can be processed one character at a time.

4. Requirements
These requirements SHOULD be considered by any system that supports
Chinese name strings.
a) TC and SC CNS equivalent matching
SC is derived from TC, and Chinese people use both SC and TC. Chinese
users therefore regard a TC CNS as equivalent to its corresponding SC
forms, and any implementation should meet this requirement.
b) Mixed TC and SC CNS cause an exponential problem
If we want a CNS to be resolved correctly in all of its TC/SC forms,
we could register every form, but the number of forms grows
exponentially with the length of the label. Most Chinese character
variants are in daily use. An ordinary Chinese Name String may have
dozens, hundreds, or even thousands of TC/SC variants. It is
unreasonable to ask users to register them all, hard for
administrators to manage them, and complex for the system to resolve
them. Whatever the kind of search-based access system (flat,
hierarchical, centrally controlled, and so on), it is not reasonable
for any administration to process these thousands of name strings
manually.
c) Some other special requirement
There are many differences in convention between Chinese and English,
for example in the order of name strings. English speakers could write
"Minneapolis, Minnesota" to represent a location, but Chinese speakers
would write it as "Minnesota, Minneapolis". So if a search-based
access system is permitted to use sequence attributes to represent
delegation or hierarchy, this kind of special requirement should be
satisfied.

5. Solution suggested
As mentioned in [DNSSEARCH], there are many challenges in doing
traditional and simplified Chinese equivalence, because HAN characters
are used not only in China but also in other countries, mostly in
Asia. It must be emphasized first that no method can solve traditional
and simplified Chinese equivalence perfectly and correctly; up to now,
the best algorithm achieves only about 99%, not 100%. That may be the
reason why no consensus has been reached in the IDN WG.

Search layer two has two facets, language and country
code/geographical location, which will be very useful in solving most
of the problems.
Based on these two facets, a system with a given language and country
code could pick appropriate rules for traditional and simplified
Chinese equivalence without any impact on other languages and
countries.

In Mainland China, for "One to One" and "Many to One", we could
convert all such TC characters into SC characters and then save the
SC-only CNS in the database used for Chinese name string resolution.
"One to Many", however, may depend on context. The system may handle
it with artificial intelligence methods, but unfortunately even the
best artificial intelligence algorithm cannot perform this conversion
completely. In my opinion, this kind of conversion should not be done
in layer two, which should give definite results from a few simple
facets; it should be done in layer three, or with user feedback to
confirm which name string the user wants. User feedback may be
collected while doing the conversion, or taken from the cached result
of the last conversion.

E.g.
a) One to one
   {[CN] + [zh-cn] + TC} --> {[CN] + [zh-cn] + SC}
b) Many to one
   {[CN] + [zh-cn] + TC1/TC2/.../TCn} --> {[CN] + [zh-cn] + SC}
c) One to many
                          User feedback
   {[CN] + [zh-cn] + TC} -------------------> {[CN] + [zh-cn] + SC1/.../SCn}

Finally, every Mixed-use TC and SC CNS should be converted into an
SC-only CNS before resolving, and only SC-only CNS are stored in the
resolving database on the server. Moreover, if we do want to implement
"One to Many" conversion in layer two, we could bind the TC CNS to one
of its corresponding SC forms on a "first come, first use" basis,
subject to a reasonableness principle: the binding process should
avoid binding two unrelated CNS and thereby causing meaningless
equivalent resolution.

As shown above, Mainland China could select conversion rules from TC
to SC; for TC areas, the reverse rules from SC to TC could be used.
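
The conversion described above can be illustrated with a minimal
sketch. It is not a definitive implementation: the names
TC_TO_SC_SINGLE, TC_TO_SC_MULTI, to_sc_only and the "choose" callback
are hypothetical, the tables hold only the example code points from
Section 3, and they stand for the rule set a system might select for
the facets [CN] + [zh-cn]. The "choose" callback stands in for user
feedback (or a cached earlier answer) in the "One to Many" case.

```python
from typing import Callable, Dict, List, Optional

# Assumed rule tables for the facets [CN] + [zh-cn]; the entries are only
# the examples from Section 3, not a complete TC/SC mapping.
# "One to One" and "Many to One": each TC character has exactly one SC form.
TC_TO_SC_SINGLE: Dict[str, str] = {
    "\u98A8": "\u98CE",  # one to one: the wind
    "\u5099": "\u5907",  # one to one: to prepare
    "\u908A": "\u8FB9",  # one to one: a side
    "\u85F9": "\u853C",  # many to one: friendly
    "\u8B6A": "\u853C",  # many to one: friendly
    "\u5225": "\u522B",  # many to one: to leave
    "\u5F46": "\u522B",  # many to one: awkward
    "\u93DF": "\u94F2",  # many to one: a shovel
    "\u5277": "\u94F2",  # many to one: a shovel
}

# "One to Many": the SC form depends on context, so a choice is required.
TC_TO_SC_MULTI: Dict[str, List[str]] = {
    "\u6FF1": ["\u6EE8", "\u6D5C"],  # the shore
    "\u53C3": ["\u53C2", "\u53C1"],  # to take part in / three
    "\u58DF": ["\u5784", "\u5785"],  # a ridge or walkway in a field
}

# Hypothetical stand-in for user feedback: given the TC character and its
# SC candidates, return the candidate the user wants.
Chooser = Callable[[str, List[str]], str]


def to_sc_only(cns: str, choose: Optional[Chooser] = None) -> str:
    """Convert a TC-only or Mixed-use CNS into an SC-only CNS."""
    out: List[str] = []
    for ch in cns:
        if ch in TC_TO_SC_SINGLE:
            out.append(TC_TO_SC_SINGLE[ch])
        elif ch in TC_TO_SC_MULTI:
            candidates = TC_TO_SC_MULTI[ch]
            if choose is None:
                raise ValueError(
                    f"U+{ord(ch):04X} is one-to-many; user feedback needed"
                )
            picked = choose(ch, candidates)
            if picked not in candidates:
                raise ValueError("feedback must select a listed SC form")
            out.append(picked)
        else:
            # Already SC, or not covered by these example tables.
            out.append(ch)
    return "".join(out)
```

Under these assumptions, a TC-only form and a Mixed-use form of the
same name reduce to the same SC-only string, so only that one string
would need to be stored, and a "One to Many" character is refused
unless feedback is supplied.
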
In this suggestion, user feedback is very important for One to Many
conversion. We simply provide a way to resolve CNS correctly; it
permits users to input an unconventional Mixed-use TC and SC CNS in a
given language and country or area, although in practice this does not
happen very often.

Some people suggest using a fuzziness level to determine matching
precision, so that users select which kind of conversion they want. I
think this is not useful for solving the TC/SC equivalence problem:
traditional and simplified Chinese equivalence is not a fuzziness
problem like the other fuzzy matching problems in a search-based
access system. Providing fuzziness-level Chinese matching will mislead
end users and will create a questionable namespace in layer two.
Chinese name strings should be processed by the same rules at the
system level, and those rules should not depend entirely on user
intention.

6. Encoding
In layer two and in layer three or above, we suggest using Unicode
directly as the encoding of Chinese characters. Any additional
encoding will increase system complexity and is unreasonable for a
long-term solution. Of course, local encodings are not prohibited, but
they should be converted into Unicode before interchange on the
Internet.

7. Security Considerations
This paper is just a complement to [DNSSEARCH], so it has the same
security considerations. The TC/SC CNS equivalence problem will not
bring any additional security problems into this search-based access
model.

8. Authors' Addresses
XiaoDong LEE
Chinese Academy of Sciences, CNNIC
4 South 4th Street, ZhongGuanCun, Beijing 100080
Phone: +86 10 62619750 ext. 3020
E-mail: lee@cnnic.net.cn

Kenny Huang
Taiwan Network Information Center (TWNIC)
4F-2, No.9 Sec. 2, Roosevelt Rd., Taipei, 100 Taiwan
E-mail: huangk@alum.sinica.edu

Erin Chen (also known as Yu Hsuan Chen)
Taiwan Network Information Center (TWNIC)
4F-2, No.9 Sec. 2, Roosevelt Rd., Taipei, 100 Taiwan
Phone: +886 2 23411313 ext. 502
E-mail: erin@twnic.net.tw

Xiang DENG
China Internet Network Information Center (CNNIC)
4 South 4th Street, ZhongGuanCun, Beijing 100080
Phone: +86 10 62619750 ext. 3018
E-mail: deng@cnnic.net.cn

YanFeng WANG
China Internet Network Information Center (CNNIC)
4 South 4th Street, ZhongGuanCun, Beijing 100080
Phone: +86 10 62619750 ext. 3022
E-mail: wyf@cnnic.net.cn

9. Acknowledgements
Thanks to the following people for their suggestions and efforts.
HuaLin QIAN hlqian@cnnic.net.cn ; CAS, CNNIC
Li-Ming Tseng ; NCU, TWNIC
Wei MAO mao@cnnic.net.cn ; CNNIC
Wen-Sung Chen ; TWNIC

10. References
[RFC2119] Scott Bradner, "Key words for use in RFCs to Indicate
Requirement Levels", March 1997, RFC 2119.

[STD13] Paul Mockapetris, "Domain names - implementation and
specification", November 1987, STD 13 (RFC 1034 and 1035).

[CTCC] Jack Halpern, Jouni Kerman, "The Pitfalls and Complexities of
Chinese to Chinese Conversion".

[ISO10646] ISO/IEC 10646-1:2000. International Standard - Information
technology -- Universal Multiple-Octet Coded Character Set (UCS) -- Part
1: Architecture and Basic Multilingual Plane.

[Unicode3] The Unicode Consortium, "The Unicode Standard -- Version 3.0",
ISBN 0-201-61633-5.

[DNSSEARCH] John C. Klensin, "A Search-based access model for the DNS",
draft-klensin-dns-search-05.txt, May 2001,