idnits 2.17.1 

draft-xdlee-cnnamestr-01.txt:
  ** The Abstract section seems to be numbered


  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate.  This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  == The page length should not exceed 58 lines per page, but there was 1
     longer page, the longest (page 1) being 274 lines


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an Introduction section.

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack an Authors' Addresses Section.

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.

  ** There are 22 instances of too long lines in the document, the longest
     one being 1 character in excess of 72.

  ** There are 23 instances of lines with control characters in the document.

  ** The abstract seems to contain references ([DNSSEARCH]), which it
     shouldn't.  Please replace those with straight textual mentions of the
     documents in question.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not
     match the current year

  == The document seems to lack the recommended RFC 2119 boilerplate, even if
     it appears to use RFC 2119 keywords. 

     (The document does seem to have the reference to RFC 2119 which the
     ID-Checklist requires).
  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- Couldn't find a document date in the document -- date freshness check
     skipped.


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Unused Reference: 'STD13' is defined on line 259, but no explicit
     reference was found in the text

  == Unused Reference: 'ISO10646' is defined on line 265, but no explicit
     reference was found in the text

  == Unused Reference: 'Unicode3' is defined on line 269, but no explicit
     reference was found in the text

  -- Possible downref: Non-RFC (?) normative reference: ref. 'CTCC'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO10646'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'Unicode3'

  == Outdated reference: A later version (-06) exists of
     draft-klensin-dns-search-05

  -- Possible downref: Normative reference to a draft: ref. 'DNSSEARCH' 


     Summary: 9 errors (**), 0 flaws (~~), 8 warnings (==), 6 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1	Network Working Group                                        XiaoDong LEE
2	Internet-Draft                                                Kenny Huang
3	Expires: Nov 21, 2002                                           Erin Chen
4	                                                               Xiang DENG
5	                                                             YanFeng WANG

7	  Chinese Name String in Search-based access model for the DNS
8	                  draft-xdlee-cnnamestr-01.txt

10	Status of this Memo

12	   This document is an Internet-Draft and is in full conformance with all
13	provisions of Section 10 of RFC2026.

15	   Internet-Drafts are working documents of the Internet Engineering Task
16	Force (IETF), its areas, and its working groups. Note that other
17	groups may also distribute working documents as Internet-Drafts.

19	   Internet-Drafts are draft documents valid for a maximum of six months
20	and may be updated, replaced, or obsoleted by other documents at any
21	time.  It is inappropriate to use Internet-Drafts as reference
22	material or to cite them other than as "work in progress."

24	   The list of current Internet-Drafts can be accessed at
25	   http://www.ietf.org/ietf/1id-abstracts.txt.

27	   The list of Internet-Draft Shadow Directories can be accessed at
28	   http://www.ietf.org/shadow.html.

30	Copyright Notice
31	   Copyright (C) The Internet Society (2001).  All Rights Reserved.

33	Content
34	1.	Abstract
35	2.	Terminology
36	3.	CNS equivalence
37	4.	Requirements
38	5.	Solution suggested
39	6.	Encoding
40	7.	Security Considerations
41	8.	Authors' Addresses
42	9.	Acknowledgements
43	10.	References

45	1. Abstract
46	There is growing demand for internationalized, human-readable
47	Internet identifiers and names, and many systems based on DNS
48	technology have been built to meet that demand. John C. Klensin has
49	proposed a three-layer search-based access model for the DNS
50	[DNSSEARCH]. This paper only discusses some of the problems raised
51	in that proposal. In particular, it focuses on Traditional and
52	Simplified Chinese equivalence and on some other requirements
53	specific to Chinese.

55	The ultimate goal of any search-based access system is to help users
56	access network resources in more natural ways, and what is natural
57	differs from one user group to another. A useful, human-friendly
58	system that respects the language conventions of Chinese users
59	therefore has to deal with traditional and simplified Chinese
60	equivalence.

62	2. Terminology
63	The key words "SHALL", "REQUIRED", "SHOULD", "RECOMMENDED", "MUST", and
64	"MAY" in this paper are to be interpreted as described in [RFC2119].

66	To describe the problem simply, we first define the following
67	terms.

69	"TC" is an abbreviation for Traditional Chinese.

71	"SC" is an abbreviation for Simplified Chinese.

73	"CNS" is an abbreviation for Chinese Name String: a name string (the
74	most important facet mentioned in [DNSSEARCH]) that contains at least
75	one Chinese character. For the scope of Chinese characters, refer to
76	ISO/IEC 10646-1:2000(E) [second edition 2000-09-15]: a character
77	marked "C and G-Hanzi-T" there MUST be treated as a Chinese
78	character. This definition does not imply that the character is not
79	also used in other countries that use HAN ideographs.

81	"TC-only CNS" is a CNS in which all characters are TC characters.

83	"SC-only CNS" is a CNS in which all characters are SC characters.

85	"Mixed-use TC and SC CNS" is a CNS that contains at least one
86	traditional and at least one simplified Chinese character.
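
The three CNS classes defined above can be illustrated with a minimal
sketch. The two sample character sets and the name classify_cns are
hypothetical, and the sketch assumes that characters outside both sets
are shared by TC and SC and so do not affect the class; a real system
would need complete character tables.

```python
# Tiny sample sets for illustration only (assumed, not complete tables).
TC_SPECIFIC = {"\u98a8", "\u5099", "\u908a"}   # traditional-only forms
SC_SPECIFIC = {"\u98ce", "\u5907", "\u8fb9"}   # simplified-only forms


def classify_cns(cns: str) -> str:
    """Classify a name string as TC-only, SC-only, mixed, or neutral.

    "neutral" (an assumed extra label) means no character in the string
    is specific to either TC or SC.
    """
    has_tc = any(ch in TC_SPECIFIC for ch in cns)
    has_sc = any(ch in SC_SPECIFIC for ch in cns)
    if has_tc and has_sc:
        return "mixed"
    if has_tc:
        return "TC-only"
    if has_sc:
        return "SC-only"
    return "neutral"
```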

88	3.	CNS equivalence
89	The TC/SC equivalence problem is very complex and difficult to solve
90	perfectly; see [CTCC]. Nevertheless, single-character TC/SC
91	equivalence falls mainly into three categories, which can be solved
92	one by one. Once these three categories are handled, most TC/SC
93	problems are solved, and the result will be acceptable to most
94	Chinese users.
95	a)	One to one
96	E.g. U+98A8 (TC, the wind) can be mapped to U+98CE (SC, the wind)
97	U+5099 (TC, to prepare) can be mapped to U+5907 (SC, to prepare)
98	U+908A (TC, a side) can be mapped to U+8FB9 (SC, a side)
99	b)	One to many
100	E.g. U+6FF1 (TC, the shore) can be mapped to U+6EE8,U+6D5C (SC, the
101	shore)
102	U+53C3 (TC, three, to take part in) can be mapped to U+53C2 (SC, to take
103	part in) or U+53C1 (SC, three)
104	U+58DF (TC, a ridge or walkway in a field) can be mapped to U+5784,U+5785
105	(SC, a ridge or walkway in a field)
106	c)	Many to one
107	E.g. U+85F9,U+8B6A (TC, friendly) can be mapped to U+853C (SC, friendly)
108	U+5225 (TC, to leave), U+5F46 (TC, awkward) can be mapped to U+522B
109	(SC, to leave, awkward)
110	U+93DF (TC, a shovel), U+5277 (TC, a shovel) can be mapped to U+94F2 (SC,
111	a shovel)
112	The equivalence problem for a whole CNS combines the three categories
113	above, so it is more complex than the single-character case, but a
114	CNS can still be processed one character at a time.
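
As a rough illustration, the three categories can be read off a single
TC-to-SC table. The code points below are copied from the examples in
this section; the table name TC_TO_SC and the function category are
hypothetical, and the table is only a sample, not a complete mapping.

```python
# TC code point -> list of candidate SC code points (sample data only).
TC_TO_SC = {
    0x98A8: [0x98CE],
    0x5099: [0x5907],
    0x908A: [0x8FB9],
    0x6FF1: [0x6EE8, 0x6D5C],
    0x53C3: [0x53C2, 0x53C1],
    0x58DF: [0x5784, 0x5785],
    0x85F9: [0x853C],
    0x8B6A: [0x853C],
    0x5225: [0x522B],
    0x5F46: [0x522B],
    0x93DF: [0x94F2],
    0x5277: [0x94F2],
}


def category(tc: int) -> str:
    """Return the equivalence category of one TC code point."""
    targets = TC_TO_SC[tc]
    if len(targets) > 1:
        return "one to many"
    # Count the TC code points that share the same single SC target.
    sources = [t for t, scs in TC_TO_SC.items() if targets[0] in scs]
    return "many to one" if len(sources) > 1 else "one to one"
```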

116	4.	Requirements
117	These requirements SHOULD be considered by any system that supports
118	Chinese name strings.
119	a)	TC and SC CNS equivalence matching
120	SC is derived from TC, and Chinese people use both SC and TC. Chinese
121	users therefore regard a TC CNS as equivalent to its corresponding SC
122	forms, and any implementation should meet this requirement.
123	b)	Mixed TC and SC CNS cause an exponential problem
124	To ensure that a CNS resolves correctly in both TC and SC forms, we
125	could register all of its forms, but the number of forms grows
126	exponentially with the length of the label. Most Chinese character
127	variants are in daily use, so an ordinary Chinese Name String may
128	have dozens, hundreds, or even thousands of TC/SC variants. That is
129	unreasonable for users to register, hard for administrators to
130	manage, and complex for the system to resolve. Whatever the kind of
131	search-based access system (flat, hierarchical, centrally
132	controlled, and so on), no administration can reasonably process
133	these thousands of name strings manually.
134	c)	Some other special requirements
135	There are many differences in convention between Chinese and
136	English, for example in the order of name strings. English speakers
137	write "Minneapolis, Minnesota" to represent a location, but Chinese
138	speakers would write "Minnesota, Minneapolis". So if a search-based
139	access system is permitted to use sequence attributes to represent
140	delegation or hierarchy, this kind of special requirement should be
141	satisfied.
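
The exponential growth described in requirement b) can be checked with
a short sketch. It assumes each character position is given as a list
of its acceptable TC/SC forms; the function names are hypothetical. The
number of registrable variants is the product of the list lengths, so a
label in which every character has two forms has 2^n variants.

```python
from itertools import product
from math import prod


def variant_count(forms_per_char):
    """Number of distinct TC/SC spellings of one name string."""
    return prod(len(forms) for forms in forms_per_char)


def all_variants(forms_per_char):
    """Enumerate every spelling (only practical for short labels)."""
    return ["".join(combo) for combo in product(*forms_per_char)]


# Three characters with two forms each, taken from the one-to-one
# examples in Section 3.
sample = [["\u98a8", "\u98ce"], ["\u5099", "\u5907"], ["\u908a", "\u8fb9"]]
```

Calling variant_count(sample) counts the spellings without building
them, which matters once the label grows beyond a few characters.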

143	5.  Solution suggested
144	As mentioned in [DNSSEARCH], traditional and simplified Chinese
145	equivalence poses many challenges, because HAN characters are used
146	not only in China but also in other countries, mostly in Asia. It
147	must be emphasized first that no method can solve traditional and
148	simplified Chinese equivalence perfectly: so far, the best algorithm
149	achieves only about 99%, not 100%. That may be the reason why no
150	consensus has been reached in the IDN WG.

152	Search layer two has two facets, language and country code/
153	geographical location, which are very useful for solving most of
154	these problems. Based on these two facets, a system with a given
155	language and country code can pick appropriate rules for traditional
156	and simplified Chinese equivalence without any impact on other
157	languages and countries.
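
One way to picture facet-based rule selection is a lookup keyed by the
two facets. This is only a sketch: the table RULES, the function
select_rule, and the direction strings are hypothetical, and the
("TW", "zh-tw") entry is an assumed stand-in for a TC area.

```python
from typing import Optional

# Hypothetical rule table keyed by (country code, language tag).
RULES = {
    ("CN", "zh-cn"): "TC->SC",   # facets used in this draft's examples
    ("TW", "zh-tw"): "SC->TC",   # assumed example of a TC area
}


def select_rule(country: str, language: str) -> Optional[str]:
    """Return the conversion direction for the given facets.

    None means no TC/SC conversion is applied, so other languages and
    countries are left untouched.
    """
    return RULES.get((country.upper(), language.lower()))
```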

159	In Mainland China, for "One to One" and "Many to One", we could
160	convert all such TC characters into SC characters and then save the
161	SC-only CNS in the database used for Chinese name string resolution.
162	"One to Many", however, may depend on context. The system may handle
163	it with artificial intelligence methods, but unfortunately even the
164	best artificial intelligence algorithm cannot solve this conversion
165	completely. In my opinion, this kind of conversion should not be
166	done in layer two, which should give a definite result from a few
167	simple facets; it should be done in layer three, or user feedback
168	should determine which name string is wanted. User feedback may be
169	gathered during conversion, or a cached earlier result may be reused.

171	E.g.
172	a)	One to one
173	{[CN] + [zh-cn] + TC} --> {[CN] + [zh-cn] + SC}
174	b)	Many to one
175	{[CN] + [zh-cn] + TC1/TC2/.../TCn}  --> {[CN] + [zh-cn] + SC}
176	c)	One to many
177	                       User feedback
178	{[CN] + [zh-cn] + TC} -------------------> {[CN] + [zh-cn] + SC1/.../SCn}
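
The three cases above can be sketched as one normalization function
for the [CN] + [zh-cn] facets. The names to_sc_only, ONE_TARGET,
MANY_TARGETS, and the choose callback (standing in for user feedback)
are hypothetical; the tables hold only sample characters from
Section 3.

```python
from typing import Callable, Dict, List

# "One to one" and "many to one": each TC character has a single SC form.
ONE_TARGET: Dict[str, str] = {
    "\u98a8": "\u98ce",
    "\u85f9": "\u853c",
    "\u8b6a": "\u853c",
}
# "One to many": the SC form has to be chosen, e.g. by the user.
MANY_TARGETS: Dict[str, List[str]] = {
    "\u6ff1": ["\u6ee8", "\u6d5c"],
}


def to_sc_only(cns: str, choose: Callable[[str, List[str]], str]) -> str:
    """Convert a TC or mixed CNS to an SC-only CNS, character by character."""
    out = []
    for ch in cns:
        if ch in ONE_TARGET:
            out.append(ONE_TARGET[ch])
        elif ch in MANY_TARGETS:
            # Ask the feedback callback which candidate is meant.
            out.append(choose(ch, MANY_TARGETS[ch]))
        else:
            out.append(ch)  # already SC, or shared by TC and SC
    return "".join(out)
```

A caller could pass a callback that prompts the user, or one that
returns a previously cached choice.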

180	Finally, every Mixed-use TC and SC CNS should be converted into an
181	SC-only CNS before resolution, and only SC-only CNS are stored in
182	the server's resolution database. Moreover, if we do want to
183	implement "One to Many" conversion in layer two, we could bind the TC
184	CNS to one of its corresponding SC forms on a "first come, first
185	use" basis, subject to a reasonableness principle: the binding must
186	not tie together two unrelated CNS and so cause meaningless matches.
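
A "first come, first use" binding could look roughly like the sketch
below. The class FirstComeBinder and its methods are hypothetical, and
the reasonableness principle is reduced here to one assumed check: the
requested SC form must be among the known candidates for the TC CNS.

```python
from typing import Dict, List, Optional


class FirstComeBinder:
    """Bind each TC CNS to the first acceptable SC form requested."""

    def __init__(self) -> None:
        self._bound: Dict[str, str] = {}

    def bind(self, tc_cns: str, sc_form: str, candidates: List[str]) -> str:
        # Refuse to bind unrelated strings (assumed form of the check).
        if sc_form not in candidates:
            raise ValueError("SC form is not a candidate for this TC CNS")
        # setdefault keeps an earlier binding if one already exists.
        return self._bound.setdefault(tc_cns, sc_form)

    def resolve(self, tc_cns: str) -> Optional[str]:
        """Return the bound SC form, or None if nothing is bound yet."""
        return self._bound.get(tc_cns)
```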

188	As shown above, Mainland China could select conversion rules from TC
189	to SC; in TC areas, the opposite rules, from SC to TC, could be
190	used. In this suggestion, user feedback is very important for One to
191	Many conversion. We only provide a way to resolve CNS correctly; it
192	permits users to input an unconventional Mixed-use TC and SC CNS in
193	a given language and country or area, although in practice that
194	does not happen very often.

196	Some people suggest using a fuzziness level to determine matching
197	precision, letting users select the kind of conversion they want. I
198	think this does not help solve the TC/SC equivalence problem, because
199	traditional and simplified Chinese equivalence is not a fuzziness
200	problem like other fuzzy matching problems in a search-based access
201	system. Offering fuzziness levels for Chinese matching would mislead
202	end users and make the namespace in layer two questionable. Chinese
203	name strings should follow the same processing rules at the system
204	level, and these rules should not depend entirely on user intention.

206	6. Encoding
207	In layer two and in layer three or above, we suggest encoding Chinese
208	characters in Unicode directly. Any additional encoding would
209	increase system complexity and is unreasonable for a long-term
210	solution. Local encodings are not forbidden, of course, but they
211	should be converted to Unicode before interchange on the Internet.
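
Converting a local encoding to Unicode before interchange might look
like the sketch below. The function names are hypothetical. GB2312 and
Big5 are used only as examples of local encodings, and UTF-8 as the
interchange form is an assumption, since this section does not name a
Unicode encoding form.

```python
def to_unicode(raw: bytes, local_encoding: str) -> str:
    """Decode bytes in a local encoding (e.g. "gb2312", "big5")."""
    return raw.decode(local_encoding)


def to_wire(text: str) -> bytes:
    """Encode Unicode text for interchange (UTF-8 assumed here)."""
    return text.encode("utf-8")


# The same pipeline handles an SC name in GB2312 and a TC name in Big5.
sc_wire = to_wire(to_unicode("\u98ce".encode("gb2312"), "gb2312"))
tc_wire = to_wire(to_unicode("\u98a8".encode("big5"), "big5"))
```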

213	7. Security Considerations
214	This paper only complements [DNSSEARCH], so it has the same security
215	considerations. The TC/SC CNS equivalence problem does not bring any
216	additional security problems into this search-based access model.

218	8.	Authors' Addresses
219	XiaoDong LEE
220	Chinese Academy of Sciences, CNNIC
221	4 South 4th Street, ZhongGuanCun, Beijing 100080
222	Phone: +86 10 62619750 ext. 3020
223	E-mail: lee@cnnic.net.cn

225	Kenny Huang
226	Taiwan Network Information Center (TWNIC)
227	4F-2, No.9 Sec. 2, Roosevelt Rd., Taipei, 100 Taiwan
228	E-mail: huangk@alum.sinica.edu

230	Erin Chen (also known as Yu Hsuan Chen)
231	Taiwan Network Information Center (TWNIC)
232	4F-2, No.9 Sec. 2, Roosevelt Rd., Taipei, 100 Taiwan
233	Phone: +886 2 23411313 ext. 502
234	E-mail: erin@twnic.net.tw

236	Xiang DENG
237	China Internet Network Information Center(CNNIC)
238	4 South 4th Street, ZhongGuanCun, Beijing 100080
239	Phone: +86 10 62619750 ext. 3018
240	E-mail: deng@cnnic.net.cn

242	YanFeng WANG
243	China Internet Network Information Center(CNNIC)
244	4 South 4th Street, ZhongGuanCun, Beijing 100080
245	Phone: +86 10 62619750 ext. 3022
246	E-mail: wyf@cnnic.net.cn

248	9.	Acknowledgements
249	Thanks to the following people for their suggestions and efforts.
250	HuaLin QIAN <hlqian@cnnic.net.cn>; CAS, CNNIC
251	Li-Ming Tseng <tsenglm@csie.ncu.edu.tw>; NCU, TWNIC
252	Wei MAO <mao@cnnic.net.cn>; CNNIC
253	Wen-Sung Chen <wschen@twnic.net.tw>; TWNIC

255	10. References
256	[RFC2119] Scott Bradner, Key words for use in RFCs to Indicate
257	Requirement Levels, March 1997, RFC 2119.

259	[STD13]   Paul Mockapetris, Domain names - implementation and
260	specification, November 1987, STD 13 (RFC 1034 and 1035).

262	[CTCC]    The Pitfalls and Complexities of Chinese to Chinese Conversion
263	 Jack Halpern, Jouni Kerman

265	[ISO10646] ISO/IEC 10646-1:2000. International Standard - Information
266	technology -- Universal Multiple-Octet Coded Character Set (UCS) -- Part
267	1: Architecture and Basic Multilingual Plane.

269	[Unicode3] The Unicode Consortium, "The Unicode Standard -- Version 3.0",
270	           ISBN 0-201-61633-5.

272	[DNSSEARCH] John C. Klensin, "A Search-based access model for the DNS",
273	            draft-klensin-dns-search-05.txt, May 2001,