idnits 2.17.1 

draft-ietf-idn-cjk-01.txt:
-(1): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate.  This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** Missing document type: Expected "INTERNET-DRAFT" in the upper left hand
     corner of the first page

  == There are 7 instances of lines with non-ascii characters in the document.

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  == The page length should not exceed 58 lines per page, but there was 1
     longer page, the longest (page 1) being 454 lines


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack a Security Considerations section.

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack an Authors' Addresses Section.

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.

  ** The abstract seems to contain references ([CNRP]), which it shouldn't. 
     Please replace those with straight textual mentions of the documents in
     question.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- Couldn't find a document date in the document -- date freshness check
     skipped.


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: 'RFC1035' is mentioned on line 71, but not defined

  == Missing Reference: 'UTR21' is mentioned on line 82, but not defined

  == Missing Reference: 'UTR15' is mentioned on line 198, but not defined

  == Unused Reference: 'UNISTD3' is defined on line 413, but no explicit
     reference was found in the text

  == Unused Reference: 'IDN' is defined on line 416, but no explicit
     reference was found in the text

  == Unused Reference: 'CJKV' is defined on line 422, but no explicit
     reference was found in the text

  == Unused Reference: 'C2C' is defined on line 424, but no explicit
     reference was found in the text

  == Unused Reference: 'KANJIDIC' is defined on line 428, but no explicit
     reference was found in the text

  == Unused Reference: 'UNICHART' is defined on line 431, but no explicit
     reference was found in the text

  == Unused Reference: 'ISO11941' is defined on line 438, but no explicit
     reference was found in the text

  == Unused Reference: 'KimK 1990' is defined on line 443, but no explicit
     reference was found in the text

  == Unused Reference: 'KimK 1992' is defined on line 447, but no explicit
     reference was found in the text

  == Unused Reference: 'KimK 1999' is defined on line 451, but no explicit
     reference was found in the text

  -- Possible downref: Non-RFC (?) normative reference: ref. 'UNISTD3'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'UCS'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'IDN'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'CNRP'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'CJKV'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'C2C'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'KANJIDIC'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'UNICHART'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ZONGBIAO'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'UNIHAN'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO11941'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'KimK 1990'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'KimK 1992'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'KimK 1999'


     Summary: 7 errors (**), 0 flaws (~~), 16 warnings (==), 16 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1	Internet Draft                                                James SENG
2	<draft-ietf-idn-cjk-01.txt>                               Yoshiro YONEYA
3	11th Apr 2001                                                Kenny HUANG
4	Expires 11 Oct 2001                                         KIM Kyongsok

6	        Han Ideograph (CJK) for Internationalized Domain Names

8	Status of this Memo

10	    This document is an Internet-Draft and is in full conformance
11	    with all provisions of Section 10 of RFC2026.

13	    Internet-Drafts are working documents of the Internet
14	    Engineering Task Force (IETF), its areas, and its working
15	    groups. Note that other groups may also distribute working
16	    documents as Internet-Drafts.

18	    Internet-Drafts are draft documents valid for a maximum of
19	    six months and may be updated, replaced, or obsoleted by other
20	    documents at any time. It is inappropriate to use Internet-
21	    Drafts as reference material or to cite them other than as
22	    "work in progress."

24	    The list of current Internet-Drafts can be accessed at
25	    http://www.ietf.org/ietf/1id-abstracts.txt

27	    The list of Internet-Draft Shadow Directories can be accessed at
28	    http://www.ietf.org/shadow.html.

30	Abstract

32	During the development of Internationalized Domain Names (IDN), it was
33	discovered that there is a substantial lack of information about, and
34	misunderstanding of, Han ideographs and their folding mechanism.

36	This document attempts to address some of the issues of Han
37	folding with respect to IDN. Hopefully, it will dispel some of the
38	common misunderstandings of this problem and discuss some of the
39	issues with Han ideographs and their folding mechanism.

41	This document addresses a problem very specific to IDN and thus is not
42	meant as a reference for generic Han folding. Generic Han folding is
43	much more complicated and certainly beyond this document. However,
44	this document may be applicable to other areas that are related
45	to names, e.g. Common Name Resolution Protocol [CNRP].

47	1. Definition and convention

49	Characters mentioned in this document are identified by their position
50	or code point in the Unicode character set [UCS]. The notation U+12AB,
51	for example, indicates the character at the position 12AB (hexadecimal)
52	in the [UCS]. It is strongly recommended that a [UCS] table be available
53	for reference for the ideographs described.
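As a reading aid, the notation can be converted to and from actual
characters mechanically. The following is a minimal sketch; the helper
names to_u_notation and from_u_notation are hypothetical and are not
defined by this document or by [UCS].

```python
def to_u_notation(text):
    """Render each character of text in the U+XXXX notation used here."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


def from_u_notation(notation):
    """Parse a space-separated run such as 'U+6F22 U+5B57' into a string."""
    return "".join(chr(int(token[2:], 16)) for token in notation.split())


# Round trip for the two code points quoted for 'hanzi' in this section.
hanzi = from_u_notation("U+6F22 U+5B57")
print(to_u_notation(hanzi))
```

The sketch assumes well-formed input in which every token starts with
"U+" followed by hexadecimal digits.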

55	Han ideographs are defined as the Chinese ideographs in the range
56	U+3400 to U+9FFF, commonly known as CJK Unified Ideographs. This
57	covers Chinese 'hanzi' {U+6F22 U+5B57/U+6C49 U+5B57}, Japanese 'kanji'
58	{U+6F22 U+5B57} and Korean 'hanja' {U+6F22 U+5B57/U+D55C U+C790}.
59	Additional Han ideographs will appear in other locations (not
60	necessarily in plane 0) in the future.

62	Conversion between ideographs can be done using four different
63	approaches: code-based substitution, character-based substitution,
64	lexicon-based substitution and context-based substitution. Han folding
65	refers only to code-based substitution, similar to case mapping of
66	alphabetic characters.

68	2. Introduction

70	Traditionally, domain names have been case insensitive (as defined in
71	[RFC1035] Section 2.3.3). While this is not a problem when domain names
72	are restricted to English letters and digits, it becomes a serious
73	problem for IDN. An important criterion for having a robust IDN is to
74	have good normalization and canonicalization forms. This is to ensure
75	that domain name duplications are kept to a minimum.

77	Fortunately, the Unicode Consortium is developing technical reports on
78	canonicalization [UTR21] and normalization [UTR15]. Hence, it becomes
79	simple for IDN to build upon the work of Unicode and use these
80	references.

82	Unfortunately, both [UTR15] and [UTR21] are limited in scope and do not
83	address many other scripts. In particular, Han ideographs are not
84	discussed in detail in these documents and most experts are quick to
85	point out that this problem is technically impossible.

87	2.1 Han ideographs

89	While there are many forms or writing styles for Chinese characters, the
90	most commonly used, 'zhengti' {U+6B63 U+4F53/U+6B63 U+9AD4}, represents
91	Chinese ideographs by radicals (U+2E80-U+2FDF) that are composed of
92	simple strokes.

94	When the Unicode Consortium started work on the Universal Character Set,
95	it was suggested that Hanzi, Kanji and Hanja ideographs should be unified
96	into a single code space. This resulted in the CJK Unification, whereby
97	27,786 Han ideographs are allocated in the U+3400-U+9FFF and U+F900-U+FAFF
98	ranges. Another 41,000 Han ideographs will be added to Plane 2.

100	Ideographs are common in China, Korea and Japan, but as ideographs spread
101	and evolved, their forms sometimes came to differ slightly from
102	country to country. For example, the word 'villa' is {U+838A} 'zhuang' in
103	Chinese but {U+8358} 'sou' in Japanese. These are given different code
104	points in Unicode.

106	3. Chinese (Hanzi)

108	Chinese ideographs or hanzi {U+6F22 U+5B57/U+6C49 U+5B57} originated
109	from pictographs. They are 'pictures' which evolved into ideographs
110	over several thousand years. For instance, the ideograph for "hill"
111	{U+5C71} still bears some resemblance to the 3 peaks of a hill.

113	Not all ideographs are pictographs. There are other classifications such
114	as compound ideographs, phonetic ideographs, etc. For example,
115	'endurance' {U+5FCD} is a pierced 'knife' {U+5200} above the 'heart'
116	{U+5FC3}, or as a Chinese saying goes, 'endurance is like having a
117	pierced knife in your heart'.

119	Hence, almost every Han ideograph carries some meaning by
120	itself, which is very different from most other scripts. This causes some
121	confusion that Han folding is a form of lexicon-based substitution.

123	Chinese ideographs underwent a major change in the 1950s after the
124	establishment of the People's Republic of China. A committee on Language
125	Reform was established in China whose activities included simplification
126	of Chinese ideographs. Simplified Chinese (SC) is used in China
127	and Singapore, and Traditional Chinese (TC) in Taiwan, Hong Kong PRC,
128	Macau PRC, and most other overseas Chinese communities.

130	The process is to take complex ideographs and simplify them. The main
131	purpose is to make them easier to remember and write and thus to raise
132	the literacy of the population.

134	For example, 'lightning' TC {U+96FB} becomes SC {U+7535} (the
135	'rain' {U+96E8} part of the TC is dropped). In many cases, the SC bears
136	no resemblance to the original traditional form, e.g. 'dragon' TC
137	{U+9F8D} SC {U+9F99}. Two different TC may also have the same SC since
138	it means fewer ideographs to learn, e.g. SC {U+53D1} can be {U+767C} or
139	{U+9AEE} depending on semantics. The official 'Comprehensive List of
140	Simplified Characters', latest published in 1986, lists 2244 SC
141	[ZONGBIAO].

143	Therefore, the process of SC-to-TC is very complicated. It is not
144	possible to do it accurately without considering the semantics of the
145	phrase.
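The ambiguity can be seen by inverting a TC-to-SC table. The sketch below
is illustrative only: the three-entry table is an assumption made for this
example (a real table would hold thousands of entries), and the names
TC_TO_SC and invert are hypothetical.

```python
from collections import defaultdict

# Tiny illustrative TC-to-SC table, keyed by traditional ideograph.
TC_TO_SC = {
    "\u767C": "\u53D1",  # TC {U+767C} -> SC {U+53D1}
    "\u9AEE": "\u53D1",  # TC {U+9AEE} -> the same SC {U+53D1}
    "\u9F8D": "\u9F99",  # TC 'dragon' {U+9F8D} -> SC {U+9F99}
}


def invert(table):
    """Build the SC-to-TC relation; one SC may map to several TC."""
    inverse = defaultdict(set)
    for tc, sc in table.items():
        inverse[sc].add(tc)
    return dict(inverse)


SC_TO_TC = invert(TC_TO_SC)

# Simplified ideographs for which code substitution alone cannot pick a TC.
ambiguous = {sc for sc, tcs in SC_TO_TC.items() if len(tcs) > 1}
```

Any SC in the ambiguous set needs semantic context before it can be
converted back, which is why SC-to-TC cannot be a pure code substitution.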

147	On the other hand, TC-to-SC is much simpler, although different TCs may
148	map to one single SC. While Unicode does not handle TC & SC, the
149	informal [UNIHAN] document lists 2145 TC and their equivalent SC
150	mappings. However, because that document is informal and not part of the
151	Unicode standard, it is incomplete and has mistakes in the code points.
152	Hence, precise tables for TC-to-SC conversion have not been fully laid
153	out.

155	In domain names, we are particularly interested in equivalence
156	comparison of names, not in converting SC-to-TC. Therefore, for
157	this purpose, equivalence matching can be done by applying
158	TC-to-SC folding prior to comparison, similar to lower-casing English
159	strings before comparing them, e.g. 'taiwan' SC {U+53F0 U+6E7E} will
160	match TC {U+81FA U+7063} or TC {U+53F0 U+7063}.

162	The side effect of this method is that comparing SC {U+53D1} to TC
163	{U+767C} or TC {U+9AEE} will both be positive. This implies that
164	'hair' SC {U+5934 U+53D1} will match TC
165	{U+982D U+9AEE}. It will also match TC {U+982D U+767C}, which does not
166	have any meaning in Chinese.
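Fold-then-compare, together with its side effect, can be sketched as
follows. This is a sketch under stated assumptions and not a proposed
implementation: the five-entry table is hypothetical, and fold and
same_name are illustrative names.

```python
# Hypothetical excerpt of a TC-to-SC folding table.
TC_TO_SC = {
    "\u81FA": "\u53F0",  # TC {U+81FA} -> SC {U+53F0}
    "\u7063": "\u6E7E",  # TC {U+7063} -> SC {U+6E7E}
    "\u982D": "\u5934",  # TC {U+982D} -> SC {U+5934}
    "\u767C": "\u53D1",  # TC {U+767C} -> SC {U+53D1}
    "\u9AEE": "\u53D1",  # TC {U+9AEE} -> SC {U+53D1}
}


def fold(name):
    """Replace each TC ideograph by its SC form; leave all else untouched."""
    return "".join(TC_TO_SC.get(ch, ch) for ch in name)


def same_name(a, b):
    """Compare two names after folding, as lower-casing does for ASCII."""
    return fold(a) == fold(b)


# 'taiwan' in SC matches both TC spellings.
print(same_name("\u53F0\u6E7E", "\u81FA\u7063"))
print(same_name("\u53F0\u6E7E", "\u53F0\u7063"))
# Side effect: SC 'hair' also matches a TC sequence with no meaning.
print(same_name("\u5934\u53D1", "\u982D\u767C"))
```

Because the table is many-to-one, the comparison cannot distinguish two
TC names that differ only in ideographs sharing one SC form.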

168	It should also be noted that SC are not used together with TC. Hence,
169	'hair' is either written as SC {U+5934 U+53D1} or TC {U+982D U+9AEE}
170	but (almost) never {U+5934 U+9AEE} or {U+982D U+53D1}. So the problem
171	of SC and TC may not be too serious for IDN.

173	Unfortunately, when it comes to names in Chinese, in places where SC are
174	used (i.e. Singapore and China), traditional and simplified ideographs
175	are sometimes mixed within a single name for artistic reasons. Some
176	people even 'create' ideographs for their names.

178	[Need to add a section on Bopomofo U+3118 to U+312A in future draft]

180	4. Korean (Hanja and Hangeul)

182	Korea was one of the first cultures to import Chinese ideographs into
183	its language as a written form. These Korean ideographs are known as
184	'hanja' {U+6F22 U+5B57/U+D55C U+C790} and they were widely used until
185	recently, when 'hangeul' {U+D55C U+AE00} became more popular.

187	Hangeul {U+D55C U+AE00} is a systematic script designed by a 15th century
188	ruler and linguistic expert, King Sejong {U+4E16 U+5B97}. It is based
189	on the pronunciation of the Korean language, hanmal. A Korean syllable
190	is composed of 'jamo' {U+5B57 U+6BCD/U+C790 U+BAA8} elements that
191	represent different sounds. Hence, unlike Han ideographs, a hangeul
192	syllable does not have any meaning by itself.

194	Each hanja ideograph can be represented by a hangeul syllable. For
195	example, 'samsung' is hanja {U+4E09 U+661F}, hangeul {U+C0BC U+C131}. Note
196	that {U+4E09} is pronounced as 'sa-ah-am' or in jamo {U+3145} {U+314F}
197	{U+3141}, which gives hangeul {U+C0BC}. While Jamo decompositions are
198	described in [UTR15] in Form D decomposition, this document also
199	suggests another hangeul canonical decomposition in Appendix A to
200	accommodate both modern and old hangeul.
201	[Need to fill up Appendix A when information is more complete]
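For reference, the Form D decomposition of a precomposed hangeul syllable
is purely arithmetic. The sketch below follows the standard Unicode
syllable arithmetic and cross-checks it against Python's unicodedata
module. It is not the Appendix A proposal, which is not yet written. Note
that Form D yields conjoining jamo in the U+1100 block rather than the
compatibility jamo {U+3145} {U+314F} {U+3141} quoted above.

```python
import unicodedata

S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28


def decompose_syllable(syllable):
    """Split one precomposed hangeul syllable into conjoining jamo."""
    index = ord(syllable) - S_BASE
    if not 0 <= index < L_COUNT * V_COUNT * T_COUNT:
        raise ValueError("not a precomposed hangeul syllable")
    lead = L_BASE + index // (V_COUNT * T_COUNT)
    vowel = V_BASE + (index % (V_COUNT * T_COUNT)) // T_COUNT
    trail = T_BASE + index % T_COUNT
    # A trail equal to T_BASE means the syllable has no final consonant.
    jamo = [lead, vowel] + ([trail] if trail != T_BASE else [])
    return "".join(chr(cp) for cp in jamo)


sam = decompose_syllable("\uC0BC")
print(" ".join(f"U+{ord(ch):04X}" for ch in sam))
print(sam == unicodedata.normalize("NFD", "\uC0BC"))
```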

203	Most hanja characters have only one pronunciation. However, some hanja
204	pronunciations differ according to orthography (as in Chinese &
205	Japanese) or the position in a word, which makes this more complex. And
206	of course, conversion of hangeul back to hanja is impossible by code
207	substitution without consideration for semantics.

209	Koreans also invented their own ideographs, which are called 'gugja'
210	{U+56FD U+5B57/U+AD6D U+C790}.

212	5. Japanese (Kanji, Hiragana, Katakana)

214	Japanese has adopted Chinese ideographs from Korea and China since
215	the 5th century. Chinese ideographs in Japanese are known as 'kanji'
216	{U+6F22 U+5B57}. The Japanese also developed two syllabaries, hiragana
217	{U+5E73 U+4EEE U+540D} (U+3040-U+309F) and katakana {U+7247 U+4EEE
218	U+540D} (U+30A0-U+30FF); both are derived from kanji that have the same
219	pronunciation. Hiragana is a simplified cursive form, for example, 'a'
220	{U+3042} was derived from 'an' {U+5B89}. Katakana is a simplified partial
221	form, for example, 'a' {U+30A2} was derived from 'a' {U+963F}. However,
222	kanji remain very much integrated within the Japanese language.

224	The Japanese also invented ideographs known as 'kokuji' {U+56FD U+5B57}.
225	For example, 'iwashi' {U+9C2F} is a Japanese kokuji ideograph. Kokuji are
226	invented according to Han ligature rules. For example, 'touge' "mountain
227	pass" {U+5CE0} is a conjunction of the meanings of 'yama' "mountain"
228	{U+5C71} + 'ue' "up" {U+4E0A} + 'shita' "down" {U+4E0B}.

230	Japanese is also a vocal language, i.e. the script itself is based on
231	pronunciation. Each hiragana corresponds to one pronunciation and 48
232	hiragana form the basis of the Japanese language, including the less
233	commonly used 'we' {U+3091}. Furthermore, hiragana has 35 more forms to
234	represent voiced sounds, P-sounds and double consonants. For example, 'ga'
235	{U+304C} is the voiced sound of 'ka' {U+304B}. Katakana is a mirror of
236	hiragana with a few more forms and is used to integrate foreign
237	words or phrases into Japanese, or to emphasize words or phrases even
238	in Japanese, or to represent onomatopoeia. For example, 'hamburger',
239	pronounced as 'han-baa-gaa' in Japanese, is written as {U+30CF U+30F3
240	U+30D0 U+30FC U+30AC U+30FC} instead of {U+306F U+3093 U+3070 U+3041
241	U+304C U+3041} because it is a foreign word.
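The mirror relationship is visible in the code charts: each hiragana in
U+3041-U+3096 has its katakana counterpart exactly 0x60 code points
higher. A minimal sketch of this code-based substitution follows; the
function name to_katakana is hypothetical.

```python
KANA_OFFSET = 0x60  # distance from a hiragana to its katakana counterpart


def to_katakana(text):
    """Shift hiragana U+3041-U+3096 into the katakana block."""
    return "".join(
        chr(ord(ch) + KANA_OFFSET) if "\u3041" <= ch <= "\u3096" else ch
        for ch in text
    )


# 'ha' 'n' 'ba' in hiragana become their katakana counterparts.
converted = to_katakana("\u306F\u3093\u3070")
print(" ".join(f"U+{ord(ch):04X}" for ch in converted))
```

The prolonged sound mark {U+30FC} used in the katakana spelling above is
among the additional katakana forms and is not produced by this shift, so
the two spellings of 'hamburger' are not simple mirrors of each other.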

243	If Japanese used hiragana and katakana only, written Japanese
244	would obviously be very long. Hence, kanji are used
245	when referring to nouns or verbs. Each kanji corresponds to one or more
246	hiragana characters. For example, 'japan', pronounced as 'nippon'
247	{U+306B U+3063 U+307D U+3093}, is written as {U+65E5 U+672C} instead.

249	Hiragana, like Korean jamo, has no meaning by itself. In addition, a kanji
250	can take on different pronunciations (which means different hiragana)
251	depending on where and how it is used in the sentence. For example, 'sky'
252	{U+7A7A} can be pronounced 'sora' {U+305D U+3089} or 'kuu' {U+304F U+3046}.

254	Hence, a code substitution between hiragana and kanji is impractical.

256	On the other hand, there are kanji that have the same meaning and the
257	same pronunciation and are equivalent. For example, 'river' "kawa" can be
258	either {U+5DDD} or {U+6CB3}. The only difference between the two
259	ideographs is that they signify the 'size of the river' (the latter is
260	a bigger river).

262	Japanese also reduced complex Chinese ideographs to simplified forms.
263	For example, 'both' {U+5169} was simplified to {U+4E21}. Note that Chinese
264	simplified it to {U+4E24} instead. However, traditional Japanese kanji
265	are seldom used nowadays beyond documenting old historical text, where
266	they are treated differently from the more commonly used simplified forms,
267	or to express proper nouns such as personal names or trademarks.
268	Hence, Han folding here is not recommended.

270	6. Vietnamese

272	While Vietnamese also adopted Chinese ideographs ('chu han') and created
273	its own ideographs ('chu nom'), these have now been replaced by romanized
274	'quoc ngu'. Hence, this document does not attempt to address any
275	issues with 'chu han' or 'chu nom'.

277	7. zVariant

279	Unicode has a three-dimensional conceptual model for Ideograph
280	Unification. The three dimensions are semantic (X-axis - meaning,
281	function), abstract shape (Y-axis - general form) and actual shape
282	(Z-axis - instantiated, type-faced).

284	When two ideographs have similar etymology but are given two different
285	code points in Unicode, they are known as zVariant ideographs, i.e. they
286	differ only along the 'Z' axis. For example, 'villa' {U+838A} and {U+8358}.

288	8. Ideographic Description

290	In Unicode v3.0, ideographic description characters (U+2FF0-U+2FFB) were
291	introduced, allowing Han ideographs to be constructed using radicals
292	(U+2E80-U+2FD5) and Han ideographs (U+3400-U+9FFF).

294	The intention of this description method is to allow ideographs that are
295	not defined by Unicode to be described. Hence, it is not guaranteed that
296	these ideographs can be displayed properly. In addition, this method is
297	not deterministic and allows the same ideograph to be represented by
298	different sequences.

300	For example, 'zong' {U+9B03} (for discussion's sake, we are going to use
301	an ideograph which is already in Unicode) can be decomposed to U+2FF1
302	U+9ADF U+5B97 using descriptive code points and Unified Ideographs.
303	U+9ADF can also be decomposed as U+2FF0 U+2ED2 U+2F3A and U+5B97 as
304	U+2FF5 U+2F27 U+2F70. In addition, U+9ADF is equivalent to U+2FBD.
305	Hence, if we were to use descriptive code points and radicals only,
306	we can get U+2FF1 U+2FBD U+2FF5 U+2F27 U+2F70 or U+2FF1 U+2FF0 U+2ED2
307	U+2F3A U+2FF5 U+2F27 U+2F70.
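The lack of determinism can be made concrete by enumerating the spellings
that result from substituting alternatives for one component. The sketch
below uses only the alternatives for U+9ADF quoted in the example above;
the ALTERNATIVES table and the descriptions function are hypothetical
names introduced for illustration.

```python
from itertools import product

# Alternative spellings per component, taken from the example in the text.
ALTERNATIVES = {
    "U+9ADF": [
        ["U+9ADF"],                      # the unified ideograph itself
        ["U+2FBD"],                      # the equivalent radical
        ["U+2FF0", "U+2ED2", "U+2F3A"],  # a further description sequence
    ],
}


def descriptions(sequence):
    """Return every spelling obtained by substituting the alternatives."""
    choices = [ALTERNATIVES.get(cp, [[cp]]) for cp in sequence]
    return {
        " ".join(cp for part in combo for cp in part)
        for combo in product(*choices)
    }


zong = descriptions(["U+2FF1", "U+9ADF", "U+5B97"])
for spelling in sorted(zong):
    print(spelling)
```

All resulting sequences describe the same ideograph, yet none of them
compares equal to another code point by code point. Adding alternatives
for further components multiplies the number of spellings.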

309	In addition, certain radicals have been simplified and are thus, in some
310	contexts, equivalent. For example, the radical for 'bird' can be either
311	U+2EE6 or U+2FC3.

313	Hence, until there is a deterministic well-defined rule for
314	ideographic description, ideographs formed by this method are not
315	recommended for use in domain names.

317	It should be noted that the Unicode Consortium never intended the
318	ideographic description to be used in protocols like IDN where exact
319	comparison must be done. But it is certainly desirable to have this
320	feature, as it is common for Chinese to invent ideographs for names by
321	adding or removing radicals from standard ideographs.

323	9. Mechanism

325	The implicit proposal in this document is that CJKV ideographs may or
326	may not be "folded" for the purposes of comparison of domain names.

328	But if folding is required, there are three different ways that this
329	folding could be done.

331	a) Folding by DNS clients, or by user agents
332	b) Folding by DNS servers
333	c) Folding by Domain Name registration services for the purposes of
334	   preventing confusing allocations of CJKV Domain Names which would,
335	   if transcoded, be the same

337	Before a more detailed assessment can be given, we need to know which
338	use is planned.

340	The third use is important.  It should be put in place. This problem can
341	alternatively be reduced by representing non-ASCII characters in
342	domain names or other URL characters using hex-escaped character
343	references in HTML pages.

345	To characterize Han characters as ideographs or pictograms is
346	inadequate, because most Han ideographs have both a phonetic and
347	a semantic element. Indeed, this is enough to characterize Chinese
348	writing as phonetic, though it is other things as well. Thus, it's
349	difficult to comment on whether folding is useful for Chinese or not.

351	The first use has the problem that lightweight devices do not have
352	enough room to fit a Unicode X-axis mapping table.

354	The second use has the problem that introducing mapping will limit the
355	performance of DNS servers.  Alphabetic case mapping can be performed
356	using a single logical AND instruction; CJKV character folding requires
357	a lookup table.
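The difference in cost can be sketched as follows. The ASCII fold clears
one bit with a single AND (here guarded by a range check so that
non-letters pass through), while the Han fold has to consult a table. The
two-entry HAN_FOLD table is a hypothetical excerpt and both function
names are illustrative.

```python
def fold_ascii(byte):
    """Fold an ASCII letter to upper case by clearing bit 0x20."""
    if ord("a") <= byte <= ord("z"):
        return byte & 0xDF
    return byte


# Hypothetical excerpt of a TC-to-SC table keyed by code point.
HAN_FOLD = {
    0x96FB: 0x7535,  # 'lightning' TC -> SC
    0x9F8D: 0x9F99,  # 'dragon' TC -> SC
}


def fold_han(code_point):
    """Fold one ideograph by table lookup; unknown code points pass through."""
    return HAN_FOLD.get(code_point, code_point)


print(chr(fold_ascii(ord("a"))))
print(f"U+{fold_han(0x96FB):04X}")
```

A complete table would hold thousands of entries, which is the memory
cost noted for lightweight clients and the per-query cost noted for
servers.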

359	In alphabetic scripts, there is also a requirement to fold Latin, Greek,
360	Hebrew, Cyrillic and Arabic together. There may be a stronger
361	requirement for CJKV characters.

363	Note also that because modern OSes are Unicode based and have network-
364	downloadable IMEs, "interoperability" is becoming less equivalent to
365	"use BIG5 characters only" or "use GB2312 characters only" or "use
366	Shift-JIS characters only".

368	If conservative safety is really required, then
369	1) find the x-axis characters which are available in all major CJK
370	   character sets used on the internet;
371	2) only allow variants of those in domain names;
372	3) when one variant is used, no other can be allocated.  So comparisons
373	   are made on x-axis characters, but the licensee of that domain name
374	   can pick which y or z variants they wish to use.
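The three steps above can be sketched as a registration check. Everything
in this sketch is an assumption made for illustration: the variant table,
the allowed set, the Registry class and its method names are all
hypothetical, and real tables would have to be agreed upon separately.

```python
# Hypothetical variant table: each variant maps to its x-axis character.
VARIANTS = {
    "\u8358": "\u838A",  # 'villa' variant -> chosen representative
    "\u9F99": "\u9F8D",  # 'dragon' SC -> TC chosen as representative
}
# Step 1: x-axis characters assumed available in all major character sets.
ALLOWED = {"\u838A", "\u9F8D"}


class Registry:
    def __init__(self):
        self._names = {}  # x-axis key -> (name as registered, licensee)

    @staticmethod
    def key(name):
        """Map every character of a name to its x-axis character."""
        return "".join(VARIANTS.get(ch, ch) for ch in name)

    def register(self, name, licensee):
        key = self.key(name)
        # Step 2: only variants of the allowed characters may be used.
        if not all(ch in ALLOWED for ch in key):
            raise ValueError("name uses a character outside the allowed set")
        # Step 3: once one variant is allocated, no other can be.
        if key in self._names:
            return False
        self._names[key] = (name, licensee)
        return True

    def lookup(self, name):
        """Resolve any variant spelling to the single registration."""
        return self._names.get(self.key(name))


registry = Registry()
print(registry.register("\u838A", "first applicant"))
print(registry.register("\u8358", "second applicant"))
```

The registration keeps the spelling the licensee chose, while every
comparison is made on the x-axis key.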

376	Acknowledgement

378	The editors gratefully acknowledge the contributions of:

380	Paul Hoffman <phoffman@imc.org>
381	Jiang Mingliang <jiang@i-DNS.net>
382	Dongman Lee <dlee@icu.ac.kr>
383	Karlsson Kent <keka@im.se>

385	Author(s)

387	James SENG
388	i-DNS.net International Pte Ltd.
389	8 Temasek Boulevard
390	Suntec Tower 3 #24-02
391	Singapore 038988
392	Email: James@Seng.cc
393	Tel: +65 2468208

395	Yoshiro YONEYA
396	NTT Software Corporation
397	Shinagawa IntercityBldg., B-13F
398	2-15-2 Kohnan, Minato-ku Tokyo 108-6113 Japan
399	Email: yone@po.ntts.co.jp
400	Tel: +81-3-5782-7291

402	Kenny HUANG
403	Geotempo International Ltd; TWNIC
404	3F, No 16 Kang Hwa Street, Nei Hu
405	Taipei 114, Taiwan
406	Email: huangk@alum.sinica.edu
407	Tel: +886-2-2658-6510

409	KIM Kyongsok/GIM Gyeongseog

411	References

413	[UNISTD3]   The Unicode Standard v3.0. Unicode Consortium.
414	[UCS]       ISBN 0-201-61633-5

416	[IDN]       "IETF Internationalized Domain Names Working Group",
417	            idn@ops.ietf.org, James Seng, Marc Blanchet

419	[CNRP]      "Common Name Resolution Protocol",
420	            cnrp-ietf@lists.netsol.com, Leslie Daigle

422	[CJKV]      CJKV Information Processing ISBN 1-56592-224-7

424	[C2C]       The pitfalls and Complexities of Chinese to Chinese
425	            Conversion. http://www.basistech.com/articles/C2C.html,
426	            Jack Halpern, Jouni Kerman

428	[KANJIDIC]  Sanseido's Unicode Kanji Information Dictionary
429	            ISBN 4-385-13690-4

431	[UNICHART]  Unicode chart http://charts.unicode.org/

433	[ZONGBIAO]  Simplified Characters Standard Chart 2nd Edition, 1986

435	[UNIHAN]    Unicode Han Database, Unicode Consortium
436	            ftp://ftp.unicode.org/Public/UNIDATA/Unihan.txt

438	[ISO11941]  ISO TS 11941: Information and documentation -
439	            Transliteration of Korean script into Latin characters.
440	            Technical Specification 11941. First edition. 1996-12-31.
441	            ISO (International Organization for Standardization).

443	[KimK 1990] "A New Proposal for a Standard Hangeul (or Korean Script)
444	            Code", KIM Kyongsok.  Computer Standards & Interfaces,
445	            Vol. 9, No. 3, pp. 187-202, 1990.

447	[KimK 1992] "A common Approach to Designing the Hangeul Code and
448	            Keyboard", KIM Kyongsok.  Computer Standards & Interfaces,
449	            Vol. 14, No. 4, pp. 297-325, Aug. 1992.

451	[KimK 1999] A Hangeul story inside computers.  KIM, Kyongsok.  Busan
452	            National University  Press.  1999. [in Hangeul]