idnits 2.17.1

draft-leegim-idn-hangeulchar-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate. This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  == The page length should not exceed 58 lines per page, but there was 1
     longer page, the longest (page 1) being 76 lines

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section. (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack an Authors' Addresses Section.

  ** The document seems to lack separate sections for Informative/Normative
     References. All references will be assumed normative when checking for
     downward references.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == Line 39 has weird spacing: '...aracter in KC...'

  == Line 280 has weird spacing: '...e found many ...'

  == Line 285 has weird spacing: '... gi-eog and k...'

  == Line 286 has weird spacing: '... mi-eum and k...'

  == Line 287 has weird spacing: '... i-eung and l...'

  == (4 more instances...)

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008. If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.
     If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal
     Provisions document at https://trustee.ietf.org/license-info for more
     information.)

  -- Couldn't find a document date in the document -- date freshness check
     skipped.

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Unused Reference: 'IDNREQ' is defined on line 314, but no explicit
     reference was found in the text

  == Unused Reference: 'IDNA' is defined on line 324, but no explicit
     reference was found in the text

  == Unused Reference: 'ISO10646' is defined on line 340, but no explicit
     reference was found in the text

  -- Possible downref: Non-RFC (?) normative reference: ref. 'IDNREQ'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'UNICODE'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'IDNA'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'NAMEPREP'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'UTR15'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'VERSION'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO10646'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'KSC5601'

     Summary: 4 errors (**), 0 flaws (~~), 11 warnings (==), 10 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

  1  INTERNET-DRAFT                                              Soobok Lee
  2  draft-leegim-idn-hangeulchar-00.txt                     GyeongSeog Gim
  3  Expires 2001-Dec-27                                        2001-Jun-27

  5               Hangeul NAMEPREP considerations version 1.0

  7  Status of this Memo

  9  This document is an Internet-Draft and is in full conformance with
 10  all provisions of Section 10 of RFC2026.

 12  Internet-Drafts are working documents of the Internet Engineering
 13  Task Force (IETF), its areas, and its working groups. Note
 14  that other groups may also distribute working documents as
 15  Internet-Drafts.

 17  Internet-Drafts are draft documents valid for a maximum of six
 18  months and may be updated, replaced, or obsoleted by other documents
 19  at any time. It is inappropriate to use Internet-Drafts as
 20  reference material or to cite them other than as "work in progress."

 22  The list of current Internet-Drafts can be accessed at
 23  http://www.ietf.org/ietf/1id-abstracts.txt

 25  The list of Internet-Draft Shadow Directories can be accessed at
 26  http://www.ietf.org/shadow.html

 28  Distribution of this document is unlimited. Please send comments to
 29  the authors or to the idn working group at idn@ops.ietf.org.

 31  Abstract

 33  This document suggests Hangeul-specific NAMEPREP recommendations.
 34  It defines:
 35    - mapping tables for half-width jamo and enclosed jamo
 36    - compatibility Hangeul jamo block to be excluded
 37      from compatibility decomposition in normalization step
 38    - criteria for determining invalid syl-ipf jamo sequence
 39    - prohibited hangul filler character in KC norm output.

 41  Contents

 43    Overview
 44    Background: UCS Hangeul
 45    Hangeul Canonical Composition
 46    Hangeul Compatibility Decomposition
 47    Summarized Recommendations
 48    Comments on security implication of inter-lingual similarities
 49    Security considerations
 50    References
 51    A1. Acknowledgements
 52    A2. Authors
 53    A3. the mapping table for enclosed jamo
 54    A4. the mapping table for half-width jamo
 55  Overview

 57  A user can enter a domain name into an application program in a
 58  myriad of fashions and the characters entered in the domain name
 59  may or may not be those that are allowed in internationalized host
 60  names. Thus, there must be a way to normalize the user's input
 61  before the name is resolved in the DNS, which is the rationale
 62  for NAMEPREP.

 64  NAMEPREP design goals are:

 66    - to allow users to enter host names in applications and have
 67      the highest chance of getting the name correct. The user
 68      should not be limited to only entering exactly the characters
 69      that might have been used for domain name registration, but
 70      be able to enter characters that can be unambiguously
 71      normalized to characters in the registered domain name.

 73    - to prohibit as few characters as possible that might be used
 74      in the future and in the various contexts

 76    - to allow the widest possible set of host names as long as
 77      those host names do not cause other problems, such as
 78      conflict with other standards.

 80  The NAMEPREP process to prepare internationalized host names for
 81  use in the DNS includes the following stages [NAMEPREP]:

 83    - stage1 : mapping characters to other characters,
 84      such as to change their case, mapping out some
 85      meaningless characters

 87    - stage2 : normalizing characters using normalization form KC.
 88      KC form consists of two steps detailed in [UTR15]
 89        - compatibility decomposition
 90        - canonical composition

 92    - stage3 : excluding characters that are prohibited from
 93      appearing in internationalized host names

 95  This draft defines special Hangeul character mappings and
 96  exceptions in applying KC normalization. And this draft also
 97  defines some prohibited Hangeul characters and sequences so that
 98  Hangeul can be used safely in Internet identifiers such as IDN.

100  The content of this draft is subject to change with further
101  discussions and studies.

103  Background : UCS Hangeul

105  Korean Hangeul syllables are formed from a set of Hangeul letters,
106  called jamo in Korean, in a regular fashion.

108  The ISO/IEC 10646 (=Unicode Standard) contains both the complete
109  set of precomposed modern Hangeul syllable blocks and the set of
110  syl-ipf Hangeul jamo (= conjoining jamo in [UNICODE] ). This set
111  of syl-ipf jamo can be used to encode all modern and old syllable
112  blocks.
     For a description of syl-ipf Hangeul jamo behavior and
113  precomposed Hangeul syllables, see [UNICODE].

115  Hangeul jamo are divided into three classes: choseong (leading
116  consonants), jungseong (vowels), and jongseong (trailing consonants).
117  In the following paragraphs, these classes are abbreviated as L
118  (leading consonant), V (vowel), and T (trailing consonant). For
119  use in composition, two invisible filler characters act as
120  placeholders for choseong or jungseong:
121      U+115F (Hangeul choseong filler) and
122      U+1160 (Hangeul jungseong filler).

124  The UCS/Unicode contains a set of Hangeul compatibility jamo
125  (U+3130 .. U+318F), which consists of spacing, non-syl-ipf
126  Hangeul consonant and vowel elements. These characters are
127  provided solely for compatibility with the KS X 1001 (formerly
128  KS C 5601) standard. Unlike the characters found in the Hangeul
129  jamo block (U+1100 .. U+11FF), the compatibility jamo characters
130  have no syl-ipf semantics.

132  The UCS/Unicode Standard also contains 52 half-width modern Hangeul
133  jamo in the halfwidth and fullwidth forms block (U+FFA0 .. U+FFDC),
134  and enclosed Hangeul syllables and jamo in the enclosed CJK letters
135  and months block (U+3200 .. U+32FF). The enclosed characters come in
136  parenthesized and circled forms.

138  Hangeul canonical composition

140  Modern Hangeul syllables can be expressed with either two or
141  three jamo, either in the form consonant + vowel or in the form
142  consonant + vowel + consonant. There are 19 possible leading
143  (initial) consonants (choseong), 21 vowels (jungseong), and 27
144  trailing (final) consonants (jongseong). Thus there are 399
145  possible two-jamo syllables and 10,773 possible three-jamo
146  syllables, for a total of 11,172 modern Hangeul syllables.
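
The counts above follow directly from the three class sizes. As a minimal
sketch (not part of the draft), the arithmetic and the index calculation for
a precomposed syllable can be written as below. The base value U+AC00 and
the index formula are taken from the Hangul composition description in the
Unicode Standard; the names L_COUNT, V_COUNT, T_COUNT and compose_syllable
are illustrative assumptions.

```python
# Class sizes as stated in the text: 19 choseong, 21 jungseong, 27 jongseong.
L_COUNT, V_COUNT, T_COUNT = 19, 21, 27
S_BASE = 0xAC00  # first precomposed Hangul syllable (assumed from [UNICODE])

two_jamo = L_COUNT * V_COUNT      # syllables of the form L V
three_jamo = two_jamo * T_COUNT   # syllables of the form L V T
total = two_jamo + three_jamo


def compose_syllable(l_index: int, v_index: int, t_index: int = 0) -> str:
    """Return the precomposed syllable for zero-based L and V indices.

    t_index is 0 for "no trailing consonant" and 1..27 otherwise, which is
    why each L V pair owns a run of T_COUNT + 1 code points.
    """
    if not (0 <= l_index < L_COUNT and 0 <= v_index < V_COUNT):
        raise ValueError("leading consonant or vowel index out of range")
    if not 0 <= t_index <= T_COUNT:
        raise ValueError("trailing consonant index out of range")
    return chr(S_BASE + (l_index * V_COUNT + v_index) * (T_COUNT + 1) + t_index)


print(two_jamo, three_jamo, total)
print(hex(ord(compose_syllable(0, 0))), hex(ord(compose_syllable(0, 0, 1))))
```

With indices (0, 0) the function gives the syllable "ga", and adding the
first trailing consonant gives "gag", the syllable used in the
decomposition example later in this document.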

148  Each of the Hangeul syllables may be encoded by an equivalent
149  sequence of syl-ipf jamo; however, the converse is not true,
150  because thousands of archaic Hangeul syllables can be encoded
151  only as a sequence of syl-ipf jamo. Implementations that
152  use a syl-ipf jamo encoding are able to represent these archaic
153  Hangeul syllables.

155  The Hangeul syllables can be derived from syl-ipf jamo by a
156  regular process of composition. The algorithm that maps a sequence
157  of syl-ipf jamo to the code point of a Hangeul syllable
158  is detailed in [UNICODE].

160  In canonical composition, the syl-ipf jamo sequence for a modern
161  Hangeul syllable is transformed into that modern Hangeul syllable,
162  but a sequence for an archaic Hangeul syllable and an invalid jamo
163  sequence (defective combining character sequence) are preserved
164  in this process.

166  In normalization form KC, every input sequence of code points goes
167  through this canonical composition [UTR15]. If any invalid jamo
168  sequence is detected after the KC normalization stage, the sequence
169  cannot be displayed correctly and distinguishably, so it should be
170  prohibited from being an identifier. Whether a syl-ipf jamo
171  sequence is valid or not can be determined according to
172  the criteria detailed in [UNICODE].

174  Hangeul compatibility decomposition

176  In normalization form KC, every input code sequence goes through
177  compatibility decomposition and then canonical composition.

179  Every Hangeul compatibility jamo and half-width jamo has a
180  compatibility-equivalent Hangeul syl-ipf jamo defined in
181  [UNICODE_CHART].

183  But this equivalence violates the semantics and combining rules
184  for compatibility jamo sequences in [KSC5601], from which the UCS
185  compatibility jamo came.

187  In [KSC5601], a valid compatibility jamo sequence should start with
188  a filler followed by choseong, jungseong, and jongseong (or filler)
189  to denote a Hangeul syllable.
     If the sequence does not fulfill this
190  criterion, its jamo should remain unchanged as compatibility jamo.
191  The same applies to half-width Hangeul jamo.

193  The current compatibility decomposition blindly transforms a
194  compatibility jamo sequence jamo by jamo, even without a leading filler.
195  For example, a valid jamo sequence "filler gi-eog a gi-eog" (U+3164
196  U+3131 U+314F U+3131) denoting the Hangeul syllable "gag" (U+AC01)
197  is erroneously transformed into "jungseong_filler choseong_gi-eog
198  jungseong_a choseong_gi-eog" (U+1160 U+1100 U+1161 U+1100). Canonical
199  composition then yields "syllable_ga choseong_gi-eog"
200  (U+AC00 U+1100), which is wrong.

202  To avoid this, NAMEPREP should exclude compatibility jamo
203  and half-width jamo from its compatibility decomposition step.
204  Only a valid compatibility jamo sequence should be recognized and
205  transformed into a syl-ipf jamo sequence, at the mapping step before
206  the KC normalization step in NAMEPREP.

208  A Hangeul consonant sequence can be used as an abbreviated form of a
209  long Hangeul syllable sequence that represents a Hangeul business name.
210  There may also be a future need to represent Hangeul syllables as
211  compatibility jamo sequences for alternative syllable writing or
212  displaying schemes.

214  In NAMEPREP KC normalization and its inner compatibility
215  decomposition, each parenthesized Hangeul jamo is transformed into
216  its compatibility-equivalent character sequence, consisting of one
217  pair of parentheses enclosing the Hangeul jamo. That sequence
218  is then treated as an invalid domain because it includes prohibited
219  parentheses.
220  Each parenthesized Hangeul syllable is transformed into its
221  compatibility-equivalent character sequence, consisting of one
222  pair of parentheses enclosing the Hangeul syllable. That
223  sequence is likewise treated as an invalid domain because of the
224  prohibited parentheses.
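
The behavior described in this section can be observed with any NFKC
implementation. The following is a minimal sketch using Python's standard
unicodedata module; it is an illustration, not part of the draft, and the
helper names nfkc and codepoints are assumptions. It normalizes the
KS C 5601 style sequence for "gag" and one parenthesized and one circled
gi-eog.

```python
import unicodedata


def nfkc(text: str) -> str:
    """Apply normalization form KC (compatibility decomposition, then
    canonical composition)."""
    return unicodedata.normalize("NFKC", text)


def codepoints(text: str) -> str:
    """Render a string as space-separated U+XXXX values for inspection."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


# "filler gi-eog a gi-eog" in compatibility jamo, intended to mean "gag".
ksc_gag = "\u3164\u3131\u314f\u3131"
gag_result = nfkc(ksc_gag)

# U+3200 is parenthesized gi-eog, U+3260 is circled gi-eog.
paren_result = nfkc("\u3200")
circled_result = nfkc("\u3260")

print("compat jamo  :", codepoints(gag_result))
print("parenthesized:", codepoints(paren_result))
print("circled      :", codepoints(circled_result))
```

The first result contains the syllable "ga" (U+AC00) followed by a stray
choseong gi-eog (U+1100) and never the intended "gag" (U+AC01). The
parenthesized form comes back wrapped in ASCII parentheses, and the circled
form becomes a bare syl-ipf choseong.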

226  We therefore make no suggestion regarding these parenthesized
227  jamo and syllables, which will be prohibited in any case.

229  In NAMEPREP KC normalization and its inner compatibility
230  decomposition, a circled Hangeul jamo is transformed into its
231  compatibility-equivalent syl-ipf Hangeul jamo, which is not appropriate
232  in the IDN context. Preferably, the NAMEPREP process should map
233  the circled jamo into the corresponding compatibility Hangeul jamo
234  before KC normalization, to bypass this inappropriate
235  compatibility decomposition.

237  A circled Hangeul syllable is transformed into its compatibility-
238  equivalent Hangeul syllable, which raises no problem.

240  Summarized Recommendations

242  The KC normalization employed in the NAMEPREP process does not preserve
243  some Hangeul code semantics, so we recommend the following
244  additional NAMEPREP actions for Hangeul codes:

246  * Stage 1: mapping

248    - circled Hangeul jamo
249      = map into the corresponding Hangeul compatibility jamo
250        code range: U+3260 ~ U+326D
251        mapping table detailed in appendix 3.

253    - half-width Hangeul jamo
254      = map into the corresponding Hangeul compatibility jamo
255        code range: U+FFA1 ~ U+FFDC
256        mapping table detailed in appendix 4.

258    - transform compatibility jamo sequence into syl-ipf jamo
259      sequence with leading filler (U+3164) removed
260      = if and only if
261        the sequence is of filler+ L+ V+ T (or filler) form.
262      = each resulting jamo takes the choseong or jongseong
263        semantics implied by its position in the input sequence
264  * Stage 2: KC normalization

266    - compatibility decomposition
267      = exclude compatibility Hangeul jamo; preserve them
268        code range: U+3130 ~ U+318F

270  * Stage 3: prohibitions

272    - prohibit invalid syl-ipf Hangeul jamo sequences
273      = return error if not a meaningful LV or LVT sequence

275    - compatibility Hangeul filler (U+3164) not combined
276      = return error

278  Comments on security implication of inter-lingual similarities

280  We have found many similarities between Hangeul jamo and
281  other scripts such as Japanese katakana and Latin.

283  To list some of them:

285    - hangeul jamo gi-eog and katakana hu
286    - hangeul jamo mi-eum and katakana ro
287    - hangeul jamo i-eung and latin 'o'
288    - hangeul jamo ji-euth and katakana su
289    - hangeul jamo ki-eog and katakana wo
290    - hangeul jamo a and katakana to

292    - hangeul syllable ma and katakana ro-to
293    - hangeul syllable ja and katakana su-to
294    - hangeul syllable ga and katakana hu-to
295    - hangeul syllable i and digits '01'

297  Hangeul domain names that resemble katakana domain names
298  can mislead Japanese users into believing that Hangeul hostnames or
299  Hangeul email addresses are the Japanese ones they trust.

301  To mitigate these inherent security problems, there should be a
302  well-prepared registration and dispute resolution policy that
303  can be enforced on every zone master (including the root zone
304  and its lower-level zones) and every email account master.
305  Of course, whether this is feasible is beyond the scope of NAMEPREP.

307  Security considerations

309  This suggestion improves IDN security by prohibiting or correcting
310  non-displayable or invalid Hangeul syllables and sequences in IDN.
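
The Stage 3 prohibitions listed under Summarized Recommendations can be
sketched as a single check. This is an assumption-laden illustration, not
the draft's normative rule: the function name is_valid_hangeul_label is
hypothetical, the jamo class ranges are those of the current Unicode Hangul
Jamo block, and validity is taken to mean that every run of jamo, after full
canonical decomposition, reads as one or more L+ V+ T* syllable blocks with
no filler characters.

```python
import re
import unicodedata

# Assumed class ranges inside the Hangul Jamo block (U+1100 .. U+11FF).
# The fillers U+115F and U+1160 are deliberately outside all three classes.
SYLLABLE = re.compile("[\u1100-\u115e]+[\u1161-\u11a7]+[\u11a8-\u11ff]*")
JAMO_RUN = re.compile("[\u1100-\u11ff]+")
COMPAT_FILLER = "\u3164"


def is_valid_hangeul_label(label: str) -> bool:
    """Return False if the label holds a filler or a jamo run that is not
    a sequence of complete L V or L V T syllable blocks."""
    if COMPAT_FILLER in label:
        return False
    # NFD turns precomposed syllables into jamo, so a syllable followed by
    # a stray jamo is judged as one continuous run.
    decomposed = unicodedata.normalize("NFD", label)
    for run in JAMO_RUN.findall(decomposed):
        position = 0
        while position < len(run):
            match = SYLLABLE.match(run, position)
            if match is None:
                return False
            position = match.end()
    return True


print(is_valid_hangeul_label("\uac01"))        # precomposed "gag"
print(is_valid_hangeul_label("\uac00\u1100"))  # "ga" plus a stray choseong
```

Decomposing first keeps the rule uniform: the erroneous output discussed in
the compatibility decomposition section ("ga" followed by a lone choseong)
fails because the final jamo has no vowel, while an archaic syllable spelled
out as L V T jamo passes.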

312  References

314  [IDNREQ] Requirements of Internationalized Domain Names
315  http://www.ietf.org/internet-drafts/draft-ietf-idn-requirements-08
316  .txt

318  [UNICODE] The Unicode Consortium, "The Unicode Standard",
319  http://www.unicode.org/unicode/standard/standard.html

321  [UNICODE_CHART] The Unicode Code Charts
322  http://www.unicode.org/charts/

324  [IDNA] Patrik Falstrom, Paul Hoffman,
325  "Internationalizing Host Names In Applications (IDNA)",
326  http://www.ietf.org/internet-drafts/draft-ietf-idn-idna-02.txt

328  [NAMEPREP] Paul Hoffman, Marc Blanchet,
329  "Preparation of Internationalized Host Names", Feb 2001,
330  http://www.ietf.org/internet-drafts/draft-ietf-idn-nameprep-03.txt

332  [UTR15] Mark Davis and Martin Duerst.
333  Unicode Normalization Forms. Unicode Technical Report;15.
334  http://www.unicode.org/unicode/reports/tr15/

336  [VERSION] M Blanchet
337  "Handling versions of internationalized domain names protocols",
338  http://www.ietf.org/internet-drafts/draft-ietf-idn-version-00.txt

340  [ISO10646] ISO/IEC, Information Technology - Universal
341  Multiple-Octet Coded Character Set (UCS) - Part 1: Architecture
342  and Basic Multilingual Plane, Oct. 2000, with amendments.

344  [KSC5601] Korean Standard KS C 5601- 1987

346  A1. Acknowledgements

348  Dongman Lee and Yangwoo Ko
349  made valuable contributions to narrowing down the issues of the
350  prohibition and preservation of some Hangeul characters.

352  Thanks to Mark Davis for his advice on useful UNICODE reference
353  documents.

355  A2. Authors

357  Soobok Lee
358  Postel Servies, Inc.
359  http://www.postel.co.kr
360  Tel: +82-11-9774-2737

362  GyeongSeog Gim
363  Department of Computer Engineering
364  Pusan National University
365  Republic of Korea
366  Tel: +82-51-510-2292
367  A3.
     the mapping table for enclosed jamo in the format of [VERSION]

369  version=1.0

371  3260;1.0;3131
372  3261;1.0;3134
373  3262;1.0;3137
374  3263;1.0;3139
375  3264;1.0;3141
376  3265;1.0;3142
377  3266;1.0;3145
378  3267;1.0;3147
379  3268;1.0;3148
380  3269;1.0;314A
381  326A;1.0;314B
382  326B;1.0;314C
383  326C;1.0;314D
384  326D;1.0;314E

386  A4. the mapping table for half-width jamo in the format of [VERSION]

388  version=1.0

390  FFA1;1.0;3131
391  FFA2;1.0;3132
392  FFA3;1.0;3133
393  FFA4;1.0;3134
394  FFA5;1.0;3135
395  FFA6;1.0;3136
396  FFA7;1.0;3137
397  FFA8;1.0;3138
398  FFA9;1.0;3139
399  FFAA;1.0;313A
400  FFAB;1.0;313B
401  FFAC;1.0;313C
402  FFAD;1.0;313D
403  FFAE;1.0;313E
404  FFAF;1.0;313F
405  FFB0;1.0;3140
406  FFB1;1.0;3141
407  FFB2;1.0;3142
408  FFB3;1.0;3143
409  FFB4;1.0;3144
410  FFB5;1.0;3145
411  FFB6;1.0;3146
412  FFB7;1.0;3147
413  FFB8;1.0;3148
414  FFB9;1.0;3149
415  FFBA;1.0;314A
416  FFBB;1.0;314B
417  FFBC;1.0;314C
418  FFBD;1.0;314D
419  FFBE;1.0;314E
420  FFC2;1.0;314F
421  FFC3;1.0;3150
422  FFC4;1.0;3151
423  FFC5;1.0;3152
424  FFC6;1.0;3153
425  FFC7;1.0;3154
426  FFCA;1.0;3155
427  FFCB;1.0;3156
428  FFCC;1.0;3157
429  FFCD;1.0;3158
430  FFCE;1.0;3159
431  FFCF;1.0;315A
432  FFD2;1.0;315B
433  FFD3;1.0;315C
434  FFD4;1.0;315D
435  FFD5;1.0;315E
436  FFD6;1.0;315F
437  FFD7;1.0;3160
438  FFDA;1.0;3161
439  FFDB;1.0;3162
440  FFDC;1.0;3163
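
The two tables above can be loaded and applied as a Stage 1 mapping with a
few lines of code. The following is a minimal sketch, not part of the draft.
It assumes each entry reads "source;version;target" with hexadecimal code
points, which is how the tables appear here; the [VERSION] draft may define
further fields. The names TABLE_EXCERPT, parse_mapping_table and map_stage1
are hypothetical, and only a few entries are copied for brevity.

```python
# A handful of entries copied from appendices A3 and A4.
TABLE_EXCERPT = """\
version=1.0

3260;1.0;3131
326D;1.0;314E
FFA1;1.0;3131
FFC2;1.0;314F
FFDC;1.0;3163
"""


def parse_mapping_table(text: str) -> dict:
    """Parse 'source;version;target' lines into a code point mapping.

    Blank lines and the 'version=' header are skipped. The version field
    is read but not interpreted in this sketch.
    """
    mapping = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("version="):
            continue
        source, _version, target = line.split(";")
        mapping[int(source, 16)] = int(target, 16)
    return mapping


STAGE1_MAP = parse_mapping_table(TABLE_EXCERPT)


def map_stage1(label: str) -> str:
    """Replace enclosed and half-width jamo with compatibility jamo."""
    return label.translate(STAGE1_MAP)


# Half-width gi-eog and half-width a become compatibility gi-eog and a.
mapped = map_stage1("\uffa1\uffc2")
print(" ".join(f"U+{ord(ch):04X}" for ch in mapped))
```

Characters without a table entry pass through unchanged, so the same
function can be applied to a whole label before KC normalization.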