INTERNET-DRAFT                                                Soobok Lee
draft-ietf-idn-hangeulchar-00.txt                         GyeongSeog Gim
Expires 2001-Dec-27                                          2001-Jun-27

              Hangeul NAMEPREP recommendation version 1.0

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note
   that other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   Distribution of this document is unlimited. Please send comments to
   the authors or to the idn working group at idn@ops.ietf.org.

Abstract

   This document suggests Hangeul-specific NAMEPREP recommendations.
   It defines:

   - mapping tables for half-width jamo and enclosed jamo
   - the exclusion of the compatibility Hangeul jamo block
     from compatibility decomposition in the normalization step
   - criteria for determining an invalid syl-ipf jamo sequence
   - the prohibition of the Hangeul filler character in KC
     normalization output

Contents

   Overview
   Background: UCS Hangeul
   Hangeul Canonical Composition
   Hangeul Compatibility Decomposition
   Summarized Recommendations
   Comments on security implication of inter-lingual similarities
   Security considerations
   References
   A1. Acknowledgements
   A2. Authors
   A3. the mapping table for enclosed jamo
   A4. the mapping table for half-width jamo

Overview

   A user can enter a domain name into an application program in a
   myriad of fashions, and the characters entered in the domain name
   may or may not be those that are allowed in internationalized host
   names. Thus, there must be a way to normalize the user's input
   before the name is resolved in the DNS, which is the rationale
   for NAMEPREP.

   NAMEPREP design goals are:

   - to allow users to enter host names in applications and have
     the highest chance of getting the name correct. The user
     should not be limited to only entering exactly the characters
     that might have been used for domain name registration, but
     be able to enter characters that can be unambiguously
     normalized to characters in the registered domain name.

   - to prohibit as few characters as possible that might be used
     in the future and in the various contexts

   - to allow the widest possible set of host names as long as
     those host names do not cause other problems, such as
     conflict with other standards.

   The NAMEPREP process to prepare internationalized host names for
   use in the DNS includes the following stages:

   - stage1: mapping characters to other characters,
     such as to change their case, mapping out some
     meaningless characters

   - stage2: normalizing characters using normalization form KC.
     KC form consists of two steps detailed in [UTR15]
       - compatibility decomposition
       - canonical composition

   - stage3: excluding characters that are prohibited from
     appearing in internationalized host names

   This draft defines special Hangeul character mappings and
   exceptions in applying KC normalization. And this draft also
   defines some prohibited Hangeul characters and sequences so that
   Hangeul can be used safely in Internet identifiers such as IDN.

   The content of this draft is subject to change with further
   discussions and studies.

Background: UCS Hangeul

   Korean Hangeul syllables are formed from a set of Hangeul letters,
   called jamo in Korean, in a regular fashion.

   The ISO/IEC 10646 (=Unicode Standard) contains both the complete
   set of precomposed modern Hangeul syllable blocks and the set of
   syl-ipf Hangeul jamo (= conjoining jamo in [UNICODE]). This set
   of syl-ipf jamo can be used to encode all modern and old syllable
   blocks.
   For a description of syl-ipf Hangeul jamo behavior and
   precomposed Hangeul Syllables, see [UNICODE].

   Hangeul jamo are divided into three classes: choseong (leading
   consonants), jungseong (vowels), and jongseong (trailing consonants).
   In the following paragraphs, these classes are abbreviated as L
   (leading consonant), V (vowel), and T (trailing consonant). For
   use in composition, two invisible filler characters act as
   placeholders for choseong or jungseong:
      U+115F (Hangeul choseong filler) and
      U+1160 (Hangeul jungseong filler).

   The UCS/Unicode contains a set of Hangeul Compatibility jamo
   (U+3130 ~ U+318F), which consists of a filler and nonsyl-ipf
   Hangeul consonant and vowel elements. These characters are
   provided solely for compatibility with the KS X 1001 (formerly
   KS C 5601) standard. Unlike the characters found in the Hangeul
   jamo block (U+1100 .. U+11FF), the compatibility jamo characters
   have no syl-ipf semantics. The only exception is that, according
   to KS X 1001, a sequence of the form filler + L + V + T (with a
   filler in place of an absent T) denotes a Hangeul syllable.

   The UCS/Unicode Standard also contains 52 half-width Hangeul
   characters (a filler and 51 modern jamo) in the halfwidth and
   fullwidth forms block (U+FFA0 .. U+FFDC), and enclosed Hangeul
   syllables and jamo in the enclosed CJK letter and month block
   (U+3200 .. U+32FF). The enclosed characters consist of
   parenthesized forms and circled forms.

Hangeul canonical composition

   Modern Hangeul syllables can be expressed with either two or
   three jamo, either in the form consonant + vowel or in the form
   consonant + vowel + consonant. There are 19 possible leading
   (initial) consonants (choseong), 21 vowels (jungseong), and 27
   trailing (final) consonants (jongseong). Thus there are 399
   possible two-jamo syllables and 10,773 possible three-jamo
   syllables, for a total of 11,172 modern Hangeul syllables.
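
   The counts above can be checked with a few lines of Python. The
   sketch below is illustrative only. The base code points (U+AC00,
   U+1100, U+1161, U+11A7) are the constants used by the Unicode
   Hangul syllable composition arithmetic and do not come from this
   draft, and the helper name compose() is hypothetical.

```python
import unicodedata

# Counts quoted in the text above.
L_COUNT, V_COUNT, T_COUNT = 19, 21, 27

# Assumption: base code points from the Unicode Hangul composition arithmetic.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7

two_jamo = L_COUNT * V_COUNT              # consonant + vowel
three_jamo = L_COUNT * V_COUNT * T_COUNT  # consonant + vowel + consonant
total = two_jamo + three_jamo


def compose(l, v, t=None):
    """Return the precomposed syllable code point for syl-ipf jamo l, v
    and an optional trailing consonant t (hypothetical helper)."""
    l_index = l - L_BASE
    v_index = v - V_BASE
    # Index 0 means "no trailing consonant", hence T_COUNT + 1 slots.
    t_index = 0 if t is None else t - T_BASE
    return S_BASE + (l_index * V_COUNT + v_index) * (T_COUNT + 1) + t_index


# The last modern syllable is built from the last L, V and T.
last_syllable = compose(0x1112, 0x1175, 0x11C2)

# Cross-check one syllable against the library's canonical composition.
gag = chr(compose(0x1100, 0x1161, 0x11A8))
same_as_nfc = gag == unicodedata.normalize("NFC", "\u1100\u1161\u11A8")
```

   Under these assumptions the precomposed block spans U+AC00 through
   last_syllable, which is exactly `total` code points wide.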

   Each of the Hangeul syllables may be encoded by an equivalent
   sequence of syl-ipf jamo; however, the converse is not true
   because thousands of archaic Hangeul syllables may be encoded
   only as a sequence of syl-ipf jamo. Implementations that
   use a syl-ipf jamo encoding are able to represent these archaic
   Hangeul syllables.

   The Hangeul syllables can be derived from syl-ipf jamo by a
   regular process of composition. The algorithm that maps a sequence
   of syl-ipf jamo to the code point for a Hangeul syllable
   is detailed in [UNICODE].

   In canonical composition, the syl-ipf jamo sequence for a modern
   Hangeul syllable is transformed into the precomposed modern Hangeul
   syllable, whereas sequences for archaic Hangeul syllables and
   invalid jamo sequences (defective combining character sequences)
   are preserved unchanged.

   In normalization form KC, every input sequence of code points goes
   through this canonical composition [UTR15]. If an invalid jamo
   sequence remains after the KC normalization stage, it cannot be
   displayed correctly and distinguishably, so the sequence should be
   prohibited from being an identifier. Whether a syl-ipf jamo
   sequence is valid or not can be determined according to
   the criteria detailed in [UNICODE].

Hangeul compatibility decomposition

   In normalization form KC, every input code sequence goes through
   compatibility decomposition and then canonical composition.

   Every Hangeul compatibility jamo and half-width jamo has a
   corresponding compatibility equivalent Hangeul syl-ipf jamo
   defined in [UNICODE_CHART].

   However, this equivalence violates the semantics and combining
   rules for compatibility jamo sequences in [KSC5601], from which the
   UCS compatibility jamo came.

   In [KSC5601], a valid compatibility jamo sequence should start with
   a filler followed by choseong, jungseong and jongseong (or filler)
   to denote a Hangeul syllable.
   If the sequence does not fulfill this
   criterion, its jamo should remain unchanged as compatibility jamo.
   The same applies to half-width Hangeul jamo.

   The current compatibility decomposition transforms a compatibility
   jamo sequence blindly, one jamo at a time, regardless of whether it
   has a leading filler. For example, the valid jamo sequence "filler
   gi-eog a gi-eog" (U+3164 U+3131 U+314F U+3131), denoting the
   Hangeul syllable "gag" (U+AC01), is erroneously transformed into
   "jungseong_filler choseong_gi-eog jungseong_a choseong_gi-eog"
   (U+1160 U+1100 U+1161 U+1100), which is then canonically composed
   into "jungseong_filler syllable_ga choseong_gi-eog"
   (U+1160 U+AC00 U+1100), which is wrong.

   To avoid this false composition, NAMEPREP should exclude
   compatibility jamo and half-width jamo from its compatibility
   decomposition step. In addition, only valid compatibility jamo
   sequences should be recognized and transformed into syl-ipf jamo
   sequences, at the mapping step that precedes the KC normalization
   step in NAMEPREP.

   A Hangeul consonant sequence can be used as an abbreviated form of
   a long Hangeul syllable sequence that represents a Hangeul business
   name. There may also be a future need to represent Hangeul
   syllables in compatibility jamo sequences for an alternative
   syllable writing/displaying scheme.

   In NAMEPREP KC normalization and its internal compatibility
   decomposition step, each parenthesized Hangeul jamo is transformed
   into its compatibility equivalent character sequence, consisting of
   one pair of parentheses enclosing a Hangeul jamo, and that
   sequence is then treated as an invalid domain since parentheses are
   prohibited in domain names. For example, parenthesized gi-eog
   (U+3200) is decomposed into U+0028 + U+1100 + U+0029, which includes
   the prohibited left and right parentheses (U+0028 and U+0029
   respectively).
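
   Both behaviors can be observed with the unicodedata module of the
   Python standard library, which implements the unmodified
   normalization forms of [UTR15]. This is a minimal sketch; the
   helper name code_points() is hypothetical. Note that the library
   output for the filler sequence also retains the leading jungseong
   filler (U+1160).

```python
import unicodedata


def code_points(text):
    """Format a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


# KS X 1001 style sequence "filler gi-eog a gi-eog", intended as "gag".
ks_gag = "\u3164\u3131\u314F\u3131"
nfkd_gag = unicodedata.normalize("NFKD", ks_gag)   # decomposition only
nfkc_gag = unicodedata.normalize("NFKC", ks_gag)   # plus composition

# Parenthesized gi-eog decomposes into parentheses around a choseong.
paren_gieog = "\u3200"
nfkd_paren = unicodedata.normalize("NFKD", paren_gieog)

print(code_points(nfkd_gag))    # jamo-by-jamo decomposition
print(code_points(nfkc_gag))    # falsely composed result, not U+AC01
print(code_points(nfkd_paren))  # contains U+0028 and U+0029
```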

   Likewise, each parenthesized Hangeul syllable is transformed into
   its compatibility equivalent character sequence, consisting of one
   pair of parentheses enclosing a Hangeul syllable, and that
   sequence is then treated as an invalid domain since parentheses are
   prohibited in domain names.

   In NAMEPREP KC normalization and its internal compatibility
   decomposition step, a circled Hangeul jamo is transformed into its
   compatibility equivalent syl-ipf Hangeul jamo, which is not
   appropriate in the IDN context. Preferably, the NAMEPREP process
   should map each circled jamo into the corresponding compatibility
   Hangeul jamo before KC normalization, to bypass this inappropriate
   compatibility decomposition.

Summarized Recommendations

   The KC normalization employed in the NAMEPREP process does not
   preserve some Hangeul code semantics, so we recommend the following
   additional NAMEPREP actions for Hangeul codes:

   * Stage 1: mapping

     - circled Hangeul jamo
       = map into the corresponding Hangeul compatibility jamo
         code range: U+3260 ~ U+326D
         mapping table detailed in appendix 3.

     - half-width Hangeul jamo
       = map into the corresponding Hangeul compatibility jamo
         code range: U+FFA0 ~ U+FFDC
         mapping table detailed in appendix 4.

     - transform compatibility jamo sequence with leading filler
       (U+3164) into syl-ipf jamo sequence
       = if and only if
         the sequence is of filler + L + V + T (or filler) form.
       = preserve unchanged if the sequence is not of this form
       = so that each resulting jamo is given the intended choseong
         or jongseong semantics implied in the input sequence

   * Stage 2: KC normalization

     - compatibility decomposition
       = exclude compatibility Hangeul jamo; preserve them
         code range: U+3130 ~ U+318F

   * Stage 3: prohibitions

     - prohibit invalid syl-ipf Hangeul jamo sequences
       = return error if not meaningful LV or LVT sequence

     - compatibility Hangeul filler (U+3164) not combined
       = return error

Comments on security implication of inter-lingual similarities

   We have found many similarities between Hangeul jamo and
   other scripts such as Japanese katakana and Latin.

   To list some of them:

   - hangeul jamo gi-eog and katakana hu
   - hangeul jamo mi-eum and katakana ro
   - hangeul jamo i-eung and latin 'o'
   - hangeul jamo ji-euth and katakana su
   - hangeul jamo ki-eog and katakana wo
   - hangeul jamo a and katakana to

   - hangeul syllable ma and katakana ro-to
   - hangeul syllable ja and katakana su-to
   - hangeul syllable ga and katakana hu-to
   - hangeul syllable i and digits '01'

   Hangeul domains that look similar to katakana domains
   can mislead Japanese users into believing that Hangeul hostnames
   or Hangeul email addresses are the Japanese ones they trust.

   To mitigate these inherent security problems, there should be
   a well-prepared registration/dispute resolution policy that
   can be enforced on every zone master (including the root zone
   and its lower-level zones) and every email account master.
   Of course, whether this is feasible is beyond the scope of
   NAMEPREP.

Security considerations

   This suggestion improves IDN security by prohibiting or correcting
   non-displayable or invalid Hangeul syllables and sequences in IDN.
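
   The three stages in the Summarized Recommendations could be
   sketched in Python as follows. This is an illustration under
   stated assumptions, not a reference implementation. All function
   and table names are hypothetical. The L, V and T tables are
   derived from the code chart ordering rather than from this draft.
   The circled and half-width mappings of appendices 3 and 4 are
   omitted. Stage 2 is approximated by normalizing only the runs
   outside U+3130 ~ U+318F, and stage 3 is simplified to reject any
   syl-ipf jamo left uncomposed, which also rejects archaic
   syllables.

```python
import re
import unicodedata

FILLER = 0x3164  # compatibility Hangeul filler

# Assumption: tables derived from code chart order, not from this draft.
CONSONANTS = list(range(0x3131, 0x314F))
NOT_LEADING = {0x3133, 0x3135, 0x3136, 0x3144} | set(range(0x313A, 0x3141))
NOT_TRAILING = {0x3138, 0x3143, 0x3149}
LEADING = [c for c in CONSONANTS if c not in NOT_LEADING]
TRAILING = [c for c in CONSONANTS if c not in NOT_TRAILING]
L_MAP = {c: 0x1100 + i for i, c in enumerate(LEADING)}
T_MAP = {c: 0x11A8 + i for i, c in enumerate(TRAILING)}
V_MAP = {c: 0x1161 + (c - 0x314F) for c in range(0x314F, 0x3164)}


def map_ks_sequences(text):
    """Stage 1: rewrite filler + L + V + (T or filler) as syl-ipf jamo."""
    cps = [ord(ch) for ch in text]
    out, i = [], 0
    while i < len(cps):
        w = cps[i:i + 4]
        if (len(w) == 4 and w[0] == FILLER and w[1] in L_MAP
                and w[2] in V_MAP and (w[3] in T_MAP or w[3] == FILLER)):
            out += [L_MAP[w[1]], V_MAP[w[2]]]
            if w[3] != FILLER:
                out.append(T_MAP[w[3]])
            i += 4
        else:
            out.append(cps[i])  # not of the required form: preserve
            i += 1
    return "".join(map(chr, out))


def nfkc_preserving_compat_jamo(text):
    """Stage 2: apply NFKC, leaving U+3130 ~ U+318F untouched."""
    parts = re.split("([\u3130-\u318F]+)", text)
    # Odd-indexed parts are the captured compatibility jamo runs.
    return "".join(p if i % 2 else unicodedata.normalize("NFKC", p)
                   for i, p in enumerate(parts))


def check_prohibited(text):
    """Stage 3: reject uncombined fillers and uncomposed syl-ipf jamo."""
    for ch in text:
        if ord(ch) == FILLER or 0x1100 <= ord(ch) <= 0x11FF:
            raise ValueError(f"prohibited code point U+{ord(ch):04X}")


def hangeul_nameprep(text):
    """Run the three sketched stages and return the prepared string."""
    text = nfkc_preserving_compat_jamo(map_ks_sequences(text))
    check_prohibited(text)
    return text
```

   With this sketch, hangeul_nameprep("\u3164\u3131\u314F\u3131")
   returns the single syllable U+AC01, a bare consonant sequence such
   as "\u3131\u3131" passes through unchanged, and a filler that does
   not begin a complete sequence raises ValueError.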

References

   [IDNREQ] Requirements of Internationalized Domain Names,
   http://www.ietf.org/internet-drafts/draft-ietf-idn-requirements-08.txt

   [UNICODE] The Unicode Consortium, "The Unicode Standard",
   http://www.unicode.org/unicode/standard/standard.html

   [UNICODE_CHART] The Unicode Code Charts,
   http://www.unicode.org/charts/

   [IDNA] Patrik Faltstrom, Paul Hoffman,
   "Internationalizing Host Names In Applications (IDNA)",
   http://www.ietf.org/internet-drafts/draft-ietf-idn-idna-02.txt

   [NAMEPREP] Paul Hoffman, Marc Blanchet,
   "Preparation of Internationalized Host Names", Feb 2001,
   http://www.ietf.org/internet-drafts/draft-ietf-idn-nameprep-03.txt

   [UTR15] Mark Davis and Martin Duerst,
   "Unicode Normalization Forms", Unicode Technical Report #15,
   http://www.unicode.org/unicode/reports/tr15/

   [VERSION] M. Blanchet,
   "Handling versions of internationalized domain names protocols",
   http://www.ietf.org/internet-drafts/draft-ietf-idn-version-00.txt

   [ISO10646] ISO/IEC, Information Technology - Universal
   Multiple-Octet Coded Character Set (UCS) - Part 1: Architecture
   and Basic Multilingual Plane, Oct. 2000, with amendments.

   [KSC5601] Korean Standard KS C 5601-1987

A1. Acknowledgements

   Dongman Lee and Yangwoo Ko
   made valuable contributions to narrowing down the issues of the
   prohibition and preservation of some Hangeul characters.

   We thank Mark Davis for his advice on useful UNICODE reference
   documents.

A2. Authors

   Soobok Lee
   Postel Services, Inc.
   http://www.postel.co.kr
   Tel: +82-11-9774-2737

   GyeongSeog Gim
   Department of Computer Engineering
   Pusan National University
   Republic of Korea
   Tel: +82-51-510-2292

A3. the mapping table for enclosed jamo in the format of [VERSION]

   version=1.0

   3260;1.0;3131
   3261;1.0;3134
   3262;1.0;3137
   3263;1.0;3139
   3264;1.0;3141
   3265;1.0;3142
   3266;1.0;3145
   3267;1.0;3147
   3268;1.0;3148
   3269;1.0;314A
   326A;1.0;314B
   326B;1.0;314C
   326C;1.0;314D
   326D;1.0;314E

A4. the mapping table for half-width jamo in the format of [VERSION]

   version=1.0

   FFA0;1.0;3164
   FFA1;1.0;3131
   FFA2;1.0;3132
   FFA3;1.0;3133
   FFA4;1.0;3134
   FFA5;1.0;3135
   FFA6;1.0;3136
   FFA7;1.0;3137
   FFA8;1.0;3138
   FFA9;1.0;3139
   FFAA;1.0;313A
   FFAB;1.0;313B
   FFAC;1.0;313C
   FFAD;1.0;313D
   FFAE;1.0;313E
   FFAF;1.0;313F
   FFB0;1.0;3140
   FFB1;1.0;3141
   FFB2;1.0;3142
   FFB3;1.0;3143
   FFB4;1.0;3144
   FFB5;1.0;3145
   FFB6;1.0;3146
   FFB7;1.0;3147
   FFB8;1.0;3148
   FFB9;1.0;3149
   FFBA;1.0;314A
   FFBB;1.0;314B
   FFBC;1.0;314C
   FFBD;1.0;314D
   FFBE;1.0;314E
   FFC2;1.0;314F
   FFC3;1.0;3150
   FFC4;1.0;3151
   FFC5;1.0;3152
   FFC6;1.0;3153
   FFC7;1.0;3154
   FFCA;1.0;3155
   FFCB;1.0;3156
   FFCC;1.0;3157
   FFCD;1.0;3158
   FFCE;1.0;3159
   FFCF;1.0;315A
   FFD2;1.0;315B
   FFD3;1.0;315C
   FFD4;1.0;315D
   FFD5;1.0;315E
   FFD6;1.0;315F
   FFD7;1.0;3160
   FFDA;1.0;3161
   FFDB;1.0;3162
   FFDC;1.0;3163
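
   Entries in the two mapping tables can be loaded and applied with a
   short Python sketch such as the one below. The reading of each
   entry as "source;version;target" is an assumption inferred from
   the tables themselves, not taken from [VERSION]. The names
   TABLE_EXCERPT and parse_table() are hypothetical, and only a few
   entries are copied here for brevity.

```python
# A few entries copied from appendices A3 and A4 above.
TABLE_EXCERPT = """\
version=1.0

3260;1.0;3131
326D;1.0;314E
FFA0;1.0;3164
FFA1;1.0;3131
FFC2;1.0;314F
"""


def parse_table(text):
    """Parse 'source;version;target' lines into a code point mapping.

    Assumption: the first field is the source code point and the third
    field is the target code point, both in hexadecimal.
    """
    mapping = {}
    for line in text.splitlines():
        fields = line.strip().split(";")
        if len(fields) != 3:
            continue  # skip blank lines and the version header
        source, _version, target = fields
        mapping[int(source, 16)] = int(target, 16)
    return mapping


MAPPING = parse_table(TABLE_EXCERPT)

# str.translate accepts a dict keyed by code point, so the parsed
# table can be applied directly as the stage 1 mapping.
half_width = "\uFFA1\uFFC2"             # half-width gi-eog, half-width a
mapped = half_width.translate(MAPPING)  # compatibility gi-eog, a
```

   Code points without an entry in the table are left unchanged by
   str.translate, which matches the intent of a mapping stage that
   touches only the listed characters.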