Network Working Group                                          Y. YONEYA
Internet-Draft                                                      JPRS
Intended status: Informational                                 T. NEMOTO
Expires: April 6, 2013                                   Keio University
                                                         October 3, 2012


                 Mapping characters for precis classes
                    draft-yoneya-precis-mappings-03

Abstract

   The preparation and comparison of internationalized strings
   ("precis") framework [I-D.ietf-precis-framework] defines several
   classes of strings for preparation and comparison.  That document
   defines case mapping because many protocols perform case-sensitive
   or case-insensitive string comparison, so strings must be prepared
   before they are compared.  As described in IDNA mapping [RFC5895]
   and the precis problem statement
   [I-D.ietf-precis-problem-statement], mappings in internationalized
   strings are not limited to case: width, delimiters, and other
   special characters must also be taken into consideration.  This
   document is a guideline for authors of protocol profiles of the
   precis framework and describes the mappings that must be considered
   between receiving user input and passing permitted code points to
   internationalized protocols.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 6, 2013.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

1.  Introduction

   In many cases, user input of internationalized strings is generated
   by an input method editor ("IME") or by copy-and-paste from free
   text.  Users usually do not pay attention to the case or width of
   input characters because the variants look identical to them.
   Further, users rarely switch IME state to input special characters
   such as protocol elements.  For Internationalized Domain Names
   ("IDNs"), IDNA Mapping [RFC5895] describes methods to treat these
   issues.  For precis strings, case mapping is defined as a process in
   the precis framework [I-D.ietf-precis-framework], but width mapping,
   delimiter mapping, and special mapping are not defined.  Handling
   mappings other than case is also important to increase the chance
   that strings match as users expect.  This document is a guideline
   for authors of protocol profiles of the precis framework and
   describes the mappings that must be considered between receiving
   user input and passing permitted code points to internationalized
   protocols.

2.  Types of mapping

   This document defines two types of mapping.  One is protocol
   independent mapping, which does not depend on protocol rules; the
   other is protocol dependent mapping, which depends on protocol
   rules.  This document defines several mappings of each type.
   Authors of protocol profiles of the precis framework should give
   careful consideration to the choice of mappings.

   Each mapping type is described in the following sections.

3.  Protocol independent mapping

   Protocol independent mapping is a mapping that does not depend on
   protocol rules.

3.1.  Width mapping

   Fullwidth and halfwidth characters (those defined with Decomposition
   Types <wide> and <narrow>) are mapped to their decomposition
   mappings as shown in the Unicode character database [Unicode].

   Width mapping increases backward compatibility between Stringprep
   [RFC3454] and the precis framework [I-D.ietf-precis-framework],
   because a Stringprep profile that specifies Unicode normalization
   form KC (NFKC) as its normalization method maps fullwidth and
   halfwidth characters into their compatibility equivalents.  If a
   precis framework profile specifies NFKC (which is not recommended),
   width mapping might not be useful.

4.  Protocol dependent mapping

   Protocol dependent mapping is a mapping that depends on protocol
   rules.

4.1.  Delimiter mapping

   Definitions of delimiters differ from protocol to protocol.
   Therefore, a delimiter mapping table should be based on a
   well-defined mapping table for each protocol.

   One of the most useful cases of delimiter mapping is when FULL STOP
   (U+002E) is a delimiter, as it is in domain names.  Some IMEs
   generate characters compatible with FULL STOP, such as IDEOGRAPHIC
   FULL STOP (U+3002), when users type FULL STOP on the keyboard.

4.2.  Special mapping

   Certain protocols have characters, other than delimiters, that need
   to be mapped to a character different from the one given by the
   mapping rules defined in the precis framework.  In this document,
   these mappings are named special mapping.  They differ from protocol
   to protocol.  Therefore, a special mapping table should be based on
   a well-defined mapping table for each protocol.
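
   As a minimal sketch of the width, delimiter, and special mappings
   described above, the Python fragment below applies width mapping
   using the decomposition data exposed by the standard unicodedata
   module, followed by table-driven delimiter and special mappings.
   The function names and the contents of DELIMITER_MAP and SPECIAL_MAP
   are illustrative assumptions and are not defined by this document;
   a real profile would take its tables from the protocol in question.

```python
import unicodedata

# Hypothetical per-protocol tables (assumptions for illustration only).
DELIMITER_MAP = {0x3002: "."}   # IDEOGRAPHIC FULL STOP -> FULL STOP
SPECIAL_MAP = {
    0x0009: " ",                # CHARACTER TABULATION -> SPACE
    0x00A0: " ",                # NO-BREAK SPACE -> SPACE
    0x200E: None,               # LEFT-TO-RIGHT MARK -> nothing (deleted)
}


def width_map(s: str) -> str:
    """Replace <wide>/<narrow> characters by their decomposition mapping."""
    out = []
    for ch in s:
        # decomposition() returns e.g. "<wide> 0041", or "" if there is none.
        fields = unicodedata.decomposition(ch).split()
        if fields and fields[0] in ("<wide>", "<narrow>"):
            out.append("".join(chr(int(cp, 16)) for cp in fields[1:]))
        else:
            out.append(ch)
    return "".join(out)


def delimiter_map(s: str) -> str:
    """Apply the (assumed) delimiter table of one protocol."""
    return s.translate(DELIMITER_MAP)


def special_map(s: str) -> str:
    """Apply the (assumed) special table; None entries delete the character."""
    return s.translate(SPECIAL_MAP)
```

   With these definitions, delimiter_map(width_map("\uff41\uff61\uff42"))
   yields "a.b": width mapping first turns HALFWIDTH IDEOGRAPHIC FULL
   STOP (U+FF61) into U+3002, which the delimiter table then maps to
   FULL STOP.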

   Examples of special mapping are as follows:

   o  White spaces are mapped to SPACE (U+0020)

   o  Some characters such as control characters are mapped to nothing
      (Deletion)

   LDAPprep [RFC4518] defines a rule that maps certain codepoints
   (listed in Appendix B.4) to SPACE (U+0020).

4.3.  Local case mapping

   Local case mapping is case folding that depends on language context.
   For example, if a user ID string contains an upper case I, the
   language context of that user ID must be taken into account when
   the character is mapped to lower case.  If the language context is
   Turkish, the character should be mapped to LATIN SMALL LETTER
   DOTLESS I (U+0131) as its lower case form.

   To solve this problem, this document defines the characters that
   need local case mapping based on SpecialCasing.txt [Specialcasing],
   which is described in section 3.13 of The Unicode Standard
   [Unicode].  Because the precis framework already performs
   casefolding, local case mapping targets only those characters for
   which the following two procedures give different results:
   performing just the casefolding defined in CaseFolding.txt
   [Casefolding], and performing the special casefolding defined in
   SpecialCasing.txt followed by that casefolding.

   SpecialCasing.txt defines two types of casefolding: Unconditional
   Mappings and Conditional Mappings.  Conditional Mappings are divided
   into Language-Insensitive Mappings, which target characters whose
   full case mappings do not depend on language but do depend on
   context, and Language-Sensitive Mappings, which target characters
   whose full case mappings depend on language and perhaps also on
   context.

   Of these mappings, the characters targeted by Unconditional Mappings
   and by Language-Insensitive Mappings in Conditional Mappings are
   mapped to the same codepoint(s) whether just casefolding is
   performed or special casefolding followed by casefolding is
   performed.
   In contrast, the characters targeted by Language-Sensitive Mappings
   in Conditional Mappings are mapped to different codepoints depending
   on whether just casefolding or special casefolding followed by
   casefolding is performed.  Therefore, this document defines a subset
   of the Lithuanian (lt), Turkish (tr), and Azerbaijanian (az)
   characters targeted by Language-Sensitive Mappings as the targets of
   local case mapping.

   The characters that need local case mapping are listed below.

   Format:
   <Language>; <Codepoint>; <Local lowercase>; <Comment>

   lt; 0049; 0069 0307; LATIN CAPITAL LETTER I
   lt; 004A; 006A 0307; LATIN CAPITAL LETTER J
   lt; 012E; 012F 0307; LATIN CAPITAL LETTER I WITH OGONEK
   lt; 00CC; 0069 0307 0300; LATIN CAPITAL LETTER I WITH GRAVE
   lt; 00CD; 0069 0307 0301; LATIN CAPITAL LETTER I WITH ACUTE
   lt; 0128; 0069 0307 0303; LATIN CAPITAL LETTER I WITH TILDE
   tr; 0130; 0069; LATIN CAPITAL LETTER I WITH DOT ABOVE
   tr; 0049; 0131; LATIN CAPITAL LETTER I
   az; 0130; 0069; LATIN CAPITAL LETTER I WITH DOT ABOVE
   az; 0049; 0131; LATIN CAPITAL LETTER I

   Section 6, "IANA Considerations", contains a template for
   registering these characters with IANA in the precis local case
   mapping registry.

5.  Applying order of mapping

   In general, the mappings described in this document are not
   sensitive to the order in which they are applied.  This section
   nevertheless defines an order of application that minimizes the
   effect of codepoint changes introduced by the mappings.  This
   mapping order is very general and was designed to be acceptable to
   the widest user community.

   1.  width mapping

   2.  delimiter mapping

   3.  special mapping

   4.  local case mapping

   5.  precis framework

   The mappings described in this document should be performed before
   the precis framework is applied.

6.  IANA Considerations

6.1.  precis local case mapping registry

   IANA is requested to create a registry of precis local case
   mappings.  In accordance with [RFC5226], the registration policy is
   "RFC Required".

6.2.  Template for precis local case mapping registry

   The following information is to be given when a new precis local
   case mapping rule is created.  The registration template is as
   follows:

   Language:  language name

   Codepoint:  the code point to which local case mapping is applied
      when it appears in a string

   Local lowercase:  the lowercase codepoint(s) that result from
      performing local case mapping

   Comment:  character name of the code point

   Appendix C contains further discussion and a table from which that
   registry can be initialized.

7.  Security Considerations

   TBD.

8.  Acknowledgment

   Martin Duerst suggested the need for case folding in the mappings
   (for example, mapping final sigma to sigma and German sz to ss).

   Pete Resnick and others gave important suggestions for this document
   during the WG meeting.

9.  References

   [RFC3454]  Hoffman, P. and M. Blanchet, "Preparation of
              Internationalized Strings ("stringprep")", RFC 3454,
              December 2002.

   [RFC3490]  Faltstrom, P., Hoffman, P., and A. Costello,
              "Internationalizing Domain Names in Applications (IDNA)",
              RFC 3490, March 2003.

   [RFC3491]  Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep
              Profile for Internationalized Domain Names (IDN)",
              RFC 3491, March 2003.

   [RFC3722]  Bakke, M., "String Profile for Internet Small Computer
              Systems Interface (iSCSI) Names", RFC 3722, April 2004.

   [RFC3748]  Aboba, B., Blunk, L., Vollbrecht, J., Carlson, J., and H.
              Levkowetz, "Extensible Authentication Protocol (EAP)",
              RFC 3748, June 2004.

   [RFC4013]  Zeilenga, K., "SASLprep: Stringprep Profile for User Names
              and Passwords", RFC 4013, February 2005.

   [RFC4314]  Melnikov, A., "IMAP4 Access Control List (ACL) Extension",
              RFC 4314, December 2005.

   [RFC4518]  Zeilenga, K., "Lightweight Directory Access Protocol
              (LDAP): Internationalized String Preparation", RFC 4518,
              June 2006.

   [RFC5226]  Narten, T. and H. Alvestrand, "Guidelines for Writing an
              IANA Considerations Section in RFCs", BCP 26, RFC 5226,
              May 2008.

   [RFC5895]  Resnick, P. and P. Hoffman, "Mapping Characters for
              Internationalized Domain Names in Applications (IDNA)
              2008", RFC 5895, September 2010.

   [RFC6122]  Saint-Andre, P., "Extensible Messaging and Presence
              Protocol (XMPP): Address Format", RFC 6122, March 2011.

   [I-D.ietf-precis-framework]
              Saint-Andre, P. and M. Blanchet, "PRECIS Framework:
              Preparation and Comparison of Internationalized Strings in
              Application Protocols", draft-ietf-precis-framework-03
              (work in progress), May 2012.

   [I-D.ietf-precis-problem-statement]
              Blanchet, M. and A. Sullivan, "Stringprep Revision and
              PRECIS Problem Statement",
              draft-ietf-precis-problem-statement-06 (work in progress),
              July 2012.

   [Unicode]  The Unicode Consortium, "The Unicode Standard, Version
              6.1.0", 2012.

   [Casefolding]
              "CaseFolding-6.1.0.txt", Unicode Character Database, July
              2011.

   [Specialcasing]
              "SpecialCasing-6.1.0.txt", Unicode Character Database,
              July 2011.

Appendix A.  Mapping type list for each protocol

A.1.  Mapping type list for each protocol

   This table lists the mapping types used by each protocol.  Values
   marked "o" indicate that the protocol uses the type of mapping.
   Values marked "-" indicate that the protocol does not use the type
   of mapping.

   +----------------------+-------------+-----------+------+---------+
   | \ Type of mapping    |    Width    | Delimiter | Case | Special |
   | RFC \                |   (NFKC)    |           |      |         |
   +----------------------+-------------+-----------+------+---------+
   | 3490                 |      -      |     o     |  -   |    -    |
   | 3491                 |      o      |     -     |  o   |    -    |
   | 3722                 |      o      |     -     |  o   |    -    |
   | 3748                 |      o      |     -     |  -   |    o    |
   | 4013                 |      o      |     -     |  -   |    o    |
   | 4314                 |      o      |     -     |  -   |    o    |
   | 4518                 |      o      |     -     |  o   |    o    |
   | 6120                 |      -      |     -     |  o   |    -    |
   +----------------------+-------------+-----------+------+---------+

Appendix B.  Codepoints which need special mapping

B.1.  RFC3748

   Non-ASCII space characters [StringPrep, C.1.2] that can be mapped to
   SPACE (U+0020).

B.2.  RFC4013

   Non-ASCII space characters [StringPrep, C.1.2] that can be mapped to
   SPACE (U+0020).

B.3.  RFC4314

   Non-ASCII space characters [StringPrep, C.1.2] that can be mapped to
   SPACE (U+0020).

B.4.  RFC4518

   Codepoints mapped to SPACE (U+0020) are as follows:

      U+0009 (CHARACTER TABULATION)
      U+000A (LINE FEED (LF))
      U+000B (LINE TABULATION)
      U+000C (FORM FEED (FF))
      U+000D (CARRIAGE RETURN (CR))
      U+0085 (NEXT LINE (NEL))
      U+0020 (SPACE)
      U+00A0 (NO-BREAK SPACE)
      U+1680 (OGHAM SPACE MARK)
      U+2000 (EN QUAD)
      U+2001 (EM QUAD)
      U+2002 (EN SPACE)
      U+2003 (EM SPACE)
      U+2004 (THREE-PER-EM SPACE)
      U+2005 (FOUR-PER-EM SPACE)
      U+2006 (SIX-PER-EM SPACE)
      U+2007 (FIGURE SPACE)
      U+2008 (PUNCTUATION SPACE)
      U+2009 (THIN SPACE)
      U+200A (HAIR SPACE)
      U+2028 (LINE SEPARATOR)
      U+2029 (PARAGRAPH SEPARATOR)
      U+202F (NARROW NO-BREAK SPACE)
      U+205F (MEDIUM MATHEMATICAL SPACE)
      U+3000 (IDEOGRAPHIC SPACE)

   All other control code (e.g., Cc) points or code points with a
   control function (e.g., Cf) are mapped to nothing.  Codepoints that
   are mapped to nothing but are not specified by Stringprep are as
   follows:

      U+0000-0008
      U+000E-001F
      U+007F-0084
      U+0086-009F
      U+06DD
      U+070F
      U+180E
      U+200E-200F
      U+202A-202E
      U+2061-2063
      U+206A-206F
      U+FFF9-FFFB
      U+1D173-1D17A
      U+E0001
      U+E0020-E007F

Appendix C.  The initial precis local case mapping registrations

C.1.  Lithuanian

   Language: Lithuanian

   Codepoint: U+0049
   Local lowercase: U+0069 U+0307
   Comment: LATIN CAPITAL LETTER I

   Codepoint: U+004A
   Local lowercase: U+006A U+0307
   Comment: LATIN CAPITAL LETTER J

   Codepoint: U+012E
   Local lowercase: U+012F U+0307
   Comment: LATIN CAPITAL LETTER I WITH OGONEK

   Codepoint: U+00CC
   Local lowercase: U+0069 U+0307 U+0300
   Comment: LATIN CAPITAL LETTER I WITH GRAVE

   Codepoint: U+00CD
   Local lowercase: U+0069 U+0307 U+0301
   Comment: LATIN CAPITAL LETTER I WITH ACUTE

   Codepoint: U+0128
   Local lowercase: U+0069 U+0307 U+0303
   Comment: LATIN CAPITAL LETTER I WITH TILDE

C.2.  Turkish

   Language: Turkish

   Codepoint: U+0130
   Local lowercase: U+0069
   Comment: LATIN CAPITAL LETTER I WITH DOT ABOVE

   Codepoint: U+0049
   Local lowercase: U+0131
   Comment: LATIN CAPITAL LETTER I

C.3.  Azerbaijanian

   Language: Azerbaijanian

   Codepoint: U+0130
   Local lowercase: U+0069
   Comment: LATIN CAPITAL LETTER I WITH DOT ABOVE

   Codepoint: U+0049
   Local lowercase: U+0131
   Comment: LATIN CAPITAL LETTER I

Appendix D.  Change Log

D.1.  Changes since -00

   o  Add Section 2.3 "Special mapping" to Section 2 "Type of
      mappings".

   o  Add the topic of special mapping and additional case mapping to
      Section 3 "Discussion".

   o  Add Appendices:
         Appendix A "Mapping type list each protocols"
         Appendix B "Code point list is need special mapping"
         Appendix D "Change Log"

   o  Add Section 8 "Acknowledgment".

D.2.  Changes since -01

   o  Modify the document structure to serve as a guideline for authors
      of protocol profiles of the precis framework.

   o  Group the mappings that this document defines into two types.

   o  Add Section 5 "Applying order of mapping".

   o  Delete Section 3 "Discussion".

D.3.  Changes since -02

   o  Modify Section 4.3 "Local case mapping" to define the characters
      that local case mapping targets.

   o  Request that IANA create a registry of precis local case mappings
      and define a template for that registry in Section 6 "IANA
      Considerations".

   o  Add Appendix C "The initial precis local case mapping
      registrations".

Authors' Addresses

   Yoshiro YONEYA
   JPRS
   Chiyoda First Bldg. East 13F
   3-8-1 Nishi-Kanda
   Chiyoda-ku, Tokyo 101-0065
   Japan

   Phone: +81 3 5215 8451
   Email: yoshiro.yoneya@jprs.co.jp


   Takahiro NEMOTO
   Keio University
   Graduate School of Media Design
   4-1-1 Hiyoshi, Kohoku-ku
   Yokohama, Kanagawa 223-8526
   Japan

   Phone: +81 45 564 2517
   Email: t.nemo10@kmd.keio.ac.jp