Internet-Draft                                      Kurt D. Zeilenga
Intended Category: Standard Track                OpenLDAP Foundation
Expires in six months                               15 February 2004


               LDAP: Internationalized String Preparation
                   <draft-ietf-ldapbis-strprep-03.txt>


Status of this Memo

   This document is an Internet-Draft and is in full conformance with all
   provisions of Section 10 of RFC2026.

   Distribution of this memo is unlimited.  Technical discussion of this
   document will take place on the IETF LDAP Revision Working Group
   mailing list .  Please send editorial comments directly to the author
   <Kurt@OpenLDAP.org>.

   Internet-Drafts are working documents of the Internet Engineering Task
   Force (IETF), its areas, and its working groups.  Note that other
   groups may also distribute working documents as Internet-Drafts.
   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as ``work in progress.''

   The list of current Internet-Drafts can be accessed at .  The list of
   Internet-Draft Shadow Directories can be accessed at .

   Copyright (C) The Internet Society (2004).  All Rights Reserved.

   Please see the Full Copyright section near the end of this document
   for more information.


Abstract

   The previous Lightweight Directory Access Protocol (LDAP) technical
   specifications did not precisely define how character string matching
   is to be performed.  This led to a number of usability and
   interoperability problems.  This document defines string preparation
   algorithms for character-based matching rules defined for use in LDAP.

Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in BCP 14 [RFC2119].

   Character names in this document use the notation for code points and
   names from the Unicode Standard [Unicode].  For example, the letter
   "a" may be represented as either <U+0061> or <LATIN SMALL LETTER A>.
   In the lists of mappings and the prohibited characters, the "U+" is
   left off to make the lists easier to read.  The comments for character
   ranges are shown in square brackets (such as "[CONTROL CHARACTERS]")
   and do not come from the standard.

   Note: a glossary of terms used in Unicode can be found in [Glossary].
   Information on the Unicode character encoding model can be found in
   [CharModel].


1. Introduction

1.1. Background

   A Lightweight Directory Access Protocol (LDAP) [Roadmap] matching rule
   [Syntaxes] defines an algorithm for determining whether a presented
   value matches an attribute value in accordance with the criteria
   defined for the rule.  The proposition may be evaluated to True,
   False, or Undefined.

      True      - the attribute contains a matching value,

      False     - the attribute contains no matching value,

      Undefined - it cannot be determined whether the attribute contains
                  a matching value or not.

   For instance, the caseIgnoreMatch matching rule may be used to compare
   whether the commonName attribute contains a particular value without
   regard for case and insignificant spaces.

1.2. X.500 String Matching Rules

   "X.520: Selected attribute types" [X.520] provides (amongst other
   things) value syntaxes and matching rules for comparing values
   commonly used in the Directory.  These specifications are inadequate
   for strings composed of Unicode [Unicode] characters.

   The caseIgnoreMatch matching rule [X.520], for example, is simply
   defined as being a case insensitive comparison where insignificant
   spaces are ignored.  For printableString, there is only one space
   character and case mapping is bijective, hence this definition is
   sufficient.  However, for Unicode string types such as
   universalString, this is not sufficient.  For example, a case
   insensitive matching implementation which folded lower case characters
   to upper case would yield different results than an implementation
   which used upper case to lower case folding.  Similarly, one
   implementation may view space as referring only to SPACE (U+0020), a
   second may view any character with the space separator (Zs) property
   as a space, and a third may view any character with the whitespace
   (WS) bidirectional category as a space.

   The lack of precise specification for character string matching has
   led to significant interoperability problems.  When such matching is
   used in certificate chain validation, security vulnerabilities can
   arise.  To address these problems, this document defines precise
   algorithms for preparing character strings for matching.

1.3. Relationship to "stringprep"

   The character string preparation algorithms described in this document
   are based upon the "stringprep" approach [StringPrep].  In
   "stringprep", presented and stored values are first prepared for
   comparison so that a character-by-character comparison yields the
   "correct" result.

   The approach used here is a refinement of the "stringprep"
   [StringPrep] approach.  Each algorithm involves two additional
   preparation steps:

   a) prior to applying the Unicode string preparation steps outlined in
      "stringprep", the string is transcoded to Unicode;

   b) after applying the Unicode string preparation steps outlined in
      "stringprep", characters insignificant to the matching rules are
      removed.
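
   The refinement above can be pictured as a composition of preparation
   steps, any of which may fail.  The following Python sketch is an
   illustration under stated assumptions, not a normative
   implementation: the names prepare, match, and toy_steps are
   hypothetical, None stands in for Undefined, the six toy steps are
   placeholders rather than the algorithms of Section 2, and only a
   single presented-value/attribute-value comparison is modeled.

```python
from typing import Callable, List, Optional

# Hypothetical step signature: return the prepared string, or None on failure.
Step = Callable[[str], Optional[str]]


def prepare(value: str, steps: List[Step]) -> Optional[str]:
    """Apply each preparation step in order; None means some step failed."""
    if not value:
        return None  # empty strings are not prepared; the result is Undefined
    for step in steps:
        result = step(value)
        if result is None:
            return None
        value = result
    return value


def match(presented: str, stored: str, steps: List[Step]) -> Optional[bool]:
    """Compare one presented value with one attribute value.

    Returns True or False, or None to stand in for Undefined.
    """
    a = prepare(presented, steps)
    b = prepare(stored, steps)
    if a is None or b is None:
        return None
    return a == b  # plain character-by-character comparison


# Toy placeholders for the six steps (NOT the algorithms of Section 2).
toy_steps: List[Step] = [
    lambda s: s,                                   # 1) transcode: already Unicode
    lambda s: s.lower(),                           # 2) map: toy case folding
    lambda s: s,                                   # 3) normalize: omitted
    lambda s: None if "\ufffd" in s else s,        # 4) prohibit: U+FFFD only
    lambda s: s,                                   # 5) check bidi: omitted
    lambda s: " ".join(w for w in s.split(" ") if w) or " ",  # 6) toy removal
]
```

   Keeping each step a separate function that may fail mirrors the rule
   in Section 2 that failure in any step makes the assertion Undefined.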

   Hence, preparation of character strings for X.500 matching involves
   the following steps:

      1) Transcode
      2) Map
      3) Normalize
      4) Prohibit
      5) Check Bidi (Bidirectional)
      6) Insignificant Character Removal

   These steps are described in Section 2.

1.4. Relationship to the LDAP Technical Specification

   This document is an integral part of the LDAP technical specification
   [Roadmap] which obsoletes the previously defined LDAP technical
   specification [RFC3377] in its entirety.

   This document details new LDAP internationalized character string
   preparation algorithms used by [Syntaxes] and possibly other technical
   specifications defining LDAP syntaxes and/or matching rules.

1.5. Relationship to X.500

   LDAP is defined [Roadmap] in X.500 terms as an X.500 access mechanism.
   As such, there is a strong desire for alignment between LDAP and X.500
   syntax and semantics.  The character string preparation algorithms
   described in this document are based upon the "Internationalized
   String Matching Rules for X.500" [XMATCH] proposal to ITU/ISO Joint
   Study Group 2.


2. String Preparation

   The following six-step process SHALL be applied to each presented
   value and each attribute value in preparation for character string
   matching rule evaluation.

      1) Transcode
      2) Map
      3) Normalize
      4) Prohibit
      5) Check bidi
      6) Insignificant Character Removal

   Failure in any step causes the assertion to evaluate to Undefined.

   This process is intended to act upon non-empty character strings.  If
   the string to prepare is empty, this process is not applied and the
   assertion is evaluated to Undefined.

   The character repertoire of this process is Unicode 3.2 [Unicode].

2.1. Transcode

   Each non-Unicode string value is transcoded to Unicode.

   TeletexString [X.680][T.61] values are transcoded to Unicode as
   described in Appendix A.
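
   The Appendix A algorithm can be sketched in Python as follows.  The
   sketch assumes the TeletexString value is available as raw 8-bit T.61
   octets and ignores T.61 escape sequences and alternate repertoires;
   the names t61_char and t61_to_unicode are hypothetical.  Mapping a
   dangling accent prefix at the end of the input to U+FFFD is an
   assumption, chosen to match Appendix A's handling of undefined codes
   so that the Prohibit step fails.

```python
# Table A.1: T.61 codes 0xA0..0xFF; None marks a "??" (undefined) position.
N = None
T61_HIGH = [
    0x00A0, 0x00A1, 0x00A2, 0x00A3, 0x0024, 0x00A5, 0x0023, 0x00A7,  # a0
    0x00A8, N, N, 0x00AB, N, N, N, N,                                # a8
    0x00B0, 0x00B1, 0x00B2, 0x00B3, 0x00D7, 0x00B5, 0x00B6, 0x00B7,  # b0
    0x00F7, N, N, 0x00BB, 0x00BC, 0x00BD, 0x00BE, 0x00BF,            # b8
    N, 0x0300, 0x0301, 0x0302, 0x0303, 0x0304, 0x0306, 0x0307,       # c0
    0x0308, N, 0x030A, 0x0327, 0x0332, 0x030B, 0x0328, 0x030C,       # c8
    N, N, N, N, N, N, N, N,                                          # d0
    N, N, N, N, N, N, N, N,                                          # d8
    0x2126, 0x00C6, 0x00D0, 0x00AA, N, 0x0126, 0x0132, 0x013F,       # e0
    0x0141, 0x00D8, 0x0152, 0x00BA, 0x00DE, 0x0166, 0x014A, 0x0149,  # e8
    0x0138, 0x00E6, 0x0111, 0x00F0, 0x0127, 0x0131, 0x0133, 0x0140,  # f0
    0x0142, 0x00F8, 0x0153, 0x00DF, 0x00FE, 0x0167, 0x014B, N,       # f8
]
assert len(T61_HIGH) == 0x60

# Codes in 0x00..0x7F that T.61 leaves undefined.
T61_UNDEFINED_LOW = {0x23, 0x24, 0x5C, 0x5E, 0x60, 0x7B, 0x7D, 0x7E}
REPLACEMENT = "\ufffd"


def t61_char(code: int) -> str:
    """Map one T.61 octet to one Unicode character (U+FFFD if undefined)."""
    if code < 0xA0:
        # 0x00..0x7F (except the undefined codes) and 0x80..0x9F are unchanged
        return REPLACEMENT if code in T61_UNDEFINED_LOW else chr(code)
    cp = T61_HIGH[code - 0xA0]
    return REPLACEMENT if cp is None else chr(cp)


def t61_to_unicode(data: bytes) -> str:
    """Transcode T.61 octets to Unicode as described in Appendix A."""
    out = []
    i = 0
    while i < len(data):
        code = data[i]
        if 0xC1 <= code <= 0xCF:  # accent prefix
            if i + 1 == len(data):
                out.append(REPLACEMENT)  # assumption: dangling prefix -> U+FFFD
                break
            # Emit the base character first, then the non-spacing accent.
            out.append(t61_char(data[i + 1]))
            out.append(t61_char(code))
            i += 2
        else:
            out.append(t61_char(code))
            i += 1
    return "".join(out)
```

   For example, the octets 63 61 66 c2 65 ("caf", acute prefix, "e")
   transcode to "cafe" followed by COMBINING ACUTE ACCENT (U+0301); the
   Normalize step (Form KC) then composes the final pair into U+00E9.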

   PrintableString [X.680] values are transcoded directly to Unicode.

   UniversalString, UTF8String, and bmpString [X.680] values need not be
   transcoded as they are Unicode-based strings (in the case of
   bmpString, a subset of Unicode).

   The output is the transcoded string.

2.2. Map

   SOFT HYPHEN (U+00AD) and MONGOLIAN TODO SOFT HYPHEN (U+1806) code
   points are mapped to nothing.  COMBINING GRAPHEME JOINER (U+034F) and
   VARIATION SELECTORs (U+180B-180D, FE00-FE0F) code points are also
   mapped to nothing.  The OBJECT REPLACEMENT CHARACTER (U+FFFC) is
   mapped to nothing.

   CHARACTER TABULATION (U+0009), LINE FEED (LF) (U+000A), LINE
   TABULATION (U+000B), FORM FEED (FF) (U+000C), CARRIAGE RETURN (CR)
   (U+000D), and NEXT LINE (NEL) (U+0085) are mapped to SPACE (U+0020).

   All other control code points (e.g., Cc) or code points with a control
   function (e.g., Cf) are mapped to nothing.

   ZERO WIDTH SPACE (U+200B) is mapped to nothing.  All other code points
   with Separator (space, line, or paragraph) property (e.g., Zs, Zl, or
   Zp) are mapped to SPACE (U+0020).

   Appendix B provides a table detailing the above mappings.

   For case ignore, numeric, and stored prefix string matching rules,
   characters are case folded per Table B.2 of [StringPrep].

   The output is the mapped string.

2.3. Normalize

   The input string is normalized to Unicode Form KC (compatibility
   composed) as described in [UAX15].  The output is the normalized
   string.

2.4. Prohibit

   All Unassigned code points are prohibited.  Unassigned code points are
   listed in Table A.1 of [StringPrep].

   Characters which, per Section 5.8 of [StringPrep], change display
   properties or are deprecated are prohibited.  These characters are
   listed in Table C.8 of [StringPrep].

   Private Use (U+E000-F8FF, F0000-FFFFD, 100000-10FFFD) code points are
   prohibited.
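
   The mapping, normalization, and prohibition steps (Sections 2.2
   through 2.4, including the non-character, surrogate, and REPLACEMENT
   CHARACTER prohibitions listed below) can be approximated with
   Python's standard unicodedata and stringprep modules, which expose
   Unicode 3.2 data (unicodedata.ucd_3_2_0) and the tables of RFC 3454.
   This is a sketch, not a conformance-tested implementation: the
   function names are hypothetical, and None signals step failure.

```python
import stringprep
import unicodedata

UCD = unicodedata.ucd_3_2_0  # Unicode 3.2 character data, per Section 2

# Section 2.2: code points mapped to nothing.
MAP_TO_NOTHING = {0x00AD, 0x1806, 0x034F, 0x180B, 0x180C, 0x180D,
                  0xFFFC, 0x200B} | set(range(0xFE00, 0xFE10))
# Section 2.2: control characters mapped to SPACE.
MAP_TO_SPACE = {0x0009, 0x000A, 0x000B, 0x000C, 0x000D, 0x0085}


def map_step(s: str, case_fold: bool = False) -> str:
    out = []
    for ch in s:
        cp = ord(ch)
        if cp in MAP_TO_NOTHING:
            continue
        if cp in MAP_TO_SPACE:
            out.append(" ")
            continue
        category = UCD.category(ch)
        if category in ("Cc", "Cf"):
            continue  # all other control and format characters
        if category in ("Zs", "Zl", "Zp"):
            out.append(" ")  # all other separators
            continue
        out.append(ch)
    mapped = "".join(out)
    if case_fold:
        # Table B.2 of stringprep: case folding for use with NFKC
        mapped = "".join(stringprep.map_table_b2(c) for c in mapped)
    return mapped


def normalize_step(s: str) -> str:
    return UCD.normalize("NFKC", s)


def prohibit_step(s: str):
    for ch in s:
        if (stringprep.in_table_a1(ch)        # unassigned in Unicode 3.2
                or stringprep.in_table_c8(ch)  # display-changing/deprecated
                or stringprep.in_table_c3(ch)  # private use
                or stringprep.in_table_c4(ch)  # non-character code points
                or stringprep.in_table_c5(ch)  # surrogate codes
                or ch == "\ufffd"):            # REPLACEMENT CHARACTER
            return None  # step fails
    return s
```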

   All non-character code points (U+FDD0-FDEF, FFFE-FFFF, 1FFFE-1FFFF,
   2FFFE-2FFFF, 3FFFE-3FFFF, 4FFFE-4FFFF, 5FFFE-5FFFF, 6FFFE-6FFFF,
   7FFFE-7FFFF, 8FFFE-8FFFF, 9FFFE-9FFFF, AFFFE-AFFFF, BFFFE-BFFFF,
   CFFFE-CFFFF, DFFFE-DFFFF, EFFFE-EFFFF, FFFFE-FFFFF, 10FFFE-10FFFF) are
   prohibited.

   Surrogate codes (U+D800-DFFF) are prohibited.

   The REPLACEMENT CHARACTER (U+FFFD) code point is prohibited.

   The step fails if the input string contains any prohibited code point.
   Otherwise, the output is the input string.

2.5. Check bidi

   This step fails if the input string does not conform to the
   bidirectional character restrictions detailed in Section 6 of
   [StringPrep].  Otherwise, the output is the input string.

2.6. Insignificant Character Removal

   In this step, characters insignificant to the matching rule are to be
   removed.  The characters to be removed differ from matching rule to
   matching rule.

      Section 2.6.1 applies to case ignore and exact string matching.
      Section 2.6.2 applies to numericString matching.
      Section 2.6.3 applies to telephoneNumber matching.

2.6.1. Insignificant Space Removal

   For the purposes of this section, a space is defined to be the SPACE
   (U+0020) code point followed by no combining marks.

      NOTE - The previous steps ensure that the string cannot contain any
      code points in the separator class, other than SPACE (U+0020).

   If the input string consists entirely of spaces or is empty, the
   output is a string consisting of exactly one space (i.e., " ").

   Otherwise, the following spaces are removed:
      - leading spaces (i.e., those preceding the first character that is
        not a space);
      - trailing spaces (i.e., those following the last character that is
        not a space);
      - multiple consecutive spaces (these are taken as equivalent to a
        single space character).
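
   The bidi check of Section 2.5 and the three removal variants of
   Section 2.6 (the space rules above, and the numericString and
   telephoneNumber rules of Sections 2.6.2 and 2.6.3) can be sketched in
   the same way.  Assumptions: the function names are hypothetical, None
   signals failure, Python's stringprep.in_table_d1 and in_table_d2 stand
   in for the RandALCat and LCat tables used by Section 6 of stringprep,
   and "combining mark" is taken to mean general category Mn, Mc, or Me.

```python
import stringprep
import unicodedata

UCD = unicodedata.ucd_3_2_0
COMBINING = ("Mn", "Mc", "Me")  # assumption: "combining marks" = category M*
HYPHENS = {"\u002d", "\u058a", "\u2010", "\u2011", "\u2212", "\ufe63", "\uff0d"}


def check_bidi(s: str):
    """If s has any RandALCat character, it must have no LCat character
    and must begin and end with a RandALCat character."""
    if not any(stringprep.in_table_d1(c) for c in s):
        return s
    if any(stringprep.in_table_d2(c) for c in s):
        return None
    if stringprep.in_table_d1(s[0]) and stringprep.in_table_d1(s[-1]):
        return s
    return None


def _bare(s: str, i: int) -> bool:
    """True if s[i] is not followed by a combining mark."""
    return i + 1 == len(s) or UCD.category(s[i + 1]) not in COMBINING


def _is_space(s: str, i: int) -> bool:
    return s[i] == " " and _bare(s, i)


def _is_hyphen(s: str, i: int) -> bool:
    return s[i] in HYPHENS and _bare(s, i)


def remove_spaces(s: str) -> str:
    """Section 2.6.1: drop leading/trailing spaces, collapse inner runs."""
    flags = [_is_space(s, i) for i in range(len(s))]
    if all(flags):  # all spaces, or empty
        return " "
    out, pending = [], False
    for ch, is_space in zip(s, flags):
        if is_space:
            pending = True
            continue
        if pending and out:
            out.append(" ")  # a run of inner spaces becomes one space
        pending = False
        out.append(ch)
    return "".join(out)


def remove_numeric(s: str) -> str:
    """Section 2.6.2: remove all spaces."""
    kept = [ch for i, ch in enumerate(s) if not _is_space(s, i)]
    return "".join(kept) or " "


def remove_telephone(s: str) -> str:
    """Section 2.6.3: remove all spaces and hyphens."""
    kept = [ch for i, ch in enumerate(s)
            if not (_is_space(s, i) or _is_hyphen(s, i))]
    return "".join(kept) or " "
```

   Each removal function returns a single SPACE when nothing significant
   remains, as each of the three subsections requires.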

   For example, removal of spaces from the Form KC string:
       "<SPACE><SPACE>foo<SPACE><SPACE>bar<SPACE><SPACE>"
   would result in the output string:
       "foo<SPACE>bar"
   and the Form KC string:
       "<SPACE><SPACE><SPACE>"
   would result in the output string:
       "<SPACE>".

2.6.2. numericString Insignificant Character Removal

   For the purposes of this section, a space is defined to be the SPACE
   (U+0020) code point followed by no combining marks.

   All spaces are regarded as not significant.  If the input string
   consists entirely of spaces or is empty, the output is a string
   consisting of exactly one space (i.e., " ").  Otherwise, all spaces are
   removed.

   For example, removal of spaces from the Form KC string:
       "<SPACE><SPACE>123<SPACE><SPACE>456<SPACE><SPACE>"
   would result in the output string:
       "123456"
   and the Form KC string:
       "<SPACE><SPACE><SPACE>"
   would result in the output string:
       "<SPACE>".

2.6.3. telephoneNumber Insignificant Character Removal

   For the purposes of this section, a hyphen is defined to be a
   HYPHEN-MINUS (U+002D), ARMENIAN HYPHEN (U+058A), HYPHEN (U+2010),
   NON-BREAKING HYPHEN (U+2011), MINUS SIGN (U+2212), SMALL HYPHEN-MINUS
   (U+FE63), or FULLWIDTH HYPHEN-MINUS (U+FF0D) code point followed by no
   combining marks, and a space is defined to be the SPACE (U+0020) code
   point followed by no combining marks.

   All hyphens and spaces are considered insignificant.  If the string
   contains only spaces and hyphens or is empty, then the output is a
   string consisting of exactly one space.  Otherwise, all hyphens and
   spaces are removed.

   For example, removal of hyphens and spaces from the Form KC string:
       "<SPACE><HYPHEN>123<SPACE><SPACE>456<SPACE><HYPHEN>"
   would result in the output string:
       "123456"
   and the Form KC string:
       "<HYPHEN><HYPHEN><HYPHEN>"
   would result in the output string:
       "<SPACE>".


3. Security Considerations

   The security considerations of "Preparation of Internationalized
   Strings ('stringprep')" [StringPrep] generally apply to the algorithms
   described here.


4. Contributors

   Appendices A and B of this document were authored by Howard Chu of
   Symas Corporation (based upon information provided in [RFC1345]).


5. Acknowledgments

   The approach used in this document is based upon design principles and
   algorithms described in "Preparation of Internationalized Strings
   ('stringprep')" [StringPrep] by Paul Hoffman and Marc Blanchet.  Some
   additional guidance was drawn from Unicode Technical Standards,
   Technical Reports, and Notes.

   This document is a product of the IETF LDAP Revision (LDAPBIS) Working
   Group.


6. Author's Address

   Kurt D. Zeilenga
   OpenLDAP Foundation

   Email: Kurt@OpenLDAP.org


7. References

7.1. Normative References

   [RFC2119]     Bradner, S., "Key words for use in RFCs to Indicate
                 Requirement Levels", BCP 14 (also RFC 2119), March 1997.

   [Roadmap]     Zeilenga, K. (editor), "LDAP: Technical Specification
                 Road Map", draft-ietf-ldapbis-roadmap-xx.txt, a work in
                 progress.

   [StringPrep]  Hoffman, P. and M. Blanchet, "Preparation of
                 Internationalized Strings ('stringprep')",
                 draft-hoffman-rfc3454bis-xx.txt, a work in progress.

   [Syntaxes]    Legg, S. (editor), "LDAP: Syntaxes and Matching Rules",
                 draft-ietf-ldapbis-syntaxes-xx.txt, a work in progress.

   [Unicode]     The Unicode Consortium, "The Unicode Standard, Version
                 3.2.0" is defined by "The Unicode Standard, Version 3.0"
                 (Reading, MA, Addison-Wesley, 2000. ISBN 0-201-61633-5),
                 as amended by the "Unicode Standard Annex #27: Unicode
                 3.1" (http://www.unicode.org/reports/tr27/) and by the
                 "Unicode Standard Annex #28: Unicode 3.2"
                 (http://www.unicode.org/reports/tr28/).

   [UAX15]       Davis, M. and M. Duerst, "Unicode Standard Annex #15:
                 Unicode Normalization Forms, Version 3.2.0", March 2002.

   [X.680]       International Telecommunication Union -
                 Telecommunication Standardization Sector, "Abstract
                 Syntax Notation One (ASN.1) - Specification of Basic
                 Notation", X.680(1997) (also ISO/IEC 8824-1:1998).

   [T.61]        CCITT (now ITU), "Character Repertoire and Coded
                 Character Sets for the International Teletex Service",
                 T.61, 1988.

7.2. Informative References

   [X.500]       International Telecommunication Union -
                 Telecommunication Standardization Sector, "The Directory
                 -- Overview of concepts, models and services",
                 X.500(1993) (also ISO/IEC 9594-1:1994).

   [X.501]       International Telecommunication Union -
                 Telecommunication Standardization Sector, "The Directory
                 -- Models", X.501(1993) (also ISO/IEC 9594-2:1994).

   [X.520]       International Telecommunication Union -
                 Telecommunication Standardization Sector, "The
                 Directory: Selected Attribute Types", X.520(1993) (also
                 ISO/IEC 9594-6:1994).

   [Glossary]    The Unicode Consortium, "Unicode Glossary".

   [CharModel]   Whistler, K. and M. Davis, "Unicode Technical Report
                 #17, Character Encoding Model", UTR17, August 2000.

   [XMATCH]      Zeilenga, K., "Internationalized String Matching Rules
                 for X.500", draft-zeilenga-ldapbis-strmatch-xx.txt, a
                 work in progress.

   [RFC1345]     Simonsen, K., "Character Mnemonics & Character Sets",
                 RFC 1345, June 1992.

   [RFC3377]     Hodges, J. and R. Morgan, "Lightweight Directory Access
                 Protocol (v3): Technical Specification", RFC 3377,
                 September 2002.


Appendix A. Teletex (T.61) to Unicode

   This appendix defines an algorithm for transcoding [T.61] characters
   to [Unicode] characters for use in string preparation for LDAP
   matching rules.  This appendix is normative.

   The transcoding algorithm is derived from the T.61-8bit definition
   provided in [RFC1345].  With a few exceptions, the T.61 character
   codes from x00 to x7f are equivalent to the corresponding [Unicode]
   code points, and their values are left unchanged by this algorithm.
   For example, the T.61 code x20 is identical to SPACE (U+0020).
   The exceptions are the following T.61 codes, which are undefined: x23,
   x24, x5c, x5e, x60, x7b, x7d, and x7e.

   The codes from x80 to x9f are also equivalent to the corresponding
   Unicode code points.  This is specified for completeness only, as
   these codes are control characters and will be mapped to nothing in
   the LDAP String Preparation Mapping step.

   The remaining T.61 codes are mapped below in Table A.1.  Table
   positions marked "??" are undefined.

   Input strings containing undefined T.61 codes SHALL produce an
   Undefined matching result.  For diagnostic purposes, this algorithm
   does not fail for undefined input codes.  Instead, undefined codes in
   the input are mapped to the Unicode REPLACEMENT CHARACTER (U+FFFD).
   As the LDAP String Preparation Prohibit step disallows the REPLACEMENT
   CHARACTER from appearing in its output, this transcoding yields the
   desired effect.

   Note: RFC 1345 listed the non-spacing accent code points as residing
   in the range starting at U+E000.  In the current Unicode standard,
   the U+E000 range is reserved for Private Use, and the non-spacing
   accents are in the range starting at U+0300.  The tables here use the
   U+0300 range for these accents.

     |  0   |  1   |  2   |  3   |  4   |  5   |  6   |  7   |
   --+------+------+------+------+------+------+------+------+
   a0| 00a0 | 00a1 | 00a2 | 00a3 | 0024 | 00a5 | 0023 | 00a7 |
   a8| 00a8 |  ??  |  ??  | 00ab |  ??  |  ??  |  ??  |  ??  |
   b0| 00b0 | 00b1 | 00b2 | 00b3 | 00d7 | 00b5 | 00b6 | 00b7 |
   b8| 00f7 |  ??  |  ??  | 00bb | 00bc | 00bd | 00be | 00bf |
   c0|  ??  | 0300 | 0301 | 0302 | 0303 | 0304 | 0306 | 0307 |
   c8| 0308 |  ??  | 030a | 0327 | 0332 | 030b | 0328 | 030c |
   d0|  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
   d8|  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
   e0| 2126 | 00c6 | 00d0 | 00aa |  ??  | 0126 | 0132 | 013f |
   e8| 0141 | 00d8 | 0152 | 00ba | 00de | 0166 | 014a | 0149 |
   f0| 0138 | 00e6 | 0111 | 00f0 | 0127 | 0131 | 0133 | 0140 |
   f8| 0142 | 00f8 | 0153 | 00df | 00fe | 0167 | 014b |  ??  |
   --+------+------+------+------+------+------+------+------+
         Table A.1: Mapping of 8-bit T.61 codes to Unicode

   T.61 also defines a number of accented characters that are formed by
   combining an accent prefix followed by a base character.  These
   prefixes are in the code range xc1 to xcf.  If a prefix character
   appears at the end of a string, the result is undefined.  Otherwise,
   these sequences are mapped to Unicode by substituting the
   corresponding non-spacing accent code (as listed in Table A.1) for the
   accent prefix, and exchanging the order so that the base character
   precedes the accent.


Appendix B. Additional Teletex (T.61) to Unicode Tables

   All of the accented characters in T.61 have a corresponding code point
   in Unicode.  For the sake of completeness, the combined character
   codes are presented in the following tables.  This is informational
   only; for matching purposes it is sufficient to map the non-spacing
   accent and exchange the order of the character pair as specified in
   Appendix A.  This appendix is informative.

B.1. Combinations with SPACE

   Accents may be combined with a SPACE to generate the accent by
   itself.  For each accent code, the result of combining with SPACE is
   listed in Table B.1.

     |  0   |  1   |  2   |  3   |  4   |  5   |  6   |  7   |
   --+------+------+------+------+------+------+------+------+
   c0|  ??  | 0060 | 00b4 | 005e | 007e | 00af | 02d8 | 02d9 |
   c8| 00a8 |  ??  | 02da | 00b8 |  ??  | 02dd | 02db | 02c7 |
   --+------+------+------+------+------+------+------+------+
      Table B.1: Mapping of T.61 Accents with SPACE to Unicode

B.2. Combinations for xc1: (Grave accent)

   T.61 has predefined characters for combinations with A, E, I, O, and
   U.
   Unicode also defines combinations for N, W, and Y.  All of these
   combinations are present in Table B.2.

     |  0   |  1   |  2   |  3   |  4   |  5   |  6   |  7   |
   --+------+------+------+------+------+------+------+------+
   40|  ??  | 00c0 |  ??  |  ??  |  ??  | 00c8 |  ??  |  ??  |
   48|  ??  | 00cc |  ??  |  ??  |  ??  |  ??  | 01f8 | 00d2 |
   50|  ??  |  ??  |  ??  |  ??  |  ??  | 00d9 |  ??  | 1e80 |
   58|  ??  | 1ef2 |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
   60|  ??  | 00e0 |  ??  |  ??  |  ??  | 00e8 |  ??  |  ??  |
   68|  ??  | 00ec |  ??  |  ??  |  ??  |  ??  | 01f9 | 00f2 |
   70|  ??  |  ??  |  ??  |  ??  |  ??  | 00f9 |  ??  | 1e81 |
   78|  ??  | 1ef3 |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
   --+------+------+------+------+------+------+------+------+
          Table B.2: Mapping of T.61 Grave Accent Combinations

B.3. Combinations for xc2: (Acute accent)

   T.61 has predefined characters for combinations with A, E, I, O, U, Y,
   C, L, N, R, S, and Z.  Unicode also defines G, K, M, P, and W.  All of
   these combinations are present in Table B.3.

     |  0   |  1   |  2   |  3   |  4   |  5   |  6   |  7   |
   --+------+------+------+------+------+------+------+------+
   40|  ??  | 00c1 |  ??  | 0106 |  ??  | 00c9 |  ??  | 01f4 |
   48|  ??  | 00cd |  ??  | 1e30 | 0139 | 1e3e | 0143 | 00d3 |
   50| 1e54 |  ??  | 0154 | 015a |  ??  | 00da |  ??  | 1e82 |
   58|  ??  | 00dd | 0179 |  ??  |  ??  |  ??  |  ??  |  ??  |
   60|  ??  | 00e1 |  ??  | 0107 |  ??  | 00e9 |  ??  | 01f5 |
   68|  ??  | 00ed |  ??  | 1e31 | 013a | 1e3f | 0144 | 00f3 |
   70| 1e55 |  ??  | 0155 | 015b |  ??  | 00fa |  ??  | 1e83 |
   78|  ??  | 00fd | 017a |  ??  |  ??  |  ??  |  ??  |  ??  |
   --+------+------+------+------+------+------+------+------+
          Table B.3: Mapping of T.61 Acute Accent Combinations

B.4. Combinations for xc3: (Circumflex)

   T.61 has predefined characters for combinations with A, E, I, O, U, Y,
   C, G, H, J, S, and W.  Unicode also defines the combination for Z.
   All of these combinations are present in Table B.4.

     |  0   |  1   |  2   |  3   |  4   |  5   |  6   |  7   |
   --+------+------+------+------+------+------+------+------+
   40|  ??  | 00c2 |  ??  | 0108 |  ??  | 00ca |  ??  | 011c |
   48| 0124 | 00ce | 0134 |  ??  |  ??  |  ??  |  ??  | 00d4 |
   50|  ??  |  ??  |  ??  | 015c |  ??  | 00db |  ??  | 0174 |
   58|  ??  | 0176 | 1e90 |  ??  |  ??  |  ??  |  ??  |  ??  |
   60|  ??  | 00e2 |  ??  | 0109 |  ??  | 00ea |  ??  | 011d |
   68| 0125 | 00ee | 0135 |  ??  |  ??  |  ??  |  ??  | 00f4 |
   70|  ??  |  ??  |  ??  | 015d |  ??  | 00fb |  ??  | 0175 |
   78|  ??  | 0177 | 1e91 |  ??  |  ??  |  ??  |  ??  |  ??  |
   --+------+------+------+------+------+------+------+------+
        Table B.4: Mapping of T.61 Circumflex Accent Combinations

B.5. Combinations for xc4: (Tilde)

   T.61 has predefined characters for combinations with A, I, O, U, and
   N.  Unicode also defines E, V, and Y.  All of these combinations are
   present in Table B.5.

     |  0   |  1   |  2   |  3   |  4   |  5   |  6   |  7   |
   --+------+------+------+------+------+------+------+------+
   40|  ??  | 00c3 |  ??  |  ??  |  ??  | 1ebc |  ??  |  ??  |
   48|  ??  | 0128 |  ??  |  ??  |  ??  |  ??  | 00d1 | 00d5 |
   50|  ??  |  ??  |  ??  |  ??  |  ??  | 0168 | 1e7c |  ??  |
   58|  ??  | 1ef8 |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
   60|  ??  | 00e3 |  ??  |  ??  |  ??  | 1ebd |  ??  |  ??  |
   68|  ??  | 0129 |  ??  |  ??  |  ??  |  ??  | 00f1 | 00f5 |
   70|  ??  |  ??  |  ??  |  ??  |  ??  | 0169 | 1e7d |  ??  |
   78|  ??  | 1ef9 |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
   --+------+------+------+------+------+------+------+------+
          Table B.5: Mapping of T.61 Tilde Accent Combinations

B.6. Combinations for xc5: (Macron)

   T.61 has predefined characters for combinations with A, E, I, O, and
   U.  Unicode also defines Y, G, and AE.  All of these combinations are
   present in Table B.6.

     |  0   |  1   |  2   |  3   |  4   |  5   |  6   |  7   |
   --+------+------+------+------+------+------+------+------+
   40|  ??  | 0100 |  ??  |  ??  |  ??  | 0112 |  ??  | 1e20 |
   48|  ??  | 012a |  ??  |  ??  |  ??  |  ??  |  ??  | 014c |
   50|  ??  |  ??  |  ??  |  ??  |  ??  | 016a |  ??  |  ??  |
   58|  ??  | 0232 |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
   60|  ??  | 0101 |  ??  |  ??  |  ??  | 0113 |  ??  | 1e21 |
   68|  ??  | 012b |  ??  |  ??  |  ??  |  ??  |  ??  | 014d |
   70|  ??  |  ??  |  ??  |  ??  |  ??  | 016b |  ??  |  ??  |
   78|  ??  | 0233 |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
   e0|  ??  | 01e2 |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
   f0|  ??  | 01e3 |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
   --+------+------+------+------+------+------+------+------+
          Table B.6: Mapping of T.61 Macron Accent Combinations

B.7. Combinations for xc6: (Breve)

   T.61 has predefined characters for combinations with A, U, and G.
   Unicode also defines E, I, and O.  All of these combinations are
   present in Table B.7.

     |  0   |  1   |  2   |  3   |  4   |  5   |  6   |  7   |
   --+------+------+------+------+------+------+------+------+
   40|  ??  | 0102 |  ??  |  ??  |  ??  | 0114 |  ??  | 011e |
   48|  ??  | 012c |  ??  |  ??  |  ??  |  ??  |  ??  | 014e |
   50|  ??  |  ??  |  ??  |  ??  |  ??  | 016c |  ??  |  ??  |
   58|  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
   60|  ??  | 0103 |  ??  |  ??  |  ??  | 0115 |  ??  | 011f |
   68|  ??  | 012d |  ??  |  ??  |  ??  |  ??  |  ??  | 014f |
   70|  ??  |  ??  |  ??  |  ??  |  ??  | 016d |  ??  |  ??  |
   78|  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
   --+------+------+------+------+------+------+------+------+
          Table B.7: Mapping of T.61 Breve Accent Combinations

B.8. Combinations for xc7: (Dot Above)

   T.61 has predefined characters for C, E, G, I, and Z.  Unicode also
   defines A, O, B, D, F, H, M, N, P, R, S, T, W, X, and Y.  All of these
   combinations are present in Table B.8.

     |  0   |  1   |  2   |  3   |  4   |  5   |  6   |  7   |
   --+------+------+------+------+------+------+------+------+
   40|  ??  | 0226 | 1e02 | 010a | 1e0a | 0116 | 1e1e | 0120 |
   48| 1e22 | 0130 |  ??  |  ??  |  ??  | 1e40 | 1e44 | 022e |
   50| 1e56 |  ??  | 1e58 | 1e60 | 1e6a |  ??  |  ??  | 1e86 |
   58| 1e8a | 1e8e | 017b |  ??  |  ??  |  ??  |  ??  |  ??  |
   60|  ??  | 0227 | 1e03 | 010b | 1e0b | 0117 | 1e1f | 0121 |
   68| 1e23 |  ??  |  ??  |  ??  |  ??  | 1e41 | 1e45 | 022f |
   70| 1e57 |  ??  | 1e59 | 1e61 | 1e6b |  ??  |  ??  | 1e87 |
   78| 1e8b | 1e8f | 017c |  ??  |  ??  |  ??  |  ??  |  ??  |
   --+------+------+------+------+------+------+------+------+
         Table B.8: Mapping of T.61 Dot Above Accent Combinations

B.9. Combinations for xc8: (Diaeresis)

   T.61 has predefined characters for A, E, I, O, U, and Y.  Unicode also
   defines H, W, X, and t.  All of these combinations are present in
   Table B.9.

     |  0   |  1   |  2   |  3   |  4   |  5   |  6   |  7   |
   --+------+------+------+------+------+------+------+------+
   40|  ??  | 00c4 |  ??  |  ??  |  ??  | 00cb |  ??  |  ??  |
   48| 1e26 | 00cf |  ??  |  ??  |  ??  |  ??  |  ??  | 00d6 |
   50|  ??  |  ??  |  ??  |  ??  |  ??  | 00dc |  ??  | 1e84 |
   58| 1e8c | 0178 |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
   60|  ??  | 00e4 |  ??  |  ??  |  ??  | 00eb |  ??  |  ??  |
   68| 1e27 | 00ef |  ??  |  ??  |  ??  |  ??  |  ??  | 00f6 |
   70|  ??  |  ??  |  ??  |  ??  | 1e97 | 00fc |  ??  | 1e85 |
   78| 1e8d | 00ff |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
   --+------+------+------+------+------+------+------+------+
         Table B.9: Mapping of T.61 Diaeresis Accent Combinations

B.10. Combinations for xca: (Ring Above)

   T.61 has predefined characters for A and U.  Unicode also defines w
   and y.  All of these combinations are present in Table B.10.

     |  0   |  1   |  2   |  3   |  4   |  5   |  6   |  7   |
   --+------+------+------+------+------+------+------+------+
   40|  ??  | 00c5 |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
   48|  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
   50|  ??  |  ??  |  ??  |  ??  |  ??  | 016e |  ??  |  ??  |
   58|  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
   60|  ??  | 00e5 |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
   68|  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
   70|  ??  |  ??  |  ??  |  ??  |  ??  | 016f |  ??  | 1e98 |
   78|  ??  | 1e99 |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
   --+------+------+------+------+------+------+------+------+
        Table B.10: Mapping of T.61 Ring Above Accent Combinations

B.11. Combinations for xcb: (Cedilla)

   T.61 has predefined characters for C, G, K, L, N, R, S, and T.
   Unicode also defines E, D, and H.  All of these combinations are
   present in Table B.11.

     |  0   |  1   |  2   |  3   |  4   |  5   |  6   |  7   |
   --+------+------+------+------+------+------+------+------+
   40|  ??  |  ??  |  ??  | 00c7 | 1e10 | 0228 |  ??  | 0122 |
   48| 1e28 |  ??  |  ??  | 0136 | 013b |  ??  | 0145 |  ??  |
   50|  ??  |  ??  | 0156 | 015e | 0162 |  ??  |  ??  |  ??  |
   58|  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
   60|  ??  |  ??  |  ??  | 00e7 | 1e11 | 0229 |  ??  | 0123 |
   68| 1e29 |  ??  |  ??  | 0137 | 013c |  ??  | 0146 |  ??  |
   70|  ??  |  ??  | 0157 | 015f | 0163 |  ??  |  ??  |  ??  |
   78|  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
   --+------+------+------+------+------+------+------+------+
         Table B.11: Mapping of T.61 Cedilla Accent Combinations

B.12. Combinations for xcd: (Double Acute Accent)

   T.61 has predefined characters for O and U.  These combinations are
   present in Table B.12.

     |  0   |  1   |  2   |  3   |  4   |  5   |  6   |  7   |
   --+------+------+------+------+------+------+------+------+
   48|  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  | 0150 |
   50|  ??  |  ??  |  ??  |  ??  |  ??  | 0170 |  ??  |  ??  |
   68|  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  | 0151 |
   70|  ??  |  ??  |  ??  |  ??  |  ??  | 0171 |  ??  |  ??  |
   --+------+------+------+------+------+------+------+------+
      Table B.12: Mapping of T.61 Double Acute Accent Combinations

B.13. Combinations for xce: (Ogonek)

   T.61 has predefined characters for A, E, I, and U.  Unicode also
   defines the combination for O.  All of these combinations are present
   in Table B.13.

     |  0   |  1   |  2   |  3   |  4   |  5   |  6   |  7   |
   --+------+------+------+------+------+------+------+------+
   40|  ??  | 0104 |  ??  |  ??  |  ??  | 0118 |  ??  |  ??  |
   48|  ??  | 012e |  ??  |  ??  |  ??  |  ??  |  ??  | 01ea |
   50|  ??  |  ??  |  ??  |  ??  |  ??  | 0172 |  ??  |  ??  |
   58|  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
   60|  ??  | 0105 |  ??  |  ??  |  ??  | 0119 |  ??  |  ??  |
   68|  ??
| 012f | ?? | ?? | ?? | ?? | ?? | 01eb | 742 70| ?? | ?? | ?? | ?? | ?? | 0173 | ?? | ?? | 743 78| ?? | ?? | ?? | ?? | ?? | ?? | ?? | ?? | 744 --+------+------+------+------+------+------+------+------+ 745 Table B.13: Mapping of T.61 Ogonek Accent Combinations 747 B.14. Combinations for xcf: (Caron) 749 T.61 has predefined characters for C, D, E, L, N, R, S, T, and Z. 750 Unicode also defines A, I, O, U, G, H, j,and K. All of these 751 combinations are present in Table B.14. 753 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 754 --+------+------+------+------+------+------+------+------+ 755 40| ?? | 01cd | ?? | 010c | 010e | 011a | ?? | 01e6 | 756 48| 021e | 01cf | ?? | 01e8 | 013d | ?? | 0147 | 01d1 | 757 50| ?? | ?? | 0158 | 0160 | 0164 | 01d3 | ?? | ?? | 758 58| ?? | ?? | 017d | ?? | ?? | ?? | ?? | ?? | 759 60| ?? | 01ce | ?? | 010d | 010f | 011b | ?? | 01e7 | 760 68| 021f | 01d0 | 01f0 | 01e9 | 013e | ?? | 0148 | 01d2 | 761 70| ?? | ?? | 0159 | 0161 | 0165 | 01d4 | ?? | ?? | 762 78| ?? | ?? | 017e | ?? | ?? | ?? | ?? | ?? 
| 763 --+------+------+------+------+------+------+------+------+ 764 Table B.14: Mapping of T.61 Caron Accent Combinations 766 Appendix B -- Mapping Table 768 Input Output 769 ----- ------ 770 0000-0008 771 0009-000D 0020 772 000E-001F 773 007F-009F 774 0085 0020 775 00A0 0020 776 00AD 777 034F 778 06DD 779 070F 780 1680 0020 781 1806 782 180B-180E 783 2000-200A 0020 784 200B-200F 785 2028-2029 0020 786 202A-202E 787 202F 0020 788 205F 0020 789 2060-2063 790 206A-206F 791 3000 0020 792 FEFF 793 FF00-FE0F 794 FFF9-FFFC 795 1D173-1D17A 796 E0001 797 E0020-E007F 799 Intellectual Property Rights 801 The IETF takes no position regarding the validity or scope of any 802 intellectual property or other rights that might be claimed to pertain 803 to the implementation or use of the technology described in this 804 document or the extent to which any license under such rights might or 805 might not be available; neither does it represent that it has made any 806 effort to identify any such rights. Information on the IETF's 807 procedures with respect to rights in standards-track and 808 standards-related documentation can be found in BCP-11. Copies of 809 claims of rights made available for publication and any assurances of 810 licenses to be made available, or the result of an attempt made to 811 obtain a general license or permission for the use of such proprietary 812 rights by implementors or users of this specification can be obtained 813 from the IETF Secretariat. 815 The IETF invites any interested party to bring to its attention any 816 copyrights, patents or patent applications, or other proprietary 817 rights which may cover technology that may be required to practice 818 this standard. Please address the information to the IETF Executive 819 Director. 821 Full Copyright 822 Copyright (C) The Internet Society (2004). All Rights Reserved. 

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph are
   included on all such copies and derivative works.  However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.