Internet-Draft                                  Editor:  Kurt D. Zeilenga
Intended Category: Standard Track                     OpenLDAP Foundation
Expires in six months                                          4 May 2003


                LDAP: Internationalized String Preparation


Status of this Memo

   This document is an Internet-Draft and is in full conformance with all
   provisions of Section 10 of RFC2026.

   Distribution of this memo is unlimited.  Technical discussion of this
   document will take place on the IETF LDAP Revision Working Group
   mailing list.  Please send editorial comments directly to the author.

   Internet-Drafts are working documents of the Internet Engineering Task
   Force (IETF), its areas, and its working groups.  Note that other
   groups may also distribute working documents as Internet-Drafts.
   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as ``work in progress.''

   The list of current Internet-Drafts can be accessed at
   .  The list of
   Internet-Draft Shadow Directories can be accessed at
   .

   Copyright 2003, The Internet Society.  All Rights Reserved.

   Please see the Copyright section near the end of this document for
   more information.

Abstract

   The previous Lightweight Directory Access Protocol (LDAP) technical
   specifications did not precisely define how string matching is to be
   performed.  This led to a number of usability and interoperability
   problems.  This document defines string preparation algorithms for
   matching rules defined for use in LDAP.

Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in BCP 14 [RFC2119].

   Character names in this document use the notation for code points and
   names from the Unicode Standard [UNICODE] and ISO/IEC 10646-1
   [ISO10646].
   For example, the letter "a" may be represented as either
   <U+0061> or <LATIN SMALL LETTER A>.  In the lists of mappings and the
   prohibited characters, the "U+" is left off to make the lists easier
   to read.  The comments for character ranges are shown in square
   brackets (such as "[CONTROL CHARACTERS]") and do not come from the
   standards.

   Note: a glossary of terms used in Unicode and ISO/IEC 10646 can be
   found in [GLOSSARY].  Information on the ISO/IEC 10646/Unicode
   character encoding model can be found in [UTR17].

1. Introduction

1.1. Background

   An LDAP matching rule [Syntaxes] defines an algorithm for determining
   whether a presented value matches an attribute value in accordance
   with the criteria defined for the rule.  The proposition may evaluate
   to True, False, or Undefined.

      True      - the attribute contains a matching value,

      False     - the attribute contains no matching value,

      Undefined - it cannot be determined whether the attribute contains
                  a matching value or not.

   For instance, the caseIgnoreMatch matching rule may be used to
   determine whether the commonName attribute contains a particular value
   without regard to case and insignificant spaces.

1.2. X.500 String Matching Rules

   "X.520: Selected attribute types" [X.520] provides (amongst other
   things) value syntaxes and matching rules for comparing values
   commonly used in the Directory.  These specifications are inadequate
   for strings composed of characters from the Universal Character Set
   (UCS) [ISO10646], a superset of Unicode [UNICODE].

   The caseIgnoreMatch matching rule [X.520], for example, is simply
   defined as being a case insensitive comparison where insignificant
   spaces are ignored.  For printableString, there is only one space
   character and case mapping is bijective, hence this definition is
   sufficient.  However, for UCS-based string types such as
   universalString, this definition is not sufficient.
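
   The divergence between plausible readings of "case insensitive" and
   "space" is easy to reproduce.  The following Python sketch is
   illustrative only and is not part of this specification; the function
   names are hypothetical, and it uses whatever Unicode version the
   Python runtime provides.  It compares folding to upper case against
   folding to lower case, and classifies three characters under three
   competing definitions of "space".

```python
import unicodedata


def fold_to_upper(s):
    """Case-insensitive comparison key built by folding to upper case."""
    return s.upper()


def fold_to_lower(s):
    """Case-insensitive comparison key built by folding to lower case."""
    return s.lower()


def space_readings(ch):
    """Classify ch under three definitions of "space":
    (SPACE U+0020 only, separator Zs property, bidirectional WS category)."""
    return (ch == " ",
            unicodedata.category(ch) == "Zs",
            unicodedata.bidirectional(ch) == "WS")


sigma, final_sigma = "\u03c3", "\u03c2"  # GREEK SMALL LETTER (FINAL) SIGMA
print(fold_to_upper(sigma) == fold_to_upper(final_sigma))  # True
print(fold_to_lower(sigma) == fold_to_lower(final_sigma))  # False

# SPACE, NO-BREAK SPACE, FORM FEED
for ch in (" ", "\u00a0", "\u000c"):
    print("U+%04X" % ord(ch), space_readings(ch))
```

   Folding to upper case treats the two sigma forms as equal while
   folding to lower case does not, and NO-BREAK SPACE (Zs, but not WS)
   and FORM FEED (WS, but not Zs) count as spaces under some readings
   and not under others.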

   For example, a case insensitive matching implementation that folds
   lower case characters to upper case could yield different results
   than an implementation that folds upper case characters to lower
   case.  Similarly, one implementation may regard only SPACE (U+0020)
   as a space, a second may regard any character with the space
   separator (Zs) property as a space, and a third may regard any
   character with the whitespace (WS) bidirectional category as a
   space.

   The lack of a precise specification for string matching has led to
   significant interoperability problems.  When string matching is used
   in certificate chain validation, these problems can also give rise to
   security vulnerabilities.  To address these problems, this document
   defines precise algorithms for preparing strings for matching.

1.3. Relationship to "stringprep"

   The string preparation algorithms described in this document are based
   upon the "stringprep" approach [RFC3454].  In "stringprep", presented
   and stored values are first prepared for comparison so that a
   character-by-character comparison yields the "correct" result.

   The approach used here is a refinement of the "stringprep" [RFC3454]
   approach.  Each algorithm involves two additional preparation steps:

   a) prior to applying the Unicode string preparation steps outlined in
      "stringprep", the string is transcoded to Unicode;

   b) after applying the Unicode string preparation steps outlined in
      "stringprep", characters insignificant to the matching rule are
      removed.

   Hence, preparation of strings for X.500 matching involves the
   following steps:

      1) Transcode
      2) Map
      3) Normalize
      4) Prohibit
      5) Check Bidi (Bidirectional)
      6) Insignificant Character Removal

   These steps are described in Section 2.

1.4. Relationship to the LDAP Technical Specification

   This document is an integral part of the LDAP technical specification
   [Roadmap], which obsoletes the previously defined LDAP technical
   specification [RFC3377] in its entirety.

   This document details the LDAP internationalized string preparation
   algorithms used by [Syntaxes] and possibly by other technical
   specifications defining LDAP syntaxes and/or matching rules.

1.5. Relationship to X.500

   LDAP is defined [Roadmap] in X.500 terms as an X.500 access mechanism.
   As such, there is a strong desire for alignment between LDAP and X.500
   syntax and semantics.  The string preparation algorithms described in
   this document are based upon the "Internationalized String Matching
   Rules for X.500" [XMATCH] proposal to ITU/ISO Joint Study Group 2.

2. String Preparation

   The following six-step process SHALL be applied to each presented and
   attribute value in preparation for string matching rule evaluation.

      1) Transcode
      2) Map
      3) Normalize
      4) Prohibit
      5) Check bidi
      6) Insignificant Character Removal

   Failure in any step causes the assertion to evaluate to Undefined.

   The character repertoire of this process is Unicode 3.2 [UNICODE].

2.1. Transcode

   Each non-Unicode string value is transcoded to Unicode.

   TeletexString values are transcoded to Unicode as described in
   Appendix A.

   PrintableString values are transcoded directly to Unicode.

   UniversalString, UTF8String, and bmpString values need not be
   transcoded as they are Unicode-based strings (in the case of
   bmpString, restricted to a subset of Unicode).

   If the implementation is unable or unwilling to perform the
   transcoding as described above, or the transcoding fails, this step
   fails and the assertion is evaluated to Undefined.

   The transcoded string is the output string.

2.2. Map

   SOFT HYPHEN (U+00AD) and MONGOLIAN TODO SOFT HYPHEN (U+1806) code
   points are mapped to nothing.  COMBINING GRAPHEME JOINER (U+034F) and
   VARIATION SELECTORs (U+180B-180D, FE00-FE0F) code points are also
   mapped to nothing.  The OBJECT REPLACEMENT CHARACTER (U+FFFC) is
   mapped to nothing.

   CHARACTER TABULATION (U+0009), LINE FEED (LF) (U+000A), LINE
   TABULATION (U+000B), FORM FEED (FF) (U+000C), CARRIAGE RETURN (CR)
   (U+000D), and NEXT LINE (NEL) (U+0085) are mapped to SPACE (U+0020).

   All other control code points (e.g., Cc) or code points with a control
   function (e.g., Cf) are mapped to nothing.

   ZERO WIDTH SPACE (U+200B) is mapped to nothing.  All other code points
   with the Separator (space, line, or paragraph) property (e.g., Zs, Zl,
   or Zp) are mapped to SPACE (U+0020).

   For case ignore, numeric, and stored prefix string matching rules,
   characters are case folded per B.2 of [RFC3454].

2.3. Normalize

   The input string is normalized to Unicode Form KC (compatibility
   composed) as described in [UAX15].

2.4. Prohibit

   All Unassigned, Private Use, and non-character code points are
   prohibited.  Surrogate codes (U+D800-DFFF) are prohibited.

   The REPLACEMENT CHARACTER (U+FFFD) code point is prohibited.

   The first code point of a string is prohibited from being a combining
   character.

   Empty strings are prohibited.

   This step fails, and the assertion is evaluated to Undefined, if the
   input string is prohibited by any of the rules above.  Otherwise, the
   output string is the input string.

2.5. Check bidi

   There are no bidirectional restrictions.  The output string is the
   input string.

2.6. Insignificant Character Removal

   In this step, characters insignificant to the matching rule are
   removed.  The characters to be removed differ from matching rule to
   matching rule.

   Section 2.6.1 applies to case ignore and exact string matching.
   Section 2.6.2 applies to numericString matching.
   Section 2.6.3 applies to telephoneNumber matching.

2.6.1. Insignificant Space Removal

   For the purposes of this section, a space is defined to be the SPACE
   (U+0020) code point followed by no combining marks.

      NOTE - The previous steps ensure that the string cannot contain
      any code points in the separator class, other than SPACE
      (U+0020).

   The following spaces are regarded as not significant and are to be
   removed:
     - leading spaces (i.e., those preceding the first character that is
       not a space);
     - trailing spaces (i.e., those following the last character that is
       not a space);
     - multiple consecutive spaces (these are taken as equivalent to a
       single space character).

   (A string consisting entirely of spaces is equivalent to a string
   containing exactly one space.)

   For example, removal of spaces from the Form KC string:
       "<SPACE><SPACE>foo<SPACE><SPACE>bar<SPACE><SPACE>"
   would result in the output string:
       "foo<SPACE>bar"
   and the Form KC string:
       "<SPACE><SPACE><SPACE>"
   would result in the output string:
       "<SPACE>".

2.6.2. NumericString Insignificant Character Removal

   For the purposes of this section, a space is defined to be the SPACE
   (U+0020) code point followed by no combining marks.

   All spaces are regarded as not significant and are to be removed.

   For example, removal of spaces from the Form KC string:
       "<SPACE><SPACE>123<SPACE><SPACE>456<SPACE><SPACE>"
   would result in the output string:
       "123456"
   and the Form KC string:
       "<SPACE><SPACE><SPACE>"
   would result in an empty output string.

2.6.3. TelephoneNumber Insignificant Character Removal

   For the purposes of this section, a hyphen is defined to be a
   HYPHEN-MINUS (U+002D), ARMENIAN HYPHEN (U+058A), HYPHEN (U+2010),
   NON-BREAKING HYPHEN (U+2011), MINUS SIGN (U+2212), SMALL HYPHEN-MINUS
   (U+FE63), or FULLWIDTH HYPHEN-MINUS (U+FF0D) code point followed by no
   combining marks, and a space is defined to be the SPACE (U+0020) code
   point followed by no combining marks.

   All hyphens and spaces are regarded as not significant and are to be
   removed.

3. Security Considerations

   The security considerations of "Preparation of Internationalized
   Strings ('stringprep')" [RFC3454] generally apply to the algorithms
   described here.

4. Acknowledgments

   The approach used in this document is based upon design principles and
   algorithms described in "Preparation of Internationalized Strings
   ('stringprep')" [RFC3454] by Paul Hoffman and Marc Blanchet.  Some
   additional guidance was drawn from Unicode Technical Standards,
   Technical Reports, and Notes.

5. Editor's Address

   Kurt Zeilenga
   E-mail:

6. References

6.1. Normative References

   [RFC2119]  S. Bradner, "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14 (also RFC 2119), March 1997.

   [RFC3454]  P. Hoffman, M. Blanchet, "Preparation of Internationalized
              Strings ('stringprep')", RFC 3454, December 2002.

   [Roadmap]  K. Zeilenga, "LDAP: Technical Specification Road Map",
              draft-ietf-ldapbis-roadmap-xx.txt, a work in progress.

   [Syntaxes] S. Legg (editor), "LDAP: Syntaxes and Matching Rules",
              draft-ietf-ldapbis-syntaxes-xx.txt, a work in progress.

   [ISO10646] Universal Multiple-Octet Coded Character Set (UCS) -
              Architecture and Basic Multilingual Plane, ISO/IEC
              10646-1:1993.

   [UNICODE]  The Unicode Consortium, "The Unicode Standard, Version
              3.2.0" is defined by "The Unicode Standard, Version 3.0"
              (Reading, MA, Addison-Wesley, 2000.
              ISBN 0-201-61633-5), as amended by the "Unicode Standard
              Annex #27: Unicode 3.1"
              (http://www.unicode.org/reports/tr27/) and by the "Unicode
              Standard Annex #28: Unicode 3.2"
              (http://www.unicode.org/reports/tr28/).

   [UAX15]    M. Davis, M. Duerst, "Unicode Standard Annex #15: Unicode
              Normalization Forms, Version 3.2.0", March 2002.

6.2. Informative References

   [RFC3377]  J. Hodges, R. Morgan, "Lightweight Directory Access
              Protocol (v3): Technical Specification", RFC 3377,
              September 2002.

   [X.500]    International Telecommunication Union, "The Directory:
              Overview of Concepts, Models and Service", X.500, 2000.

   [X.501]    International Telecommunication Union, "The Directory:
              The Models", X.501, 2000.

   [X.520]    International Telecommunication Union, "The Directory:
              Selected Attribute Types", X.520, 2000.

   [XMATCH]   K. Zeilenga, "Internationalized String Matching Rules for
              X.500", draft-zeilenga-ldapbis-strmatch-xx.txt, a work in
              progress.

   [GLOSSARY] The Unicode Consortium, "Unicode Glossary".

   [UTR17]    K. Whistler, M. Davis, "Unicode Technical Report #17,
              Character Encoding Model", UTR17, August 2000.

Copyright 2003, The Internet Society.  All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published and
   distributed, in whole or in part, without restriction of any kind,
   provided that the above copyright notice and this paragraph are
   included on all such copies and derivative works.  However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be followed,
   or as required to translate it into languages other than English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE AUTHORS, THE INTERNET SOCIETY, AND THE INTERNET
   ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED,
   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Appendix A. Teletex (T.61) to Unicode

   TBD.
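
   As a non-normative aid to implementers of Section 2 (and not part of
   Appendix A), the following Python sketch illustrates steps 2 through
   6.  It assumes values have already been transcoded to Unicode, so
   step 1 is not shown.  It relies on Python's standard
   unicodedata.ucd_3_2_0 database and stringprep module (which provides
   the RFC 3454 tables), interprets "combining mark" as general category
   Mn, Mc, or Me and "control" as Cc or Cf, and signals Undefined with a
   hypothetical PreparationFailed exception.  All function names are
   illustrative, not defined by this document.

```python
import stringprep
import unicodedata

UCD = unicodedata.ucd_3_2_0  # Section 2 fixes the repertoire at Unicode 3.2

class PreparationFailed(Exception):
    """Hypothetical signal that a step failed (assertion is Undefined)."""

# Section 2.2: code points mapped to nothing, and code points mapped to SPACE.
MAP_TO_NOTHING = ({0x00AD, 0x1806, 0x034F, 0xFFFC, 0x200B}
                  | set(range(0x180B, 0x180E)) | set(range(0xFE00, 0xFE10)))
MAP_TO_SPACE = {0x0009, 0x000A, 0x000B, 0x000C, 0x000D, 0x0085}
HYPHENS = {0x002D, 0x058A, 0x2010, 0x2011, 0x2212, 0xFE63, 0xFF0D}  # 2.6.3

def is_combining(ch):
    # Assumption: a combining mark is any code point in category Mn, Mc or Me.
    return UCD.category(ch).startswith("M")

def map_step(s, fold_case):
    out = []
    for ch in s:
        cp, cat = ord(ch), UCD.category(ch)
        if cp in MAP_TO_NOTHING:
            continue
        if cp in MAP_TO_SPACE or cat in ("Zs", "Zl", "Zp"):
            out.append(" ")
        elif cat in ("Cc", "Cf"):
            continue  # assumption: "control" means category Cc or Cf
        elif fold_case:
            out.append(stringprep.map_table_b2(ch))  # RFC 3454 table B.2
        else:
            out.append(ch)
    return "".join(out)

def prohibit_step(s):
    if not s:
        raise PreparationFailed("empty string")
    if is_combining(s[0]):
        raise PreparationFailed("string starts with a combining character")
    for ch in s:
        if (stringprep.in_table_a1(ch)        # unassigned in Unicode 3.2
                or stringprep.in_table_c3(ch)  # private use
                or stringprep.in_table_c4(ch)  # non-character
                or stringprep.in_table_c5(ch)  # surrogate codes
                or ch == "\ufffd"):            # REPLACEMENT CHARACTER
            raise PreparationFailed("prohibited code point U+%04X" % ord(ch))
    return s

def is_space_at(s, i):
    # A space is SPACE (U+0020) not followed by a combining mark.
    return s[i] == " " and not (i + 1 < len(s) and is_combining(s[i + 1]))

def remove_insignificant_spaces(s):  # Section 2.6.1
    out, pending, saw_space = [], False, False
    for i, ch in enumerate(s):
        if is_space_at(s, i):
            saw_space = True
            pending = bool(out)  # leading spaces are never emitted
        else:
            if pending:
                out.append(" ")  # a run of inner spaces becomes one space
                pending = False
            out.append(ch)
    if not out and saw_space:
        return " "  # an all-space string is equivalent to one space
    return "".join(out)  # trailing spaces stay pending and are dropped

def remove_all_spaces(s):  # Section 2.6.2
    return "".join(ch for i, ch in enumerate(s) if not is_space_at(s, i))

def remove_hyphens_and_spaces(s):  # Section 2.6.3
    def insignificant(i):
        followed_by_mark = i + 1 < len(s) and is_combining(s[i + 1])
        return (s[i] == " " or ord(s[i]) in HYPHENS) and not followed_by_mark
    return "".join(ch for i, ch in enumerate(s) if not insignificant(i))

def prepare(value, fold_case, remove):
    """Steps 2-6; `value` is assumed to be already transcoded (step 1)."""
    s = map_step(value, fold_case)
    s = UCD.normalize("NFKC", s)
    s = prohibit_step(s)
    # Step 5 (check bidi): no bidirectional restrictions apply.
    return remove(s)

def match(assertion, attribute, fold_case, remove):
    """Return True or False, or None when the assertion is Undefined."""
    try:
        return (prepare(assertion, fold_case, remove)
                == prepare(attribute, fold_case, remove))
    except PreparationFailed:
        return None
```

   For instance, match("STRASSE", "straße", True,
   remove_insignificant_spaces) returns True, because table B.2 folds
   U+00DF to "ss", while a value consisting only of SOFT HYPHEN (U+00AD)
   is mapped to the empty string, prohibited, and reported as Undefined
   (None).  Whether a given rule folds case is left to the caller, as
   Section 2.2 names only the case ignore, numeric, and stored prefix
   rules.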