idnits 2.17.1

draft-faltstrom-unicode-synchronisation-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate.  This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an Introduction section.

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** There are 22 instances of too long lines in the document, the longest one
     being 1 character in excess of 72.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not
     match the current year

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer.
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (November 27, 2003) is 7457 days in the past.  Is this
     intentional?

  -- Found something which looks like a code comment -- if you have code
     sections in the document, please surround them with '<CODE BEGINS>' and
     '<CODE ENDS>' lines.


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Looks like a reference, but probably isn't: 'RFC3454' on line 50

  == Unused Reference: '1' is defined on line 178, but no explicit reference
     was found in the text

  -- Possible downref: Non-RFC (?) normative reference: ref. '1'

  ** Obsolete normative reference: RFC 3454 (ref. '2') (Obsoleted by RFC 7564)

  -- Possible downref: Non-RFC (?) normative reference: ref. '3'

  -- Possible downref: Non-RFC (?) normative reference: ref. '4'

  -- Obsolete informational reference (is this intentional?): RFC 3491 (ref.
     '5') (Obsoleted by RFC 5891)

  -- Obsolete informational reference (is this intentional?): RFC 2616 (ref.
     '6') (Obsoleted by RFC 7230, RFC 7231, RFC 7232, RFC 7233, RFC 7234, RFC
     7235)

  -- Obsolete informational reference (is this intentional?): RFC 2821 (ref.
     '7') (Obsoleted by RFC 5321)

  == Outdated reference: A later version (-10) exists of
     draft-ietf-sasl-saslprep-03

  == Outdated reference: A later version (-06) exists of
     draft-ietf-ips-iscsi-string-prep-04


     Summary: 5 errors (**), 0 flaws (~~), 5 warnings (==), 10 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

2     Internet Architecture Board                                 P. Faltstrom
3     Internet-Draft                                                       IAB
4     Expires: May 27, 2004                                  November 27, 2003

6          Synchronization of Stringprep with Unicode Normalization rules
7                  draft-faltstrom-unicode-synchronisation-00.txt

9     Status of this Memo

11       This document is an Internet-Draft and is in full conformance with
12       all provisions of Section 10 of RFC2026.

14       Internet-Drafts are working documents of the Internet Engineering
15       Task Force (IETF), its areas, and its working groups.
         Note that other
16       groups may also distribute working documents as Internet-Drafts.

18       Internet-Drafts are draft documents valid for a maximum of six months
19       and may be updated, replaced, or obsoleted by other documents at any
20       time. It is inappropriate to use Internet-Drafts as reference
21       material or to cite them other than as "work in progress."

23       The list of current Internet-Drafts can be accessed at
24       http://www.ietf.org/ietf/1id-abstracts.txt.

26       The list of Internet-Draft Shadow Directories can be accessed at
27       http://www.ietf.org/shadow.html.

29       This Internet-Draft will expire on May 27, 2004.

31    Copyright Notice

33       Copyright (C) The Internet Society (2003). All Rights Reserved.

35    Abstract

37       This memo provides information about potential problems for
38       applications that use the Unicode character set in IETF standards. It
39       especially examines differences between normalization rules in
40       different versions of the Unicode character set.

42    1. The problem

44       The Unicode Standard Annex #15 (Unicode Normalization Forms) [3]
45       specifies how the normalization rules are to be applied to strings. In
46       Annex 12 (Corrigenda), differences in normalization rules between
47       versions of Unicode are discussed.

49       The IETF uses these normalization rules in various standards,
50       especially the ones creating profiles of stringprep [RFC3454] [2].

52       The Unicode Consortium has well-defined policies in place to govern
53       changes that affect backwards compatibility. Once a character is
54       encoded, its canonical combining class and decomposition mapping will
55       not be changed in a way that will destabilize normalization.

57       What this means is: If a string contains only characters from a given
58       version of the Unicode Standard (e.g., Unicode 3.1.1), and it is put
59       into a normalized form in accordance with that version of Unicode,
60       then it will be in normalized form according to any past or future
61       versions of Unicode.

63       This guarantee has been in place for Unicode 3.1 and after. It has
64       been necessary to correct the decompositions of a small number of
65       characters since Unicode 3.1, as listed in the Normalization
66       Corrections data file, but such corrections are in accordance with
67       the above principles: all text normalized on old systems will test as
68       normalized in future systems. All text normalized in future systems
69       will test as normalized on past systems. What may change, for those
70       few characters, is that unnormalized text may normalize differently
71       on past and future systems.

73    2. Scenario

75       Assume a client receives a non-normalized string, and then applies
76       normalization according to normalization rules in a particular
77       version of Unicode. If the client passes the normalized string to a
78       server that also has normalized a non-normalized copy of the string,
79       but has used a different version of the Unicode normalization rules,
80       the two strings might not match.

82       Example: In version 3.2 of Unicode, codepoint U+2F874 is normalized
83       to U+5F33. In version 4.0, U+2F874 is normalized to U+5F53. Say we
84       have nodes A and B on the Internet. Assume that A is using version
85       3.2 of Unicode, and B is using version 4.0. U+2F874 is passed to both
86       A and B. After normalization they will store the strings U+5F33 and
87       U+5F53 respectively. The end result is that even if the same
88       codepoint, U+2F874, is passed to both nodes, they will after
89       normalization have different strings (U+5F33 and U+5F53). If A sends
90       a message with the normalized version of U+2F874 (U+5F33) to B as a
91       search string, there will be no match at B because B has normalized
92       the data (U+2F874) to U+5F53.

94       For the problem to exist, the string (consisting only of the
95       codepoint U+2F874 in the example above) needs to include at least
96       one of the codepoints in the correction list (see Appendix A).
         As of
97       version 4.0.0 of Unicode, the list of corrections (since Unicode 3.1)
98       consists of exactly six codepoints. Over time, as additional errors in
99       the normalization rules are found, this list will grow. The list is
100      controlled by the Unicode Consortium; the IETF has little or no
101      specific input into it.

103   3. Recommendation

105      Applications that implement stringprep or one of its profiles must be
106      aware of the existence of the corrections table [4]. Version 4.0.0 of
107      this correction list can be found in Appendix A. If a string that is
108      to be used for matching includes any of these codepoints, unexpected
109      results (a failure to match when a match should occur) are possible.
110      Because of this, special care is recommended in sensitive
111      applications and deployments.

113      Examples of problems include (but are not limited to) problems in
114      protocols which use stringprep and pass a normalized version of
115      strings received from a human. Such protocols include the DNS [5]
116      (dispute resolution at the time of domain name registration) and
117      protocols using domain names (HTTP [6], SMTP [7], etc.), LDAP [8]
118      (characters in the domain name labels as well as searches on
119      attribute values), Kerberos [9], SASL [10] (authentication
120      mechanism), and iSCSI [11] (names of volumes).

122      As codepoints can be added to the list at any time, addition of
123      codepoints can affect already normalized strings. Say a registry
124      accepts registrations of domain names. If a domain name U+2F868 is to
125      be registered, according to the nameprep profile with Unicode 3.2 the
126      string U+2136A is to be registered. If the registry later switches to
127      version 4.0 of Unicode, the question is whether the registered
128      string U+2136A is to stay, or whether it should be changed to U+36FC.
129      It might even be the case that U+36FC is already registered, and by a
130      different domain name holder.
         The change in normalization rules in
131      this case creates a potential dispute that requires resolution.

133   3.1 Message to the Unicode Consortium

135      The IETF strongly encourages the Unicode Consortium to keep the size
136      and rate of change of the correction list to an absolute minimum, as
137      it will be impossible for implementations (applications) to know
138      which version of the normalization tables is in use. This is
139      because, in practice, the tables in many cases will be part of the
140      operating system. The end user will expect the same normalization
141      rules to be used in all applications in her environment.

143   3.2 Alternatives for the IETF

145      When the Stringprep [2] specification is updated in the IETF, there
146      will be three possible paths forward and a choice must be made:

148      1. Stay with use of Unicode 3.2
149      2. Change to a later version of Unicode than 3.2, but without the
150         changes listed in the correction list at that time
151      3. Change to a later version of Unicode than 3.2, and accept
152         incompatible changes in the normalization tables

154   4. Security Considerations

156      This memo discusses the impact that corrections to the Unicode
157      normalization rules will have on protocols in the IETF that use
158      those rules. Inconsistencies among versions of the rules will create
159      backward-compatibility problems. Even if protocols and
160      implementations are created correctly, this will lead to strings that
161      should match in a search or other operation being reported as not
162      matching.

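         The false negative described above can be sketched in a few lines
         of Python. This is an illustration only, not part of any
         specification. It copies the six entries from Appendix A and
         models a node's normalization of a single affected codepoint as a
         table lookup keyed on the version in Field 4, which is a
         simplifying assumption (a real implementation applies full NFKC).
         The names load_corrections, normalize_single and
         affected_code_points are hypothetical.

```python
# Illustrative sketch only: models normalization of the affected code points
# as a table lookup. Real stringprep implementations apply full NFKC.

# Entries copied from Appendix A (NormalizationCorrections-4.0.0.txt):
# code point; original decomposition; corrected decomposition; version
CORRECTIONS_TXT = """\
F951;96FB;964B;3.2.0
2F868;2136A;36FC;4.0.0
2F874;5F33;5F53;4.0.0
2F91F;43AB;243AB;4.0.0
2F95F;7AAE;7AEE;4.0.0
2F9BF;4D57;45D7;4.0.0
"""


def parse_version(text):
    """Turn 'n.n.n' into a tuple of ints so versions compare correctly."""
    return tuple(int(part) for part in text.split("."))


def load_corrections(text):
    """Map each corrected code point to (old, new, version_of_correction)."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        cp, old, new, version = line.split(";")
        table[chr(int(cp, 16))] = (
            chr(int(old, 16)),
            chr(int(new, 16)),
            parse_version(version),
        )
    return table


CORRECTIONS = load_corrections(CORRECTIONS_TXT)


def normalize_single(ch, unicode_version):
    """Hypothetical helper: result for one affected code point on a node
    whose tables correspond to the given Unicode version."""
    old, new, corrected_in = CORRECTIONS[ch]
    return new if parse_version(unicode_version) >= corrected_in else old


def affected_code_points(s):
    """Return the code points in s that appear in the correction list."""
    return [ch for ch in s if ch in CORRECTIONS]


# Node A's tables predate Corrigendum 4; node B's tables include it.
at_a = normalize_single("\U0002F874", "3.2.0")
at_b = normalize_single("\U0002F874", "4.0.0")
print(f"A stores U+{ord(at_a):04X}, B stores U+{ord(at_b):04X}, "
      f"match: {at_a == at_b}")

# A string can be screened for affected code points before it is matched.
print([f"U+{ord(c):04X}" for c in affected_code_points("ab\U0002F874")])
```

         Versions are compared as integer tuples rather than as strings so
         that multi-digit components order correctly. A deployment that
         cannot control which tables its peers use could apply a check
         like affected_code_points to flag strings for special handling.
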
164      These false negatives for strings that include the codepoints in the
165      Unicode Correction Table might lead, for example, to the following
166      problems:
167      o  Domain name lookups that should succeed fail instead
168      o  Collisions between registered domain names occur (i.e., two
169         different names appear to match, even when there were no
170         collisions at registration time)
171      o  Searches in LDAP databases fail
172      o  Searches for iSCSI devices fail
173      o  Authentication to Kerberos realms (logging in to systems using
174         Kerberos) fails

176   Normative References

178      [1]   The Unicode Consortium, "The Unicode Standard, Version 4.0",
179            ISBN 0-321-18578-1, April 2003.

181      [2]   Hoffman, P. and M. Blanchet, "Preparation of Internationalized
182            Strings ("stringprep")", RFC 3454, December 2002.

184      [3]   Davis, M. and M. Durst, "Unicode Normalization Forms", Unicode
185            Technical Report 15, April 2003.

187      [4]   The Unicode Consortium, "Normalization Corrections",
188            http://www.unicode.org/Public/UNIDATA/NormalizationCorrections.txt
189            Version 4.0.0, April 2003.

191   Informative References

193      [5]   Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep Profile
194            for Internationalized Domain Names (IDN)", RFC 3491, March
195            2003.

197      [6]   Fielding, R., Gettys, J., Mogul, J., Nielsen, H., Masinter, L.,
198            Leach, P. and T. Berners-Lee, "Hypertext Transfer Protocol --
199            HTTP/1.1", RFC 2616, June 1999.

201      [7]   Klensin, J., "Simple Mail Transfer Protocol", RFC 2821, April
202            2001.

204      [8]   Zeilenga, K., "LDAP: Internationalized String Preparation",
205            draft-zeilenga-ldapbis-strprep-00.txt (work in progress), May
206            2003.

208      [9]   Altman, J., "Preparation of Internationalized Strings Profile
209            for Kerberos UTF-8 Strings",
210            draft-ietf-krb-wg-utf8-profile-01.txt (work in progress),
211            February 2003.

213      [10]  Zeilenga, K., "SASLprep: Stringprep profile for user names and
214            passwords", draft-ietf-sasl-saslprep-03.txt (work in progress),
215            June 2003.

217      [11]  Bakke, M., "String Profile for iSCSI Names",
218            draft-ietf-ips-iscsi-string-prep-04.txt (work in progress),
219            March 2003.

221   Author's Address

223      Patrik Faltstrom
224      Internet Architecture Board

226      EMail: paf@cisco.com

228   Appendix A. Normalization Corrections

230      # NormalizationCorrections-4.0.0.txt
231      #
232      # This file is a normative contributory data file in the
233      # Unicode Character Database.
234      #
235      # The normalization stabilization policy of the Unicode
236      # Consortium ordinarily precludes any change to the decomposition
237      # for any character, once established in a relevant version
238      # of the UnicodeData.txt data file. However, under certain
239      # exceptional (and rare) conditions, an error in a decomposition
240      # mapping may be discovered that is truly just an unintended
241      # typo in the data, and not a matter of dubious interpretation.
242      #
243      # Whenever such an error may be found, and if it meets the
244      # requirements for possible exceptions to normalization
245      # stability, the correction is entered in this data file,
246      # so that any implementation depending on absolute stability
247      # of normalization, *including* any errors in the data, can
248      # safely reconstruct the exact state of the data tables at
249      # any given version of Unicode.
250      #
251      # Currently this list has exactly six entries in it, one for the
252      # typo found and corrected in Corrigendum #3, and five for
253      # the typos and misidentifications found and corrected in
254      # Corrigendum #4. All efforts
255      # will be made to keep the entries limited to just those fixes.
256      #
257      # Interpretation of the fields:
258      #   Field 1: Unicode code point
259      #   Field 2: Original (erroneous) decomposition
260      #   Field 3: Corrected decomposition
261      #   Field 4: Version of Unicode for which the correction was
262      #            entered into UnicodeData.txt, in n.n.n format.
263      #   Comment: Indicates the Unicode Corrigendum which documents
264      #            the correction
265      #
266      #
267      F951;96FB;964B;3.2.0 # Corrigendum 3
268      2F868;2136A;36FC;4.0.0 # Corrigendum 4
269      2F874;5F33;5F53;4.0.0 # Corrigendum 4
270      2F91F;43AB;243AB;4.0.0 # Corrigendum 4
271      2F95F;7AAE;7AEE;4.0.0 # Corrigendum 4
272      2F9BF;4D57;45D7;4.0.0 # Corrigendum 4

274   Intellectual Property Statement

276      The IETF takes no position regarding the validity or scope of any
277      intellectual property or other rights that might be claimed to
278      pertain to the implementation or use of the technology described in
279      this document or the extent to which any license under such rights
280      might or might not be available; neither does it represent that it
281      has made any effort to identify any such rights. Information on the
282      IETF's procedures with respect to rights in standards-track and
283      standards-related documentation can be found in BCP-11. Copies of
284      claims of rights made available for publication and any assurances of
285      licenses to be made available, or the result of an attempt made to
286      obtain a general license or permission for the use of such
287      proprietary rights by implementors or users of this specification can
288      be obtained from the IETF Secretariat.

290      The IETF invites any interested party to bring to its attention any
291      copyrights, patents or patent applications, or other proprietary
292      rights which may cover technology that may be required to practice
293      this standard. Please address the information to the IETF Executive
294      Director.

296   Full Copyright Statement

298      Copyright (C) The Internet Society (2003). All Rights Reserved.

300      This document and translations of it may be copied and furnished to
301      others, and derivative works that comment on or otherwise explain it
302      or assist in its implementation may be prepared, copied, published
303      and distributed, in whole or in part, without restriction of any
304      kind, provided that the above copyright notice and this paragraph are
305      included on all such copies and derivative works. However, this
306      document itself may not be modified in any way, such as by removing
307      the copyright notice or references to the Internet Society or other
308      Internet organizations, except as needed for the purpose of
309      developing Internet standards in which case the procedures for
310      copyrights defined in the Internet Standards process must be
311      followed, or as required to translate it into languages other than
312      English.

314      The limited permissions granted above are perpetual and will not be
315      revoked by the Internet Society or its successors or assignees.

317      This document and the information contained herein is provided on an
318      "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
319      TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
320      BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
321      HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
322      MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

324   Acknowledgment

326      Funding for the RFC Editor function is currently provided by the
327      Internet Society.