idnits 2.17.1 

draft-faltstrom-unicode-synchronisation-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate.  This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an Introduction section.

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** There are 22 instances of too long lines in the document, the longest
     one being 1 character in excess of 72.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not
     match the current year

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (November 27, 2003) is 7457 days in the past.  Is this
     intentional?

  -- Found something which looks like a code comment -- if you have code
     sections in the document, please surround them with '<CODE BEGINS>' and
     '<CODE ENDS>' lines.


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Looks like a reference, but probably isn't: 'RFC3454' on line 50

  == Unused Reference: '1' is defined on line 178, but no explicit reference
     was found in the text

  -- Possible downref: Non-RFC (?) normative reference: ref. '1'

  ** Obsolete normative reference: RFC 3454 (ref. '2') (Obsoleted by RFC 7564)

  -- Possible downref: Non-RFC (?) normative reference: ref. '3'

  -- Possible downref: Non-RFC (?) normative reference: ref. '4'

  -- Obsolete informational reference (is this intentional?): RFC 3491 (ref.
     '5') (Obsoleted by RFC 5891)

  -- Obsolete informational reference (is this intentional?): RFC 2616 (ref.
     '6') (Obsoleted by RFC 7230, RFC 7231, RFC 7232, RFC 7233, RFC 7234, RFC
     7235)

  -- Obsolete informational reference (is this intentional?): RFC 2821 (ref.
     '7') (Obsoleted by RFC 5321)

  == Outdated reference: A later version (-10) exists of
     draft-ietf-sasl-saslprep-03

  == Outdated reference: A later version (-06) exists of
     draft-ietf-ips-iscsi-string-prep-04


     Summary: 5 errors (**), 0 flaws (~~), 5 warnings (==), 10 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Internet Architecture Board                                 P. Faltstrom
3	Internet-Draft                                                       IAB
4	Expires: May 27, 2004                                  November 27, 2003

6	      Synchronization of Stringprep with Unicode Normalization rules
7	              draft-faltstrom-unicode-synchronisation-00.txt

9	Status of this Memo

11	    This document is an Internet-Draft and is in full conformance with
12	    all provisions of Section 10 of RFC2026.

14	    Internet-Drafts are working documents of the Internet Engineering
15	    Task Force (IETF), its areas, and its working groups. Note that other
16	    groups may also distribute working documents as Internet-Drafts.

18	    Internet-Drafts are draft documents valid for a maximum of six months
19	    and may be updated, replaced, or obsoleted by other documents at any
20	    time. It is inappropriate to use Internet-Drafts as reference
21	    material or to cite them other than as "work in progress."

23	    The list of current Internet-Drafts can be accessed at http://
24	    www.ietf.org/ietf/1id-abstracts.txt.

26	    The list of Internet-Draft Shadow Directories can be accessed at
27	    http://www.ietf.org/shadow.html.

29	    This Internet-Draft will expire on May 27, 2004.

31	Copyright Notice

33	    Copyright (C) The Internet Society (2003). All Rights Reserved.

35	Abstract

37	    This memo provides information about potential problems for
38	    applications that use the Unicode Character set in IETF standards. It
39	    especially examines differences between normalization rules in
40	    different versions of the Unicode character set.

42	1. The problem

44	    Unicode Standard Annex #15 (Unicode Normalization Forms) [3]
45	    specifies how the normalization rules are to be applied to strings.
46	    Annex 12 (Corrigenda) discusses differences in the normalization
47	    rules between versions of Unicode.

49	    The IETF uses these Normalization rules in various standards,
50	    especially the ones creating profiles of stringprep [RFC3454] [2].

52	    The Unicode Consortium has well-defined policies in place to govern
53	    changes that affect backwards compatibility. Once a character is
54	    encoded, its canonical combining class and decomposition mapping will
55	    not be changed in a way that will destabilize normalization.

57	    This means that if a string contains only characters from a given
58	    version of the Unicode Standard (e.g., Unicode 3.1.1), and it is put
59	    into a normalized form in accordance with that version of Unicode,
60	    then it will be in normalized form according to any past or future
61	    version of Unicode.

63	    This guarantee has been in place for Unicode 3.1 and after. It has
64	    been necessary to correct the decompositions of a small number of
65	    characters since Unicode 3.1, as listed in the Normalization
66	    Corrections data file, but such corrections are in accordance with
67	    the above principles: all text normalized on old systems will test as
68	    normalized in future systems. All text normalized in future systems
69	    will test as normalized on past systems. What may change, for those
70	    few characters, is that unnormalized text may normalize differently
71	    on past and future systems.

73	2. Scenario

75	    Assume a client receives a non-normalized string, and then applies
76	    normalization according to normalization rules in a particular
77	    version of Unicode. If the client passes the normalized string to a
78	    server that also has normalized a non-normalized copy of the string,
79	    but has used a different version of the Unicode normalization rules,
80	    the two strings might not match.

82	    Example: In version 3.1 of Unicode, codepoint U+2F874 is normalized
83	    to U+5F33. In version 4.0, U+2F874 is normalized to U+5F53 (see
84	    Appendix A). Say we have nodes A and B on the Internet. Assume that
85	    A is using version 3.1 of Unicode, and B is using version 4.0.
86	    U+2F874 is passed to both A and B. After normalization they will
87	    store the strings U+5F33 and U+5F53 respectively. Even though the
88	    same codepoint, U+2F874, was passed to both nodes, they hold
89	    different strings after normalization. If A sends a message with
90	    its normalized version of U+2F874 (U+5F33) to B as a search string,
91	    there will be no match at B because B has normalized the data
92	    (U+2F874) to U+5F53.
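
The scenario can be sketched in a few lines of Python. This is only an
illustration, not a real normalizer: the helper `normalize` and the two
table names are hypothetical, and each table holds just the single
mapping for U+2F874, with the original and corrected decompositions
taken from Appendix A.

```python
from typing import Mapping

# Hypothetical one-entry tables; values are the original and corrected
# decompositions of U+2F874 as listed in Appendix A.
UNCORRECTED = {0x2F874: 0x5F33}   # table before the correction
CORRECTED = {0x2F874: 0x5F53}     # table after the correction


def normalize(text: str, table: Mapping[int, int]) -> str:
    """Replace each codepoint found in `table`; leave all others as-is.

    A stand-in for real normalization that covers only the singleton
    decomposition needed for this example.
    """
    return "".join(chr(table.get(ord(ch), ord(ch))) for ch in text)


received = "\U0002F874"                        # same input at both nodes
stored_at_a = normalize(received, UNCORRECTED)  # node A
stored_at_b = normalize(received, CORRECTED)    # node B

# A's normalized search string is compared with what B has stored.
search_matches = stored_at_a == stored_at_b
print(f"A: U+{ord(stored_at_a):04X}, B: U+{ord(stored_at_b):04X}, "
      f"match: {search_matches}")
```

Both nodes behave correctly with respect to their own tables, yet the
comparison fails; that is the mismatch described above.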

94	    For the problem to exist, the string (consisting only of the
95	    codepoint U+2F874 in the example above) need only include one of
96	    the codepoints in the correction list (see Appendix A). As of
97	    version 4.0.0 of Unicode, the list of corrections (since Unicode 3.1)
98	    consists of exactly six codepoints. Over time, as additional errors
99	    in the normalization rules are found, this list will grow. The
100	    list is controlled by the Unicode Consortium; the IETF has little or
101	    no specific input into it.

103	3. Recommendation

105	    Applications that implement stringprep or one of its profiles must be
106	    aware of the existence of the corrections table [4]. Version 4.0.0 of
107	    this correction list can be found in Appendix A. If a string that is
108	    to be used for matching includes any of these codepoints, the result
109	    may be unexpected: strings that should match may fail to match.
110	    Sensitive applications and deployments should therefore take special
111	    care with strings that contain these codepoints.
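
One way an application might take such care is to flag input that
contains a codepoint from the correction list before it is normalized.
The following is a minimal sketch; the function name
`affected_codepoints` is hypothetical, and the set is copied by hand
from Appendix A, so it would need updating if the list grows.

```python
from typing import List

# Codepoints with a corrected decomposition, copied from Appendix A
# (NormalizationCorrections-4.0.0.txt).
CORRECTED_CODEPOINTS = frozenset(
    {0xF951, 0x2F868, 0x2F874, 0x2F91F, 0x2F95F, 0x2F9BF}
)


def affected_codepoints(text: str) -> List[str]:
    """Return the listed codepoints present in `text` as U+XXXX labels."""
    return [
        f"U+{ord(ch):04X}" for ch in text if ord(ch) in CORRECTED_CODEPOINTS
    ]


flagged = affected_codepoints("abc\U0002F874")
clean = affected_codepoints("abc")
```

An empty result means the string is not touched by any correction in
version 4.0.0 of the list. The check has to run on the string as
received, since the listed codepoints no longer appear after
normalization.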

113	    Examples include (but are not limited to) problems in protocols
114	    that use stringprep and pass on a normalized version of strings
115	    received from a human. Such protocols include the DNS [5]
116	    (dispute resolution at the time of domain name registration) and
117	    protocols using domain names (HTTP [6], SMTP [7], etc.), LDAP [8]
118	    (characters in the domain name labels as well as searches on
119	    attribute values), Kerberos [9], SASL [10] (authentication
120	    mechanism), and iSCSI [11] (names of volumes).

122	    As codepoints can be added to the list at any time, the addition of
123	    a codepoint can affect already normalized strings. Say a registry
124	    accepts registrations of domain names. If the domain name U+2F868 is
125	    to be registered, according to the nameprep profile based on Unicode
126	    3.2 the string U+2136A is registered. If the registry later switches
127	    to version 4.0 of Unicode, the question is whether the registered
128	    string U+2136A is to stay, or whether it should be changed to U+36FC.
129	    It might even be the case that U+36FC is already registered, and by a
130	    different domain name holder. In this case the change in
131	    normalization rules creates a potential dispute.
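
A registry facing this question could at least list the stored names
that might be affected. The sketch below is an assumption about how
such an audit might look, not a procedure taken from any registry: the
function `audit` is hypothetical, and the table holds the five
Corrigendum 4 entries from Appendix A. A stored name only records the
normalized result, so the audit cannot tell whether U+2136A was
registered directly or produced from U+2F868; it can only flag
candidates for review.

```python
from typing import List, Set, Tuple

# (source codepoint, original decomposition, corrected decomposition),
# copied from the Corrigendum 4 entries in Appendix A.
CORRIGENDUM_4 = [
    (0x2F868, 0x2136A, 0x36FC),
    (0x2F874, 0x5F33, 0x5F53),
    (0x2F91F, 0x43AB, 0x243AB),
    (0x2F95F, 0x7AAE, 0x7AEE),
    (0x2F9BF, 0x4D57, 0x45D7),
]
# Maps each original decomposition to its corrected replacement.
OLD_TO_NEW = {chr(old): chr(new) for _, old, new in CORRIGENDUM_4}


def audit(registered: Set[str]) -> List[Tuple[str, str, bool]]:
    """List (stored name, name under corrected rules, already taken?)."""
    findings = []
    for name in sorted(registered):
        renormalized = "".join(OLD_TO_NEW.get(ch, ch) for ch in name)
        if renormalized != name:
            findings.append((name, renormalized, renormalized in registered))
    return findings


report = audit({"\U0002136A", "\u36FC", "example"})
```

Here `report` holds one finding: the stored name U+2136A would become
U+36FC under the corrected rules, and U+36FC is already registered.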

133	3.1 Message to the Unicode Consortium

135	    The IETF strongly encourages the Unicode Consortium to keep the size
136	    and rate of change of the correction list to an absolute minimum, as
137	    it will be impossible for implementations (applications) to know
138	    which version of the normalization tables is in use. This is
139	    because, in practice, the tables in many cases will be part of the
140	    operating system.  The end user will expect the same normalization
141	    rules to be used in all applications in her environment.

143	3.2 Alternatives for the IETF

145	    When the Stringprep [2] specification is updated in the IETF, there
146	    will be three possible paths forward and a choice must be made:

148	    1.  Stay with use of Unicode 3.2
149	    2.  Change to a later version of Unicode than 3.2, but without the
150	        changes listed in the correction list at that time
151	    3.  Change to a later version of Unicode than 3.2, and accept
152	        incompatible changes in the normalization tables

154	4. Security Considerations

156	    This memo discusses the impact that corrections to the Unicode
157	    normalization rules will have on IETF protocols that use those
158	    rules. Inconsistencies among versions of the rules will create
159	    backward-compatibility problems.  Even if protocols and
160	    implementations are created correctly, this will lead to strings that
161	    should match in a search or other operation being reported as not
162	    matching.

164	    These false negatives for strings that include the codepoints in the
165	    Unicode Correction Table might lead, for example, to the following
166	    problems:
167	    o  Domain name lookups that should succeed fail instead
168	    o  Collisions between registered domain names occur (i.e., two
169	       different names appear to match, even when there were no
170	       collisions at registration time)
171	    o  Searches in LDAP databases fail
172	    o  Searches for iSCSI devices fail
173	    o  Authentication to Kerberos realms (logging in to systems using
174	       Kerberos) fails

176	Normative References

178	    [1]  The Unicode Consortium, "The Unicode Standard, Version 4.0",
179	         ISBN 0-321-18578-1, April 2003.

181	    [2]  Hoffman, P. and M. Blanchet, "Preparation of Internationalized
182	         Strings ("stringprep")", RFC 3454, December 2002.

184	    [3]  Davis, M. and M. Durst, "Unicode Normalization Forms", Unicode
185	         Standard Annex #15, April 2003.

187	    [4]  The Unicode Consortium, "Normalization Corrections", http://
188	         www.unicode.org/Public/UNIDATA/NormalizationCorrections.txt
189	         Version 4.0.0, April 2003.

191	Informative References

193	    [5]   Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep Profile
194	          for Internationalized Domain Names (IDN)", RFC 3491, March
195	          2003.

197	    [6]   Fielding, R., Gettys, J., Mogul, J., Nielsen, H., Masinter, L.,
198	          Leach, P. and T. Berners-Lee, "Hypertext Transfer Protocol --
199	          HTTP/1.1", RFC 2616, June 1999.

201	    [7]   Klensin, J., "Simple Mail Transfer Protocol", RFC 2821, April
202	          2001.

204	    [8]   Zeilenga, K., "LDAP: Internationalized String Preparation",
205	          draft-zeilenga-ldapbis-strprep-00.txt (work in progress), May
206	          2003.

208	    [9]   Altman, J., "Preparation of Internationalized Strings Profile
209	          for Kerberos UTF-8 Strings",
210	          draft-ietf-krb-wg-utf8-profile-01.txt (work in progress),
211	          February 2003.

213	    [10]  Zeilenga, K., "SASLprep: Stringprep profile for user names and
214	          passwords", draft-ietf-sasl-saslprep-03.txt (work in progress),
215	          June 2003.

217	    [11]  Bakke, M., "String Profile for iSCSI Names",
218	          draft-ietf-ips-iscsi-string-prep-04.txt (work in progress),
219	          March 2003.

221	Author's Address

223	    Patrik Faltstrom
224	    Internet Architecture Board

226	    EMail: paf@cisco.com

228	Appendix A. Normalization Corrections, Version 4.0.0

230	    # NormalizationCorrections-4.0.0.txt
231	    #
232	    # This file is a normative contributory data file in the
233	    # Unicode Character Database.
234	    #
235	    # The normalization stabilization policy of the Unicode
236	    # Consortium ordinarily precludes any change to the decomposition
237	    # for any character, once established in a relevant version
238	    # of the UnicodeData.txt data file. However, under certain
239	    # exceptional (and rare) conditions, an error in a decomposition
240	    # mapping may be discovered that is truly just an unintended
241	    # typo in the data, and not a matter of dubious interpretation.
242	    #
243	    # Whenever such an error may be found, and if it meets the
244	    # requirements for possible exceptions to normalization
245	    # stability, the correction is entered in this data file,
246	    # so that any implementation depending on absolute stability
247	    # of normalization, *including* any errors in the data, can
248	    # safely reconstruct the exact state of the data tables at
249	    # any given version of Unicode.
250	    #
251	    # Currently this list has exactly six entries in it, one for the
252	    # typo found and corrected in Corrigendum #3, and five for
253	    # the typos and misidentifications found and corrected in
254	    # Corrigendum #4. All efforts
255	    # will be made to keep the entries limited to just those fixes.
256	    #
257	    # Interpretation of the fields:
258	    #   Field 1: Unicode code point
259	    #   Field 2: Original (erroneous) decomposition
260	    #   Field 3: Corrected decomposition
261	    #   Field 4: Version of Unicode for which the correction was
262	    #            entered into UnicodeData.txt, in n.n.n format.
263	    #   Comment: Indicates the Unicode Corrigendum which documents
264	    #            the correction
265	    #
266	    #
267	    F951;96FB;964B;3.2.0 # Corrigendum 3
268	    2F868;2136A;36FC;4.0.0 # Corrigendum 4
269	    2F874;5F33;5F53;4.0.0 # Corrigendum 4
270	    2F91F;43AB;243AB;4.0.0 # Corrigendum 4
271	    2F95F;7AAE;7AEE;4.0.0 # Corrigendum 4
272	    2F9BF;4D57;45D7;4.0.0 # Corrigendum 4
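
The field layout described in the file header is simple enough to
parse directly. The following is a minimal sketch, assuming (as is the
case for every entry above) that each decomposition is a single
codepoint; the names `Correction` and `parse_corrections` are
hypothetical and are not part of the Unicode Character Database.

```python
from typing import List, NamedTuple, Tuple


class Correction(NamedTuple):
    codepoint: int            # Field 1
    original: int             # Field 2: original (erroneous) decomposition
    corrected: int            # Field 3: corrected decomposition
    version: Tuple[int, ...]  # Field 4: n.n.n parsed into integers
    comment: str              # trailing comment naming the corrigendum


def parse_corrections(text: str) -> List[Correction]:
    """Parse data lines; skip blank lines and lines that are only comments."""
    entries = []
    for raw in text.splitlines():
        data, _, comment = raw.partition("#")
        data = data.strip()
        if not data:
            continue
        cp, original, corrected, version = (
            field.strip() for field in data.split(";")
        )
        entries.append(
            Correction(
                int(cp, 16),
                int(original, 16),
                int(corrected, 16),
                tuple(int(part) for part in version.split(".")),
                comment.strip(),
            )
        )
    return entries


sample = """\
# Interpretation of the fields: see the file header
F951;96FB;964B;3.2.0 # Corrigendum 3
2F868;2136A;36FC;4.0.0 # Corrigendum 4
"""
parsed = parse_corrections(sample)
```

Parsing the version into a tuple of integers lets entries be compared
against a target Unicode version with ordinary tuple comparison.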

274	Intellectual Property Statement

276	    The IETF takes no position regarding the validity or scope of any
277	    intellectual property or other rights that might be claimed to
278	    pertain to the implementation or use of the technology described in
279	    this document or the extent to which any license under such rights
280	    might or might not be available; neither does it represent that it
281	    has made any effort to identify any such rights. Information on the
282	    IETF's procedures with respect to rights in standards-track and
283	    standards-related documentation can be found in BCP-11. Copies of
284	    claims of rights made available for publication and any assurances of
285	    licenses to be made available, or the result of an attempt made to
286	    obtain a general license or permission for the use of such
287	    proprietary rights by implementors or users of this specification can
288	    be obtained from the IETF Secretariat.

290	    The IETF invites any interested party to bring to its attention any
291	    copyrights, patents or patent applications, or other proprietary
292	    rights which may cover technology that may be required to practice
293	    this standard. Please address the information to the IETF Executive
294	    Director.

296	Full Copyright Statement

298	    Copyright (C) The Internet Society (2003). All Rights Reserved.

300	    This document and translations of it may be copied and furnished to
301	    others, and derivative works that comment on or otherwise explain it
302	    or assist in its implementation may be prepared, copied, published
303	    and distributed, in whole or in part, without restriction of any
304	    kind, provided that the above copyright notice and this paragraph are
305	    included on all such copies and derivative works. However, this
306	    document itself may not be modified in any way, such as by removing
307	    the copyright notice or references to the Internet Society or other
308	    Internet organizations, except as needed for the purpose of
309	    developing Internet standards in which case the procedures for
310	    copyrights defined in the Internet Standards process must be
311	    followed, or as required to translate it into languages other than
312	    English.

314	    The limited permissions granted above are perpetual and will not be
315	    revoked by the Internet Society or its successors or assignees.

317	    This document and the information contained herein is provided on an
318	    "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
319	    TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
320	    BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
321	    HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
322	    MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

324	Acknowledgment

326	    Funding for the RFC Editor function is currently provided by the
327	    Internet Society.
