idnits 2.17.1 

draft-crispin-collation-unicasemap-04.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** It looks like you're using RFC 3978 boilerplate.  You should update this
     to the boilerplate described in the IETF Trust License Policy document
     (see https://trustee.ietf.org/license-info), which is required now.

  -- Found old boilerplate from RFC 3978, Section 5.1 on line 16.

  -- Found old boilerplate from RFC 3978, Section 5.5, updated by RFC 4748 on
     line 222.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 233.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 240.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 246.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == The page length should not exceed 58 lines per page, but there was 1
     longer page, the longest (page 1) being 266 lines


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** There are 19 instances of too long lines in the document, the longest
     one being 1 character in excess of 72.

  ** The abstract seems to contain references ([BASIC], [COMPARATOR],
     [IMAP-SORT]), which it shouldn't.  Please replace those with straight
     textual mentions of the documents in question.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust Copyright Line does not match the
     current year

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (May 2, 2007) is 6204 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: 'IMAP-SORT' is mentioned on line 189, but not defined

  == Missing Reference: 'BASIC' is mentioned on line 184, but not defined

  ** Obsolete normative reference: RFC 3454 (ref. 'STRINGPREP') (Obsoleted by
     RFC 7564)

  -- Possible downref: Non-RFC (?) normative reference: ref. 'UNICODE'

  -- Possible downref: Non-RFC (?) normative reference: ref.
     'UNICODE-SECURITY'


     Summary: 4 errors (**), 0 flaws (~~), 4 warnings (==), 9 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                                         M. Crispin
3	Internet-Draft                                  University of Washington
4	Intended status: Proposed Standard                           May 2, 2007
5	Expires: November 2, 2007
6	Document: internet-drafts/draft-crispin-collation-unicasemap-04.txt

8	             i;unicode-casemap - Simple Unicode Collation Algorithm

10	Status of this Memo

12	    By submitting this Internet-Draft, each author represents that
13	    any applicable patent or other IPR claims of which he or she is
14	    aware have been or will be disclosed, and any of which he or she
15	    becomes aware will be disclosed, in accordance with Section 6 of
16	    BCP 79.

18	    Internet-Drafts are working documents of the Internet Engineering
19	    Task Force (IETF), its areas, and its working groups.  Note that
20	    other groups may also distribute working documents as
21	    Internet-Drafts.

23	    Internet-Drafts are draft documents valid for a maximum of six months
24	    and may be updated, replaced, or obsoleted by other documents at any
25	    time.  It is inappropriate to use Internet-Drafts as reference
26	    material or to cite them other than as "work in progress."

28	    The list of current Internet-Drafts can be accessed at
29	    http://www.ietf.org/ietf/1id-abstracts.txt

31	    The list of Internet-Draft Shadow Directories can be accessed at
32	    http://www.ietf.org/shadow.html.

37	    A revised version of this draft document will be submitted to the RFC
38	    editor as a Proposed Standard for the Internet Community.  Discussion
39	    and suggestions for improvement are requested, and should be sent to
40	    ietf-imapext@IMC.ORG.

42	    Distribution of this memo is unlimited.

44	Abstract

46	    This document describes "i;unicode-casemap", a simple
47	    case-insensitive collation for Unicode strings.  It provides
48	    equality, substring and ordering operations.

50	Introduction

52	    The "i;ascii-casemap" collation described in [COMPARATOR] is quite
53	    simple to implement and provides case-independent comparisons for the
54	    26 Latin alphabetics.  It is specified as the default and/or baseline
55	    comparator in some application protocols, e.g., [IMAP-SORT].

57	    It is possible, with a modest extension, to provide a more
58	    sophisticated collation with greater multilingual applicability than
59	    "i;ascii-casemap".

61	    This collation, "i;unicode-casemap", is intended to be an alternative
62	    to, and preferred over, "i;ascii-casemap".  It does not replace the
63	    "i;basic" collation described in [BASIC].

65	1. Unicode Casemap Collation Description

67	    The "i;unicode-casemap" collation is a simple collation which
68	    operates on [UNICODE] strings and is case-insensitive in its
69	    treatment of characters.  It provides equality, substring and
70	    ordering operations.  All input is valid.

72	    The algorithm that describes the behavior of this collation is
73	    specified for Unicode input encoded in [UTF-8].  This is for ease of
74	    description only.  An implementation is free to use another internal
75	    storage format for Unicode strings, as long as it produces the same
76	    result as produced by the algorithm specified in this document for
77	    any set of Unicode strings.

79	    As this collation algorithm is specified for UTF-8 strings, strings
80	    in other character sets and/or encodings cannot be used with this
81	    collation unless they are first converted to UTF-8.

83	    Any input that is already in UTF-8 must be checked for invalid UTF-8
84	    sequences, such as overlong sequences.  A UTF-8 string that is
85	    generated from a sequence of Unicode characters according to the
86	    rules in [UTF-8] will not contain such invalid sequences.
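
As a rough illustration of the validity check described above, a strict
UTF-8 decoder can be used to reject malformed input before collation.
The sketch below is in Python and relies on the built-in strict "utf-8"
codec, which rejects overlong forms, encoded surrogates, and truncated
sequences.  The function name is_valid_utf8 is hypothetical and is not
part of this specification.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data is well-formed UTF-8, False otherwise."""
    try:
        # errors="strict" makes the codec raise on any invalid sequence
        # instead of silently replacing or skipping it.
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True
```

For example, the two-octet overlong form C0 AF would be rejected, while
the ordinary octets of "abc" would be accepted.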

88	    For the equality and ordering operations, each input UTF-8 string is
89	    prepared by converting it to "titlecased canonicalized UTF-8", using
90	    UnicodeData.txt distributed by [UNICODE], as follows on a
91	    per-character basis:

93	       (1) If the codepoint has a titlecase property in UnicodeData.txt
94	           (this is normally the same as the uppercase property) the
95	           codepoint is converted to the titlecased codepoint.
96	       (2) If the codepoint has a decomposition property of any type in
97	           UnicodeData.txt the codepoint is converted to the decomposed
98	           codepoints (effectively Normalization Form KD).
99	       (3) The resulting codepoint(s) is/are appended to the titlecased
100	           canonicalized UTF-8 string.

102	    The resulting two titlecased canonicalized UTF-8 strings are then
103	    treated as in i;octet for equality and ordering.
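
The preparation and comparison steps above might be sketched as follows
in Python.  This is an illustration under stated assumptions, not a
reference implementation: the function names are hypothetical, the
embedded UnicodeData.txt excerpt is hand-copied and covers only four
codepoints, the decomposition is assumed to be applied to the
titlecased codepoint and repeated until no further decomposition is
possible, and i;octet ordering is assumed to be a plain octet-by-octet
comparison of the resulting UTF-8.

```python
# Hand-copied excerpt of UnicodeData.txt, for illustration only; a real
# implementation would read the complete file distributed by Unicode.
UNICODE_DATA_EXCERPT = """\
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041
00C4;LATIN CAPITAL LETTER A WITH DIAERESIS;Lu;0;L;0041 0308;;;;N;LATIN CAPITAL LETTER A DIAERESIS;;;00E4;
00E4;LATIN SMALL LETTER A WITH DIAERESIS;Ll;0;L;0061 0308;;;;N;LATIN SMALL LETTER A DIAERESIS;;00C4;;00C4
"""


def load_tables(lines):
    """Build titlecase and decomposition tables from UnicodeData.txt lines."""
    title, decomp = {}, {}
    for line in lines:
        fields = line.strip().split(";")
        if len(fields) < 15:
            continue  # skip blank or malformed lines
        cp = int(fields[0], 16)
        if fields[14]:  # field 14: simple titlecase mapping
            title[cp] = int(fields[14], 16)
        if fields[5]:  # field 5: decomposition, optionally tagged
            # Drop a leading tag such as "<compat>"; any type is used.
            parts = [p for p in fields[5].split() if not p.startswith("<")]
            decomp[cp] = [int(p, 16) for p in parts]
    return title, decomp


def decompose(cp, decomp):
    """Expand cp repeatedly until no decomposition remains (assumption)."""
    if cp not in decomp:
        return [cp]
    out = []
    for part in decomp[cp]:
        out.extend(decompose(part, decomp))
    return out


def prepare(s, title, decomp):
    """Return the titlecased canonicalized UTF-8 octets for string s."""
    out = []
    for ch in s:
        cp = title.get(ord(ch), ord(ch))    # step (1): titlecase
        out.extend(decompose(cp, decomp))   # step (2): decompose
    return "".join(map(chr, out)).encode("utf-8")  # step (3): append


def compare(a, b, title, decomp):
    """Return -1, 0, or 1 by comparing the prepared octet strings."""
    ka, kb = prepare(a, title, decomp), prepare(b, title, decomp)
    return (ka > kb) - (ka < kb)
```

With tables loaded from the excerpt, U+00E4 and U+00C4 both prepare to
the octets for U+0041 U+0308, so compare() returns 0 for that pair.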

105	    Care should be taken when using OS-supplied functions to implement
106	    this collation, because the collation is not locale sensitive.
107	    Functions such as strcasecmp and toupper are sometimes locale
108	    sensitive and may casemap letters inconsistently.

110	    The i;unicode-casemap collation is well suited to use with many
111	    Internet protocols and computer languages.  Use with natural language
112	    is often inappropriate; even though the collation apparently supports
113	    languages such as Swahili and English, in real-world use it tends to
114	    mis-sort a number of types of string:

116	    o  people and place names containing scripts that are not collated
117	       according to "alphabetical order".
118	    o  words with characters that have diacriticals.  However,
119	       i;unicode-casemap generally does a better job than i;ascii-casemap
120	       for most (but not all) languages.  For example, German umlaut
121	       letters will sort correctly, but some Scandinavian letters will
122	       not.
123	    o  names such as "Lloyd" (which in Welsh sorts after "Lyon", unlike
124	       in English).
125	    o  strings containing other non-letter symbols; e.g., euro and pound
126	       sterling symbols, quotation marks other than '"', dashes/hyphens,
127	       etc.
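
Two of the cases listed above can be reproduced with a short Python
sketch.  It is an approximation only: str.upper() and Normalization
Form KD from the standard unicodedata module stand in for the
per-codepoint titlecase and decomposition steps, and the name
approx_key is hypothetical.  For the sample names used here the
approximation is expected to give the same octets as the collation.

```python
import unicodedata


def approx_key(s: str) -> bytes:
    # Approximation: uppercase, decompose (NFKD), then take the UTF-8
    # octets so that sorting compares octet by octet.
    return unicodedata.normalize("NFKD", s.upper()).encode("utf-8")


names = ["Lyon", "Lloyd", "Zoe", "\u00c5se"]
ordered = sorted(names, key=approx_key)
print(ordered)
```

In the resulting order "Lloyd" precedes "Lyon", which matches English
but not Welsh usage, and the name beginning with U+00C5 sorts with the
letter A rather than after Z as Scandinavian alphabets would place it.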

129	2. Unicode Casemap Collation Registration

131	    <?xml version='1.0'?>
132	    <!DOCTYPE collation SYSTEM 'collationreg.dtd'>
133	    <collation rfc="XXXX" scope="local" intendedUse="common">
134	      <identifier>i;unicode-casemap</identifier>
135	      <title>Unicode Casemap</title>
136	      <operations>equality order substring</operations>
137	      <specification>RFC XXXX</specification>
138	      <owner>IETF</owner>
139	      <submitter>mrc@cac.washington.edu</submitter>
140	    </collation>

142	3. Security Considerations

144	    Collations will normally be used with UTF-8 strings.  Thus the
145	    security considerations for [UTF-8], [STRINGPREP] and
146	    [UNICODE-SECURITY] also apply and are normative to this
147	    specification.

149	4. IANA Considerations

151	    The i;unicode-casemap collation defined in section 2 should be added
152	    to the registry of collations defined in [COMPARATOR].

154	5. Normative References

156	    The following documents are normative to this document:

158	    [COMPARATOR]          Newman, C., "Internet Application Protocol
159	                          Collation Registry", RFC 4790, February 2007.

161	    [STRINGPREP]          Hoffman, P. and M. Blanchet, "Preparation of
162	                          Internationalized Strings ("stringprep")",
163	                          RFC 3454, December 2002.

165	    [UTF-8]               Yergeau, F., "UTF-8, a transformation format
166	                          of ISO 10646", STD 63, RFC 3629, November 2003.

168	    [UNICODE]             <http://www.unicode.org>, UnicodeData.txt

170	                          Although the UnicodeData.txt file referenced
171	                          here is part of the Unicode standard, it is
172	                          subject to change as new characters are added
173	                          to Unicode and errors are corrected in Unicode
174	                          revisions.  As a result, it may be less stable
175	                          than might otherwise be implied by the
176	                          standards status of this specification.

178	    [UNICODE-SECURITY]    Davis, M. and M. Suignard, "Unicode Security
179	                          Considerations", February 2006,
180	                          <http://www.unicode.org/reports/tr36/>.

182	6. Informative References

184	    [BASIC]               Newman, C., Duerst, M., and Gulbrandsen, A.,
185	                          "i;basic - the Unicode Collation Algorithm",
186	                          draft-gulbrandsen-collation-basic, Work in
187	                          Progress.

189	    [IMAP-SORT]           Crispin, M., "Internet Message Access Protocol -
190	                          SORT and THREAD Extensions",
191	                          draft-ietf-imapext-sort, Work in Progress (in
192	                          RFC Editor queue).

194	Appendices

196	Author's Address

198	    Mark R. Crispin
199	    Networks and Distributed Computing
200	    University of Washington
201	    4545 15th Avenue NE
202	    Seattle, WA  98105-4527

204	    Phone: +1 (206) 543-5762

206	    EMail: MRC@CAC.Washington.EDU

208	Full Copyright Statement

210	    Copyright (C) The IETF Trust (2007).

212	    This document is subject to the rights, licenses and restrictions
213	    contained in BCP 78, and except as set forth therein, the authors
214	    retain all their rights.

216	    This document and the information contained herein are provided on an
217	    "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
218	    OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
219	    THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
220	    OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
221	    THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
222	    WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

224	Intellectual Property

226	    The IETF takes no position regarding the validity or scope of any
227	    Intellectual Property Rights or other rights that might be claimed to
228	    pertain to the implementation or use of the technology described in
229	    this document or the extent to which any license under such rights
230	    might or might not be available; nor does it represent that it has
231	    made any independent effort to identify any such rights.  Information
232	    on the procedures with respect to rights in RFC documents can be
233	    found in BCP 78 and BCP 79.

235	    Copies of IPR disclosures made to the IETF Secretariat and any
236	    assurances of licenses to be made available, or the result of an
237	    attempt made to obtain a general license or permission for the use of
238	    such proprietary rights by implementers or users of this
239	    specification can be obtained from the IETF on-line IPR repository at
240	    http://www.ietf.org/ipr.

242	    The IETF invites any interested party to bring to its attention any
243	    copyrights, patents or patent applications, or other proprietary
244	    rights that may cover technology that may be required to implement
245	    this standard.  Please address the information to the IETF at
246	    ietf-ipr@ietf.org.

248	Acknowledgement

250	    Funding for the RFC Editor function is currently provided by the
251	    Internet Society.