technical title for collation

idnits 2.17.1 

draft-newman-i18n-comparator-04.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** It looks like you're using RFC 3978 boilerplate.  You should update this
     to the boilerplate described in the IETF Trust License Policy document
     (see https://trustee.ietf.org/license-info), which is required now.

  -- Found old boilerplate from RFC 3978, Section 5.1 on line 17.

  -- Found old boilerplate from RFC 3978, Section 5.5 on line 1310.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 1287.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 1294.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 1300.

  ** This document has an original RFC 3978 Section 5.4 Copyright Line,
     instead of the newer IETF Trust Copyright according to RFC 4748.

  ** This document has an original RFC 3978 Section 5.5 Disclaimer, instead
     of the newer disclaimer which includes the IETF Trust according to RFC
     4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** There is 1 instance of too long lines in the document, the longest one
     being 52 characters in excess of 72.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not
     match the current year

  == Line 722 has weird spacing: '...=accent  e, o,...'

  == Line 723 has weird spacing: '...ch=case    e, ...'

  == The document seems to lack the recommended RFC 2119 boilerplate, even if
     it appears to use RFC 2119 keywords -- however, there's a paragraph with
     a matching beginning. Boilerplate error?

     (The document does seem to have the reference to RFC 2119 which the
     ID-Checklist requires).
  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (July 6, 2005) is 6862 days in the past.  Is this
     intentional?

  -- Found something which looks like a code comment -- if you have code
     sections in the document, please surround them with '<CODE BEGINS>' and
     '<CODE ENDS>' lines.


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Unused Reference: '4' is defined on line 1204, but no explicit reference
     was found in the text

  == Unused Reference: '12' is defined on line 1234, but no explicit
     reference was found in the text

  == Unused Reference: '13' is defined on line 1238, but no explicit
     reference was found in the text

  -- Possible downref: Non-RFC (?) normative reference: ref. 'RFC XXXX'

  ** Obsolete normative reference: RFC 2234 (ref. '2') (Obsoleted by RFC 4234)

  -- Possible downref: Normative reference to a draft: ref. '4' 

  ** Obsolete normative reference: RFC 3066 (ref. '5') (Obsoleted by RFC
     4646, RFC 4647)

  ** Obsolete normative reference: RFC 3454 (ref. '6') (Obsoleted by RFC 7564)

  ** Obsolete normative reference: RFC 3491 (ref. '7') (Obsoleted by RFC 5891)

  -- Possible downref: Non-RFC (?) normative reference: ref. '8'

  -- Obsolete informational reference (is this intentional?): RFC 2222 (ref.
     '10') (Obsoleted by RFC 4422, RFC 4752)

  -- Obsolete informational reference (is this intentional?): RFC 2434 (ref.
     '12') (Obsoleted by RFC 5226)

  -- Obsolete informational reference (is this intentional?): RFC 2822 (ref.
     '13') (Obsoleted by RFC 5322)

  -- Obsolete informational reference (is this intentional?): RFC 3028 (ref.
     '15') (Obsoleted by RFC 5228, RFC 5429)


     Summary: 8 errors (**), 0 flaws (~~), 8 warnings (==), 15 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1	Network Working Group                                          C. Newman
2	Internet-Draft                                          Sun Microsystems
3	Expires: January 7, 2006                                       M. Duerst
4	                                                                     AGU
5	                                                          A. Gulbrandsen
6	                                                                    Oryx
7	                                                            July 6, 2005

9	            Internet Application Protocol Collation Registry
10	                  draft-newman-i18n-comparator-04.txt

12	Status of this Memo

14	   By submitting this Internet-Draft, each author represents that any
15	   applicable patent or other IPR claims of which he or she is aware
16	   have been or will be disclosed, and any of which he or she becomes
17	   aware will be disclosed, in accordance with Section 6 of BCP 79.

19	   Internet-Drafts are working documents of the Internet Engineering
20	   Task Force (IETF), its areas, and its working groups.  Note that
21	   other groups may also distribute working documents as Internet-
22	   Drafts.

24	   Internet-Drafts are draft documents valid for a maximum of six months
25	   and may be updated, replaced, or obsoleted by other documents at any
26	   time.  It is inappropriate to use Internet-Drafts as reference
27	   material or to cite them other than as "work in progress."

29	   The list of current Internet-Drafts can be accessed at
30	   http://www.ietf.org/ietf/1id-abstracts.txt.

32	   The list of Internet-Draft Shadow Directories can be accessed at
33	   http://www.ietf.org/shadow.html.

35	   This Internet-Draft will expire on January 7, 2006.

37	Copyright Notice

39	   Copyright (C) The Internet Society (2005).

41	Abstract

43	   Many Internet application protocols include string-based lookup,
44	   searching, or sorting operations.  However the problem space for
45	   searching and sorting international strings is large, not fully
46	   explored, and is outside the area of expertise for the Internet
47	   Engineering Task Force (IETF).  Rather than attempt to solve such a
48	   large problem, this specification creates an abstraction framework so
49	   that application protocols can precisely identify a comparison
50	   function and the repertoire of comparison functions can be extended
51	   in the future.

53	Table of Contents

55	   1.   Introduction
56	     1.1  Conventions Used in this Document
57	   2.   Collation Definition and Purpose
58	     2.1  Definition
59	     2.2  Purpose
60	     2.3  Sort Keys
61	   3.   Collation Name Syntax
62	     3.1  Basic Syntax
63	     3.2  Wildcards
64	     3.3  Ordering Direction
65	     3.4  URIs
66	     3.5  Naming Guidelines
67	   4.   Collation Specification Requirements
68	     4.1  Operations Supported
69	       4.1.1  Equality
70	     4.2  Substring
71	     4.3  Ordering
72	     4.4  Internal Canonicalization Algorithm
73	     4.5  Use of Lookup Tables
74	     4.6  Multi-Value Attributes
75	   5.   Application Protocol Requirements
76	     5.1  Character Encoding
77	     5.2  Operations
78	     5.3  Wildcards
79	     5.4  Canonicalization Function
80	     5.5  Disconnected Clients
81	     5.6  Error Codes
82	     5.7  Octet Collation
83	   6.   Use by ACAP and Sieve
84	   7.   Collation Registration
85	     7.1  Collation Registration Procedure
86	     7.2  Collation Registration Format
87	       7.2.1  Registration Template
88	       7.2.2  The collation Element
89	       7.2.3  The name Element
90	       7.2.4  The title Element
91	       7.2.5  The functions Element
92	       7.2.6  The specification Element
93	       7.2.7  The submitter Element
94	       7.2.8  The owner Element
95	       7.2.9  The version Element
96	       7.2.10   The UnicodeVersion Element
97	       7.2.11   The UCAVersion Element
98	       7.2.12   The UCAMatchLevel Element
99	     7.3  DTD for Collation Registration
100	     7.4  Structure of Collation Registry
101	     7.5  Example Initial Registry Summary
102	   8.   Guidelines for Expert Reviewer
103	   9.   Initial Collations
104	     9.1  ASCII Numeric Collation
105	       9.1.1  ASCII Numeric Collation Description
106	       9.1.2  ASCII Numeric Collation Registration
107	     9.2  ASCII Casemap Collation
108	       9.2.1  ASCII Casemap Collation Description
109	       9.2.2  Legacy English Casemap Collation Registration
110	       9.2.3  English Casemap Collation Registration
111	     9.3  Nameprep Collation
112	       9.3.1  Nameprep Collation Description
113	       9.3.2  Nameprep Collation Registration
114	     9.4  Basic Collation
115	       9.4.1  Basic Collation Description
116	       9.4.2  Basic Collation Registration
117	       9.4.3  Basic Accent Sensitive Match Collation Registration
118	       9.4.4  Basic Case Sensitive Match Collation Registration
119	     9.5  Octet Collation
120	       9.5.1  Octet Collation Description
121	       9.5.2  Octet Collation Registration
122	   10.  IANA Considerations
123	   11.  Security Considerations
124	   12.  Open Issues
125	   13.  Change Log
126	     13.1   Changes From -03
127	     13.2   Changes From -02
128	     13.3   Changes From -01
129	     13.4   Changes From -00
130	   14.  References
131	     14.1   Normative References
132	     14.2   Informative References
133	        Authors' Addresses
134	        Intellectual Property and Copyright Statements

136	1.  Introduction

138	   The ACAP [11] specification introduced the concept of a comparator
139	   (which we call collation in this document), but failed to create an
140	   IANA registry.  With the introduction of stringprep [6] and the
141	   Unicode Collation Algorithm [8], it is now time to create that
142	   registry and populate it with some initial values appropriate for an
143	   international community.  This specification replaces and generalizes
144	   the definition of a comparator in ACAP and creates a collation
145	   registry.

147	1.1  Conventions Used in this Document

149	   The key words "MUST", "MUST NOT", "SHOULD", "SHOULD NOT", and "MAY"
150	   in this document are to be interpreted as defined in "Key words for
151	   use in RFCs to Indicate Requirement Levels" [1].

153	   The attribute syntax specifications use the Augmented Backus-Naur
154	   Form (ABNF) [2] notation including the core rules defined in Appendix
155	   A. This also inherits ABNF rules from Language Tags [5].

157	   The term 'protocol' is used in this memo in a very generic sense, and
158	   includes things such as query languages.

160	2.  Collation Definition and Purpose

162	2.1  Definition

164	   A collation is a named function which takes two arbitrary length
165	   character strings (with the exception of the i;octet (Section 9.5)
166	   collation) as input and can be used to perform one or more of three
167	   basic comparison operations: equality test, substring match, and
168	   ordering test.

170	2.2  Purpose

172	   Collations provide a multi-protocol abstraction layer for comparison
173	   functions so the details of a particular comparison operation can be
174	   specified by someone with appropriate expertise independent of the
175	   application protocol that consumes that collation.  This is similar
176	   to the way a charset [14] separates the details of octet to character
177	   mapping from a protocol specification such as MIME [9] or the way
178	   SASL [10] separates the details of an authentication mechanism from a
179	   protocol specification such as ACAP [11].

181	   Here is a small diagram to help illustrate the value of this
182	   abstraction layer:

184	   +-------------------+                         +-----------------+
185	   | IMAP i18n SEARCH  |--+                      | Basic           |
186	   +-------------------+  |                   +--| Collation Spec  |
187	                          |                   |  +-----------------+
188	   +-------------------+  |  +-------------+  |  +-----------------+
189	   | ACAP i18n SEARCH  |--+--| Collation   |--+--| A stringprep    |
190	   +-------------------+  |  | Registry    |  |  | Collation Spec  |
191	                          |  +-------------+  |  +-----------------+
192	   +-------------------+  |                   |  +-----------------+
193	   | ...other protocol |--+                   |  | locale-specific |
194	   +-------------------+                      +--| Collation Spec  |
195	                                                 +-----------------+

197	   Thus IMAP, ACAP and future application protocols with international
198	   search capability simply specify how to interface to the collation
199	   registry instead of each protocol specification having to specify all
200	   the collations it supports.

202	2.3  Sort Keys

204	   One component of a collation is a canonicalization function which can
205	   be pre-applied to single strings and may enhance the performance of
206	   subsequent comparison operations.  Normally, this is an
207	   implementation detail of collations, but at times it may be useful
208	   for an application protocol to expose collation canonicalization over
209	   the protocol.  Collation canonicalization can range from an identity
210	   mapping (e.g., the i;octet collation, Section 9.5) to a mapping which
211	   makes the string unreadable to a human (e.g., the basic collation).

213	3.  Collation Name Syntax

215	3.1  Basic Syntax

217	   The collation name itself is a single US-ASCII string beginning with
218	   a letter and made up of letters, digits, or one of the following 4
219	   symbols: "-", ";", "=" or ".".  The name MUST NOT be longer than 254
220	   characters.

222	     collation-char  =  ALPHA / DIGIT / "-" / ";" / "=" / "."

224	     collation-name  =  ALPHA *253collation-char
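
   As an illustration, the two ABNF rules above can be checked with a
   single regular expression.  The following Python sketch is not part
   of this specification; the function name is_collation_name is an
   assumption made for the example.

```python
import re

# collation-char = ALPHA / DIGIT / "-" / ";" / "=" / "."
# collation-name = ALPHA *253collation-char   (so at most 254 characters)
_COLLATION_NAME = re.compile(r"[A-Za-z][A-Za-z0-9;=.\-]{0,253}")


def is_collation_name(name: str) -> bool:
    """Return True if name satisfies the collation-name rule."""
    return _COLLATION_NAME.fullmatch(name) is not None
```

   With this sketch, "i;octet" and "i;ascii-casemap" are accepted,
   while a name that starts with ";" or exceeds 254 characters is
   rejected.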

226	3.2  Wildcards

228	   The string a client uses to select a collation MAY contain a wildcard
229	   ("*") character which matches zero or more collation-chars.  Wildcard
230	   characters MUST NOT be adjacent.  Clients which support disconnected
231	   operation SHOULD NOT use wildcards to select a collation, but clients
232	   which provide collation operations only when connected to the server
233	   MAY use wildcards.  If the wildcard string matches multiple
234	   collations, the server SHOULD select the collation with the broadest
235	   scope (preferably international scope), the most recent table
236	   versions and the greatest number of supported operations.  A single
237	   wildcard character ("*") refers to the application protocol collation
238	   behavior that would occur if no explicit negotiation were used.

240	     collation-wild  =  ("*" / (ALPHA ["*"])) *(collation-char ["*"])
241	                         ; MUST NOT exceed 255 characters total
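
   One possible way to validate and evaluate a wildcard string is
   sketched below in Python.  The helper names are assumptions made
   for this example.  The sketch only lists the collations a pattern
   matches; it does not model the server's preference among multiple
   matches, nor the special meaning of a lone "*".

```python
import re
from typing import List

_CHAR_RUN = r"[A-Za-z0-9;=.\-]*"  # zero or more collation-chars


def is_collation_wild(pattern: str) -> bool:
    """Check the collation-wild rule: starts with "*" or ALPHA, contains
    only collation-chars and "*", no adjacent "*", at most 255 characters."""
    if not 1 <= len(pattern) <= 255 or "**" in pattern:
        return False
    return re.fullmatch(r"[A-Za-z*][A-Za-z0-9;=.\-*]*", pattern) is not None


def wildcard_matches(pattern: str, name: str) -> bool:
    """Return True if name matches pattern, where each "*" stands for
    zero or more collation-chars."""
    regex = "".join(_CHAR_RUN if ch == "*" else re.escape(ch) for ch in pattern)
    return re.fullmatch(regex, name) is not None


def matching_collations(pattern: str, available: List[str]) -> List[str]:
    """List the available collation names selected by pattern."""
    if not is_collation_wild(pattern):
        raise ValueError("not a valid collation-wild: %r" % pattern)
    return [name for name in available if wildcard_matches(pattern, name)]
```

   For instance, the pattern "i;ascii-*" selects "i;ascii-casemap" and
   "i;ascii-numeric" but not "i;octet".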

243	3.3  Ordering Direction

245	   When used as a protocol element for ordering, the collation name MAY
246	   be prefixed by either "+" or "-" to explicitly specify an ordering
247	   direction.  The "+" prefix has no effect on the ordering function,
248	   while "-" negates the result of the ordering function.  In general,
249	   collation-order is used when a client requests a collation, and
250	   collation-sel is used when the server informs the client of the
251	   selected collation.

253	     collation-sel   =  ["+" / "-"] collation-name

255	     collation-order =  ["+" / "-"] collation-wild
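
   The handling of the direction prefix can be sketched as follows.
   This is an illustrative Python fragment, not a normative algorithm;
   split_direction and apply_direction are names invented for the
   example.

```python
from typing import Tuple


def split_direction(token: str) -> Tuple[int, str]:
    """Split an optional "+" or "-" prefix from a collation token.

    Returns (sign, rest), where sign is -1 for a "-" prefix and +1 otherwise.
    """
    if token.startswith("-"):
        return -1, token[1:]
    if token.startswith("+"):
        return 1, token[1:]
    return 1, token


def apply_direction(sign: int, result: int) -> int:
    """Negate an ordering result (-1, 0 or +1) when sign is -1."""
    return sign * result
```

   A result of "0" is unaffected by either prefix, since negating it
   leaves it unchanged.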

257	3.4  URIs

259	   Some protocols are designed to use URIs to refer to collations rather
260	   than simple tokens.  A special section of the IANA web page is
261	   reserved for such usage.  The "collation-uri" form is used to refer
262	   to a specific IANA registry entry for a specific named collation (the
263	   collation registration may not actually be present if it is
264	   experimental).  The "collation-auri" form is an abstract name for an
265	   ordering, a comparator pattern or a vendor private comparator.

267	     collation-uri   =  "http://www.iana.org/assignments/collation/"
268	                        collation-name ".xml"

270	     collation-auri  =  ( "http://www.iana.org/assignments/collation/"
271	                        collation-order [".xml"]) / other-uri

273	     other-uri       =  absoluteURI
274	                     ;  excluding the IANA collation namespace.
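
   As a rough illustration of the "collation-uri" form, the Python
   sketch below converts between a collation name and its registry
   URI.  The function names are assumptions made for the example, and
   the sketch does not handle the more general "collation-auri" form.

```python
import re
from typing import Optional

IANA_BASE = "http://www.iana.org/assignments/collation/"
_COLLATION_NAME = re.compile(r"[A-Za-z][A-Za-z0-9;=.\-]{0,253}")


def collation_uri(name: str) -> str:
    """Build the collation-uri for a valid collation-name."""
    if _COLLATION_NAME.fullmatch(name) is None:
        raise ValueError("not a valid collation-name: %r" % name)
    return IANA_BASE + name + ".xml"


def name_from_uri(uri: str) -> Optional[str]:
    """Recover the collation-name from a collation-uri, or return None
    if the URI is not of that form."""
    if not (uri.startswith(IANA_BASE) and uri.endswith(".xml")):
        return None
    name = uri[len(IANA_BASE):-len(".xml")]
    return name if _COLLATION_NAME.fullmatch(name) else None
```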

276	3.5  Naming Guidelines

278	   While this specification makes no absolute requirements on the
279	   structure of collation names, naming consistency is important, so the
280	   following initial guidelines are provided.

282	   Collation names with an international audience typically begin with
283	   "i;".  Collation names intended for a particular language or locale
284	   typically begin with a language tag [5] followed by a ";".  After the
285	   first ";" is normally the name of the general collation algorithm
286	   followed by a series of algorithm modifications separated by the ";"
287	   delimiter.  Parameterized modifications will use "=" to delimit the
288	   parameter from the value.  The version numbers of any lookup tables
289	   used by the algorithm SHOULD be present as parameterized
290	   modifications.

292	   Collation names of the form *;vnd-domain.com;* are reserved for
293	   vendor-specific collations created by the owner of the domain name
294	   following the "vnd-" prefix.  Registration of such collations (or the
295	   name space as a whole) with intended use of "Vendor" is encouraged
296	   when a public specification or open-source implementation is
297	   available, but is not required.
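
   Because these guidelines are conventions rather than requirements,
   software should not rely on them.  Where a name does follow them,
   it could be split into its parts roughly as in the Python sketch
   below.  The function name and the second example name in the note
   that follows are hypothetical.

```python
from typing import Dict, List, Tuple


def parse_collation_name(name: str) -> Tuple[str, str, List[str], Dict[str, str]]:
    """Split a conventionally structured collation name.

    Returns (scope, algorithm, modifications, parameters), where scope is
    the part before the first ";" ("i" or a language tag), modifications
    are bare ";"-separated items and parameters are "key=value" items.
    """
    scope, _, rest = name.partition(";")
    parts = rest.split(";") if rest else []
    algorithm = parts[0] if parts else ""
    modifications: List[str] = []
    parameters: Dict[str, str] = {}
    for part in parts[1:]:
        key, sep, value = part.partition("=")
        if sep:
            parameters[key] = value
        else:
            modifications.append(key)
    return scope, algorithm, modifications, parameters
```

   Under these assumptions, "i;ascii-casemap" yields the scope "i" and
   the algorithm "ascii-casemap", and a made-up name such as
   "en;example;strict;tables=1.0" yields one bare modification and one
   parameter.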

299	4.  Collation Specification Requirements

301	4.1  Operations Supported

303	   A collation specification MUST state which of the three basic
304	   functions are supported (equality, substring, ordering) and how to
305	   perform each of the supported functions on any two input character
306	   strings including empty strings (with the exception of the i;octet
307	   (Section 9.5) collation).  Collations must be deterministic,
308	   i.e., given a collation with a specific name, and any two fixed input
309	   strings, the result MUST be the same for the same operation.
310	   Collations MUST be transitive.

312	   In general, collation operations should behave as their names
313	   suggest.  While a collation may be new, the operations are not, so
314	   the new collation's algorithm for each operation should be as similar
315	   as possible to those of older collations.  For example, a collation
316	   should not provide a "substring" operator that would morph IMAP
317	   substring SEARCH into another kind of search.

319	4.1.1  Equality

321	   The equality function always returns "match" or "no-match" when
322	   supplied valid input and MAY return "error" if the input strings are
323	   not valid character strings or violate other collation constraints.

325	4.2  Substring

327	   The substring matching function determines if the first string is a
328	   substring of the second string.  A collation which supports substring
329	   matching will automatically support the two special cases of
330	   substring matching: prefix and suffix matching if those special cases
331	   are supported by the application protocol.  It returns "match" or
332	   "no-match" when supplied valid input and returns "error" when
333	   supplied invalid input.

335	   Application protocols MAY return position information for substring
336	   matches.  If this is done, the position information MUST include both
337	   the starting offset and the ending offset in the string.  This is
338	   important because more sophisticated collations can match strings of
339	   unequal length (for example, a pre-composed accented character will
340	   match a decomposed accented character).
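
   The need for both offsets can be seen in the following Python
   sketch.  It assumes a stand-in collation whose canonicalization is
   Unicode NFC normalization only, which is not one of the collations
   defined in this document, and it uses a brute-force search for
   clarity rather than speed.  A real collation would likely also
   avoid matches that split a combining sequence.

```python
import unicodedata
from typing import Optional, Tuple


def canonical(s: str) -> str:
    # Stand-in canonicalization (an assumption for this example): NFC only.
    return unicodedata.normalize("NFC", s)


def find_substring(needle: str, haystack: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) offsets of the first span of haystack whose
    canonical form equals the canonical form of needle, or None."""
    target = canonical(needle)
    for start in range(len(haystack) + 1):
        for end in range(start, len(haystack) + 1):
            if canonical(haystack[start:end]) == target:
                return start, end
    return None
```

   Searching for a four-character string that contains a pre-composed
   "e with acute" in a string that spells the same word with "e"
   followed by a combining acute accent matches a span of five
   characters, so the length of the search string alone would not
   identify the end of the match.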

342	4.3  Ordering

344	   The ordering function determines how two character strings are
345	   ordered.  It returns "-1" if the first string is listed before the
346	   second string according to the collation, "+1" if the second string
347	   is listed before the first string, and "0" if the two strings are
348	   equal.  If the order of the two strings is reversed, the result of
349	   the ordering function of the collation MUST be reversed, i.e. results
350	   which would be "+1" are instead "-1" and results which would be "-1"
351	   are instead "+1", while results which would be "0" stay "0".  In
352	   general, collations SHOULD NOT return "0" unless the two character
353	   sequences are identical.

355	   Since ordering is normally used to sort a list of items, "error" is
356	   not a useful return value from the ordering function.  Strings with
357	   errors that prevent the sorting algorithm from functioning correctly
358	   should sort to the end of the list.  Thus if the first string is
359	   invalid while the second string is valid, the result will be "+1".
360	   If the second string is invalid while the first string is valid, the
361	   result will be "-1".  If both strings are invalid, the result SHOULD
362	   match the result from the "i;octet" collation.

364	   When the collation is used with a "+" prefix, the behavior is the
365	   same as when used with no prefix.  When the collation is used with a
366	   "-" prefix, the result of the ordering function of the collation MUST
367	   be reversed.
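
   The treatment of invalid input can be sketched as below.  This
   Python fragment makes several assumptions for illustration: inputs
   arrive as UTF-8 octets, "invalid" means "not decodable as UTF-8",
   valid strings are ordered by code point (a stand-in for a real
   collation), and the "i;octet" ordering is taken to be an unsigned
   octet-by-octet comparison.

```python
from functools import cmp_to_key


def _cmp(a, b) -> int:
    """Three-way comparison returning -1, 0 or +1."""
    return (a > b) - (a < b)


def _is_valid(octets: bytes) -> bool:
    try:
        octets.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def order(first: bytes, second: bytes) -> int:
    """Ordering function that sorts invalid strings to the end."""
    first_ok, second_ok = _is_valid(first), _is_valid(second)
    if first_ok and second_ok:
        return _cmp(first.decode("utf-8"), second.decode("utf-8"))
    if first_ok:
        return -1  # only the second string is invalid
    if second_ok:
        return 1   # only the first string is invalid
    return _cmp(first, second)  # both invalid: compare raw octets


def sort_values(values, descending: bool = False):
    """Sort with order(); a "-" prefix corresponds to descending=True."""
    return sorted(values, key=cmp_to_key(order), reverse=descending)
```

   Note that with a "-" prefix the reversal applies to the whole
   result, so in this sketch invalid strings then sort to the front.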

369	4.4  Internal Canonicalization Algorithm

371	   A collation specification MUST describe the internal canonicalization
372	   algorithm.  This algorithm can be applied to individual strings and
373	   the result strings can be stored to potentially optimize future
374	   comparison operations.  A collation MAY specify that the
375	   canonicalization algorithm is the identity function.  The output of
376	   the canonicalization algorithm MAY have no meaning to a human.
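
   As a sketch of how stored canonical forms might be used, the Python
   fragment below applies a hypothetical canonicalization that maps
   the ASCII letters "a" to "z" to upper case and leaves every other
   character alone.  It is an assumption made for illustration, not
   the algorithm of any collation registered by this document.

```python
from typing import Dict, List


def canonicalize(s: str) -> str:
    """Hypothetical canonicalization: upper-case ASCII letters only."""
    return "".join(chr(ord(c) - 32) if "a" <= c <= "z" else c for c in s)


def build_index(values: List[str]) -> Dict[str, List[str]]:
    """Group stored values by canonical form, computed once per value."""
    index: Dict[str, List[str]] = {}
    for value in values:
        index.setdefault(canonicalize(value), []).append(value)
    return index


def lookup(index: Dict[str, List[str]], probe: str) -> List[str]:
    """Equality match against the index: only the probe is canonicalized."""
    return index.get(canonicalize(probe), [])
```

   Once the index is built, each equality test costs a single
   canonicalization of the probe string plus a table lookup.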

378	4.5  Use of Lookup Tables

380	   Collations which use more than one customizable lookup table in a
381	   documented format MUST assign numbers to the tables they use.  This
382	   permits an application protocol command to access the tables used by
383	   a server collation.

385	4.6  Multi-Value Attributes

387	   Some application protocols will permit the use of multi-value
388	   attributes with a collation.  This paragraph describes the rules that
389	   apply unless otherwise specified by the collation or application
390	   protocol.  In the case of the equality and substring operation, the
391	   operations are applied over each pair of single values from the two
392	   inputs.  If any combination produces an error, the result is an
393	   error.  Otherwise, if any combination produces a "match", the result
394	   is a match.  Otherwise the result is "no-match".  For the ordering
395	   function, the smallest ordinal character string from the first set of
396	   values is compared to the smallest ordinal character string from the
397	   second set of values.
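
   The default rules above can be sketched in Python as follows.  The
   function names are assumptions, the single-value equality and
   ordering functions are passed in as parameters, and the behavior
   for an empty set of values (not addressed above) is left undefined.

```python
from functools import cmp_to_key
from typing import Callable, Iterable


def multi_match(first: Iterable[str], second: Iterable[str],
                test: Callable[[str, str], str]) -> str:
    """Combine an equality or substring test over every pair of values.

    test returns "match", "no-match" or "error" for one pair of values.
    """
    second_values = list(second)
    results = [test(a, b) for a in first for b in second_values]
    if "error" in results:
        return "error"
    if "match" in results:
        return "match"
    return "no-match"


def multi_order(first: Iterable[str], second: Iterable[str],
                order: Callable[[str, str], int]) -> int:
    """Compare the smallest value of each set using the ordering function."""
    key = cmp_to_key(order)
    return order(min(first, key=key), min(second, key=key))
```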

399	5.  Application Protocol Requirements

401	   This section describes the requirements and issues that an
402	   application protocol which offers searching, substring matching
403	   and/or sorting and permits the use of characters outside the US-ASCII
404	   charset needs to consider.

406	5.1  Character Encoding

408	   The protocol specification has to make sure that it is clear on which
409	   characters (rather than just octets) the collations are used.  This
410	   can be done by specifying the protocol itself in terms of characters
411	   (e.g. in the case of a query language), by specifying a single
412	   character encoding for the protocol (e.g. UTF-8 [3]), or by
413	   carefully describing the relevant issues of character encoding
414	   labeling and conversion.  In the latter case, details to consider
415	   include how to handle unknown charsets, any charsets which are
416	   mandatory-to-implement, any issues with byte-order that might apply,
417	   and any transfer encodings which need to be supported.

419	5.2  Operations

421	   The protocol must specify which of the operations defined in this
422	   specification (equality matching, substring matching and ordering)
423	   can be invoked in the protocol, and how they are invoked.  There may
424	   be more than one way to invoke an operation.

426	   The protocol MUST provide a mechanism for the client to select the
427	   collation to use with equality matching, substring matching and
428	   ordering.

430	   If the protocol provides positional information for the results of a
431	   substring match, that positional information MUST fully specify the
432	   substring in the result that matches independent of the length of the
433	   search string.  For example, returning both the starting and ending
434	   offset of the match would suffice, as would the starting offset and a
435	   length.  Returning just the starting offset is not acceptable.  This
436	   rule is necessary because advanced collations can treat strings of
437	   different lengths as equal (for example, pre-composed and decomposed
438	   accented characters).

440	5.3  Wildcards

442	   The protocol MUST specify whether it allows the use of wildcards in
443	   collation identifiers or not.  If the protocol allows wildcards,
444	   then:
445	      The protocol MUST specify how comparisons behave in the absence of
446	      explicit collation negotiation or when a collation of "*" is
447	      requested.  The protocol MAY specify that the default collation
448	      used in such circumstances is sensitive to server configuration.
449	      The protocol SHOULD provide a way to list available collations
450	      matching a given wildcard pattern or patterns.

452	5.4  Canonicalization Function

454	   If the protocol provides a canonicalization function for strings,
455	   then use of collations MAY be appropriate for that function.  [Need
456	   to describe how that would be done.]

458	5.5  Disconnected Clients

460	   If the protocol supports disconnected clients, then a mechanism for
461	   the client to precisely replicate the server's collation algorithm is
462	   likely desirable.  Thus the protocol MAY wish to provide a command to
463	   fetch lookup tables used by charset conversions and collations.

465	5.6  Error Codes

467	   The protocol specification should consider assigning protocol error
468	   codes for the following circumstances:
469	   o  The client requests the use of a collation by name or pattern, but
470	      no implemented collation matches that pattern.
471	   o  The client attempts to use a collation for a function that is not
472	      supported by that collation.  For example, attempting to use the
473	      "i;ascii-numeric" collation for a substring matching function.
474	   o  The client uses an equality or substring matching collation and
475	      the result is an error.  It may be appropriate to distinguish
476	      between the two input strings, particularly when one is supplied
477	      by the client and one is stored by the server.  It might also be
478	      appropriate to distinguish the specific case of an invalid UTF-8
479	      string.
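
   A protocol implementation might model these circumstances roughly
   as in the Python sketch below.  The error code names, the
   check_request helper and the table of supported operations are all
   hypothetical; an actual protocol would define its own codes.

```python
from enum import Enum
from typing import Dict, Optional, Set


class CollationError(Enum):
    NO_MATCHING_COLLATION = "no-matching-collation"
    UNSUPPORTED_OPERATION = "unsupported-operation"
    INVALID_INPUT = "invalid-input"  # reported when a comparison yields "error"


def check_request(supported: Dict[str, Set[str]], name: str,
                  operation: str) -> Optional[CollationError]:
    """Check a requested collation and operation before running it.

    supported maps each implemented collation name to the set of
    operations it offers ("equality", "substring", "ordering").
    """
    if name not in supported:
        return CollationError.NO_MATCHING_COLLATION
    if operation not in supported[name]:
        return CollationError.UNSUPPORTED_OPERATION
    return None
```

   For example, if a server's table lists "i;ascii-numeric" without
   the substring operation, a substring request naming that collation
   would be answered with the unsupported-operation code.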

481	5.7  Octet Collation

483	   If the protocol permits the use of the i;octet (Section 9.5)
484	   collation, it has to say so.  The octet collation SHOULD NOT be used
485	   unless the protocol uses UTF-8 as its single character encoding.

487	   If the protocol permits the use of collations with data structures
488	   other than strings, the protocol MUST describe the default behavior
489	   for a collation with that data structure.

491	6.  Use by ACAP and Sieve

493	   Both ACAP [11] and Sieve [15] are standards track specifications
494	   which used collations prior to the creation of this specification and
495	   registry.  Those standards do not meet all the application protocol
496	   requirements described in Section 5.  For backwards compatibility,
497	   those protocols use the "i;ascii-casemap" collation instead of
498	   "en;ascii-casemap".  These protocols allow the use of the i;octet
499	   (Section 9.5) collation directly on the UTF-8 data they use.

501	7.  Collation Registration

503	7.1  Collation Registration Procedure

505	   The IETF will create a mailing list, collation@ietf.org, which can be
506	   used for public discussion of collation proposals prior to
507	   registration.  Use of the mailing list is encouraged but not
508	   required.  The actual registration procedure will not begin until the
509	   completed registration template is sent to iana@iana.org.  The IESG
510	   will appoint a designated expert who will monitor the
511	   collation@ietf.org mailing list and review registrations forwarded
512	   from IANA.  The designated expert is expected to tell IANA and the
513	   submitter of the registration within two weeks whether the
514	   registration is approved, approved with minor changes, or rejected
515	   with cause.  When a registration is rejected with cause, it can be
516	   re-submitted if the concerns listed in the cause are addressed.
517	   Decisions made by the designated expert can be appealed to the IESG
518	   and subsequently follow the normal appeals procedure for IESG
519	   decisions.

521	   Collation registrations in a standards track, BCP or IESG-approved
522	   experimental RFC are owned by the IETF, and changes to the
523	   registration follow normal procedures for updating such documents.
524	   Collation registrations in other RFCs are owned by the RFC author(s).
525	   Other collation registrations are owned by the individual(s) listed
526	   in the contact field of the registration and IANA will preserve this
527	   information.  Changes to a registration MUST be approved by the
528	   owner.  In the event the owner cannot be contacted for a period of
529	   one month and a change is deemed necessary, the IESG MAY re-assign
530	   ownership to an appropriate party.

532	7.2  Collation Registration Format

534	   Registration of a collation is done by sending a well-formed XML
535	   document that validates with collationreg.dtd (Section 7.3).

537	7.2.1  Registration Template

539	   Here is a template for the registration:

541	   <?xml version='1.0'?>
542	   <!DOCTYPE collation SYSTEM 'collationreg.dtd'>
543	   <collation rfc="YYYY" scope="i18n" intendedUse="common">
544	     <name>collation name</name>
545	     <title>technical title for collation</title>
546	     <functions>equality order substring</functions>
547	     <specification>specification reference</specification>
548	     <owner>email address of owner or IETF</owner>
549	     <submitter>email address of submitter</submitter>
550	     <version>1</version>
551	     <UnicodeVersion>3.2</UnicodeVersion>
552	     <UCAVersion>3.1.1</UCAVersion>
553	   </collation>

555	7.2.2  The collation Element

557	   The root of the registration document MUST be a <collation> element.
558	   The collation element contains the other elements in the
559	   registration, which are described in the following sub-subsections,
560	   in the order given here.

562	   The <collation> element MAY include an "rfc=" attribute if the
563	   specification is in an RFC.  The "rfc=" attribute gives only the
564	   number of the RFC, without any prefix, such as "RFC", or suffix, such
565	   as ".txt".

567	   The <collation> element MUST include a "scope=" attribute, which MUST
568	   have one of the values "i18n", "local" or "other".

570	   The <collation> element MUST include an "intendedUse=" attribute,
571	   which MUST have one of the values "common", "limited", "vendor", or
572	   "deprecated".  Collation specifications intended for "common" use are
573	   expected to reference standards from standards bodies with
574	   significant experience dealing with the details of international
575	   character sets.

577	   Be aware that future revisions of this specification may add
578	   additional function types, as well as additional XML attributes and
579	   values.  Any system which automatically parses these XML documents
580	   MUST take this into account to preserve future compatibility.  A DTD
581	   for the current definition of the collation registration template is
582	   given in Section 7.3.
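
   As an illustration of the forward-compatibility requirement above,
   a parser might skip elements and function names it does not
   recognize instead of rejecting the document.  The following is a
   minimal Python sketch, not part of the registration format.  The
   function name parse_registration and the layout of the returned
   dictionary are assumptions made for this example only.

```python
import xml.etree.ElementTree as ET

# Element and function names taken from the current template and DTD.
KNOWN_FUNCTIONS = {"equality", "order", "substring"}
REPEATABLE = {"specification", "owner", "submitter"}
SINGLE = {"name", "title", "functions", "version",
          "UnicodeVersion", "UCAVersion", "UCAMatchLevel"}


def parse_registration(xml_text):
    """Parse a registration, tolerating unknown elements and functions.

    Hypothetical helper: it checks only the root element name and does
    not validate the document against the DTD.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "collation":
        raise ValueError("root element must be <collation>")

    reg = {"attributes": dict(root.attrib), "unknown_elements": []}
    for child in root:
        text = (child.text or "").strip()
        if child.tag in REPEATABLE:
            reg.setdefault(child.tag, []).append(text)
        elif child.tag in SINGLE:
            reg[child.tag] = text
        else:
            # A future revision may add elements; record and move on.
            reg["unknown_elements"].append(child.tag)

    # Split the function list and set aside names not known today.
    functions = reg.get("functions", "").split()
    reg["functions"] = [f for f in functions if f in KNOWN_FUNCTIONS]
    reg["unknown_functions"] = [f for f in functions
                                if f not in KNOWN_FUNCTIONS]
    return reg
```

   Unknown attributes are kept as-is in the "attributes" mapping, so a
   caller can decide separately how strictly to treat them.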

584	7.2.3  The name Element

586	   The <name> element gives the precise name of the collation.  The
587	   <name> element is mandatory.

589	7.2.4  The title Element

591	   The <title> element gives the title of the collation.  The <title>
592	   element is mandatory.

594	7.2.5  The functions Element

596	   The <functions> element lists which of the three functions the
597	   collation provides.  The <functions> element is mandatory.

599	7.2.6  The specification Element

601	   The <specification> element describes where to find the
602	   specification.  The <specification> element is mandatory.  It MAY
603	   have a URI attribute.  There may be more than one <specification>
604	   element.  (For example, a collation which has previously been
605	   specified by a vendor may have been published on that vendor's web
606	   site, and subsequently by a standards organization.)

608	   If the listed specifications differ, the RFC is the definitive
609	   specification.

611	7.2.7  The submitter Element

613	   The <submitter> element provides an RFC 2822 email address for the
614	   person who submitted the registration.  It is optional if the <owner>
615	   element contains an email address.

617	   There may be more than one <submitter> element.

619	7.2.8  The owner Element

621	   The <owner> element contains either the four letters "IETF" or an
622	   email address of the owner of the registration.  The <owner> element
623	   is mandatory.  There may be more than one <owner> element.  If so,
624	   all owners are equal.  Each owner can speak for all.

626	7.2.9  The version Element

628	   The <version> element is included when the registration is likely to
629	   be revised or has been revised in such a way that the results change
630	   for certain input strings.  The <version> element is optional.

632	7.2.10  The UnicodeVersion Element

634	   The <UnicodeVersion> element indicates the version number of the
635	   UnicodeData file on which the collation is based.  The
636	   <UnicodeVersion> element is optional.

638	7.2.11  The UCAVersion Element

640	   The <UCAVersion> element specifies the version of the Unicode
641	   Collation Algorithm on which the collation is based.  The
642	   <UCAVersion> element is optional.

644	7.2.12  The UCAMatchLevel Element

646	   The <UCAMatchLevel> element specifies the number of Unicode Collation
647	   Algorithm sort key levels used for the equality and substring
648	   operations.  The <UCAMatchLevel> element is optional.

650	7.3  DTD for Collation Registration

652	   <!--
653	     DTD for Collation Registration Document

655	     Data types:

657	     entity      description
658	     ======      ===========
659	     NUMBER      [0-9]+
660	     URI         As defined in RFC YYYY
661	     CTEXT       printable ASCII text (no line-terminators)
662	     TEXT        character data
663	     -->
664	   <!ENTITY % NUMBER        "CDATA">
665	   <!ENTITY % URI           "CDATA">
666	   <!ENTITY % CTEXT         "#PCDATA">
667	   <!ENTITY % TEXT          "#PCDATA">
668	   <!ELEMENT collation      (name,title,functions,specification+,owner+,
669	                             submitter*,version?,UnicodeVersion?,
670	                             UCAVersion?,UCAMatchLevel?)>
671	   <!ATTLIST collation
672	             rfc            %NUMBER;                           "0"
673	             scope          (i18n|local|other)                 #IMPLIED
674	             intendedUse    (common|limited|vendor|deprecated) #IMPLIED>
675	   <!ELEMENT name           (%CTEXT;)>
676	   <!ELEMENT title          (%CTEXT;)>
677	   <!ELEMENT functions      (%CTEXT;)>
678	   <!ELEMENT specification  (%TEXT;)>
679	   <!ATTLIST specification
680	             uri            %URI;                              "">
681	   <!ELEMENT owner          (%CTEXT;)>
682	   <!ELEMENT submitter      (%CTEXT;)>
683	   <!ELEMENT version        (%CTEXT;)>
684	   <!ELEMENT UnicodeVersion (%CTEXT;)>
685	   <!ELEMENT UCAVersion     (%CTEXT;)>
686	   <!ELEMENT UCAMatchLevel  (%CTEXT;)>

688	7.4  Structure of Collation Registry

690	   Once the registration is approved, IANA will store each XML
691	   registration document in a URL of the form
692	   http://www.iana.org/assignments/collation/collation-name.xml where
693	   collation-name is the contents of the name element in the
694	   registration.  Both the submitter and the designated expert are
695	   responsible for verifying that the XML is well-formed and complies
696	   with the DTD.

698	   IANA will also maintain a text summary of the registry under the name
699	   http://www.iana.org/assignments/collation/summary.txt.  This summary
700	   is divided into four sections.  The first section is for collations
701	   intended for common use.  This section is intended for collation
702	   registrations published in IESG approved RFCs or for locally scoped
703	   collations from the primary standards body for that locale.  The
704	   designated expert is encouraged to reject collation registrations
705	   with an intended use of "common" if the expert believes it should be
706	   "limited", as it is desirable to keep the number of "common"
707	   registrations small and high quality.  The second section is reserved
708	   for limited use collations.  The third section is reserved for
709	   registered vendor specific collations.  The final section is reserved
710	   for deprecated collations.

712	7.5  Example Initial Registry Summary

714	   The following is an example of how IANA might structure the initial
715	   registry summary.txt file:

717	     Collation                              Functions Scope Reference
718	     ---------                              --------- ----- ---------
719	   Common Use Collations:
720	     i;nameprep;v=1;uv=3.2                  e, o, s   i18n  [RFC XXXX]
721	     i;basic;uca=3.1.1;uv=3.2               e, o, s   i18n  [RFC XXXX]
722	     i;basic;uca=3.1.1;uv=3.2;match=accent  e, o, s   i18n  [RFC XXXX]
723	     i;basic;uca=3.1.1;uv=3.2;match=case    e, o, s   i18n  [RFC XXXX]
724	     en;ascii-casemap                       e, o, s   Local [RFC XXXX]

726	   Limited Use Collations:
727	     i;octet                                e, o, s   Other [RFC XXXX]
728	     i;ascii-numeric                        e, o      Other [RFC XXXX]

730	   Vendor Collations:

732	   Deprecated Collations:
733	     i;ascii-casemap                        e, o, s   Local [RFC XXXX]

735	   References
736	   ----------
737	   [RFC XXXX]  Newman, C., "Internet Application Protocol Collation
738	               Registry", RFC XXXX, Sun Microsystems, October 2003.

740	8.  Guidelines for Expert Reviewer

742	   The expert reviewer appointed by the IESG has fairly broad latitude
743	   for this registry.  While a number of collations are expected
744	   (particularly customizations of the basic collation for localized
745	   use), an explosion of collations (particularly common use collations)
746	   is not desirable for widespread interoperability.  However, it is
747	   important for the expert reviewer to provide cause when rejecting a
748	   registration, and when possible to describe corrective action to
749	   permit the registration to proceed.  The following list includes
750	   some example reasons to reject a registration with cause:
751	   o  The registration is not a well-formed XML document that follows
752	      the DTD.
753	   o  The registration has intended use of "common", but there is no
754	      evidence the collation will be widely deployed so it should be
755	      listed as "limited".
756	   o  The registration has intended use of "common", but is redundant
757	      with the functionality of a previously registered "common"
758	      collation.
759	   o  The registration has intended use of "common", but the
760	      specification is not detailed enough to allow interoperable
761	      implementations by others.
762	   o  The collation name fails to precisely identify the version numbers
763	      of relevant tables to use.
764	   o  The registration fails to meet one of the "MUST" requirements in
765	      Section 4.
766	   o  The collation name fails to meet the syntax in Section 3.
767	   o  The collation specification referenced in the registration is
768	      vague or has optional features without a clear behavior specified.
769	   o  The referenced specification does not adequately address security
770	      considerations specific to that collation.
771	   o  The registration's operations are needlessly different from those
772	      of traditional operations.

774	9.  Initial Collations

776	   This section describes an initial set of collations for the collation
777	   registry.

779	9.1  ASCII Numeric Collation

781	9.1.1  ASCII Numeric Collation Description

783	   The "i;ascii-numeric" collation is a simple collation intended for
784	   use with arbitrary sized decimal numbers stored as octet strings of
785	   US-ASCII digits (0x30 to 0x39).  It supports equality and ordering,
786	   but does not support the substring function.  The algorithm is as
787	   follows:
788	   1.  If neither string begins with a digit, return "error" if
789	       matching, or the result of the "i;octet" collation for ordering.
791	   2.  If the first string begins with a digit and the second string
792	       does not, return "error" if matching and "-1" for ordering.
793	   3.  If the second string begins with a digit and the first string
794	       does not, return "error" if matching and "+1" for ordering.
795	   4.  Let "n" be the number of digits at the beginning of the first
796	       string, and "m" be the number of digits at the beginning of the
797	       second string.
798	   5.  If n is equal to m, return the result of the "i;octet" collation.
799	   6.  If n is greater than m, prepend a string of "n - m" zeros to the
800	       second string and return the result of the "i;octet" collation.
801	   7.  If m is greater than n, prepend a string of "m - n" zeros to the
802	       first string and return the result of the "i;octet" collation.

804	   The associated canonicalization algorithm is to truncate the input
805	   string at the first non-digit character.
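
   The steps above can be sketched in Python as follows.  This is a
   minimal illustration that follows the numbered steps literally; the
   function names, the "matching" flag, and the use of Python integers
   for the ordering results "-1", "0" and "+1" are assumptions of this
   example.  The "i;octet" ordering is approximated by Python's
   comparison of bytes objects, which compares unsigned octet values
   and treats a proper prefix as smaller.

```python
DIGITS = b"0123456789"  # US-ASCII digits, 0x30 to 0x39


def octet_order(a: bytes, b: bytes) -> int:
    """Stand-in for the "i;octet" ordering: -1, 0 or 1."""
    return (a > b) - (a < b)


def leading_digits(s: bytes) -> int:
    """Count the digits at the beginning of s."""
    n = 0
    while n < len(s) and s[n] in DIGITS:
        n += 1
    return n


def canonicalize(s: bytes) -> bytes:
    """Truncate the input at the first non-digit octet."""
    return s[:leading_digits(s)]


def ascii_numeric(a: bytes, b: bytes, matching: bool = False):
    """Ordering (default) or equality (matching=True) for two strings."""
    n, m = leading_digits(a), leading_digits(b)

    # Steps 1 to 3: at least one string does not begin with a digit.
    if n == 0 or m == 0:
        if matching:
            return "error"
        if n == 0 and m == 0:
            return octet_order(a, b)
        return -1 if n > 0 else 1

    # Steps 5 to 7: pad the shorter digit run with zeros, then compare.
    if n > m:
        b = b"0" * (n - m) + b
    elif m > n:
        a = b"0" * (m - n) + a
    result = octet_order(a, b)
    if matching:
        return "match" if result == 0 else "no-match"
    return result
```

   For instance, ascii_numeric(b"9", b"10") pads the first string to
   "09" before comparing, so numeric order is preserved even though
   "9" sorts after "10" under plain octet comparison.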

807	9.1.2  ASCII Numeric Collation Registration

809	   <?xml version='1.0'?>
810	   <!DOCTYPE collation SYSTEM 'collationreg.dtd'>
811	   <collation rfc="XXXX" scope="other" intendedUse="limited">
812	     <name>i;ascii-numeric</name>
813	     <title>ASCII Numeric</title>
814	     <functions>equality order</functions>
815	     <specification>RFC XXXX</specification>
816	     <owner>IETF</owner>
817	     <submitter>chris.newman@sun.com</submitter>
818	   </collation>

820	9.2  ASCII Casemap Collation

822	9.2.1  ASCII Casemap Collation Description

824	   The "en;ascii-casemap" collation is a simple collation intended for
825	   use with English language text in pure US-ASCII.  It provides
826	   equality, substring and ordering functions.  The algorithm first
827	   applies a canonicalization algorithm to both input strings which
828	   subtracts 32 (0x20) from all octet values between 97 (0x61) and 122
829	   (0x7A) inclusive.  The result of the collation is then the same as
830	   the result of the "i;octet" collation for the canonicalized strings.
831	   Care should be taken when using OS-supplied functions to implement
832	   this collation as this is not locale sensitive, but functions such as
833	   strcasecmp and toupper can be locale sensitive.

835	   For historical reasons, in the context of ACAP and Sieve, the name
836	   "i;ascii-casemap" is a synonym for this collation.
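
   A locale-independent implementation of the casemap collation can
   avoid OS-supplied case functions entirely by mapping octets
   directly.  The following Python sketch illustrates this; the
   function names are assumptions of this example, and Python's
   comparison of bytes objects stands in for the "i;octet" collation.

```python
def casemap_canonicalize(s: bytes) -> bytes:
    """Subtract 32 from every octet in the range 97..122 (a to z)."""
    return bytes(o - 32 if 97 <= o <= 122 else o for o in s)


def casemap_order(a: bytes, b: bytes) -> int:
    """Ordering: octet comparison of the canonicalized strings."""
    ca, cb = casemap_canonicalize(a), casemap_canonicalize(b)
    return (ca > cb) - (ca < cb)


def casemap_equal(a: bytes, b: bytes) -> bool:
    """Equality: the canonicalized strings are identical."""
    return casemap_canonicalize(a) == casemap_canonicalize(b)


def casemap_substring(needle: bytes, haystack: bytes) -> bool:
    """Substring: the canonicalized needle occurs in the haystack."""
    return casemap_canonicalize(needle) in casemap_canonicalize(haystack)
```

   Octets outside the range 97 to 122, including the octets of
   non-ASCII UTF-8 sequences, pass through unchanged.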

838	9.2.2  Legacy English Casemap Collation Registration

840	   <?xml version='1.0'?>
841	   <!DOCTYPE collation SYSTEM 'collationreg.dtd'>
842	   <collation rfc="XXXX" scope="local" intendedUse="deprecated">
843	     <name>i;ascii-casemap</name>
844	     <title>Legacy English Casemap</title>
845	     <functions>equality order substring</functions>
846	     <specification>RFC XXXX</specification>
847	     <owner>IETF</owner>
848	     <submitter>chris.newman@sun.com</submitter>
849	   </collation>

851	9.2.3  English Casemap Collation Registration

853	   <?xml version='1.0'?>
854	   <!DOCTYPE collation SYSTEM 'collationreg.dtd'>
855	   <collation rfc="XXXX" scope="local" intendedUse="common">
856	     <name>en;ascii-casemap</name>
857	     <title>English Casemap</title>
858	     <functions>equality order substring</functions>
859	     <specification>RFC XXXX</specification>
860	     <owner>IETF</owner>
861	     <submitter>chris.newman@sun.com</submitter>
862	   </collation>

864	9.3  Nameprep Collation

866	9.3.1  Nameprep Collation Description

868	   The "i;nameprep;v=1;uv=3.2" collation is an implementation of the
869	   nameprep [7] specification based on normalization tables from Unicode
870	   version 3.2.  This collation applies the nameprep canonicalization
871	   function to both input strings and then returns the result of the
872	   i;octet collation on the canonicalized strings.  While this collation
873	   offers all three functions, the ordering function it provides is
874	   inadequate for use by the majority of the world.

876	   Version number 1 is applied to nameprep as specified in RFC 3491.  If
877	   the nameprep specification is revised without any changes that would
878	   produce different results when given the same pair of input octet
879	   strings, then the version number will remain unchanged.

881	   The table numbers for tables used by nameprep are as follows:

883	                +--------------+-----------------------+
884	                | Table Number | Table Name            |
885	                +--------------+-----------------------+
886	                |            1 | UnicodeData-3.2.0.txt |
887	                |            2 | Table B.1             |
888	                |            3 | Table B.2             |
889	                |            4 | Table C.1.2           |
890	                |            5 | Table C.2.2           |
891	                |            6 | Table C.3             |
892	                |            7 | Table C.4             |
893	                |            8 | Table C.5             |
894	                |            9 | Table C.6             |
895	                |           10 | Table C.7             |
896	                |           11 | Table C.8             |
897	                |           12 | Table C.9             |
898	                +--------------+-----------------------+
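
   The equality and ordering behavior described in this section can
   be sketched with the nameprep function in Python's standard
   library (encodings.idna.nameprep), which implements RFC 3491 on
   Unicode 3.2 data.  This is an illustration under assumptions, not
   a reference implementation: the helper names are hypothetical,
   the library function treats its input as a query string
   (unassigned code points are allowed), and a failure to prepare
   either string is reported here as "error".

```python
from encodings.idna import nameprep  # RFC 3491 profile, Unicode 3.2


def nameprep_canonicalize(text: str):
    """Return the nameprepped string as UTF-8 octets, or None."""
    try:
        return nameprep(text).encode("utf-8")
    except UnicodeError:
        # Prohibited or otherwise invalid input has no canonical form.
        return None


def nameprep_equal(a: str, b: str) -> str:
    """Equality: octet comparison of the canonicalized strings."""
    ca, cb = nameprep_canonicalize(a), nameprep_canonicalize(b)
    if ca is None or cb is None:
        return "error"
    return "match" if ca == cb else "no-match"


def nameprep_order(a: str, b: str):
    """Ordering: -1, 0 or 1 on canonicalized octets, or "error"."""
    ca, cb = nameprep_canonicalize(a), nameprep_canonicalize(b)
    if ca is None or cb is None:
        return "error"
    return (ca > cb) - (ca < cb)
```

   Because ordering falls back to octet comparison of the prepared
   strings, it reflects code point order rather than any language's
   conventions, which is why it is inadequate for most users.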

900	9.3.2  Nameprep Collation Registration

902	   <?xml version='1.0'?>
903	   <!DOCTYPE collation SYSTEM 'collationreg.dtd'>
904	   <collation rfc="XXXX" scope="i18n" intendedUse="common">
905	     <name>i;nameprep;v=1;uv=3.2</name>
906	     <title>Nameprep</title>
907	     <functions>equality order substring</functions>
908	     <specification>RFC XXXX</specification>
909	     <owner>IETF</owner>
910	     <submitter>chris.newman@sun.com</submitter>
911	     <version>1</version>
912	     <UnicodeVersion>3.2</UnicodeVersion>
913	   </collation>

915	9.4  Basic Collation

917	9.4.1  Basic Collation Description

919	   The basic collation is intended to provide tolerable results for a
920	   number of languages for all three functions (equality, substring and
921	   ordering) so it is suitable as a mandatory-to-implement collation for
922	   protocols which include ordering support.  The ordering function of
923	   the basic collation is the Unicode Collation Algorithm [8] version 9
924	   (UCAv9).

926	   The equality and substring functions are created as described in
927	   UCAv9 section 8.  While that section is informative to UCAv9, it is
928	   normative to this collation specification.

930	   This collation is based on Unicode version 3.2, with the following
931	   tables relevant:
932	   1.  For the normalization step,
933	       <http://www.unicode.org/Public/3.2-Update/UnicodeData-3.2.0.txt>
934	       is used.  Column 5 is used to determine the canonical
935	       decomposition, while column 3 contains the canonical combining
936	       classes necessary to attain canonical order.
937	   2.  The table of characters which require a logical order exception
938	       is a subset of the table in
939	       <http://www.unicode.org/Public/3.2-Update/PropList-3.2.0.txt> and
940	       is included here:

942	   0E40..0E44    ; Logical_Order_Exception
943	   # Lo   [5] THAI CHARACTER SARA E..THAI CHARACTER SARA AI MAIMALAI
944	   0EC0..0EC4    ; Logical_Order_Exception
945	   # Lo   [5] LAO VOWEL SIGN E..LAO VOWEL SIGN AI

947	   # Total code points: 10

949	   3.  The table used to translate normalized code points to a sort key
950	       is <http://www.unicode.org/reports/tr10/allkeys-3.1.1.txt>.

952	   UCAv9 includes a number of configurable parameters and steps labelled
953	   as potentially optional.  The following list summarizes the defaults
954	   used by this collation:
955	   o  The logical order exception step is mandatory by default to
956	      support the largest number of languages.
957	   o  Steps 2.1.1 to 2.1.3 are mandatory as the repertoire of the basic
958	      collation is intended to be large.
959	   o  The second level in the sort key is evaluated forwards by default.
960	   o  The variable weighting uses the "non-ignorable" option by default.
961	   o  The semi-stable option is not used by default.
962	   o  Support for exactly three levels of collation is the default
963	      behavior.
964	   o  No preprocessing step is used by the basic collation prior to
965	      applying the UCAv9 algorithm.  Note that an application protocol
966	      specification MAY require pre-processing prior to the use of any
967	      collations.
968	   o  The equality and substring algorithms exclude differences at
969	      levels 2 and 3 by default (thus they are case-insensitive and
970	      ignore accentual distinctions).
971	   o  The equality and substring algorithms use the "Whole Characters
972	      Only" feature described in UCAv9 section 8 by default.

974	   The exact collation name with these defaults is
975	   "i;basic;uca=3.1.1;uv=3.2".  When a specification states that the
976	   basic collation is mandatory-to-implement, only this specific name is
977	   mandatory-to-implement.

979	   In order to allow modification of the optional behaviors, the
980	   following ABNF is used for variations of the basic collation:

982	     basic-collation  =  ("i" / Language-Tag) ";basic;uca=3.1.1;uv=3.2"
983	                         [";match=accent" / ";match=case"]
984	                         [";tailor=" 1*collation-char ]

986	   If multiple modifiers appear, they MUST appear in the order described
987	   above.  The modifiers have the following meanings:
988	   match=accent   Both the first and second levels of the sort keys are
989	                  considered relevant to the equality and substring
990	                  operations (rather than the default of first level
991	                  only).  This makes the matching functions sensitive to
992	                  accentual distinctions.
993	   match=case     The first three levels of sort keys are considered
994	                  relevant to the equality and substring operations.
995	                  This makes the matching functions sensitive to both
996	                  case and accentual distinctions.
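
   As an illustration, the ABNF above can be approximated with a
   regular expression.  The sketch below is hedged in several ways:
   Language-Tag is taken from RFC 3066 (1*8ALPHA followed by
   hyphen-separated groups of 1*8 alphanumerics), the character set
   used for collation-char (letters, digits, "-", ";", "=" and ".")
   is an assumption because its definition lies outside this section,
   and names are matched case-sensitively.  The function name and the
   returned dictionary are hypothetical.

```python
import re

# RFC 3066 Language-Tag: 1*8ALPHA *( "-" 1*8(ALPHA / DIGIT) )
LANGUAGE_TAG = r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*"
# ASSUMPTION: collation-char is not defined in this section.
COLLATION_CHAR = r"[A-Za-z0-9\-;=.]"

BASIC_COLLATION = re.compile(
    rf"(?P<lang>i|{LANGUAGE_TAG});basic;uca=3\.1\.1;uv=3\.2"
    rf"(?:;match=(?P<match>accent|case))?"
    rf"(?:;tailor=(?P<tailor>{COLLATION_CHAR}+))?"
)

# Number of sort key levels used for equality and substring matching.
MATCH_LEVELS = {None: 1, "accent": 2, "case": 3}


def parse_basic_collation(name: str):
    """Split a basic collation name into its parts, or return None."""
    m = BASIC_COLLATION.fullmatch(name)
    if m is None:
        return None
    return {
        "language": m.group("lang"),
        "match_levels": MATCH_LEVELS[m.group("match")],
        "tailor": m.group("tailor"),
    }
```

   The "match_levels" value corresponds to the UCAMatchLevel element
   of the registrations that follow: 1 for the default, 2 for
   match=accent and 3 for match=case.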

998	   The default weighting option is "non-ignorable".  The "semi-stable"
999	   sort key option is not used by default.

1001	   The canonicalization algorithm associated with this collation is the
1002	   output of step 3 of the UCAv9 algorithm (described in section 4.3 of
1003	   the UCA specification).  This canonicalization is not suitable for
1004	   human consumption.

1006	   Finally, the UCAv9 algorithm permits the "allkeys" table to be
1007	   tailored to a language.  People who make quality tailorings are
1008	   encouraged to register those tailorings using the collation registry.
1009	   Tailoring names beginning with "x" are reserved for experimental use,
1010	   are treated as "Limited use" and MUST NOT match wildcards if any
1011	   registered collation is available that does match.

1013	9.4.2  Basic Collation Registration

1015	   <?xml version='1.0'?>
1016	   <!DOCTYPE collation SYSTEM 'collationreg.dtd'>
1017	   <collation rfc="XXXX" scope="i18n" intendedUse="common">
1018	     <name>i;basic;uca=3.1.1;uv=3.2</name>
1019	     <title>Basic</title>
1020	     <functions>equality order substring</functions>
1021	     <specification>RFC XXXX</specification>
1022	     <owner>IETF</owner>
1023	     <submitter>chris.newman@sun.com</submitter>
1024	     <UnicodeVersion>3.2</UnicodeVersion>
1025	     <UCAVersion>3.1.1</UCAVersion>
1026	     <UCAMatchLevel>1</UCAMatchLevel>
1027	   </collation>

1029	9.4.3  Basic Accent Sensitive Match Collation Registration

1031	   <?xml version='1.0'?>
1032	   <!DOCTYPE collation SYSTEM 'collationreg.dtd'>
1033	   <collation rfc="XXXX" scope="i18n" intendedUse="common">
1034	     <name>i;basic;uca=3.1.1;uv=3.2;match=accent</name>
1035	     <title>Basic Accent Sensitive Match</title>
1036	     <functions>equality order substring</functions>
1037	     <specification>RFC XXXX</specification>
1038	     <owner>IETF</owner>
1039	     <submitter>chris.newman@sun.com</submitter>
1040	     <UnicodeVersion>3.2</UnicodeVersion>
1041	     <UCAVersion>3.1.1</UCAVersion>
1042	     <UCAMatchLevel>2</UCAMatchLevel>
1043	   </collation>

1045	9.4.4  Basic Case Sensitive Match Collation Registration

1047	   <?xml version='1.0'?>
1048	   <!DOCTYPE collation SYSTEM 'collationreg.dtd'>
1049	   <collation rfc="XXXX" scope="i18n" intendedUse="common">
1050	     <name>i;basic;uca=3.1.1;uv=3.2;match=case</name>
1051	     <title>Basic Case Sensitive Match</title>
1052	     <functions>equality order substring</functions>
1053	     <specification>RFC XXXX</specification>
1054	     <owner>IETF</owner>
1055	     <submitter>chris.newman@sun.com</submitter>
1056	     <UnicodeVersion>3.2</UnicodeVersion>
1057	     <UCAVersion>3.1.1</UCAVersion>
1058	     <UCAMatchLevel>3</UCAMatchLevel>
1060	   </collation>

1062	9.5  Octet Collation

1064	9.5.1  Octet Collation Description

1066	   The "i;octet" collation is a simple and fast collation intended for
1067	   use on binary octet strings rather than on character data.  It is the
1068	   only such collation; it is not possible to register additional
1069	   collations with this property.  Protocols that want to make this
1070	   collation available have to do so by explicitly allowing it.  If not
1071	   explicitly allowed, it MUST NOT be used.  It never returns an "error"
1072	   result.  It provides equality, substring and ordering functions.

1074	   The ordering algorithm is as follows:
1075	   1.  If both strings are the empty string, return the result "0".
1076	   2.  If the first string is empty and the second is not, return the
1077	       result "-1".
1078	   3.  If the second string is empty and the first is not, return the
1079	       result "+1".
1080	   4.  If both strings begin with the same octet value, remove the first
1081	       octet from both strings and repeat this algorithm from step 1.
1082	   5.  If the unsigned value (0 to 255) of the first octet of the first
1083	       string is less than the unsigned value of the first octet of the
1084	       second string, then return "-1".
1085	   6.  If this step is reached, return "+1".

1087	   This algorithm is roughly equivalent to the C library function memcmp
1088	   with appropriate length checks added.

1090	   The matching function returns "match" if the sorting algorithm would
1091	   return "0".  Otherwise the matching function returns "no-match".

1093	   The substring function returns "match" if the first string is the
1094	   empty string, or if there exists a substring of the second string of
1095	   length equal to the length of the first string which would result in
1096	   a "match" result from the equality function.  Otherwise the substring
1097	   function returns "no-match".

1099	   The associated canonicalization algorithm is the identity function.
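
   The three functions above can be sketched in Python as follows.
   The ordering function walks the numbered steps with an index
   instead of removing octets, which gives the same result.  The
   function names and the use of Python integers for "-1", "0" and
   "+1" are assumptions of this example.

```python
def octet_order(first: bytes, second: bytes) -> int:
    """Ordering function: returns -1, 0 or 1."""
    i = 0
    while True:
        first_empty = i >= len(first)
        second_empty = i >= len(second)
        if first_empty and second_empty:   # step 1
            return 0
        if first_empty:                    # step 2
            return -1
        if second_empty:                   # step 3
            return 1
        if first[i] == second[i]:          # step 4: skip equal octets
            i += 1
            continue
        # Steps 5 and 6: indexing bytes yields unsigned values 0..255.
        return -1 if first[i] < second[i] else 1


def octet_equal(first: bytes, second: bytes) -> str:
    """Matching function, defined in terms of the ordering function."""
    return "match" if octet_order(first, second) == 0 else "no-match"


def octet_substring(first: bytes, second: bytes) -> str:
    """Substring function: does first occur anywhere in second?"""
    if not first:
        return "match"
    for start in range(len(second) - len(first) + 1):
        window = second[start:start + len(first)]
        if octet_equal(first, window) == "match":
            return "match"
    return "no-match"
```

   None of these functions can produce an "error" result, since every
   octet string is valid input.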

1101	9.5.2  Octet Collation Registration

1103	   This collation is defined with intendedUse="limited" because it can
1104	   only be used by protocols that explicitly allow it.

1106	   <?xml version='1.0'?>
1107	   <!DOCTYPE collation SYSTEM 'collationreg.dtd'>
1108	   <collation rfc="XXXX" scope="i18n" intendedUse="limited">
1109	     <name>i;octet</name>
1110	     <title>Octet</title>
1111	     <functions>equality order substring</functions>
1112	     <specification>RFC XXXX</specification>
1113	     <owner>IETF</owner>
1114	     <submitter>chris.newman@sun.com</submitter>
1115	   </collation>

1117	10.  IANA Considerations

1119	   Section 7 defines how to register collations with IANA.  Section 9
1120	   defines a list of predefined collations, which should be registered
1121	   when this document is approved and published as an RFC.

1123	11.  Security Considerations

1125	   Collations will normally be used with UTF-8 strings.  Thus the
1126	   security considerations for UTF-8 [3] and stringprep [6] also apply
1127	   and are normative to this specification.

1129	12.  Open Issues

1131	   See http://www.w3.org/2004/08/ietf-collation.

1133	13.  Change Log

1135	13.1  Changes From -03

1137	   (This does not include all changes made.)
1138	   1.  Checked and resolved most issues marked 'check whether this is
1139	       true' or similar.
1140	   2.  Resolved nameprep issue: No.
1141	   3.  Removed NULL for compatibility with existing collations (IMAP
1142	       SORT, Sieve).
1143	   4.  There can be multiple owners and submitters.  Say how.
1144	   5.  Added a requirement that common collations must now be
1145	       interoperable.  Insufficiently detailed specs cannot be "common".
1147	   6.  Added a guideline that the operations provided by new collations
1148	       should be reminiscent of similar operations on existing
1149	       collations.

1151	13.2  Changes From -02

1153	   1.  Changed from data being octet sequences (in UTF-8) to data being
1154	       character sequences (with octet collation as an exception).
1155	   2.  Made XML format description much more structured.
1156	   3.  Changed <submittor> to <submitter>, because this spelling is much
1157	       more common.
1158	   4.  Defined 'protocol' to include query languages.
1159	   5.  Reorganized document, in particular IANA considerations section
1160	       (which newly is just a list of pointers).
1161	   6.  Added subsections, and a 'Structure of this Document' section.
1162	   7.  Updated references.
1163	   8.  Created a 'Change Log' chapter, with sections for each draft.
1164	   9.  Reduced 'Open issues' section; open issues are now maintained at
1165	       http://www.w3.org/2004/08/ietf-collation.

1167	13.3  Changes From -01

1169	   Add IANA comment to open issues.  Otherwise this is just a re-publish
1170	   to keep the document alive.

1172	13.4  Changes From -00

1174	   1.  Replaced the term comparator with collation.  While comparator is
1175	       somewhat more precise because these abstract functions are used
1176	       for matching as well as ordering, collation is the term used by
1177	       other parts of the industry.  Thus I have changed the name to
1178	       collation for consistency.
1179	   2.  Remove all modifiers to the basic collation except for the
1180	       customization and the match rules.  The other behavior
1181	       modifications can be specified in a customization of the
1182	       collation.
1183	   3.  Use ";" instead of "-" as delimiter between parameters to make
1184	       names more URL-ish.
1185	   4.  Added URL form for comparator reference.
1186	   5.  Switched registration template to use XML document.
1187	   6.  Added a number of useful registration template elements related
1188	       to the Unicode Collation Algorithm.
1189	   7.  Switched language from "custom" to "tailor" to match UCA language
1190	       for tailoring of the collation algorithm.

1192	14.  References
1193	14.1  Normative References

1195	   [1]  Bradner, S., "Key words for use in RFCs to Indicate Requirement
1196	        Levels", BCP 14, RFC 2119, March 1997.

1198	   [2]  Crocker, D. and P. Overell, "Augmented BNF for Syntax
1199	        Specifications: ABNF", RFC 2234, November 1997.

1201	   [3]  Yergeau, F., "UTF-8, a transformation format of ISO 10646",
1202	        STD 63, RFC 3629, November 2003.

1204	   [4]  Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
1205	        Resource Identifier (URI): Generic Syntax",
1206	        draft-fielding-uri-rfc2396bis-07.txt (work in progress),
1207	        April 2004.

1209	   [5]  Alvestrand, H., "Tags for the Identification of Languages",
1210	        BCP 47, RFC 3066, January 2001.

1212	   [6]  Hoffman, P. and M. Blanchet, "Preparation of Internationalized
1213	        Strings ("stringprep")", RFC 3454, December 2002.

1215	   [7]  Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep Profile for
1216	        Internationalized Domain Names (IDN)", RFC 3491, March 2003.

1218	   [8]  Davis, M. and K. Whistler, "Unicode Collation Algorithm version
1219	        9", July 2002,
1220	        <http://www.unicode.org/reports/tr10/tr10-9.html>.

1222	14.2  Informative References

1224	   [9]   Freed, N. and N. Borenstein, "Multipurpose Internet Mail
1225	         Extensions (MIME) Part One: Format of Internet Message Bodies",
1226	         RFC 2045, November 1996.

1228	   [10]  Myers, J., "Simple Authentication and Security Layer (SASL)",
1229	         RFC 2222, October 1997.

1231	   [11]  Newman, C. and J. Myers, "ACAP -- Application Configuration
1232	         Access Protocol", RFC 2244, November 1997.

1234	   [12]  Narten, T. and H. Alvestrand, "Guidelines for Writing an IANA
1235	         Considerations Section in RFCs", BCP 26, RFC 2434,
1236	         October 1998.

1238	   [13]  Resnick, P., "Internet Message Format", RFC 2822, April 2001.

1240	   [14]  Freed, N. and J. Postel, "IANA Charset Registration
1241	         Procedures", BCP 19, RFC 2978, October 2000.

1243	   [15]  Showalter, T., "Sieve: A Mail Filtering Language", RFC 3028,
1244	         January 2001.

1246	Authors' Addresses

1248	   Chris Newman
1249	   Sun Microsystems
1250	   1050 Lakes Drive
1251	   West Covina, CA  91790
1252	   US

1254	   Email: chris.newman@sun.com

1256	   Martin Duerst (Note: Please write "Duerst" with u-umlaut wherever possible, for example as "D&#252;rst" in XML and HTML.)
1257	   Aoyama Gakuin University
1258	   5-10-1 Fuchinobe
1259	   Sagamihara, Kanagawa  229-8558
1260	   Japan

1262	   Phone: +81 466 49 1170
1263	   Fax:   +81 466 49 1171
1264	   Email: duerst@it.aoyama.ac.jp
1265	   URI:   http://www.sw.it.aoyama.ac.jp/D%C3%BCrst/

1267	   Arnt Gulbrandsen
1268	   Oryx Mail Systems GmbH
1269	   Schweppermannstr. 8
1270	   Munich  81671
1271	   Germany

1273	   Phone: +49 89 4502 9757
1274	   Fax:   +49 89 4502 9758
1275	   Email: arnt@oryx.com
1276	   URI:   http://www.oryx.com/arnt/

1278	Intellectual Property Statement

1280	   The IETF takes no position regarding the validity or scope of any
1281	   Intellectual Property Rights or other rights that might be claimed to
1282	   pertain to the implementation or use of the technology described in
1283	   this document or the extent to which any license under such rights
1284	   might or might not be available; nor does it represent that it has
1285	   made any independent effort to identify any such rights.  Information
1286	   on the procedures with respect to rights in RFC documents can be
1287	   found in BCP 78 and BCP 79.

1289	   Copies of IPR disclosures made to the IETF Secretariat and any
1290	   assurances of licenses to be made available, or the result of an
1291	   attempt made to obtain a general license or permission for the use of
1292	   such proprietary rights by implementers or users of this
1293	   specification can be obtained from the IETF on-line IPR repository at
1294	   http://www.ietf.org/ipr.

1296	   The IETF invites any interested party to bring to its attention any
1297	   copyrights, patents or patent applications, or other proprietary
1298	   rights that may cover technology that may be required to implement
1299	   this standard.  Please address the information to the IETF at
1300	   ietf-ipr@ietf.org.

1302	Disclaimer of Validity

1304	   This document and the information contained herein are provided on an
1305	   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
1306	   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
1307	   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
1308	   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
1309	   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
1310	   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

1312	Copyright Statement

1314	   Copyright (C) The Internet Society (2005).  This document is subject
1315	   to the rights, licenses and restrictions contained in BCP 78, and
1316	   except as set forth therein, the authors retain all their rights.

1318	Acknowledgment

1320	   Funding for the RFC Editor function is currently provided by the
1321	   Internet Society.