idnits 2.17.1 draft-newman-i18n-comparator-14.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** It looks like you're using RFC 3978 boilerplate. You should update this to the boilerplate described in the IETF Trust License Policy document (see https://trustee.ietf.org/license-info), which is required now. -- Found old boilerplate from RFC 3978, Section 5.1 on line 17. -- Found old boilerplate from RFC 3978, Section 5.5 on line 1379. -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 1356. -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 1363. -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 1369. ** This document has an original RFC 3978 Section 5.4 Copyright Line, instead of the newer IETF Trust Copyright according to RFC 4748. ** This document has an original RFC 3978 Section 5.5 Disclaimer, instead of the newer disclaimer which includes the IETF Trust according to RFC 4748. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- == No 'Intended status' indicated for this document; assuming Proposed Standard Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** There is 1 instance of too long lines in the document, the longest one being 52 characters in excess of 72. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not match the current year == The document seems to lack the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords -- however, there's a paragraph with a matching beginning. Boilerplate error? 
(The document does seem to have the reference to RFC 2119 which the ID-Checklist requires). -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (September 13, 2006) is 6397 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'RFC2244' is mentioned on line 863, but not defined == Missing Reference: 'SIEVE' is mentioned on line 863, but not defined -- Possible downref: Non-RFC (?) normative reference: ref. 'RFC XXXX' ** Obsolete normative reference: RFC 4234 (ref. '2') (Obsoleted by RFC 5234) ** Obsolete normative reference: RFC 4646 (ref. '5') (Obsoleted by RFC 5646) ** Obsolete normative reference: RFC 3454 (ref. '6') (Obsoleted by RFC 7564) -- Possible downref: Non-RFC (?) normative reference: ref. '7' -- Possible downref: Non-RFC (?) normative reference: ref. '8' -- Obsolete informational reference (is this intentional?): RFC 2222 (ref. '10') (Obsoleted by RFC 4422, RFC 4752) -- Obsolete informational reference (is this intentional?): RFC 2822 (ref. '12') (Obsoleted by RFC 5322) -- Obsolete informational reference (is this intentional?): RFC 3028 (ref. '14') (Obsoleted by RFC 5228, RFC 5429) -- Obsolete informational reference (is this intentional?): RFC 3501 (ref. 
'15') (Obsoleted by RFC 9051) == Outdated reference: A later version (-20) exists of draft-ietf-imapext-sort-17 == Outdated reference: A later version (-15) exists of draft-ietf-imapext-i18n-06 Summary: 7 errors (**), 0 flaws (~~), 7 warnings (==), 14 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 1 Network Working Group C. Newman 2 Internet-Draft Sun Microsystems 3 Expires: March 17, 2007 M. Duerst 4 AGU 5 A. Gulbrandsen 6 Oryx 7 September 13, 2006 9 Internet Application Protocol Collation Registry 10 draft-newman-i18n-comparator-14.txt 12 Status of this Memo 14 By submitting this Internet-Draft, each author represents that any 15 applicable patent or other IPR claims of which he or she is aware 16 have been or will be disclosed, and any of which he or she becomes 17 aware will be disclosed, in accordance with Section 6 of BCP 79. 19 Internet-Drafts are working documents of the Internet Engineering 20 Task Force (IETF), its areas, and its working groups. Note that 21 other groups may also distribute working documents as Internet- 22 Drafts. 24 Internet-Drafts are draft documents valid for a maximum of six months 25 and may be updated, replaced, or obsoleted by other documents at any 26 time. It is inappropriate to use Internet-Drafts as reference 27 material or to cite them other than as "work in progress." 29 The list of current Internet-Drafts can be accessed at 30 http://www.ietf.org/ietf/1id-abstracts.txt. 32 The list of Internet-Draft Shadow Directories can be accessed at 33 http://www.ietf.org/shadow.html. 35 This Internet-Draft will expire on March 17, 2007. 37 Copyright Notice 39 Copyright (C) The Internet Society (2006). 41 Abstract 43 Many Internet application protocols include string-based lookup, 44 searching, or sorting operations. 
However the problem space for 45 searching and sorting international strings is large, not fully 46 explored, and is outside the area of expertise for the Internet 47 Engineering Task Force (IETF). Rather than attempt to solve such a 48 large problem, this specification creates an abstraction framework so 49 that application protocols can precisely identify a comparison 50 function and the repertoire of comparison functions can be extended 51 in the future. 53 Table of Contents 55 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 56 1.1. Conventions Used in this Document . . . . . . . . . . . . 4 57 2. Collation Definition and Purpose . . . . . . . . . . . . . . . 4 58 2.1. Definition . . . . . . . . . . . . . . . . . . . . . . . 4 59 2.2. Purpose . . . . . . . . . . . . . . . . . . . . . . . . . 4 60 2.3. Some Other Terms Used in this Document . . . . . . . . . 5 61 2.4. Sort Keys . . . . . . . . . . . . . . . . . . . . . . . . 5 62 3. Collation Identifier Syntax . . . . . . . . . . . . . . . . . 6 63 3.1. Basic Syntax . . . . . . . . . . . . . . . . . . . . . . 6 64 3.2. Wildcards . . . . . . . . . . . . . . . . . . . . . . . . 6 65 3.3. Ordering Direction . . . . . . . . . . . . . . . . . . . 7 66 3.4. URIs . . . . . . . . . . . . . . . . . . . . . . . . . . 7 67 3.5. Naming Guidelines . . . . . . . . . . . . . . . . . . . . 7 68 4. Collation Specification Requirements . . . . . . . . . . . . . 8 69 4.1. Collation/Server Interface . . . . . . . . . . . . . . . 8 70 4.2. Operations Supported . . . . . . . . . . . . . . . . . . 8 71 4.2.1. Validity . . . . . . . . . . . . . . . . . . . . . . . 9 72 4.2.2. Equality . . . . . . . . . . . . . . . . . . . . . . . 9 73 4.2.3. Substring . . . . . . . . . . . . . . . . . . . . . . 9 74 4.2.4. Ordering . . . . . . . . . . . . . . . . . . . . . . . 10 75 4.3. Sort Keys . . . . . . . . . . . . . . . . . . . . . . . . 10 76 4.4. Use of Lookup Tables . . . . . . . . . . . . . . . . . . 11 77 5. 
Application Protocol Requirements . . . . . . . . . . . . . . 11 78 5.1. Character Encoding . . . . . . . . . . . . . . . . . . . 11 79 5.2. Operations . . . . . . . . . . . . . . . . . . . . . . . 11 80 5.3. Wildcards . . . . . . . . . . . . . . . . . . . . . . . . 12 81 5.4. String Comparison . . . . . . . . . . . . . . . . . . . . 12 82 5.5. Disconnected Clients . . . . . . . . . . . . . . . . . . 12 83 5.6. Error Codes . . . . . . . . . . . . . . . . . . . . . . . 13 84 5.7. Octet Collation . . . . . . . . . . . . . . . . . . . . . 13 85 6. Use by Existing Protocols . . . . . . . . . . . . . . . . . . 13 86 7. Collation Registration . . . . . . . . . . . . . . . . . . . . 14 87 7.1. Collation Registration Procedure . . . . . . . . . . . . 14 88 7.2. Collation Registration Format . . . . . . . . . . . . . . 14 89 7.2.1. Registration Template . . . . . . . . . . . . . . . . 15 90 7.2.2. The collation Element . . . . . . . . . . . . . . . . 15 91 7.2.3. The identifier Element . . . . . . . . . . . . . . . . 16 92 7.2.4. The title Element . . . . . . . . . . . . . . . . . . 16 93 7.2.5. The operations Element . . . . . . . . . . . . . . . . 16 94 7.2.6. The specification Element . . . . . . . . . . . . . . 16 95 7.2.7. The submitter Element . . . . . . . . . . . . . . . . 16 96 7.2.8. The owner Element . . . . . . . . . . . . . . . . . . 16 97 7.2.9. The version Element . . . . . . . . . . . . . . . . . 16 98 7.2.10. The variable Element . . . . . . . . . . . . . . . . . 17 99 7.3. Structure of Collation Registry . . . . . . . . . . . . . 17 100 7.4. Example Initial Registry Summary . . . . . . . . . . . . 18 101 8. Guidelines for Expert Reviewer . . . . . . . . . . . . . . . . 18 102 9. Initial Collations . . . . . . . . . . . . . . . . . . . . . . 19 103 9.1. ASCII Numeric Collation . . . . . . . . . . . . . . . . . 19 104 9.1.1. ASCII Numeric Collation Description . . . . . . . . . 19 105 9.1.2. ASCII Numeric Collation Registration . . . . . . . . . 20 106 9.2. 
ASCII Casemap Collation . . . . . . . . . . . . . . . . . 20 107 9.2.1. ASCII Casemap Collation Description . . . . . . . . . 20 108 9.2.2. ASCII Casemap Collation Registration . . . . . . . . . 21 109 9.3. Octet Collation . . . . . . . . . . . . . . . . . . . . . 21 110 9.3.1. Octet Collation Description . . . . . . . . . . . . . 21 111 9.3.2. Octet Collation Registration . . . . . . . . . . . . . 22 112 10. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 22 113 11. Security Considerations . . . . . . . . . . . . . . . . . . . 22 114 12. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 22 115 13. Open Issues . . . . . . . . . . . . . . . . . . . . . . . . . 23 116 14. Change Log . . . . . . . . . . . . . . . . . . . . . . . . . . 23 117 14.1. Changes From -13 . . . . . . . . . . . . . . . . . . . . 23 118 14.2. Changes From -12 . . . . . . . . . . . . . . . . . . . . 23 119 14.3. Changes From -11 . . . . . . . . . . . . . . . . . . . . 23 120 14.4. Changes From -10 . . . . . . . . . . . . . . . . . . . . 24 121 14.5. Changes From -09 . . . . . . . . . . . . . . . . . . . . 24 122 14.6. Changes From -08 . . . . . . . . . . . . . . . . . . . . 25 123 14.7. Changes From -06 . . . . . . . . . . . . . . . . . . . . 25 124 14.8. Changes From -05 . . . . . . . . . . . . . . . . . . . . 26 125 14.9. Changes From -04 . . . . . . . . . . . . . . . . . . . . 26 126 14.10. Changes From -03 . . . . . . . . . . . . . . . . . . . . 26 127 14.11. Changes From -02 . . . . . . . . . . . . . . . . . . . . 26 128 14.12. Changes From -01 . . . . . . . . . . . . . . . . . . . . 27 129 14.13. Changes From -00 . . . . . . . . . . . . . . . . . . . . 27 130 15. References . . . . . . . . . . . . . . . . . . . . . . . . . . 27 131 15.1. Normative References . . . . . . . . . . . . . . . . . . 27 132 15.2. Informative References . . . . . . . . . . . . . . . . . 28 133 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 
30 134 Intellectual Property and Copyright Statements . . . . . . . . . . 31 136 1. Introduction 138 The ACAP [11] specification introduced the concept of a comparator 139 (which we call collation in this document), but failed to create an 140 IANA registry. With the introduction of stringprep [6] and the 141 Unicode Collation Algorithm [7], it is now time to create that 142 registry and populate it with some initial values appropriate for an 143 international community. This specification replaces and generalizes 144 the definition of a comparator in ACAP and creates a collation 145 registry. 147 1.1. Conventions Used in this Document 149 The key words "MUST", "MUST NOT", "SHOULD", "SHOULD NOT", and "MAY" 150 in this document are to be interpreted as defined in "Key words for 151 use in RFCs to Indicate Requirement Levels" [1]. 153 The attribute syntax specifications use the Augmented Backus-Naur 154 Form (ABNF) [2] notation including the core rules defined in Appendix 155 A. This also inherits ABNF rules from Language Tags [5]. 157 2. Collation Definition and Purpose 159 2.1. Definition 161 A collation is a named function which takes two arbitrary length 162 strings as input and can be used to perform one or more of three 163 basic comparison operations: equality test, substring match, and 164 ordering test. 166 2.2. Purpose 168 Collations are an abstraction for comparison functions so that these 169 comparison functions can be used in multiple protocols. The details 170 of a particular comparison operation can be specified by someone with 171 appropriate expertise independent of the application protocols that 172 use that collation. This is similar to the way a charset [13] 173 separates the details of octet to character mapping from a protocol 174 specification such as MIME [9] or the way SASL [10] separates the 175 details of an authentication mechanism from a protocol specification 176 such as ACAP [11]. 
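As an illustration of the definition and purpose above, here is a minimal Python sketch of a collation as a named bundle of up to three operations, resolved by identifier through a registry. It is not part of this specification. The `Collation` class, the `REGISTRY` dictionary, the `lookup` helper, and the simplified octet-wise behavior shown for "i;octet" are all assumptions made for the example.

```python
from typing import Callable, Dict, Optional

# Each operation takes two octet strings and returns a result label.
Operation = Callable[[bytes, bytes], str]


class Collation:
    """A named bundle of up to three comparison operations (hypothetical)."""

    def __init__(self, identifier: str,
                 equality: Optional[Operation] = None,
                 substring: Optional[Operation] = None,
                 ordering: Optional[Operation] = None) -> None:
        self.identifier = identifier
        self.equality = equality
        self.substring = substring
        self.ordering = ordering


def octet_ordering(a: bytes, b: bytes) -> str:
    # Plain octet-by-octet comparison (an assumed, simplified behavior).
    if a == b:
        return "equal"
    return "less" if a < b else "greater"


# Hypothetical in-memory stand-in for the collation registry.
REGISTRY: Dict[str, Collation] = {
    "i;octet": Collation(
        "i;octet",
        equality=lambda a, b: "match" if a == b else "no-match",
        substring=lambda a, b: "match" if a in b else "no-match",
        ordering=octet_ordering,
    ),
}


def lookup(identifier: str) -> Optional[Collation]:
    # A protocol resolves an identifier without knowing collation details.
    return REGISTRY.get(identifier)
```

A protocol implementation would call something like `lookup("i;octet")` and then invoke whichever operations the collation supplies; an operation set to `None` models one that the collation does not support.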
178 Here is a small diagram to help illustrate the value of this 179 abstraction:
181 +-------------------+                         +-----------------+
182 | IMAP i18n SEARCH  |--+                      | Basic           |
183 +-------------------+  |                   +--| Collation Spec  |
184                        |                   |  +-----------------+
185 +-------------------+  |  +-------------+  |  +-----------------+
186 | ACAP i18n SEARCH  |--+--| Collation   |--+--| A stringprep    |
187 +-------------------+  |  | Registry    |  |  | Collation Spec  |
188                        |  +-------------+  |  +-----------------+
189 +-------------------+  |                   |  +-----------------+
190 | ...other protocol |--+                   |  | locale-specific |
191 +-------------------+                      +--| Collation Spec  |
192                                               +-----------------+
194 Thus IMAP, ACAP and future application protocols with international 195 search capability simply specify how to interface to the collation 196 registry instead of each protocol specification having to specify all 197 the collations it supports. 199 2.3. Some Other Terms Used in this Document 201 The terms client, server and protocol are used in somewhat unusual 202 senses. 204 Client means a user, or a program acting directly on behalf of a 205 user. This may be a mail reader acting as an IMAP client, or it may 206 be an interactive shell where the user can type protocol commands/ 207 requests directly, or it may be a script or program written by the 208 user. 210 Server means a program that performs services requested by the 211 client. This may be a traditional server such as an HTTP server, or 212 it may be a Sieve [14] interpreter running a Sieve script written by 213 a user. A server needs to use the operations provided by collations 214 in order to fulfill the client's requests. 216 The protocol describes how the client tells the server what it wants 217 done, and (if applicable) how the server tells the client about the 218 results. IMAP is a protocol by this definition, and so is the Sieve 219 language. 221 2.4. 
Sort Keys 223 One component of a collation is a transformation which turns a string 224 into a sort key, which is then used while sorting. 226 The transformation can range from an identity mapping (e.g., the 227 i;octet collation Section 9.3) to a mapping which makes the string 228 unreadable to a human. 230 This is an implementation detail of collations or servers. A 231 protocol SHOULD NOT expose it to clients, since some collations leave 232 the sort key's format up to the implementation, and current 233 conformant implementations are known to use different formats. 235 3. Collation Identifier Syntax 237 3.1. Basic Syntax 239 The collation identifier itself is a single US-ASCII string. The 240 identifier MUST NOT be longer than 254 characters, and obeys the 241 following grammar: 243 collation-char = ALPHA / DIGIT / "-" / ";" / "=" / "." 245 collation-id = collation-prefix ";" collation-core-name 246 *collation-arg 248 collation-scope = Language-tag / "vnd-" hostname 250 collation-core-name = ALPHA *( ALPHA / DIGIT / "-" ) 252 collation-arg = ";" ALPHA *( ALPHA / DIGIT ) "=" 253 1*( ALPHA / DIGIT / "." ) 255 vendor-tag = "vnd-" hostname 257 There is a special identifier called "default". For protocols which 258 have a default collation, "default" refers to that collation. For 259 other protocols, the identifier "default" MUST match no collations, 260 and servers SHOULD treat it in the same way as they treat nonexistent 261 collations. 263 3.2. Wildcards 265 The string a client uses to select a collation MAY contain one or 266 more wildcard ("*") characters which match zero or more collation- 267 chars. Wildcard characters MUST NOT be adjacent. If the wildcard 268 string matches multiple collations, the server SHOULD attempt to 269 select a widely useful collation in preference to a narrowly useful 270 one. 272 collation-wild = ("*" / (ALPHA ["*"])) *(collation-char ["*"]) 273 ; MUST NOT exceed 254 characters total 275 3.3. 
Ordering Direction 277 When used as a protocol element for ordering, the collation 278 identifier MAY be prefixed by either "+" or "-" to explicitly specify 279 an ordering direction. "+" has no effect on the ordering operation, 280 while "-" inverts the result of the ordering operation. In general, 281 collation-order is used when a client requests a collation, and 282 collation-selected is used when the server informs the client of the 283 selected collation. 285 collation-selected = ["+" / "-"] collation-id 287 collation-order = ["+" / "-"] collation-wild 289 3.4. URIs 291 Some protocols are designed to use URIs [4] to refer to collations 292 rather than simple tokens. A special section of the IANA URL space 293 is reserved for such usage. The "collation-uri" form is used to 294 refer to a specific named collation (the collation registration may 295 not actually be present). The "collation-auri" form is an abstract 296 name for an ordering, a collation pattern or a vendor private 297 collator. 299 collation-uri = "http://www.iana.org/assignments/collation/" 300 collation-id ".xml" 302 collation-auri = ( "http://www.iana.org/assignments/collation/" 303 collation-order ".xml" ) / other-uri 305 other-uri = 306 ; excluding the IANA collation namespace. 308 3.5. Naming Guidelines 310 While this specification makes no absolute requirements on the 311 structure of collation identifiers, naming consistency is important, 312 so the following initial guidelines are provided. 314 Collation identifiers with an international audience typically begin 315 with "i;". Collation identifiers intended for a particular language 316 or locale typically begin with a language tag [5] followed by a ";". 317 After the first ";" is normally the name of the general collation 318 algorithm, followed by a series of algorithm modifications separated 319 by the ";" delimiter. Parameterized modifications will use "=" to 320 delimit the parameter from the value. 
The version numbers of any 321 lookup tables used by the algorithm SHOULD be present as 322 parameterized modifications. 324 Collation identifiers of the form *;vnd-domain.com;* are reserved for 325 vendor-specific collations created by the owner of the domain name 326 following the "vnd-" prefix (e.g. vnd-example.com for the vendor 327 example.com). Registration of such collations (or the name space as 328 a whole) with intended use of "Vendor" is encouraged when a public 329 specification or open-source implementation is available, but is not 330 required. 332 4. Collation Specification Requirements 334 4.1. Collation/Server Interface 336 The collation itself defines what it operates on. Most collations 337 are expected to operate on character strings. The i;octet 338 (Section 9.3) collation operates on octet strings. The i;ascii- 339 numeric (Section 9.1) collation operates on numbers. 341 This specification defines the collation interface in terms of octet 342 strings. However, implementations may choose to use character 343 strings instead. Such implementations may not be able to implement 344 e.g. i;octet. Since i;octet is not currently mandatory to implement 345 for any protocol, this should not be a problem. 347 4.2. Operations Supported 349 A collation specification MUST state which of the three basic 350 operations are supported (equality, substring, ordering) and how to 351 perform each of the supported operations on any two input character 352 strings including empty strings. Collations must be deterministic, 353 i.e. given a collation with a specific identifier, and any two fixed 354 input strings, the result MUST be the same for the same operation. 356 In general, collation operations should behave as their names 357 suggest. While a collation may be new, the operations are not, so 358 the new collation's operations should be similar to those of older 359 collations. 
For example, a date/time collation should not provide a 360 "substring" operation that would morph IMAP substring SEARCH into 361 e.g. a date-range search. 363 A non-obvious consequence of the rules for each collation operation 364 is that for any single collation, either none or all of the 365 operations can return "undefined". For example, it is not possible 366 to have an equality operation that never returns "undefined" and a 367 substring operation that occasionally does. 369 4.2.1. Validity 371 The validity test takes one string as its argument. It returns valid if 372 its input string is valid input to the collation's other operations, and 373 invalid if not. (In other words, a string is valid if it is equal to 374 itself according to the collation's equality operation.) 376 The validity test is provided by all collations. It MUST NOT be 377 listed separately in the collation registration. 379 4.2.2. Equality 381 The equality test always returns "match" or "no-match" when supplied 382 valid input, and MAY return "undefined" if one or both input strings 383 are not valid. 385 The equality test MUST be reflexive and symmetric. For valid input, 386 it MUST be transitive. 388 If a collation provides either a substring or an ordering test, it 389 MUST also provide an equality test. The substring and/or ordering 390 tests MUST be consistent with the equality test. 392 The return values of the equality test are called "match", "no-match" 393 and "undefined" in this document. 395 4.2.3. Substring 397 The substring matching operation determines if the first string is a 398 substring of the second string, i.e. if one or more substrings of the 399 second string are equal to the first, as defined by the collation's 400 equality operation. 402 A collation which supports substring matching will automatically 403 support two special cases of substring matching: prefix and suffix 404 matching if those special cases are supported by the application 405 protocol. 
It returns "match" or "no-match" when supplied valid input 406 and returns "undefined" when supplied invalid input. 408 Application protocols MAY return position information for substring 409 matches. If this is done, the position information SHOULD include 410 both the starting offset and the ending offset for each match. This 411 is important because more sophisticated collations can match strings 412 of unequal length (for example, a pre-composed accented character can 413 match a decomposed accented character). In general, overlapping 414 matches SHOULD be reported (as when "ana" occurs twice within 415 "banana") although there are cases where a collation may decide not 416 to. For example, in a collation which treats all whitespace 417 sequences as identical, the substring operation could be defined such 418 that " 1 " (SP "1" SP) is reported just once within " 1 " (SP SP 419 "1" SP SP), not four times (SP SP 1 SP, SP 1 SP, SP 1 SP SP and SP SP 420 1 SP SP), since the four matches are in a sense the same match. 422 A string is a substring of itself. The empty string is a substring 423 of all strings. 425 Note that the substring operation of some collations can match 426 strings of unequal length. For example, a pre-composed accented 427 character can match a decomposed accented character. The Unicode 428 Collation Algorithm [7] discusses this in more detail. 430 The return values of the substring operation are called "match", "no- 431 match" and "undefined" in this document. 433 4.2.4. Ordering 435 The ordering operation determines how two strings are ordered. It 436 MUST be reflexive. For valid input, it MUST be transitive and 437 trichotomous. 439 Ordering returns "less" if the first string is listed before the 440 second string according to the collation, "greater" if the second 441 string is listed before the first string, and "equal" if the two 442 strings are equal as defined by the collation's equality operation. 
443 If one or both strings are invalid, the result of ordering is 444 "undefined". 446 When the collation is used with a "+" prefix, the behavior is the 447 same as when used with no prefix. When the collation is used with a 448 "-" prefix, the result of the ordering operation of the collation 449 MUST be reversed. 451 The return values of the ordering operation are called "less", 452 "equal", "greater" and "undefined" in this document. 454 4.3. Sort Keys 456 A collation specification SHOULD describe the internal transformation 457 algorithm to generate sort keys. This algorithm can be applied to 458 individual strings and the result can be stored to potentially 459 optimize future comparison operations. A collation MAY specify that 460 the sort key is generated by the identity function. The sort key may 461 have no meaning to a human. The sort key may not be valid input to 462 the collation. 464 4.4. Use of Lookup Tables 466 Some collations use customizable lookup tables, e.g. because the 467 tables depend on locale and may be modified after shipping the 468 software. Collations which use more than one customizable lookup 469 table in a documented format MUST assign numbers to the tables they 470 use. This permits an application protocol command to access the 471 tables used by a server collation, so that clients and servers use 472 the same tables. 474 5. Application Protocol Requirements 476 This section describes the requirements and issues that an 477 application protocol needs to consider if it offers searching, 478 substring matching and/or sorting, and permits the use of characters 479 outside the US-ASCII charset. 481 5.1. Character Encoding 483 The protocol specification has to make sure that it is clear on which 484 characters (rather than just octets) the collations are used. This 485 can be done by specifying the protocol itself in terms of characters 486 (e.g. 
in the case of a query language), by specifying a single 487 character encoding for the protocol (e.g. UTF-8 [3]), or by 488 carefully describing the relevant issues of character encoding 489 labeling and conversion. In the latter case, details to consider 490 include how to handle unknown charsets, any charsets which are 491 mandatory-to-implement, any issues with byte-order that might apply, 492 and any transfer encodings which need to be supported. 494 5.2. Operations 496 The protocol must specify which of the operations defined in this 497 specification (equality matching, substring matching and ordering) 498 can be invoked in the protocol, and how they are invoked. There may 499 be more than one way to invoke an operation. 501 The protocol MUST provide a mechanism for the client to select the 502 collation to use with equality matching, substring matching and 503 ordering. 505 If a protocol needs a total ordering and the collation chosen does 506 not provide it because the ordering operation returns "undefined" at 507 least once, the recommended fallback is to sort all invalid strings 508 after the valid ones, and use i;octet to order the invalid strings. 510 Although the collation's substring function provides a list of 511 matches, a protocol need not provide all that to the client. It may 512 provide only the first matching substring, or even just the 513 information that the substring search matched. In this way, 514 collations can be used with protocols that are defined such that "x 515 is a substring of y" returns true or false. 517 If the protocol provides positional information for the results of a 518 substring match, that positional information SHOULD fully specify the 519 substring(s) in the result that match, independent of the length of 520 the search string. For example, returning both the starting and 521 ending offset of the match would suffice, as would the starting 522 offset and a length. 
Returning just the starting offset is not 523 acceptable. This rule is necessary because advanced collations can 524 treat strings of different lengths as equal (for example, pre- 525 composed and decomposed accented characters). 527 5.3. Wildcards 529 The protocol MUST specify whether it allows the use of wildcards in 530 collation identifiers or not. If the protocol allows wildcards, 531 then: 532 The protocol MUST specify how comparisons behave in the absence of 533 explicit collation negotiation or when a collation of "default" is 534 requested. The protocol MAY specify that the default collation 535 used in such circumstances is sensitive to server configuration. 536 The protocol SHOULD provide a way to list available collations 537 matching a given wildcard pattern or patterns. 539 5.4. String Comparison 541 If a protocol compares strings in any nontrivial way, using a 542 collation may be appropriate. As an example, many protocols use 543 case-independent strings. In many cases, a simple ASCII mapping to 544 upper/lower case works well. In other cases, it may be better to use 545 a specifiable collation, for example so that a server can treat "i" 546 and "I" as equivalent in Italy and different in Turkey (Turkey also 547 has dotted upper-case I and dotless lower-case i). 549 Protocol designers should consider in each case whether to use a 550 specifiable collation. Keywords often have other needs than user 551 variables, and search arguments may be different again. 553 5.5. Disconnected Clients 555 If the protocol supports disconnected clients and a collation is used 556 which can use configurable tables (e.g. to support locale-specific 557 extensions), then the client may not be able to reproduce the 558 server's collation operations while offline. 560 A mechanism to download such tables has been discussed. Such a 561 mechanism is not included in the present specification, since the 562 problem is not yet well understood. 564 5.6. 
Error Codes 566 The protocol specification should consider assigning protocol error 567 codes for the following circumstances:
568 o The client requests the use of a collation by identifier or 569 pattern, but no implemented collation matches that pattern.
570 o The client attempts to use a collation for an operation that is 571 not supported by that collation. For example, attempting to use 572 the "i;ascii-numeric" collation for substring matching.
573 o The client uses an equality or substring matching collation and 574 the result is an error. It may be appropriate to distinguish 575 between the two input strings, particularly when one is supplied 576 by the client and one is stored by the server. It might also be 577 appropriate to distinguish the specific case of an invalid UTF-8 578 string.
580 5.7. Octet Collation 582 The i;octet (Section 9.3) collation is only usable with protocols 583 based on octet-strings. Clients and servers MUST NOT use i;octet 584 with other protocols. 586 If the protocol permits the use of collations with data structures 587 other than strings, the protocol MUST describe the default behavior 588 for a collation with those data structures. 590 6. Use by Existing Protocols 592 Both ACAP [11] and Sieve [14] are standards track specifications 593 which used collations prior to the creation of this specification and 594 registry. Those standards do not meet all the application protocol 595 requirements described in Section 5. 597 These protocols allow the use of the i;octet (Section 9.3) collation 598 working directly on UTF-8 data as used in these protocols. 600 In Sieve, all matches are either true or false. Accordingly, Sieve 601 servers must treat "undefined" and "no-match" results of the equality 602 and substring operations as false, and only "match" as true. 604 In ACAP and Sieve, there are no invalid strings. In this document's 605 terms, invalid strings sort after valid strings. 
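The two conventions just described can be sketched in Python. This is a minimal illustration under stated assumptions, not text from any of the referenced specifications. The first function collapses the three-valued result into Sieve's true/false; the second builds a total order that places invalid strings after valid ones and orders the invalid ones octet by octet, following the fallback recommended in Section 5.2. The names `Result`, `sieve_test`, and `total_order_key`, the toy case-insensitive equality, and the "only US-ASCII octets are valid" rule are all hypothetical.

```python
from enum import Enum


class Result(Enum):
    MATCH = "match"
    NO_MATCH = "no-match"
    UNDEFINED = "undefined"


def is_valid(octets: bytes) -> bool:
    # Hypothetical validity rule for illustration: only US-ASCII is valid.
    return all(b < 0x80 for b in octets)


def casemap_equal(a: bytes, b: bytes) -> Result:
    # Toy ASCII case-insensitive equality; invalid input yields "undefined".
    if not (is_valid(a) and is_valid(b)):
        return Result.UNDEFINED
    return Result.MATCH if a.upper() == b.upper() else Result.NO_MATCH


def sieve_test(result: Result) -> bool:
    # Sieve has only true/false: "undefined" and "no-match" both map to false.
    return result is Result.MATCH


def total_order_key(octets: bytes):
    # Valid strings come first, ordered by the toy collation's sort key.
    # Invalid strings come afterwards, ordered octet by octet.
    if is_valid(octets):
        return (0, octets.upper())
    return (1, octets)
```

For example, `sorted(strings, key=total_order_key)` yields a total order even when some inputs are invalid for the toy collation, which is what a protocol needing a total ordering would require.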
607  IMAP [15] also collates, although that is explicit only when the
608  COMPARATOR [17] extension is used. The built-in IMAP substring
609  operation and the ordering provided by the SORT [16] extension may
610  not meet the requirements made in this document.

612  Other protocols may be in a similar position.

614  In IMAP, the default collation is i;ascii-casemap, because its
615  operations are understood to match IMAP's built-in operations.

617  7. Collation Registration

619  7.1. Collation Registration Procedure

621  The IETF will create a mailing list, collation@ietf.org, which can be
622  used for public discussion of collation proposals prior to
623  registration. Use of the mailing list is strongly encouraged. The
624  IESG will appoint a designated expert who will monitor the
625  collation@ietf.org mailing list and review registrations.

627  The registration procedure begins when a completed registration
628  template is sent to iana@iana.org and collation@ietf.org. The
629  designated expert is expected to tell IANA and the submitter of the
630  registration within two weeks whether the registration is approved,
631  approved with minor changes, or rejected with cause. When a
632  registration is rejected with cause, it can be re-submitted if the
633  concerns listed in the cause are addressed. Decisions made by the
634  designated expert can be appealed to the IESG Applications Area
635  Director, then to the IESG. They follow the normal appeals procedure
636  for IESG decisions.

638  Collation registrations in a standards track, BCP or IESG-approved
639  experimental RFC are owned by the IETF, and changes to the
640  registration follow normal procedures for updating such documents.
641  Collation registrations in other RFCs are owned by the RFC author(s).
642  Other collation registrations are owned by the individual(s) listed
643  in the contact field of the registration and IANA will preserve this
644  information.
646  If the registration is a change of an existing collation, it MUST be
647  approved by the owner. In the event the owner cannot be contacted
648  for a period of one month and the designated expert deems the change
649  necessary, the IESG MAY re-assign ownership to an appropriate party.

651  7.2. Collation Registration Format

653  Registration of a collation is done by sending a well-formed XML
654  document to collation@ietf.org and iana@iana.org.

656  7.2.1. Registration Template

658  Here is a template for the registration:

660  <?xml version='1.0'?>
661  <!DOCTYPE collation SYSTEM 'collationreg.dtd'>
662  <collation rfc="..." scope="..." intendedUse="...">
663    <identifier>collation identifier</identifier>
664    <title>technical title for collation</title>
665    <operations>equality order substring</operations>
666    <specification>specification reference</specification>
667    <owner>email address of owner or IETF</owner>
668    <submitter>email address of submitter</submitter>
669    <version>1</version>
670  </collation>

672  7.2.2. The collation Element

674  The root of the registration document MUST be a <collation> element.
675  The collation element contains the other elements in the
676  registration, which are described in the following sub-subsections,
677  in the order given here.

679  The <collation> element MAY include an "rfc=" attribute if the
680  specification is in an RFC. The "rfc=" attribute gives only the
681  number of the RFC, without any prefix, such as "RFC", or suffix, such
682  as ".txt".

684  The <collation> element MUST include a "scope=" attribute, which MUST
685  have one of the values "global", "local" or "other".

687  The <collation> element MUST include an "intendedUse=" attribute,
688  which must have one of the values "common", "limited", "vendor", or
689  "deprecated". Collation specifications intended for "common" use are
690  expected to reference standards from standards bodies with
691  significant experience dealing with the details of international
692  character sets.

694  Be aware that future revisions of this specification may add
695  additional function types, as well as additional XML attributes,
696  values and elements. Any system which automatically parses these XML
697  documents MUST take this into account to preserve future
698  compatibility.

700  7.2.3. The identifier Element

702  The <identifier> element gives the precise identifier of the
703  collation, e.g. i;ascii-casemap. The <identifier> element is
704  mandatory.

706  7.2.4. The title Element

708  The <title> element gives the title of the collation. The <title>
709  element is mandatory.

711  7.2.5. The operations Element

713  The <operations> element lists which of the three operations
714  ("equality", "order" or "substring") the collation provides,
715  separated by single spaces. The <operations> element is mandatory.

717  7.2.6. The specification Element

719  The <specification> element describes where to find the
720  specification. The <specification> element is mandatory. It MAY
721  have a URI attribute. There may be more than one <specification>
722  element, in which case they together form the specification.

724  If it is discovered that parts of a collation specification conflict,
725  a new revision of the collation is necessary, and the
726  collation@ietf.org mailing list should be notified.

728  7.2.7. The submitter Element

730  The <submitter> element provides an RFC 2822 [12] email address for
731  the person who submitted the registration. It is optional if the
732  <owner> element contains an email address.

734  There may be more than one <submitter> element.

736  7.2.8. The owner Element

738  The <owner> element contains either the four letters "IETF" or an
739  email address of the owner of the registration. The <owner> element
740  is mandatory. There may be more than one <owner> element. If so,
741  all owners are equal. Each owner can speak for all.

743  7.2.9. The version Element

745  The <version> element MUST be included when the registration is
746  likely to be revised or has been revised in such a way that the
747  results change for certain input strings. The <version> element is
748  optional.

750  7.2.10. The variable Element

752  The <variable> element specifies an optional variable with which the
753  collation's behaviour can be tailored. The <variable> element is
754  optional.
When it is used, it must contain <name> and <default>
755  elements and may contain one or more <value> elements.

757  7.2.10.1. The name Element

759  The <name> element specifies the name value of a variable. The
760  <name> element is mandatory.

762  7.2.10.2. The default Element

764  The <default> element specifies the default value of a variable. The
765  <default> element is mandatory.

767  7.2.10.3. The value Element

769  The <value> element specifies a legal value of a variable. The
770  <value> element is optional. If one or more <value> elements are
771  present, only those values are legal. If none is, then the
772  variable's legal values do not form an enumerated set, and the rules
773  MUST be specified in an RFC accompanying the registration.

775  7.3. Structure of Collation Registry

777  Once the registration is approved, IANA will store each XML
778  registration document in a URL of the form
779  http://www.iana.org/assignments/collation/collation-id.xml where
780  collation-id is the contents of the identifier element in the
781  registration. Both the submitter and the designated expert are
782  responsible for verifying that the XML is well-formed. The
783  registration document should avoid using new elements. If any are
784  necessary, it is important to be consistent with other registrations.

786  IANA will also maintain a text summary of the registry under the name
787  http://www.iana.org/assignments/collation/summary.txt. This summary
788  is divided into four sections. The first section is for collations
789  intended for common use. This section is intended for collation
790  registrations published in IESG approved RFCs or for locally scoped
791  collations from the primary standards body for that locale. The
792  designated expert is encouraged to reject collation registrations
793  with an intended use of "common" if the expert believes it should be
794  "limited", as it is desirable to keep the number of "common"
795  registrations small and high quality.
The second section is reserved
796  for limited use collations. The third section is reserved for
797  registered vendor specific collations. The final section is reserved
798  for deprecated collations.

800  7.4. Example Initial Registry Summary

802  The following is an example of how IANA might structure the initial
803  registry summary.txt file:

805  Collation                 Functions  Scope  Reference
806  ---------                 ---------  -----  ---------
807  Common Use Collations:
808     i;ascii-casemap        e, o, s    Local  [RFC XXXX]

810  Limited Use Collations:
811     i;octet                e, o, s    Other  [RFC XXXX]
812     i;ascii-numeric        e, o       Other  [RFC XXXX]

814  Vendor Collations:

816  Deprecated Collations:

818  References
819  ----------
820  [RFC XXXX]  Newman, C., Duerst, M., Gulbrandsen, A., "Internet
821              Application Protocol Collation Registry", RFC XXXX,
822              Sun Microsystems, October 2013.

824  8. Guidelines for Expert Reviewer

826  The expert reviewer appointed by the IESG has fairly broad latitude
827  for this registry. While a number of collations are expected
828  (particularly customizations of the UCA for localized use), an
829  explosion of collations (particularly common use collations) is not
830  desirable for widespread interoperability. However, it is important
831  for the expert reviewer to provide cause when rejecting a
832  registration, and when possible to describe corrective action to
833  permit the registration to proceed. The following table includes
834  some example reasons to reject a registration with cause:
835  o The registration is not a well-formed XML document.
836  o The registration has an intended use of "common", but there is no
837    evidence the collation will be widely deployed, so it should be
838    listed as "limited".
839  o The registration has an intended use of "common", but it is
840    redundant with the functionality of a previously registered
841    "common" collation.
843  o The registration has an intended use of "common", but the
844    specification is not detailed enough to allow interoperable
845    implementations by others.
846  o The collation identifier fails to precisely identify the version
847    numbers of relevant tables to use.
848  o The registration fails to meet one of the "MUST" requirements in
849    Section 4.
850  o The collation identifier fails to meet the syntax in Section 3.
851  o The collation specification referenced in the registration is
852    vague or has optional features without a clear behavior specified.
853  o The referenced specification does not adequately address security
854    considerations specific to that collation.
855  o The registration's operations are needlessly different from those
856    of traditional operations.
857  o The registration's XML is needlessly different from that of
858    already registered collations.

860  9. Initial Collations

862  This section registers the three collations that were originally
863  defined in [RFC2244] and are implemented in most [SIEVE] engines.
864  Some of the behaviour of these collations is perhaps not ideal, such
865  as i;ascii-casemap accepting non-ASCII input.
866  Some of the perhaps surprising aspects of these collations are
867  necessary to maintain compatibility with widely deployed code.

870  9.1. ASCII Numeric Collation

872  9.1.1. ASCII Numeric Collation Description

874  The "i;ascii-numeric" collation is a simple collation intended for
875  use with arbitrary sized unsigned decimal integer numbers stored as
876  octet strings. US-ASCII digits (0x30 to 0x39) represent digits of
877  the numbers. Before converting from string to integer, the input
878  string is truncated at the first non-digit character. All input is
879  valid; strings which do not start with a digit represent positive
880  infinity.
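
The conversion rule above can be sketched in Python as follows. This is an illustration under stated assumptions, not part of the registration: the function names are hypothetical, math.inf stands in for "positive infinity", and the comparison helpers simply compare the resulting numbers. Python integers are arbitrary sized, although recent Python versions limit string-to-integer conversion to a few thousand digits by default.

```python
import math


def ascii_numeric_value(octets: bytes):
    """Return the number an octet string represents under i;ascii-numeric."""
    end = 0
    # Keep only the leading run of US-ASCII digits (0x30 to 0x39); this is
    # the same as truncating at the first non-digit octet.
    while end < len(octets) and 0x30 <= octets[end] <= 0x39:
        end += 1
    if end == 0:
        # No leading digit: the string represents positive infinity.
        return math.inf
    return int(octets[:end])


def ascii_numeric_order(a: bytes, b: bytes) -> str:
    """Compare two octet strings by the numbers they represent."""
    va, vb = ascii_numeric_value(a), ascii_numeric_value(b)
    if va < vb:
        return "less"
    if va > vb:
        return "greater"
    return "equal"


def ascii_numeric_equal(a: bytes, b: bytes) -> str:
    """Equality is ordering collapsed to match / no-match."""
    return "match" if ascii_numeric_order(a, b) == "equal" else "no-match"
```

For instance, ascii_numeric_order(b"04294967298", b"") compares 4294967298 with positive infinity, and ascii_numeric_equal(b"", b"x") compares positive infinity with itself.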
882  The collation supports equality and ordering, but does not support
883  the substring operation.

885  The equality operation returns "match" if the two strings represent
886  the same number (i.e. leading zeroes and trailing non-digits are
887  disregarded) and "no-match" if the two strings represent different
888  numbers.

890  The ordering operation returns "less" if the first string represents
891  a smaller number than the second, "equal" if they represent the same
892  number, and "greater" if the first string represents a larger number
893  than the second.

895  Some examples: "0" is less than "1", and "1" is less than
896  "4294967298". "4294967298", "04294967298" and "4294967298b" are all
897  equal. "04294967298" is less than "". "", "x" and "y" are equal.

899  9.1.2. ASCII Numeric Collation Registration

901  <?xml version='1.0'?>
902  <!DOCTYPE collation SYSTEM 'collationreg.dtd'>
903  <collation rfc="XXXX" scope="other" intendedUse="limited">
904    <identifier>i;ascii-numeric</identifier>
905    <title>ASCII Numeric</title>
906    <operations>equality order</operations>
907    <specification>RFC XXXX</specification>
908    <owner>IETF</owner>
909    <submitter>chris.newman@sun.com</submitter>
910  </collation>

912  9.2. ASCII Casemap Collation

914  9.2.1. ASCII Casemap Collation Description

916  The "i;ascii-casemap" collation is a simple collation which operates
917  on octet strings and treats US-ASCII letters case-insensitively. It
918  provides equality, substring and ordering operations. All input is
919  valid. Note that letters outside ASCII are not treated case-
920  insensitively.

922  Its equality, ordering and substring operations are as for i;octet,
923  except that first, the lower-case letters (octet values 97-122) in
924  each input string are changed to upper case (octet values 65-90).

926  Care should be taken when using OS-supplied functions to implement
927  this collation as it is not locale sensitive. Functions such as
928  strcasecmp and toupper are sometimes locale sensitive and may
929  inappropriately map lower-case letters other than a-z to upper case.
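
A locale-independent way to apply the case mapping described above is to map octet values explicitly instead of calling locale-aware library functions. The following Python sketch is only an illustration: the function names are hypothetical, and it assumes that the octet-by-octet ordering of Python bytes objects corresponds to the i;octet ordering.

```python
def ascii_casemap_key(octets: bytes) -> bytes:
    """Map octets 97-122 (a-z) to 65-90 (A-Z); leave all others untouched."""
    return bytes(o - 32 if 97 <= o <= 122 else o for o in octets)


def ascii_casemap_equal(a: bytes, b: bytes) -> str:
    """Equality after case mapping, as an octet-for-octet comparison."""
    return "match" if ascii_casemap_key(a) == ascii_casemap_key(b) else "no-match"


def ascii_casemap_order(a: bytes, b: bytes) -> str:
    """Ordering after case mapping, by unsigned octet value."""
    ka, kb = ascii_casemap_key(a), ascii_casemap_key(b)
    if ka < kb:
        return "less"
    if ka > kb:
        return "greater"
    return "equal"
```

Because only octets 97-122 are changed, the UTF-8 encodings of accented letters pass through unmodified, so an accented lower-case letter and its upper-case form do not compare as equal.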
931  The i;ascii-casemap collation is well suited to use with many
932  Internet protocols and computer languages. Use with natural language
933  is often inappropriate: even though the collation apparently supports
934  languages such as Swahili and English, in real-world use it tends to
935  mis-sort a number of types of string:

937  o people and place names containing non-ASCII,
938  o words such as "naive" (if spelled with an accent, the accented
939    character could push the word to the wrong spot in a sorted list),
940  o names such as "Lloyd" (which in Welsh sorts after "Lyon", unlike
941    in English),
942  o strings containing euro and pound sterling symbols, quotation
943    marks other than '"', dashes/hyphens, etc.

945  9.2.2. ASCII Casemap Collation Registration

947  <?xml version='1.0'?>
948  <!DOCTYPE collation SYSTEM 'collationreg.dtd'>
949  <collation rfc="XXXX" scope="local" intendedUse="common">
950    <identifier>i;ascii-casemap</identifier>
951    <title>ASCII Casemap</title>
952    <operations>equality order substring</operations>
953    <specification>RFC XXXX</specification>
954    <owner>IETF</owner>
955    <submitter>chris.newman@sun.com</submitter>
956  </collation>

958  9.3. Octet Collation

960  9.3.1. Octet Collation Description

962  The "i;octet" collation is a simple and fast collation intended for
963  use on binary octet strings rather than on character data. Protocols
964  that want to make this collation available have to do so by
965  explicitly allowing it. If not explicitly allowed, it MUST NOT be
966  used. It never returns an "undefined" result. It provides equality,
967  substring and ordering operations.

969  The ordering algorithm is as follows:
970  1. If both strings are the empty string, return the result "equal".
971  2. If the first string is empty and the second is not, return the
972     result "less".
973  3. If the second string is empty and the first is not, return the
974     result "greater".
975  4. If both strings begin with the same octet value, remove the first
976     octet from both strings and repeat this algorithm from step 1.
977  5. If the unsigned value (0 to 255) of the first octet of the first
978     string is less than the unsigned value of the first octet of the
979     second string, then return "less".
980  6. If this step is reached, return "greater".

982  This algorithm is roughly equivalent to the C library function memcmp
983  with appropriate length checks added.

985  The matching operation returns "match" if the sorting algorithm would
986  return "equal". Otherwise the matching operation returns "no-match".

988  The substring operation returns "match" if the first string is the
989  empty string, or if there exists a substring of the second string of
990  length equal to the length of the first string which would result in
991  a "match" result from the equality function. Otherwise the substring
992  operation returns "no-match".

994  9.3.2. Octet Collation Registration

996  This collation is defined with intendedUse="limited" because it can
997  only be used by protocols that explicitly allow it.

999   <?xml version='1.0'?>
1000  <!DOCTYPE collation SYSTEM 'collationreg.dtd'>
1001  <collation rfc="XXXX" scope="other" intendedUse="limited">
1002    <identifier>i;octet</identifier>
1003    <title>Octet</title>
1004    <operations>equality order substring</operations>
1005    <specification>RFC XXXX</specification>
1006    <owner>IETF</owner>
1007    <submitter>chris.newman@sun.com</submitter>
1008  </collation>

1010  10. IANA Considerations

1012  Section 7 defines how to register collations with IANA. Section 9
1013  defines a list of predefined collations, which should be registered
1014  when this document is approved and published as an RFC.

1016  11. Security Considerations

1018  Collations will normally be used with UTF-8 strings. Thus the
1019  security considerations for UTF-8 [3], stringprep [6] and Unicode
1020  TR-36 [8] also apply and are normative to this specification.

1022  12. Acknowledgements

1024  The authors want to thank all who have contributed to this document,
1025  including at least Brian Carpenter, John Cowan, Dave Cridland, Mark
1026  Davis, Spencer Dawkins, Lisa Dusseault, Lars Eggert, Frank Ellermann,
1027  Philip Guenther, Tony Hansen, Ted Hardie, Sam Hartman, Kjetil Torgrim
1028  Homme, Michael Kay, John Klensin, Alexey Melnikov, Jim Melton and
1029  Abhijit Menon-Sen.

1031  13. Open Issues

1033  Dear RFC Editor, please do the following:
1034  1.
Move the parenthetical request after Martin Duerst's name to be a
1035     separate paragraph between Martin's URI and Arnt's name.
1036  2. Remove section 13 (Open Issues) and section 14 (Change Log).

1038  14. Change Log

1040  14.1. Changes From -13
1041  1. Simpler language in the text describing how to select a
1042     collation based on a wildcard.
1043  2. Trichotomy is only required for valid input to the ordering
1044     operation.
1045  3. Make it clear that registering a new version of a collation
1046     counts as a registration, with the same procedure. Add a MUST
1047     for the version element in that case.
1048  4. Attended to nits and stuff from Lars Eggert
1049  5. Simpler language wrt. the names of return values.
1050  6. Talk about why protocols don't have to return all the
1051     information substring returns.
1052  7. Use bullet points rather than rambling text about i;ascii-
1053     casemap and natural language.
1054  8. Reworded sections 5.4 and 5.5 after discussion with Sam Hartman.
1055     5.4 could mislead into thinking that the server should use the
1056     sort key. 5.5 was just plain uninformative text once the rest of
1057     table download had been removed.
1058  9. Removed i;nameprep for possible publication as a separate draft/
1059     RFC. It was broken, and it's also out of this document's
1060     natural scope (define registry, populate with legacy values).
1061  10. Changed the grammar of collation names to match the textual
1062      description better.
1063  11. Refer to RFC 4646, not 3066.

1065  14.2. Changes From -12
1066  1. Remove i;basic, to publish it as a separate RFC. Many documents
1067     are held up by this document, and this document is only held up
1068     by i;basic.
1069  2. Get rid of all the typoes I could find. Added one.
1070  3. Specifically note that the "same" substring match need not always
1071     be returned in each of its guises.

1073  14.3. Changes From -11
1074  1. Remove the DTD. Permit well-considered extension of the XML.
1075     Enable the designated expert to block registrations due to
1076     inappropriate or overly aggressive extension.
1078  2. Rename collation names to collation identifiers. Having both
1079     names and titles wasn't good.
1080  3. Removed some open issues after trying to edit, and deciding that
1081     the existing text was good.
1082  4. Note that in Sieve, invalid strings sort after valid ones.
1083  5. Make i;ascii-numeric as in RFC2244. The task of this document is
1084     to establish the registry, not change existing collations.

1086  14.4. Changes From -10
1087  1. Updated contact details for Martin Duerst.
1088  2. Various textual improvements.
1089  3. The registration's file name now has a mandatory .xml extension.
1090  4. Removed binding MUST for Sieve; it's more appropriate to put that
1091     in 3028bis.
1092  5. Syntax fix in registration example.
1093  6. When there are multiple specifications, they now act in concert,
1094     so it's possible to have e.g. a main specification and multiple
1095     locale-specific supplements. It is not possible to name multiple
1096     locations for the same specification any more. That'll return as
1097     a comment feature.
1098  7. Hopefully clearer exposition of i;ascii-casemap.
1099  8. The ban on registering octet-based collations is lifted. One
1100     hopes that the collation mailing list will present a suitable
1101     threshold - not too high, not too low.
1102  9. The DTD is published where IE can see it while looking at the
1103     registrations.

1105  14.5. Changes From -09
1106  1. Rename "error" to "undefined", as suggested by Mark Davis. The
1107     new name makes for nicer prose IMO.
1108  2. 7b=7 according to i;ascii-numeric. ACAP/Sieve need it.
1109  3. Clarified that even though the collation specification returns a
1110     list of substrings, the protocol/server need not use all of that
1111     information. (As indeed IMAP SEARCH does not.)
1112  4. Registrations go directly to the collation list _and_ to the
1113     IANA, not to the IANA and from there forwarded to designated
1114     expert.
1115  5. Added an acknowledgements list and populated it with a quick grep
1116     from my mailbox and memory. Surely incomplete.
1117  6. Noted that in sieve, "no-match" and "undefined" must be treated
1118     in the same way by the engine.
1119  7. Finish the rename from canonical to sort key.
1120  8. Don't fall back to i;octet from any other collation. Return
1121     undefined instead. Note that protocols may fall back to i;octet
1122     to provide total ordering, if necessary.
1123  9. Call the things operations everywhere, not operators/operations.

1125  14.6. Changes From -08
1126  1. i;ascii-casemap instead of en;ascii-casemap.
1127  2. UCA v 14. Changing to "latest version of UCA" was suggested,
1128     but rejected since IETF standards reference stable
1129     specifications, and "latest" is a moving target.
1130  3. Removed all text on multi-valued attributes. Can be added once
1131     there is a concrete need for it, either in an update to this
1132     document or in the protocol that needs it.
1133  4. "Collations MUST specify the canonicalization". Well, the UCA
1134     doesn't, so I changed that to a MAY.
1135  5. Add some text explaining why one might want to download tables.
1136  6. Changed the remaining instances of "canonicalization" to talk
1137     about sort keys. Added a note that a collation's sort key need
1138     not be valid input to the same collation.
1139  7. Reserve the word "default" and use it to name a protocol's
1140     default collation, provided that protocol has a default
1141     collation. In earlier versions of the draft, "*" was used to
1142     name the default collation, but "*" also was implicitly defined
1143     as the most general collation available.
1144  8. Reinstate the different-length example of substring match.
1145     Explain what an overlapping match is, by the canonical example.
1146  9. Avoid the word "contain" when talking about substring matches.
1147     Fewer terms is better.
1148  10. Until -07, both a collation and equality/substring/sort were
1149      called functions. In -07, the trio was renamed as operations.
1150      Now, the DTD is updated to match.
1151  11. Appeals go to the Apps AD before the general AD, as suggested by
1152      Spencer Dawkins.

1154  14.7. Changes From -06
1155  1. Clarified equality and identity: equality is as defined by a
1156     collation, identity is stronger.
1157  2. Added reference to
1158     http://www.unicode.org/reports/tr10/#Searching.
1159  3. Don't describe sort keys as a canonical representation of the
1160     string.
1161  4. Permit disconnected clients to use wildcards. (A disconnected
1162     client has to resolve the wildcard itself, in the same way that a
1163     server would.)
1164  5. Change collation-wild to have the same length limit as collation.
1165  6. Change to use "less" instead of "-1", etc., and specify that it's
1166     just phrasing, not specification.
1167  7. Don't describe the equality, substring and ordering operations as
1168     functions. The definition of collation uses the word function
1169     about the collation itself. A function that has three functions?
1170     Something has to give.
1172  8. Strike a requirement that selecting '*' is the same as not
1173     selecting any collation. It restricted the protocol's default
1174     too much. Existing code wasn't listening.
1175  9. Left out the canonicalization/sort keys.

1177  14.8. Changes From -05
1178  1. Added definitions of client, server and protocol, and prose to
1179     specify that while the IANA registrations of collations are
1180     written in terms of octet strings, implementations may do it
1181     differently.
1182  2. Changed the wording for ascii-numeric to treat the numbers as
1183     numbers, etc.
1184  3. Added explicit property requirements for the three functions,
1185     e.g. that equality be symmetric.
Added requirements that the
1186     three functions be consistent, and that if any operations are
1187     present, equality must be (needed for consistency).
1188  4. Random editing, e.g. changing 'numbers' for ascii-numeric to
1189     'integer numbers'.
1190  5. Gave IMAP/SORT/COMPARATOR the same grandfather treatment as ACAP
1191     and SIEVE.

1193  14.9. Changes From -04

1195  Grammar and clarity changes only. One (weak) example added. No
1196  substantive changes.

1198  14.10. Changes From -03

1200  (This does not include all changes made.)
1201  1. Checked and resolved most issues marked 'check whether this is
1202     true' or similar.
1203  2. Resolved nameprep issue: No.
1204  3. Removed NULL for compatibility with existing collations (IMAP
1205     SORT, Sieve).
1206  4. There can be multiple owners and submitters. Say how.
1207  5. Added a requirement that common collations must now be
1208     interoperable. Insufficiently detailed specs cannot be "common".
1209  6. Added a guideline that the operations provided by new collations
1210     should be reminiscent of similar operations on existing
1211     collations.

1213  14.11. Changes From -02

1215  1. Changed from data being octet sequences (in UTF-8) to data being
1216     character sequences (with octet collation as an exception).
1217  2. Made XML format description much more structured.
1219  3. Changed to , because this spelling is much
1220     more common.
1221  4. Defined 'protocol' to include query languages.
1222  5. Reorganized document, in particular IANA considerations section
1223     (which newly is just a list of pointers).
1224  6. Added subsections, and a 'Structure of this Document' section.
1225  7. Updated references.
1226  8. Created a 'Change Log' chapter, with sections for each draft.
1227  9. Reduced 'Open issues' section, open issues are now maintained at
1228     http://www.w3.org/2004/08/ietf-collation.

1230  14.12. Changes From -01

1232  Add IANA comment to open issues. Otherwise this is just a re-publish
1233  to keep the document alive.
1235  14.13. Changes From -00

1237  1. Replaced the term comparator with collation. While comparator is
1238     somewhat more precise because these abstract functions are used
1239     for matching as well as ordering, collation is the term used by
1240     other parts of the industry. Thus I have changed the name to
1241     collation for consistency.
1242  2. Remove all modifiers to the basic collation except for the
1243     customization and the match rules. The other behavior
1244     modifications can be specified in a customization of the
1245     collation.
1246  3. Use ";" instead of "-" as delimiter between parameters to make
1247     names more URL-ish.
1248  4. Add URL form for comparator reference.
1249  5. Switched registration template to use XML document.
1250  6. Added a number of useful registration template elements related
1251     to the Unicode Collation Algorithm.
1252  7. Switched language from "custom" to "tailor" to match UCA language
1253     for tailoring of the collation algorithm.

1255  15. References

1257  15.1. Normative References

1259  [1] Bradner, S., "Key words for use in RFCs to Indicate Requirement
1260      Levels", BCP 14, RFC 2119, March 1997.

1262  [2] Crocker, D. and P. Overell, "Augmented BNF for Syntax
1263      Specifications: ABNF", RFC 4234, October 2005.

1265  [3] Yergeau, F., "UTF-8, a transformation format of ISO 10646",
1266      STD 63, RFC 3629, November 2003.

1268  [4] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
1269      Resource Identifier (URI): Generic Syntax", RFC 3986,
1270      January 2005.

1272  [5] Phillips, A. and M. Davis, "Tags for Identifying Languages",
1273      BCP 47, RFC 4646, September 2006.

1275  [6] Hoffman, P. and M. Blanchet, "Preparation of Internationalized
1276      Strings ("stringprep")", RFC 3454, December 2002.

1278  [7] Davis, M. and K. Whistler, "Unicode Collation Algorithm version
1279      14", May 2005.

1282  [8] Davis, M. and M. Suignard, "Unicode Security Considerations",
1283      February 2006.

1285  15.2. Informative References

1287  [9] Freed, N.
and N. Borenstein, "Multipurpose Internet Mail
1288      Extensions (MIME) Part One: Format of Internet Message Bodies",
1289      RFC 2045, November 1996.

1291  [10] Myers, J., "Simple Authentication and Security Layer (SASL)",
1292       RFC 2222, October 1997.

1294  [11] Newman, C. and J. Myers, "ACAP -- Application Configuration
1295       Access Protocol", RFC 2244, November 1997.

1297  [12] Resnick, P., "Internet Message Format", RFC 2822, April 2001.

1299  [13] Freed, N. and J. Postel, "IANA Charset Registration
1300       Procedures", BCP 19, RFC 2978, October 2000.

1302  [14] Showalter, T., "Sieve: A Mail Filtering Language", RFC 3028,
1303       January 2001.

1305  [15] Crispin, M., "Internet Message Access Protocol - Version
1306       4rev1", RFC 3501, March 2003.

1308  [16] Crispin, M. and K. Murchison, "Internet Message Access Protocol
1309       - Sort and Thread Extensions", draft-ietf-imapext-sort-17.txt
1310       (work in progress), May 2004.

1312  [17] Newman, C. and A. Gulbrandsen, "Internet Message Access
1313       Protocol Internationalization", draft-ietf-imapext-i18n-06.txt
1314       (work in progress), January 2006.

1316  Authors' Addresses

1318  Chris Newman
1319  Sun Microsystems
1320  1050 Lakes Drive
1321  West Covina, CA 91790
1322  US

1324  Email: chris.newman@sun.com

1326  Martin Duerst (Note: Please write "Duerst" with u-umlaut wherever possible, for example as "Dürst" in XML and HTML.)
1327  Aoyama Gakuin University
1328  5-10-1 Fuchinobe
1329  Sagamihara, Kanagawa 229-8558
1330  Japan

1332  Phone: +81 42 759 6329
1333  Fax: +81 42 759 6495
1334  Email: mailto:duerst@it.aoyama.ac.jp
1335  URI: http://www.sw.it.aoyama.ac.jp/D%C3%BCrst/

1337  Arnt Gulbrandsen
1338  Oryx Mail Systems GmbH
1339  Schweppermannstr. 8
1340  Munich 81671
1341  Germany

1343  Fax: +49 89 4502 9758
1344  Email: mailto:arnt@oryx.com
1345  URI: http://www.oryx.com/arnt/

1347  Intellectual Property Statement

1349  The IETF takes no position regarding the validity or scope of any
1350  Intellectual Property Rights or other rights that might be claimed to
1351  pertain to the implementation or use of the technology described in
1352  this document or the extent to which any license under such rights
1353  might or might not be available; nor does it represent that it has
1354  made any independent effort to identify any such rights. Information
1355  on the procedures with respect to rights in RFC documents can be
1356  found in BCP 78 and BCP 79.

1358  Copies of IPR disclosures made to the IETF Secretariat and any
1359  assurances of licenses to be made available, or the result of an
1360  attempt made to obtain a general license or permission for the use of
1361  such proprietary rights by implementers or users of this
1362  specification can be obtained from the IETF on-line IPR repository at
1363  http://www.ietf.org/ipr.

1365  The IETF invites any interested party to bring to its attention any
1366  copyrights, patents or patent applications, or other proprietary
1367  rights that may cover technology that may be required to implement
1368  this standard. Please address the information to the IETF at
1369  ietf-ipr@ietf.org.

1371  Disclaimer of Validity

1373  This document and the information contained herein are provided on an
1374  "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
1375  OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
1376  ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
1377  INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
1378  INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
1379  WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

1381  Copyright Statement

1383  Copyright (C) The Internet Society (2006).
This document is subject
1384  to the rights, licenses and restrictions contained in BCP 78, and
1385  except as set forth therein, the authors retain all their rights.

1387  Acknowledgment

1389  Funding for the RFC Editor function is currently provided by the
1390  Internet Society.