idnits 2.17.1

draft-newman-i18n-comparator-04.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** It looks like you're using RFC 3978 boilerplate.  You should update this
     to the boilerplate described in the IETF Trust License Policy document
     (see https://trustee.ietf.org/license-info), which is required now.

  -- Found old boilerplate from RFC 3978, Section 5.1 on line 17.

  -- Found old boilerplate from RFC 3978, Section 5.5 on line 1310.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 1287.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 1294.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 1300.

  ** This document has an original RFC 3978 Section 5.4 Copyright Line,
     instead of the newer IETF Trust Copyright according to RFC 4748.

  ** This document has an original RFC 3978 Section 5.5 Disclaimer, instead of
     the newer disclaimer which includes the IETF Trust according to RFC 4748.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** There is 1 instance of too long lines in the document, the longest one
     being 52 characters in excess of 72.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not
     match the current year

  == Line 722 has weird spacing: '...=accent e, o,...'

  == Line 723 has weird spacing: '...ch=case e, ...'
  == The document seems to lack the recommended RFC 2119 boilerplate, even if
     it appears to use RFC 2119 keywords -- however, there's a paragraph with
     a matching beginning. Boilerplate error?

     (The document does seem to have the reference to RFC 2119 which the
     ID-Checklist requires).

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer.
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (July 6, 2005) is 6862 days in the past.  Is this
     intentional?

  -- Found something which looks like a code comment -- if you have code
     sections in the document, please surround them with '<CODE BEGINS>' and
     '<CODE ENDS>' lines.

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Unused Reference: '4' is defined on line 1204, but no explicit reference
     was found in the text

  == Unused Reference: '12' is defined on line 1234, but no explicit
     reference was found in the text

  == Unused Reference: '13' is defined on line 1238, but no explicit
     reference was found in the text

  -- Possible downref: Non-RFC (?) normative reference: ref. 'RFC XXXX'

  ** Obsolete normative reference: RFC 2234 (ref. '2') (Obsoleted by RFC 4234)

  -- Possible downref: Normative reference to a draft: ref. '4'

  ** Obsolete normative reference: RFC 3066 (ref. '5') (Obsoleted by RFC 4646,
     RFC 4647)

  ** Obsolete normative reference: RFC 3454 (ref. '6') (Obsoleted by RFC 7564)

  ** Obsolete normative reference: RFC 3491 (ref. '7') (Obsoleted by RFC 5891)

  -- Possible downref: Non-RFC (?) normative reference: ref. '8'

  -- Obsolete informational reference (is this intentional?): RFC 2222 (ref.
     '10') (Obsoleted by RFC 4422, RFC 4752)

  -- Obsolete informational reference (is this intentional?): RFC 2434 (ref.
     '12') (Obsoleted by RFC 5226)

  -- Obsolete informational reference (is this intentional?): RFC 2822 (ref.
     '13') (Obsoleted by RFC 5322)

  -- Obsolete informational reference (is this intentional?): RFC 3028 (ref.
     '15') (Obsoleted by RFC 5228, RFC 5429)

     Summary: 8 errors (**), 0 flaws (~~), 8 warnings (==), 15 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1 Network Working Group                                          C. Newman
2 Internet-Draft                                          Sun Microsystems
3 Expires: January 7, 2006                                       M. Duerst
4                                                                      AGU
5                                                           A. Gulbrandsen
6                                                                     Oryx
7                                                             July 6, 2005

9 Internet Application Protocol Collation Registry
10 draft-newman-i18n-comparator-04.txt

12 Status of this Memo

14 By submitting this Internet-Draft, each author represents that any
15 applicable patent or other IPR claims of which he or she is aware
16 have been or will be disclosed, and any of which he or she becomes
17 aware will be disclosed, in accordance with Section 6 of BCP 79.

19 Internet-Drafts are working documents of the Internet Engineering
20 Task Force (IETF), its areas, and its working groups. Note that
21 other groups may also distribute working documents as Internet-
22 Drafts.

24 Internet-Drafts are draft documents valid for a maximum of six months
25 and may be updated, replaced, or obsoleted by other documents at any
26 time. It is inappropriate to use Internet-Drafts as reference
27 material or to cite them other than as "work in progress."

29 The list of current Internet-Drafts can be accessed at
30 http://www.ietf.org/ietf/1id-abstracts.txt.

32 The list of Internet-Draft Shadow Directories can be accessed at
33 http://www.ietf.org/shadow.html.
35 This Internet-Draft will expire on January 7, 2006.

37 Copyright Notice

39 Copyright (C) The Internet Society (2005).

41 Abstract

43 Many Internet application protocols include string-based lookup,
44 searching, or sorting operations. However the problem space for
45 searching and sorting international strings is large, not fully
46 explored, and is outside the area of expertise for the Internet
47 Engineering Task Force (IETF). Rather than attempt to solve such a
48 large problem, this specification creates an abstraction framework so
49 that application protocols can precisely identify a comparison
50 function and the repertoire of comparison functions can be extended
51 in the future.

53 Table of Contents

55 1. Introduction
56 1.1 Conventions Used in this Document
57 2. Collation Definition and Purpose
58 2.1 Definition
59 2.2 Purpose
60 2.3 Sort Keys
61 3. Collation Name Syntax
62 3.1 Basic Syntax
63 3.2 Wildcards
64 3.3 Ordering Direction
65 3.4 URIs
66 3.5 Naming Guidelines
67 4. Collation Specification Requirements
68 4.1 Operations Supported
69 4.1.1 Equality
70 4.2 Substring
71 4.3 Ordering
72 4.4 Internal Canonicalization Algorithm
73 4.5 Use of Lookup Tables
74 4.6 Multi-Value Attributes
75 5. Application Protocol Requirements
76 5.1 Character Encoding
77 5.2 Operations
78 5.3 Wildcards
79 5.4 Canonicalization Function
80 5.5 Disconnected Clients
81 5.6 Error Codes
82 5.7 Octet Collation
83 6. Use by ACAP and Sieve
84 7. Collation Registration
85 7.1 Collation Registration Procedure
86 7.2 Collation Registration Format
87 7.2.1 Registration Template
88 7.2.2 The collation Element
89 7.2.3 The name Element
90 7.2.4 The title Element
91 7.2.5 The functions Element
92 7.2.6 The specification Element
93 7.2.7 The submitter Element
94 7.2.8 The owner Element
95 7.2.9 The version Element
96 7.2.10 The UnicodeVersion Element
97 7.2.11 The UCAVersion Element
98 7.2.12 The UCAMatchLevel Element
99 7.3 DTD for Collation Registration
100 7.4 Structure of Collation Registry
101 7.5 Example Initial Registry Summary
102 8. Guidelines for Expert Reviewer
103 9. Initial Collations
104 9.1 ASCII Numeric Collation
105 9.1.1 ASCII Numeric Collation Description
106 9.1.2 ASCII Numeric Collation Registration
107 9.2 ASCII Casemap Collation
108 9.2.1 ASCII Casemap Collation Description
109 9.2.2 Legacy English Casemap Collation Registration
110 9.2.3 English Casemap Collation Registration
111 9.3 Nameprep Collation
112 9.3.1 Nameprep Collation Description
113 9.3.2 Nameprep Collation Registration
114 9.4 Basic Collation
115 9.4.1 Basic Collation Description
116 9.4.2 Basic Collation Registration
117 9.4.3 Basic Accent Sensitive Match Collation Registration
118 9.4.4 Basic Case Sensitive Match Collation Registration
119 9.5 Octet Collation
120 9.5.1 Octet Collation Description
121 9.5.2 Octet Collation Registration
122 10. IANA Considerations
123 11. Security Considerations
124 12. Open Issues
125 13. Change Log
126 13.1 Changes From -03
127 13.2 Changes From -02
128 13.3 Changes From -01
129 13.4 Changes From -00
130 14. References
131 14.1 Normative References
132 14.2 Informative References
133 Authors' Addresses
134 Intellectual Property and Copyright Statements

136 1. Introduction

138 The ACAP [11] specification introduced the concept of a comparator
139 (which we call collation in this document), but failed to create an
140 IANA registry. With the introduction of stringprep [6] and the
141 Unicode Collation Algorithm [8], it is now time to create that
142 registry and populate it with some initial values appropriate for an
143 international community. This specification replaces and generalizes
144 the definition of a comparator in ACAP and creates a collation
145 registry.

147 1.1 Conventions Used in this Document

149 The key words "MUST", "MUST NOT", "SHOULD", "SHOULD NOT", and "MAY"
150 in this document are to be interpreted as defined in "Key words for
151 use in RFCs to Indicate Requirement Levels" [1].

153 The attribute syntax specifications use the Augmented Backus-Naur
154 Form (ABNF) [2] notation including the core rules defined in Appendix
155 A. This also inherits ABNF rules from Language Tags [5].

157 The term 'protocol' is used in this memo in a very generic sense, and
158 includes things such as query languages.

160 2. Collation Definition and Purpose

162 2.1 Definition

164 A collation is a named function which takes two arbitrary length
165 character strings (with the exception of the i;octet (Section 9.5)
166 collation) as input and can be used to perform one or more of three
167 basic comparison operations: equality test, substring match, and
168 ordering test.

170 2.2 Purpose

172 Collations provide a multi-protocol abstraction layer for comparison
173 functions so the details of a particular comparison operation can be
174 specified by someone with appropriate expertise independent of the
175 application protocol that consumes that collation.
This is similar
176 to the way a charset [14] separates the details of octet to character
177 mapping from a protocol specification such as MIME [9] or the way
178 SASL [10] separates the details of an authentication mechanism from a
179 protocol specification such as ACAP [11].

181 Here is a small diagram to help illustrate the value of this abstraction
182 layer:

184 +-------------------+                         +-----------------+
185 | IMAP i18n SEARCH  |--+                      | Basic           |
186 +-------------------+  |                   +--| Collation Spec  |
187                        |                   |  +-----------------+
188 +-------------------+  |  +-------------+  |  +-----------------+
189 | ACAP i18n SEARCH  |--+--| Collation   |--+--| A stringprep    |
190 +-------------------+  |  | Registry    |  |  | Collation Spec  |
191                        |  +-------------+  |  +-----------------+
192 +-------------------+  |                   |  +-----------------+
193 | ...other protocol |--+                   |  | locale-specific |
194 +-------------------+                      +--| Collation Spec  |
195                                               +-----------------+

197 Thus IMAP, ACAP and future application protocols with international
198 search capability simply specify how to interface to the collation
199 registry instead of each protocol specification having to specify all
200 the collations it supports.

202 2.3 Sort Keys

204 One component of a collation is a canonicalization function which can
205 be pre-applied to single strings and may enhance the performance of
206 subsequent comparison operations. Normally, this is an
207 implementation detail of collations, but at times it may be useful
208 for an application protocol to expose collation canonicalization over
209 the protocol. Collation canonicalization can range from an identity
210 mapping (e.g., the i;octet collation, Section 9.5) to a mapping which
211 makes the string unreadable to a human (e.g., the basic collation).

213 3. Collation Name Syntax

215 3.1 Basic Syntax

217 The collation name itself is a single US-ASCII string beginning with
218 a letter and made up of letters, digits, or one of the following 4
219 symbols: "-", ";", "=" or ".".
The name MUST NOT be longer than 254
220 characters.

222 collation-char = ALPHA / DIGIT / "-" / ";" / "=" / "."

224 collation-name = ALPHA *253collation-char

226 3.2 Wildcards

228 The string a client uses to select a collation MAY contain a wildcard
229 ("*") character which matches zero or more collation-chars. Wildcard
230 characters MUST NOT be adjacent. Clients which support disconnected
231 operation SHOULD NOT use wildcards to select a collation, but clients
232 which provide collation operations only when connected to the server
233 MAY use wildcards. If the wildcard string matches multiple
234 collations, the server SHOULD select the collation with the broadest
235 scope (preferably international scope), the most recent table
236 versions and the greatest number of supported operations. A single
237 wildcard character ("*") refers to the application protocol collation
238 behavior that would occur if no explicit negotiation were used.

240 collation-wild = ("*" / (ALPHA ["*"])) *(collation-char ["*"])
241                  ; MUST NOT exceed 255 characters total

243 3.3 Ordering Direction

245 When used as a protocol element for ordering, the collation name MAY
246 be prefixed by either "+" or "-" to explicitly specify an ordering
247 direction. As described in Section 4.3, "+" has no effect on the
248 ordering function, while "-" negates the result of the ordering
249 function. In general, collation-order is used when a client requests
250 a collation, and collation-sel is used when the server informs the
251 client of the selected collation.

253 collation-sel = ["+" / "-"] collation-name

255 collation-order = ["+" / "-"] collation-wild

257 3.4 URIs

259 Some protocols are designed to use URIs to refer to collations rather
260 than simple tokens. A special section of the IANA web page is
261 reserved for such usage.
The "collation-uri" form is used to refer
262 to a specific IANA registry entry for a specific named collation (the
263 collation registration may not actually be present if it is
264 experimental). The "collation-auri" form is an abstract name for an
265 ordering, a comparator pattern or a vendor private comparator.

267 collation-uri = "http://www.iana.org/assignments/collation/"
268                 collation-name ".xml"

270 collation-auri = ( "http://www.iana.org/assignments/collation/"
271                  collation-order [".xml"]) / other-uri

273 other-uri = absoluteURI
274           ; excluding the IANA collation namespace.

276 3.5 Naming Guidelines

278 While this specification makes no absolute requirements on the
279 structure of collation names, naming consistency is important, so the
280 following initial guidelines are provided.

282 Collation names with an international audience typically begin with
283 "i;". Collation names intended for a particular language or locale
284 typically begin with a language tag [5] followed by a ";". After the
285 first ";" is normally the name of the general collation algorithm
286 followed by a series of algorithm modifications separated by the ";"
287 delimiter. Parameterized modifications will use "=" to delimit the
288 parameter from the value. The version numbers of any lookup tables
289 used by the algorithm SHOULD be present as parameterized
290 modifications.

292 Collation names of the form *;vnd-domain.com;* are reserved for
293 vendor-specific collations created by the owner of the domain name
294 following the "vnd-" prefix. Registration of such collations (or the
295 name space as a whole) with intended use of "Vendor" is encouraged
296 when a public specification or open-source implementation is
297 available, but is not required.

299 4. Collation Specification Requirements

301 4.1 Operations Supported

303 A collation specification MUST state which of the three basic
304 functions are supported (equality, substring, ordering) and how to
305 perform each of the supported functions on any two input character
306 strings including empty strings (with the exception of the i;octet
307 (Section 9.5) collation). Collations must be deterministic,
308 i.e., given a collation with a specific name, and any two fixed input
309 strings, the result MUST be the same for the same operation.
310 Collations MUST be transitive.

312 In general, collation operations should behave as their names
313 suggest. While a collation may be new, the operations are not, so
314 the new collation's algorithm for each operation should be as similar
315 as possible to those of older collations. For example, a collator
316 should not provide a "substring" operator that would morph IMAP
317 substring SEARCH into another kind of search.

319 4.1.1 Equality

321 The equality function always returns "match" or "no-match" when
322 supplied valid input and MAY return "error" if the input strings are
323 not valid character strings or violate other collation constraints.

325 4.2 Substring

327 The substring matching function determines if the first string is a
328 substring of the second string. A collation which supports substring
329 matching will automatically support the two special cases of
330 substring matching: prefix and suffix matching if those special cases
331 are supported by the application protocol. It returns "match" or
332 "no-match" when supplied valid input and returns "error" when
333 supplied invalid input.

335 Application protocols MAY return position information for substring
336 matches. If this is done, the position information MUST include both
337 the starting offset and the ending offset in the string.
This is
338 important because more sophisticated collations can match strings of
339 unequal length (for example, a pre-composed accented character will
340 match a decomposed accented character).

342 4.3 Ordering

344 The ordering function determines how two character strings are
345 ordered. It returns "-1" if the first string is listed before the
346 second string according to the collation, "+1" if the second string
347 is listed before the first string, and "0" if the two strings are
348 equal. If the order of the two strings is reversed, the result of
349 the ordering function of the collation MUST be reversed, i.e. results
350 which would be "+1" are instead "-1" and results which would be "-1"
351 are instead "+1", while results which would be "0" stay "0". In
352 general, collations SHOULD NOT return "0" unless the two character
353 sequences are identical.

355 Since ordering is normally used to sort a list of items, "error" is
356 not a useful return value from the ordering function. Strings with
357 errors that prevent the sorting algorithm from functioning correctly
358 should sort to the end of the list. Thus if the first string is
359 invalid while the second string is valid, the result will be "+1".
360 If the second string is invalid while the first string is valid, the
361 result will be "-1". If both strings are invalid, the result SHOULD
362 match the result from the "i;octet" collation.

364 When the collation is used with a "+" prefix, the behavior is the
365 same as when used with no prefix. When the collation is used with a
366 "-" prefix, the result of the ordering function of the collation MUST
367 be reversed.

369 4.4 Internal Canonicalization Algorithm

371 A collation specification MUST describe the internal canonicalization
372 algorithm. This algorithm can be applied to individual strings and
373 the result strings can be stored to potentially optimize future
374 comparison operations.
A collation MAY specify that the
375 canonicalization algorithm is the identity function. The output of
376 the canonicalization algorithm MAY have no meaning to a human.

378 4.5 Use of Lookup Tables

380 Collations which use more than one customizable lookup table in a
381 documented format MUST assign numbers to the tables they use. This
382 permits an application protocol command to access the tables used by
383 a server collation.

385 4.6 Multi-Value Attributes

387 Some application protocols will permit the use of multi-value
388 attributes with a collation. This paragraph describes the rules that
389 apply unless otherwise specified by the collation or application
390 protocol. In the case of the equality and substring operation, the
391 operations are applied over each pair of single values from the two
392 inputs. If any combination produces an error, the result is an
393 error. Otherwise, if any combination produces a "match", the result
394 is a match. Otherwise the result is "no-match". For the ordering
395 function, the smallest ordinal character string from the first set of
396 values is compared to the smallest ordinal character string from the
397 second set of values.

399 5. Application Protocol Requirements

401 This section describes the requirements and issues that an
402 application protocol which offers searching, substring matching
403 and/or sorting and permits the use of characters outside the US-ASCII
404 charset needs to consider.

406 5.1 Character Encoding

408 The protocol specification has to make sure that it is clear on which
409 characters (rather than just octets) the collations are used. This
410 can be done by specifying the protocol itself in terms of characters
411 (e.g. in the case of a query language), by specifying a single
412 character encoding for the protocol (e.g. UTF-8 [3]), or by
413 carefully describing the relevant issues of character encoding
414 labeling and conversion. In the latter case, details to consider
415 include how to handle unknown charsets, any charsets which are
416 mandatory-to-implement, any issues with byte-order that might apply,
417 and any transfer encodings which need to be supported.

419 5.2 Operations

421 The protocol must specify which of the operations defined in this
422 specification (equality matching, substring matching and ordering)
423 can be invoked in the protocol, and how they are invoked. There may
424 be more than one way to invoke an operation.

426 The protocol MUST provide a mechanism for the client to select the
427 collation to use with equality matching, substring matching and
428 ordering.

430 If the protocol provides positional information for the results of a
431 substring match, that positional information MUST fully specify the
432 substring in the result that matches independent of the length of the
433 search string. For example, returning both the starting and ending
434 offset of the match would suffice, as would the starting offset and a
435 length. Returning just the starting offset is not acceptable. This
436 rule is necessary because advanced collations can treat strings of
437 different lengths as equal (for example, pre-composed and decomposed
438 accented characters).

440 5.3 Wildcards

442 The protocol MUST specify whether it allows the use of wildcards in
443 collation identifiers or not. If the protocol allows wildcards,
444 then:
445    The protocol MUST specify how comparisons behave in the absence of
446    explicit collation negotiation or when a collation of "*" is
447    requested. The protocol MAY specify that the default collation
448    used in such circumstances is sensitive to server configuration.
449    The protocol SHOULD provide a way to list available collations
450    matching a given wildcard pattern or patterns.
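
The wildcard rules of Sections 3.2 and 5.3 can be illustrated with a
minimal Python sketch. It is not part of the draft. The function names
and the sample list of implemented collations are assumptions made for
illustration only. The sketch checks a pattern against the
collation-wild ABNF and lists the matching names; choosing among several
matches (broadest scope, most recent tables, most operations) and the
special meaning of a lone "*" are left to the protocol and are not
modeled here.

```python
import re

# collation-char = ALPHA / DIGIT / "-" / ";" / "=" / "."
COLLATION_CHAR = r"[A-Za-z0-9;=.-]"

# collation-wild = ("*" / (ALPHA ["*"])) *(collation-char ["*"])
# A "*" may only follow the start or a collation-char, so adjacent
# wildcards are rejected by construction.
WILD_RE = re.compile(r"(?:\*|[A-Za-z]\*?)(?:%s\*?)*" % COLLATION_CHAR)


def is_valid_wild(pattern: str) -> bool:
    """Return True if pattern fits collation-wild (255 characters max)."""
    return len(pattern) <= 255 and WILD_RE.fullmatch(pattern) is not None


def compile_wild(pattern: str) -> "re.Pattern[str]":
    """Translate a collation-wild pattern into a regular expression.

    Each "*" matches zero or more collation-chars; everything else is
    matched literally.
    """
    if not is_valid_wild(pattern):
        raise ValueError("not a valid collation-wild: %r" % pattern)
    literal_parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile((COLLATION_CHAR + "*").join(literal_parts))


def matching_collations(pattern: str, implemented: list) -> list:
    """List implemented collation names matched by pattern, in order.

    A lone "*" means "protocol default" in Section 3.2; callers are
    assumed to handle that case before calling this function.
    """
    regex = compile_wild(pattern)
    return [name for name in implemented if regex.fullmatch(name)]


# Hypothetical server inventory, using names that appear in this memo.
IMPLEMENTED = ["i;octet", "i;ascii-casemap", "i;ascii-numeric",
               "en;ascii-casemap"]

if __name__ == "__main__":
    print(matching_collations("i;ascii-*", IMPLEMENTED))
    print(matching_collations("*casemap", IMPLEMENTED))
    print(is_valid_wild("i;**"))
```

A protocol that lists collations for a pattern, as 5.3 suggests, could
return the result of matching_collations directly. A protocol that must
select a single collation would still need the ranking rule of
Section 3.2 on top of it.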

452 5.4 Canonicalization Function

454 If the protocol provides a canonicalization function for strings,
455 then use of collations MAY be appropriate for that function. [Need
456 to describe how that would be done.]

458 5.5 Disconnected Clients

460 If the protocol supports disconnected clients, then a mechanism for
461 the client to precisely replicate the server's collation algorithm is
462 likely desirable. Thus the protocol MAY wish to provide a command to
463 fetch lookup tables used by charset conversions and collations.

465 5.6 Error Codes

467 The protocol specification should consider assigning protocol error
468 codes for the following circumstances:
469 o The client requests the use of a collation by name or pattern, but
470   no implemented collation matches that pattern.
471 o The client attempts to use a collation for a function that is not
472   supported by that collation. For example, attempting to use the
473   "i;ascii-numeric" collation for a substring matching function.
474 o The client uses an equality or substring matching collation and
475   the result is an error. It may be appropriate to distinguish
476   between the two input strings, particularly when one is supplied
477   by the client and one is stored by the server. It might also be
478   appropriate to distinguish the specific case of an invalid UTF-8
479   string.

481 5.7 Octet Collation

483 If the protocol permits the use of the i;octet (Section 9.5)
484 collation, it has to say so. The octet collation SHOULD NOT be used
485 unless the protocol uses UTF-8 as its single character encoding.

487 If the protocol permits the use of collations with data structures
488 other than strings, the protocol MUST describe the default behavior
489 for a collation with that data structure.

491 6. Use by ACAP and Sieve

493 Both ACAP [11] and Sieve [15] are standards track specifications
494 which used collations prior to the creation of this specification and
495 registry.
Those standards do not meet all the application protocol
496 requirements described in Section 5. For backwards compatibility,
497 those protocols use the "i;ascii-casemap" instead of "en;ascii-
498 casemap". These protocols allow the use of the i;octet (Section 9.5)
499 collation working directly on UTF-8 data as used in these protocols.

501 7. Collation Registration

503 7.1 Collation Registration Procedure

505 The IETF will create a mailing list, collation@ietf.org, which can be
506 used for public discussion of collation proposals prior to
507 registration. Use of the mailing list is encouraged but not
508 required. The actual registration procedure will not begin until the
509 completed registration template is sent to iana@iana.org. The IESG
510 will appoint a designated expert who will monitor the
511 collation@ietf.org mailing list and review registrations forwarded
512 from IANA. The designated expert is expected to tell IANA and the
513 submitter of the registration within two weeks whether the
514 registration is approved, approved with minor changes, or rejected
515 with cause. When a registration is rejected with cause, it can be
516 re-submitted if the concerns listed in the cause are addressed.
517 Decisions made by the designated expert can be appealed to the IESG
518 and subsequently follow the normal appeals procedure for IESG
519 decisions.

521 Collation registrations in a standards track, BCP or IESG-approved
522 experimental RFC are owned by the IETF, and changes to the
523 registration follow normal procedures for updating such documents.
524 Collation registrations in other RFCs are owned by the RFC author(s).
525 Other collation registrations are owned by the individual(s) listed
526 in the contact field of the registration and IANA will preserve this
527 information. Changes to a registration MUST be approved by the
528 owner.
In the event the owner cannot be contacted for a period of 529 one month and a change is deemed necessary, the IESG MAY re-assign 530 ownership to an appropriate party. 532 7.2 Collation Registration Format 534 Registration of a collation is done by sending a well-formed XML 535 document that validates with collationreg.dtd (Section 7.3). 537 7.2.1 Registration Template 539 Here is a template for the registration: 541 542 543 544 collation name 545 technical title for collation 546 equality order substring 547 specification reference 548 email address of owner or IETF 549 email address of submitter 550 1 551 3.2 552 3.1.1 553 555 7.2.2 The collation Element 557 The root of the registration document MUST be a element. 558 The collation element contains the other elements in the 559 registration, which are described in the following sub-subsections, 560 in the order given here. 562 The element MAY include an "rfc=" attribute if the 563 specification is in an RFC. The "rfc=" attribute gives only the 564 number of the RFC, without any prefix, such as "RFC", or suffix, such 565 as ".txt". 567 The element MUST include a "scope=" attribute, which MUST 568 have one of the values "i18n", "local" or "other". 570 The element MUST include an "intendedUse=" attribute, 571 which must have one fo the values "common", "limited", "vendor", or 572 "deprecated". Collation specifications intended for "common" use are 573 expected to reference standards from standards bodies with 574 significant experience dealing with the details of international 575 character sets. 577 Be aware that future revisions of this specification may add 578 additional function types, as well as additional XML attributes and 579 values. Any system which automatically parses these XML documents 580 MUST take this into account to preserve future compatibility. 
   A DTD for the current definition of the collation registration
   template is given in Section 7.3.

7.2.3 The name Element

   The <name> element gives the precise name of the collation. The
   <name> element is mandatory.

7.2.4 The title Element

   The <title> element gives the title of the collation. The <title>
   element is mandatory.

7.2.5 The functions Element

   The <functions> element lists which of the three functions the
   collation provides. The <functions> element is mandatory.

7.2.6 The specification Element

   The <specification> element describes where to find the
   specification. The <specification> element is mandatory. It MAY
   have a "uri=" attribute. There may be more than one <specification>
   element. (For example, a collation which has previously been
   specified by a vendor may have been published on that vendor's web
   site, and subsequently by a standards organization.)

   If the specifications differ, the RFC is the definitive
   specification.

7.2.7 The submitter Element

   The <submitter> element provides an RFC 2822 email address for the
   person who submitted the registration. It is optional if the <owner>
   element contains an email address.

   There may be more than one <submitter> element.

7.2.8 The owner Element

   The <owner> element contains either the four letters "IETF" or an
   email address of the owner of the registration. The <owner> element
   is mandatory. There may be more than one <owner> element. If so,
   all owners are equal. Each owner can speak for all.

7.2.9 The version Element

   The <version> element is included when the registration is likely to
   be revised or has been revised in such a way that the results change
   for certain input strings. The <version> element is optional.

7.2.10 The UnicodeVersion Element

   The <UnicodeVersion> element indicates the version number of the
   UnicodeData file on which the collation is based.
   The <UnicodeVersion> element is optional.

7.2.11 The UCAVersion Element

   The <UCAVersion> element specifies the version of the Unicode
   Collation Algorithm on which the collation is based. The
   <UCAVersion> element is optional.

7.2.12 The UCAMatchLevel Element

   The <UCAMatchLevel> element specifies the number of Unicode Collation
   Algorithm sort key levels used for the equality and substring
   operations. The <UCAMatchLevel> element is optional.

7.3 DTD for Collation Registration

   <!--
     DTD for Collation Registration Document

     Data types:

       entity        description
       ======        ===========
       NUMBER        [0-9]+
       URI           As defined in RFC YYYY
       CTEXT         printable ASCII text (no line-terminators)
       TEXT          character data
   -->
   <!ENTITY % NUMBER "CDATA">
   <!ENTITY % URI    "CDATA">
   <!ENTITY % CTEXT  "#PCDATA">
   <!ENTITY % TEXT   "#PCDATA">
   <!ELEMENT collation (name,title,functions,specification+,owner+,
                        submitter*,version?,UnicodeVersion?,
                        UCAVersion?,UCAMatchLevel?)>
   <!ATTLIST collation
             rfc         %NUMBER; "0"
             scope       (i18n|local|other) #IMPLIED
             intendedUse (common|limited|vendor|deprecated) #IMPLIED>
   <!ELEMENT name           (%CTEXT;)>
   <!ELEMENT title          (%CTEXT;)>
   <!ELEMENT functions      (%CTEXT;)>
   <!ELEMENT specification  (%TEXT;)>
   <!ATTLIST specification
             uri         %URI; "">
   <!ELEMENT owner          (%CTEXT;)>
   <!ELEMENT submitter      (%CTEXT;)>
   <!ELEMENT version        (%CTEXT;)>
   <!ELEMENT UnicodeVersion (%CTEXT;)>
   <!ELEMENT UCAVersion     (%CTEXT;)>
   <!ELEMENT UCAMatchLevel  (%CTEXT;)>

7.4 Structure of Collation Registry

   Once the registration is approved, IANA will store each XML
   registration document at a URL of the form
   http://www.iana.org/assignments/collation/collation-name.xml where
   collation-name is the contents of the name element in the
   registration.
   Both the submitter and the designated expert are
   responsible for verifying that the XML is well-formed and complies
   with the DTD.

   IANA will also maintain a text summary of the registry under the name
   http://www.iana.org/assignments/collation/summary.txt. This summary
   is divided into four sections. The first section is for collations
   intended for common use. This section is intended for collation
   registrations published in IESG-approved RFCs or for locally scoped
   collations from the primary standards body for that locale. The
   designated expert is encouraged to reject collation registrations
   with an intended use of "common" if the expert believes it should be
   "limited", as it is desirable to keep the number of "common"
   registrations small and high quality. The second section is reserved
   for limited use collations. The third section is reserved for
   registered vendor specific collations. The final section is reserved
   for deprecated collations.

7.5 Example Initial Registry Summary

   The following is an example of how IANA might structure the initial
   registry summary.txt file:

   Collation                                 Functions Scope Reference
   ---------                                 --------- ----- ---------
   Common Use Collations:
   i;nameprep;v=1;uv=3.2                     e, o, s   i18n  [RFC XXXX]
   i;basic;uca=3.1.1;uv=3.2                  e, o, s   i18n  [RFC XXXX]
   i;basic;uca=3.1.1;uv=3.2;match=accent     e, o, s   i18n  [RFC XXXX]
   i;basic;uca=3.1.1;uv=3.2;match=case       e, o, s   i18n  [RFC XXXX]
   en;ascii-casemap                          e, o, s   Local [RFC XXXX]

   Limited Use Collations:
   i;octet                                   e, o, s   Other [RFC XXXX]
   i;ascii-numeric                           e, o      Other [RFC XXXX]

   Vendor Collations:

   Deprecated Collations:
   i;ascii-casemap                           e, o, s   Local [RFC XXXX]

   References
   ----------
   [RFC XXXX] Newman, C., "Internet Application Protocol Collation
              Registry", RFC XXXX, Sun Microsystems, October 2003.

8. Guidelines for Expert Reviewer

   The expert reviewer appointed by the IESG has fairly broad latitude
   for this registry. While a number of collations are expected
   (particularly customizations of the basic collation for localized
   use), an explosion of collations (particularly common use collations)
   is not desirable for widespread interoperability. However, it is
   important for the expert reviewer to provide cause when rejecting a
   registration, and when possible to describe corrective action to
   permit the registration to proceed. The following list includes
   some example reasons to reject a registration with cause:
   o  The registration is not a well-formed XML document that follows
      the DTD.
   o  The registration has intended use of "common", but there is no
      evidence the collation will be widely deployed so it should be
      listed as "limited".
   o  The registration has intended use of "common", but is redundant
      with the functionality of a previously registered "common"
      collation.
   o  The registration has intended use of "common", but the
      specification is not detailed enough to allow interoperable
      implementations by others.
   o  The collation name fails to precisely identify the version numbers
      of relevant tables to use.
   o  The registration fails to meet one of the "MUST" requirements in
      Section 4.
   o  The collation name fails to meet the syntax in Section 3.
   o  The collation specification referenced in the registration is
      vague or has optional features without a clear behavior specified.
   o  The referenced specification does not adequately address security
      considerations specific to that collation.
   o  The registration's operations are needlessly different from those
      of traditional operations.

9. Initial Collations

   This section describes an initial set of collations for the collation
   registry.
9.1 ASCII Numeric Collation

9.1.1 ASCII Numeric Collation Description

   The "i;ascii-numeric" collation is a simple collation intended for
   use with arbitrarily sized decimal numbers stored as octet strings of
   US-ASCII digits (0x30 to 0x39). It supports equality and ordering,
   but does not support the substring function. The algorithm is as
   follows:
   1. If neither string begins with a digit, return "error" if
      matching, or the result of the "i;octet" collation for ordering.
   2. If the first string begins with a digit and the second string
      does not, return "error" if matching and "-1" for ordering.
   3. If the second string begins with a digit and the first string
      does not, return "error" if matching and "+1" for ordering.
   4. Let "n" be the number of digits at the beginning of the first
      string, and "m" be the number of digits at the beginning of the
      second string.
   5. If n is equal to m, return the result of the "i;octet" collation.
   6. If n is greater than m, prepend a string of "n - m" zeros to the
      second string and return the result of the "i;octet" collation.
   7. If m is greater than n, prepend a string of "m - n" zeros to the
      first string and return the result of the "i;octet" collation.

   The associated canonicalization algorithm is to truncate the input
   string at the first non-digit character.

9.1.2 ASCII Numeric Collation Registration

   <?xml version='1.0'?>
   <!DOCTYPE collation SYSTEM 'collationreg.dtd'>
   <collation rfc="XXXX" scope="other" intendedUse="limited">
     <name>i;ascii-numeric</name>
     <title>ASCII Numeric</title>
     <functions>equality order</functions>
     <specification>RFC XXXX</specification>
     <owner>IETF</owner>
     <submitter>chris.newman@sun.com</submitter>
   </collation>

9.2 ASCII Casemap Collation

9.2.1 ASCII Casemap Collation Description

   The "en;ascii-casemap" collation is a simple collation intended for
   use with English language text in pure US-ASCII. It provides
   equality, substring and ordering functions.
   The algorithm first
   applies a canonicalization algorithm to both input strings which
   subtracts 32 (0x20) from all octet values between 97 (0x61) and 122
   (0x7A) inclusive. The result of the collation is then the same as
   the result of the "i;octet" collation for the canonicalized strings.
   Care should be taken when using OS-supplied functions to implement
   this collation: the collation is not locale sensitive, but functions
   such as strcasecmp and toupper can be.

   For historical reasons, in the context of ACAP and Sieve, the name
   "i;ascii-casemap" is a synonym for this collation.

9.2.2 Legacy English Casemap Collation Registration

   <?xml version='1.0'?>
   <!DOCTYPE collation SYSTEM 'collationreg.dtd'>
   <collation rfc="XXXX" scope="local" intendedUse="deprecated">
     <name>i;ascii-casemap</name>
     <title>Legacy English Casemap</title>
     <functions>equality order substring</functions>
     <specification>RFC XXXX</specification>
     <owner>IETF</owner>
     <submitter>chris.newman@sun.com</submitter>
   </collation>

9.2.3 English Casemap Collation Registration

   <?xml version='1.0'?>
   <!DOCTYPE collation SYSTEM 'collationreg.dtd'>
   <collation rfc="XXXX" scope="local" intendedUse="common">
     <name>en;ascii-casemap</name>
     <title>English Casemap</title>
     <functions>equality order substring</functions>
     <specification>RFC XXXX</specification>
     <owner>IETF</owner>
     <submitter>chris.newman@sun.com</submitter>
   </collation>

9.3 Nameprep Collation

9.3.1 Nameprep Collation Description

   The "i;nameprep;v=1;uv=3.2" collation is an implementation of the
   nameprep [7] specification based on normalization tables from Unicode
   version 3.2. This collation applies the nameprep canonicalization
   function to both input strings and then returns the result of the
   i;octet collation on the canonicalized strings. While this collation
   offers all three functions, the ordering function it provides is
   inadequate for use by the majority of the world.

   Version number 1 is applied to nameprep as specified in RFC 3491. If
   the nameprep specification is revised without any changes that would
   produce different results when given the same pair of input octet
   strings, then the version number will remain unchanged.
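
   The canonicalize-then-compare structure described above can be
   sketched in Python as follows. This is an illustration only, not
   part of the registration. It assumes the standard library function
   encodings.idna.nameprep as the RFC 3491 implementation; the helper
   names nameprep_canonicalize, nameprep_order and nameprep_equal are
   hypothetical. Input that nameprep prohibits raises UnicodeError,
   which a caller would presumably map to the "error" result.

```python
from encodings.idna import nameprep  # stdlib nameprep (RFC 3491)


def nameprep_canonicalize(text: str) -> bytes:
    """Apply nameprep, then encode the result as UTF-8 octets."""
    return nameprep(text).encode("utf-8")


def nameprep_order(a: str, b: str) -> int:
    """Order two strings by octet comparison of their canonical forms.

    Returns -1, 0 or +1. Python compares bytes objects octet by octet
    using unsigned values, which is used here as a stand-in for the
    i;octet ordering function.
    """
    ka, kb = nameprep_canonicalize(a), nameprep_canonicalize(b)
    return (ka > kb) - (ka < kb)


def nameprep_equal(a: str, b: str) -> bool:
    """Equality holds when the ordering function would return 0."""
    return nameprep_order(a, b) == 0
```

   Under these assumptions, nameprep_equal("Example", "EXAMPLE") holds
   because nameprep case-folds both strings before the octet
   comparison takes place.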
   The table numbers for tables used by nameprep are as follows:

   +--------------+-----------------------+
   | Table Number | Table Name            |
   +--------------+-----------------------+
   | 1            | UnicodeData-3.2.0.txt |
   | 2            | Table B.1             |
   | 3            | Table B.2             |
   | 4            | Table C.1.2           |
   | 5            | Table C.2.2           |
   | 6            | Table C.3             |
   | 7            | Table C.4             |
   | 8            | Table C.5             |
   | 9            | Table C.6             |
   | 10           | Table C.7             |
   | 11           | Table C.8             |
   | 12           | Table C.9             |
   +--------------+-----------------------+

9.3.2 Nameprep Collation Registration

   <?xml version='1.0'?>
   <!DOCTYPE collation SYSTEM 'collationreg.dtd'>
   <collation rfc="XXXX" scope="i18n" intendedUse="common">
     <name>i;nameprep;v=1;uv=3.2</name>
     <title>Nameprep</title>
     <functions>equality order substring</functions>
     <specification>RFC XXXX</specification>
     <owner>IETF</owner>
     <submitter>chris.newman@sun.com</submitter>
     <version>1</version>
     <UnicodeVersion>3.2</UnicodeVersion>
   </collation>

9.4 Basic Collation

9.4.1 Basic Collation Description

   The basic collation is intended to provide tolerable results for a
   number of languages for all three functions (equality, substring and
   ordering) so it is suitable as a mandatory-to-implement collation for
   protocols which include ordering support. The ordering function of
   the basic collation is the Unicode Collation Algorithm [8] version 9
   (UCAv9).

   The equality and substring functions are created as described in
   UCAv9 section 8. While that section is informative to UCAv9, it is
   normative to this collation specification.

   This collation is based on Unicode version 3.2, with the following
   tables relevant:
   1. For the normalization step, the Unicode 3.2 UnicodeData file
      (UnicodeData-3.2.0.txt) is used. Column 5 is used to determine
      the canonical decomposition, while column 3 contains the
      canonical combining classes necessary to attain canonical order.
   2. The table of characters which require a logical order exception
      is a subset of the Unicode 3.2 property list (PropList) and
      is included here:

      0E40..0E44    ; Logical_Order_Exception
      # Lo   [5] THAI CHARACTER SARA E..THAI CHARACTER SARA AI MAIMALAI
      0EC0..0EC4    ; Logical_Order_Exception
      # Lo   [5] LAO VOWEL SIGN E..LAO VOWEL SIGN AI

      # Total code points: 10

   3. The table used to translate normalized code points to a sort key
      is the "allkeys" table for UCA version 3.1.1.

   UCAv9 includes a number of configurable parameters and steps labelled
   as potentially optional. The following list summarizes the defaults
   used by this collation:
   o  The logical order exception step is mandatory by default to
      support the largest number of languages.
   o  Steps 2.1.1 to 2.1.3 are mandatory as the repertoire of the basic
      collation is intended to be large.
   o  The second level in the sort key is evaluated forwards by default.
   o  The variable weighting uses the "non-ignorable" option by default.
   o  The semi-stable option is not used by default.
   o  Support for exactly three levels of collation is the default
      behavior.
   o  No preprocessing step is used by the basic collation prior to
      applying the UCAv9 algorithm. Note that an application protocol
      specification MAY require pre-processing prior to the use of any
      collations.
   o  The equality and substring algorithms exclude differences at level
      2 and 3 by default (thus they are case-insensitive and ignore
      accentual distinctions).
   o  The equality and substring algorithms use the "Whole Characters
      Only" feature described in UCAv9 section 8 by default.

   The exact collation name with these defaults is
   "i;basic;uca=3.1.1;uv=3.2". When a specification states that the
   basic collation is mandatory-to-implement, only this specific name is
   mandatory-to-implement.

   In order to allow modification of the optional behaviors, the
   following ABNF is used for variations of the basic collation:

   basic-collation = ("i" / Language-Tag) ";basic;uca=3.1.1;uv=3.2"
                     [";match=accent" / ";match=case"]
                     [";tailor=" 1*collation-char ]

   If multiple modifiers appear, they MUST appear in the order described
   above.
   The modifiers have the following meanings:
   match=accent  Both the first and second levels of the sort keys are
                 considered relevant to the equality and substring
                 operations (rather than the default of first level
                 only). This makes the matching functions sensitive to
                 accentual distinctions.
   match=case    The first three levels of sort keys are considered
                 relevant to the equality and substring operations.
                 This makes the matching functions sensitive to both
                 case and accentual distinctions.

   The default weighting option is "non-ignorable". The "semi-stable"
   sort key option is not used by default.

   The canonicalization algorithm associated with this collation is the
   output of step 3 of the UCAv9 algorithm (described in section 4.3 of
   the UCA specification). This canonicalization is not suitable for
   human consumption.

   Finally, the UCAv9 algorithm permits the "allkeys" table to be
   tailored to a language. People who make quality tailorings are
   encouraged to register those tailorings using the collation registry.
   Tailoring names beginning with "x" are reserved for experimental use,
   are treated as "Limited use" and MUST NOT match wildcards if any
   registered collation is available that does match.
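
   As an illustration of the name syntax and modifier order described
   in this section, the following Python sketch recognizes basic
   collation names and reports how many sort key levels the equality
   and substring operations use. It is a sketch under stated
   assumptions, not a normative parser: the function name
   parse_basic_collation is hypothetical, the Language-Tag pattern
   follows the RFC 3066 shape, and the character class accepted after
   ";tailor=" is an assumed, narrower stand-in for collation-char,
   which is defined in Section 3.

```python
import re

# Pattern mirroring the basic-collation rule. The tailoring character
# class is an assumption; the real rule is 1*collation-char.
_BASIC_NAME = re.compile(
    r"(?P<lang>i|[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*)"
    r";basic;uca=3\.1\.1;uv=3\.2"
    r"(?:;match=(?P<match>accent|case))?"
    r"(?:;tailor=(?P<tailor>[A-Za-z0-9.-]+))?"
)

# Sort key levels used for equality and substring matching:
# default is level 1 only, match=accent uses 2, match=case uses 3.
_MATCH_LEVELS = {None: 1, "accent": 2, "case": 3}


def parse_basic_collation(name: str):
    """Return the parts of a basic collation name, or None if invalid.

    Modifiers are only accepted in the order match, then tailor.
    """
    m = _BASIC_NAME.fullmatch(name)
    if m is None:
        return None
    return {
        "language": m["lang"],
        "match_levels": _MATCH_LEVELS[m["match"]],
        "tailoring": m["tailor"],
    }
```

   With this sketch, a name that places ";tailor=" before ";match="
   is rejected, which reflects the ordering requirement on modifiers.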
9.4.2 Basic Collation Registration

   <?xml version='1.0'?>
   <!DOCTYPE collation SYSTEM 'collationreg.dtd'>
   <collation rfc="XXXX" scope="i18n" intendedUse="common">
     <name>i;basic;uca=3.1.1;uv=3.2</name>
     <title>Basic</title>
     <functions>equality order substring</functions>
     <specification>RFC XXXX</specification>
     <owner>IETF</owner>
     <submitter>chris.newman@sun.com</submitter>
     <UnicodeVersion>3.2</UnicodeVersion>
     <UCAVersion>3.1.1</UCAVersion>
     <UCAMatchLevel>1</UCAMatchLevel>
   </collation>

9.4.3 Basic Accent Sensitive Match Collation Registration

   <?xml version='1.0'?>
   <!DOCTYPE collation SYSTEM 'collationreg.dtd'>
   <collation rfc="XXXX" scope="i18n" intendedUse="common">
     <name>i;basic;uca=3.1.1;uv=3.2;match=accent</name>
     <title>Basic Accent Sensitive Match</title>
     <functions>equality order substring</functions>
     <specification>RFC XXXX</specification>
     <owner>IETF</owner>
     <submitter>chris.newman@sun.com</submitter>
     <UnicodeVersion>3.2</UnicodeVersion>
     <UCAVersion>3.1.1</UCAVersion>
     <UCAMatchLevel>2</UCAMatchLevel>
   </collation>

9.4.4 Basic Case Sensitive Match Collation Registration

   <?xml version='1.0'?>
   <!DOCTYPE collation SYSTEM 'collationreg.dtd'>
   <collation rfc="XXXX" scope="i18n" intendedUse="common">
     <name>i;basic;uca=3.1.1;uv=3.2;match=case</name>
     <title>Basic Case Sensitive Match</title>
     <functions>equality order substring</functions>
     <specification>RFC XXXX</specification>
     <owner>IETF</owner>
     <submitter>chris.newman@sun.com</submitter>
     <UnicodeVersion>3.2</UnicodeVersion>
     <UCAVersion>3.1.1</UCAVersion>
     <UCAMatchLevel>3</UCAMatchLevel>
   </collation>

9.5 Octet Collation

9.5.1 Octet Collation Description

   The "i;octet" collation is a simple and fast collation intended for
   use on binary octet strings rather than on character data. It is the
   only such collation; it is not possible to register additional
   collations with this property. Protocols that want to make this
   collation available have to do so by explicitly allowing it. If not
   explicitly allowed, it MUST NOT be used. It never returns an "error"
   result. It provides equality, substring and ordering functions.

   The ordering algorithm is as follows:
   1. If both strings are the empty string, return the result "0".
   2. If the first string is empty and the second is not, return the
      result "-1".
   3. If the second string is empty and the first is not, return the
      result "+1".
   4. If both strings begin with the same octet value, remove the first
      octet from both strings and repeat this algorithm from step 1.
   5. If the unsigned value (0 to 255) of the first octet of the first
      string is less than the unsigned value of the first octet of the
      second string, then return "-1".
   6. If this step is reached, return "+1".
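
   The ordering steps above can be sketched in Python as follows.
   This is a minimal illustration, not a reference implementation; the
   function name octet_order is hypothetical, and the loop form is an
   iterative restatement of the "remove the first octet and repeat"
   recursion in step 4.

```python
def octet_order(a: bytes, b: bytes) -> int:
    """Order two octet strings, returning -1, 0 or +1.

    Iterating over bytes yields unsigned integers in the range
    0 to 255, so the comparison in the loop matches steps 5 and 6.
    """
    for x, y in zip(a, b):
        if x != y:
            # First differing octet decides the result (steps 5, 6).
            return -1 if x < y else 1
    # One string is a prefix of the other, or they are identical:
    # this corresponds to steps 1 to 3 after repeated use of step 4.
    if len(a) == len(b):
        return 0
    return -1 if len(a) < len(b) else 1
```

   Because every octet is compared as an unsigned value, an octet such
   as 0xFF orders after any US-ASCII octet.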
   This algorithm is roughly equivalent to the C library function memcmp
   with appropriate length checks added.

   The matching function returns "match" if the ordering algorithm would
   return "0". Otherwise the matching function returns "no-match".

   The substring function returns "match" if the first string is the
   empty string, or if there exists a substring of the second string of
   length equal to the length of the first string which would result in
   a "match" result from the equality function. Otherwise the substring
   function returns "no-match".

   The associated canonicalization algorithm is the identity function.

9.5.2 Octet Collation Registration

   This collation is defined with intendedUse="limited" because it can
   only be used by protocols that explicitly allow it.

   <?xml version='1.0'?>
   <!DOCTYPE collation SYSTEM 'collationreg.dtd'>
   <collation rfc="XXXX" scope="other" intendedUse="limited">
     <name>i;octet</name>
     <title>Octet</title>
     <functions>equality order substring</functions>
     <specification>RFC XXXX</specification>
     <owner>IETF</owner>
     <submitter>chris.newman@sun.com</submitter>
   </collation>

10. IANA Considerations

   Section 7 defines how to register collations with IANA. Section 9
   defines a list of predefined collations, which should be registered
   when this document is approved and published as an RFC.

11. Security Considerations

   Collations will normally be used with UTF-8 strings. Thus the
   security considerations for UTF-8 [3] and stringprep [6] also apply
   and are normative to this specification.

12. Open Issues

   See http://www.w3.org/2004/08/ietf-collation.

13. Change Log

13.1 Changes From -03

   (This does not include all changes made.)
   1. Checked and resolved most issues marked 'check whether this is
      true' or similar.
   2. Resolved nameprep issue: No.
   3. Removed NULL for compatibility with existing collations (IMAP
      SORT, Sieve).
   4. There can be multiple owners and submitters. Say how.
   5. Added a requirement that common collations must now be
      interoperable. Insufficiently detailed specs cannot be "common".
   6. Added a guideline that the operations provided by new collations
      should be reminiscent of similar operations on existing
      collations.

13.2 Changes From -02

   1. Changed from data being octet sequences (in UTF-8) to data being
      character sequences (with octet collation as an exception).
   2. Made XML format description much more structured.
   3. Changed to , because this spelling is much
      more common.
   4. Defined 'protocol' to include query languages.
   5. Reorganized document, in particular IANA considerations section
      (which newly is just a list of pointers).
   6. Added subsections, and a 'Structure of this Document' section.
   7. Updated references.
   8. Created a 'Change Log' chapter, with sections for each draft.
   9. Reduced 'Open issues' section, open issues are now maintained at
      http://www.w3.org/2004/08/ietf-collation.

13.3 Changes From -01

   Add IANA comment to open issues. Otherwise this is just a re-publish
   to keep the document alive.

13.4 Changes From -00

   1. Replaced the term comparator with collation. While comparator is
      somewhat more precise because these abstract functions are used
      for matching as well as ordering, collation is the term used by
      other parts of the industry. Thus I have changed the name to
      collation for consistency.
   2. Remove all modifiers to the basic collation except for the
      customization and the match rules. The other behavior
      modifications can be specified in a customization of the
      collation.
   3. Use ";" instead of "-" as delimiter between parameters to make
      names more URL-ish.
   4. Add URL form for comparator reference.
   5. Switched registration template to use XML document.
   6. Added a number of useful registration template elements related
      to the Unicode Collation Algorithm.
   7. Switched language from "custom" to "tailor" to match UCA language
      for tailoring of the collation algorithm.

14. References

14.1 Normative References

   [1] Bradner, S., "Key words for use in RFCs to Indicate Requirement
       Levels", BCP 14, RFC 2119, March 1997.

   [2] Crocker, D. and P. Overell, "Augmented BNF for Syntax
       Specifications: ABNF", RFC 2234, November 1997.

   [3] Yergeau, F., "UTF-8, a transformation format of ISO 10646",
       STD 63, RFC 3629, November 2003.

   [4] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
       Resource Identifier (URI): Generic Syntax",
       draft-fielding-uri-rfc2396bis-07.txt (work in progress),
       April 2004.

   [5] Alvestrand, H., "Tags for the Identification of Languages",
       BCP 47, RFC 3066, January 2001.

   [6] Hoffman, P. and M. Blanchet, "Preparation of Internationalized
       Strings ("stringprep")", RFC 3454, December 2002.

   [7] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep Profile for
       Internationalized Domain Names (IDN)", RFC 3491, March 2003.

   [8] Davis, M. and K. Whistler, "Unicode Collation Algorithm version
       9", July 2002.

14.2 Informative References

   [9] Freed, N. and N. Borenstein, "Multipurpose Internet Mail
       Extensions (MIME) Part One: Format of Internet Message Bodies",
       RFC 2045, November 1996.

   [10] Myers, J., "Simple Authentication and Security Layer (SASL)",
        RFC 2222, October 1997.

   [11] Newman, C. and J. Myers, "ACAP -- Application Configuration
        Access Protocol", RFC 2244, November 1997.

   [12] Narten, T. and H. Alvestrand, "Guidelines for Writing an IANA
        Considerations Section in RFCs", BCP 26, RFC 2434,
        October 1998.

   [13] Resnick, P., "Internet Message Format", RFC 2822, April 2001.

   [14] Freed, N. and J. Postel, "IANA Charset Registration
        Procedures", BCP 19, RFC 2978, October 2000.
   [15] Showalter, T., "Sieve: A Mail Filtering Language", RFC 3028,
        January 2001.

Authors' Addresses

   Chris Newman
   Sun Microsystems
   1050 Lakes Drive
   West Covina, CA 91790
   US

   Email: chris.newman@sun.com

   Martin Duerst (Note: Please write "Duerst" with u-umlaut wherever possible, for example as "Dürst" in XML and HTML.)
   Aoyama Gakuin University
   5-10-1 Fuchinobe
   Sagamihara, Kanagawa 229-8558
   Japan

   Phone: +81 466 49 1170
   Fax: +81 466 49 1171
   Email: mailto:duerst@it.aoyama.ac.jp
   URI: http://www.sw.it.aoyama.ac.jp/D%C3%BCrst/

   Arnt Gulbrandsen
   Oryx Mail Systems GmbH
   Schweppermannstr. 8
   Munich 81671
   Germany

   Phone: +49 89 4502 9757
   Fax: +49 89 4502 9758
   Email: mailto:arnt@oryx.com
   URI: http://www.oryx.com/arnt/

Intellectual Property Statement

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights. Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.
   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard. Please address the information to the IETF at
   ietf-ipr@ietf.org.

Disclaimer of Validity

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Copyright Statement

   Copyright (C) The Internet Society (2005). This document is subject
   to the rights, licenses and restrictions contained in BCP 78, and
   except as set forth therein, the authors retain all their rights.

Acknowledgment

   Funding for the RFC Editor function is currently provided by the
   Internet Society.