idnits 2.17.1 draft-ietf-ltru-matching-10.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** It looks like you're using RFC 3978 boilerplate. You should update this to the boilerplate described in the IETF Trust License Policy document (see https://trustee.ietf.org/license-info), which is required now. -- Found old boilerplate from RFC 3978, Section 5.1 on line 16. -- Found old boilerplate from RFC 3978, Section 5.5 on line 790. -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 767. -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 774. -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 780. ** This document has an original RFC 3978 Section 5.4 Copyright Line, instead of the newer IETF Trust Copyright according to RFC 4748. ** This document has an original RFC 3978 Section 5.5 Disclaimer, instead of the newer disclaimer which includes the IETF Trust according to RFC 4748. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- == No 'Intended status' indicated for this document; assuming Proposed Standard Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- No issues found here. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not match the current year == The document seems to lack the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. (The document does seem to have the reference to RFC 2119 which the ID-Checklist requires). 
-- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (February 23, 2006) is 6635 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Unused Reference: 'RFC2616errata' is defined on line 707, but no explicit reference was found in the text ** Obsolete normative reference: RFC 4234 (Obsoleted by RFC 5234) -- Obsolete informational reference (is this intentional?): RFC 1766 (Obsoleted by RFC 3066, RFC 3282) -- Obsolete informational reference (is this intentional?): RFC 2616 (Obsoleted by RFC 7230, RFC 7231, RFC 7232, RFC 7233, RFC 7234, RFC 7235) -- Duplicate reference: RFC2616, mentioned in 'RFC2616errata', was also mentioned in 'RFC2616'. -- Obsolete informational reference (is this intentional?): RFC 2616 (Obsoleted by RFC 7230, RFC 7231, RFC 7232, RFC 7233, RFC 7234, RFC 7235) -- Obsolete informational reference (is this intentional?): RFC 3066 (Obsoleted by RFC 4646, RFC 4647) Summary: 4 errors (**), 0 flaws (~~), 4 warnings (==), 12 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group A. Phillips, Ed. 3 Internet-Draft Yahoo! Inc 4 Obsoletes: 3066 (if approved) M. Davis, Ed. 
5 Expires: August 27, 2006 Google 6 February 23, 2006 8 Matching of Language Tags 9 draft-ietf-ltru-matching-10 11 Status of this Memo 13 By submitting this Internet-Draft, each author represents that any 14 applicable patent or other IPR claims of which he or she is aware 15 have been or will be disclosed, and any of which he or she becomes 16 aware will be disclosed, in accordance with Section 6 of BCP 79. 18 Internet-Drafts are working documents of the Internet Engineering 19 Task Force (IETF), its areas, and its working groups. Note that 20 other groups may also distribute working documents as Internet- 21 Drafts. 23 Internet-Drafts are draft documents valid for a maximum of six months 24 and may be updated, replaced, or obsoleted by other documents at any 25 time. It is inappropriate to use Internet-Drafts as reference 26 material or to cite them other than as "work in progress." 28 The list of current Internet-Drafts can be accessed at 29 http://www.ietf.org/ietf/1id-abstracts.txt. 31 The list of Internet-Draft Shadow Directories can be accessed at 32 http://www.ietf.org/shadow.html. 34 This Internet-Draft will expire on August 27, 2006. 36 Copyright Notice 38 Copyright (C) The Internet Society (2006). 40 Abstract 42 This document describes different mechanisms for comparing, matching, 43 and evaluating language tags. Possible algorithms for language 44 negotiation or content selection, filtering, and lookup are 45 described. This document, in combination with RFC 3066bis (Ed.: 46 replace "3066bis" with the RFC number assigned to 47 draft-ietf-ltru-registry-14), replaces RFC 3066, which replaced RFC 48 1766. 50 Table of Contents 52 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 53 2. The Language Range . . . . . . . . . . . . . . . . . . . . . . 4 54 2.1. Basic Language Range . . . . . . . . . . . . . . . . . . . 4 55 2.2. Extended Language Range . . . . . . . . . . . . . . . . . 5 56 2.3. The Language Priority List . . . . . . . . . . . 
. . . . . 5 57 3. Types of Matching . . . . . . . . . . . . . . . . . . . . . . 7 58 3.1. Choosing a Type of Matching . . . . . . . . . . . . . . . 7 59 3.2. Filtering . . . . . . . . . . . . . . . . . . . . . . . . 8 60 3.2.1. Basic Filtering . . . . . . . . . . . . . . . . . . . 9 61 3.2.2. Extended Filtering . . . . . . . . . . . . . . . . . . 10 62 3.3. Lookup . . . . . . . . . . . . . . . . . . . . . . . . . . 10 63 4. Other Considerations . . . . . . . . . . . . . . . . . . . . . 14 64 4.1. Choosing Language Ranges . . . . . . . . . . . . . . . . . 14 65 4.2. Meaning of Language Tags and Ranges . . . . . . . . . . . 15 66 4.3. Considerations for Private Use Subtags . . . . . . . . . . 15 67 4.4. Length Considerations in Matching . . . . . . . . . . . . 15 68 5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 17 69 6. Changes . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 70 7. Security Considerations . . . . . . . . . . . . . . . . . . . 19 71 8. Character Set Considerations . . . . . . . . . . . . . . . . . 20 72 9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 21 73 9.1. Normative References . . . . . . . . . . . . . . . . . . . 21 74 9.2. Informative References . . . . . . . . . . . . . . . . . . 21 75 Appendix A. Acknowledgements . . . . . . . . . . . . . . . . . . 22 76 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 23 77 Intellectual Property and Copyright Statements . . . . . . . . . . 24 79 1. Introduction 81 Human beings on our planet have, past and present, used a number of 82 languages. There are many reasons why one would want to identify the 83 language used when presenting or requesting information or in some 84 specific set of information items or "content". 86 One use for language identifiers, such as those defined in 87 [RFC3066bis], is to select content by matching the associated 88 language tags to a user's language preferences. 
90 This document defines a syntax (called a language range (Section 2)) 91 for specifying items in the user's language preferences (called a 92 language priority list (Section 2.3)), as well as several schemes for 93 selecting or filtering sets of content by comparing the content's 94 language tags to the user's preferences. Applications, protocols, or 95 specifications will have varying needs and requirements that affect 96 the choice of a suitable matching scheme. Depending on the choice of 97 scheme, there are various options left to the implementation. 98 Protocols that implement a matching scheme either need to specify 99 each particular choice or indicate the options that are left to the 100 implementation to decide. 102 This document is divided into three main sections. One describes how 103 to indicate a user's preferences using language ranges. Then a 104 section describes various schemes for matching these ranges to a set 105 of language tags. There is also a section that deals with various 106 practical considerations that apply to implementing and using these 107 schemes. 109 This document, in combination with [RFC3066bis] (Ed.: replace 110 "3066bis" globally in this document with the RFC number assigned to 111 draft-ietf-ltru-registry-14), replaces [RFC3066], which replaced 112 [RFC1766]. 114 The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 115 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 116 document are to be interpreted as described in [RFC2119]. 118 2. The Language Range 120 Language Tags [RFC3066bis] are used to identify the language of some 121 information item or content. Applications or protocols that use 122 language tags are often faced with the problem of identifying sets of 123 content that share certain language attributes. 
For example, 124 HTTP/1.1 [RFC2616] describes one such mechanism in its discussion of 125 the Accept-Language header (Section 14.4), which is used when 126 selecting content from servers based on the language of that content. 128 When selecting content according to its language, it is useful to 129 have a mechanism for identifying sets of language tags that share 130 specific attributes. This allows users to select or filter content 131 based on specific requirements. Such an identifier is called a 132 "Language Range". 134 There are different types of language range, whose specific 135 attributes vary to match their application. Language ranges are 136 similar in content to language tags: they consist of a sequence of 137 subtags separated by hyphens. In a language range, each subtag MUST 138 either be a sequence of ASCII alphanumeric characters or the single 139 character '*' (%2A, ASTERISK). The character '*' is a "wildcard" 140 that matches any sequence of subtags. Restrictions on the meaning 141 and use of wildcards vary according to the type of language range. 143 Language tags and thus language ranges are to be treated as case- 144 insensitive: there exist conventions for the capitalization of some 145 of the subtags, but these MUST NOT be taken to carry meaning. 146 Matching of language tags to language ranges MUST be done in a case- 147 insensitive manner. 149 2.1. Basic Language Range 151 A "basic language range" identifies the set of language tags that all 152 begin with the same sequence of subtags. Each range consists of a 153 sequence of alphanumeric subtags separated by hyphens. The basic 154 language range is defined by the following ABNF [RFC4234]: 156 language-range = (1*8ALPHA *("-" 1*8alphanum)) / "*" 157 alphanum = ALPHA / DIGIT 159 Basic language ranges (originally described by HTTP/1.1 [RFC2616] and 160 later [RFC3066]) have the same syntax as an [RFC3066] language tag or 161 are the single character "*". 
They differ from the language tags 162 defined in [RFC3066bis] only in that there is no requirement that 163 they be "well-formed" or be validated against the IANA Language 164 Subtag Registry (although such ill-formed ranges will probably not 165 match anything). (Note that the ABNF [RFC4234] in [RFC2616] is 166 incorrect, since it disallows the use of digits anywhere in the 167 'language-range': this is mentioned in the errata.) 169 Use of a basic language range seems to imply that there is a semantic 170 relationship between language tags that share the same prefix. While 171 this is often the case, it is not always true and users should note 172 that the set of language tags that match a specific language range 173 may not represent mutually intelligible languages. 175 2.2. Extended Language Range 177 Basic language ranges allow users to specify a set of language tags 178 that share the same initial subtags. Occasionally users will wish to 179 select a set of language tags based on the presence of specific 180 subtags. For example, a user might wish to select all language tags 181 that contain the region subtag 'CH'. Extended language ranges are 182 useful in specifying a particular sequence of subtags that appear in 183 the set of matching tags without having to specify all of the 184 intervening subtags. 186 An extended language range can be represented by the following ABNF: 188 extended-language-range = (1*8ALPHA / "*") 189 *("-" (1*8alphanum / "*")) 191 Figure 2: Extended Language Range 193 The wildcard subtag '*' MAY occur in any position in the extended 194 language range, where it matches any sequence of subtags that might 195 occur in that position in a language tag. However, wildcards outside 196 the first position in an extended language range are ignored by most 197 matching schemes. Use of multiple wildcards SHOULD NOT be taken to 198 imply that a certain number of subtags will appear in the matching 199 set of language tags. 
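As an illustration only, the two ABNF productions in Sections 2.1 and 2.2 can be transcribed into regular expressions to check whether a string is syntactically a basic or an extended language range. The function names is_basic_range and is_extended_range are hypothetical and are not defined by this document; the sketch checks syntax only and does not consult the IANA Language Subtag Registry.

```python
import re

# Transcription of: language-range = (1*8ALPHA *("-" 1*8alphanum)) / "*"
BASIC_RANGE = re.compile(r"(?:[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*|\*)")

# Transcription of:
#   extended-language-range = (1*8ALPHA / "*") *("-" (1*8alphanum / "*"))
EXTENDED_RANGE = re.compile(r"(?:[A-Za-z]{1,8}|\*)(?:-(?:[A-Za-z0-9]{1,8}|\*))*")


def is_basic_range(text: str) -> bool:
    """Return True if text matches the basic language range syntax."""
    return BASIC_RANGE.fullmatch(text) is not None


def is_extended_range(text: str) -> bool:
    """Return True if text matches the extended language range syntax."""
    return EXTENDED_RANGE.fullmatch(text) is not None
```

Under this sketch, "de-CH" and "*" are accepted by both functions, while "de-*-DE" and "*-CH" are accepted only as extended language ranges.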
201 Implementations that specify basic ranges MAY map extended language 202 ranges to basic language ranges: if the first subtag is a "*" then 203 the entire range is treated as "*" (which matches the default 204 content), otherwise each wildcard subtag is removed. For example, if 205 the language range were "en-*-US", then the range would be mapped to 206 "en-US". 208 2.3. The Language Priority List 210 When users specify a language preference they often need to specify a 211 prioritized list of language ranges in order to best reflect their 212 language preferences. This is especially true for speakers of 213 minority languages. A speaker of Breton in France, for example, may 214 specify "br" followed by "fr", meaning that if Breton is available, 215 it is preferred, but otherwise French is the best alternative. It 216 can get more complex: a speaker may wish to fall back from Skolt Sami 217 to Northern Sami to Finnish. 219 A "Language Priority List" is a prioritized or weighted list of 220 language ranges. One well-known example of such a list is the 221 "Accept-Language" header defined in RFC 2616 [RFC2616] (see Section 222 14.4) and RFC 3282 [RFC3282]. A simple list of ranges, i.e. one that 223 contains no weighting information, is considered to be in descending 224 order of priority. 226 The various matching operations described in this document include 227 considerations for using a language priority list. This document 228 does not define any syntax for a language priority list; defining 229 such a syntax is the responsibility of the protocol, application, or 230 implementation that uses it. When given as examples in this 231 document, language priority lists will be shown as a quoted sequence 232 of ranges separated by semicolons, like this: "en; fr; zh-Hant" 233 (which would be read as "English before French before Chinese as 234 written in the Traditional script"). 
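The mapping from extended to basic language ranges described at the end of Section 2.2, and the semicolon-separated notation used for examples in this document, can be sketched as follows. Both function names are hypothetical. The semicolon notation is only this document's way of writing examples; it is not a syntax that any protocol is required to use.

```python
def extended_to_basic(language_range: str) -> str:
    """Map an extended language range to a basic language range.

    If the first subtag is "*", the whole range becomes "*";
    otherwise every wildcard subtag is removed.
    """
    subtags = language_range.split("-")
    if subtags[0] == "*":
        return "*"
    return "-".join(subtag for subtag in subtags if subtag != "*")


def split_example_list(text: str) -> list[str]:
    """Split the example notation "en; fr; zh-Hant" into a list of ranges.

    The ranges are returned in the order written, which for an
    unweighted list is descending order of priority.
    """
    return [part.strip() for part in text.split(";") if part.strip()]
```

For instance, extended_to_basic("en-*-US") yields "en-US", and extended_to_basic("*-CH") yields "*".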
236 Where a language priority list provides "quality weights" for the 237 language ranges, such as the use of Q weights in the syntax of the 238 "Accept-Language" header (defined in [RFC2616], Section 14.4, and 239 [RFC3282]), language ranges without a weight are given values equal 240 to the value of the previous language range (processing from first to 241 last). If the first language range has no weight, it is given a 242 value of 1.0. Then language ranges with zero weights are removed. 243 For example, "fr, en;q=0.5, de, it" becomes "fr;q=1.0, en;q=0.5, 244 de;q=0.5, it;q=0.5". The language priority list is then sorted from 245 highest priority to lowest, with language ranges that share the same 246 weight remaining in the same order as in the original language priority 247 list. 249 3. Types of Matching 251 Matching language ranges to language tags can be done in a number of 252 different ways. This section describes several different matching 253 schemes, as well as the considerations for choosing between them. 254 Protocols and specifications SHOULD clearly indicate the particular 255 mechanism used in selecting or matching language tags. 257 There are several types of matching scheme. This document presents 258 two types: those that produce zero or more information items (called 259 "filtering") and those that produce a single information item for a 260 given request (called "lookup"). 262 Implementations or protocols MAY use different matching schemes than 263 the ones described in this document, as long as those mechanisms are 264 clearly specified. 266 3.1. Choosing a Type of Matching 268 Applications, protocols, and specifications are faced with the 269 decision of what type of matching to use. Sometimes, different 270 styles of matching might be suited for different kinds of processing 271 within a particular application or protocol. 273 Language tag matching is a tool, and does not by itself specify a 274 complete procedure for the use of language tags. 
Such procedures are 275 intimately tied to the application protocol in which they occur. 276 When specifying a protocol operation using matching, the protocol 277 MUST specify: 279 o Which type(s) of language tag matching it uses 281 o Whether the operation returns a single result (lookup) or a 282 possibly empty set of results (filtering) 284 o For lookup, what the result is when no matching tag is found. For 285 instance, a protocol might define the result as failure of the 286 operation, an empty value, returning some protocol-defined or 287 implementation-defined default, or returning i-default [RFC2277]. 289 This document describes three types of matching: 291 1. Basic Filtering (Section 3.2.1) matches a language priority list 292 consisting of basic language ranges (Section 2.1) to sets of 293 language tags. 295 2. Extended Filtering (Section 3.2.2) matches a language priority 296 list consisting of extended language ranges (Section 2.2) to sets 297 of language tags. 299 3. Lookup (Section 3.3) matches a language priority list consisting 300 of basic language ranges to sets of language tags to find 301 _exactly_ one language tag that best matches the range. 303 Both types of filtering can be used to produce a set of results (such 304 as a collection of documents) by comparing the user's preferences to 305 language tags associated with the set of content. For example, when 306 performing a search, one might use filtering to limit the results to 307 documents tagged as being written in French. They might also be used 308 when deciding whether to perform a language-sensitive process on some 309 content. For example, a process might cause paragraphs whose 310 language tag matched the language range "nl" to be displayed in 311 italics within a document. 313 Lookup produces the single result that best matches a given set of 314 user preferences, so it is useful in cases in which only a single 315 item can be returned. 
For example, if a process were to insert a 316 human readable error message into a protocol header, it might select 317 the text based on the user's language priority list. Since the 318 process can return only one item, it must choose a single item and it 319 must return some item, even if no content's language tag matches the 320 language priority list supplied by the user. 322 The types of matching in this document are designed so that 323 implementations are not required to validate or understand any of the 324 semantics of the language tags or ranges or of the subtags in them. 325 None of them require access to the IANA Language Subtag Registry (see 326 Section 3 in [RFC3066bis]). This simplifies and speeds the 327 performance of implementations. 329 Regardless of the matching scheme chosen, protocols and 330 implementations MAY canonicalize language tags and ranges by mapping 331 grandfathered and obsolete tags or subtags into modern equivalents. 332 If an implementation canonicalizes either ranges or tags, then the 333 implementation will require the IANA Language Subtag Registry 334 information for that purpose. Implementations MAY also use semantic 335 information external to the registry when matching tags. For 336 example, the primary language subtags 'nn' (Nynorsk Norwegian) and 337 'nb' (Bokmal Norwegian) might both be usefully matched to the more 338 general subtag 'no' (Norwegian). Or an implementation might infer 339 that content labeled "zh-CN" is more likely to match the range "zh- 340 Hans" than equivalent content labeled "zh-TW". 342 3.2. Filtering 344 Filtering is used to select the set of language tags that matches a 345 given language priority list and return the associated content. It 346 is called "filtering" because this set might contain no items at all 347 or it might return an arbitrarily large number of matching items: as 348 many items as match the language priority list, thus "filtering out" 349 the non-matching items. 
351 In filtering, the language range represents the _least_ specific 352 (that is, the fewest number of subtags) language tag which is an 353 acceptable match. All of the language tags in the matching set of 354 tags will have an equal or greater number of subtags than the 355 language range. Every non-wildcard subtag in the language range will 356 appear in every one of the matching language tags. For example, if 357 the language priority list consists of the range "de-CH", one might 358 see tags such as "de-CH-1996" but one will never see a tag such as 359 "de" (because the 'CH' subtag is missing). 361 If the language priority list (see Section 2.3) contains more than 362 one range, the content returned is typically ordered in descending 363 level of preference. 365 Some examples of applications where filtering might be appropriate 366 include: 368 o Applying a style to sections of a document in a particular set of 369 languages. 371 o Displaying the set of documents containing a particular set of 372 keywords written in a specific set of languages. 374 o Selecting all email items written in a specific set of languages. 376 The content returned MAY either be ordered or unordered according to 377 the priority in the language priority list (and other criteria), 378 according to the needs of the application or protocol. 380 3.2.1. Basic Filtering 382 When filtering using basic language ranges, each basic language range 383 in the language priority list is considered in turn, according to 384 priority. A particular language tag matches a language range if it 385 exactly equals the tag, or if it exactly equals a prefix of the tag 386 such that the first character following the prefix is "-". For 387 example, the language-range "de-de" matches the language tag "de-DE- 388 1996", but not the language tags "de-Deva" or "de-Latn-DE". 390 The special range "*" in a language priority list matches any tag. 
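The basic filtering rule above can be sketched in a few lines. This is a minimal illustration, not a normative algorithm: the names basic_match and basic_filter are hypothetical, the priority list is assumed to be already sorted in descending priority, and results are returned in that order without duplicates (the document also permits unordered results).

```python
def basic_match(language_range: str, tag: str) -> bool:
    """Return True if tag matches the basic language range.

    The range matches if it is "*", if it equals the tag, or if it is
    a prefix of the tag that is immediately followed by "-".
    Comparison is case-insensitive.
    """
    if language_range == "*":
        return True
    range_lower = language_range.lower()
    tag_lower = tag.lower()
    return tag_lower == range_lower or tag_lower.startswith(range_lower + "-")


def basic_filter(priority_list: list[str], tags: list[str]) -> list[str]:
    """Return the tags matched by any range, ordered by range priority."""
    matched: list[str] = []
    for language_range in priority_list:
        for tag in tags:
            if tag not in matched and basic_match(language_range, tag):
                matched.append(tag)
    return matched
```

With the example from the text, basic_match("de-de", "de-DE-1996") is true, while basic_match("de-de", "de-Deva") and basic_match("de-de", "de-Latn-DE") are false.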
A 391 protocol which uses language ranges MAY specify additional rules 392 about the semantics of "*"; for instance, HTTP/1.1 [RFC2616] 393 specifies that the range "*" matches only languages not matched by 394 any other range within an "Accept-Language" header. 396 3.2.2. Extended Filtering 398 When filtering using extended language ranges, each extended language 399 range in the language priority list is considered in turn, according 400 to priority. A particular language range is compared to each 401 language tag using the following process: 403 Compare the first subtag in the extended language range to the first 404 subtag in the language tag in a case-insensitive manner. If the 405 first subtag in the range is "*", it matches any value. Otherwise, 406 the two values must match or the overall match fails. 408 Take each non-wildcard subtag in the language range and compare it to 409 the next subtag in the language tag in turn until a matching subtag 410 is found or the language tag is exhausted. If the end of the 411 language tag is found first, the match fails. If a match is found, 412 this step is repeated with the next non-wildcard subtag in the 413 language range (and beginning with the next subtag in the language 414 tag) until the list of subtags in the language range is exhausted or 415 the match fails. 417 Subtags not specified, including those at the end of the language 418 range, are thus treated as if assigned the wildcard value "*". 419 Extended filtering works, therefore, much like basic filtering. For 420 example, the extended language range "de-*-DE" matches all of the 421 following tags: 423 de-DE 425 de-Latn-DE 427 de-Latf-DE 429 de-DE-x-goethe 431 de-Latn-DE-1996 433 3.3. Lookup 435 Lookup is used to select the single language tag that best matches 436 the language priority list for a given request and return the 437 associated content. 
When performing lookup, each language range in 438 the language priority list is considered in turn, according to 439 priority. By contrast with filtering, each language range represents 440 the _most_ specific tag which is an acceptable match. The first 441 content found with a matching tag, according to the user's priority, 442 is considered the closest match and is the content returned. For 443 example, if the language range is "de-ch", a lookup operation might 444 produce content with the tags "de" or "de-CH" but never one with the 445 tag "de-CH-1996". Usually, if no content matches the request, the 446 "default" content is returned. 448 For example, if an application inserts some dynamic content into a 449 document, returning an empty string if there is no exact match is not 450 an option. Instead, the application "falls back" until it finds a 451 matching language tag associated with a suitable piece of content to 452 insert. Examples of lookup might include: 454 o Selection of a template containing the text for an automated email 455 response. 457 o Selection of an item containing some text for inclusion in a 458 particular Web page. 460 o Selection of a string of text for inclusion in an error log. 462 In the lookup scheme, the language range is progressively truncated 463 from the end until a matching piece of content is located. For 464 example, starting with the range "zh-Hant-CN-x-private", the lookup 465 progressively searches for content as shown below: 467 Range to match: zh-Hant-CN-x-private 468 1. zh-Hant-CN-x-private 469 2. zh-Hant-CN 470 3. zh-Hant 471 4. zh 472 5. (default content) 474 Figure 3: Example of a Lookup Fallback Pattern 476 This scheme allows some flexibility in finding a match. For example, 477 when no available content exactly matches the user request, lookup 478 provides better results than would be obtained if the default 479 language for the system or content were returned immediately. 
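The fallback pattern of Figure 3 can be sketched as follows for a single language range. The names fallback_sequence and lookup are hypothetical. One assumption is inferred from Figure 3 rather than stated in the text: when truncation would leave a single-character subtag (such as "x") at the end of the range, that subtag is removed as well, which is why "zh-Hant-CN-x" does not appear in the figure. The value returned when nothing matches is left to the caller, since the default is implementation defined.

```python
from typing import Optional


def fallback_sequence(language_range: str) -> list[str]:
    """List the ranges tried during lookup, most specific first."""
    subtags = language_range.split("-")
    sequence = []
    while subtags:
        sequence.append("-".join(subtags))
        subtags.pop()
        # Assumption inferred from Figure 3: do not leave a
        # single-character subtag (e.g. "x") dangling at the end.
        while subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return sequence


def lookup(language_range: str, available: list[str],
           default: Optional[str] = None) -> Optional[str]:
    """Return the first available tag that equals a fallback candidate.

    Comparison is case-insensitive; default is returned if no
    candidate matches.
    """
    by_lower = {tag.lower(): tag for tag in available}
    for candidate in fallback_sequence(language_range):
        match = by_lower.get(candidate.lower())
        if match is not None:
            return match
    return default
```

Under these assumptions, fallback_sequence("zh-Hant-CN-x-private") produces the four ranges listed in Figure 3, and lookup("de-ch", ["de", "de-CH-1996"]) selects "de" and never "de-CH-1996".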
Not 480 every specific level of tag granularity is usually available or 481 language content may be sparsely populated, so "falling back" through 482 the subtag sequence provides more opportunity to find a match between 483 available language tags and the user's request. 485 The default behavior when no tag matches the language priority list 486 is implementation defined. An implementation might, for example, 487 return content with no language tag; might supply content with an 488 empty language tag value (the built-in attribute xml:lang in [XML10] 489 permits the empty value); might be a particular language designated 490 for the bit of content being selected; or it might select the tag 491 "i-default" (see [RFC2277]). When performing lookup using a language 492 priority list, the progressive search MUST proceed to consider each 493 language range in the list before finding the default content or 494 empty tag. 496 One common way for an application or implementation to provide for a 497 default is to allow a specific language range to be set as the 498 default for a specific type of request. This language range is then 499 treated as if it were appended to the end of the language priority 500 list as a whole, rather than after each item in the language priority 501 list. 503 For example, if a particular user's language priority list were 504 "fr-FR; zh-Hant" and the program doing the matching had a default 505 language range of "ja-JP", the program would search for content as 506 follows: 507 1. fr-FR 508 2. fr 509 3. zh-Hant // next language 510 4. zh 511 5. (search for the default content) 512 a. ja-JP 513 b. ja 514 c. (implementation defined default) 516 Figure 4: Lookup Using a Language Priority List 518 Implementations SHOULD ignore extensions and unrecognized private-use 519 subtags when performing lookup, since these subtags are usually 520 orthogonal to the user's request. 522 The special language range "*" matches any language tag. 
In the 523 lookup scheme, this range does not convey enough information by 524 itself to determine which content is most appropriate, since it 525 matches everything. If the language range "*" is the only one in the 526 language priority list, it matches the default content. If the 527 language range "*" is followed by other language ranges, it should be 528 skipped. 530 In some cases, the language priority list might contain one or more 531 extended language ranges (as, for example, when the same language 532 priority list is used as input for both lookup and filtering 533 operations). Wildcard values in an extended language range normally 534 match any value that occurs in that position in a language tag. 535 Since only one item can be returned for any given lookup request, 536 wildcards in a language range have to be processed in a consistent 537 manner or the same request will produce widely varying results. 538 Implementations that accept extended language ranges MUST define 539 which content is returned when more than one item matches the 540 extended language range. 542 For example, an implementation could return the matching tag that is 543 first in ASCII-order. If the language range were "*-CH" and the set 544 of tags included "de-CH", "fr-CH", and "it-CH", then the tag "de-CH" 545 would be returned. Another example would be for an implementation to 546 map the extended language ranges to basic ranges. 548 4. Other Considerations 550 When working with language ranges and matching schemes, there are 551 some additional points that may influence the choice of either. 553 4.1. Choosing Language Ranges 555 Users indicate their language preferences via the choice of a 556 language range or the list of language ranges in a language priority 557 list. The type of matching affects what the best choice is for a 558 given user. 560 Most matching schemes make no attempt to process the semantic meaning 561 of the subtags. 
The language range (or its subtags) is usually 562 compared in a case-insensitive manner to each language tag being 563 matched, using basic string processing. 565 Users SHOULD avoid subtags that add no distinguishing value to a 566 language range. Generally, the fewer subtags that appear in the 567 language range, the more content the range will match. 569 Most notably, script subtags SHOULD NOT be used to form a language 570 range in combination with language subtags that have a matching 571 Suppress-Script field in their registry entry. Thus, the language 572 range "en-Latn" is probably inappropriate in most cases (because the 573 vast majority of English documents are written in the Latin script 574 and thus the 'en' language subtag has a Suppress-Script field for 575 'Latn' in the registry). 577 When working with tags and ranges, note that extensions and most 578 private-use subtags are orthogonal to language tag matching, in that 579 they specify additional attributes of the text not related to the 580 goals of most matching schemes. Users SHOULD avoid using these 581 subtags in language ranges, since they interfere with the selection 582 of available content. When used in language tags (as opposed to 583 ranges), these subtags normally do not interfere with filtering 584 (Section 3.2), since they appear at the end of the tag and will match 585 all prefixes. 587 Private-use and Extension subtags are normally orthogonal to language 588 tag fallback. Implementations or specifications that use a lookup 589 (Section 3.3) matching scheme often ignore unrecognized private-use 590 and extension subtags when performing language tag fallback. In 591 addition, since these subtags are always at the end of the sequence 592 of subtags, their use in language tags normally doesn't interfere 593 with the use of ranges that omit them in the filtering (Section 3.2) 594 matching schemes described above. 
However, they do interfere with
595     filtering when used in language ranges and SHOULD be avoided in
596     ranges as a result.

598     Applications, specifications, or protocols that choose not to
599     interpret one or more private-use or extension subtags SHOULD NOT
600     remove or modify these extensions in content that they are
601     processing. When a language tag instance is to be used in a
602     specific, known protocol, and is not being passed through to other
603     protocols, language tags MAY be filtered to remove subtags and
604     extensions that are not supported by that protocol. Such filtering
605     SHOULD be avoided, if possible, since it removes information that
606     might be relevant to services on the other end of the protocol that
607     would make use of that information.

609     Some applications of language tags might want or need to consider
610     extensions and private-use subtags when matching tags. If extensions
611     and private-use subtags are included in a matching or filtering
612     process that utilizes one of the schemes described in this document,
613     then the implementation SHOULD canonicalize the language tags and/or
614     ranges before performing the matching. Note that language tag
615     processors that claim to be "well-formed" processors as defined in
616     [RFC3066bis] generally fall into this category.

618  4.2. Meaning of Language Tags and Ranges

620     Selecting content using language ranges requires some understanding
621     by users of what they are selecting. The meaning of the various
622     subtags in a language range is identical to their meaning in a
623     language tag (see Section 4.2 in [RFC3066bis]), with the addition
624     that the wildcard "*" represents any matching sequence of values.

626  4.3.
Considerations for Private Use Subtags

628     Private-use subtags require private agreement between the parties
629     that intend to use or exchange language tags that use them, and great
630     caution SHOULD be used in employing them in content or protocols
631     intended for general use. Private-use subtags are simply useless for
632     information exchange without prior arrangement.

634     The value and semantic meaning of private-use tags and of the subtags
635     used within such a language tag are not defined. Matching private-
636     use tags using language ranges or extended language ranges can result
637     in unpredictable content being returned.

639  4.4. Length Considerations in Matching

641     Language ranges are very similar to language tags in terms of content
642     and usage. The same types of restrictions on length that apply to
643     language tags could also apply to language ranges. Implementation,
644     protocol, and specification authors SHOULD apply the considerations
645     in [RFC3066bis] Section 4.3 (Length Considerations) where appropriate
646     to language ranges and language priority lists.

648  5. IANA Considerations

650     This document presents no new or existing considerations for IANA.

652  6. Changes

654     This is the first version of this document.

656  7. Security Considerations

658     Language ranges used in content negotiation might be used to infer
659     the nationality of the sender, and thus identify potential targets
660     for surveillance. In addition, unique or highly unusual language
661     ranges or combinations of language ranges might be used to track a
662     specific individual's activities.

664     This is a special case of the general problem that anything you send
665     is visible to the receiving party. It is useful to be aware that
666     such concerns can exist in some cases.

668     The evaluation of the exact magnitude of the threat, and any possible
669     countermeasures, is left to each application or protocol.

671  8.
Character Set Considerations

673     Language tags permit only the characters A-Z, a-z, 0-9, and HYPHEN-
674     MINUS (%x2D). Language ranges also use the character ASTERISK
675     (%x2A). These characters are present in most character sets, so
676     presentation or exchange of language tags or ranges should not be
677     constrained by character set issues.

679  9. References

681  9.1. Normative References

683     [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
684                Requirement Levels", BCP 14, RFC 2119, March 1997.

686     [RFC2277]  Alvestrand, H., "IETF Policy on Character Sets and
687                Languages", BCP 18, RFC 2277, January 1998.

689     [RFC3066bis]
690                Phillips, A., Ed. and M. Davis, Ed., "Tags for the
691                Identification of Languages", October 2005, .

695     [RFC4234]  Crocker, D. and P. Overell, "Augmented BNF for Syntax
696                Specifications: ABNF", RFC 4234, October 2005.

698  9.2. Informative References

700     [RFC1766]  Alvestrand, H., "Tags for the Identification of
701                Languages", RFC 1766, March 1995.

703     [RFC2616]  Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
704                Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext
705                Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.

707     [RFC2616errata]
708                IETF, "HTTP/1.1 Specification Errata", 10 2004,
709                .

711     [RFC3066]  Alvestrand, H., "Tags for the Identification of
712                Languages", BCP 47, RFC 3066, January 2001.

714     [RFC3282]  Alvestrand, H., "Content Language Headers", RFC 3282,
715                May 2002.

717     [XML10]    Bray (et al), T., "Extensible Markup Language (XML) 1.0",
718                02 2004.

720  Appendix A. Acknowledgements

722     Any list of contributors is bound to be incomplete; please regard the
723     following as only a selection from the group of people who have
724     contributed to make this document what it is today.
726     The contributors to [RFC3066bis], [RFC3066] and [RFC1766], each of
727     which is a precursor to this document, made enormous contributions
728     directly or indirectly to this document and are generally responsible
729     for the success of language tags.

731     The following people (in alphabetical order by family name)
732     contributed to this document:

734        Harald Alvestrand, Jeremy Carroll, John Cowan, Martin Duerst, Frank
735        Ellermann, Doug Ewell, Marion Gunn, Kent Karlsson, Ira McDonald, M.
736        Patton, Randy Presuhn, Eric van der Poel, Markus Scherer, and many,
737        many others.

739     Very special thanks must go to Harald Tveit Alvestrand, who
740     originated RFCs 1766 and 3066, and without whom this document would
741     not have been possible.

743     For this particular document, John Cowan originated the scoring
744     scheme. Mark Davis originated the scheme described in Section 3.3.

746  Authors' Addresses

748     Addison Phillips (editor)
749     Yahoo! Inc

751     Email: addison at inter dash locale dot com

753     Mark Davis (editor)
754     Google

756     Email: mark dot davis at macchiato dot com

758  Intellectual Property Statement

760     The IETF takes no position regarding the validity or scope of any
761     Intellectual Property Rights or other rights that might be claimed to
762     pertain to the implementation or use of the technology described in
763     this document or the extent to which any license under such rights
764     might or might not be available; nor does it represent that it has
765     made any independent effort to identify any such rights. Information
766     on the procedures with respect to rights in RFC documents can be
767     found in BCP 78 and BCP 79.
769     Copies of IPR disclosures made to the IETF Secretariat and any
770     assurances of licenses to be made available, or the result of an
771     attempt made to obtain a general license or permission for the use of
772     such proprietary rights by implementers or users of this
773     specification can be obtained from the IETF on-line IPR repository at
774     http://www.ietf.org/ipr.

776     The IETF invites any interested party to bring to its attention any
777     copyrights, patents or patent applications, or other proprietary
778     rights that may cover technology that may be required to implement
779     this standard. Please address the information to the IETF at
780     ietf-ipr@ietf.org.

782  Disclaimer of Validity

784     This document and the information contained herein are provided on an
785     "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
786     OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
787     ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
788     INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
789     INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
790     WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

792  Copyright Statement

794     Copyright (C) The Internet Society (2006). This document is subject
795     to the rights, licenses and restrictions contained in BCP 78, and
796     except as set forth therein, the authors retain all their rights.

798  Acknowledgment

800     Funding for the RFC Editor function is currently provided by the
801     Internet Society.