idnits 2.17.1 draft-ietf-ltru-matching-00.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** It looks like you're using RFC 3978 boilerplate. You should update this to the boilerplate described in the IETF Trust License Policy document (see https://trustee.ietf.org/license-info), which is required now. -- Found old boilerplate from RFC 3978, Section 5.1 on line 16. -- Found old boilerplate from RFC 3978, Section 5.5 on line 698. -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 675. -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 682. -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 688. ** This document has an original RFC 3978 Section 5.4 Copyright Line, instead of the newer IETF Trust Copyright according to RFC 4748. ** This document has an original RFC 3978 Section 5.5 Disclaimer, instead of the newer disclaimer which includes the IETF Trust according to RFC 4748. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- == No 'Intended status' indicated for this document; assuming Proposed Standard Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The abstract seems to contain references ([RFC3066], [19], [1]), which it shouldn't. Please replace those with straight textual mentions of the documents in question. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not match the current year == Line 170 has weird spacing: '...schemes that ...' == Line 171 has weird spacing: '...ing and looku...' == Line 373 has weird spacing: '...age tag being...' 
== The document seems to lack the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. (The document does seem to have the reference to RFC 2119 which the ID-Checklist requires). -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (May 13, 2005) is 6922 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) -- Looks like a reference, but probably isn't: 'RFC 3066' on line 46 -- Looks like a reference, but probably isn't: 'RFC 2119' on line 101 == Unused Reference: '2' is defined on line 546, but no explicit reference was found in the text == Unused Reference: '3' is defined on line 549, but no explicit reference was found in the text == Unused Reference: '4' is defined on line 554, but no explicit reference was found in the text == Unused Reference: '6' is defined on line 560, but no explicit reference was found in the text == Unused Reference: '7' is defined on line 564, but no explicit reference was found in the text == Unused Reference: '8' is defined on line 567, but no explicit reference was found in the text == Unused Reference: '9' is defined on line 571, but no explicit reference was found in the text == Unused Reference: '11' is defined on line 579, but no explicit reference was found in the text == Unused Reference: '12' is defined on line 583, but no explicit reference was found in 
the text == Unused Reference: '13' is defined on line 588, but no explicit reference was found in the text == Unused Reference: '14' is defined on line 592, but no explicit reference was found in the text == Unused Reference: '15' is defined on line 596, but no explicit reference was found in the text == Unused Reference: '16' is defined on line 599, but no explicit reference was found in the text == Unused Reference: '17' is defined on line 603, but no explicit reference was found in the text == Unused Reference: '18' is defined on line 608, but no explicit reference was found in the text == Unused Reference: '20' is defined on line 614, but no explicit reference was found in the text == Outdated reference: A later version (-14) exists of draft-ietf-ltru-registry-01 ** Obsolete normative reference: RFC 1327 (ref. '2') (Obsoleted by RFC 2156) ** Obsolete normative reference: RFC 1521 (ref. '3') (Obsoleted by RFC 2045, RFC 2046, RFC 2047, RFC 2048, RFC 2049) ** Obsolete normative reference: RFC 2028 (ref. '4') (Obsoleted by RFC 9281) ** Obsolete normative reference: RFC 2234 (ref. '7') (Obsoleted by RFC 4234) ** Obsolete normative reference: RFC 2396 (ref. '8') (Obsoleted by RFC 3986) ** Obsolete normative reference: RFC 2434 (ref. '9') (Obsoleted by RFC 5226) ** Obsolete normative reference: RFC 2616 (ref. '10') (Obsoleted by RFC 7230, RFC 7231, RFC 7232, RFC 7233, RFC 7234, RFC 7235) ** Downref: Normative reference to an Informational RFC: RFC 2860 (ref. '11') -- Obsolete informational reference (is this intentional?): RFC 1766 (ref. '18') (Obsoleted by RFC 3066, RFC 3282) -- Obsolete informational reference (is this intentional?): RFC 3066 (ref. '19') (Obsoleted by RFC 4646, RFC 4647) Summary: 12 errors (**), 0 flaws (~~), 23 warnings (==), 11 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group A. 
Phillips, Ed. 3 Internet-Draft Quest Software 4 Expires: November 14, 2005 M. Davis, Ed. 5 IBM 6 May 13, 2005 8 Matching Language Identifiers 9 draft-ietf-ltru-matching-00 11 Status of this Memo 13 By submitting this Internet-Draft, each author represents that any 14 applicable patent or other IPR claims of which he or she is aware 15 have been or will be disclosed, and any of which he or she becomes 16 aware will be disclosed, in accordance with Section 6 of BCP 79. 18 Internet-Drafts are working documents of the Internet Engineering 19 Task Force (IETF), its areas, and its working groups. Note that 20 other groups may also distribute working documents as Internet- 21 Drafts. 23 Internet-Drafts are draft documents valid for a maximum of six months 24 and may be updated, replaced, or obsoleted by other documents at any 25 time. It is inappropriate to use Internet-Drafts as reference 26 material or to cite them other than as "work in progress." 28 The list of current Internet-Drafts can be accessed at 29 http://www.ietf.org/ietf/1id-abstracts.txt. 31 The list of Internet-Draft Shadow Directories can be accessed at 32 http://www.ietf.org/shadow.html. 34 This Internet-Draft will expire on November 14, 2005. 36 Copyright Notice 38 Copyright (C) The Internet Society (2005). 40 Abstract 42 This document describes different mechanisms for comparing and 43 matching the tags for the identification of languages defined by [RFC 44 3066bis] [1]. Possible algorithms for language negotiation and 45 content selection are described. Portions of this document obsolete 46 [RFC 3066] [19]. 48 Table of Contents 50 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 51 2. The Language Range . . . . . . . . . . . . . . . . . . . . . . 4 52 2.1 Basic Language Range . . . . . . . . . . . . . . . . . . . 4 53 2.1.1 Matching . . . . . . . . . . . . . . . . . . . . . . . 5 54 2.1.2 Lookup . . . . . . . . . . . . . . . . . . . . . . . . 5 55 2.2 Extended Language Range . . . . 
. . . . . . . . . . . . . 6 56 2.2.1 Extended Range Matching . . . . . . . . . . . . . . . 7 57 2.2.2 Extended Range Lookup . . . . . . . . . . . . . . . . 8 58 2.2.3 Scored Matching . . . . . . . . . . . . . . . . . . . 9 59 2.3 Meaning of Language Tags and Ranges . . . . . . . . . . . 10 60 2.4 Choosing Between Alternate Matching Schemes . . . . . . . 11 61 2.5 Considerations for Private Use Subtags . . . . . . . . . . 11 62 3. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 13 63 4. Changes . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 64 5. Security Considerations . . . . . . . . . . . . . . . . . . . 15 65 6. Character Set Considerations . . . . . . . . . . . . . . . . . 16 66 7. References . . . . . . . . . . . . . . . . . . . . . . . . . . 17 67 7.1 Normative References . . . . . . . . . . . . . . . . . . . 17 68 7.2 Informative References . . . . . . . . . . . . . . . . . . 18 69 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . 18 70 A. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 19 71 Intellectual Property and Copyright Statements . . . . . . . . 20 73 1. Introduction 75 Human beings on our planet have, past and present, used a number of 76 languages. There are many reasons why one would want to identify the 77 language used when presenting or requesting information. 79 Information about a user's language preferences commonly needs to be 80 identified so that appropriate processing can be applied. For 81 example, the user's language preferences in a browser can be used to 82 select web pages appropriately. A choice of language preference can 83 also be used to select among tools (such as dictionaries) to assist 84 in the processing or understanding of content in different languages. 86 Given a set of language identifiers, such as those defined in 87 RFC3066bis, various mechanisms can be envisioned for performing 88 language negotiation and tag matching. 
The suitability of a 89 particular mechanism to a particular application depends on the needs 90 of that application. 92 This document defines language ranges and syntax for specifying user 93 preferences in a request for language content. It also specifies a 94 default algorithm for matching language ranges to content (language 95 tags), as well as alternate mechanisms suitable for certain 96 applications. 98 The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 99 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 100 document are to be interpreted as described in [RFC 2119] [5]. 102 2. The Language Range 104 Language Tags are used to identify the language of some information 105 item or content. Applications that use language tags are often faced 106 with the problem of identifying sets of content that share certain 107 language attributes. For example, HTTP 1.1 [10] describes language 108 ranges in its discussion of the Accept-Language header (Section 109 14.4), which is used for selecting content from servers based on the 110 language of that content. 112 When selecting content according to its language, it is useful to 113 have a mechanism for identifying sets of language tags that share 114 specific attributes. This allows users to select or filter content 115 based on specific requirements. Such an identifier is called a 116 "Language Range". 118 2.1 Basic Language Range 120 A basic language range (such as described in RFC 3066 [19] and HTTP 121 1.1 [10]) is a set of languages whose tags all begin with the same 122 sequence of subtags. A basic language range can be represented by a 123 'language-range' tag, by using the definition from HTTP/1.1 [10] : 124 language-range = language-tag / "*" 126 That is, a language-range has the same syntax as a language-tag or is 127 the single character "*". This definition of language-range implies 128 that there is a semantic relationship between tags that share the 129 same prefix. 
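The prefix relationship described above can be sketched in a few lines. This is an illustration only, not part of the draft: the function name basic_range_matches is hypothetical, and the case-insensitive comparison is an assumption based on language tags being compared without regard to case.

```python
def basic_range_matches(language_range: str, language_tag: str) -> bool:
    """Return True if a basic language range matches a tag by subtag prefix.

    Hypothetical helper for illustration; comparison ignores case.
    """
    if language_range == "*":
        # The single-character range "*" stands for any tag.
        return True
    rng = language_range.lower()
    tag = language_tag.lower()
    # Either the whole tag equals the range, or the range is a prefix
    # that ends exactly on a subtag boundary ("-").
    return tag == rng or tag.startswith(rng + "-")
```

Requiring the character after the prefix to be "-" keeps a range such as "en" from matching an unrelated tag such as "eng", which merely shares leading characters.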
131     In particular, the set of language tags that match a specific
132     language-range may not all be mutually intelligible. The use of a
133     prefix when matching tags to language ranges does not imply that
134     language tags are assigned to languages in such a way that it is
135     always true that if a user understands a language with a certain tag,
136     then this user will also understand all languages with tags for which
137     this tag is a prefix. The prefix rule simply allows the use of
138     prefix tags if this is the case.

140     When working with tags and ranges you should also note the following:

142     1. Private-use and Extension subtags are normally orthogonal to
143        language tag fallback. Implementations should ignore
144        unrecognized private-use and extension subtags when performing
145        language tag fallback. Since these subtags are always at the end
146        of the sequence of subtags, they naturally fall out of the
147        default fallback pattern (above). Thus a request to match the
148        tag "en-US-boont-x-1943" would produce exactly the same
149        information content as the example above.

151     2. Implementations that choose not to interpret one or more private-
152        use or extension subtags should not remove or modify these
153        extensions in content that they are processing. When a language
154        tag instance is to be used in a specific, known protocol, and is
155        not being passed through to other protocols, language tags may be
156        filtered to remove subtags and extensions that are not supported
157        by that protocol. This should be done with caution, since it
158        is removing information that may be relevant if services on the
159        other end of the protocol would make use of that information.

161     3. Some applications of language tags may want or need to consider
162        extensions and private-use subtags when matching tags.
If 163 extensions and private-use subtags are included in a matching 164 process that utilizes the default fallback mechanism, then the 165 implementation should canonicalize the language tags and/or 166 ranges before performing the matching. Note that language tag 167 processors that claim to be "well-formed" processors as defined 168 in [1] generally fall into this category. 170 There are two matching schemes that are commonly associated with 171 basic language ranges: matching and lookup. 173 2.1.1 Matching 175 Language tag matching is used to select all content that matches a 176 given prefix. In matching, the language range represents the least 177 specific tag which is an acceptable match and every piece of content 178 that matches is returned. 180 For example, if an application is applying a style to all content in 181 a web page in a particular language, it might use language tag 182 matching to perform the matching. 184 A language-range matches a language-tag if it exactly equals the tag, 185 or if it exactly equals a prefix of the tag such that the first 186 character following the prefix is "-". (That is, the language-range 187 "en-de" matches the language tag "en-DE-boont", but not the language 188 tag "en-Deva".) 190 The special range "*" matches any tag. A protocol which uses 191 language ranges may specify additional rules about the semantics of 192 "*"; for instance, HTTP/1.1 specifies that the range "*" matches only 193 languages not matched by any other range within an "Accept-Language:" 194 header. 196 2.1.2 Lookup 198 Content lookup is used to select the single information item that 199 best matches the language range for a given request. In lookup, the 200 language range represents the most specific tag which is an 201 acceptable match and only the closest matching item is returned. 203 For example, if an application inserts some dynamic content into a 204 web page, returning an empty string if there is no exact match is not 205 an option. 
Instead, the application "falls back".

207     When performing lookup, the language range is progressively truncated
208     from the end until a matching piece of content is located. For
209     example, starting with the range "zh-Hant-CN-x-wadegile", the lookup
210     would progressively search for content as shown below:

212     Range to match: zh-Hant-CN-x-wadegile
213     1. zh-Hant-CN-x-wadegile
214     2. zh-Hant-CN
215     3. zh-Hant
216     4. zh
217     5. (default content or the empty tag)

219                Figure 2: Default Fallback Pattern Example

221     This scheme allows some flexibility in finding content. It also
222     typically provides better results when data is not available at a
223     specific level of tag granularity or is sparsely populated (than if
224     the default language for the system or content were used).

226  2.2 Extended Language Range

228     Prefix matching using a Basic Language Range, as described above, is
229     not always the most appropriate way to access the information
230     contained in language tags when selecting or filtering content. Some
231     applications may wish to define a more granular matching scheme and
232     such a matching scheme requires the ability to specify the various
233     attributes of a language tag in the language range. An extended
234     language range can be represented by the following ABNF:
235        extended-language-range = grandfathered / privateuse / range
236        range         = ( lang [ "-" script ] [ "-" region ] *( "-" variant )
237                        [ "-" privateuse ] )
238        lang          = ( 2*8ALPHA *( "-" extlang ) ) / "*"
239        extlang       = 3ALPHA / "*"
240        script        = 4ALPHA / "*"
241        region        = 2ALPHA / 3DIGIT / "*"
242        variant       = 5*8alphanum / ( DIGIT 3alphanum ) / "*"
243        privateuse    = ( "x" / "X" ) 1*( "-" ( 1*8alphanum ) )
244        grandfathered = 1*3ALPHA 1*2( "-" ( 2*8alphanum ) )
245        alphanum      = ( ALPHA / DIGIT )
246     In an extended language range, the identifier takes the form of a
247     series of subtags which must consist of well-formed subtags or the
248     special subtag "*".
For example, the language range "en-*-US"
249     specifies a primary language of 'en', followed by any script subtag,
250     followed by the region subtag 'US'.

252     A field not present in the middle of an extended language range MAY
253     be treated as if the field contained a "*". For example, the range
254     "en-US" MAY be considered to be equivalent to the range "en-*-US".

256     There are several matching algorithms or schemes which may be applied
257     when matching extended language ranges to language tags.

259  2.2.1 Extended Range Matching

261     In extended range matching, the subtags in a language tag are
262     compared to the corresponding subtags in the extended language range.
263     A subtag is considered to match if it exactly matches the
264     corresponding subtag in the range or the range contains a subtag with
265     the value "*" (which matches all subtags, including the empty
266     subtag). Extended Range Matching is an extension of basic matching
267     (Section 2.1.1): the language range represents the least specific tag
268     which is an acceptable match.

270     By default all extensions and their subtags are ignored for extended
271     language range matching.

273     Private use subtags may be specified in the language range and MUST
274     NOT be ignored when matching.

276     Subtags not specified, including those at the end of the language
277     range, are assigned the value "*". This makes each range into a
278     prefix much like that used in basic language range matching. For
279     example, the extended language range "zh-*-CN" matches all of the
280     following tags because the unspecified variant field is expanded to
281     "*":

283        zh-Hant-CN

285        zh-CN

287        zh-Hans-CN

289        zh-CN-x-wadegile

291        zh-Latn-CN-boont

293  2.2.2 Extended Range Lookup

295     In extended range lookup, the subtags in a language tag are compared
296     to the corresponding subtags in the extended language range.
The 297 subtag is considered to match if it exactly matches the corresponding 298 subtag in the range or the range contains a subtag with the value "*" 299 (which matches all subtags, including the empty subtag). Extended 300 language range lookup is an extension of basic lookup 301 (Section 2.1.2): the language range represents the most specific tag 302 which will form an acceptable match. 304 Subtags not specified are assigned the value "*" prior to performing 305 tag matching. Unlike in extended range matching, however, fields at 306 the end of the range MUST NOT be expanded in this manner. For 307 example, "en-US" must not be considered to be the same as the range 308 "en-US-*". This allows ranges to be specific. The "*" wildcard MUST 309 be used at the end of the range to indicate that all tags with the 310 range as a prefix are allowable matches. That is, the range "zh-*" 311 matches the tags "zh-Hant" and "zh-Hant-CN", while the range "zh" 312 matches neither of those tags. 314 The wildcard "*" at the end of a range SHOULD be considered to match 315 any private use subtag sequences (making extended language range 316 lookup function exactly like extended range matching Section 2.2.1). 318 By default all extensions and their subtags SHOULD be ignored for 319 extended language range lookup. Private use subtags may be specified 320 in the language range and MUST NOT be ignored when performing lookup. 321 The wildcard "*" at the end of a range SHOULD be considered to match 322 any private use subtag sequences in addition to variants. 
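The lookup rules above can be sketched as follows. This is one possible reading of the rules, not a normative algorithm. It assumes well-formed input without extlang or grandfathered subtags, classifies subtags by shape alone, and folds variants and private-use subtags into a single "tail" sequence. The names Parsed, parse, and lookup_matches are hypothetical.

```python
import re
from typing import List, NamedTuple, Optional


class Parsed(NamedTuple):
    lang: str
    script: Optional[str]
    region: Optional[str]
    tail: List[str]    # variants followed by any private-use sequence
    open_ended: bool   # True when a range ends in "*"


def parse(tag_or_range: str) -> Parsed:
    """Split a tag or range into fields (assumes no extlang subtags)."""
    parts = tag_or_range.lower().split("-")
    open_ended = parts[-1] == "*"
    lang = parts[0]
    rest = parts[1:-1] if open_ended and len(parts) > 1 else parts[1:]
    script = region = None
    tail: List[str] = []
    pos = 1  # next positional field: 1 = script, 2 = region, 3 = tail
    i = 0
    while i < len(rest):
        sub = rest[i]
        if sub == "x":  # private use: keep the whole sequence
            tail.extend(rest[i:])
            break
        if len(sub) == 1 and sub != "*":  # extension: skip it entirely
            i += 1
            while i < len(rest) and len(rest[i]) > 1:
                i += 1
            continue
        if sub == "*" and pos == 1:
            script, pos = "*", 2
        elif sub == "*" and pos == 2:
            region, pos = "*", 3
        elif pos <= 1 and re.fullmatch(r"[a-z]{4}", sub):
            script, pos = sub, 2
        elif pos <= 2 and re.fullmatch(r"[a-z]{2}|[0-9]{3}", sub):
            region, pos = sub, 3
        else:
            tail.append(sub)
            pos = 3
        i += 1
    return Parsed(lang, script, region, tail, open_ended)


def _same(r: Optional[str], t: Optional[str]) -> bool:
    # "*" matches any subtag, including an absent one.
    return r == "*" or r == t


def lookup_matches(language_range: str, language_tag: str) -> bool:
    r, t = parse(language_range), parse(language_tag)
    if not _same(r.lang, t.lang):
        return False
    if r.open_ended:
        # Trailing "*": fields the range names must agree, the rest is free.
        return ((r.script is None or _same(r.script, t.script))
                and (r.region is None or _same(r.region, t.region))
                and len(r.tail) <= len(t.tail)
                and all(_same(a, b) for a, b in zip(r.tail, t.tail)))
    # Index of the last field that the range actually specifies.
    last = 3 if r.tail else 2 if r.region is not None else 1 if r.script is not None else 0
    pairs = ((r.script, t.script), (r.region, t.region))
    for idx, (rv, tv) in enumerate(pairs, start=1):
        if idx > last:
            if tv is not None:  # not expanded at the end: must be absent
                return False
        elif rv is not None and not _same(rv, tv):
            return False  # a field omitted in the middle acts as "*"
    if last < 3:
        return not t.tail
    return len(r.tail) == len(t.tail) and all(
        _same(a, b) for a, b in zip(r.tail, t.tail))
```

Under these assumptions, lookup_matches("zh-*", "zh-Hant-CN") is true while lookup_matches("zh", "zh-Hant") is false, which mirrors the distinction the text draws between an explicit trailing wildcard and a bare range.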
324     For example, the range "*-US" matches all of the following tags:

326        en-US

328        en-Latn-US

330        en-US-r-extends (extensions are ignored)

332        fr-US

334     For example, the range "en-*-US" matches _none_ of the following
335     tags:

337        fr-US

339        en (missing region US)
340        en-Latn (missing region US)

342        en-Latn-US-scouse (variant field is present)

344     For example, the range "en-*" matches all of the following tags:

346        en-Latn

348        en-Latn-US

350        en-Latn-US-scouse

352        en-US

354        en-scouse

356     It should be noted that the ability to be specific in extended range
357     lookup may make this matching scheme a more appropriate replacement
358     for basic matching than the extended range matching scheme.

360  2.2.3 Scored Matching

362     In the "scored matching" scheme, the extended language range and the
363     language tags are pre-normalized by mapping grandfathered and
364     obsolete tags into modern equivalents.

366     The language range and the language tags are normalized into
367     quadruples of the form (language, script, region, variant), where
368     extended language is considered part of language and x-private-codes
369     are considered part of the language if they are initial and part of
370     the variant if not initial. Missing components are set to "*". An
371     "*" pattern becomes the quadruple ("*", "*", "*", "*").

373     Each language tag being matched or filtered is assigned a "quality
374     value" such that higher values indicate better matches and lower
375     values indicate worse ones. If the language matches, add 8 to the
376     quality value. If the script matches, add 4 to the quality value.
377     If the region matches, add 2 to the quality value. If the variant
378     matches, add 1 to the quality value. Elements of the quadruples are
379     considered to match if they are the same or if one of them is "*".

381     A value of 15 is a perfect match; 0 is no match at all.
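The quality value arithmetic can be sketched as below. This is a minimal illustration that takes already-normalized quadruples as input; the mapping of grandfathered and obsolete tags is left out. The names Quad, WEIGHTS, and quality are hypothetical, and the case-insensitive comparison is an assumption.

```python
from typing import Tuple

# (language, script, region, variant); a missing component is "*".
Quad = Tuple[str, str, str, str]

# Weights for language, script, region, and variant, in that order.
WEIGHTS = (8, 4, 2, 1)


def quality(range_quad: Quad, tag_quad: Quad) -> int:
    """Sum the weights of every element that matches between two quadruples."""
    score = 0
    for weight, r, t in zip(WEIGHTS, range_quad, tag_quad):
        # Elements match if they are the same or if either one is "*".
        if r == "*" or t == "*" or r.lower() == t.lower():
            score += weight
    return score
```

For instance, the range quadruple ("en", "*", "US", "*") scored against the tag quadruple ("fr", "Latn", "FR", "boont") earns 4 for the wildcard script and 1 for the wildcard variant, for a total of 5.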
Different 382 values may be more or less appropriate for different applications and 383 implementations should probably allow users to choose the most 384 appropriate selection value. 386 2.3 Meaning of Language Tags and Ranges 388 A language tag defines a language as spoken (or written, signed or 389 otherwise signaled) by human beings for communication of information 390 to other human beings. 392 If a language tag B contains language tag A as a prefix, then B is 393 typically "narrower" or "more specific" than A. For example, "zh- 394 Hant-TW" is more specific than "zh-Hant". 396 This relationship is not guaranteed in all cases: specifically, 397 languages that begin with the same sequence of subtags are NOT 398 guaranteed to be mutually intelligible, although they may be. For 399 example, the tag "az" shares a prefix with both "az-Latn" 400 (Azerbaijani written using the Latin script) and "az-Cyrl" 401 (Azerbaijani written using the Cyrillic script). A person fluent in 402 one script may not be able to read the other, even though the text 403 might be otherwise identical. Content tagged as "az" most probably 404 is written in just one script and thus might not be intelligible to a 405 reader familiar with the other script. 407 The relationship between the tag and the information it relates to is 408 defined by the standard describing the context in which it appears. 409 Accordingly, this section can only give possible examples of its 410 usage. 412 o For a single information object, the associated language tags 413 might be interpreted as the set of languages that is required for 414 a complete comprehension of the complete object. Example: Plain 415 text documents. 417 o For an aggregation of information objects, the associated language 418 tags could be taken as the set of languages used inside components 419 of that aggregation. Examples: Document stores and libraries. 
421     o  For information objects whose purpose is to provide alternatives,
422        the associated language tags could be regarded as a hint that the
423        content is provided in several languages, and that one has to
424        inspect each of the alternatives in order to find its language or
425        languages. In this case, the presence of multiple tags might not
426        mean that one needs to be multi-lingual to get complete
427        understanding of the document. Example: MIME multipart/
428        alternative.

430     o  In markup languages, such as HTML and XML, language information
431        can be added to each part of the document identified by the markup
432        structure (including the whole document itself). For example, one
433        could write <span lang="fr">C'est la vie.</span> inside a
434        Norwegian document; the Norwegian-speaking user could then access
435        a French-Norwegian dictionary to find out what the marked section
436        meant. If the user were listening to that document through a
437        speech synthesis interface, this information could be used to signal
438        the synthesizer to appropriately apply French text-to-speech
439        pronunciation rules to that span of text, instead of misapplying
440        the Norwegian rules.

442  2.4 Choosing Between Alternate Matching Schemes

444     Implementations MAY choose to implement different styles of matching
445     for different kinds of processing. For example, an implementation
446     could treat an absent script subtag as a "wildcard" field; thus
447     "az-AZ" would match "az-AZ", "az-Cyrl-AZ", "az-Latn-AZ", etc. but not
448     "az" (this is extended range lookup). If one item is to be chosen,
449     the implementation could pick among those matches based on other
450     information, such as the most likely script used in the language/
451     region in question or the script used by other content selected.

453     Because the primary language subtag cannot be absent in a language
454     tag, the 'UND' subtag may sometimes be used as a 'wildcard' in basic
455     matching.
For example, in a query where you want to select all
456     language tags that contain 'Latn' as the script code and 'AZ' as the
457     region code, you could use the range "und-Latn-AZ". This requires an
458     implementation to examine the actual values of the subtags, though.
459     The matching schemes described elsewhere in this document do not
460     require implementations to examine the values supplied and, except
461     for scored matching, they do not require access to the Language
462     Subtag Registry nor the use of valid subtags in language tags or
463     ranges. This has great benefit for speed and simplicity of
464     implementation.

466     Implementations may also wish to use semantic information external to
467     the language tags when performing fallback. For example, the primary
468     language subtags 'nn' (Nynorsk Norwegian) and 'nb' (Bokmal Norwegian)
469     might both be usefully matched to the more general subtag 'no'
470     (Norwegian). Or an application might infer that content labeled
471     "zh-CN" is more likely to match the range "zh-Hans" than equivalent
472     content labeled "zh-TW".

474  2.5 Considerations for Private Use Subtags

476     Private-use subtags require private agreement between the parties
477     that intend to use or exchange language tags that use them and great
478     caution should be used in employing them in content or protocols
479     intended for general use. Private-use subtags are simply useless for
480     information exchange without prior arrangement.

482     The value and semantic meaning of private-use tags and of the subtags
483     used within such a language tag are not defined. Matching private
484     use tags using language ranges or extended language ranges may result
485     in unpredictable content being returned.

487  3. IANA Considerations

489     This document presents no new or existing considerations for IANA.

491  4. Changes

493     This is the first version of this document. Changes from the
494     reference work (draft-phillips-matching-00) are too numerous to
495     record.
497 5. Security Considerations 499 The only security issue that has been raised with language tags since 500 the publication of RFC 1766, which stated that "Security issues are 501 believed to be irrelevant to this memo", is a concern with language 502 ranges used in content negotiation - that they may be used to infer 503 the nationality of the sender, and thus identify potential targets 504 for surveillance. 506 This is a special case of the general problem that anything you send 507 is visible to the receiving party. It is useful to be aware that 508 such concerns can exist in some cases. 510 The evaluation of the exact magnitude of the threat, and any possible 511 countermeasures, is left to each application protocol. 513 Although the specification of valid subtags for an extension MUST be 514 available over the Internet, implementations SHOULD NOT mechanically 515 depend on it being always accessible, to prevent denial-of-service 516 attacks. 518 6. Character Set Considerations 520 The syntax in this document requires that language ranges use only 521 the characters A-Z, a-z, 0-9, and HYPHEN-MINUS legal in language 522 tags. These characters are present in most character sets, so 523 presentation of language tags should not have any character set 524 issues. 526 Rendering of characters based on the content of a language tag is not 527 addressed in this memo. Historically, some languages have relied on 528 the use of specific character sets or other information in order to 529 infer how a specific character should be rendered (notably this 530 applies to language and culture specific variations of Han ideographs 531 as used in Japanese, Chinese, and Korean). When language tags are 532 applied to spans of text, rendering engines may use that information 533 in deciding which font to use in the absence of other information, 534 particularly where languages with distinct writing traditions use the 535 same characters. 537 7. 
References 539 7.1 Normative References 541 [1] Phillips, A., Ed. and M. Davis, Ed., "Tags for the 542 Identification of Languages (Internet-Draft)", February 2005, < 543 http://www.ietf.org/internet-drafts/ 544 draft-ietf-ltru-registry-01.txt>. 546 [2] Hardcastle-Kille, S., "Mapping between X.400(1988) / ISO 10021 547 and RFC 822", RFC 1327, May 1992. 549 [3] Borenstein, N. and N. Freed, "MIME (Multipurpose Internet Mail 550 Extensions) Part One: Mechanisms for Specifying and Describing 551 the Format of Internet Message Bodies", RFC 1521, 552 September 1993. 554 [4] Hovey, R. and S. Bradner, "The Organizations Involved in the 555 IETF Standards Process", BCP 11, RFC 2028, October 1996. 557 [5] Bradner, S., "Key words for use in RFCs to Indicate Requirement 558 Levels", BCP 14, RFC 2119, March 1997. 560 [6] Freed, N. and K. Moore, "MIME Parameter Value and Encoded Word 561 Extensions: Character Sets, Languages, and Continuations", 562 RFC 2231, November 1997. 564 [7] Crocker, D., Ed. and P. Overell, "Augmented BNF for Syntax 565 Specifications: ABNF", RFC 2234, November 1997. 567 [8] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform 568 Resource Identifiers (URI): Generic Syntax", RFC 2396, 569 August 1998. 571 [9] Narten, T. and H. Alvestrand, "Guidelines for Writing an IANA 572 Considerations Section in RFCs", BCP 26, RFC 2434, 573 October 1998. 575 [10] Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L., 576 Leach, P., and T. Berners-Lee, "Hypertext Transfer Protocol -- 577 HTTP/1.1", RFC 2616, June 1999. 579 [11] Carpenter, B., Baker, F., and M. Roberts, "Memorandum of 580 Understanding Concerning the Technical Work of the Internet 581 Assigned Numbers Authority", RFC 2860, June 2000. 583 [12] Yergeau, F., "UTF-8, a transformation format of ISO 10646", 584 STD 63, RFC 3629, November 2003. 

586 7.2 Informative References

588 [13] International Organization for Standardization, "ISO 639-
589 1:2002, Codes for the representation of names of languages --
590 Part 1: Alpha-2 code", ISO Standard 639, 2002.

592 [14] International Organization for Standardization, "ISO 639-2:1998
593 - Codes for the representation of names of languages -- Part 2:
594 Alpha-3 code - edition 1", August 1988.

596 [15] ISO TC46/WG3, "ISO 15924:2003 (E/F) - Codes for the
597 representation of names of scripts", January 2004.

599 [16] International Organization for Standardization, "Codes for the
600 representation of names of countries, 3rd edition",
601 ISO Standard 3166, August 1988.

603 [17] Statistical Division, United Nations, "Standard Country or Area
604 Codes for Statistical Use", UN Standard Country or Area Codes
605 for Statistical Use, Revision 4 (United Nations publication,
606 Sales No. 98.XVII.9), June 1999.

608 [18] Alvestrand, H., "Tags for the Identification of Languages",
609 RFC 1766, March 1995.

611 [19] Alvestrand, H., "Tags for the Identification of Languages",
612 BCP 47, RFC 3066, January 2001.

614 [20] Klyne, G. and C. Newman, "Date and Time on the Internet:
615 Timestamps", RFC 3339, July 2002.

617 Authors' Addresses

619 Addison Phillips (editor)
620 Quest Software

622 Email: addison dot phillips at quest dot com

624 Mark Davis (editor)
625 IBM

627 Email: mark dot davis at ibm dot com

629 Appendix A. Acknowledgements

631 Any list of contributors is bound to be incomplete; please regard the
632 following as only a selection from the group of people who have
633 contributed to make this document what it is today.

635 The contributors to RFC 3066 and RFC 1766, the precursors of this
636 document, made enormous contributions directly or indirectly to this
637 document and are generally responsible for the success of language
638 tags.

640 The following people (in alphabetical order) contributed to this
641 document or to RFCs 1766 and 3066:

643 Glenn Adams, Harald Tveit Alvestrand, Tim Berners-Lee, Marc Blanchet,
644 Nathaniel Borenstein, Eric Brunner, Sean M. Burke, Jeremy Carroll,
645 John Clews, Jim Conklin, Peter Constable, John Cowan, Mark Crispin,
646 Dave Crocker, Martin Duerst, Michael Everson, Doug Ewell, Ned Freed,
647 Tim Goodwin, Dirk-Willem van Gulik, Marion Gunn, Joel Halpern,
648 Elliotte Rusty Harold, Paul Hoffman, Richard Ishida, Olle Jarnefors,
649 Kent Karlsson, John Klensin, Alain LaBonte, Eric Mader, Keith Moore,
650 Chris Newman, Masataka Ohta, George Rhoten, Markus Scherer, Keld Jorn
651 Simonsen, Thierry Sourbier, Otto Stolz, Tex Texin, Andrea Vine, Rhys
652 Weatherley, Misha Wolf, Francois Yergeau and many, many others.

654 Very special thanks must go to Harald Tveit Alvestrand, who
655 originated RFCs 1766 and 3066, and without whom this document would
656 not have been possible. Special thanks must go to Michael Everson,
657 who has served as language tag reviewer for almost the complete
658 period since the publication of RFC 1766. Special thanks to Doug
659 Ewell, for his production of the first complete subtag registry, and
660 his work in producing a test parser for verifying language tags.

662 For this particular document, John Cowan originated the scheme
663 described in Section 2.2.3. Mark Davis originated the scheme
664 described in Section 2.1.2.

666 Intellectual Property Statement

668 The IETF takes no position regarding the validity or scope of any
669 Intellectual Property Rights or other rights that might be claimed to
670 pertain to the implementation or use of the technology described in
671 this document or the extent to which any license under such rights
672 might or might not be available; nor does it represent that it has
673 made any independent effort to identify any such rights. Information
674 on the procedures with respect to rights in RFC documents can be
675 found in BCP 78 and BCP 79.

677 Copies of IPR disclosures made to the IETF Secretariat and any
678 assurances of licenses to be made available, or the result of an
679 attempt made to obtain a general license or permission for the use of
680 such proprietary rights by implementers or users of this
681 specification can be obtained from the IETF on-line IPR repository at
682 http://www.ietf.org/ipr.

684 The IETF invites any interested party to bring to its attention any
685 copyrights, patents or patent applications, or other proprietary
686 rights that may cover technology that may be required to implement
687 this standard. Please address the information to the IETF at
688 ietf-ipr@ietf.org.

690 Disclaimer of Validity

692 This document and the information contained herein are provided on an
693 "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
694 OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
695 ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
696 INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
697 INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
698 WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

700 Copyright Statement

702 Copyright (C) The Internet Society (2005). This document is subject
703 to the rights, licenses and restrictions contained in BCP 78, and
704 except as set forth therein, the authors retain all their rights.

706 Acknowledgment

708 Funding for the RFC Editor function is currently provided by the
709 Internet Society.