idnits 2.17.1 draft-ietf-ltru-4646bis-05.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** It looks like you're using RFC 3978 boilerplate. You should update this to the boilerplate described in the IETF Trust License Policy document (see https://trustee.ietf.org/license-info), which is required now. -- Found old boilerplate from RFC 3978, Section 5.1 on line 17. -- Found old boilerplate from RFC 3978, Section 5.5, updated by RFC 4748 on line 2920. -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 2931. -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 2938. -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 2944. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- -- The draft header indicates that this document obsoletes RFC4646, but the abstract doesn't seem to mention this, which it should. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust Copyright Line does not match the current year == The document seems to use 'NOT RECOMMENDED' as an RFC 2119 keyword, but does not include the phrase in its RFC 2119 key words list. -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. 
(See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (April 30, 2007) is 6204 days in the past. Is this intentional? Checking references for intended status: Best Current Practice ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO10646' -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO15924' -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO3166-1' -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO639-1' -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO639-2' -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO639-3' -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO646' ** Obsolete normative reference: RFC 2028 (Obsoleted by RFC 9281) ** Obsolete normative reference: RFC 2434 (Obsoleted by RFC 5226) ** Downref: Normative reference to an Informational RFC: RFC 2860 ** Obsolete normative reference: RFC 4234 (Obsoleted by RFC 5234) ** Downref: Normative reference to an Informational RFC: RFC 4645 -- Obsolete informational reference (is this intentional?): RFC 1766 (Obsoleted by RFC 3066, RFC 3282) -- Obsolete informational reference (is this intentional?): RFC 3066 (Obsoleted by RFC 4646, RFC 4647) -- Obsolete informational reference (is this intentional?): RFC 4646 (Obsoleted by RFC 5646) Summary: 6 errors (**), 0 flaws (~~), 2 warnings (==), 18 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group A. Phillips, Ed. 3 Internet-Draft Yahoo! Inc. 4 Obsoletes: 4646 (if approved) M. Davis, Ed. 
5 Intended status: Best Current Google 6 Practice April 30, 2007 7 Expires: November 1, 2007 9 Tags for Identifying Languages 10 draft-ietf-ltru-4646bis-05 12 Status of this Memo 14 By submitting this Internet-Draft, each author represents that any 15 applicable patent or other IPR claims of which he or she is aware 16 have been or will be disclosed, and any of which he or she becomes 17 aware will be disclosed, in accordance with Section 6 of BCP 79. 19 Internet-Drafts are working documents of the Internet Engineering 20 Task Force (IETF), its areas, and its working groups. Note that 21 other groups may also distribute working documents as Internet- 22 Drafts. 24 Internet-Drafts are draft documents valid for a maximum of six months 25 and may be updated, replaced, or obsoleted by other documents at any 26 time. It is inappropriate to use Internet-Drafts as reference 27 material or to cite them other than as "work in progress." 29 The list of current Internet-Drafts can be accessed at 30 http://www.ietf.org/ietf/1id-abstracts.txt. 32 The list of Internet-Draft Shadow Directories can be accessed at 33 http://www.ietf.org/shadow.html. 35 This Internet-Draft will expire on November 1, 2007. 37 Copyright Notice 39 Copyright (C) The IETF Trust (2007). 41 Abstract 43 This document describes the structure, content, construction, and 44 semantics of language tags for use in cases where it is desirable to 45 indicate the language used in an information object. It also 46 describes how to register values for use in language tags and the 47 creation of user-defined extensions for private interchange. 49 Table of Contents 51 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 52 2. The Language Tag . . . . . . . . . . . . . . . . . . . . . . . 5 53 2.1. Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . 5 54 2.2. Language Subtag Sources and Interpretation . . . . . . . . 8 55 2.2.1. Primary Language Subtag . . . . . . . . . . . . . . . 9 56 2.2.2. 
Extended Language Subtags . . . . . . . . . . . . . . 11 57 2.2.3. Script Subtag . . . . . . . . . . . . . . . . . . . . 12 58 2.2.4. Region Subtag . . . . . . . . . . . . . . . . . . . . 13 59 2.2.5. Variant Subtags . . . . . . . . . . . . . . . . . . . 15 60 2.2.6. Extension Subtags . . . . . . . . . . . . . . . . . . 16 61 2.2.7. Private Use Subtags . . . . . . . . . . . . . . . . . 17 62 2.2.8. Grandfathered Registrations . . . . . . . . . . . . . 18 63 2.2.9. Classes of Conformance . . . . . . . . . . . . . . . . 18 64 3. Registry Format and Maintenance . . . . . . . . . . . . . . . 20 65 3.1. Format of the IANA Language Subtag Registry . . . . . . . 20 66 3.1.1. File Format . . . . . . . . . . . . . . . . . . . . . 20 67 3.1.2. Record Definitions . . . . . . . . . . . . . . . . . . 21 68 3.1.3. Subtag and Tag Fields . . . . . . . . . . . . . . . . 23 69 3.1.4. Description Field . . . . . . . . . . . . . . . . . . 24 70 3.1.5. Deprecated Field . . . . . . . . . . . . . . . . . . . 25 71 3.1.6. Preferred-Value Field . . . . . . . . . . . . . . . . 25 72 3.1.7. Prefix Field . . . . . . . . . . . . . . . . . . . . . 26 73 3.1.8. Comments Field . . . . . . . . . . . . . . . . . . . . 27 74 3.1.9. Suppress-Script Field . . . . . . . . . . . . . . . . 27 75 3.2. Language Subtag Reviewer . . . . . . . . . . . . . . . . . 27 76 3.3. Maintenance of the Registry . . . . . . . . . . . . . . . 28 77 3.4. Stability of IANA Registry Entries . . . . . . . . . . . . 29 78 3.5. Registration Procedure for Subtags . . . . . . . . . . . . 33 79 3.6. Possibilities for Registration . . . . . . . . . . . . . . 37 80 3.7. Extensions and Extensions Registry . . . . . . . . . . . . 39 81 3.8. Update of the Language Subtag Registry . . . . . . . . . . 42 82 4. Formation and Processing of Language Tags . . . . . . . . . . 43 83 4.1. Choice of Language Tag . . . . . . . . . . . . . . . . . . 43 84 4.2. Meaning of the Language Tag . . . . . . . . . . . . . . . 47 85 4.3. 
Length Considerations . . . . . . . . . . . . . . . . . . 48 86 4.3.1. Working with Limited Buffer Sizes . . . . . . . . . . 48 87 4.3.2. Truncation of Language Tags . . . . . . . . . . . . . 49 89 4.4. Canonicalization of Language Tags . . . . . . . . . . . . 50 90 4.5. Considerations for Private Use Subtags . . . . . . . . . . 52 91 5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 53 92 5.1. Language Subtag Registry . . . . . . . . . . . . . . . . . 53 93 5.2. Extensions Registry . . . . . . . . . . . . . . . . . . . 53 94 6. Security Considerations . . . . . . . . . . . . . . . . . . . 55 95 7. Character Set Considerations . . . . . . . . . . . . . . . . . 56 96 8. Changes from RFC 4646 . . . . . . . . . . . . . . . . . . . . 57 97 9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 60 98 9.1. Normative References . . . . . . . . . . . . . . . . . . . 60 99 9.2. Informative References . . . . . . . . . . . . . . . . . . 61 100 Appendix A. Acknowledgements . . . . . . . . . . . . . . . . . . 63 101 Appendix B. Examples of Language Tags (Informative) . . . . . . . 64 102 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 67 103 Intellectual Property and Copyright Statements . . . . . . . . . . 68 105 1. Introduction 107 Human beings on our planet have, past and present, used a number of 108 languages. There are many reasons why one would want to identify the 109 language used when presenting or requesting information. 111 A user's language preferences often need to be identified so that 112 appropriate processing can be applied. For example, the user's 113 language preferences in a Web browser can be used to select Web pages 114 appropriately. Language preferences can also be used to select among 115 tools (such as dictionaries) to assist in the processing or 116 understanding of content in different languages. 
118 In addition, knowledge about the particular language used by some 119 piece of information content might be useful or even required by some 120 types of processing; for example, spell-checking, computer- 121 synthesized speech, Braille transcription, or high-quality print 122 renderings. 124 One means of indicating the language used is by labeling the 125 information content with an identifier or "tag". These tags can be 126 used to specify user preferences when selecting information content, 127 or for labeling additional attributes of content and associated 128 resources. 130 Tags can also be used to indicate additional language attributes of 131 content. For example, indicating specific information about the 132 dialect, writing system, or orthography used in a document or 133 resource may enable the user to obtain information in a form that 134 they can understand, or it can be important in processing or 135 rendering the given content into an appropriate form or style. 137 This document specifies a particular identifier mechanism (the 138 language tag) and a registration function for values to be used to 139 form tags. It also defines a mechanism for private use values and 140 future extension. 142 This document replaces [RFC4646], which replaced [RFC3066] and its 143 predecessor [RFC1766]. For a list of changes in this document, see 144 Section 8. 146 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 147 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 148 document are to be interpreted as described in [RFC2119]. 150 2. The Language Tag 152 Language tags are used to help identify languages, whether spoken, 153 written, signed, or otherwise signaled, for the purpose of 154 communication. This includes constructed and artificial languages, 155 but excludes languages not intended primarily for human 156 communication, such as programming languages. 158 2.1. 
Syntax 160 The language tag is composed of one or more parts, known as 161 "subtags". Each subtag consists of a sequence of alphanumeric 162 characters. Subtags are distinguished and separated from one another 163 by a hyphen ("-", ABNF [RFC4234] %x2D). A language tag consists of a 164 "primary language" subtag and a (possibly empty) series of subsequent 165 subtags, each of which refines or narrows the range of languages 166 identified by the overall tag. 168 Usually, each type of subtag is distinguished by length, position in 169 the tag, and content: subtags can be recognized solely by these 170 features. The only exception to this is a fixed list of 171 grandfathered tags registered under RFC 3066 [RFC3066]. This makes 172 it possible to construct a parser that can extract and assign some 173 semantic information to the subtags, even if the specific subtag 174 values are not recognized. Thus, a parser need not have an up-to- 175 date copy (or any copy at all) of the subtag registry to perform most 176 searching and matching operations. 
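The following is a minimal, non-normative sketch in Python of how a
parser might assign a type to each subtag from length, position, and
content alone, following the 'langtag' production of the ABNF in
Figure 1. The function name classify_subtags and the label strings
are illustrative assumptions and are not defined by this document.
The sketch does not handle grandfathered ('irregular') tags or tags
made up solely of private use subtags, does not detect repeated
singletons, and never consults the registry.

```python
import re


def classify_subtags(tag):
    """Label each subtag of a 'langtag' using only length, position, content."""
    # Tags are case insensitive, so work on a lowercased copy.
    subtags = tag.lower().split("-")
    labels = []
    pos = 0

    def take(kind, pattern):
        """Consume the next subtag if it fully matches the pattern."""
        nonlocal pos
        if pos < len(subtags) and re.fullmatch(pattern, subtags[pos]):
            labels.append((kind, subtags[pos]))
            pos += 1
            return True
        return False

    # language = 2*3ALPHA [extlang] / 4ALPHA / 5*8ALPHA
    if not take("language", r"[a-z]{2,8}"):
        raise ValueError("tag must begin with a primary language subtag")
    if len(subtags[0]) <= 3:
        for _ in range(3):  # extlang = *3("-" 3ALPHA)
            if not take("extlang", r"[a-z]{3}"):
                break
    take("script", r"[a-z]{4}")             # script = 4ALPHA
    take("region", r"[a-z]{2}|[0-9]{3}")    # region = 2ALPHA / 3DIGIT
    while take("variant", r"[a-z0-9]{5,8}|[0-9][a-z0-9]{3}"):
        pass
    # extension = singleton 1*("-" (2*8alphanum)); 'x' is not a singleton
    while take("singleton", r"[a-wy-z0-9]"):
        if not take("extension", r"[a-z0-9]{2,8}"):
            raise ValueError("a singleton needs at least one extension subtag")
        while take("extension", r"[a-z0-9]{2,8}"):
            pass
    # privateuse = "x" 1*("-" (1*8alphanum))
    if take("privateuse", r"x"):
        if not take("privateuse", r"[a-z0-9]{1,8}"):
            raise ValueError("'x' needs at least one private use subtag")
        while take("privateuse", r"[a-z0-9]{1,8}"):
            pass
    if pos != len(subtags):
        raise ValueError("unexpected subtag: %r" % subtags[pos])
    return labels
```

Under these assumptions, classify_subtags("sr-Latn-RS") labels 'sr' as
the language, 'latn' as the script, and 'rs' as the region without
knowing whether any of those values is registered.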
178   The syntax of the language tag in ABNF [RFC4234] is:

180   Language-Tag  = langtag
181                 / privateuse             ; private use tag
182                 / irregular              ; tags grandfathered by rule

184   langtag       = (language
185                    ["-" script]
186                    ["-" region]
187                    *("-" variant)
188                    *("-" extension)
189                    ["-" privateuse])

191   language      = (2*3ALPHA [ extlang ]) ; shortest ISO 639 code
192                 / 4ALPHA                 ; reserved for future use
193                 / 5*8ALPHA               ; registered language subtag

195   extlang       = *3("-" 3ALPHA)         ; specific ISO 639-3 codes

197   script        = 4ALPHA                 ; ISO 15924 code

199   region        = 2ALPHA                 ; ISO 3166 code
200                 / 3DIGIT                 ; UN M.49 code

202   variant       = 5*8alphanum            ; registered variants
203                 / (DIGIT 3alphanum)

205   extension     = singleton 1*("-" (2*8alphanum))

207   singleton     = %x41-57 / %x59-5A / %x61-77 / %x79-7A / DIGIT
208                 ; "a"-"w" / "y"-"z" / "A"-"W" / "Y"-"Z" / "0"-"9"
209                 ; Single alphanumerics
210                 ; "x" is reserved for private use

212   privateuse    = "x" 1*("-" (1*8alphanum))

214   irregular     = "en-GB-oed" / "i-ami" / "i-bnn" / "i-default"
215                 / "i-enochian" / "i-hak" / "i-klingon" / "i-lux"
216                 / "i-mingo" / "i-navajo" / "i-pwn" / "i-tao"
217                 / "i-tay" / "i-tsu" / "sgn-BE-fr" / "sgn-BE-nl"
218                 / "sgn-CH-de"

220   alphanum      = (ALPHA / DIGIT)        ; letters and numbers

222                     Figure 1: Language Tag ABNF

224   All subtags have a maximum length of eight characters and whitespace
225   is not permitted in a language tag.  There is a subtlety in the ABNF
226   production 'variant': variants starting with a digit MAY be four
227   characters long, while those starting with a letter MUST be at least
228   five characters long.  For examples of language tags, see Appendix B.

230   Note Well: the ABNF syntax does not distinguish between upper and
231   lowercase.  The appearance of upper and lowercase letters in the
232   various ABNF productions above does not affect how implementations
233   interpret tags.  That is, the tag "I-AMI" matches the item "i-ami" in
234   the 'irregular' production.
At all times, the tags and their 235 subtags, including private use and extensions, are to be treated as 236 case insensitive: there exist conventions for the capitalization of 237 some of the subtags, but these MUST NOT be taken to carry meaning. 239 For example: 241 o [ISO639-1] recommends that language codes be written in lowercase 242 ('mn' Mongolian). 244 o [ISO3166-1] recommends that country codes be capitalized ('MN' 245 Mongolia). 247 o [ISO15924] recommends that script codes use lowercase with the 248 initial letter capitalized ('Cyrl' Cyrillic). 250 However, in the tags defined by this document, the uppercase US-ASCII 251 letters in the range 'A' through 'Z' are considered equivalent and 252 mapped directly to their US-ASCII lowercase equivalents in the range 253 'a' through 'z'. Thus, the tag "mn-Cyrl-MN" is not distinct from 254 "MN-cYRL-mn" or "mN-cYrL-Mn" (or any other combination), and each of 255 these variations conveys the same meaning: Mongolian written in the 256 Cyrillic script as used in Mongolia. 258 Although case distinctions do not carry meaning in language tags, 259 consistent formatting and presentation of the tags will aid users. 260 The format of the tags and subtags in the registry is RECOMMENDED. 261 In this format, all non-initial two-letter subtags are uppercase, all 262 non-initial four-letter subtags are titlecase, and all other subtags 263 are lowercase. 265 Note that although [RFC4234] refers to octets, the language tags 266 described in this document are sequences of characters from the US- 267 ASCII [ISO646] repertoire. Language tags MAY be used in documents 268 and applications that use other encodings, so long as these encompass 269 the US-ASCII repertoire. An example of this would be an XML document 270 that uses the UTF-16LE [RFC2781] encoding of [Unicode]. 272 2.2. 
Language Subtag Sources and Interpretation 274 The namespace of language tags and their subtags is administered by 275 the Internet Assigned Numbers Authority (IANA) [RFC2860] according to 276 the rules in Section 5 of this document. The Language Subtag 277 Registry maintained by IANA is the source for valid subtags: other 278 standards referenced in this section provide the source material for 279 that registry. 281 Terminology used in this document: 283 o Tag or tags refers to a complete language tag, such as 284 "sr-Latn-RS" or "az-Arab-IR". Examples of tags in this document 285 are enclosed in double-quotes ("en-US"). 287 o Subtag refers to a specific section of a tag, delimited by hyphen, 288 such as the subtag 'Hant' in "zh-Hant-CN". Examples of subtags in 289 this document are enclosed in single quotes ('Hant'). 291 o Code or codes refers to values defined in external standards (and 292 which are used as subtags in this document). For example, 'Hant' 293 is an [ISO15924] script code that was used to define the 'Hant' 294 script subtag for use in a language tag. Examples of codes in 295 this document are enclosed in single quotes ('en', 'Hant'). 297 The definitions in this section apply to the various subtags within 298 the language tags defined by this document, excepting those 299 "grandfathered" tags defined in Section 2.2.8. 301 Language tags are designed so that each subtag type has unique length 302 and content restrictions. These make identification of the subtag's 303 type possible, even if the content of the subtag itself is 304 unrecognized. This allows tags to be parsed and processed without 305 reference to the latest version of the underlying standards or the 306 IANA registry and makes the associated exception handling when 307 parsing tags simpler. 309 Subtags in the IANA registry that do not come from an underlying 310 standard can only appear in specific positions in a tag. 
311 Specifically, they can only occur as primary language subtags or as 312 variant subtags. 314 Note that sequences of private use and extension subtags MUST occur 315 at the end of the sequence of subtags and MUST NOT be interspersed 316 with subtags defined elsewhere in this document. 318 Single-letter and single-digit subtags are reserved for current or 319 future use. These include the following current uses: 321 o The single-letter subtag 'x' is reserved to introduce a sequence 322 of private use subtags. The interpretation of any private use 323 subtags is defined solely by private agreement and is not defined 324 by the rules in this section or in any standard or registry 325 defined in this document. 327 o All other single-letter subtags are reserved to introduce 328 standardized extension subtag sequences as described in 329 Section 3.7. 331 The single-letter subtag 'i' is used by some grandfathered tags, such 332 as "i-default", where it always appears in the first position and 333 cannot be confused with an extension. 335 2.2.1. Primary Language Subtag 337 The primary language subtag is the first subtag in a language tag 338 (with the exception of private use and certain grandfathered tags) 339 and cannot be omitted. The following rules apply to the primary 340 language subtag: 342 1. All two-character primary language subtags were defined in the 343 IANA registry according to the assignments found in the standard 344 ISO 639 Part 1, "ISO 639-1:2002, Codes for the representation of 345 names of languages -- Part 1: Alpha-2 code" [ISO639-1], or using 346 assignments subsequently made by the ISO 639-1 registration 347 authority (RA) or governing standardization bodies. 349 2. 
All three-character primary language subtags were defined in the 350 IANA registry according to the assignments found in either ISO 351 639 Part 2, "ISO 639-2:1998 - Codes for the representation of 352 names of languages -- Part 2: Alpha-3 code - edition 1" 353 [ISO639-2], ISO 639 Part 3, "Codes for the representation of 354 names of languages -- Part 3: Alpha-3 code for comprehensive 355 coverage of languages" [ISO639-3], or assignments subsequently 356 made by the relevant ISO 639 registration authorities or 357 governing standardization bodies. 359 3. The subtags in the range 'qaa' through 'qtz' are reserved for 360 private use in language tags. These subtags correspond to codes 361 reserved by ISO 639-2 for private use. These codes MAY be used 362 for non-registered primary language subtags (instead of using 363 private use subtags following 'x-'). Please refer to Section 4.5 364 for more information on private use subtags. 366 4. All four-character language subtags are reserved for possible 367 future standardization. 369 5. All language subtags of 5 to 8 characters in length in the IANA 370 registry were defined via the registration process in Section 3.5 371 and MAY be used to form the primary language subtag. At the time 372 this document was created, there were no examples of this kind of 373 subtag and future registrations of this type will be discouraged: 374 primary languages are strongly RECOMMENDED for registration with 375 ISO 639, and proposals rejected by ISO 639/RA will be closely 376 scrutinized before they are registered with IANA. 378 6. The single-character subtag 'x' as the primary subtag indicates 379 that the language tag consists solely of subtags whose meaning is 380 defined by private agreement. 
For example, in the tag "x-fr-CH", 381 the subtags 'fr' and 'CH' SHOULD NOT be taken to represent the 382 French language or the country of Switzerland (or any other value 383 in the IANA registry) unless there is a private agreement in 384 place to do so. See Section 4.5. 386 7. The single-character subtag 'i' is used by some grandfathered 387 tags (see Section 2.2.8) such as "i-klingon" and "i-bnn". (Other 388 grandfathered tags have a primary language subtag in their first 389 position.) 391 8. Other values MUST NOT be assigned to the primary subtag except by 392 revision or update of this document. 394 Note: For languages that have both an ISO 639-1 two-character code 395 and a three-character code assigned by either ISO 639-2 or ISO 639-3, 396 only the ISO 639-1 two-character code is defined in the IANA 397 registry. 399 Note: For languages that have no ISO 639-1 two-character code and for 400 which the ISO 639-2/T (Terminology) code and the ISO 639-2/B 401 (Bibliographic) codes differ, only the Terminology code is defined in 402 the IANA registry. At the time this document was created, all 403 languages that had both kinds of three-character code were also 404 assigned a two-character code; it is expected that future assignments 405 of this nature will not occur. 407 Note: To avoid problems with versioning and subtag choice as 408 experienced during the transition between RFC 1766 and RFC 3066, as 409 well as the canonical nature of subtags defined by this document, the 410 ISO 639 Registration Authority Joint Advisory Committee (ISO 639/ 411 RA-JAC) has included the following statement in [iso639.prin]: 413 "A language code already in ISO 639-2 at the point of freezing ISO 414 639-1 shall not later be added to ISO 639-1. This is to ensure 415 consistency in usage over time, since users are directed in 416 Internet applications to employ the alpha-3 code when an alpha-2 417 code for that language is not available."
419 In order to avoid instability in the canonical form of tags, if a 420 two-character code is added to ISO 639-1 for a language for which a 421 three-character code was already included in either ISO 639-2 or ISO 422 639-3, the two-character code MUST NOT be registered. See 423 Section 3.4. 425 For example, if some content were tagged with 'haw' (Hawaiian), which 426 currently has no two-character code, the tag would not be invalidated 427 if ISO 639-1 were to assign a two-character code to the Hawaiian 428 language at a later date. 430 Note: As an example of independent primary language subtag 431 registration, one of the grandfathered IANA registrations is 432 "i-enochian". The subtag 'enochian' could be registered in the IANA 433 registry as a primary language subtag (assuming that ISO 639 does not 434 register this language first), making tags such as "enochian-AQ" and 435 "enochian-Latn" valid. 437 2.2.2. Extended Language Subtags 439 Extended language subtags are used to identify languages that are 440 encompassed by a "macrolanguage". ISO 639-3 defines certain 441 languages to be "macrolanguages"; that is, they are groups of very 442 closely related languages which are treated as a single language in 443 certain contexts. In order to improve matching behavior and tagging 444 consistency, each language encompassed by an ISO 639-3 macrolanguage 445 is represented in the IANA registry using an extended language 446 subtag, provided that it is not already represented using a language 447 subtag. The following rules apply to the extended language subtags: 449 1. These subtags were defined in the IANA registry according to 450 assignments found in ISO 639 Part 3. 452 2. A sequence of up to three extended language subtags MAY appear in 453 a language tag. This sequence MUST follow the primary language 454 subtag and precede any other subtags. 456 3. 
Each extended language subtag MUST only be used with the exact 457 sequence of subtags that appears in the 'Prefix' field in its 458 registry record. 460 4. There MAY be up to three extended language subtags. 462 5. Other values MUST NOT be assigned to the extended language subtag 463 except by revision or update of this document. 465 Extended language subtag records MUST include exactly one 'Prefix' 466 field indicating an appropriate subtag or sequence of subtags for 467 that extended language subtag. 469 For example, the 'gan' and 'cmn' subtags represent the languages Gan 470 Chinese and Mandarin Chinese. Each is encompassed by the 471 macrolanguage 'zh' (Chinese). Therefore, they both have the prefix 472 "zh" in their registry records. Consequently, Gan Chinese is 473 represented as "zh-gan" and Mandarin Chinese as "zh-cmn". The 474 language subtag 'zh' can still be used without an extended language 475 subtag to label a resource as some unspecified variety of Chinese 476 (which in practice will usually be Mandarin, the dominant variety of 477 Chinese, but might also be some other variety). 479 Now suppose that, in the future, the ISO 639-3 Registration Authority 480 were to decide that Gan Chinese is actually two different closely 481 related languages: it might reclassify 'gan' as a macrolanguage and 482 introduce two new code elements. In that case, these code elements 483 would be added to the IANA registry as extended language subtags with 484 prefixes of "zh-gan". No change would be made to the registry record 485 for 'gan'. 487 2.2.3. Script Subtag 489 Script subtags are used to indicate the script or writing system 490 variations that distinguish the written forms of a language or its 491 dialects. The following rules apply to the script subtags: 493 1. 
All four-character subtags were defined according to 494 [ISO15924]--"Codes for the representation of the names of 495 scripts": alpha-4 script codes, or subsequently assigned by the 496 ISO 15924 maintenance agency or governing standardization bodies, 497 denoting the script or writing system used in conjunction with 498 this language. 500 2. Script subtags MUST immediately follow the primary language 501 subtag and all extended language subtags and MUST occur before 502 any other type of subtag described below. 504 3. The script subtags 'Qaaa' through 'Qabx' are reserved for private 505 use in language tags. These subtags correspond to codes reserved 506 by ISO 15924 for private use. These codes MAY be used for non- 507 registered script values. Please refer to Section 4.5 for more 508 information on private use subtags. 510 4. Script subtags MUST NOT be registered using the process in 511 Section 3.5 of this document. Variant subtags MAY be considered 512 for registration for that purpose. 514 5. There MUST be at most one script subtag in a language tag, and 515 the script subtag SHOULD be omitted when it adds no 516 distinguishing value to the tag or when the primary language 517 subtag's record includes a Suppress-Script field listing the 518 applicable script subtag. 520 Example: "sr-Latn" represents Serbian written using the Latin script. 522 2.2.4. Region Subtag 524 Region subtags are used to indicate linguistic variations associated 525 with or appropriate to a specific country, territory, or region. 526 Typically, a region subtag is used to indicate regional dialects or 527 usage, or region-specific spelling conventions. A region subtag can 528 also be used to indicate that content is expressed in a way that is 529 appropriate for use throughout a region, for instance, Spanish 530 content tailored to be useful throughout Latin America. 532 The following rules apply to the region subtags: 534 1. 
Region subtags MUST follow any language, extended language, or 535 script subtags and MUST precede all other subtags. 537 2. All two-character subtags following the primary subtag were 538 defined in the IANA registry according to the assignments found 539 in [ISO3166-1] ("Codes for the representation of names of 540 countries and their subdivisions -- Part 1: Country codes") using 541 the list of alpha-2 country codes, or using assignments 542 subsequently made by the ISO 3166 maintenance agency or governing 543 standardization bodies. 545 3. All three-character subtags consisting of digit (numeric) 546 characters following the primary subtag were defined in the IANA 547 registry according to the assignments found in UN Standard 548 Country or Area Codes for Statistical Use [UN_M.49] or 549 assignments subsequently made by the governing standards body. 550 Note that not all of the UN M.49 codes are defined in the IANA 551 registry. The following rules define which codes are entered 552 into the registry as valid subtags: 554 A. UN numeric codes assigned to 'macro-geographical 555 (continental)' or sub-regions MUST be registered in the 556 registry. These codes are not associated with an assigned 557 ISO 3166 alpha-2 code and represent supra-national areas, 558 usually covering more than one nation, state, province, or 559 territory. 561 B. UN numeric codes for 'economic groupings' or 'other 562 groupings' MUST NOT be registered in the IANA registry and 563 MUST NOT be used to form language tags. 565 C. UN numeric codes for countries or areas with ambiguous ISO 566 3166 alpha-2 codes, when entered into the registry, MUST be 567 defined according to the rules in Section 3.4 and MUST be 568 used to form language tags that represent the country or 569 region for which they are defined. 571 D. 
UN numeric codes for countries or areas for which there is an 572 associated ISO 3166 alpha-2 code in the registry MUST NOT be 573 entered into the registry and MUST NOT be used to form 574 language tags. Note that the ISO 3166-based subtag in the 575 registry MUST actually be associated with the UN M.49 code in 576 question. 578 E. UN numeric codes and ISO 3166 alpha-2 codes for countries or 579 areas listed as eligible for registration in [RFC4645] but 580 not presently registered MAY be entered into the IANA 581 registry via the process described in Section 3.5. Once 582 registered, these codes MAY be used to form language tags. 584 F. All other UN numeric codes for countries or areas that do not 585 have an associated ISO 3166 alpha-2 code MUST NOT be entered 586 into the registry and MUST NOT be used to form language tags. 587 For more information about these codes, see Section 3.4. 589 4. Note: The alphanumeric codes in Appendix X of the UN document 590 MUST NOT be entered into the registry and MUST NOT be used to 591 form language tags. (At the time this document was created, 592 these values matched the ISO 3166 alpha-2 codes.) 594 5. There MUST be at most one region subtag in a language tag and the 595 region subtag MAY be omitted, as when it adds no distinguishing 596 value to the tag. 598 6. The region subtags 'AA', 'QM'-'QZ', 'XA'-'XZ', and 'ZZ' are 599 reserved for private use in language tags. These subtags 600 correspond to codes reserved by ISO 3166 for private use. These 601 codes MAY be used for private use region subtags (instead of 602 using a private use subtag sequence). Please refer to 603 Section 4.5 for more information on private use subtags. 605 "de-CH" represents German ('de') as used in Switzerland ('CH'). 607 "sr-Latn-RS" represents Serbian ('sr') written using Latin script 608 ('Latn') as used in Serbia ('RS'). 610 "es-419" represents Spanish ('es') appropriate to the UN-defined 611 Latin America and Caribbean region ('419'). 
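As a non-normative illustration of the region rules above, the
following Python sketch sorts a region subtag into one of three
shapes: a UN M.49 numeric code, an ISO 3166 private use code, or an
ordinary two-letter code. The function name region_kind and the
returned label strings are assumptions made for this example only.
The sketch checks shape and the private use ranges listed in rule 6;
it does not check whether a value is actually present in the IANA
registry, so a result other than "private-use" does not imply that
the subtag is valid.

```python
import re


def region_kind(subtag):
    """Classify a region subtag by shape; no registry lookup is performed."""
    s = subtag.upper()  # case carries no meaning in language tags
    if re.fullmatch(r"[0-9]{3}", s):
        return "un-m49-numeric"
    if not re.fullmatch(r"[A-Z]{2}", s):
        raise ValueError("not shaped like a region subtag: %r" % subtag)
    # Private use ranges from rule 6: 'AA', 'QM'-'QZ', 'XA'-'XZ', 'ZZ'.
    if s in ("AA", "ZZ") or "QM" <= s <= "QZ" or "XA" <= s <= "XZ":
        return "private-use"
    return "iso3166-alpha2"
```

With these assumptions, the region subtags of "de-CH" and "es-419"
are reported as "iso3166-alpha2" and "un-m49-numeric" respectively,
while a subtag such as 'QM' is reported as "private-use".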
613 2.2.5. Variant Subtags 615 Variant subtags are used to indicate additional, well-recognized 616 variations that define a language or its dialects that are not 617 covered by other available subtags. The following rules apply to the 618 variant subtags: 620 1. Variant subtags are not associated with any external standard. 621 Variant subtags and their meanings are defined by the 622 registration process defined in Section 3.5. 624 2. Variant subtags MUST follow all of the other defined subtags, but 625 precede any extension or private use subtag sequences. 627 3. More than one variant MAY be used to form the language tag. 629 4. Variant subtags MUST be registered with IANA according to the 630 rules in Section 3.5 of this document before being used to form 631 language tags. In order to distinguish variants from other types 632 of subtags, registrations MUST meet the following length and 633 content restrictions: 635 1. Variant subtags that begin with a letter (a-z, A-Z) MUST be 636 at least five characters long. 638 2. Variant subtags that begin with a digit (0-9) MUST be at 639 least four characters long. 641 Variant subtag records in the language subtag registry MAY include 642 one or more 'Prefix' fields, which indicate the language tag or tags 643 that would make a suitable prefix (with other subtags, as 644 appropriate) in forming a language tag with the variant. For 645 example, the subtag 'nedis' has a Prefix of "sl", making it suitable 646 to form language tags such as "sl-nedis" and "sl-IT-nedis", but not 647 suitable for use in a tag such as "zh-nedis" or "it-IT-nedis". 649 "sl-nedis" represents the Natisone or Nadiza dialect of Slovenian. 651 "de-CH-1996" represents German as used in Switzerland and as written 652 using the spelling reform beginning in the year 1996 C.E. 654 Most variants that share a prefix are mutually exclusive. 
For 655 example, the German orthographic variations '1996' and '1901' SHOULD 656 NOT be used in the same tag, as they represent the dates of different 657 spelling reforms. A variant that can meaningfully be used in 658 combination with another variant SHOULD include a 'Prefix' field in 659 its registry record that lists that other variant. For example, if 660 another German variant 'example' were created that made sense to use 661 with '1996', then 'example' should include two Prefix fields: "de" 662 and "de-1996". 664 2.2.6. Extension Subtags 666 Extensions provide a mechanism for extending language tags for use in 667 various applications. See Section 3.7. The following rules apply to 668 extensions: 670 1. Extension subtags are separated from the other subtags defined 671 in this document by a single-character subtag ("singleton"). 672 The singleton MUST be one allocated to a registration authority 673 via the mechanism described in Section 3.7 and MUST NOT be the 674 letter 'x', which is reserved for private use subtag sequences. 676 2. Note: Private use subtag sequences starting with the singleton 677 subtag 'x' are described in Section 2.2.7 below. 679 3. An extension MUST follow at least a primary language subtag. 680 That is, a language tag cannot begin with an extension. 681 Extensions extend language tags, they do not override or replace 682 them. For example, "a-value" is not a well-formed language tag, 683 while "de-a-value" is. 685 4. Each singleton subtag MUST appear at most one time in each tag 686 (other than as a private use subtag). That is, singleton 687 subtags MUST NOT be repeated. For example, the tag "en-a-bbb-a- 688 ccc" is invalid because the subtag 'a' appears twice. Note that 689 the tag "en-a-bbb-x-a-ccc" is valid because the second 690 appearance of the singleton 'a' is in a private use sequence. 692 5. Extension subtags MUST meet all of the requirements for the 693 content and format of subtags defined in this document. 695 6. 
Extension subtags MUST meet whatever requirements are set by the 696 document that defines their singleton prefix and whatever 697 requirements are provided by the maintaining authority. 699 7. Each extension subtag MUST be from two to eight characters long 700 and consist solely of letters or digits, with each subtag 701 separated by a single '-'. 703 8. Each singleton MUST be followed by at least one extension 704 subtag. For example, the tag "tlh-a-b-foo" is invalid because 705 the first singleton 'a' is followed immediately by another 706 singleton 'b'. 708 9. Extension subtags MUST follow all language, extended language, 709 script, region, and variant subtags in a tag. 711 10. All subtags following the singleton and before another singleton 712 are part of the extension. Example: In the tag "fr-a-Latn", the 713 subtag 'Latn' does not represent the script subtag 'Latn' 714 defined in the IANA Language Subtag Registry. Its meaning is 715 defined by the extension 'a'. 717 11. In the event that more than one extension appears in a single 718 tag, the tag SHOULD be canonicalized as described in 719 Section 4.4. 721 For example, if the prefix singleton 'r' and the shown subtags were 722 defined, then the following tag would be a valid example: "en-Latn- 723 GB-boont-r-extended-sequence-x-private" 725 2.2.7. Private Use Subtags 727 Private use subtags are used to indicate distinctions in language 728 important in a given context by private agreement. The following 729 rules apply to private use subtags: 731 1. Private use subtags are separated from the other subtags defined 732 in this document by the reserved single-character subtag 'x'. 734 2. Private use subtags MUST conform to the format and content 735 constraints defined in the ABNF for all subtags. 737 3. Private use subtags MUST follow all language, extended language, 738 script, region, variant, and extension subtags in the tag. 
739 Another way of saying this is that all subtags following the 740 singleton 'x' MUST be considered private use. Example: The 741 subtag 'US' in the tag "en-x-US" is a private use subtag. 743 4. A tag MAY consist entirely of private use subtags. 745 5. No source is defined for private use subtags. Use of private use 746 subtags is by private agreement only. 748 6. Private use subtags are NOT RECOMMENDED where alternatives exist 749 or for general interchange. See Section 4.5 for more information 750 on private use subtag choice. 752 For example: Users who wished to utilize codes from the Ethnologue 753 publication of SIL International for language identification might 754 agree to exchange tags such as "az-Arab-x-AZE-derbend". This example 755 contains two private use subtags. The first is 'AZE' and the second 756 is 'derbend'. 758 2.2.8. Grandfathered Registrations 760 Prior to RFC 4646, whole language tags were registered according to 761 the rules in RFC 1766 and/or RFC 3066. These registered tags 762 maintain their validity. Of those tags, those that were made 763 obsolete or redundant by the advent of RFC 4646, by this document, or 764 by subsequent registration of subtags are maintained in the registry 765 in records as "redundant" records. Those tags that do not match the 766 'langtag' production in the ABNF in this document or that contain 767 subtags that do not individually appear in the registry are 768 maintained in the registry in records of the "grandfathered" type. 770 Grandfathered tags contain one or more subtags that are not defined 771 in the Language Subtag Registry (see Section 3). Redundant tags 772 consist entirely of subtags defined above and whose independent 773 registration was superseded by [RFC4646]. For more information see 774 Section 3.8. 776 Some grandfathered tags are "regular" in that they match the 777 'langtag' production in Figure 1. 
In some cases, these tags could 778 become redundant if their (currently unregistered) subtags were to be 779 registered (as variants, for example). In other cases, although the 780 subtags match the language tag pattern, the meaning assigned to the 781 various subtags is prohibited by rules elsewhere in this document. 782 Those tags can never become redundant. 784 The remaining grandfathered tags are "irregular" and do not match the 785 'langtag' production. These are listed in the 'irregular' production 786 in Figure 1. These grandfathered tags can never become redundant. 787 Many of these tags have been superseded by other registrations: their 788 record contains a Preferred-Value field that really ought to be used 789 to form language tags representing that value. 791 2.2.9. Classes of Conformance 793 Implementations sometimes need to describe their capabilities with 794 regard to the rules and practices described in this document. Tags 795 can be checked or verified in a number of ways, but two particular 796 classes of tag conformance are formally defined here. 798 A tag is considered "well-formed" if it conforms to the ABNF 799 (Section 2.1). Note that irregular grandfathered tags are now listed 800 in the 'irregular' production. 802 A tag is considered "valid" if it is well-formed and it also satisfies 803 these conditions: 805 o The tag is either a grandfathered tag, or all of its language, 806 extended language, script, region, and variant subtags appear in 807 the IANA language subtag registry as of the particular registry 808 date. 810 o There are no duplicate singleton (extension) subtags and no 811 duplicate variant subtags. 813 o For each subtag that has a 'Prefix' field in the registry, the 814 Prefix matches the language tag using Extended Filtering 815 [RFC4647]. That is, each subtag in the Prefix is present in the 816 tag and in the same order. For example, the Prefix "zh-TW" 817 matches the tag "zh-Hant-TW".
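The third condition above (a Prefix matching a tag when each of its subtags is present in the tag and in the same order) can be sketched as follows. This is an illustrative simplification of Extended Filtering from [RFC4647]: it omits wildcard handling, and the function name prefix_matches is hypothetical rather than something defined by this document.

```python
def prefix_matches(prefix: str, tag: str) -> bool:
    """Simplified Extended Filtering check, without wildcard support."""
    wanted = prefix.lower().split("-")
    have = tag.lower().split("-")

    # The first subtag of the Prefix has to match the primary subtag.
    if wanted[0] != have[0]:
        return False

    pos = 1
    for sub in wanted[1:]:
        # Skip over tag subtags that the Prefix does not mention.
        while pos < len(have) and have[pos] != sub:
            # A singleton starts an extension or private use sequence;
            # matching does not continue past it.
            if len(have[pos]) == 1:
                return False
            pos += 1
        if pos == len(have):
            return False
        pos += 1
    return True
```

With this sketch, prefix_matches("zh-TW", "zh-Hant-TW") holds because 'Hant' is skipped, while prefix_matches("zh-TW", "zh-Hant") does not because 'TW' never appears in the tag.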
819 Note that a tag's validity depends on the date of the registry used 820 to validate the tag. A more-recent copy of the registry might 821 contain a subtag that an older version does not. 823 A tag is considered "valid" for a given extension (Section 3.7) (as 824 of a particular version, revision, and date) if it meets the criteria 825 for "valid" above and also satisfies this condition: 827 Each subtag used in the extension part of the tag is valid 828 according to the extension. 830 3. Registry Format and Maintenance 832 This section defines the Language Subtag Registry and the maintenance 833 and update procedures associated with it, as well as a registry for 834 extensions to language tags (Section 3.7). 836 The Language Subtag Registry contains a comprehensive list of all of 837 the subtags valid in language tags. This allows implementers a 838 straightforward and reliable way to validate language tags. The 839 Language Subtag Registry will be maintained so that, except for 840 extension subtags, it is possible to validate all of the subtags that 841 appear in a language tag under the provisions of this document or its 842 revisions or successors. In addition, the meaning of the various 843 subtags will be unambiguous and stable over time. (The meaning of 844 private use subtags, of course, is not defined by the IANA registry.) 846 3.1. Format of the IANA Language Subtag Registry 848 The IANA Language Subtag Registry ("the registry") consists of a text 849 file that is machine readable in the format described in this 850 section, plus copies of the registration forms approved in accordance 851 with the process described in Section 3.5. The existing registration 852 forms for grandfathered and redundant tags taken from RFC 3066 will 853 be maintained as part of the obsolete RFC 3066 registry. The 854 remaining set of initial subtags will not have registration forms 855 created for them. 857 3.1.1. 
File Format 859 The registry is in the text format described below. This format was 860 based on the record-jar format described in [record-jar]. 862 Each line of text is limited to 72 characters, including all 863 whitespace. Records are separated by lines containing only the 864 sequence "%%" (%x25.25). 866 Each field can be viewed as a single, logical line of ASCII 867 characters, comprising a field-name and a field-body separated by a 868 COLON character (%x3A). For convenience, the field-body portion of 869 this conceptual entity can be split into a multiple-line 870 representation; this is called "folding". The format of the registry 871 is described by the following ABNF (per [RFC4234]): 873 registry = record *("%%" CRLF record) 874 record = 1*( field-name *SP ":" *SP field-body CRLF ) 875 field-name = (ALPHA / DIGIT) [*(ALPHA / DIGIT / "-") (ALPHA / DIGIT)] 876 field-body = *(ASCCHAR/LWSP) 877 ASCCHAR = %x21-25 / %x27-7E / UNICHAR ; Note: AMPERSAND is %x26 878 UNICHAR = "&#x" 2*6HEXDIG ";" 880 Figure 2: Registry Format ABNF 882 The sequence '..' (%x2E.2E) in a field-body denotes a range of 883 values. Such a range represents all subtags of the same length that 884 are in alphabetic or numeric order within that range, including the 885 values explicitly mentioned. For example 'a..c' denotes the values 886 'a', 'b', and 'c' and '11..13' denotes the values '11', '12', and 887 '13'. 889 Characters from outside the US-ASCII [ISO646] repertoire, as well as 890 the AMPERSAND character ("&", %x26) when it occurs in a field-body, 891 are represented by a "Numeric Character Reference" using hexadecimal 892 notation in the style used by [XML10] (see 893 ). This consists of the 894 sequence "&#x" (%x26.23.78) followed by a hexadecimal representation 895 of the character's code point in [ISO10646] followed by a closing 896 semicolon (%x3B). For example, the EURO SIGN, U+20AC, would be 897 represented by the sequence "&#x20AC;".
Note that the hexadecimal 898 notation MAY have between two and six digits. 900 All fields whose field-body contains a date value use the "full-date" 901 format specified in [RFC3339]. For example: "2004-06-28" represents 902 June 28, 2004, in the Gregorian calendar. 904 3.1.2. Record Definitions 906 There are three types of records in the registry: "File-Date", 907 "Subtag", and "Tag" records. 909 The first record in the registry is a "File-Date" record. This 910 record contains the single field whose field-name is "File-Date" (see 911 Figure 2). The field-body of this record contains the last 912 modification date of this copy of the registry, making it possible to 913 compare different versions of the registry. The registry on the IANA 914 website is the most current. Versions with an older date than that 915 one are not up-to-date. 917 File-Date: 2004-06-28 918 %% 920 Figure 3: Example of the File-Date Record 922 Subsequent records represent either subtags or tags in the registry. 923 "Subtag" records contain a field with a field-name of "Subtag", 924 while, unsurprisingly, "Tag" records contain a field with a field- 925 name of "Tag". Each of the fields in each record MUST occur no more 926 than once, unless otherwise noted below. Each record MUST contain 927 the following fields: 929 o 'Type' 931 * Type's field-body MUST consist of one of the following strings: 932 "language", "extlang", "script", "region", "variant", 933 "grandfathered", or "redundant" and denotes the type of tag or 934 subtag. 936 o Either 'Subtag' or 'Tag' 938 * Subtag's field-body contains the subtag being defined. This 939 field MUST only appear in records whose 'Type' has one of 940 these values: "language", "extlang", "script", "region", or 941 "variant". 943 * Tag's field-body contains a complete language tag. This field 944 MUST only appear in records whose 'Type' has one of these 945 values: "grandfathered" or "redundant".
Note that the field- 946 body will always follow the 'grandfathered' production in the 947 ABNF in Section 2.1 949 o Description 951 * Description's field-body contains a non-normative description 952 of the subtag or tag. 954 o Added 956 * Added's field-body contains the date the record was added to 957 the registry. 959 Each record MAY also contain the following fields: 961 o Preferred-Value 963 * For fields of type 'script', 'region', and 'variant', 964 'Preferred-Value' contains the subtag of the same 'Type' that 965 is preferred for forming the language tag. 967 * For fields of type 'language' and 'extlang', 'Preferred-Value' 968 contains the language production (see Figure 1) that is 969 preferred when forming the language tag. This can be simply a 970 'language' subtag, or it can be a 'language' subtag followed by 971 an extended language sequence. 973 * For fields of type 'grandfathered' and 'redundant', a canonical 974 mapping to a complete language tag. 976 o Deprecated 978 * Deprecated's field-body contains the date the record was 979 deprecated. 981 o Prefix 983 * Prefix's field-body contains a language tag with which this 984 subtag MAY be used to form a new language tag, perhaps with 985 other subtags as well. This field MUST only appear in records 986 whose 'Type' field-body is 'variant' or 'extlang'. For 987 example, the 'Prefix' for the variant 'nedis' is 'sl', meaning 988 that the tags "sl-nedis" and "sl-IT-nedis" might be appropriate 989 while the tag "is-nedis" is not. 991 o Comments 993 * Comments contains additional information about the subtag, as 994 deemed appropriate for understanding the registry and 995 implementing language tags using the subtag or tag. 997 o Suppress-Script 999 * Suppress-Script contains a script subtag that SHOULD NOT be 1000 used to form language tags with the associated primary language 1001 subtag. This field MUST only appear in records whose 'Type' 1002 field-body is 'language'. See Section 4.1. 
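The record and field layout described above can be read with a small parser. The sketch below is illustrative only: parse_registry and decode_ncr are hypothetical names, and it assumes that a folded continuation line begins with a space or tab, which the example in Figure 4 suggests but the ABNF in Figure 2 does not state outright. Every field is stored as a list because some fields (such as 'Description') can occur more than once in a record.

```python
import re
from typing import Dict, List

Record = Dict[str, List[str]]

# Matches the UNICHAR production: "&#x" 2*6HEXDIG ";"
_NCR = re.compile(r"&#x([0-9A-Fa-f]{2,6});")


def decode_ncr(body: str) -> str:
    """Replace hexadecimal Numeric Character References with characters.

    Values above U+10FFFF are not code points and raise ValueError here.
    """
    return _NCR.sub(lambda m: chr(int(m.group(1), 16)), body)


def parse_registry(text: str) -> List[Record]:
    """Split registry text into records of field-name -> field-bodies."""
    records: List[Record] = []
    current: Record = {}
    last = None  # name of the field most recently started
    for line in text.splitlines():
        if line.strip() == "%%":
            # Record separator.
            if current:
                records.append(current)
            current, last = {}, None
        elif line[:1] in (" ", "\t") and last is not None:
            # Assumed folding rule: leading whitespace continues a field.
            current[last][-1] += " " + line.strip()
        elif ":" in line:
            name, _, body = line.partition(":")
            last = name.strip()
            current.setdefault(last, []).append(body.strip())
    if current:
        records.append(current)
    return records


SAMPLE = """\
File-Date: 2005-01-02
%%
Type: variant
Subtag: nedis
Description: Natisone dialect
Description: Nadiza dialect
Added: 2003-10-09
Prefix: sl
Comments: This is a comment shown
  as an example.
%%
"""
```

Calling parse_registry(SAMPLE) yields two records: the File-Date record, followed by the 'nedis' variant record with two 'Description' bodies and a single unfolded 'Comments' body. The sketch ignores unknown fields only in the sense that it stores them without interpretation.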
1004 Future versions of this document might add additional fields to the 1005 registry, so implementations SHOULD ignore fields found in the 1006 registry that are not defined in this document. 1008 3.1.3. Subtag and Tag Fields 1010 The 'Subtag' field MUST use lowercase letters to form the subtag, 1011 with two exceptions. Subtags whose 'Type' field is 'script' (in 1012 other words, subtags defined by ISO 15924) MUST use titlecase. 1013 Subtags whose 'Type' field is 'region' (in other words, subtags 1014 defined by ISO 3166) MUST use uppercase. These exceptions mirror the 1015 use of case in the underlying standards. 1017 Each subtag in the tags contained in a 'Tag' field MUST be formatted 1018 using the rules in the preceding paragraph. That is, all subtags 1019 are lowercase except for subtags that represent script or region 1020 codes. 1022 3.1.4. Description Field 1024 The field 'Description' contains a description of the tag or subtag 1025 in the record. The 'Description' field MAY appear more than once per 1026 record; that is, there can be multiple descriptions for a given 1027 record. At least one of the 'Description' fields MUST be written or 1028 transcribed into the Latin script; additional 'Description' fields 1029 MAY also include a description in a non-Latin script. Each 1030 'Description' field MUST be unique, both within the record in which 1031 it appears and for the collection of records of the same type. 1032 Moreover, formatting variations of the same description MUST NOT 1033 occur in that specific record or in any other record of the same 1034 type. For example, while the ISO 639-1 code 'fy' contains both the 1035 descriptions "Western Frisian" and "Frisian, Western", only one of 1036 these descriptions appears in the registry. 1038 The 'Description' field is used for identification purposes and 1039 SHOULD NOT be taken to represent the actual native name of the 1040 language or variation or to be in any particular language.
1042 For records taken from a source standard (such as ISO 639 or ISO 1043 3166), the 'Description' value(s) SHOULD also be taken from the 1044 source standard. Multiple descriptions in the source standard MUST 1045 be split into separate 'Description' fields. The source standard's 1046 descriptions MAY be edited, either prior to insertion or via the 1047 registration process. For fields of type 'language' or 'extlang', 1048 the first 'Description' field appearing in the Registry corresponds 1049 to the Reference Name assigned by ISO 639-3. This helps facilitate 1050 cross-referencing between ISO 639 and the registry. 1052 When creating or updating a record due to the action of one of the 1053 source standards, the Language Subtag Reviewer SHOULD remove 1054 duplicate or redundant descriptions and MAY edit descriptions to 1055 correct irregularities in formatting (such as misspellings, 1056 inappropriate apostrophes or other punctuation, or excessive or 1057 missing spaces) prior to submitting the proposed record to the ietf- 1058 languages list. 1060 Note: Descriptions in registry entries that correspond to ISO 639, 1061 ISO 15924, ISO 3166, or UN M.49 codes are intended only to indicate 1062 the meaning of that identifier as defined in the source standard at 1063 the time it was added to the registry. The description does not 1064 replace the content of the source standard itself. The descriptions 1065 are not intended to be the English localized names for the subtags. 1067 Localization or translation of language tag and subtag descriptions 1068 is out of scope of this document. 1070 3.1.5. Deprecated Field 1072 The field 'Deprecated' MAY be added to any record via the maintenance 1073 process described in Section 3.3 or via the registration process 1074 described in Section 3.5. Usually, the addition of a 'Deprecated' 1075 field is due to the action of one of the standards bodies, such as 1076 ISO 3166, withdrawing a code. 
In some historical cases, it might not 1077 have been possible to reconstruct the original deprecation date. For 1078 these cases, an approximate date appears in the registry. Although 1079 valid in language tags, subtags and tags with a 'Deprecated' field 1080 are deprecated and validating processors SHOULD NOT generate these 1081 subtags. Note that a record that contains a 'Deprecated' field and 1082 no corresponding 'Preferred-Value' field has no replacement mapping. 1084 3.1.6. Preferred-Value Field 1086 The field 'Preferred-Value' contains a mapping between the record in 1087 which it appears and another tag or subtag. The value in this field 1088 is strongly RECOMMENDED as the best choice to represent the value of 1089 this record when selecting a language tag. These values form three 1090 groups: 1092 1. ISO 639 language codes that were later withdrawn in favor of 1093 other codes. These values are mostly a historical curiosity. 1095 2. ISO 3166 region codes that have been withdrawn in favor of a new 1096 code. This sometimes happens when a country changes its name or 1097 administration in such a way that warrants a new region code. 1099 3. Grandfathered or redundant tags from RFC 3066. In many cases, 1100 these tags have become obsolete because the values they represent 1101 were later encoded by ISO 639. 1103 Records that contain a 'Preferred-Value' field MUST also have a 1104 'Deprecated' field. This field contains a date of deprecation. 1105 Thus, a language tag processor can use the registry to construct the 1106 valid, non-deprecated set of subtags for a given date. In addition, 1107 for any given tag, a processor can construct the set of valid 1108 language tags that correspond to that tag for all dates up to the 1109 date of the registry. The ability to do these mappings MAY be 1110 beneficial to applications that are matching, selecting, or 1111 filtering content based on its language tags.
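As an illustration of the preceding paragraph, the non-deprecated set of subtags for a given date can be derived from the 'Added' and 'Deprecated' fields. The sketch below makes several assumptions: each record is a flat mapping from field-name to a single field-body, the function names are hypothetical, and the two sample records are invented for the example (they use subtags from the ISO 639-2 private use range and are not taken from the IANA registry).

```python
from datetime import date
from typing import Dict, List, Set, Tuple

Record = Dict[str, str]


def active_subtags(records: List[Record], as_of: date) -> Set[Tuple[str, str]]:
    """Return (Type, Subtag) pairs that exist and are not deprecated as_of."""
    active: Set[Tuple[str, str]] = set()
    for rec in records:
        if "Subtag" not in rec:
            continue  # skip File-Date and whole-tag records
        if date.fromisoformat(rec["Added"]) > as_of:
            continue  # not yet in the registry on that date
        deprecated = rec.get("Deprecated")
        if deprecated is not None and date.fromisoformat(deprecated) <= as_of:
            continue  # already deprecated on that date
        active.add((rec["Type"], rec["Subtag"]))
    return active


def preferred(records: List[Record], type_: str, subtag: str) -> str:
    """Return the Preferred-Value for a subtag, or the subtag itself."""
    for rec in records:
        if rec.get("Type") == type_ and rec.get("Subtag") == subtag:
            return rec.get("Preferred-Value", subtag)
    return subtag


# Invented sample data, for illustration only.
SAMPLE_RECORDS: List[Record] = [
    {"Type": "language", "Subtag": "qaa",
     "Description": "Hypothetical withdrawn code",
     "Added": "2005-01-01", "Deprecated": "2006-06-01",
     "Preferred-Value": "qab"},
    {"Type": "language", "Subtag": "qab",
     "Description": "Hypothetical replacement code",
     "Added": "2006-06-01"},
]
```

Because a deprecated subtag remains valid, a processor would normally keep accepting 'qaa' in the sample data on input and use preferred() only when generating or canonicalizing tags.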
1113 Note that 'Preferred-Value' mappings in records of type 'region' 1114 sometimes do not represent exactly the same meaning as the original 1115 value. There are many reasons for a country code to be changed, and 1116 the effect this has on the formation of language tags will depend on 1117 the nature of the change in question. 1119 In particular, the 'Preferred-Value' field does not imply retagging 1120 content that uses the affected subtag. 1122 The field 'Preferred-Value' MUST NOT be modified once created in the 1123 registry. The field MAY be added to records according to the rules 1124 in Section 3.3. 1126 The 'Preferred-Value' field in records of type "grandfathered" and 1127 "redundant" contains whole language tags that are strongly 1128 RECOMMENDED for use in place of the record's value. In many cases, 1129 the mappings were created by deprecation of the tags during the 1130 period before this document was adopted. For example, the tag "no- 1131 nyn" was deprecated in favor of the ISO 639-1-defined language code 1132 'nn'. 1134 3.1.7. Prefix Field 1136 The field of type 'Prefix' MUST NOT be removed from any record. The 1137 field-body for this type of field MAY be modified, but only if the 1138 modification broadens the meaning of the subtag. That is, the field- 1139 body can be replaced only by a prefix of itself. For 1140 example, the Prefix "be-Latn" (Belarusian, Latin script) could be 1141 replaced by the Prefix "be" (Belarusian) but not by the Prefix "ru- 1142 Latn" (Russian, Latin script). 1144 The field-body of the 'Prefix' field consists of a language tag whose 1145 subtags are appropriate to use with this subtag. For example, the 1146 variant subtag '1996' has a 'Prefix' field of "de". This means that 1147 tags starting with the sequence "de-" are appropriate with this 1148 subtag, so "de-Latg-1996" and "de-CH-1996" are both acceptable, while 1149 the tag "fr-1996" is an inappropriate choice.
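The rule that a 'Prefix' field-body can be replaced only by a prefix of itself can be expressed as a comparison of leading subtags. The following is a minimal sketch; the name is_broadening is hypothetical, and the sketch treats an unchanged value as "not a modification", which is an interpretation rather than something this document states.

```python
def is_broadening(old_prefix: str, new_prefix: str) -> bool:
    """True if new_prefix consists of the leading subtags of old_prefix."""
    old_parts = old_prefix.lower().split("-")
    new_parts = new_prefix.lower().split("-")
    # The replacement has to be strictly shorter and agree subtag by
    # subtag with the start of the existing Prefix.
    return (len(new_parts) < len(old_parts)
            and old_parts[:len(new_parts)] == new_parts)
```

For the example in the text, is_broadening("be-Latn", "be") holds while is_broadening("be-Latn", "ru-Latn") does not. Comparing whole subtags rather than raw characters avoids treating a string such as "b" as a prefix of "be-Latn".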
1151 Records of type 'variant' MAY have more than one field of type 1152 'Prefix'. Additional fields of this type MAY be added to a 'variant' 1153 record via the registration process. 1155 The field-body of the 'Prefix' field MUST NOT conflict with any 1156 'Prefix' already registered for a given record. Such a conflict 1157 would occur when no valid tag could be constructed that would 1158 contain the prefix, such as when two subtags each have a 1159 'Prefix' that contains the other subtag. For example, suppose that 1160 the subtag 'avariant' has the prefix "es-bvariant". Then the subtag 1161 'bvariant' cannot be given the prefix 'avariant', for that would require 1162 a tag of the form "es-avariant-bvariant-avariant", which would not be 1163 valid. 1165 Records of type 'extlang' MUST have _exactly_ one 'Prefix' field. 1167 3.1.8. Comments Field 1169 The field 'Comments' MAY appear more than once per record. This 1170 field MAY be inserted or changed via the registration process and no 1171 guarantee of stability is provided. The content of this field is not 1172 restricted, except by the need to register the information, the 1173 suitability of the request, and by reasonable practical size 1174 limitations. 1176 3.1.9. Suppress-Script Field 1178 The field 'Suppress-Script' MUST only appear in records whose 'Type' 1179 field-body is 'language'. This field MUST NOT appear more than one 1180 time in a record. This field indicates a script used to write the 1181 overwhelming majority of documents for the given language and that 1182 therefore adds no distinguishing information to a language tag. It 1183 helps ensure greater compatibility between the language tags 1184 generated according to the rules in this document and language tags 1185 and tag processors or consumers based on RFC 3066. For example, 1186 virtually all Icelandic documents are written in the Latin script, 1187 making the subtag 'Latn' redundant in the tag "is-Latn".
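A tag generator can use 'Suppress-Script' to omit a script subtag that adds no distinguishing information. The sketch below is an assumption-laden illustration: the function name is hypothetical, the lookup table contains only the Icelandic example given above rather than real registry contents, and the positional logic (skip three-letter extended language subtags, then test a four-letter subtag) is a simplification of the ABNF.

```python
from typing import Dict

# Only the example from the text; a real table would be built from the
# 'Suppress-Script' fields of the registry's language records.
SUPPRESS_SCRIPT: Dict[str, str] = {"is": "Latn"}


def drop_suppressed_script(tag: str,
                           table: Dict[str, str] = SUPPRESS_SCRIPT) -> str:
    """Remove the script subtag if it equals the language's Suppress-Script."""
    parts = tag.split("-")
    suppressed = table.get(parts[0].lower())
    if suppressed is None:
        return tag

    # Skip any three-letter extended language subtags after the primary.
    i = 1
    while i < len(parts) and len(parts[i]) == 3 and parts[i].isalpha():
        i += 1

    # A script subtag is four letters in this position.
    if (i < len(parts) and len(parts[i]) == 4 and parts[i].isalpha()
            and parts[i].lower() == suppressed.lower()):
        del parts[i]
    return "-".join(parts)
```

Under these assumptions, "is-Latn" becomes "is" and "is-Latn-IS" becomes "is-IS", while a tag whose language has no entry in the table is returned unchanged.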
1189 Many language subtag records do not have a Suppress-Script field. 1190 The lack of a Suppress-Script might indicate that the language is 1191 customarily written in more than one script or that the language is 1192 not customarily written at all. It might also mean that sufficient 1193 information was not available when the record was created and thus 1194 remains a candidate for future registration. 1196 3.2. Language Subtag Reviewer 1198 The Language Subtag Reviewer moderates the ietf-languages mailing 1199 list, responds to requests for registration, and performs the other 1200 registry maintenance duties described in Section 3.3. Only the 1201 Language Subtag Reviewer is permitted to request IANA to change, 1202 update, or add records to the Language Subtag Registry. The Language 1203 Subtag Reviewer MAY delegate list moderation and other clerical 1204 duties as needed. 1206 The Language Subtag Reviewer is appointed by the IESG for an 1207 indefinite term, subject to removal or replacement at the IESG's 1208 discretion. The IESG will solicit nominees for the position 1209 (initially or upon a vacancy) and seek to ascertain the candidates' 1210 qualifications. 1212 The subsequent performance or decisions of the Language Subtag 1213 Reviewer MAY be appealed to the IESG under the same rules as other 1214 IETF decisions (see [RFC2026]). The IESG can reverse or overturn the 1215 decision of the Language Subtag Reviewer, provide guidance, or take 1216 other appropriate actions. 1218 3.3. Maintenance of the Registry 1220 Maintenance of the registry requires that as codes are assigned or 1221 withdrawn by ISO 639, ISO 15924, ISO 3166, and UN M.49, the Language 1222 Subtag Reviewer MUST evaluate each change, determine whether it 1223 conflicts with existing registry entries, and submit the information 1224 to IANA for inclusion in the registry. 
If a change takes place and 1225 the Language Subtag Reviewer does not do this in a timely manner, 1226 then any interested party MAY use the procedure in Section 3.5 to 1227 register the appropriate update. 1229 Note: The redundant and grandfathered entries together are the 1230 complete list of tags registered under [RFC3066]. The redundant tags 1231 are those that can now be formed using the subtags defined in the 1232 registry together with the rules of Section 2.2. The grandfathered 1233 entries include those that can never be legal under those same 1234 provisions plus those tags that contain subtags not yet registered 1235 or, perhaps, inappropriate for registration. 1237 The set of redundant and grandfathered tags is permanent and stable: 1238 new entries in this section MUST NOT be added and existing entries 1239 MUST NOT be removed. Records of type 'grandfathered' MAY have their 1240 type converted to 'redundant'; see item 12 in Section 3.6 for more 1241 information. The decision-making process about which tags were 1242 initially grandfathered and which were made redundant is described in 1243 [RFC4645]. 1245 RFC 3066 tags that were deprecated prior to the adoption of [RFC4646] 1246 are part of the list of grandfathered tags, and their component 1247 subtags were not included as registered variants (although they 1248 remain eligible for registration). For example, the tag "art-lojban" 1249 was deprecated in favor of the language subtag 'jbo'. 1251 The Language Subtag Reviewer MUST ensure that new subtags meet the 1252 requirements in Section 4.1 or submit an appropriate alternate subtag 1253 as described in that section. When either a change or addition to 1254 the registry is needed, the Language Subtag Reviewer MUST prepare the 1255 complete record, including all fields, and forward it to IANA for 1256 insertion into the registry. Each record being modified or inserted 1257 MUST be forwarded in a separate message. 
1259 If a record represents a new subtag that does not currently exist in 1260 the registry, then the message's subject line MUST include the word 1261 "INSERT". If the record represents a change to an existing subtag, 1262 then the subject line of the message MUST include the word "MODIFY". 1263 The message MUST contain both the record for the subtag being 1264 inserted or modified and the new File-Date record. Here is an 1265 example of what the body of the message might contain: 1267 LANGUAGE SUBTAG MODIFICATION 1268 File-Date: 2005-01-02 1269 %% 1270 Type: variant 1271 Subtag: nedis 1272 Description: Natisone dialect 1273 Description: Nadiza dialect 1274 Added: 2003-10-09 1275 Prefix: sl 1276 Comments: This is a comment shown 1277 as an example. 1278 %% 1280 Figure 4: Example of a Language Subtag Modification Form 1282 Whenever an entry is created or modified in the registry, the 'File- 1283 Date' record at the start of the registry is updated to reflect the 1284 most recent modification date in the [RFC3339] "full-date" format. 1286 Before forwarding a new registration to IANA, the Language Subtag 1287 Reviewer MUST ensure that values in the 'Subtag' field match case 1288 according to the description in Section 3.1. 1290 3.4. Stability of IANA Registry Entries 1292 The stability of entries and their meaning in the registry is 1293 critical to the long-term stability of language tags. The rules in 1294 this section guarantee that a specific language tag's meaning is 1295 stable over time and will not change. 1297 These rules specifically deal with how changes to codes (including 1298 withdrawal and deprecation of codes) maintained by ISO 639, ISO 1299 15924, ISO 3166, and UN M.49 are reflected in the IANA Language 1300 Subtag Registry. Assignments to the IANA Language Subtag Registry 1301 MUST follow the following stability rules: 1303 1. 
Values in the fields 'Type', 'Subtag', 'Tag', 'Added', 1304 'Deprecated' and 'Preferred-Value' MUST NOT be changed and are 1305 guaranteed to be stable over time. 1307 2. Values in the 'Description' field MUST NOT be changed in a way 1308 that would invalidate previously-existing tags. They MAY be 1309 broadened somewhat in scope, changed to add information, or 1310 adapted to the most common modern usage. For example, countries 1311 occasionally change their official names; a historical example 1312 of this would be "Upper Volta" changing to "Burkina Faso". 1314 3. Values in the field 'Prefix' MAY be added to records of type 1315 'variant' via the registration process. If a prefix is added to 1316 a variant record, 'Comments' fields SHOULD be used to explain 1317 different usages with the various prefixes. 1319 4. Values in the field 'Prefix' in records of type 'variant' MAY be 1320 modified, so long as the modifications broaden the set of 1321 prefixes. That is, a prefix MAY be replaced by one of its own 1322 prefixes. For example, the prefix "en-US" could be replaced by 1323 "en", but not by the prefixes "en-Latn", "fr", or "en-US-boont". 1324 If one of those prefixes were needed, a new Prefix SHOULD be 1325 registered. 1327 5. Values in the field 'Prefix' in records of type 'extlang' MUST 1328 NOT be modified. 1330 6. Values in the field 'Prefix' MUST NOT be removed. 1332 7. The field 'Comments' MAY be added, changed, modified, or removed 1333 via the registration process or any of the processes or 1334 considerations described in this section. 1336 8. The field 'Suppress-Script' MAY be added or removed via the 1337 registration process. 1339 9. Codes assigned by ISO 639-1 that do not conflict with existing 1340 two-letter primary language subtags and which have no 1341 corresponding three-letter primary or extended language subtags 1342 defined in the registry are entered into the IANA registry as 1343 new records of type 'language'. 1345 10.
Codes assigned by ISO 639-2 that do not conflict with existing 1346 three-letter primary or extended language subtags are entered 1347 into the IANA registry as new records of type 'language'. 1349 11. Codes assigned by ISO 639-3 that do not conflict with existing 1350 three-letter primary or extended language subtags are entered 1351 into the IANA registry as new records. 1353 1. Codes that have a defined "macro-language" mapping at the 1354 time of their registration MUST be entered into the registry 1355 as records of type 'extlang' with a 'Prefix' field 1356 containing the appropriate prefix tag. 1358 2. Codes that represent sign languages MUST be entered into the 1359 registry as records of type 'extlang' with a 'Prefix' field 1360 that matches the Basic Language Range "sgn" (see Section 1361 3.3.1 "Basic Filtering" in [RFC4647]). 1363 3. All other codes MUST be entered into the registry as records 1364 of type 'language'. 1366 12. A record of type 'language' or 'extlang' MUST NOT be registered 1367 if there exists a record of either type with the same subtag 1368 value. For example, if an 'extlang' subtag 'foo' exists in the 1369 registry, all attempts to register a 'language' subtag 'foo' 1370 will be rejected. 1372 13. Codes assigned by ISO 15924 and ISO 3166 that do not conflict 1373 with existing subtags of the associated type and whose meaning 1374 is not the same as an existing subtag of the same type are 1375 entered into the IANA registry as new records. 1377 14. Codes assigned by ISO 639, ISO 15924, or ISO 3166 that are 1378 withdrawn by their respective maintenance or registration 1379 authority remain valid in language tags. A 'Deprecated' field 1380 containing the date of withdrawal MUST be added to the record. 1381 If a new record of the same type is added that represents a 1382 replacement value, then a 'Preferred-Value' field MAY also be 1383 added.
The registration process MAY be used to add comments 1384 about the withdrawal of the code by the respective standard. 1386 Example: The region code 'TL' was assigned to the country 1387 'Timor-Leste', replacing the code 'TP' (which was assigned to 1388 'East Timor' when it was under administration by Portugal). 1389 The subtag 'TP' remains valid in language tags, but its 1390 record contains a 'Preferred-Value' of 'TL' and its field 1391 'Deprecated' contains the date the new code was assigned 1392 ('2004-07-06'). 1394 15. Codes assigned by ISO 639, ISO 15924, or ISO 3166 that conflict 1395 with existing subtags of the associated type, including subtags 1396 that are deprecated, MUST NOT be entered into the registry. The 1397 following additional considerations apply to subtag values that 1398 are reassigned: 1400 A. For ISO 639 codes, if the newly assigned code's meaning is 1401 not represented by a subtag in the IANA registry, the 1402 Language Subtag Reviewer, as described in Section 3.5, SHALL 1403 prepare a proposal for entering in the IANA registry as soon 1404 as practical a registered language subtag as an alternate 1405 value for the new code. The form of the registered language 1406 subtag will be at the discretion of the Language Subtag 1407 Reviewer and MUST conform to other restrictions on language 1408 subtags in this document. 1410 B. For all subtags whose meaning is derived from an external 1411 standard (that is, by ISO 639, ISO 15924, ISO 3166, or UN 1412 M.49), if a new meaning is assigned to an existing code and 1413 the new meaning broadens the meaning of that code, then the 1414 meaning for the associated subtag MAY be changed to match. 1415 The meaning of a subtag MUST NOT be narrowed, however, as 1416 this can result in an unknown proportion of the existing 1417 uses of a subtag becoming invalid. Note: The ISO 639 1418 maintenance agency/registration authority (MA/RA) has 1419 adopted a similar stability policy. 1421 C.
For ISO 15924 codes, if the newly assigned code's meaning is 1422 not represented by a subtag in the IANA registry, the 1423 Language Subtag Reviewer, as described in Section 3.5, SHALL 1424 prepare a proposal for entering in the IANA registry as soon 1425 as practical a registered variant subtag as an alternate 1426 value for the new code. The form of the registered variant 1427 subtag will be at the discretion of the Language Subtag 1428 Reviewer and MUST conform to other restrictions on variant 1429 subtags in this document. 1431 D. For ISO 3166 codes, if the newly assigned code's meaning is 1432 associated with the same UN M.49 code as another 'region' 1433 subtag, then the existing region subtag remains as the 1434 preferred value for that region and no new entry is created. 1435 A comment MAY be added to the existing region subtag 1436 indicating the relationship to the new ISO 3166 code. 1438 E. For ISO 3166 codes, if the newly assigned code's meaning is 1439 associated with a UN M.49 code that is not represented by an 1440 existing region subtag, then the Language Subtag Reviewer, 1441 as described in Section 3.5, SHALL prepare a proposal for 1442 entering the appropriate UN M.49 country code as an entry in 1443 the IANA registry. 1445 F. For ISO 3166 codes, if there is no associated UN numeric 1446 code, then the Language Subtag Reviewer SHALL petition the 1447 UN to create one. If there is no response from the UN 1448 within ninety days of the request being sent, the Language 1449 Subtag Reviewer SHALL prepare a proposal for entering in the 1450 IANA registry as soon as practical a registered variant 1451 subtag as an alternate value for the new code. The form of 1452 the registered variant subtag will be at the discretion of 1453 the Language Subtag Reviewer and MUST conform to other 1454 restrictions on variant subtags in this document. This 1455 situation is very unlikely to ever occur. 1457 16. 
UN M.49 has codes for both countries and areas (such as '276' 1458 for Germany) and geographical regions and sub-regions (such as 1459 '150' for Europe). UN M.49 country or area codes for which 1460 there is no corresponding ISO 3166 code SHOULD NOT be 1461 registered, except as a surrogate for an ISO 3166 code that is 1462 blocked from registration by an existing subtag. If such a code 1463 becomes necessary, then the registration authority for ISO 3166 1464 SHOULD first be petitioned to assign a code to the region. If 1465 the petition for a code assignment by ISO 3166 is refused or not 1466 acted on in a timely manner, the registration process described 1467 in Section 3.5 MAY then be used to register the corresponding UN 1468 M.49 code. This way, UN M.49 codes remain available as the 1469 value of last resort in cases where ISO 3166 reassigns a 1470 deprecated value in the registry. 1472 17. Stability provisions apply to grandfathered tags with this 1473 exception: should it be possible to compose one of the 1474 grandfathered tags from registered subtags, then the field 1475 'Type' in that record is changed from 'grandfathered' to 1476 'redundant'. Note that this will not affect language tags that 1477 match the grandfathered tag, since these tags will now match 1478 valid generative subtag sequences. For example, this document 1479 caused the ISO 639-3 code 'gan', used in the redundant tag "zh- 1480 gan", to be registered as an extended language subtag. The 1481 formerly-grandfathered tag "zh-gan" became a redundant tag as a 1482 result (but existing content or implementations that use "zh- 1483 gan" remain valid). 1485 3.5. Registration Procedure for Subtags 1487 The procedure given here MUST be used by anyone who wants to use a 1488 subtag not currently in the IANA Language Subtag Registry. 1490 Only subtags of type 'language' and 'variant' will be considered for 1491 independent registration of new subtags. 
Handling of subtags needed 1492 for stability and subtags necessary to keep the registry synchronized 1493 with ISO 639, ISO 15924, ISO 3166, and UN M.49 within the limits 1494 defined by this document are described in Section 3.3. Stability 1495 provisions are described in Section 3.4. 1497 This procedure MAY also be used to register or alter the information 1498 for the 'Description', 'Comments', 'Deprecated', 'Prefix', or 1499 'Suppress-Script' fields in a subtag's record as described in 1500 Section 3.4. Changes to all other fields in the IANA registry are 1501 NOT permitted. 1503 Registering a new subtag or requesting modifications to an existing 1504 tag or subtag starts with the requester filling out the registration 1505 form reproduced below. Note that each response is not limited in 1506 size so that the request can adequately describe the registration. 1507 The fields in the "Record Requested" section SHOULD follow the 1508 requirements in Section 3.1. 1510 LANGUAGE SUBTAG REGISTRATION FORM 1511 1. Name of requester: 1512 2. E-mail address of requester: 1513 3. Record Requested: 1515 Type: 1516 Subtag: 1517 Description: 1518 Prefix: 1519 Preferred-Value: 1520 Deprecated: 1521 Suppress-Script: 1522 Comments: 1524 4. Intended meaning of the subtag: 1525 5. Reference to published description 1526 of the language (book or article): 1527 6. Any other relevant information: 1529 Figure 5: The Language Subtag Registration Form 1531 The subtag registration form MUST be sent to 1532 for a two-week review period before it can 1533 be submitted to IANA. If modifications are made to the request 1534 during the course of the registration process (such as corrections to 1535 meet the requirements in Section 3.1) the corrected form MUST also be 1536 sent to prior to submission to IANA. 1538 The ietf-languages list is an open list and can be joined by sending 1539 a request to . The list can be 1540 hosted by IANA or by any third party at the request of IESG. 
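The restriction stated earlier in this section, that only the 'Description', 'Comments', 'Deprecated', 'Prefix', and 'Suppress-Script' fields can be registered or altered through this procedure, can be sketched as a simple comparison of an existing record with a proposed one. This is a minimal illustration, not part of the procedure: the function name `disallowed_changes` and the dictionary representation of a record are assumptions made for the example, and the sketch takes the list of alterable fields literally without modeling the additional rules in Section 3.4.

```python
# Fields that this section lists as open to registration or alteration.
ALTERABLE = {"Description", "Comments", "Deprecated", "Prefix", "Suppress-Script"}


def disallowed_changes(old: dict, new: dict) -> list:
    """Return the names of fields that differ between two records
    but are not in the alterable set (hypothetical helper)."""
    changed = {
        name
        for name in old.keys() | new.keys()
        if old.get(name) != new.get(name)
    }
    return sorted(changed - ALTERABLE)


# Record modeled on the 'nedis' example; a dict is an assumed representation.
existing = {
    "Type": "variant",
    "Subtag": "nedis",
    "Description": ["Natisone dialect", "Nadiza dialect"],
    "Added": "2003-10-09",
    "Prefix": ["sl"],
}

with_comment = dict(existing, Comments="This is a comment shown as an example.")
with_new_date = dict(existing, Added="2005-01-02")

print(disallowed_changes(existing, with_comment))   # adding Comments is fine
print(disallowed_changes(existing, with_new_date))  # changing Added is not
```

A reviewer-side tool could run a check of this kind before the form is sent to the list, so that requests touching stable fields are caught early.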
1542 Variant subtags are usually registered for use with a particular 1543 range of language tags. For example, the subtag 'rozaj' is intended 1544 for use with language tags that start with the primary language 1545 subtag "sl", since Resian is a dialect of Slovenian. Thus, the 1546 subtag 'rozaj' would be appropriate in tags such as "sl-Latn-rozaj" 1547 or "sl-IT-rozaj". This information is stored in the 'Prefix' field 1548 in the registry. Variant registration requests SHOULD include at 1549 least one 'Prefix' field in the registration form. 1551 Extended language subtags MUST include exactly one 'Prefix' field. 1553 The 'Prefix' field for a given registered subtag exists in the IANA 1554 registry as a guide to usage. Additional prefixes MAY be added by 1555 filing an additional registration form. In that form, the "Any other 1556 relevant information:" field MUST indicate that it is the addition of 1557 a prefix. 1559 Requests to add a prefix to a variant subtag that imply a different 1560 semantic meaning will probably be rejected. For example, a request 1561 to add the prefix "de" to the subtag 'nedis' so that the tag "de- 1562 nedis" represented some German dialect would be rejected. The 1563 'nedis' subtag represents a particular Slovenian dialect and the 1564 additional registration would change the semantic meaning assigned to 1565 the subtag. A separate subtag SHOULD be proposed instead. 1567 The 'Description' field MUST contain a description of the tag being 1568 registered written or transcribed into the Latin script; it MAY also 1569 include a description in a non-Latin script. Non-ASCII characters 1570 MUST be escaped using the syntax described in Section 3.1. The 1571 'Description' field is used for identification purposes; it does not 1572 necessarily represent the actual native name of the language or 1573 variation, nor does it need to be in any particular language.
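The role of the 'Prefix' field described above, together with the broadening rule for variant prefixes in Section 3.4, can be illustrated with a short sketch. It is a simplification under stated assumptions: the helper names `starts_with_prefix` and `is_broadening` are hypothetical, and comparison is done subtag by subtag from the left, which is not the only way an implementation might compare a tag with a prefix.

```python
def subtags(tag: str) -> list:
    """Split a tag into lowercase subtags (case is not significant)."""
    return tag.lower().split("-")


def starts_with_prefix(tag: str, prefix: str) -> bool:
    """True if `tag` begins with every subtag of `prefix`, in order."""
    tag_parts, prefix_parts = subtags(tag), subtags(prefix)
    return tag_parts[:len(prefix_parts)] == prefix_parts


def is_broadening(old_prefix: str, new_prefix: str) -> bool:
    """True if `new_prefix` is a proper leading part of `old_prefix`,
    that is, the replacement widens the set of matching tags."""
    old_parts, new_parts = subtags(old_prefix), subtags(new_prefix)
    return len(new_parts) < len(old_parts) and old_parts[:len(new_parts)] == new_parts


# 'rozaj' is registered with the prefix "sl".
print(starts_with_prefix("sl-Latn-rozaj", "sl"))  # usage guided by Prefix
print(starts_with_prefix("de-nedis", "sl"))       # outside the prefix

# The "en-US" example from the stability rules in Section 3.4.
for candidate in ("en", "en-Latn", "fr", "en-US-boont"):
    print(candidate, is_broadening("en-US", candidate))
```

Comparing whole subtags rather than raw strings avoids treating an unrelated tag whose first subtag merely starts with the same letters as a match for the prefix.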
1575 While the 'Description' field itself is not guaranteed to be stable 1576 and errata corrections MAY be undertaken from time to time, attempts 1577 to provide translations or transcriptions of entries in the registry 1578 itself will probably be frowned upon by the community or rejected 1579 outright, as changes of this nature have an impact on the provisions 1580 in Section 3.4. 1582 When the two-week period has passed, the Language Subtag Reviewer 1583 MUST take one of the following actions: 1585 Forward the record to be inserted or modified to iana@iana.org 1586 according to the procedure described in Section 3.3. 1588 Explicitly reject the request because of significant objections 1589 raised on the list or due to problems with constraints in this 1590 document (which MUST be explicitly cited) 1592 Extend the review period by granting an additional two-week 1593 increment to permit further discussion. After each two-week 1594 increment, the Language Subtag Reviewer MUST indicate on the list 1595 whether the registration has been accepted, rejected, or extended. 1597 Note that the Language Subtag Reviewer MAY raise objections on the 1598 list if he or she so desires. The important thing is that the 1599 objection MUST be made publicly. 1601 Sometimes the requested record needs to be modified as a result of 1602 discussion during the review period or due to requirements in this 1603 document. The applicant, Language Subtag Reviewer, or others are 1604 free to submit a modified version of the request, which will be 1605 considered in lieu of the original request with the explicit approval 1606 of the applicant. Such changes do not restart the two-week 1607 discussion period, although an application containing the final 1608 record submitted to IANA MUST appear on the list at least one week 1609 prior to the Language Subtag Reviewer forwarding the record to IANA. 
1610 The applicant is also free to modify a rejected application with 1611 additional information and submit it again; this starts a new two- 1612 week comment period. 1614 Decisions made by the Language Subtag Reviewer MAY be appealed to the 1615 IESG [RFC2028] under the same rules as other IETF decisions 1616 [RFC2026]. This includes a decision to extend the review period or 1617 the failure to announce a decision in a clear and timely manner. 1619 All approved registration forms are available online in the directory 1620 http://www.iana.org/numbers.html under "languages". 1622 Updates or changes to existing records follow the same procedure as 1623 new registrations. The Language Subtag Reviewer decides whether 1624 there is consensus to update the registration following the two week 1625 review period; normally, objections by the original registrant will 1626 carry extra weight in forming such a consensus. 1628 Registrations are permanent and stable. Once registered, subtags 1629 will not be removed from the registry and will remain a valid way in 1630 which to specify a specific language or variant. 1632 Note: The purpose of the "Reference to published description" section 1633 in the registration form is to aid in verifying whether a language is 1634 registered or what language or language variation a particular subtag 1635 refers to. In most cases, reference to an authoritative grammar or 1636 dictionary of that language will be useful; in cases where no such 1637 work exists, other well-known works describing that language or in 1638 that language MAY be appropriate. The Language Subtag Reviewer 1639 decides what constitutes "good enough" reference material. This 1640 requirement is not intended to exclude particular languages or 1641 dialects due to the size of the speaker population or lack of a 1642 standardized orthography. Minority languages will be considered 1643 equally on their own merits. 1645 3.6. 
Possibilities for Registration 1647 Possibilities for registration of subtags or information about 1648 subtags include: 1650 o Primary language subtags for languages not listed in ISO 639 that 1651 are not variants of any listed or registered language MAY be 1652 registered. At the time this document was created, there were no 1653 examples of this form of subtag. Before attempting to register a 1654 language subtag, there MUST be an attempt to register the language 1655 with ISO 639. Subtags MUST NOT be registered for languages 1656 defined by codes that exist in ISO 639-1, ISO 639-2, or ISO 639-3, 1657 or that are under consideration by the ISO 639 registration 1658 authorities, or that have never been attempted for registration 1659 with those authorities. If ISO 639 has previously rejected a 1660 language for registration, it is reasonable to assume that there 1661 must be additional, very compelling evidence of need before it 1662 will be registered as a primary language subtag in the IANA 1663 registry (to the extent that it is very unlikely that any subtags 1664 will be registered of this type). 1666 o Dialect or other divisions or variations within a language, its 1667 orthography, writing system, regional or historical usage, 1668 transliteration or other transformation, or distinguishing 1669 variation MAY be registered as variant subtags. An example is the 1670 'rozaj' subtag (the Resian dialect of Slovenian). 1672 o The addition or maintenance of fields (generally of an 1673 informational nature) in Tag or Subtag records as described in 1674 Section 3.1 and subject to the stability provisions in 1675 Section 3.4. This includes descriptions, comments, deprecation 1676 and preferred values for obsolete or withdrawn codes, or the 1677 addition of script or extlang information to primary language 1678 subtags. 
1680 o The addition of records and related field value changes necessary 1681 to reflect assignments made by ISO 639, ISO 15924, ISO 3166, and 1682 UN M.49 as described in Section 3.4. 1684 Subtags proposed for registration that would cause all or part of a 1685 grandfathered tag to become redundant but whose meaning conflicts 1686 with or alters the meaning of the grandfathered tag MUST be rejected. 1688 This document leaves the decision on what subtags or changes to 1689 subtags are appropriate (or not) to the registration process 1690 described in Section 3.5. 1692 Note: four-character primary language subtags are reserved to allow 1693 for the possibility of alpha4 codes in some future addition to the 1694 ISO 639 family of standards. 1696 ISO 639 defines a maintenance agency for additions to and changes in 1697 the list of languages in ISO 639. This agency is: 1699 International Information Centre for Terminology (Infoterm) 1700 Aichholzgasse 6/12, AT-1120 1701 Wien, Austria 1702 Phone: +43 1 26 75 35 Ext. 312 Fax: +43 1 216 32 72 1704 ISO 639-2 defines a maintenance agency for additions to and changes 1705 in the list of languages in ISO 639-2. This agency is: 1707 Library of Congress 1708 Network Development and MARC Standards Office 1709 Washington, D.C. 20540 USA 1710 Phone: +1 202 707 6237 Fax: +1 202 707 0115 1711 URL: http://www.loc.gov/standards/iso639-2 1713 ISO 639-3 defines a maintenance agency for additions to and changes 1714 in the list of languages in ISO 639-3. This agency is: 1716 SIL International 1717 ISO 639-3 Registrar 1718 7500 W. Camp Wisdom Rd. 1719 Dallas, TX 75236 USA 1720 Phone: +1 972 708 7400, ext. 
2293 Fax: +1 972 708 7546 1721 Email: iso639-3@sil.org 1722 URL: http://www.sil.org/iso639-3 1724 The maintenance agency for ISO 3166 (country codes) is: 1726 ISO 3166 Maintenance Agency 1727 c/o International Organization for Standardization 1728 Case postale 56 1729 CH-1211 Geneva 20 Switzerland 1730 Phone: +41 22 749 72 33 Fax: +41 22 749 73 49 1731 URL: http://www.iso.org/iso/en/prods-services/iso3166ma/index.html 1733 The registration authority for ISO 15924 (script codes) is: 1735 Unicode Consortium Box 391476 1736 Mountain View, CA 94039-1476, USA 1737 URL: http://www.unicode.org/iso15924 1739 The Statistics Division of the United Nations Secretariat maintains 1740 the Standard Country or Area Codes for Statistical Use and can be 1741 reached at: 1743 Statistical Services Branch 1744 Statistics Division 1745 United Nations, Room DC2-1620 1746 New York, NY 10017, USA 1748 Fax: +1-212-963-0623 1749 E-mail: statistics@un.org 1750 URL: http://unstats.un.org/unsd/methods/m49/m49alpha.htm 1752 3.7. Extensions and Extensions Registry 1754 Extension subtags are those introduced by single-character subtags 1755 ("singletons") other than 'x'. They are reserved for the generation 1756 of identifiers that contain a language component and are compatible 1757 with applications that understand language tags. 1759 The structure and form of extensions are defined by this document so 1760 that implementations can be created that are forward compatible with 1761 applications that might be created using singletons in the future. 1762 In addition, defining a mechanism for maintaining singletons will 1763 lend stability to this document by reducing the likely need for 1764 future revisions or updates. 1766 Single-character subtags are assigned by IANA using the "IETF 1767 Consensus" policy defined by [RFC2434]. This policy requires the 1768 development of an RFC, which SHALL define the name, purpose, 1769 processes, and procedures for maintaining the subtags. 
The 1770 maintaining or registering authority, including name, contact email, 1771 discussion list email, and URL location of the registry, MUST be 1772 indicated clearly in the RFC. The RFC MUST specify or include each 1773 of the following: 1775 o The specification MUST reference the specific version or revision 1776 of this document that governs its creation and MUST reference this 1777 section of this document. 1779 o The specification and all subtags defined by the specification 1780 MUST follow the ABNF and other rules for the formation of tags and 1781 subtags as defined in this document. In particular, it MUST 1782 specify that case is not significant and that subtags MUST NOT 1783 exceed eight characters in length. 1785 o The specification MUST specify a canonical representation. 1787 o The specification of valid subtags MUST be available over the 1788 Internet and at no cost. 1790 o The specification MUST be in the public domain or available via a 1791 royalty-free license acceptable to the IETF and specified in the 1792 RFC. 1794 o The specification MUST be versioned, and each version of the 1795 specification MUST be numbered, dated, and stable. 1797 o The specification MUST be stable. That is, extension subtags, 1798 once defined by a specification, MUST NOT be retracted or change 1799 in meaning in any substantial way. 1801 o The specification MUST include in a separate section the 1802 registration form reproduced in this section (below) to be used in 1803 registering the extension upon publication as an RFC. 1805 o IANA MUST be informed of changes to the contact information and 1806 URL for the specification. 1808 IANA will maintain a registry of allocated single-character 1809 (singleton) subtags. This registry MUST use the record-jar format 1810 described by the ABNF in Section 3.1. 
Upon publication of an 1811 extension as an RFC, the maintaining authority defined in the RFC 1812 MUST forward this registration form to iesg@ietf.org, who MUST 1813 forward the request to iana@iana.org. The maintaining authority of 1814 the extension MUST maintain the accuracy of the record by sending an 1815 updated full copy of the record to iana@iana.org with the subject 1816 line "LANGUAGE TAG EXTENSION UPDATE" whenever content changes. Only 1817 the 'Comments', 'Contact_Email', 'Mailing_List', and 'URL' fields MAY 1818 be modified in these updates. 1820 Failure to maintain this record, maintain the corresponding registry, 1821 or meet other conditions imposed by this section of this document MAY 1822 be appealed to the IESG [RFC2028] under the same rules as other IETF 1823 decisions (see [RFC2026]) and MAY result in the authority to maintain 1824 the extension being withdrawn or reassigned by the IESG. 1826 %% 1827 Identifier: 1828 Description: 1829 Comments: 1830 Added: 1831 RFC: 1832 Authority: 1833 Contact_Email: 1834 Mailing_List: 1835 URL: 1836 %% 1838 Figure 6: Format of Records in the Language Tag Extensions Registry 1840 'Identifier' contains the single-character subtag (singleton) 1841 assigned to the extension. The Internet-Draft submitted to define 1842 the extension SHOULD specify which letter or digit to use, although 1843 the IESG MAY change the assignment when approving the RFC. 1845 'Description' contains the name and description of the extension. 1847 'Comments' is an OPTIONAL field and MAY contain a broader description 1848 of the extension. 1850 'Added' contains the date the RFC was published in the "full-date" 1851 format specified in [RFC3339]. For example: 2004-06-28 represents 1852 June 28, 2004, in the Gregorian calendar. 1854 'RFC' contains the RFC number assigned to the extension. 1856 'Authority' contains the name of the maintaining authority for the 1857 extension. 
1859 'Contact_Email' contains the email address used to contact the 1860 maintaining authority. 1862 'Mailing_List' contains the URL or subscription email address of the 1863 mailing list used by the maintaining authority. 1865 'URL' contains the URL of the registry for this extension. 1867 The determination of whether an Internet-Draft meets the above 1868 conditions and the decision to grant or withhold such authority rests 1869 solely with the IESG and is subject to the normal review and appeals 1870 process associated with the RFC process. 1872 Extension authors are strongly cautioned that many (including most 1873 well-formed) processors will be unaware of any special relationships 1874 or meaning inherent in the order of extension subtags. Extension 1875 authors SHOULD avoid subtag relationships or canonicalization 1876 mechanisms that interfere with matching or with length restrictions 1877 that sometimes exist in common protocols where the extension is used. 1878 In particular, applications MAY truncate the subtags in doing 1879 matching or in fitting into limited lengths, so it is RECOMMENDED 1880 that the most significant information be in the most significant 1881 (left-most) subtags and that the specification gracefully handle 1882 truncated subtags. 1884 When a language tag is to be used in a specific, known protocol, it 1885 is RECOMMENDED that the language tag not contain extensions not 1886 supported by that protocol. In addition, note that some protocols 1887 MAY impose upper limits on the length of the strings used to store or 1888 transport the language tag. 1890 3.8. Update of the Language Subtag Registry 1892 Upon adoption of this document the IANA Language Subtag Registry will 1893 need an update so that it contains the complete set of subtags valid 1894 in a language tag. This collection of subtags, along with a 1895 description of the process used to create it, is described by 1896 [registry-update].
IANA will publish the updated version of the 1897 registry described by this document using the instructions and 1898 content of [registry-update]. Once published by IANA, the 1899 maintenance procedures, rules, and registration processes described 1900 in this document will be available for new registrations or updates. 1902 Registrations that are in process under the rules defined in 1903 [RFC4646] when this document is adopted MUST be completed under the 1904 rules contained in this document. 1906 4. Formation and Processing of Language Tags 1908 This section addresses how to use the information in the registry 1909 with the tag syntax to choose, form, and process language tags. 1911 4.1. Choice of Language Tag 1913 The guiding principle in forming language tags is to "tag content 1914 wisely." This means that sometimes there is a choice between several 1915 possible tags for the same content and that the choice of which tag 1916 to use depends on the content and application in question. 1918 Interoperability is best served when the same language tag is used 1919 consistently to represent the same language. If an application has 1920 requirements that make the rules here inapplicable, then that 1921 application risks damaging interoperability. It is strongly 1922 RECOMMENDED that users not define their own rules for language tag 1923 choice. 1925 A subtag SHOULD only be used when it adds useful distinguishing 1926 information to the tag. Extraneous subtags interfere with the 1927 meaning, understanding, and processing of language tags. In 1928 particular, users and implementations SHOULD follow the 'Prefix' and 1929 'Suppress-Script' fields in the registry (defined in Section 3.1): 1930 these fields provide guidance on when specific additional subtags 1931 SHOULD be used or avoided in a language tag. 1933 In particular, some applications can benefit from the use of script 1934 subtags in language tags, as long as the use is consistent for a 1935 given context. 
Script subtags are never appropriate for unwritten 1936 content (such as audio recordings). 1938 Script subtags were not formally defined in [RFC3066] and their use 1939 can affect matching and subtag identification for implementations of 1940 RFC 3066, as these subtags appear between the primary language and 1941 region subtags. For example, if an implementation selects content 1942 using Basic Filtering [RFC4647] (originally described in Section 2.5 1943 of [RFC3066]) and the user requested the language range "en-US", 1944 content labeled "en-Latn-US" will not match the request and thus not 1945 be selected. Therefore, it is important to know when script subtags 1946 will customarily be used and when they ought not be used. In the 1947 registry, the Suppress-Script field helps ensure greater 1948 compatibility between the language tags by defining when users SHOULD 1949 NOT include a script subtag with a particular primary language 1950 subtag. 1952 Extended language subtags (type 'extlang' in the registry; see 1953 Section 3.1) also appear between the primary language and subsequent 1954 (script, region, or variant) subtags. Applications sometimes benefit 1955 from their judicious use in forming language tags. 1957 Standards, protocols, and applications that reference this document 1958 normatively but apply different rules to the ones given in this 1959 section MUST specify how language tag selection varies from the 1960 guidelines given here. 1962 The choice of subtags used to form a language tag SHOULD be guided by 1963 the following rules: 1965 1. Use as precise a tag as possible, but no more specific than is 1966 justified. Avoid using subtags that are not important for 1967 distinguishing content in an application. 1969 * For example, 'de' might suffice for tagging an email written 1970 in German, while "de-CH-1996" is probably unnecessarily 1971 precise for such a task. 1973 2. 
       The script subtag SHOULD NOT be used to form language tags unless
       the script adds some distinguishing information to the tag. The
       field 'Suppress-Script' in the primary language record in the
       registry indicates script subtags that do not add distinguishing
       information for most applications. For example:

       *  The subtag 'Latn' should not be used with the primary language
          'en' because nearly all English documents are written in the
          Latin script and it adds no distinguishing information.
          However, if a document were written in English mixing Latin
          script with another script such as Braille ('Brai'), then it
          might be appropriate to choose to indicate both scripts to aid
          in content selection, such as the application of a style
          sheet.

       *  When labeling content that is unwritten (such as a recording
          of human speech), the script subtag should not be used, even
          if the language is customarily written in several scripts.
          Thus the subtitles to a movie might use the tag "zh-cmn-Hant"
          (Chinese, Mandarin, Traditional script), but the audio track
          for the same language would be tagged "zh-cmn".

   3.  If a tag or subtag has a 'Preferred-Value' field in its registry
       entry, then the value of that field SHOULD be used to form the
       language tag in preference to the tag or subtag in which the
       preferred value appears.

       *  For example, use 'he' for Hebrew in preference to 'iw'.

   4.  [ISO639-2] has defined several codes included in the subtag
       registry that require additional care when choosing language
       tags. In most of these cases, where omitting the language tag is
       permitted, such omission is preferable to using these codes.
       Language tags SHOULD NOT incorporate these subtags as a prefix,
       unless the additional information conveys some value to the
       application.

       1.
           Use specific language subtags or subtag sequences in
           preference to subtags for language collections. A "language
           collection" is a subtag derived from one of the [ISO639-2]
           codes that represents multiple related languages. These
           codes are included as primary language subtags in the
           registry. For example, the code 'cmc' represents "Chamic
           languages". The registry contains values for each of the
           approximately ten individual languages represented by this
           collective code. Some other examples include the subtags for
           Germanic languages ('gem') or Algonquian languages ('alg').
           Since these codes are interpreted inclusively, content tagged
           with "en" (English), "de" (German), or "gsw" (Swiss German,
           Alemannic) could also (but SHOULD NOT) be tagged with "gem"
           (Germanic languages). Subtags derived from collection codes
           SHOULD NOT be used unless more specific language
           information is not available. Note that matching
           implementations generally do not understand the relationship
           between the collection and its encompassed languages, and so
           users ought not assume a subtag based on a language
           collection is a useful means for selecting content in its
           encompassed languages.

       2.  The 'mul' (Multiple) primary language subtag is intended to
           identify content in multiple languages. It SHOULD NOT be
           used when a list of languages (such as Content-Language) or
           individual tags for each content element can be used instead.

       3.  The 'und' (Undetermined) primary language subtag is intended
           to identify linguistic content whose language is not known.
           It SHOULD NOT be used unless a language tag is required and
           language information is not available or cannot be
           determined. Omitting the language tag (where permitted) is
           preferred.
           The 'und' subtag MAY be useful for protocols that
           require a language tag to be provided or where a primary
           language subtag is required (such as in "und-Latn"). The
           'und' subtag MAY also be useful when matching language tags
           in certain situations.

       4.  The 'zxx' (Non-Linguistic) primary language subtag is
           intended to identify content that has no language. Some
           examples might include instrumental or electronic music;
           sound recordings consisting of nonverbal sounds; audiovisual
           materials with no narration, printed titles, or subtitles;
           machine-readable data files consisting of machine languages
           or character codes; or programming source code. Note: where
           there are fragments of linguistic content, such as
           programming source code containing comments written in
           English, the subtag 'zxx' might still be used to indicate the
           primary status of the content, just as 'en' can be applied to
           a predominantly English text that contains a few French
           phrases.

       5.  The 'mis' (Miscellaneous) primary language subtag is derived
           from a collective code and is used to identify linguistic
           content whose language is known but cannot otherwise be
           identified. It is commonly used when the range of language
           tags is constrained or for languages not otherwise
           categorized. For example, a library application might be
           limited to the set of subtags defined for use by the [MARC21]
           standard. The 'mis' subtag might be used by this application
           for languages not included in that set. It SHOULD NOT be
           used unless a language tag is required and no other means of
           identifying the language is available.

       6.  The grandfathered tag "i-default" (Default Language) was
           originally registered according to [RFC1766] to meet the
           needs of [RFC2277].
           It does not indicate a specific language; rather, it
           identifies the condition or content used where the language
           preferences of the user cannot be established. It SHOULD NOT
           be used except as a means of
           labeling the default content for applications or protocols
           that require default language content to be labeled with that
           specific tag. It MAY also be used by an application or
           protocol to identify when the default language content is
           being returned.

   5.  The same variant subtag MUST NOT be used more than once within a
       language tag.

       *  For example, the tag "de-DE-1901-1901" is not valid.

   To ensure consistent backward compatibility, this document contains
   several provisions to account for potential instability in the
   standards used to define the subtags that make up language tags.
   These provisions mean that no language tag created under the rules in
   this document will become invalid.

4.2. Meaning of the Language Tag

   The relationship between the tag and the information it relates to is
   defined by the context in which the tag appears. Accordingly, this
   section gives only possible examples of its usage.

   o  For a single information object, the associated language tags
      might be interpreted as the set of languages that is necessary for
      a complete comprehension of the complete object. Example: Plain
      text documents.

   o  For an aggregation of information objects, the associated language
      tags could be taken as the set of languages used inside components
      of that aggregation. Examples: Document stores and libraries.

   o  For information objects whose purpose is to provide alternatives,
      the associated language tags could be regarded as a hint that the
      content is provided in several languages and that one has to
      inspect each of the alternatives in order to find its language or
      languages.
      In this case, the presence of multiple tags might not
      mean that one needs to be multi-lingual to get complete
      understanding of the document. Example: MIME
      multipart/alternative.

   o  In markup languages, such as HTML and XML, language information
      can be added to each part of the document identified by the markup
      structure (including the whole document itself). For example, one
      could write <span lang="fr">C'est la vie.</span> inside a
      Norwegian document; the Norwegian-speaking user could then access
      a French-Norwegian dictionary to find out what the marked section
      meant. If the user were listening to that document through a
      speech synthesis interface, this formation could be used to signal
      the synthesizer to appropriately apply French text-to-speech
      pronunciation rules to that span of text, instead of applying the
      inappropriate Norwegian rules.

   Language tags are related when they contain a similar sequence of
   subtags. For example, if a language tag B contains language tag A as
   a prefix, then B is typically "narrower" or "more specific" than A.
   Thus, "zh-Hant-TW" is more specific than "zh-Hant".

   This relationship is not guaranteed in all cases: specifically,
   languages that begin with the same sequence of subtags are NOT
   guaranteed to be mutually intelligible, although they might be. For
   example, the tag "az" shares a prefix with both "az-Latn"
   (Azerbaijani written using the Latin script) and "az-Cyrl"
   (Azerbaijani written using the Cyrillic script). A person fluent in
   one script might not be able to read the other, even though the text
   might be identical. Content tagged as "az" most probably is written
   in just one script and thus might not be intelligible to a reader
   familiar with the other script.

4.3. Length Considerations

   There is no defined upper limit on the size of language tags.
   While
   historically most language tags have consisted of language and region
   subtags with a combined total length of up to six characters, larger
   tags have always been possible and have actually appeared in use.

   Neither the language tag syntax nor other requirements in this
   document impose a fixed upper limit on the number of subtags in a
   language tag (and thus an upper bound on the size of a tag). The
   language tag syntax suggests that, depending on the specific
   language, more subtags (and thus a longer tag) are sometimes
   necessary to completely identify the language for certain
   applications; thus, it is possible to envision long or complex subtag
   sequences.

4.3.1. Working with Limited Buffer Sizes

   Some applications and protocols are forced to allocate fixed buffer
   sizes or otherwise limit the length of a language tag. A conformant
   implementation or specification MAY refuse to support the storage of
   language tags that exceed a specified length. Any such limitation
   SHOULD be clearly documented, and such documentation SHOULD include
   what happens to longer tags (for example, whether an error value is
   generated or the language tag is truncated). A protocol that allows
   tags to be truncated at an arbitrary limit, without giving any
   indication of what that limit is, has the potential for causing harm
   by changing the meaning of tags in substantial ways.

   In practice, most language tags do not require more than a few
   subtags and will not approach reasonably sized buffer limitations;
   see Section 4.1.

   Some specifications or protocols have limits on tag length but do not
   have a fixed length limitation.
   For example, [RFC2231] has no
   explicit length limitation: the length available for the language tag
   is constrained by the length of other header components (such as the
   charset's name) coupled with the 76-character limit in [RFC2047].
   Thus, the "limit" might be 50 or more characters, but it could
   potentially be quite small.

   The considerations for assigning a buffer limit are:

      Implementations SHOULD NOT truncate language tags unless the
      meaning of the tag is purposefully being changed, or unless the
      tag does not fit into a limited buffer size specified by a
      protocol for storage or transmission.

      Implementations SHOULD warn the user when a tag is truncated since
      truncation changes the semantic meaning of the tag.

      Implementations of protocols or specifications that are space
      constrained but do not have a fixed limit SHOULD use the longest
      possible tag in preference to truncation.

      Protocols or specifications that specify limited buffer sizes for
      language tags MUST allow for language tags of up to 33 characters.

      Protocols or specifications that specify limited buffer sizes for
      language tags SHOULD allow for language tags of at least 42
      characters.

   The following illustration shows how the 42-character recommendation
   was derived. The combination of language and extended language
   subtags was chosen for future compatibility.
   At up to 15 characters,
   this combination is longer than the longest possible primary language
   subtag (8 characters):

      language      =  3   (ISO 639-2; ISO 639-1 requires 2)
      extlang1      =  4   (each subsequent subtag includes '-')
      extlang2      =  4   (unlikely: needs prefix="language-extlang1")
      extlang3      =  4   (extremely unlikely)
      script        =  5   (if not suppressed: see Section 4.1)
      region        =  4   (UN M.49; ISO 3166 requires 3)
      variant1      =  9   (needs 'language' as a prefix)
      variant2      =  9   (needs 'language-variant1' as a prefix)

      total         = 42 characters

             Figure 7: Derivation of the Limit on Tag Length

4.3.2. Truncation of Language Tags

   Truncation of a language tag alters the meaning of the tag, and thus
   SHOULD be avoided. However, truncation of language tags is sometimes
   necessary due to limited buffer sizes. Such truncation MUST NOT
   permit a subtag to be chopped off in the middle or the formation of
   invalid tags (for example, one ending with the "-" character).

   This means that applications or protocols that truncate tags MUST do
   so by progressively removing subtags along with their preceding "-"
   from the right side of the language tag until the tag is short enough
   for the given buffer. If the resulting tag ends with a
   single-character subtag, that subtag and its preceding "-" MUST also
   be removed. For example:

   Tag to truncate: zh-Latn-CN-variant1-a-extend1-x-wadegile-private1
   1. zh-Latn-CN-variant1-a-extend1-x-wadegile
   2. zh-Latn-CN-variant1-a-extend1
   3. zh-Latn-CN-variant1
   4. zh-Latn-CN
   5. zh-Latn
   6. zh

                  Figure 8: Example of Tag Truncation

4.4. Canonicalization of Language Tags

   Since a particular language tag is sometimes used by many processes,
   language tags SHOULD always be created or generated in a canonical
   form.

   A language tag is in canonical form when:

   1.
       The tag is well-formed according to the rules in Section 2.1 and
       Section 2.2.

   2.  Subtags of type 'Region' that have a Preferred-Value mapping in
       the IANA registry (see Section 3.1) SHOULD be replaced with their
       mapped value. Note: In rare cases, the mapped value will also
       have a Preferred-Value.

   3.  Redundant or grandfathered tags that have a Preferred-Value
       mapping in the IANA registry (see Section 3.1) MUST be replaced
       with their mapped value. These items either are deprecated
       mappings created before the adoption of this document (such as
       the mapping of "no-nyn" to "nn" or "i-klingon" to "tlh") or are
       the result of later registrations or additions to this document
       (for example, "zh-hakka" was deprecated in favor of the
       language-extlang combination "zh-hak" when this document was
       adopted).

   4.  Other subtags that have a Preferred-Value mapping in the IANA
       registry (see Section 3.1) MUST be replaced with their mapped
       value. These items consist entirely of clerical corrections to
       ISO 639-1 in which the deprecated subtags have been maintained
       for compatibility purposes.

   5.  If more than one extension subtag sequence exists, the extension
       sequences are ordered into case-insensitive ASCII order by
       singleton subtag.

   Example: The language tag "en-A-aaa-B-ccc-bbb-x-xyz" is in canonical
   form, while "en-B-ccc-bbb-A-aaa-X-xyz" is well-formed but not in
   canonical form.

   Example: The language tag "en-BU" (English as used in Burma) is not
   canonical because the 'BU' subtag has a canonical mapping to 'MM'
   (Myanmar), although the tag "en-BU" maintains its validity.

   Canonicalization of language tags does not imply anything about the
   use of upper or lowercase letters when processing or comparing
   subtags (as described in Section 2.1). All comparisons MUST be
   performed in a case-insensitive manner.
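
   The extension ordering in rule 5 and the case-insensitive comparison
   requirement can be illustrated with a short, non-normative sketch.
   The function names (ascii_lower, tags_equal, order_extensions) are
   hypothetical and are not defined by this document. The sketch
   assumes its input is already well-formed, and it does not perform
   the Preferred-Value replacements of rules 2 through 4, which need
   registry data.

```python
import string

# ASCII-only lowercase table (an assumption of this sketch): language
# tags contain only A-Z, a-z, 0-9, and "-", so no locale-sensitive or
# Unicode-aware case mapping is needed.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def ascii_lower(text: str) -> str:
    """Lowercase only the ASCII letters A-Z."""
    return text.translate(_ASCII_LOWER)


def tags_equal(a: str, b: str) -> bool:
    """Compare two tags without regard to ASCII letter case."""
    return ascii_lower(a) == ascii_lower(b)


def order_extensions(tag: str) -> str:
    """Reorder extension sequences by singleton (rule 5 only)."""
    subtags = tag.split("-")
    # A single-character first subtag means the whole tag is private
    # use ("x-...") or an irregular grandfathered tag ("i-..."):
    # leave it unchanged.
    if len(subtags[0]) == 1:
        return tag

    head = []        # language, extlang, script, region, variant subtags
    extensions = []  # one list per extension: [singleton, subtag, ...]
    private = []     # the 'x' singleton and everything after it

    i, n = 0, len(subtags)
    while i < n and len(subtags[i]) != 1:
        head.append(subtags[i])
        i += 1
    while i < n:
        if ascii_lower(subtags[i]) == "x":
            private = subtags[i:]      # private use runs to the end
            break
        sequence = [subtags[i]]        # start of one extension sequence
        i += 1
        while i < n and len(subtags[i]) != 1:
            sequence.append(subtags[i])
            i += 1
        extensions.append(sequence)

    # Case-insensitive ASCII order by singleton; the order of subtags
    # inside each extension is left exactly as given.
    extensions.sort(key=lambda sequence: ascii_lower(sequence[0]))
    ordered = [subtag for sequence in extensions for subtag in sequence]
    return "-".join(head + ordered + private)
```

   Applied to the non-canonical example above, order_extensions moves
   the 'A' extension ahead of the 'B' extension and leaves the private
   use sequence at the end. The sketch deliberately leaves letter case
   untouched, so its results are compared with tags_equal rather than
   by exact string equality.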
   When performing canonicalization of language tags, processors MAY
   regularize the case of the subtags (that is, this process is
   OPTIONAL), following the case used in the registry. Note that this
   corresponds to the following casing rules: uppercase all non-initial
   two-letter subtags; titlecase all non-initial four-letter subtags;
   lowercase everything else.

   Note: Case folding of ASCII letters in certain locales, unless
   carefully handled, sometimes produces non-ASCII character values.
   The Unicode Character Database file "SpecialCasing.txt" defines the
   specific cases that are known to cause problems with this. In
   particular, the letter 'i' (U+0069) in Turkish and Azerbaijani is
   uppercased to U+0130 (LATIN CAPITAL LETTER I WITH DOT ABOVE).
   Implementers SHOULD specify a locale-neutral casing operation to
   ensure that case folding of subtags does not produce this value,
   which is illegal in language tags. For example, if one were to
   uppercase the region subtag 'in' using Turkish locale rules, the
   sequence U+0130 U+004E would result instead of the expected 'IN'.

   Note: if the field 'Deprecated' appears in a registry record without
   an accompanying 'Preferred-Value' field, then that tag or subtag is
   deprecated without a replacement. Validating processors SHOULD NOT
   generate tags that include these values, although the values are
   canonical when they appear in a language tag.

   An extension MUST define any relationships that exist between the
   various subtags in the extension and thus MAY define an alternate
   canonicalization scheme for the extension's subtags. Extensions MAY
   define how the order of the extension's subtags is interpreted. For
   example, an extension could define that its subtags are in canonical
   order when the subtags are placed into ASCII order: that is,
   "en-a-aaa-bbb-ccc" instead of "en-a-ccc-bbb-aaa".
   Another extension might
   define that the order of the subtags influences their semantic
   meaning (so that "en-b-ccc-bbb-aaa" has a different value from
   "en-b-aaa-bbb-ccc"). However, extension specifications SHOULD be
   designed so that they are tolerant of the typical processes
   described in Section 3.7.

4.5. Considerations for Private Use Subtags

   Private use subtags, like all other subtags, MUST conform to the
   format and content constraints in the ABNF. Private use subtags have
   no meaning outside the private agreement between the parties that
   intend to use or exchange language tags that employ them. The same
   subtags MAY be used with a different meaning under a separate private
   agreement. They SHOULD NOT be used where alternatives exist and
   SHOULD NOT be used in content or protocols intended for general use.

   Private use subtags are simply useless for information exchange
   without prior arrangement. The value and semantic meaning of private
   use tags and of the subtags used within such a language tag are not
   defined by this document.

   Subtags defined in the IANA registry as having a specific private use
   meaning convey more information than a purely private use tag
   prefixed by the singleton subtag 'x'. For applications, this
   additional information MAY be useful.

   For example, the region subtags 'AA', 'ZZ', and those in the ranges
   'QM'-'QZ' and 'XA'-'XZ' (derived from ISO 3166 private use codes) MAY
   be used to form a language tag. A tag such as "zh-Hans-XQ" conveys a
   great deal of public, interchangeable information about the language
   material (that it is Chinese in the simplified Chinese script and is
   suitable for some geographic region 'XQ').
   While the precise
   geographic region is not known outside of private agreement, the tag
   conveys far more information than an opaque tag such as "x-someLang",
   which contains no information about the language subtag or script
   subtag outside of the private agreement.

   However, in some cases content tagged with private use subtags MAY
   interact with other systems in a different and possibly unsuitable
   manner compared to tags that use opaque, privately defined subtags,
   so the choice of the best approach sometimes depends on the
   particular domain in question.

5. IANA Considerations

   This section deals with the processes and requirements necessary for
   IANA to undertake to maintain the subtag and extension registries as
   defined by this document and in accordance with the requirements of
   [RFC2434].

   The impact on the IANA maintainers of the two registries defined by
   this document will be a small increase in the frequency of new
   entries or updates.

5.1. Language Subtag Registry

   Upon adoption of this document, IANA will update the registry using
   instructions and content provided in a companion document:
   [registry-update]. The criteria and process for selecting the
   updated set of records are described in that document. The updated
   set of records represents no impact on IANA, since the work to create
   it will be performed externally.

   Future work on the Language Subtag Registry is limited to
   inserting or replacing whole records preformatted for IANA by the
   Language Subtag Reviewer as described in Section 3.3 of this document
   and archiving the forwarded registration form.
   Each record MUST be sent to iana@iana.org with a subject line
   indicating whether the enclosed record is an insertion of a new
   record (indicated by the word "INSERT" in the subject line) or a
   replacement of an existing record (indicated by the word "MODIFY" in
   the subject line). Records MUST NOT be deleted from the registry.
   IANA MUST place any inserted or modified records into the appropriate
   section of the language subtag registry, grouping the records by
   their 'Type' field. Inserted records MAY be placed anywhere in the
   appropriate section; there is no guarantee of the order of the
   records beyond grouping them together by 'Type'. Modified records
   MUST overwrite the record they replace.

   Included in any request to insert or modify records MUST be a new
   File-Date record. This record MUST be placed first in the registry.
   In the event that the File-Date record present in the registry has a
   later date than the record being inserted or modified, the existing
   record MUST be preserved.

5.2. Extensions Registry

   The Language Tag Extensions Registry can contain at most 35 records
   and thus changes to this registry are expected to be very infrequent.

   Future work by IANA on the Language Tag Extensions Registry is
   limited to two cases. First, the IESG MAY request that new records
   be inserted into this registry from time to time. These requests
   MUST include the record to insert in the exact format described in
   Section 3.7. In addition, there MAY be occasional requests from the
   maintaining authority for a specific extension to update the contact
   information or URLs in the record. These requests MUST include the
   complete, updated record. IANA is not responsible for validating the
   information provided, only that it is properly formatted.
   It should
   reasonably be seen to come from the maintaining authority named in
   the record present in the registry.

6. Security Considerations

   Language tags used in content negotiation, like any other information
   exchanged on the Internet, might be a source of concern because they
   might be used to infer the nationality of the sender, and thus
   identify potential targets for surveillance.

   This is a special case of the general problem that anything sent is
   visible to the receiving party and possibly to third parties as well.
   It is useful to be aware that such concerns can exist in some cases.

   The evaluation of the exact magnitude of the threat, and any possible
   countermeasures, is left to each application protocol (see BCP 72
   [RFC3552] for best current practice guidance on security threats and
   defenses).

   The language tag associated with a particular information item is of
   no consequence whatsoever in determining whether that content might
   contain possible homographs. The fact that a text is tagged as being
   in one language or using a particular script subtag provides no
   assurance whatsoever that it does not contain characters from scripts
   other than the one(s) associated with or specified by that language
   tag.

   Since there is no limit to the number of variant, private use, and
   extension subtags, and consequently no limit on the possible length
   of a tag, implementations need to guard against buffer overflow
   attacks. See Section 4.3 for details on language tag truncation,
   which can occur as a consequence of defenses against buffer overflow.

   Although the specification of valid subtags for an extension (see
   Section 3.7) MUST be available over the Internet, implementations
   SHOULD NOT mechanically depend on it being always accessible, to
   prevent denial-of-service attacks.

7.
Character Set Considerations

   The syntax in this document requires that language tags use only the
   characters A-Z, a-z, 0-9, and HYPHEN-MINUS, which are present in most
   character sets, so the composition of language tags should not have
   any character set issues.

   Rendering of characters based on the content of a language tag is not
   addressed in this memo. Historically, some languages have relied on
   the use of specific character sets or other information in order to
   infer how a specific character should be rendered (notably this
   applies to language- and culture-specific variations of Han
   ideographs as used in Japanese, Chinese, and Korean). When language
   tags are applied to spans of text, rendering engines sometimes use
   that information in deciding which font to use in the absence of
   other information, particularly where languages with distinct writing
   traditions use the same characters.

8. Changes from RFC 4646

   The main goal for this revision of this document was to incorporate
   ISO 639-3 and its attendant set of language codes into the IANA
   Language Subtag Registry, permitting the identification of many more
   languages and dialects than previously supported.

   The specific changes in this document to meet these goals are:

   o  Defines the incorporation of ISO 639-3 codes as language and
      extlang subtags. Extlangs are now permitted in language tags.
      The changes necessary to achieve this were:

      *  something

   o  Changed the ABNF related to grandfathered tags. The irregular
      tags are now listed. Well-formed grandfathered tags are now
      described by the 'langtag' production and the 'grandfathered'
      production was removed as a result. Also: added description of
      both types of grandfathered tags to Section 2.2.8.

   o  Added the paragraph on "collections" to Section 4.1.

   o  Changed the capitalization rules for 'Tag' fields in Section 3.1.
   o  Split section 3.1 up into subsections.

   o  Modified section 3.5 to allow Suppress-Script fields to be added,
      modified, or removed via the registration process. This was an
      erratum from RFC 4646.

   o  Modified examples that used region code 'CS' (formerly Serbia and
      Montenegro) to use 'RS' (Serbia) instead.

   o  Modified the rules for creating and maintaining record
      'Description' fields to prevent duplicates, including inverted
      duplicates.

   o  Removed the lengthy description of why RFC 4646 was created from
      this section, which also caused the removal of the reference to
      XML Schema.

   o  Modified the text in section 2.1 to place more emphasis on the
      fact that language tags are not case sensitive.

   o  Replaced the example "fr-Latn-CA" in Section 2.1 with "sr-Latn-RS"
      and "az-Arab-IR" because "fr-Latn-CA" doesn't respect the
      Suppress-Script on 'Latn' with 'fr'.

   o  Changed the requirements for well-formedness to make singleton
      repetition checking optional (it is required for validity
      checking) in Section 2.2.9.

   o  Changed the text in Section 2.2.9 referring to grandfathered
      checking to note that the list is now included in the ABNF.

   o  Modified and added text to Section 3.2. The job description was
      placed first. A note was added making clear that the Language
      Subtag Reviewer may delegate various non-critical duties,
      including list moderation. Finally, additional text was added to
      make the appointment process clear and to clarify that decisions
      and performance of the reviewer are appealable.

   o  Added text to Section 3.5 clarifying that the ietf-languages list
      is operated by whomever the IESG appoints.

   o  Added text to Section 3.1.4 clarifying that the first Description
      in a 'language' or 'extlang' record matches the corresponding
      Reference Name for the language in ISO 639-3.
   o  Modified Section 2.2.9 to define classes of conformance related to
      specific tags (formerly 'well-formed' and 'valid' referred to
      implementations).

   o  Added text to the end of Section 3.1.2 noting that future versions
      of this document might add new field types and recommending that
      implementations ignore any unrecognized fields.

   o  Modified the 'extlang' examples in Appendix A to use valid subtags
      and removed the note saying that they were only examples.

   o  Added text about what the lack of a Suppress-Script field means in
      a record to Section 3.1.9.

   o  Added text allowing the correction of misspellings and typographic
      errors to Section 3.1.4.

   o  Added text to Section 3.1.7 disallowing Prefix field conflicts
      (such as circular prefix references).

   o  Modified text in Section 3.5 to require the subtag reviewer to
      announce his/her decision (or extension) following the two-week
      period. Also clarified that any decision or failure to decide can
      be appealed.

   o  Modified text in Section 4.1 to include the (heretofore anecdotal)
      guiding principle of tag choice, and clarifying the non-use of
      script subtags in non-written applications. Also updated examples
      in this section to use Chamic languages as an example of language
      collections.

   o  Prohibited multiple use of the same variant in a tag (for example,
      "de-1901-1901"). Previously this was only a recommendation
      ("SHOULD").

   o  Removed inappropriate [RFC2119] language from the illustration in
      Section 4.3.1.

   o  Replaced the example of "zh-guoyu" with "zh-hakka"->"zh-hak" in
      Section 4.4, noting that it was this document that caused the
      change.

   o  Replaced the section in Section 4.1 dealing with "mul"/"und" to
      include the subtags 'zxx' and 'mis', as well as the tag
      "i-default". A normative reference to RFC 2277 was added, along
      with an informative reference to MARC21.
   o  Added text to Section 3.5 clarifying that any modifications of a
      registration request must be sent to the ietf-languages list
      before submission to IANA.

   [[Ed.Note: Open issues in this version:

      Whether encompassed language rules for the creation of extlang
      records in the registry should be retained or modified.

      Modification of the registry to use UTF-8 as its character
      encoding. (removed and apparently rejected)

      Details of the appointment, term duration, performance review of
      the subtag reviewer by the IESG. (addressed?)

      Inclusion of additional information related to Suppress-Script in
      the registry (e.g. that it wasn't assigned on purpose)

   ]]

9. References

9.1. Normative References

   [ISO10646]
              International Organization for Standardization, "ISO/IEC
              10646:2003. Information technology -- Universal
              Multiple-Octet Coded Character Set (UCS)", 2003.

   [ISO15924]
              International Organization for Standardization, "ISO
              15924:2004. Information and documentation -- Codes for the
              representation of names of scripts", January 2004.

   [ISO3166-1]
              International Organization for Standardization, "ISO
              3166-1:1997. Codes for the representation of names of
              countries and their subdivisions -- Part 1: Country
              codes", 1997.

   [ISO639-1]
              International Organization for Standardization, "ISO
              639-1:2002. Codes for the representation of names of
              languages -- Part 1: Alpha-2 code", 2002.

   [ISO639-2]
              International Organization for Standardization, "ISO
              639-2:1998. Codes for the representation of names of
              languages -- Part 2: Alpha-3 code, first edition", 1998.

   [ISO639-3]
              International Organization for Standardization, "ISO
              639-3:2007. Codes for the representation of names of
              languages -- Part 3: Alpha-3 code for comprehensive
              coverage of languages", 2007.
2663     [ISO646]   International Organization for Standardization, "ISO/IEC
2664                646:1991.  Information technology -- ISO 7-bit coded
2665                character set for information interchange", 1991.

2667     [RFC2026]  Bradner, S., "The Internet Standards Process -- Revision
2668                3", BCP 9, RFC 2026, October 1996.

2670     [RFC2028]  Hovey, R. and S. Bradner, "The Organizations Involved in
2671                the IETF Standards Process", BCP 11, RFC 2028,
2672                October 1996.

2674     [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
2675                Requirement Levels", BCP 14, RFC 2119, March 1997.

2677     [RFC2277]  Alvestrand, H., "IETF Policy on Character Sets and
2678                Languages", BCP 18, RFC 2277, January 1998.

2680     [RFC2434]  Narten, T. and H. Alvestrand, "Guidelines for Writing an
2681                IANA Considerations Section in RFCs", BCP 26, RFC 2434,
2682                October 1998.

2684     [RFC2860]  Carpenter, B., Baker, F., and M. Roberts, "Memorandum of
2685                Understanding Concerning the Technical Work of the
2686                Internet Assigned Numbers Authority", RFC 2860, June 2000.

2688     [RFC3339]  Klyne, G. and C. Newman, "Date and Time on the Internet:
2689                Timestamps", RFC 3339, July 2002.

2691     [RFC4234]  Crocker, D. and P. Overell, "Augmented BNF for Syntax
2692                Specifications: ABNF", RFC 4234, October 2005.

2694     [RFC4645]  Ewell, D., Ed., "Initial Language Subtag Registry",
2695                RFC 4645, September 2006.

2697     [RFC4647]  Phillips, A., Ed. and M. Davis, Ed., "Matching of Language
2698                Tags", BCP 47, RFC 4647, September 2006.

2701     [UN_M.49]  Statistics Division, United Nations, "Standard Country or
2702                Area Codes for Statistical Use", UN Standard Country or
2703                Area Codes for Statistical Use, Revision 4 (United Nations
2704                publication, Sales No. 98.XVII.9), June 1999.

2706  9.2.  Informative References

2708     [MARC21]   Library of Congress, National Development and MARC
2709                Standards Office, "MARC 21 Specifications for Record
2710                Structure, Character Sets, and Exchange Media",
2711                January 2000.
2713     [RFC1766]  Alvestrand, H., "Tags for the Identification of
2714                Languages", RFC 1766, March 1995.

2716     [RFC2047]  Moore, K., "MIME (Multipurpose Internet Mail Extensions)
2717                Part Three: Message Header Extensions for Non-ASCII Text",
2718                RFC 2047, November 1996.

2720     [RFC2231]  Freed, N. and K. Moore, "MIME Parameter Value and Encoded
2721                Word Extensions: Character Sets, Languages, and
2722                Continuations", RFC 2231, November 1997.

2724     [RFC2781]  Hoffman, P. and F. Yergeau, "UTF-16, an encoding of ISO
2725                10646", RFC 2781, February 2000.

2727     [RFC3066]  Alvestrand, H., "Tags for the Identification of
2728                Languages", BCP 47, RFC 3066, January 2001.

2730     [RFC3552]  Rescorla, E. and B. Korver, "Guidelines for Writing RFC
2731                Text on Security Considerations", BCP 72, RFC 3552,
2732                July 2003.

2734     [RFC4646]  Phillips, A., Ed. and M. Davis, Ed., "Tags for the
2735                Identification of Languages", BCP 47, RFC 4646,
2736                September 2006.

2738     [Unicode]  Unicode Consortium, "The Unicode Consortium. The Unicode
2739                Standard, Version 5.0, (Boston, MA, Addison-Wesley, 2003.
2740                ISBN 0-321-49081-0)", January 2007.

2742     [XML10]    Bray, T., et al., "Extensible Markup Language (XML) 1.0",
2743                February 2004.

2745     [iso639.prin]
2746                ISO 639 Joint Advisory Committee, "ISO 639 Joint Advisory
2747                Committee: Working principles for ISO 639 maintenance",
2748                March 2000.

2752     [record-jar]
2753                Raymond, E., "The Art of Unix Programming", 2003.

2756     [registry-update]
2757                Ewell, D., Ed., "Update to the Language Subtag Registry",
2758                September 2006.

2761  Appendix A.  Acknowledgements

2763     Any list of contributors is bound to be incomplete; please regard the
2764     following as only a selection from the group of people who have
2765     contributed to make this document what it is today.
2767     The contributors to RFC 4646, RFC 4647, RFC 3066, and RFC 1766, the
2768     precursors of this document, made enormous contributions directly or
2769     indirectly to this document and are generally responsible for the
2770     success of language tags.

2772     The following people contributed to this document:

2774     Stephane Bortzmeyer, Karen Broome, Peter Constable, John Cowan,
2775     Martin Duerst, Frank Ellerman, Doug Ewell, Deborah Garside, Marion
2776     Gunn, Kent Karlsson, Randy Presuhn, Stephen Silver, and many, many
2777     others.

2779     Very special thanks must go to Harald Tveit Alvestrand, who
2780     originated RFCs 1766 and 3066, and without whom this document would
2781     not have been possible.

2783     Special thanks go to Michael Everson, who served as the Language Tag
2784     Reviewer for almost the entire RFC 1766/RFC 3066 period, as well as
2785     the Language Subtag Reviewer since the adoption of RFC 4646.

2787     Special thanks also to Doug Ewell, for his production of the first
2788     complete subtag registry, his work to support and maintain new
2789     registrations, and his careful editorship of both RFC 4645 and
2790     [registry-update].

2792  Appendix B.  Examples of Language Tags (Informative)

2794     Simple language subtag:

2796        de (German)

2798        fr (French)

2800        ja (Japanese)

2802        i-enochian (example of a grandfathered tag)

2804     Language subtag plus Script subtag:

2806        zh-Hant (Chinese written using the Traditional Chinese script)

2808        zh-Hans (Chinese written using the Simplified Chinese script)

2810        sr-Cyrl (Serbian written using the Cyrillic script)

2812        sr-Latn (Serbian written using the Latin script)

2814     Language-Script-Region:

2816        zh-Hans-CN (Chinese written using the Simplified script as used in
2817        mainland China)

2819        sr-Latn-RS (Serbian written using the Latin script as used in
2820        Serbia)

2822     Language-Variant:

2824        sl-rozaj (Resian dialect of Slovenian)

2826        sl-nedis (Nadiza dialect of Slovenian)

2828     Language-Region-Variant:

2830        de-CH-1901 (German as used in Switzerland using the 1901 variant
2831        [orthography])

2833        sl-IT-nedis (Slovenian as used in Italy, Nadiza dialect)

2835     Language-Script-Region-Variant:

2837        hy-Latn-IT-arevela (Eastern Armenian written in Latin script, as
2838        used in Italy)

2840     Language-Region:

2842        de-DE (German for Germany)

2844        en-US (English as used in the United States)

2846        es-419 (Spanish appropriate for the Latin America and Caribbean
2847        region using the UN region code)

2849     Private use subtags:

2851        de-CH-x-phonebk

2853        az-Arab-x-AZE-derbend

2855     Extended language subtags:

2857        zh-cmn

2859        zh-cmn-Hant-CN

2861     Private use registry values:

2863        x-whatever (private use using the singleton 'x')

2865        qaa-Qaaa-QM-x-southern (all private tags)

2867        de-Qaaa (German, with a private script)

2869        sr-Latn-QM (Serbian, Latin-script, private region)

2871        sr-Qaaa-RS (Serbian, private script, for Serbia)

2873     Tags that use extensions (examples ONLY: extensions MUST be defined
2874     by revision or update to this document or by RFC):

2876        en-US-u-islamCal

2878        zh-CN-a-myExt-x-private

2880        en-a-myExt-b-another

2882     Some Invalid Tags:

2884        de-419-DE (two region tags)

2886        a-DE (use of a single-character subtag in primary position; note
2887        that there are a few grandfathered tags that start with "i-" that
2888        are valid)

2890        ar-a-aaa-b-bbb-a-ccc (two extensions with same single-letter
2891        prefix)

2893  Authors' Addresses

2895     Addison Phillips (editor)
2896     Yahoo! Inc.

2898     Email: addison@inter-locale.com
2899     URI:   http://www.inter-locale.com

2901     Mark Davis (editor)
2902     Google

2904     Email: mark.davis@macchiato.com or mark.davis@google.com

2906  Full Copyright Statement

2908     Copyright (C) The IETF Trust (2007).

2910     This document is subject to the rights, licenses and restrictions
2911     contained in BCP 78, and except as set forth therein, the authors
2912     retain all their rights.

2914     This document and the information contained herein are provided on an
2915     "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
2916     OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
2917     THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
2918     OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
2919     THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
2920     WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

2922  Intellectual Property

2924     The IETF takes no position regarding the validity or scope of any
2925     Intellectual Property Rights or other rights that might be claimed to
2926     pertain to the implementation or use of the technology described in
2927     this document or the extent to which any license under such rights
2928     might or might not be available; nor does it represent that it has
2929     made any independent effort to identify any such rights.  Information
2930     on the procedures with respect to rights in RFC documents can be
2931     found in BCP 78 and BCP 79.
2933     Copies of IPR disclosures made to the IETF Secretariat and any
2934     assurances of licenses to be made available, or the result of an
2935     attempt made to obtain a general license or permission for the use of
2936     such proprietary rights by implementers or users of this
2937     specification can be obtained from the IETF on-line IPR repository at
2938     http://www.ietf.org/ipr.

2940     The IETF invites any interested party to bring to its attention any
2941     copyrights, patents or patent applications, or other proprietary
2942     rights that may cover technology that may be required to implement
2943     this standard.  Please address the information to the IETF at
2944     ietf-ipr@ietf.org.

2946  Acknowledgment

2948     Funding for the RFC Editor function is provided by the IETF
2949     Administrative Support Activity (IASA).