Network Working Group                                   A. Phillips, Ed.
Internet-Draft                                               Yahoo! Inc.
Obsoletes: 4646 (if approved)                              M. Davis, Ed.
Intended status: Best Current                                     Google
Practice                                                   July 31, 2007
Expires: February 1, 2008

                     Tags for Identifying Languages
                       draft-ietf-ltru-4646bis-07

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on February 1, 2008.

Copyright Notice

   Copyright (C) The IETF Trust (2007).

Abstract

   This document describes the structure, content, construction, and
   semantics of language tags for use in cases where it is desirable to
   indicate the language used in an information object. It also
   describes how to register values for use in language tags and the
   creation of user-defined extensions for private interchange.

Table of Contents

   1.  Introduction
   2.  The Language Tag
     2.1.  Syntax
     2.2.  Language Subtag Sources and Interpretation
       2.2.1.  Primary Language Subtag
       2.2.2.  Extended Language Subtags
       2.2.3.  Script Subtag
       2.2.4.  Region Subtag
       2.2.5.  Variant Subtags
       2.2.6.  Extension Subtags
       2.2.7.  Private Use Subtags
       2.2.8.  Grandfathered Registrations
       2.2.9.  Classes of Conformance
   3.  Registry Format and Maintenance
     3.1.  Format of the IANA Language Subtag Registry
       3.1.1.  File Format
       3.1.2.  Record Definitions
       3.1.3.  Subtag and Tag Fields
       3.1.4.  Description Field
       3.1.5.  Deprecated Field
       3.1.6.  Preferred-Value Field
       3.1.7.  Prefix Field
       3.1.8.  Suppress-Script Field
       3.1.9.  Macrolanguage Field
       3.1.10. Comments Field
     3.2.  Language Subtag Reviewer
     3.3.  Maintenance of the Registry
     3.4.  Stability of IANA Registry Entries
     3.5.  Registration Procedure for Subtags
     3.6.  Possibilities for Registration
     3.7.  Extensions and Extensions Registry
     3.8.  Update of the Language Subtag Registry
   4.  Formation and Processing of Language Tags
     4.1.  Choice of Language Tag
     4.2.  Meaning of the Language Tag
     4.3.  Length Considerations
       4.3.1.  Working with Limited Buffer Sizes
       4.3.2.  Truncation of Language Tags
     4.4.  Canonicalization of Language Tags
     4.5.  Considerations for Private Use Subtags
   5.  IANA Considerations
     5.1.  Language Subtag Registry
     5.2.  Extensions Registry
   6.  Security Considerations
   7.  Character Set Considerations
   8.  Changes from RFC 4646
   9.  References
     9.1.  Normative References
     9.2.  Informative References
   Appendix A.  Acknowledgements
   Appendix B.  Examples of Language Tags (Informative)
   Appendix C.  Examples of Registration Forms
   Authors' Addresses
   Intellectual Property and Copyright Statements

1.  Introduction

   Human beings on our planet have, past and present, used a number of
   languages. There are many reasons why one would want to identify the
   language used when presenting or requesting information.

   A user's language preferences often need to be identified so that
   appropriate processing can be applied. For example, the user's
   language preferences in a Web browser can be used to select Web pages
   appropriately.
   Language preferences can also be used to select among
   tools (such as dictionaries) to assist in the processing or
   understanding of content in different languages.

   In addition, knowledge about the particular language used by some
   piece of information content might be useful or even required by some
   types of processing; for example, spell-checking,
   computer-synthesized speech, Braille transcription, or high-quality
   print renderings.

   One means of indicating the language used is by labeling the
   information content with an identifier or "tag". These tags can be
   used to specify user preferences when selecting information content,
   or for labeling additional attributes of content and associated
   resources.

   Tags can also be used to indicate additional language attributes of
   content. For example, indicating specific information about the
   dialect, writing system, or orthography used in a document or
   resource may enable the user to obtain information in a form that
   they can understand, or it can be important in processing or
   rendering the given content into an appropriate form or style.

   This document specifies a particular identifier mechanism (the
   language tag) and a registration function for values to be used to
   form tags. It also defines a mechanism for private use values and
   future extension.

   This document replaces [RFC4646], which replaced [RFC3066] and its
   predecessor [RFC1766]. For a list of changes in this document, see
   Section 8.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

2.  The Language Tag

   Language tags are used to help identify languages, whether spoken,
   written, signed, or otherwise signaled, for the purpose of
   communication.
   This includes constructed and artificial languages,
   but excludes languages not intended primarily for human
   communication, such as programming languages.

2.1.  Syntax

   The language tag is composed of one or more parts, known as
   "subtags". Each subtag consists of a sequence of alphanumeric
   characters. Subtags are distinguished and separated from one another
   by a hyphen ("-", ABNF [RFC4234] %x2D). A language tag consists of a
   "primary language" subtag and a (possibly empty) series of subsequent
   subtags, each of which refines or narrows the range of languages
   identified by the overall tag.

   Usually, each type of subtag is distinguished by length, position in
   the tag, and content: subtags can be recognized solely by these
   features. The only exception to this is a fixed list of
   grandfathered tags registered under RFC 3066 [RFC3066]. This makes
   it possible to construct a parser that can extract and assign some
   semantic information to the subtags, even if the specific subtag
   values are not recognized. Thus, a parser need not have an
   up-to-date copy (or any copy at all) of the subtag registry to
   perform most searching and matching operations.
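
   As a rough illustration of recognizing subtag types from length,
   position, and content alone, the following Python sketch labels the
   subtags of a tag without consulting any registry. It is not part of
   this specification: the function name classify_subtags and the label
   strings are hypothetical, the length rules are taken from the ABNF
   in Figure 1, grandfathered tags are deliberately not handled, and
   the check that a singleton is not repeated is omitted.

```python
import re


def _is(pattern, subtag):
    """True if the whole subtag matches the given pattern."""
    return re.fullmatch(pattern, subtag) is not None


def classify_subtags(tag):
    """Label each subtag by shape and position only (no registry lookup).

    Returns a list of (subtag, label) pairs with subtags lowercased.
    Raises ValueError if the tag does not fit the 'langtag' or
    'privateuse' productions. Grandfathered tags are not handled.
    """
    parts = tag.lower().split("-")
    n = len(parts)
    labels = []
    i = 0

    if parts[0] != "x":
        # Primary language: 2 to 8 letters.
        if not _is(r"[a-z]{2,8}", parts[0]):
            raise ValueError("bad primary language subtag: %r" % parts[0])
        labels.append((parts[0], "language"))
        i = 1
        # Up to three extlang subtags, only after a 2-3 letter language.
        if len(parts[0]) <= 3:
            count = 0
            while i < n and count < 3 and _is(r"[a-z]{3}", parts[i]):
                labels.append((parts[i], "extlang"))
                i += 1
                count += 1
        # Optional script: exactly four letters.
        if i < n and _is(r"[a-z]{4}", parts[i]):
            labels.append((parts[i], "script"))
            i += 1
        # Optional region: two letters or three digits.
        if i < n and _is(r"[a-z]{2}|[0-9]{3}", parts[i]):
            labels.append((parts[i], "region"))
            i += 1
        # Variants: 5-8 alphanumerics, or a digit plus 3 alphanumerics.
        while i < n and _is(r"[a-z0-9]{5,8}|[0-9][a-z0-9]{3}", parts[i]):
            labels.append((parts[i], "variant"))
            i += 1
        # Extensions: a singleton (not 'x') plus one or more subtags.
        while i < n and _is(r"[a-wy-z0-9]", parts[i]):
            labels.append((parts[i], "singleton"))
            i += 1
            start = i
            while i < n and _is(r"[a-z0-9]{2,8}", parts[i]):
                labels.append((parts[i], "extension"))
                i += 1
            if i == start:
                raise ValueError("singleton without extension subtags")

    # Private use sequence: 'x' plus one or more subtags.
    if i < n and parts[i] == "x":
        labels.append(("x", "privateuse"))
        i += 1
        start = i
        while i < n and _is(r"[a-z0-9]{1,8}", parts[i]):
            labels.append((parts[i], "privateuse"))
            i += 1
        if i == start:
            raise ValueError("'x' without private use subtags")

    if i != n:
        raise ValueError("unexpected subtag: %r" % parts[i])
    return labels
```

   For instance, classify_subtags("sr-Latn-RS") labels 'sr' as a
   language, 'latn' as a script, and 'rs' as a region, while a tag that
   begins with an extension singleton, such as "a-value", is rejected.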

   The syntax of the language tag in ABNF [RFC4234] is:

   Language-Tag  = langtag
                 / privateuse             ; private use tag
                 / irregular              ; tags grandfathered by rule

   langtag       = (language
                    ["-" script]
                    ["-" region]
                    *("-" variant)
                    *("-" extension)
                    ["-" privateuse])

   language      = (2*3ALPHA [ extlang ]) ; shortest ISO 639 code
                 / 4ALPHA                 ; reserved for future use
                 / 5*8ALPHA               ; registered language subtag

   extlang       = *3("-" 3ALPHA)         ; specific ISO 639-3 codes

   script        = 4ALPHA                 ; ISO 15924 code

   region        = 2ALPHA                 ; ISO 3166 code
                 / 3DIGIT                 ; UN M.49 code

   variant       = 5*8alphanum            ; registered variants
                 / (DIGIT 3alphanum)

   extension     = singleton 1*("-" (2*8alphanum))

   singleton     = %x41-57 / %x59-5A / %x61-77 / %x79-7A / DIGIT
                 ; "a"-"w" / "y"-"z" / "A"-"W" / "Y"-"Z" / "0"-"9"
                 ; Single alphanumerics
                 ; "x" is reserved for private use

   privateuse    = "x" 1*("-" (1*8alphanum))

   irregular     = "en-GB-oed" / "i-ami" / "i-bnn" / "i-default"
                 / "i-enochian" / "i-hak" / "i-klingon" / "i-lux"
                 / "i-mingo" / "i-navajo" / "i-pwn" / "i-tao"
                 / "i-tay" / "i-tsu" / "sgn-BE-fr" / "sgn-BE-nl"
                 / "sgn-CH-de"

   alphanum      = (ALPHA / DIGIT)        ; letters and numbers

                      Figure 1: Language Tag ABNF

   All subtags have a maximum length of eight characters and whitespace
   is not permitted in a language tag. There is a subtlety in the ABNF
   production 'variant': variants starting with a digit MAY be four
   characters long, while those starting with a letter MUST be at least
   five characters long. For examples of language tags, see Appendix B.

   Note Well: the ABNF syntax does not distinguish between upper and
   lowercase. The appearance of upper and lowercase letters in the
   various ABNF productions above does not affect how implementations
   interpret tags. That is, the tag "I-AMI" matches the item "i-ami" in
   the 'irregular' production.
   At all times, the tags and their
   subtags, including private use and extensions, are to be treated as
   case insensitive: there exist conventions for the capitalization of
   some of the subtags, but these MUST NOT be taken to carry meaning.

   For example:

   o  [ISO639-1] recommends that language codes be written in lowercase
      ('mn' Mongolian).

   o  [ISO3166-1] recommends that country codes be capitalized ('MN'
      Mongolia).

   o  [ISO15924] recommends that script codes use lowercase with the
      initial letter capitalized ('Cyrl' Cyrillic).

   However, in the tags defined by this document, the uppercase US-ASCII
   letters in the range 'A' through 'Z' are considered equivalent and
   mapped directly to their US-ASCII lowercase equivalents in the range
   'a' through 'z'. Thus, the tag "mn-Cyrl-MN" is not distinct from
   "MN-cYRL-mn" or "mN-cYrL-Mn" (or any other combination), and each of
   these variations conveys the same meaning: Mongolian written in the
   Cyrillic script as used in Mongolia.

   Although case distinctions do not carry meaning in language tags,
   consistent formatting and presentation of the tags will aid users.
   The format of the tags and subtags in the registry is RECOMMENDED.
   In this format, all non-initial two-letter subtags are uppercase, all
   non-initial four-letter subtags are titlecase, and all other subtags
   are lowercase.

   Note that although [RFC4234] refers to octets, the language tags
   described in this document are sequences of characters from the
   US-ASCII [ISO646] repertoire. Language tags MAY be used in documents
   and applications that use other encodings, so long as these encompass
   the US-ASCII repertoire. An example of this would be an XML document
   that uses the UTF-16LE [RFC2781] encoding of [Unicode].

2.2.  Language Subtag Sources and Interpretation

   The namespace of language tags and their subtags is administered by
   the Internet Assigned Numbers Authority (IANA) [RFC2860] according to
   the rules in Section 5 of this document. The Language Subtag
   Registry maintained by IANA is the source for valid subtags: other
   standards referenced in this section provide the source material for
   that registry.

   Terminology used in this document:

   o  Tag or tags refers to a complete language tag, such as
      "sr-Latn-RS" or "az-Arab-IR". Examples of tags in this document
      are enclosed in double-quotes ("en-US").

   o  Subtag refers to a specific section of a tag, delimited by hyphen,
      such as the subtag 'Hant' in "zh-Hant-CN". Examples of subtags in
      this document are enclosed in single quotes ('Hant').

   o  Code or codes refers to values defined in external standards (and
      which are used as subtags in this document). For example, 'Hant'
      is an [ISO15924] script code that was used to define the 'Hant'
      script subtag for use in a language tag. Examples of codes in
      this document are enclosed in single quotes ('en', 'Hant').

   The definitions in this section apply to the various subtags within
   the language tags defined by this document, excepting those
   "grandfathered" tags defined in Section 2.2.8.

   Language tags are designed so that each subtag type has unique length
   and content restrictions. These make identification of the subtag's
   type possible, even if the content of the subtag itself is
   unrecognized. This allows tags to be parsed and processed without
   reference to the latest version of the underlying standards or the
   IANA registry and makes the associated exception handling when
   parsing tags simpler.

   Subtags in the IANA registry that do not come from an underlying
   standard can only appear in specific positions in a tag.
   Specifically, they can only occur as primary language subtags or as
   variant subtags.

   Note that sequences of private use and extension subtags MUST occur
   at the end of the sequence of subtags and MUST NOT be interspersed
   with subtags defined elsewhere in this document.

   Single-letter and single-digit subtags are reserved for current or
   future use. These include the following current uses:

   o  The single-letter subtag 'x' is reserved to introduce a sequence
      of private use subtags. The interpretation of any private use
      subtags is defined solely by private agreement and is not defined
      by the rules in this section or in any standard or registry
      defined in this document.

   o  All other single-letter subtags are reserved to introduce
      standardized extension subtag sequences as described in
      Section 3.7.

   The single-letter subtag 'i' is used by some grandfathered tags, such
   as "i-default", where it always appears in the first position and
   cannot be confused with an extension.

2.2.1.  Primary Language Subtag

   The primary language subtag is the first subtag in a language tag
   (with the exception of private use and certain grandfathered tags)
   and cannot be omitted. The following rules apply to the primary
   language subtag:

   1.  All two-character primary language subtags were defined in the
       IANA registry according to the assignments found in the standard
       ISO 639 Part 1, "ISO 639-1:2002, Codes for the representation of
       names of languages -- Part 1: Alpha-2 code" [ISO639-1], or using
       assignments subsequently made by the ISO 639-1 registration
       authority (RA) or governing standardization bodies.

   2.  All three-character primary language subtags were defined in the
       IANA registry according to the assignments found in either ISO
       639 Part 2, "ISO 639-2:1998 - Codes for the representation of
       names of languages -- Part 2: Alpha-3 code - edition 1"
       [ISO639-2], ISO 639 Part 3, "Codes for the representation of
       names of languages -- Part 3: Alpha-3 code for comprehensive
       coverage of languages" [ISO639-3], or assignments subsequently
       made by the relevant ISO 639 registration authorities or
       governing standardization bodies.

   3.  The subtags in the range 'qaa' through 'qtz' are reserved for
       private use in language tags. These subtags correspond to codes
       reserved by ISO 639-2 for private use. These codes MAY be used
       for non-registered primary language subtags (instead of using
       private use subtags following 'x-'). Please refer to Section 4.5
       for more information on private use subtags.

   4.  All four-character language subtags are reserved for possible
       future standardization.

   5.  All language subtags of 5 to 8 characters in length in the IANA
       registry were defined via the registration process in Section 3.5
       and MAY be used to form the primary language subtag. At the time
       this document was created, there were no examples of this kind of
       subtag and future registrations of this type will be discouraged:
       primary languages are strongly RECOMMENDED for registration with
       ISO 639, and proposals rejected by ISO 639/RA-JAC will be closely
       scrutinized before they are registered with IANA.

   6.  The single-character subtag 'x' as the primary subtag indicates
       that the language tag consists solely of subtags whose meaning is
       defined by private agreement.
       For example, in the tag "x-fr-CH",
       the subtags 'fr' and 'CH' SHOULD NOT be taken to represent the
       French language or the country of Switzerland (or any other value
       in the IANA registry) unless there is a private agreement in
       place to do so. See Section 4.5.

   7.  The single-character subtag 'i' is used by some grandfathered
       tags (see Section 2.2.8) such as "i-klingon" and "i-bnn". (Other
       grandfathered tags have a primary language subtag in their first
       position.)

   8.  Other values MUST NOT be assigned to the primary subtag except by
       revision or update of this document.

   Note: For languages that have both an ISO 639-1 two-character code
   and a three character code assigned by either ISO 639-2 or ISO 639-3,
   only the ISO 639-1 two-character code is defined in the IANA
   registry.

   Note: For languages that have no ISO 639-1 two-character code and for
   which the ISO 639-2/T (Terminology) code and the ISO 639-2/B
   (Bibliographic) codes differ, only the Terminology code is defined in
   the IANA registry. At the time this document was created, all
   languages that had both kinds of three-character code were also
   assigned a two-character code; it is expected that future assignments
   of this nature will not occur.

   Note: To avoid problems with versioning and subtag choice as
   experienced during the transition between RFC 1766 and RFC 3066, as
   well as the canonical nature of subtags defined by this document, the
   ISO 639 Registration Authority Joint Advisory Committee
   (ISO 639/RA-JAC) has included the following statement in
   [iso639.prin]:

      "A language code already in ISO 639-2 at the point of freezing ISO
      639-1 shall not later be added to ISO 639-1. This is to ensure
      consistency in usage over time, since users are directed in
      Internet applications to employ the alpha-3 code when an alpha-2
      code for that language is not available."

   In order to avoid instability in the canonical form of tags, if a
   two-character code is added to ISO 639-1 for a language for which a
   three-character code was already included in either ISO 639-2 or ISO
   639-3, the two-character code MUST NOT be registered. See
   Section 3.4.

   For example, if some content were tagged with 'haw' (Hawaiian), which
   currently has no two-character code, the tag would not be invalidated
   if ISO 639-1 were to assign a two-character code to the Hawaiian
   language at a later date.

   Note: An example of independent primary language subtag registration
   might include: one of the grandfathered IANA registrations is
   "i-enochian". The subtag 'enochian' could be registered in the IANA
   registry as a primary language subtag (assuming that ISO 639 does not
   register this language first), making tags such as "enochian-AQ" and
   "enochian-Latn" valid.

2.2.2.  Extended Language Subtags

   Extended language subtags are used to identify languages that are
   encompassed by a "macrolanguage". ISO 639-3 defines certain
   languages to be "macrolanguages"; that is, they are groups of very
   closely related languages which are treated as a single language in
   certain contexts. In order to improve matching behavior and tagging
   consistency, each language encompassed by an ISO 639-3 macrolanguage
   is represented in the IANA registry using an extended language
   subtag, provided that it is not already represented using a language
   subtag. The following rules apply to the extended language subtags:

   1.  These subtags were defined in the IANA registry according to
       assignments found in ISO 639 Part 3.

   2.  A sequence of up to three extended language subtags MAY appear in
       a language tag. This sequence MUST follow the primary language
       subtag and precede any other subtags.

   3.  Each extended language subtag MUST only appear in a tag
       immediately following the exact sequence of subtags that appears
       in the 'Prefix' field in its registry record.

   4.  Other values MUST NOT be assigned to the extended language subtag
       except by revision or update of this document.

   Extended language subtag records MUST include exactly one 'Prefix'
   field indicating an appropriate subtag or sequence of subtags for
   that extended language subtag.

   For example, the 'gan' and 'cmn' subtags represent the languages Gan
   Chinese and Mandarin Chinese. Each is encompassed by the
   macrolanguage 'zh' (Chinese). Therefore, they both have the prefix
   "zh" in their registry records. Consequently, Gan Chinese is
   represented as "zh-gan" and Mandarin Chinese as "zh-cmn". The
   language subtag 'zh' can still be used without an extended language
   subtag to label a resource as some unspecified variety of Chinese
   (which in practice will usually be Mandarin, the dominant variety of
   Chinese, but might also be some other variety).

   Now suppose that, in the future, the ISO 639-3 Registration Authority
   were to decide that Gan Chinese is actually two different closely
   related languages: it might reclassify 'gan' as a macrolanguage and
   introduce two new code elements. In that case, these code elements
   would be added to the IANA registry as extended language subtags with
   prefixes of "zh-gan". No change would be made to the registry record
   for 'gan'.

2.2.3.  Script Subtag

   Script subtags are used to indicate the script or writing system
   variations that distinguish the written forms of a language or its
   dialects. The following rules apply to the script subtags:

   1.  All four-character subtags were defined according to
       [ISO15924]--"Codes for the representation of the names of
       scripts": alpha-4 script codes, or subsequently assigned by the
       ISO 15924 maintenance agency or governing standardization bodies,
       denoting the script or writing system used in conjunction with
       this language.

   2.  Script subtags MUST immediately follow the primary language
       subtag and all extended language subtags and MUST occur before
       any other type of subtag described below.

   3.  The script subtags 'Qaaa' through 'Qabx' are reserved for private
       use in language tags. These subtags correspond to codes reserved
       by ISO 15924 for private use. These codes MAY be used for
       non-registered script values. Please refer to Section 4.5 for
       more information on private use subtags.

   4.  Script subtags MUST NOT be registered using the process in
       Section 3.5 of this document. Variant subtags MAY be considered
       for registration for that purpose.

   5.  There MUST be at most one script subtag in a language tag, and
       the script subtag SHOULD be omitted when it adds no
       distinguishing value to the tag or when the primary language
       subtag's record includes a Suppress-Script field listing the
       applicable script subtag.

   Example: "sr-Latn" represents Serbian written using the Latin script.

2.2.4.  Region Subtag

   Region subtags are used to indicate linguistic variations associated
   with or appropriate to a specific country, territory, or region.
   Typically, a region subtag is used to indicate regional dialects or
   usage, or region-specific spelling conventions. A region subtag can
   also be used to indicate that content is expressed in a way that is
   appropriate for use throughout a region, for instance, Spanish
   content tailored to be useful throughout Latin America.

   The following rules apply to the region subtags:

   1.  Region subtags MUST follow any language, extended language, or
       script subtags and MUST precede all other subtags.

   2.  All two-character subtags following the primary subtag were
       defined in the IANA registry according to the assignments found
       in [ISO3166-1] ("Codes for the representation of names of
       countries and their subdivisions -- Part 1: Country codes") using
       the list of alpha-2 country codes, or using assignments
       subsequently made by the ISO 3166 maintenance agency or governing
       standardization bodies.

   3.  All three-character subtags consisting of digit (numeric)
       characters following the primary subtag were defined in the IANA
       registry according to the assignments found in UN Standard
       Country or Area Codes for Statistical Use [UN_M.49] or
       assignments subsequently made by the governing standards body.
       Note that not all of the UN M.49 codes are defined in the IANA
       registry. The following rules define which codes are entered
       into the registry as valid subtags:

       A.  UN numeric codes assigned to 'macro-geographical
           (continental)' or sub-regions MUST be registered in the
           registry. These codes are not associated with an assigned
           ISO 3166 alpha-2 code and represent supra-national areas,
           usually covering more than one nation, state, province, or
           territory.

       B.  UN numeric codes for 'economic groupings' or 'other
           groupings' MUST NOT be registered in the IANA registry and
           MUST NOT be used to form language tags.

       C.  UN numeric codes for countries or areas with ambiguous ISO
           3166 alpha-2 codes, when entered into the registry, MUST be
           defined according to the rules in Section 3.4 and MUST be
           used to form language tags that represent the country or
           region for which they are defined.

       D.  UN numeric codes for countries or areas for which there is an
           associated ISO 3166 alpha-2 code in the registry MUST NOT be
           entered into the registry and MUST NOT be used to form
           language tags. Note that the ISO 3166-based subtag in the
           registry MUST actually be associated with the UN M.49 code in
           question.

       E.  UN numeric codes and ISO 3166 alpha-2 codes for countries or
           areas listed as eligible for registration in [RFC4645] but
           not presently registered MAY be entered into the IANA
           registry via the process described in Section 3.5. Once
           registered, these codes MAY be used to form language tags.

       F.  All other UN numeric codes for countries or areas that do not
           have an associated ISO 3166 alpha-2 code MUST NOT be entered
           into the registry and MUST NOT be used to form language tags.
           For more information about these codes, see Section 3.4.

   4.  Note: The alphanumeric codes in Appendix X of the UN document
       MUST NOT be entered into the registry and MUST NOT be used to
       form language tags. (At the time this document was created,
       these values matched the ISO 3166 alpha-2 codes.)

   5.  There MUST be at most one region subtag in a language tag and the
       region subtag MAY be omitted, as when it adds no distinguishing
       value to the tag.

   6.  The region subtags 'AA', 'QM'-'QZ', 'XA'-'XZ', and 'ZZ' are
       reserved for private use in language tags. These subtags
       correspond to codes reserved by ISO 3166 for private use. These
       codes MAY be used for private use region subtags (instead of
       using a private use subtag sequence). Please refer to
       Section 4.5 for more information on private use subtags.

   "de-CH" represents German ('de') as used in Switzerland ('CH').

   "sr-Latn-RS" represents Serbian ('sr') written using Latin script
   ('Latn') as used in Serbia ('RS').

   "es-419" represents Spanish ('es') appropriate to the UN-defined
   Latin America and Caribbean region ('419').
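
   The shape of a region subtag and the private use ranges listed in
   rule 6 above can be checked mechanically. The following Python
   sketch is illustrative only and is not part of this specification:
   the function names region_kind and is_private_use_region are
   hypothetical, and the code tests only the form of a subtag. Whether
   a particular code is actually registered still has to be determined
   from the IANA registry.

```python
import re


def region_kind(subtag):
    """Classify a candidate region subtag by shape alone.

    Returns "alpha2" for two letters (ISO 3166 form), "numeric" for
    three digits (UN M.49 form), or None if it cannot be a region.
    """
    if re.fullmatch(r"[A-Za-z]{2}", subtag):
        return "alpha2"
    if re.fullmatch(r"[0-9]{3}", subtag):
        return "numeric"
    return None


def is_private_use_region(subtag):
    """True for 'AA', 'QM'-'QZ', 'XA'-'XZ', and 'ZZ' (any letter case)."""
    if region_kind(subtag) != "alpha2":
        return False
    code = subtag.upper()
    # Two-letter uppercase strings compare lexicographically, so each
    # range check fixes the first letter and bounds the second.
    return (
        code in ("AA", "ZZ")
        or "QM" <= code <= "QZ"
        or "XA" <= code <= "XZ"
    )
```

   Under these assumptions, 'CH' and '419' are both region-shaped but
   not private use, while 'XK' or 'qm' would be treated as private use
   region subtags.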
612 2.2.5. Variant Subtags 614 Variant subtags are used to indicate additional, well-recognized 615 variations that define a language or its dialects that are not 616 covered by other available subtags. The following rules apply to the 617 variant subtags: 619 1. Variant subtags are not associated with any external standard. 620 Variant subtags and their meanings are defined by the 621 registration process defined in Section 3.5. 623 2. Variant subtags MUST follow all of the other defined subtags, but 624 precede any extension or private use subtag sequences. 626 3. More than one variant MAY be used to form the language tag. 628 4. Variant subtags MUST be registered with IANA according to the 629 rules in Section 3.5 of this document before being used to form 630 language tags. In order to distinguish variants from other types 631 of subtags, registrations MUST meet the following length and 632 content restrictions: 634 1. Variant subtags that begin with a letter (a-z, A-Z) MUST be 635 at least five characters long. 637 2. Variant subtags that begin with a digit (0-9) MUST be at 638 least four characters long. 640 Variant subtag records in the language subtag registry MAY include 641 one or more 'Prefix' fields. The 'Prefix' indicates the language tag 642 or tags that would make a suitable prefix (with other subtags, as 643 appropriate) in forming a language tag with the variant. That is, 644 each of the subtags in the prefix SHOULD appear before the variant. 645 For example, the subtag 'nedis' has a Prefix of "sl", making it 646 suitable to form language tags such as "sl-nedis" and "sl-IT-nedis", 647 but not suitable for use in a tag such as "zh-nedis" or "it-IT- 648 nedis". 650 "sl-nedis" represents the Natisone or Nadiza dialect of Slovenian. 652 "de-CH-1996" represents German as used in Switzerland and as written 653 using the spelling reform beginning in the year 1996 C.E. 655 Most variants that share a prefix are mutually exclusive. 
For 656 example, the German orthographic variations '1996' and '1901' SHOULD 657 NOT be used in the same tag, as they represent the dates of different 658 spelling reforms. A variant that can meaningfully be used in 659 combination with another variant SHOULD include a 'Prefix' field in 660 its registry record that lists that other variant. For example, if 661 another German variant 'example' were created that made sense to use 662 with '1996', then 'example' should include two Prefix fields: "de" 663 and "de-1996". 665 2.2.6. Extension Subtags 667 Extensions provide a mechanism for extending language tags for use in 668 various applications. See Section 3.7. The following rules apply to 669 extensions: 671 1. Extension subtags are separated from the other subtags defined 672 in this document by a single-character subtag ("singleton"). 673 The singleton MUST be one allocated to a registration authority 674 via the mechanism described in Section 3.7 and MUST NOT be the 675 letter 'x', which is reserved for private use subtag sequences. 677 2. Note: Private use subtag sequences starting with the singleton 678 subtag 'x' are described in Section 2.2.7 below. 680 3. An extension MUST follow at least a primary language subtag. 681 That is, a language tag cannot begin with an extension. 682 Extensions extend language tags, they do not override or replace 683 them. For example, "a-value" is not a well-formed language tag, 684 while "de-a-value" is. 686 4. Each singleton subtag MUST appear at most one time in each tag 687 (other than as a private use subtag). That is, singleton 688 subtags MUST NOT be repeated. For example, the tag "en-a-bbb-a- 689 ccc" is invalid because the subtag 'a' appears twice. Note that 690 the tag "en-a-bbb-x-a-ccc" is valid because the second 691 appearance of the singleton 'a' is in a private use sequence. 693 5. Extension subtags MUST meet all of the requirements for the 694 content and format of subtags defined in this document. 696 6. 
Extension subtags MUST meet whatever requirements are set by the 697 document that defines their singleton prefix and whatever 698 requirements are provided by the maintaining authority. 700 7. Each extension subtag MUST be from two to eight characters long 701 and consist solely of letters or digits, with each subtag 702 separated by a single '-'. 704 8. Each singleton MUST be followed by at least one extension 705 subtag. For example, the tag "tlh-a-b-foo" is invalid because 706 the first singleton 'a' is followed immediately by another 707 singleton 'b'. 709 9. Extension subtags MUST follow all language, extended language, 710 script, region, and variant subtags in a tag. 712 10. All subtags following the singleton and before another singleton 713 are part of the extension. Example: In the tag "fr-a-Latn", the 714 subtag 'Latn' does not represent the script subtag 'Latn' 715 defined in the IANA Language Subtag Registry. Its meaning is 716 defined by the extension 'a'. 718 11. In the event that more than one extension appears in a single 719 tag, the tag SHOULD be canonicalized as described in 720 Section 4.4. 722 For example, if the prefix singleton 'r' and the shown subtags were 723 defined, then the following tag would be a valid example: "en-Latn- 724 GB-boont-r-extended-sequence-x-private" 726 2.2.7. Private Use Subtags 728 Private use subtags are used to indicate distinctions in language 729 important in a given context by private agreement. The following 730 rules apply to private use subtags: 732 1. Private use subtags are separated from the other subtags defined 733 in this document by the reserved single-character subtag 'x'. 735 2. Private use subtags MUST conform to the format and content 736 constraints defined in the ABNF for all subtags. 738 3. Private use subtags MUST follow all language, extended language, 739 script, region, variant, and extension subtags in the tag. 
740 Another way of saying this is that all subtags following the 741 singleton 'x' MUST be considered private use. Example: The 742 subtag 'US' in the tag "en-x-US" is a private use subtag. 744 4. A tag MAY consist entirely of private use subtags. 746 5. No source is defined for private use subtags. Use of private use 747 subtags is by private agreement only. 749 6. Private use subtags are NOT RECOMMENDED where alternatives exist 750 or for general interchange. See Section 4.5 for more information 751 on private use subtag choice. 753 For example: Users who wished to utilize codes from the Ethnologue 754 publication of SIL International for language identification might 755 agree to exchange tags such as "az-Arab-x-AZE-derbend". This example 756 contains two private use subtags. The first is 'AZE' and the second 757 is 'derbend'. 759 2.2.8. Grandfathered Registrations 761 Prior to RFC 4646, whole language tags were registered according to 762 the rules in RFC 1766 and/or RFC 3066. These registered tags 763 maintain their validity. Of those tags, those that were made 764 obsolete or redundant by the advent of RFC 4646, by this document, or 765 by subsequent registration of subtags are maintained in the registry 766 in records as "redundant" records. Those tags that do not match the 767 'langtag' production in the ABNF in this document or that contain 768 subtags that do not individually appear in the registry are 769 maintained in the registry in records of the "grandfathered" type. 771 Grandfathered tags contain one or more subtags that are not defined 772 in the Language Subtag Registry (see Section 3). Redundant tags 773 consist entirely of subtags defined above and whose independent 774 registration was superseded by [RFC4646]. For more information see 775 Section 3.8. 777 Some grandfathered tags are "regular" in that they match the 778 'langtag' production in Figure 1. 
   In some cases, these tags could become redundant if their
   (currently unregistered) subtags were to be registered (as
   variants, for example). In other cases, although the subtags match
   the language tag pattern, the meaning assigned to the various
   subtags is prohibited by rules elsewhere in this document. Those
   tags can never become redundant.

   The remaining grandfathered tags are "irregular" and do not match the
   'langtag' production. These are listed in the 'irregular' production
   in Figure 1. These grandfathered tags can never become redundant.
   Many of these tags have been superseded by other registrations: their
   records contain a 'Preferred-Value' field that ought to be used to
   form language tags representing that value.

2.2.9. Classes of Conformance

   Implementations sometimes need to describe their capabilities with
   regard to the rules and practices described in this document. Tags
   can be checked or verified in a number of ways, but two particular
   classes of tag conformance are formally defined here.

   A tag is considered "well-formed" if it conforms to the ABNF
   (Section 2.1). Note that irregular grandfathered tags are now listed
   in the 'irregular' production.

   A tag is considered "valid" if it is well-formed and it also
   satisfies these conditions:

   o  The tag is either a grandfathered tag, or all of its language,
      extended language, script, region, and variant subtags appear in
      the IANA language subtag registry as of the particular registry
      date.

   o  There are no duplicate singleton (extension) subtags and no
      duplicate variant subtags.

   o  For each subtag that has a 'Prefix' field in the registry, the
      Prefix matches the language tag using Extended Filtering
      [RFC4647]. That is, each subtag in the Prefix is present in the
      tag and in the same order. Furthermore, all of the Prefix's
      subtags MUST appear before the subtag.
For example, the Prefix 819 "zh-TW" matches the tag "zh-Hant-TW". 821 Note that a tag's validity depends on the date of the registry used 822 to validate the tag. A more-recent copy of the registry might 823 contain a subtag that an older version does not. 825 A tag is considered "valid" for a given extension (Section 3.7) (as 826 of a particular version, revision, and date) if it meets the criteria 827 for "valid" above and also satisfies this condition: 829 Each subtag used in the extension part of the tag is valid 830 according to the extension. 832 3. Registry Format and Maintenance 834 This section defines the Language Subtag Registry and the maintenance 835 and update procedures associated with it, as well as a registry for 836 extensions to language tags (Section 3.7). 838 The Language Subtag Registry contains a comprehensive list of all of 839 the subtags valid in language tags. This allows implementers a 840 straightforward and reliable way to validate language tags. The 841 Language Subtag Registry will be maintained so that, except for 842 extension subtags, it is possible to validate all of the subtags that 843 appear in a language tag under the provisions of this document or its 844 revisions or successors. In addition, the meaning of the various 845 subtags will be unambiguous and stable over time. (The meaning of 846 private use subtags, of course, is not defined by the IANA registry.) 848 3.1. Format of the IANA Language Subtag Registry 850 The IANA Language Subtag Registry ("the registry") consists of a text 851 file that is machine readable in the format described in this 852 section, plus copies of the registration forms approved in accordance 853 with the process described in Section 3.5. The existing registration 854 forms for grandfathered and redundant tags taken from RFC 3066 will 855 be maintained as part of the obsolete RFC 3066 registry. 
   The remaining set of subtags created by either [RFC4645] or
   [registry-update] will not have registration forms created for them.

3.1.1. File Format

   The registry consists of a plain-text file in the record-jar format
   (described in [record-jar]) which uses the UTF-8 [RFC3629] character
   encoding.

   Each line of text is limited to 72 bytes in length. Records are
   separated by lines containing only the sequence "%%" (%x25.25).

   Each field can be considered a single, logical line of Unicode
   [Unicode] characters, comprising a field-name and a field-body
   separated by a COLON character (%x3A). For convenience (and because
   there is a 72-byte line length limit), the field-body portion of this
   conceptual entity can be split into a multiple-line representation;
   this is called "folding". Folding is always done on Unicode code
   point boundaries (never in the middle of a multibyte UTF-8 sequence).
   Although the file format uses the UTF-8 encoding, unless otherwise
   indicated, fields are restricted to the printable characters from the
   US-ASCII [ISO646] repertoire.

   The format of the registry is described by the following ABNF (per
   [RFC4234]):

   registry   = record *("%%" CRLF record)
   record     = 1*( field-name *SP ":" *SP field-body CRLF )
   field-name = (ALPHA / DIGIT) [*(ALPHA / DIGIT / "-") (ALPHA / DIGIT)]
   field-body = *([[*SP CRLF] 1*SP] 1*CHARS)
   CHARS      = (%x21-10FFFF)      ; Unicode code points

                    Figure 2: Registry Format ABNF

   The sequence '..' (%x2E.2E) in a field-body denotes a range of
   values. Such a range represents all subtags of the same length that
   are in alphabetic or numeric order within that range, including the
   values explicitly mentioned. For example 'a..c' denotes the values
   'a', 'b', and 'c' and '11..13' denotes the values '11', '12', and
   '13'.

   All fields whose field-body contains a date value use the "full-date"
   format specified in [RFC3339].
   For example: "2004-06-28" represents June 28, 2004, in the Gregorian
   calendar.

3.1.2. Record Definitions

   There are three types of records in the registry: "File-Date",
   "Subtag", and "Tag" records.

   The first record in the registry is a "File-Date" record. This
   record contains the single field whose field-name is "File-Date" (see
   Figure 2). The field-body of this record contains the last
   modification date of this copy of the registry, making it possible to
   compare different versions of the registry. The registry on the IANA
   website is the most current. Versions with an older date than that
   one are not up-to-date.

   File-Date: 2004-06-28
   %%

              Figure 3: Example of the File-Date Record

   Subsequent records represent either subtags or tags in the registry.
   "Subtag" records contain a field with a field-name of "Subtag",
   while, unsurprisingly, "Tag" records contain a field with a
   field-name of "Tag". Each of the fields in each record MUST occur no
   more than once, unless otherwise noted below. Each record MUST
   contain the following fields:

   o  'Type'
      *  Type's field-body MUST consist of one of the following strings:
         "language", "extlang", "script", "region", "variant",
         "grandfathered", or "redundant"; it denotes the type of tag or
         subtag.

   o  Either 'Subtag' or 'Tag'

      *  Subtag's field-body contains the subtag being defined. This
         field MUST only appear in records whose 'Type' has one of
         these values: "language", "extlang", "script", "region", or
         "variant".

      *  Tag's field-body contains a complete language tag. This field
         MUST only appear in records whose 'Type' has one of these
         values: "grandfathered" or "redundant". Note that the
         field-body will always follow the 'grandfathered' production in
         the ABNF in Section 2.1.

   o  Description

      *  Description's field-body contains a non-normative description
         of the subtag or tag.
951 o Added 953 * Added's field-body contains the date the record was added to 954 the registry. 956 Each record MAY also contain the following fields: 958 o Preferred-Value 960 * For fields of type 'script', 'region', and 'variant', 961 'Preferred-Value' contains the subtag of the same 'Type' that 962 is preferred for forming the language tag. 964 * For fields of type 'language' and 'extlang', 'Preferred-Value' 965 contains the language production (see Figure 1) that is 966 preferred when forming the language tag. This can be simply a 967 'language' subtag, or it can be a 'language' subtag followed by 968 an extended language sequence. 970 * For fields of type 'grandfathered' and 'redundant', a canonical 971 mapping to a complete language tag. 973 o Deprecated 974 * Deprecated's field-body contains the date the record was 975 deprecated. 977 o Prefix 979 * Prefix's field-body contains a language tag with which this 980 subtag MAY be used to form a new language tag, perhaps with 981 other subtags as well. The Prefix's subtags appear before the 982 subtag. This field MUST only appear in records whose 'Type' 983 field-body is 'variant' or 'extlang'. For example, the 984 'Prefix' for the variant 'nedis' is 'sl', meaning that the tags 985 "sl-nedis" and "sl-IT-nedis" might be appropriate while the tag 986 "is-nedis" is not. 988 o Comments 990 * Comments contains additional information about the subtag, as 991 deemed appropriate for understanding the registry and 992 implementing language tags using the subtag or tag. 994 o Suppress-Script 996 * Suppress-Script contains a script subtag that SHOULD NOT be 997 used to form language tags with the associated primary language 998 subtag. This field MUST only appear in records whose 'Type' 999 field-body is 'language'. See Section 4.1. 1001 o Macrolanguage 1003 * Macrolanguage contains a primary or extended language subtag 1004 defined by ISO 639 as a "macrolanguage" that encompasses this 1005 language subtag. 
         This field MUST only appear in records whose 'Type' field-body
         is 'language' or 'extlang'.

   Future versions of this document might add additional fields to the
   registry, so implementations SHOULD ignore fields found in the
   registry that are not defined in this document.

3.1.3. Subtag and Tag Fields

   The 'Subtag' field MUST use lowercase letters to form the subtag,
   with two exceptions. Subtags whose 'Type' field is 'script' (in
   other words, subtags defined by ISO 15924) MUST use titlecase.
   Subtags whose 'Type' field is 'region' (in other words, the
   non-numeric region subtags defined by ISO 3166) MUST use uppercase.
   These exceptions mirror the use of case in the underlying standards.

   Each subtag in the tags contained in a 'Tag' field MUST be formatted
   using the rules in the preceding paragraph. That is, all subtags
   are lowercase except for subtags that represent script or region
   codes.

3.1.4. Description Field

   The field 'Description' contains a description of the tag or subtag
   in the record. The 'Description' field MAY appear more than once per
   record; that is, there can be multiple descriptions for a given
   record. At least one of the 'Description' fields MUST be written or
   transcribed into the Latin script; additional 'Description' fields
   MAY also include a description in a non-Latin script. The
   'Description' field MAY thus include non-ASCII characters. Each
   'Description' field MUST be unique, both within the record in which
   it appears and for the collection of records of the same type.
   Moreover, formatting variations of the same description MUST NOT
   occur in that specific record or in any other record of the same
   type. For example, while the ISO 639-1 code 'fy' contains both the
   descriptions "Western Frisian" and "Frisian, Western", only one of
   these descriptions appears in the registry.
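   One way to screen for formatting variations of the same description
   is to compare a normalized key rather than the raw strings. The
   sketch below is a heuristic, not a procedure defined by this
   document: the helper names description_key and find_variations, and
   the normalization applied (case folding, dropping punctuation,
   ignoring word order), are assumptions chosen so that "Western
   Frisian" and "Frisian, Western" compare equal.

```python
import re


def description_key(description: str):
    """Return a key that ignores case, punctuation, and word order.

    This normalization is an assumption made for illustration; it can
    report false positives, so a match is a prompt for human review.
    """
    words = re.findall(r"\w+", description.casefold())
    return tuple(sorted(words))


def find_variations(descriptions):
    """Return pairs of descriptions that share the same normalized key."""
    seen = {}
    clashes = []
    for text in descriptions:
        key = description_key(text)
        if key in seen:
            clashes.append((seen[key], text))
        else:
            seen[key] = text
    return clashes
```

   Because word order is ignored, the check can flag descriptions that
   a human reader would consider distinct, so it is better suited to
   flagging candidates for review than to rejecting records outright.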
1043 The 'Description' field is used for identification purposes and 1044 SHOULD NOT be taken to represent the actual native name of the 1045 language or variation or to be in any particular language. 1047 For records taken from a source standard (such as ISO 639 or ISO 1048 3166), the 'Description' value(s) SHOULD also be taken from the 1049 source standard. Multiple descriptions in the source standard MUST 1050 be split into separate 'Description' fields. The source standard's 1051 descriptions MAY be edited, either prior to insertion or via the 1052 registration process. For fields of type 'language' or 'extlang', 1053 the first 'Description' field appearing in the Registry corresponds 1054 to the Reference Name assigned by ISO 639-3. This helps facilitate 1055 cross-referencing between ISO 639 and the registry. 1057 When creating or updating a record due to the action of one of the 1058 source standards, the Language Subtag Reviewer SHOULD remove 1059 duplicate or redundant descriptions and MAY edit descriptions to 1060 correct irregularities in formatting (such as misspellings, 1061 inappropriate apostrophes or other punctuation, or excessive or 1062 missing spaces) prior to submitting the proposed record to the ietf- 1063 languages list. 1065 Note: Descriptions in registry entries that correspond to ISO 639, 1066 ISO 15924, ISO 3166, or UN M.49 codes are intended only to indicate 1067 the meaning of that identifier as defined in the source standard at 1068 the time it was added to the registry. The description does not 1069 replace the content of the source standard itself. The descriptions 1070 are not intended to be the English localized names for the subtags. 1071 Localization or translation of language tag and subtag descriptions 1072 is out of scope of this document. 1074 3.1.5. 
Deprecated Field 1076 The field 'Deprecated' MAY be added to any record via the maintenance 1077 process described in Section 3.3 or via the registration process 1078 described in Section 3.5. Usually, the addition of a 'Deprecated' 1079 field is due to the action of one of the standards bodies, such as 1080 ISO 3166, withdrawing a code. In some historical cases, it might not 1081 have been possible to reconstruct the original deprecation date. For 1082 these cases, an approximate date appears in the registry. Although 1083 valid in language tags, subtags and tags with a 'Deprecated' field 1084 are deprecated and validating processors SHOULD NOT generate these 1085 subtags. Note that a record that contains a 'Deprecated' field and 1086 no corresponding 'Preferred-Value' field has no replacement mapping. 1088 3.1.6. Preferred-Value Field 1090 The field 'Preferred-Value' contains a mapping between the record in 1091 which it appears and another tag or subtag. The value in this field 1092 is strongly RECOMMENDED as the best choice to represent the value of 1093 this record when selecting a language tag. These values form three 1094 groups: 1096 1. ISO 639 language codes that were later withdrawn in favor of 1097 other codes. These values are mostly a historical curiosity. 1099 2. ISO 3166 region codes that have been withdrawn in favor of a new 1100 code. This sometimes happens when a country changes its name or 1101 administration in such a way that warrants a new region code. 1103 3. Grandfathered or redundant tags from RFC 3066. In many cases, 1104 these tags have become obsolete because the values they represent 1105 were later encoded by ISO 639. 1107 Records that contain a 'Preferred-Value' field MUST also have a 1108 'Deprecated' field. This field contains a date of deprecation. 1109 Thus, a language tag processor can use the registry to construct the 1110 valid, non-deprecated set of subtags for a given date. 
   In addition, for any given tag, a processor can construct the set of
   valid language tags that correspond to that tag for all dates up to
   the date of the registry. The ability to do these mappings MAY be
   beneficial to applications that are matching, selecting, or
   filtering content based on its language tags.

   Note that 'Preferred-Value' mappings in records of type 'region'
   sometimes do not represent exactly the same meaning as the original
   value. There are many reasons for a country code to be changed, and
   the effect this has on the formation of language tags will depend on
   the nature of the change in question.

   In particular, the 'Preferred-Value' field does not imply retagging
   content that uses the affected subtag.

   The field 'Preferred-Value' MUST NOT be modified once created in the
   registry. The field MAY be added to records according to the rules
   in Section 3.3.

   The 'Preferred-Value' field in records of type "grandfathered" and
   "redundant" contains whole language tags that are strongly
   RECOMMENDED for use in place of the record's value. In many cases,
   the mappings were created by deprecation of the tags during the
   period before this document was adopted. For example, the tag
   "no-nyn" was deprecated in favor of the ISO 639-1-defined language
   code 'nn'.

3.1.7. Prefix Field

   The 'Prefix' field contains an extended language range whose subtags
   are appropriate to use with this subtag: each of the subtags in one
   of the subtag's Prefix fields MUST appear before the variant in a
   valid tag. For example, the variant subtag '1996' has a 'Prefix'
   field of "de". This means that tags starting with the sequence "de-"
   are appropriate with this subtag, so "de-Latg-1996" and "de-CH-1996"
   are both acceptable, while the tag "fr-1996" is an inappropriate
   choice.

   The field of type 'Prefix' MUST NOT be removed from any record.
   The field-body for this type of field MAY be modified, but only if
   the modification broadens the meaning of the subtag. That is, the
   field-body can be replaced only by a prefix of itself. For
   example, the Prefix "be-Latn" (Belarusian, Latin script) could be
   replaced by the Prefix "be" (Belarusian) but not by the Prefix
   "ru-Latn" (Russian, Latin script).

   Records of type 'variant' MAY have more than one field of type
   'Prefix'. Additional fields of this type MAY be added to a 'variant'
   record via the registration process.

   The field-body of the 'Prefix' field MUST NOT conflict with any
   'Prefix' already registered for a given record. Such a conflict
   would occur when no valid tag could be constructed that would
   contain the prefix, such as when two subtags each have a
   'Prefix' that contains the other subtag. For example, suppose that
   the subtag 'avariant' has the prefix "es-bvariant". Then the subtag
   'bvariant' cannot be given the prefix 'avariant', for that would
   require a tag of the form "es-avariant-bvariant-avariant", which
   would not be valid.

   Records of type 'extlang' MUST have _exactly_ one 'Prefix' field.

3.1.8. Suppress-Script Field

   The field 'Suppress-Script' contains a script subtag (whose record
   appears in the registry). The field 'Suppress-Script' MUST only
   appear in records whose 'Type' field-body is 'language'. This field
   MUST NOT appear more than one time in a record. This field indicates
   a script used to write the overwhelming majority of documents for the
   given language. This script code therefore adds no distinguishing
   information to a language tag.
This helps ensure greater 1182 compatibility between the language tags generated according to the 1183 rules in this document and language tags and tag processors or 1184 consumers based on RFC 3066 by indicating that the script subtag 1185 SHOULD NOT be used for most documents in that language. For example, 1186 virtually all Icelandic documents are written in the Latin script, 1187 making the subtag 'Latn' redundant in the tag "is-Latn". 1189 Many language subtag records do not have a Suppress-Script field. 1190 The lack of a Suppress-Script might indicate that the language is 1191 customarily written in more than one script or that the language is 1192 not customarily written at all. It might also mean that sufficient 1193 information was not available when the record was created and thus 1194 remains a candidate for future registration. 1196 3.1.9. Macrolanguage Field 1198 The Macrolanguage field contains a primary or extended language 1199 subtag that encompasses this subtag's language. That is, the 1200 language subtag whose record this field appears in is sometimes 1201 considered to be a sub-language of the Macrolanguage. Macrolanguage 1202 values are defined by ISO 639-3 and the exact nature of the 1203 relationship between the encompassed and encompassing languages 1204 varies on a case-by-case basis. 1206 This field can be useful to applications or users when selecting 1207 language tags or as additional metadata useful in matching. The 1208 Macrolanguage field can only occur in records of type 'language' or 1209 'extlang'. Only values assigned by ISO 639-3 will be considered for 1210 inclusion. Macrolanguage fields MAY be added or removed via the 1211 normal registration process whenever ISO 639-3 defines new values. 1212 Macrolanguages are informational, and MAY be removed or changed if 1213 ISO 639-3 changes the values. 
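   As an illustration of Macrolanguage information serving as
   additional metadata in matching, the following sketch treats two
   language subtags as related when they share a macrolanguage. The
   table MACROLANGUAGE and the functions encompassing and
   related_by_macrolanguage are hypothetical; an application would
   populate the table from the 'Macrolanguage' fields of the registry,
   and whether such a relation ought to influence matching is an
   application decision that this document does not specify. The two
   entries shown reflect the 'nb' and 'nn' records, each of which has a
   Macrolanguage of 'no'.

```python
# Hypothetical lookup table: encompassed subtag -> Macrolanguage value.
# Only two illustrative entries are shown; this is not the full registry.
MACROLANGUAGE = {"nb": "no", "nn": "no"}


def encompassing(subtag, table=MACROLANGUAGE):
    """Return the macrolanguage for a subtag, or the subtag itself."""
    s = subtag.lower()
    return table.get(s, s)


def related_by_macrolanguage(a, b, table=MACROLANGUAGE):
    """True when two language subtags resolve to the same macrolanguage."""
    return encompassing(a, table) == encompassing(b, table)
```

   Under these assumptions, 'nb' and 'nn' are reported as related to
   each other and to 'no', while 'nb' and 'de' are not.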
1215 For example, the language subtags 'nb' (Norwegian Bokmal) and 'nn' 1216 (Norwegian Nynorsk) each have a Macrolanguage entry of 'no' 1217 (Norwegian). For more information see Section 4.1. 1219 3.1.10. Comments Field 1221 The field 'Comments' conveys additional information about the record 1222 and MAY appear more than once per record. The field-body MAY include 1223 non-ASCII characters. This field MAY be inserted or changed via the 1224 registration process and no guarantee of stability is provided. The 1225 content of this field is not restricted, except by the need to 1226 register the information, the suitability of the request, and by 1227 reasonable practical size limitations. 1229 3.2. Language Subtag Reviewer 1231 The Language Subtag Reviewer moderates the ietf-languages mailing 1232 list, responds to requests for registration, and performs the other 1233 registry maintenance duties described in Section 3.3. Only the 1234 Language Subtag Reviewer is permitted to request IANA to change, 1235 update, or add records to the Language Subtag Registry. The Language 1236 Subtag Reviewer MAY delegate list moderation and other clerical 1237 duties as needed. 1239 The Language Subtag Reviewer is appointed by the IESG for an 1240 indefinite term, subject to removal or replacement at the IESG's 1241 discretion. The IESG will solicit nominees for the position (upon 1242 adoption of this document or upon a vacancy) and then solicit 1243 feedback on the nominees' qualifications. Qualified candidates 1244 should be familiar with BCP 47 and its requirements; be willing to 1245 fairly, responsively, and judiciously administer the registration 1246 process; and be suitably informed about the issues of language 1247 identification so that they can draw upon and assess the claim and 1248 contributions of language experts and subtag requesters. 
1250 The subsequent performance or decisions of the Language Subtag 1251 Reviewer MAY be appealed to the IESG under the same rules as other 1252 IETF decisions (see [RFC2026]). The IESG can reverse or overturn the 1253 decision of the Language Subtag Reviewer, provide guidance, or take 1254 other appropriate actions. 1256 3.3. Maintenance of the Registry 1258 Maintenance of the registry requires that as codes are assigned or 1259 withdrawn by ISO 639, ISO 15924, ISO 3166, and UN M.49, the Language 1260 Subtag Reviewer MUST evaluate each change and determine the 1261 appropriate course of action according to the rules in this document. 1262 Such updates follow the registration process described in 1263 Section 3.5. Usually the Language Subtag Reviewer will start the 1264 process for the new or updated record by filling in the registration 1265 form and submitting it. If a change to one of these standards takes 1266 place and the Language Subtag Reviewer does not do this in a timely 1267 manner, then any interested party MAY submit the form. Thereafter 1268 the registration process continues normally. 1270 The Language Subtag Reviewer MUST ensure that new subtags meet the 1271 requirements elsewhere in this document (and most especially in 1272 Section 3.4) or submit an appropriate registration form for an 1273 alternate subtag as described in that section. Each individual 1274 subtag affected by a change MUST be sent to the ietf-languages list 1275 with its own registration form and in a separate message. 1277 3.4. Stability of IANA Registry Entries 1279 The stability of entries and their meaning in the registry is 1280 critical to the long-term stability of language tags. The rules in 1281 this section guarantee that a specific language tag's meaning is 1282 stable over time and will not change. 
1284 These rules specifically deal with how changes to codes (including 1285 withdrawal and deprecation of codes) maintained by ISO 639, ISO 1286 15924, ISO 3166, and UN M.49 are reflected in the IANA Language 1287 Subtag Registry. Assignments to the IANA Language Subtag Registry 1288 MUST adhere to the following stability rules: 1290 1. Values in the fields 'Type', 'Subtag', 'Tag', 'Added', 1291 'Deprecated' and 'Preferred-Value' MUST NOT be changed and are 1292 guaranteed to be stable over time. 1294 2. Values in the 'Description' field MUST NOT be changed in a way 1295 that would invalidate previously-existing tags. They MAY be 1296 broadened somewhat in scope, changed to add information, or 1297 adapted to the most common modern usage. For example, countries 1298 occasionally change their official names; a historical example 1299 of this would be "Upper Volta" changing to "Burkina Faso". 1301 3. Values in the field 'Prefix' MAY be added to records of type 1302 'variant' via the registration process. If a prefix is added to 1303 a variant record, 'Comments' fields SHOULD be used to explain 1304 different usages with the various prefixes. 1306 4. Values in the field 'Prefix' in records of type 'variant' MAY be 1307 modified, so long as the modifications broaden the set of 1308 prefixes. That is, a prefix MAY be replaced by one of its own 1309 prefixes. For example, the prefix "en-US" could be replaced by 1310 "en", but not by the prefixes "en-Latn", "fr", or "en-US-boont". 1312 If one of those prefixes were needed, a new Prefix SHOULD be 1313 registered. 1315 5. Values in the field 'Prefix' in records of type 'extlang' MUST 1316 NOT be modified. 1318 6. Values in the field 'Prefix' MUST NOT be removed. 1320 7. The field 'Comments' MAY be added, changed, modified, or removed 1321 via the registration process or any of the processes or 1322 considerations described in this section. 1324 8.
The field 'Suppress-Script' MAY be added or removed via the 1325 registration process. 1327 9. The field 'Macrolanguage' MAY be added or removed via the 1328 registration process, but only in response to changes made by 1329 ISO 639. The Macrolanguage field appears whenever a language 1330 has a corresponding Macrolanguage in ISO 639. That is, the 1331 macrolanguage fields in the registry exactly match those of ISO 1332 639. No other macrolanguage mappings will be considered for 1333 registration. 1335 10. Codes assigned by ISO 639-1 that do not conflict with existing 1336 two-letter primary language subtags and that have no 1337 corresponding three-letter primary or extended language subtags 1338 defined in the registry are entered into the IANA registry as 1339 new records of type 'language'. 1341 11. Codes assigned by ISO 639-2 that do not conflict with existing 1342 three-letter primary or extended language subtags are entered 1343 into the IANA registry as new records of type 'language'. 1345 12. Codes assigned by ISO 639-3 that do not conflict with existing 1346 three-letter primary or extended language subtags are entered 1347 into the IANA registry as new records. 1349 1. Codes that have a defined "macrolanguage" mapping at the 1350 time of their registration MUST be entered into the registry 1351 as records of type 'extlang' with a 'Prefix' field 1352 containing the appropriate prefix tag. They MUST also 1353 include a "Macrolanguage" field in their record. 1355 2. Codes that represent sign languages MUST be entered into the 1356 registry as records of type 'extlang' with a 'Prefix' field 1357 that matches the Basic Language Range "sgn" (see Section 1358 3.3.1 "Basic Filtering" in [RFC4647]). 1360 3. All other codes MUST be entered into the registry as records 1361 of type 'language'. 1363 13. A record of type 'language' or 'extlang' MUST NOT be registered 1364 if there exists a record of either type with the same subtag 1365 value.
For example, if an 'extlang' subtag 'foo' exists in the 1366 registry, all attempts to register a 'language' subtag 'foo' 1367 will be rejected. 1369 14. Codes assigned by ISO 15924 and ISO 3166 that do not conflict 1370 with existing subtags of the associated type and whose meaning 1371 is not the same as an existing subtag of the same type are 1372 entered into the IANA registry as new records. 1374 15. Codes assigned by ISO 639, ISO 15924, or ISO 3166 that are 1375 withdrawn by their respective maintenance or registration 1376 authority remain valid in language tags. A 'Deprecated' field 1377 containing the date of withdrawal MUST be added to the record. 1378 If a new record of the same type is added that represents a 1379 replacement value, then a 'Preferred-Value' field MAY also be 1380 added. The registration process MAY be used to add comments 1381 about the withdrawal of the code by the respective standard. 1383 Example: The region code 'TL' was assigned to the country 1384 'Timor-Leste', replacing the code 'TP' (which was assigned to 1385 'East Timor' when it was under administration by Portugal). 1386 The subtag 'TP' remains valid in language tags, but its 1387 record contains a 'Preferred-Value' of 'TL' and its 1388 'Deprecated' field contains the date the new code was assigned 1389 ('2004-07-06'). 1391 16. Codes assigned by ISO 639, ISO 15924, or ISO 3166 that conflict 1392 with existing subtags of the associated type, including subtags 1393 that are deprecated, MUST NOT be entered into the registry. The 1394 following additional considerations apply to subtag values that 1395 are reassigned: 1397 A. For ISO 639 codes, if the newly assigned code's meaning is 1398 not represented by a subtag in the IANA registry, the 1399 Language Subtag Reviewer, as described in Section 3.5, SHALL 1400 prepare a proposal for entering in the IANA registry as soon 1401 as practical a registered language subtag as an alternate 1402 value for the new code.
The form of the registered language 1403 subtag will be at the discretion of the Language Subtag 1404 Reviewer and MUST conform to other restrictions on language 1405 subtags in this document. 1407 B. For all subtags whose meaning is derived from an external 1408 standard (that is, by ISO 639, ISO 15924, ISO 3166, or UN 1409 M.49), if a new meaning is assigned to an existing code and 1410 the new meaning broadens the meaning of that code, then the 1411 meaning for the associated subtag MAY be changed to match. 1412 The meaning of a subtag MUST NOT be narrowed, however, as 1413 this can result in an unknown proportion of the existing 1414 uses of a subtag becoming invalid. Note: ISO 639 1415 maintenance agency/registration authority (MA/RA) has 1416 adopted a similar stability policy. 1418 C. For ISO 15924 codes, if the newly assigned code's meaning is 1419 not represented by a subtag in the IANA registry, the 1420 Language Subtag Reviewer, as described in Section 3.5, SHALL 1421 prepare a proposal for entering in the IANA registry as soon 1422 as practical a registered variant subtag as an alternate 1423 value for the new code. The form of the registered variant 1424 subtag will be at the discretion of the Language Subtag 1425 Reviewer and MUST conform to other restrictions on variant 1426 subtags in this document. 1428 D. For ISO 3166 codes, if the newly assigned code's meaning is 1429 associated with the same UN M.49 code as another 'region' 1430 subtag, then the existing region subtag remains as the 1431 preferred value for that region and no new entry is created. 1432 A comment MAY be added to the existing region subtag 1433 indicating the relationship to the new ISO 3166 code. 1435 E. 
For ISO 3166 codes, if the newly assigned code's meaning is 1436 associated with a UN M.49 code that is not represented by an 1437 existing region subtag, then the Language Subtag Reviewer, 1438 as described in Section 3.5, SHALL prepare a proposal for 1439 entering the appropriate UN M.49 country code as an entry in 1440 the IANA registry. 1442 F. For ISO 3166 codes, if there is no associated UN numeric 1443 code, then the Language Subtag Reviewer SHALL petition the 1444 UN to create one. If there is no response from the UN 1445 within ninety days of the request being sent, the Language 1446 Subtag Reviewer SHALL prepare a proposal for entering in the 1447 IANA registry as soon as practical a registered variant 1448 subtag as an alternate value for the new code. The form of 1449 the registered variant subtag will be at the discretion of 1450 the Language Subtag Reviewer and MUST conform to other 1451 restrictions on variant subtags in this document. This 1452 situation is very unlikely to ever occur. 1454 17. UN M.49 has codes for both countries and areas (such as '276' 1455 for Germany) and geographical regions and sub-regions (such as 1456 '150' for Europe). UN M.49 country or area codes for which 1457 there is no corresponding ISO 3166 code SHOULD NOT be 1458 registered, except as a surrogate for an ISO 3166 code that is 1459 blocked from registration by an existing subtag. If such a code 1460 becomes necessary, then the registration authority for ISO 3166 1461 SHOULD first be petitioned to assign a code to the region. If 1462 the petition for a code assignment by ISO 3166 is refused or not 1463 acted on in a timely manner, the registration process described 1464 in Section 3.5 MAY then be used to register the corresponding UN 1465 M.49 code. This way, UN M.49 codes remain available as the 1466 value of last resort in cases where ISO 3166 reassigns a 1467 deprecated value in the registry. 1469 18. 
Stability provisions apply to grandfathered tags with this 1470 exception: should it be possible to compose one of the 1471 grandfathered tags from registered subtags, then the field 1472 'Type' in that record is changed from 'grandfathered' to 1473 'redundant'. Note that this will not affect language tags that 1474 match the grandfathered tag, since these tags will now match 1475 valid generative subtag sequences. For example, this document 1476 caused the ISO 639-3 code 'gan', used in the redundant tag 1477 "zh-gan", to be registered as an extended language subtag. The 1478 formerly-grandfathered tag "zh-gan" became a redundant tag as a 1479 result (but existing content or implementations that use 1480 "zh-gan" remain valid). 1482 Note: The redundant and grandfathered entries together are the 1483 complete list of tags registered under [RFC3066]. The redundant tags 1484 are those that can now be formed using the subtags defined in the 1485 registry together with the rules of Section 2.2. The grandfathered 1486 entries include those that can never be legal under those same 1487 provisions plus those tags that contain subtags not yet registered 1488 or, perhaps, inappropriate for registration. 1490 The set of redundant and grandfathered tags is permanent and stable: 1491 new entries in this section MUST NOT be added and existing entries 1492 MUST NOT be removed. Records of type 'grandfathered' MAY have their 1493 type converted to 'redundant'; see item 18 in Section 3.4 for more 1494 information. The decision-making process about which tags were 1495 initially grandfathered and which were made redundant is described in 1496 [RFC4645]. 1498 RFC 3066 tags that were deprecated prior to the adoption of [RFC4646] 1499 are part of the list of grandfathered tags, and their component 1500 subtags were not included as registered variants (although they 1501 remain eligible for registration).
For example, the tag "art-lojban" 1502 was deprecated in favor of the language subtag 'jbo'. 1504 3.5. Registration Procedure for Subtags 1506 The procedure given here MUST be used by anyone who wants to use a 1507 subtag not currently in the IANA Language Subtag Registry. 1509 Only subtags of type 'language' and 'variant' will be considered for 1510 independent registration of new subtags. Subtags needed for 1511 stability and subtags necessary to keep the registry synchronized 1512 with ISO 639, ISO 15924, ISO 3166, and UN M.49 within the limits 1513 defined by this document also use this process, as described in 1514 Section 3.3. Stability provisions are described in Section 3.4. 1516 This procedure MAY also be used to register or alter the information 1517 for the 'Description', 'Comments', 'Deprecated', 'Prefix', or 1518 'Suppress-Script' fields in a subtag's record as described in 1519 Section 3.4. Changes to all other fields in the IANA registry are 1520 NOT permitted. 1522 Registering a new subtag or requesting modifications to an existing 1523 tag or subtag starts with the requester filling out the registration 1524 form reproduced below. Note that each response is not limited in 1525 size so that the request can adequately describe the registration. 1526 The fields in the "Record Requested" section SHOULD follow the 1527 requirements in Section 3.1. 1529 LANGUAGE SUBTAG REGISTRATION FORM 1530 1. Name of requester: 1531 2. E-mail address of requester: 1532 3. Record Requested: 1534 Type: 1535 Subtag: 1536 Description: 1537 Prefix: 1538 Preferred-Value: 1539 Deprecated: 1540 Suppress-Script: 1541 Macrolanguage: 1542 Comments: 1544 4. Intended meaning of the subtag: 1545 5. Reference to published description 1546 of the language (book or article): 1547 6. 
Any other relevant information: 1549 Figure 4: The Language Subtag Registration Form 1551 Examples of completed registration forms can be found in Appendix C 1552 or online at http://www.iana.org/assignments/lang-subtags-templates/. 1554 The subtag registration form MUST be sent to the ietf-languages 1555 mailing list for a two-week review period before it can 1556 be submitted to IANA. If modifications are made to the request 1557 during the course of the registration process (such as corrections to 1558 meet the requirements in Section 3.1), the modified form MUST also be 1559 sent to the ietf-languages mailing list at least one week prior to 1560 submission to IANA. 1562 Whenever an entry is created or modified in the registry, the 1563 'File-Date' record at the start of the registry is updated to reflect the 1564 most recent modification date in the [RFC3339] "full-date" format. 1566 Before forwarding a new registration to IANA, the Language Subtag 1567 Reviewer MUST ensure that values in the 'Subtag' field match case 1568 according to the description in Section 3.1. 1570 The ietf-languages list is an open list that anyone can join by sending 1571 a subscription request. The list can be 1572 hosted by IANA or by any third party at the request of the IESG. 1574 Some fields in both the registration form and the registry 1575 record itself permit the use of non-ASCII characters. Registration 1576 requests SHOULD use the UTF-8 encoding for consistency and clarity. 1577 However, since some mail clients do not support this encoding, other 1578 encodings MAY be used for the registration request. The Language 1579 Subtag Reviewer is responsible for converting the record to UTF-8 and 1580 ensuring that the proper Unicode characters appear in both the 1581 archived request form and the registry record. In the case of a 1582 transcription or encoding error by IANA, the Language Subtag Reviewer 1583 will request that the registry be repaired, providing any necessary 1584 information to assist IANA.
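As a rough illustration of the encoding conversion described above, the following Python sketch decodes a request received in a non-UTF-8 encoding and re-encodes it as UTF-8. The helper name `to_utf8`, the sample field value, and the assumption that the sender's encoding is already known (for instance from the mail headers) are illustrative only and are not defined by this document.

```python
def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Re-encode a request body as UTF-8 (hypothetical helper).

    Decoding is strict, so malformed input raises an error instead of
    silently producing the wrong Unicode characters.
    """
    text = raw.decode(source_encoding)
    return text.encode("utf-8")


# Hypothetical 'Description' line as sent by an ISO-8859-1 mail client.
latin1_field = "Description: Norwegian Bokmål".encode("iso-8859-1")
utf8_field = to_utf8(latin1_field, "iso-8859-1")
print(utf8_field.decode("utf-8"))
```

Strict decoding is a deliberate choice in this sketch: a conversion failure is surfaced to the person handling the request rather than hidden in the archived form.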
1586 Variant subtags are usually registered for use with a particular 1587 range of language tags. For example, the subtag 'rozaj' is intended 1588 for use with language tags that start with the primary language 1589 subtag "sl", since Resian is a dialect of Slovenian. Thus, the 1590 subtag 'rozaj' would be appropriate in tags such as "sl-Latn-rozaj" 1591 or "sl-IT-rozaj". This information is stored in the 'Prefix' field 1592 in the registry. Variant registration requests SHOULD include at 1593 least one 'Prefix' field in the registration form. 1595 Extended language subtags MUST include exactly one 'Prefix' field. 1597 The 'Prefix' field for a given registered subtag exists in the IANA 1598 registry as a guide to usage. Additional prefixes MAY be added by 1599 filing an additional registration form. In that form, the "Any other 1600 relevant information:" field MUST indicate that it is the addition of 1601 a prefix. 1603 Requests to add a prefix to a variant subtag that imply a different 1604 semantic meaning will probably be rejected. For example, a request 1605 to add the prefix "de" to the subtag 'nedis' so that the tag 1606 "de-nedis" represented some German dialect would be rejected. The 1607 'nedis' subtag represents a particular Slovenian dialect and the 1608 additional registration would change the semantic meaning assigned to 1609 the subtag. A separate subtag SHOULD be proposed instead. 1611 The 'Description' field MUST contain a description of the tag being 1612 registered written or transcribed into the Latin script; it MAY also 1613 include a description in a non-Latin script. The 'Description' field 1614 is used for identification purposes; it need not represent 1615 the actual native name of the language or variation, nor be in any 1616 particular language.
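The 'Prefix' guidance for variant subtags described above, together with the broadening rule in item 4 of Section 3.4, can be sketched in Python as follows. The function names are hypothetical, and the comparison is a literal leading-subtag match: it does not try to skip intervening subtags (such as a script placed between language and region), so it should be read as a minimal sketch rather than a complete validity check.

```python
def subtags(tag: str) -> list[str]:
    """Split a tag into lowercase subtags (case is not significant)."""
    return tag.lower().split("-")


def starts_with_prefix(tag: str, prefix: str) -> bool:
    """True if `tag` begins with every subtag of `prefix`, in order."""
    t, p = subtags(tag), subtags(prefix)
    return t[:len(p)] == p


def is_broader_prefix(old: str, new: str) -> bool:
    """True if `new` is a proper leading part of `old`.

    This mirrors the rule that a prefix may only be replaced by one
    of its own prefixes.
    """
    return subtags(old) != subtags(new) and starts_with_prefix(old, new)


# 'rozaj' is registered with the prefix "sl".
print(starts_with_prefix("sl-Latn-rozaj", "sl"))  # True
print(starts_with_prefix("de-rozaj", "sl"))       # False

# "en-US" may be replaced by "en", but not by "en-Latn" or "en-US-boont".
print(is_broader_prefix("en-US", "en"))           # True
print(is_broader_prefix("en-US", "en-Latn"))      # False
print(is_broader_prefix("en-US", "en-US-boont"))  # False
```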
1618 While the 'Description' field itself is not guaranteed to be stable 1619 and errata corrections MAY be undertaken from time to time, attempts 1620 to provide translations or transcriptions of entries in the registry 1621 itself will probably be frowned upon by the community or rejected 1622 outright, as changes of this nature have an impact on the provisions 1623 in Section 3.4. 1625 When the two-week period has passed, the Language Subtag Reviewer 1626 MUST take one of the following actions: 1628 o Explicitly accept the request and forward the form containing the 1629 record to be inserted or modified to iana@iana.org according to 1630 the procedure described in Section 3.3. 1632 o Explicitly reject the request because of significant objections 1633 raised on the list or due to problems with constraints in this 1634 document (which MUST be explicitly cited). 1636 o Extend the review period by granting an additional two-week 1637 increment to permit further discussion. After each two-week 1638 increment, the Language Subtag Reviewer MUST indicate on the list 1639 whether the registration has been accepted, rejected, or extended. 1641 Note that the Language Subtag Reviewer MAY raise objections on the 1642 list if he or she so desires. The important thing is that the 1643 objection MUST be made publicly. 1645 Sometimes the request needs to be modified as a result of discussion 1646 during the review period or due to requirements in this document. 1648 The applicant, Language Subtag Reviewer, or others are free to submit 1649 a modified version of the completed registration form, which will be 1650 considered in lieu of the original request with the explicit approval 1651 of the applicant. Such changes do not restart the two-week 1652 discussion period, although an application containing the final 1653 record submitted to IANA MUST appear on the list at least one week 1654 prior to the Language Subtag Reviewer forwarding the record to IANA. 
1655 The applicant is also free to modify a rejected application with 1656 additional information and submit it again; this starts a new 1657 two-week comment period. 1659 Registrations initiated due to the provisions of Section 3.3 or 1660 Section 3.4 SHALL NOT be rejected altogether (since they have to 1661 ultimately appear in the registry) and SHOULD be completed as quickly 1662 as possible. The review process allows list members to comment on 1663 the specific information in the form and the record it contains and 1664 thus help ensure that it is correct and consistent. The Language 1665 Subtag Reviewer MAY reject a specific version of the form, but MUST 1666 include in the rejection a suitable replacement, extending the review 1667 period as described above, until the form is in a format worthy of 1668 the reviewer's approval. 1670 Decisions made by the Language Subtag Reviewer MAY be appealed to the 1671 IESG [RFC2028] under the same rules as other IETF decisions 1672 [RFC2026]. This includes a decision to extend the review period or 1673 the failure to announce a decision in a clear and timely manner. 1675 The approved records appear in the Language Subtag Registry. The 1676 approved registration forms are available online under 1677 http://www.iana.org/assignments/lang-subtags-templates/. 1679 Updates or changes to existing records follow the same procedure as 1680 new registrations. The Language Subtag Reviewer decides whether 1681 there is consensus to update the registration following the two-week 1682 review period; normally, objections by the original registrant will 1683 carry extra weight in forming such a consensus. 1685 Registrations are permanent and stable. Once registered, subtags 1686 will not be removed from the registry and will remain a valid way 1687 to specify a specific language or variant.
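The review timing described earlier in this section (a two-week review period, plus at least one week on the list for the final form of the record) can be sketched as a small date calculation. This is one reading of those rules, not a normative algorithm: the function name and the sample dates are hypothetical, and the sketch assumes the Language Subtag Reviewer has not extended the review period.

```python
from datetime import date, timedelta

REVIEW_PERIOD = timedelta(weeks=2)      # initial review on the list
FINAL_FORM_NOTICE = timedelta(weeks=1)  # final record must be visible this long


def earliest_forwarding_date(first_posted: date, final_form_posted: date) -> date:
    """Earliest date the record could be forwarded to IANA (sketch).

    Both conditions have to hold, so the later of the two dates wins.
    """
    return max(first_posted + REVIEW_PERIOD,
               final_form_posted + FINAL_FORM_NOTICE)


# Hypothetical request posted on 2 July and revised on 12 July.
print(earliest_forwarding_date(date(2007, 7, 2), date(2007, 7, 12)))  # 2007-07-19
# An unmodified request is bounded only by the two-week review period.
print(earliest_forwarding_date(date(2007, 7, 2), date(2007, 7, 2)))   # 2007-07-16
```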
1689 Note: The purpose of the "Reference to published description" section 1690 in the registration form is to aid in verifying whether a language is 1691 registered or what language or language variation a particular subtag 1692 refers to. In most cases, reference to an authoritative grammar or 1693 dictionary of that language will be useful; in cases where no such 1694 work exists, other well-known works describing that language or in 1695 that language MAY be appropriate. The Language Subtag Reviewer 1696 decides what constitutes "good enough" reference material. This 1697 requirement is not intended to exclude particular languages or 1698 dialects due to the size of the speaker population or lack of a 1699 standardized orthography. Minority languages will be considered 1700 equally on their own merits. 1702 3.6. Possibilities for Registration 1704 Possibilities for registration of subtags or information about 1705 subtags include: 1707 o Primary language subtags for languages not listed in ISO 639 that 1708 are not variants of any listed or registered language MAY be 1709 registered. At the time this document was created, there were no 1710 examples of this form of subtag. Before attempting to register a 1711 language subtag, there MUST be an attempt to register the language 1712 with ISO 639. Subtags MUST NOT be registered for languages 1713 defined by codes that exist in ISO 639-1, ISO 639-2, or ISO 639-3, 1714 or that are under consideration by the ISO 639 registration 1715 authorities, or that have never been attempted for registration 1716 with those authorities. If ISO 639 has previously rejected a 1717 language for registration, it is reasonable to assume that there 1718 must be additional, very compelling evidence of need before it 1719 will be registered as a primary language subtag in the IANA 1720 registry (to the extent that it is very unlikely that any subtags 1721 will be registered of this type). 
1723 o Dialect or other divisions or variations within a language, its 1724 orthography, writing system, regional or historical usage, 1725 transliteration or other transformation, or distinguishing 1726 variation MAY be registered as variant subtags. An example is the 1727 'rozaj' subtag (the Resian dialect of Slovenian). 1729 o The addition or maintenance of fields (generally of an 1730 informational nature) in Tag or Subtag records as described in 1731 Section 3.1 and subject to the stability provisions in 1732 Section 3.4. This includes descriptions, comments, deprecation 1733 and preferred values for obsolete or withdrawn codes, or the 1734 addition of script or extlang information to primary language 1735 subtags. 1737 o The addition of records and related field value changes necessary 1738 to reflect assignments made by ISO 639, ISO 15924, ISO 3166, and 1739 UN M.49 as described in Section 3.4. 1741 Subtags proposed for registration that would cause all or part of a 1742 grandfathered tag to become redundant but whose meaning conflicts 1743 with or alters the meaning of the grandfathered tag MUST be rejected. 1745 This document leaves the decision on what subtags or changes to 1746 subtags are appropriate (or not) to the registration process 1747 described in Section 3.5. 1749 Note: four-character primary language subtags are reserved to allow 1750 for the possibility of alpha4 codes in some future addition to the 1751 ISO 639 family of standards. 1753 ISO 639 defines a maintenance agency for additions to and changes in 1754 the list of languages in ISO 639. This agency is: 1756 International Information Centre for Terminology (Infoterm) 1757 Aichholzgasse 6/12, AT-1120 1758 Wien, Austria 1759 Phone: +43 1 26 75 35 Ext. 312 Fax: +43 1 216 32 72 1761 ISO 639-2 defines a maintenance agency for additions to and changes 1762 in the list of languages in ISO 639-2. 
This agency is: 1764 Library of Congress 1765 Network Development and MARC Standards Office 1766 Washington, D.C. 20540 USA 1767 Phone: +1 202 707 6237 Fax: +1 202 707 0115 1768 URL: http://www.loc.gov/standards/iso639-2 1770 ISO 639-3 defines a maintenance agency for additions to and changes 1771 in the list of languages in ISO 639-3. This agency is: 1773 SIL International 1774 ISO 639-3 Registrar 1775 7500 W. Camp Wisdom Rd. 1776 Dallas, TX 75236 USA 1777 Phone: +1 972 708 7400, ext. 2293 Fax: +1 972 708 7546 1778 Email: iso639-3@sil.org 1779 URL: http://www.sil.org/iso639-3 1781 The maintenance agency for ISO 3166 (country codes) is: 1783 ISO 3166 Maintenance Agency 1784 c/o International Organization for Standardization 1785 Case postale 56 1786 CH-1211 Geneva 20 Switzerland 1787 Phone: +41 22 749 72 33 Fax: +41 22 749 73 49 1788 URL: http://www.iso.org/iso/en/prods-services/iso3166ma/index.html 1790 The registration authority for ISO 15924 (script codes) is: 1792 Unicode Consortium Box 391476 1793 Mountain View, CA 94039-1476, USA 1794 URL: http://www.unicode.org/iso15924 1796 The Statistics Division of the United Nations Secretariat maintains 1797 the Standard Country or Area Codes for Statistical Use and can be 1798 reached at: 1800 Statistical Services Branch 1801 Statistics Division 1802 United Nations, Room DC2-1620 1803 New York, NY 10017, USA 1805 Fax: +1-212-963-0623 1806 E-mail: statistics@un.org 1807 URL: http://unstats.un.org/unsd/methods/m49/m49alpha.htm 1809 3.7. Extensions and Extensions Registry 1811 Extension subtags are those introduced by single-character subtags 1812 ("singletons") other than 'x'. They are reserved for the generation 1813 of identifiers that contain a language component and are compatible 1814 with applications that understand language tags. 
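To make the notion of a singleton concrete, the following Python sketch lists the single-character subtags that introduce extensions in a tag. The function name and the sample tags are hypothetical. The sketch relies on the rule, defined elsewhere in this document, that 'x' introduces private use subtags, so scanning stops at the first 'x'; it does not validate the tag in any other way.

```python
def extension_singletons(tag: str) -> list[str]:
    """Return the singletons (other than 'x') that introduce extensions."""
    parts = tag.lower().split("-")
    found = []
    # The first subtag is the primary language (or a leading 'x' or 'i'),
    # so it is never treated as an extension singleton.
    for part in parts[1:]:
        if part == "x":
            break  # everything after 'x' is private use
        if len(part) == 1:
            found.append(part)
    return found


print(extension_singletons("en-a-bbb-x-a-ccc"))  # ['a']
print(extension_singletons("de-a-foo-b-bar"))    # ['a', 'b']
print(extension_singletons("en-US"))             # []
```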
1816 The structure and form of extensions are defined by this document so 1817 that implementations can be created that are forward compatible with 1818 applications that might be created using singletons in the future. 1819 In addition, defining a mechanism for maintaining singletons will 1820 lend stability to this document by reducing the likely need for 1821 future revisions or updates. 1823 Single-character subtags are assigned by IANA using the "IETF 1824 Consensus" policy defined by [RFC2434]. This policy requires the 1825 development of an RFC, which SHALL define the name, purpose, 1826 processes, and procedures for maintaining the subtags. The 1827 maintaining or registering authority, including name, contact email, 1828 discussion list email, and URL location of the registry, MUST be 1829 indicated clearly in the RFC. The RFC MUST specify or include each 1830 of the following: 1832 o The specification MUST reference the specific version or revision 1833 of this document that governs its creation and MUST reference this 1834 section of this document. 1836 o The specification and all subtags defined by the specification 1837 MUST follow the ABNF and other rules for the formation of tags and 1838 subtags as defined in this document. In particular, it MUST 1839 specify that case is not significant and that subtags MUST NOT 1840 exceed eight characters in length. 1842 o The specification MUST specify a canonical representation. 1844 o The specification of valid subtags MUST be available over the 1845 Internet and at no cost. 1847 o The specification MUST be in the public domain or available via a 1848 royalty-free license acceptable to the IETF and specified in the 1849 RFC. 1851 o The specification MUST be versioned, and each version of the 1852 specification MUST be numbered, dated, and stable. 1854 o The specification MUST be stable. 
That is, extension subtags, 1855 once defined by a specification, MUST NOT be retracted or change 1856 in meaning in any substantial way. 1858 o The specification MUST include in a separate section the 1859 registration form reproduced in this section (below) to be used in 1860 registering the extension upon publication as an RFC. 1862 o IANA MUST be informed of changes to the contact information and 1863 URL for the specification. 1865 IANA will maintain a registry of allocated single-character 1866 (singleton) subtags. This registry MUST use the record-jar format 1867 described by the ABNF in Section 3.1. Upon publication of an 1868 extension as an RFC, the maintaining authority defined in the RFC 1869 MUST forward this registration form to iesg@ietf.org, who MUST 1870 forward the request to iana@iana.org. The maintaining authority of 1871 the extension MUST maintain the accuracy of the record by sending an 1872 updated full copy of the record to iana@iana.org with the subject 1873 line "LANGUAGE TAG EXTENSION UPDATE" whenever content changes. Only 1874 the 'Comments', 'Contact_Email', 'Mailing_List', and 'URL' fields MAY 1875 be modified in these updates. 1877 Failure to maintain this record, maintain the corresponding registry, 1878 or meet other conditions imposed by this section of this document MAY 1879 be appealed to the IESG [RFC2028] under the same rules as other IETF 1880 decisions (see [RFC2026]) and MAY result in the authority to maintain 1881 the extension being withdrawn or reassigned by the IESG. 1883 %% 1884 Identifier: 1885 Description: 1886 Comments: 1887 Added: 1888 RFC: 1889 Authority: 1890 Contact_Email: 1891 Mailing_List: 1892 URL: 1893 %% 1895 Figure 5: Format of Records in the Language Tag Extensions Registry 1897 'Identifier' contains the single-character subtag (singleton) 1898 assigned to the extension. 
The Internet-Draft submitted to define 1899 the extension SHOULD specify which letter or digit to use, although 1900 the IESG MAY change the assignment when approving the RFC. 1902 'Description' contains the name and description of the extension. 1904 'Comments' is an OPTIONAL field and MAY contain a broader description 1905 of the extension. 1907 'Added' contains the date the RFC was published in the "full-date" 1908 format specified in [RFC3339]. For example: 2004-06-28 represents 1909 June 28, 2004, in the Gregorian calendar. 1911 'RFC' contains the RFC number assigned to the extension. 1913 'Authority' contains the name of the maintaining authority for the 1914 extension. 1916 'Contact_Email' contains the email address used to contact the 1917 maintaining authority. 1919 'Mailing_List' contains the URL or subscription email address of the 1920 mailing list used by the maintaining authority. 1922 'URL' contains the URL of the registry for this extension. 1924 The determination of whether an Internet-Draft meets the above 1925 conditions and the decision to grant or withhold such authority rests 1926 solely with the IESG and is subject to the normal review and appeals 1927 process associated with the RFC process. 1929 Extension authors are strongly cautioned that many (including most 1930 well-formed) processors will be unaware of any special relationships 1931 or meaning inherent in the order of extension subtags. Extension 1932 authors SHOULD avoid subtag relationships or canonicalization 1933 mechanisms that interfere with matching or with length restrictions 1934 that sometimes exist in common protocols where the extension is used. 1935 In particular, applications MAY truncate the subtags in doing 1936 matching or in fitting into limited lengths, so it is RECOMMENDED 1937 that the most significant information be in the most significant 1938 (left-most) subtags and that the specification gracefully handle 1939 truncated subtags. 
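The effect of truncation on extension subtags can be illustrated with a short Python sketch. It drops right-most subtags until the tag fits a length limit, which is why the most significant information belongs in the left-most subtags. The function name, the sample tag, and the limits are hypothetical; removing a trailing singleton is a design choice of this sketch, made because a singleton with nothing after it no longer introduces an extension.

```python
def truncate_tag(tag: str, max_length: int) -> str:
    """Drop right-most subtags until the tag fits in max_length characters.

    The first subtag is always kept, even if it alone exceeds the limit.
    """
    parts = tag.split("-")
    while len(parts) > 1 and len("-".join(parts)) > max_length:
        parts.pop()
        # A dangling singleton would introduce nothing, so drop it too.
        while len(parts) > 1 and len(parts[-1]) == 1:
            parts.pop()
    return "-".join(parts)


print(truncate_tag("en-US-a-extend1-extend2", 18))  # en-US-a-extend1
print(truncate_tag("en-US-a-extend1-extend2", 10))  # en-US
```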

   When a language tag is to be used in a specific, known protocol, it
   is RECOMMENDED that the language tag not contain extensions not
   supported by that protocol. In addition, note that some protocols
   MAY impose upper limits on the length of the strings used to store or
   transport the language tag.

3.8. Update of the Language Subtag Registry

   Upon adoption of this document the IANA Language Subtag Registry will
   need an update so that it contains the complete set of subtags valid
   in a language tag. This collection of subtags, along with a
   description of the process used to create it, is described by
   [registry-update]. IANA will publish the updated version of the
   registry described by this document using the instructions and
   content of [registry-update]. Once published by IANA, the
   maintenance procedures, rules, and registration processes described
   in this document will be available for new registrations or updates.

   Registrations that are in process under the rules defined in
   [RFC4646] when this document is adopted MUST be completed under the
   rules contained in this document.

4. Formation and Processing of Language Tags

   This section addresses how to use the information in the registry
   with the tag syntax to choose, form, and process language tags.

4.1. Choice of Language Tag

   The guiding principle in forming language tags is to "tag content
   wisely." Sometimes there is a choice between several possible tags
   for the same content. The choice of which tag to use depends on the
   content and application in question and some amount of judgment might
   be necessary when selecting a tag.

   Interoperability is best served when the same language tag is used
   consistently to represent the same language. If an application has
   requirements that make the rules here inapplicable, then that
   application risks damaging interoperability.
   It is strongly
   RECOMMENDED that users not define their own rules for language tag
   choice.

   A subtag SHOULD only be used when it adds useful distinguishing
   information to the tag. Extraneous subtags interfere with the
   meaning, understanding, and processing of language tags. In
   particular, users and implementations SHOULD follow the 'Prefix' and
   'Suppress-Script' fields in the registry (defined in Section 3.1):
   these fields provide guidance on when specific additional subtags
   SHOULD be used or avoided in a language tag.

   Some applications can benefit from the use of script subtags in
   language tags, as long as the use is consistent for a given context.
   Script subtags are never appropriate for unwritten content (such as
   audio recordings).

   Script subtags were not formally defined in [RFC3066] and their use
   can affect matching and subtag identification for implementations of
   RFC 3066, as these subtags appear between the primary language and
   region subtags. For example, if an implementation selects content
   using Basic Filtering [RFC4647] (originally described in Section 2.5
   of [RFC3066]) and the user requested the language range "en-US",
   content labeled "en-Latn-US" will not match the request and thus not
   be selected. Therefore, it is important to know when script subtags
   will customarily be used and when they ought not be used. In the
   registry, the Suppress-Script field helps ensure greater
   compatibility between the language tags by defining when users SHOULD
   NOT include a script subtag with a particular primary language
   subtag.

   Extended language subtags (type 'extlang' in the registry; see
   Section 3.1) also appear between the primary language and subsequent
   (script, region, or variant) subtags.
   In most cases, use the
   Macrolanguage (indicated by the Prefix) by itself to form the
   language tag in preference to including the extended language subtag.
   Only use the extended language subtag if it adds useful
   distinguishing information to the tag within your application.

   The choice of subtags used to form a language tag SHOULD be guided by
   the following rules:

   1.  Use as precise a tag as possible, but no more specific than is
       justified. Avoid using subtags that are not important for
       distinguishing content in an application.

       *  For example, 'de' might suffice for tagging an email written
          in German, while "de-CH-1996" is probably unnecessarily
          precise for such a task.

   2.  The script subtag SHOULD NOT be used to form language tags unless
       the script adds some distinguishing information to the tag. The
       field 'Suppress-Script' in the primary language record in the
       registry indicates script subtags that do not add distinguishing
       information for most applications. For example:

       *  The subtag 'Latn' should not be used with the primary language
          'en' because nearly all English documents are written in the
          Latin script and it adds no distinguishing information.
          However, if a document were written in English mixing Latin
          script with another script such as Braille ('Brai'), then it
          might be appropriate to choose to indicate both scripts to aid
          in content selection, such as the application of a style
          sheet.

       *  When labeling content that is unwritten (such as a recording
          of human speech), the script subtag should not be used, even
          if the language is customarily written in several scripts.
          Thus the subtitles to a movie might use the tag "zh-cmn-Hant"
          (Chinese, Mandarin, Traditional script), but the audio track
          for the same language would be tagged "zh-cmn".

   3.  If a tag or subtag has a 'Preferred-Value' field in its registry
       entry, then the value of that field SHOULD be used to form the
       language tag in preference to the tag or subtag in which the
       preferred value appears.

       *  For example, use 'he' for Hebrew in preference to 'iw'.

   4.  [ISO639-2] has defined several codes included in the subtag
       registry that require additional care when choosing language
       tags. In most of these cases, where omitting the language tag is
       permitted, such omission is preferable to using these codes.
       Language tags SHOULD NOT incorporate these subtags as a prefix,
       unless the additional information conveys some value to the
       application.

       1.  Use specific language subtags or subtag sequences in
           preference to subtags for language collections. A "language
           collection" is a subtag derived from one of the [ISO639-2]
           codes that represents multiple related languages. These
           codes are included as primary language subtags in the
           registry. For example, the code 'cmc' represents "Chamic
           languages". The registry contains values for each of the
           approximately ten individual languages represented by this
           collective code. Other examples include the subtags for
           Germanic languages ('gem') and Algonquian languages ('alg').
           Since these codes are interpreted inclusively, content tagged
           with "en" (English), "de" (German), or "gsw" (Swiss German,
           Alemannic) could also (but SHOULD NOT) be tagged with "gem"
           (Germanic languages). Subtags derived from collection codes
           SHOULD NOT be used unless more specific language
           information is not available. Note that matching
           implementations generally do not understand the relationship
           between the collection and its encompassed languages, and so
           users ought not assume a subtag based on a language
           collection is a useful means for selecting content in its
           encompassed languages.

       2.  The 'mul' (Multiple) primary language subtag identifies
           content in multiple languages. It SHOULD NOT be used when a
           list of languages (such as Content-Language) or individual
           tags for each content element can be used instead.

       3.  The 'und' (Undetermined) primary language subtag identifies
           linguistic content whose language is not known. It SHOULD
           NOT be used unless a language tag is required and language
           information is not available or cannot be determined.
           Omitting the language tag (where permitted) is preferred.
           The 'und' subtag MAY be useful for protocols that require a
           language tag to be provided or where a primary language
           subtag is required (such as in "und-Latn"). The 'und' subtag
           MAY also be useful when matching language tags in certain
           situations.

       4.  The 'zxx' (Non-Linguistic) primary language subtag identifies
           content that has no language. Some examples might include
           instrumental or electronic music; sound recordings consisting
           of nonverbal sounds; audiovisual materials with no narration,
           printed titles, or subtitles; machine-readable data files
           consisting of machine languages or character codes; or
           programming source code. Note: where there are fragments of
           linguistic content, such as programming source code
           containing comments written in English, the subtag 'zxx'
           might still be used to indicate the primary status of the
           content, just as 'en' can be applied to a predominantly
           English text that contains a few French phrases.

       5.  The 'mis' (Uncoded) primary language subtag identifies
           content whose language is known but which does not currently
           have a corresponding subtag. This subtag SHOULD NOT be used.
           Because the addition of other codes in the future can render
           its application invalid, it is inherently unstable and hence
           incompatible with the stability goals of BCP 47.
           It is
           always preferable to use other subtags: either 'und' or (with
           prior agreement) private use subtags.

       6.  The grandfathered tag "i-default" (Default Language) was
           originally registered according to [RFC1766] to meet the
           needs of [RFC2277]. It does not indicate a specific
           language; rather, it identifies the condition or content
           used where the language preferences of the user cannot be
           established. It SHOULD NOT be used except as a means of
           labeling the default content for applications or protocols
           that require default language content to be labeled with that
           specific tag. It MAY also be used by an application or
           protocol to identify when the default language content is
           being returned.

   5.  The same variant subtag MUST NOT be used more than once within a
       language tag.

       *  For example, the tag "de-DE-1901-1901" is not valid.

   Languages with a Macrolanguage field in the registry sometimes can be
   usefully referenced using their Macrolanguage. However, the
   Macrolanguage field doesn't define what the relationship is between
   the language subtag whose record it appears in and its encompassed
   language or languages. Nor does it define how the encompassed
   languages are related to one another. In some cases, the
   Macrolanguage has a standard form as well as a variety of less-common
   dialects. In other cases there is no particular standard form and
   the encompassed subtags describe specific variations within the
   parent language.

   Applications MAY use Macrolanguage information to improve matching or
   language negotiation. For example, the information that 'sr'
   (Serbian) and 'hr' (Croatian) share a Macrolanguage expresses a
   closer relation between those languages than between, say, 'sr'
   (Serbian) and 'mk' (Macedonian). It is valid to use the encompassed
   language or just its Macrolanguage to form language tags.
   However,
   many matching applications will not be aware of the relationship
   between the languages. Care in selecting which subtags are used is
   crucial to interoperability. In general, use the most specific tag.
   However, where the standard form of an encompassed language is
   captured by the Macrolanguage, the Macrolanguage SHOULD be used in
   preference to one of its sublanguages unless there is a specific
   reason not to.

   In particular, the Chinese family of languages calls for special
   consideration. Because the written form is very similar for most
   languages having 'zh' as a Macrolanguage (and because historically
   subtags for the various sub-languages and dialects were not
   available), languages such as 'yue' (Cantonese) have usually used
   tags beginning with the subtag 'zh'. This means that Macrolanguage
   information can be usefully applied when searching for content or
   when providing fallbacks in language negotiation. For example, the
   information that 'yue' has a Macrolanguage of 'zh' could be used in
   the Lookup algorithm to fall back from a request for "yue-Hans-CN" to
   "zh-Hans-CN" without losing the script and region information (even
   though the user did not specify "zh-Hans-CN" in their request).

   To ensure consistent backward compatibility, this document contains
   several provisions to account for potential instability in the
   standards used to define the subtags that make up language tags.
   These provisions mean that no language tag created under the rules in
   this document will become invalid.

   Standards, protocols, and applications that reference this document
   normatively but apply different rules to the ones given in this
   section MUST specify how language tag selection varies from the
   guidelines given here.

4.2. Meaning of the Language Tag

   The meaning of a language tag is related to the meaning of the
   subtags that it contains. Each subtag, in turn, implies a certain
   range of expectations one might have for related content, although it
   is not a guarantee. For example, the use of a script subtag such as
   'Arab' (Arabic script) does not mean that the content contains only
   Arabic characters. It does mean that the language involved is
   predominantly in the Arabic script. Some subtags encompass a very
   wide range of variation and yet remain valid in each particular
   instance.

   Validity of a tag is not everything. A tag can be valid yet
   meaningless. This is unavoidable with a generative system like the
   language subtag mechanism. For example, a tag such as "ar-Cyrl-CO"
   (Arabic, Cyrillic script, as used in Colombia) is perfectly valid.
   However, it is unlikely to be a useful tag, as it represents an
   unlikely combination of language attributes that is probably
   unrelated to any real language usage.

   The relationship between the tag and the information it relates to is
   defined by the context in which the tag appears. Accordingly, this
   section gives only possible examples of its usage.

   o  For a single information object, the associated language tags
      might be interpreted as the set of languages that is necessary for
      a complete comprehension of the complete object. Example: Plain
      text documents.

   o  For an aggregation of information objects, the associated language
      tags could be taken as the set of languages used inside components
      of that aggregation. Examples: Document stores and libraries.

   o  For information objects whose purpose is to provide alternatives,
      the associated language tags could be regarded as a hint that the
      content is provided in several languages and that one has to
      inspect each of the alternatives in order to find its language or
      languages. In this case, the presence of multiple tags might not
      mean that one needs to be multi-lingual to get complete
      understanding of the document. Example: MIME
      multipart/alternative.

   o  In markup languages, such as HTML and XML, language information
      can be added to each part of the document identified by the markup
      structure (including the whole document itself). For example, one
      could write <span lang="fr">C'est la vie.</span> inside a
      Norwegian document; the Norwegian-speaking user could then access
      a French-Norwegian dictionary to find out what the marked section
      meant. If the user were listening to that document through a
      speech synthesis interface, this information could be used to
      signal the synthesizer to appropriately apply French
      text-to-speech pronunciation rules to that span of text, instead
      of applying the inappropriate Norwegian rules.

   Language tags are related when they contain a similar sequence of
   subtags. For example, if a language tag B contains language tag A as
   a prefix, then B is typically "narrower" or "more specific" than A.
   Thus, "zh-Hant-TW" is more specific than "zh-Hant".

   This relationship is not guaranteed in all cases: specifically,
   languages that begin with the same sequence of subtags are NOT
   guaranteed to be mutually intelligible, although they might be. For
   example, the tag "az" shares a prefix with both "az-Latn"
   (Azerbaijani written using the Latin script) and "az-Cyrl"
   (Azerbaijani written using the Cyrillic script). A person fluent in
   one script might not be able to read the other, even though the text
   might be identical.
   Content tagged as "az" most probably is written
   in just one script and thus might not be intelligible to a reader
   familiar with the other script.

4.3. Length Considerations

   There is no defined upper limit on the size of language tags. While
   historically most language tags have consisted of language and region
   subtags with a combined total length of up to six characters, larger
   tags have always been possible and have actually appeared in use.

   Neither the language tag syntax nor other requirements in this
   document impose a fixed upper limit on the number of subtags in a
   language tag (and thus an upper bound on the size of a tag). The
   language tag syntax suggests that, depending on the specific
   language, more subtags (and thus a longer tag) are sometimes
   necessary to completely identify the language for certain
   applications; thus, it is possible to envision long or complex subtag
   sequences.

4.3.1. Working with Limited Buffer Sizes

   Some applications and protocols are forced to allocate fixed buffer
   sizes or otherwise limit the length of a language tag. A conformant
   implementation or specification MAY refuse to support the storage of
   language tags that exceed a specified length. Any such limitation
   SHOULD be clearly documented, and such documentation SHOULD include
   what happens to longer tags (for example, whether an error value is
   generated or the language tag is truncated). A protocol that allows
   tags to be truncated at an arbitrary limit, without giving any
   indication of what that limit is, has the potential for causing harm
   by changing the meaning of tags in substantial ways.

   In practice, most language tags do not require more than a few
   subtags and will not approach reasonably sized buffer limitations;
   see Section 4.1.
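
   The two documented behaviors mentioned above (an error value or
   truncation) can be illustrated with a short Python sketch. It is a
   sketch under stated assumptions, not a reference implementation: the
   names TagTooLong, truncate_tag, and fit_tag and the on_overflow
   parameter are hypothetical, the input is assumed to be a well-formed
   tag, and the truncation branch follows the subtag-removal rule that
   Section 4.3.2 lays down (whole subtags are removed from the right,
   and a trailing single-character subtag is removed as well).

```python
class TagTooLong(ValueError):
    """Raised when a tag exceeds the documented limit (hypothetical name)."""


def truncate_tag(tag, limit):
    """Drop whole subtags from the right until the tag fits in `limit`.

    Returns an empty string if not even the first subtag fits.
    """
    subtags = tag.split("-")
    while subtags and len("-".join(subtags)) > limit:
        subtags.pop()
        # A tag must not end with a single-character subtag (singleton).
        while subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return "-".join(subtags)


def fit_tag(tag, limit, on_overflow="error"):
    """Return `tag` if it fits; otherwise raise or truncate, as configured."""
    if len(tag) <= limit:
        return tag
    if on_overflow == "truncate":
        return truncate_tag(tag, limit)
    raise TagTooLong("%r exceeds %d characters" % (tag, limit))
```

   With a limit of 39 characters, for instance, truncate_tag reduces
   "zh-Latn-CN-variant1-a-extend1-x-wadegile-private1" to
   "zh-Latn-CN-variant1-a-extend1": removing "private1" alone still
   leaves 40 characters, and removing "wadegile" would leave a dangling
   "x", which is removed too.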

   Some specifications or protocols have limits on tag length but do not
   have a fixed length limitation. For example, [RFC2231] has no
   explicit length limitation: the length available for the language tag
   is constrained by the length of other header components (such as the
   charset's name) coupled with the 76-character limit in [RFC2047].
   Thus, the "limit" might be 50 or more characters, but it could
   potentially be quite small.

   The considerations for assigning a buffer limit are:

      Implementations SHOULD NOT truncate language tags unless the
      meaning of the tag is purposefully being changed, or unless the
      tag does not fit into a limited buffer size specified by a
      protocol for storage or transmission.

      Implementations SHOULD warn the user when a tag is truncated since
      truncation changes the semantic meaning of the tag.

      Implementations of protocols or specifications that are space
      constrained but do not have a fixed limit SHOULD use the longest
      possible tag in preference to truncation.

      Protocols or specifications that specify limited buffer sizes for
      language tags MUST allow for language tags of up to 33 characters.

      Protocols or specifications that specify limited buffer sizes for
      language tags SHOULD allow for language tags of at least 42
      characters.

   The following illustration shows how the 42-character recommendation
   was derived. The combination of language and extended language
   subtags was chosen for future compatibility.
   At up to 15 characters,
   this combination is longer than the longest possible primary language
   subtag (8 characters):

      language      =  3 (ISO 639-2; ISO 639-1 requires 2)
      extlang1      =  4 (each subsequent subtag includes '-')
      extlang2      =  4 (unlikely: needs prefix="language-extlang1")
      extlang3      =  4 (extremely unlikely)
      script        =  5 (if not suppressed: see Section 4.1)
      region        =  4 (UN M.49; ISO 3166 requires 3)
      variant1      =  9 (needs 'language' as a prefix)
      variant2      =  9 (needs 'language-variant1' as a prefix)

      total         = 42 characters

   Figure 6: Derivation of the Limit on Tag Length

4.3.2. Truncation of Language Tags

   Truncation of a language tag alters the meaning of the tag, and thus
   SHOULD be avoided. However, truncation of language tags is sometimes
   necessary due to limited buffer sizes. Such truncation MUST NOT
   permit a subtag to be chopped off in the middle or the formation of
   invalid tags (for example, one ending with the "-" character).

   This means that applications or protocols that truncate tags MUST do
   so by progressively removing subtags along with their preceding "-"
   from the right side of the language tag until the tag is short enough
   for the given buffer. If the resulting tag ends with a
   single-character subtag, that subtag and its preceding "-" MUST also
   be removed. For example:

      Tag to truncate: zh-Latn-CN-variant1-a-extend1-x-wadegile-private1
      1. zh-Latn-CN-variant1-a-extend1-x-wadegile
      2. zh-Latn-CN-variant1-a-extend1
      3. zh-Latn-CN-variant1
      4. zh-Latn-CN
      5. zh-Latn
      6. zh

   Figure 7: Example of Tag Truncation

4.4. Canonicalization of Language Tags

   Since a particular language tag is sometimes used by many processes,
   language tags SHOULD always be created or generated in a canonical
   form.

   A language tag is in canonical form when:

   1.  The tag is well-formed according to the rules in Section 2.1 and
       Section 2.2.

   2.  Subtags of type 'Region' that have a Preferred-Value mapping in
       the IANA registry (see Section 3.1) SHOULD be replaced with their
       mapped value. Note: In rare cases, the mapped value will also
       have a Preferred-Value.

   3.  Redundant or grandfathered tags that have a Preferred-Value
       mapping in the IANA registry (see Section 3.1) MUST be replaced
       with their mapped value. These items either are deprecated
       mappings created before the adoption of this document (such as
       the mapping of "no-nyn" to "nn" or "i-klingon" to "tlh") or are
       the result of later registrations or additions to this document
       (for example, "zh-hakka" was deprecated in favor of the
       language-extlang combination "zh-hak" when this document was
       adopted).

   4.  Other subtags that have a Preferred-Value mapping in the IANA
       registry (see Section 3.1) MUST be replaced with their mapped
       value. These items consist entirely of clerical corrections to
       ISO 639-1 in which the deprecated subtags have been maintained
       for compatibility purposes.

   5.  If more than one extension subtag sequence exists, the extension
       sequences are ordered into case-insensitive ASCII order by
       singleton subtag.

   Example: The language tag "en-A-aaa-B-ccc-bbb-x-xyz" is in canonical
   form, while "en-B-ccc-bbb-A-aaa-X-xyz" is well-formed but not in
   canonical form.

   Example: The language tag "en-BU" (English as used in Burma) is not
   canonical because the 'BU' subtag has a canonical mapping to 'MM'
   (Myanmar), although the tag "en-BU" maintains its validity.

   Canonicalization of language tags does not imply anything about the
   use of upper or lowercase letters when processing or comparing
   subtags (as described in Section 2.1). All comparisons MUST be
   performed in a case-insensitive manner.

   When performing canonicalization of language tags, processors MAY
   regularize the case of the subtags (that is, this process is
   OPTIONAL), following the case used in the registry. Note that this
   corresponds to the following casing rules: uppercase all non-initial
   two-letter subtags; titlecase all non-initial four-letter subtags;
   lowercase everything else.

   Note: Case folding of ASCII letters in certain locales, unless
   carefully handled, sometimes produces non-ASCII character values.
   The Unicode Character Database file "SpecialCasing.txt" defines the
   specific cases that are known to cause problems with this. In
   particular, the letter 'i' (U+0069) in Turkish and Azerbaijani is
   uppercased to U+0130 (LATIN CAPITAL LETTER I WITH DOT ABOVE).
   Implementers SHOULD specify a locale-neutral casing operation to
   ensure that case folding of subtags does not produce this value,
   which is illegal in language tags. For example, if one were to
   uppercase the region subtag 'in' using Turkish locale rules, the
   sequence U+0130 U+004E would result instead of the expected 'IN'.

   Note: if the field 'Deprecated' appears in a registry record without
   an accompanying 'Preferred-Value' field, then that tag or subtag is
   deprecated without a replacement. Validating processors SHOULD NOT
   generate tags that include these values, although the values are
   canonical when they appear in a language tag.

   An extension MUST define any relationships that exist between the
   various subtags in the extension and thus MAY define an alternate
   canonicalization scheme for the extension's subtags. Extensions MAY
   define how the order of the extension's subtags is interpreted. For
   example, an extension could define that its subtags are in canonical
   order when the subtags are placed into ASCII order: that is,
   "en-a-aaa-bbb-ccc" instead of "en-a-ccc-bbb-aaa".
   Another extension might
   define that the order of the subtags influences their semantic
   meaning (so that "en-b-ccc-bbb-aaa" has a different value from
   "en-b-aaa-bbb-ccc"). However, extension specifications SHOULD be
   designed so that they are tolerant of the typical processes described
   in Section 3.7.

4.5. Considerations for Private Use Subtags

   Private use subtags, like all other subtags, MUST conform to the
   format and content constraints in the ABNF. Private use subtags have
   no meaning outside the private agreement between the parties that
   intend to use or exchange language tags that employ them. The same
   subtags MAY be used with a different meaning under a separate private
   agreement. They SHOULD NOT be used where alternatives exist and
   SHOULD NOT be used in content or protocols intended for general use.

   Private use subtags are simply useless for information exchange
   without prior arrangement. The value and semantic meaning of private
   use tags and of the subtags used within such a language tag are not
   defined by this document.

   Subtags defined in the IANA registry as having a specific private use
   meaning convey more information than a purely private use tag
   prefixed by the singleton subtag 'x'. For applications, this
   additional information MAY be useful.

   For example, the region subtags 'AA', 'ZZ', and those in the ranges
   'QM'-'QZ' and 'XA'-'XZ' (derived from ISO 3166 private use codes) MAY
   be used to form a language tag. A tag such as "zh-Hans-XQ" conveys a
   great deal of public, interchangeable information about the language
   material (that it is Chinese in the simplified Chinese script and is
   suitable for some geographic region 'XQ').
   While the precise
   geographic region is not known outside of private agreement, the tag
   conveys far more information than an opaque tag such as "x-someLang",
   which contains no information about the language subtag or script
   subtag outside of the private agreement.

   However, in some cases content tagged with private use subtags MAY
   interact with other systems in a different and possibly unsuitable
   manner compared to tags that use opaque, privately defined subtags,
   so the choice of the best approach sometimes depends on the
   particular domain in question.

5. IANA Considerations

   This section deals with the processes and requirements necessary for
   IANA to undertake to maintain the subtag and extension registries as
   defined by this document and in accordance with the requirements of
   [RFC2434].

   The impact on the IANA maintainers of the two registries defined by
   this document will be a small increase in the frequency of new
   entries or updates.

5.1. Language Subtag Registry

   Upon adoption of this document, IANA will update the registry using
   instructions and content provided in a companion document:
   [registry-update]. The criteria and process for selecting the
   updated set of records are described in that document. The updated
   set of records has no impact on IANA, since the work to create
   it will be performed externally.

   Future work on the Language Subtag Registry is limited to
   inserting or replacing whole records preformatted for IANA by the
   Language Subtag Reviewer as described in Section 3.3 of this document
   and archiving and making publicly available the forwarded
   registration form.

   Each registration form sent to IANA contains a single record for
   incorporation into the registry. The form MUST be sent to
   iana@iana.org by the Language Subtag Reviewer.
   It will have a
   subject line indicating whether the enclosed form represents an
   insertion of a new record (indicated by the word "INSERT" in the
   subject line) or a replacement of an existing record (indicated by
   the word "MODIFY" in the subject line). Records MUST NOT be deleted
   from the registry.

   IANA MUST extract the record from the form and place the inserted or
   modified record into the appropriate section of the language subtag
   registry, grouping the records by their 'Type' field. Inserted
   records MAY be placed anywhere in the appropriate section; there is
   no guarantee of the order of the records beyond grouping them
   together by 'Type'. Modified records MUST overwrite the record they
   replace.

   IANA MUST update the File-Date record to contain the most recent
   modification date when performing any insertion or modification:
   included in any request to insert or modify records will be a new
   File-Date record indicating the acceptance date of the record. This
   record MUST be placed first in the registry, replacing the existing
   File-Date record. In the event that the File-Date record present in
   the registry has a later date than the record being inserted or
   modified, then the latest (most recent) record MUST be preserved.
   IANA SHOULD process multiple registration requests in order according
   to the File-Date in the form, since one registration could otherwise
   cause a more recent change to be overwritten.

   The updated registry file MUST use the UTF-8 character encoding and
   IANA MUST check the registry file for proper encoding. Non-ASCII
   characters can be sent to IANA by attaching the registration form to
   the email message or by using various encodings in the mail message
   body (UTF-8 is recommended). IANA will verify any unclear or
   corrupted characters with the Language Subtag Reviewer prior to
   posting the updated registry.
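
   The insertion, replacement, and File-Date rules above can be modeled
   with a small Python sketch. It is purely illustrative and does not
   describe IANA's actual tooling. The in-memory registry layout, the
   function names apply_form and apply_forms, the use of the 'Subtag'
   or 'Tag' field together with 'Type' as the record key, and the
   errors raised for a mismatched INSERT or MODIFY are all assumptions
   made for the example.

```python
from datetime import date


def apply_form(registry, action, record, form_date):
    """Apply one registration form to an in-memory registry (sketch).

    registry: {"File-Date": date, "records": {(Type, key): record}}
    action:   "INSERT" or "MODIFY", as given in the subject line
    """
    if action not in ("INSERT", "MODIFY"):
        raise ValueError("unknown action: %r" % action)
    key = (record["Type"], record.get("Subtag") or record.get("Tag"))
    exists = key in registry["records"]
    if action == "INSERT" and exists:
        raise ValueError("record already present: %r" % (key,))
    if action == "MODIFY" and not exists:
        raise ValueError("no record to replace: %r" % (key,))
    # A modified record overwrites the record it replaces; nothing is deleted.
    registry["records"][key] = record
    # Preserve whichever File-Date is the most recent.
    registry["File-Date"] = max(registry["File-Date"], form_date)


def apply_forms(registry, forms):
    """Process (action, record, form_date) tuples in File-Date order."""
    for action, record, form_date in sorted(forms, key=lambda f: f[2]):
        apply_form(registry, action, record, form_date)
```

   Sorting by the date in the form before applying the requests is what
   keeps an older registration from overwriting a more recent change.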

2554 The registration form sent to IANA MUST be archived and made publicly
2555 available from
2556 "http://www.iana.org/assignments/lang-subtags-templates/". Note that
2557 multiple registrations can pertain to the same record in the
2558 registry.

2560 Developers who are dependent upon the language subtag registry
2561 sometimes would like to be informed of changes in the registry so
2562 that they can update their implementations. When any change is made
2563 to the language subtag registry, IANA MUST send an announcement
2564 message to ietf-languages-announcements@iana.org (a self-subscribing
2565 list that only IANA can post to).

2567 5.2. Extensions Registry

2569 The Language Tag Extensions Registry can contain at most 35 records
2570 and thus changes to this registry are expected to be very infrequent.

2572 Future work by IANA on the Language Tag Extensions Registry is
2573 limited to two cases. First, the IESG MAY request that new records
2574 be inserted into this registry from time to time. These requests
2575 MUST include the record to insert in the exact format described in
2576 Section 3.7. In addition, there MAY be occasional requests from the
2577 maintaining authority for a specific extension to update the contact
2578 information or URLs in the record. These requests MUST include the
2579 complete, updated record. IANA is not responsible for validating the
2580 information provided, only that it is properly formatted. It should
2581 reasonably be seen to come from the maintaining authority named in
2582 the record present in the registry.

2584 6. Security Considerations

2586 Language tags used in content negotiation, like any other information
2587 exchanged on the Internet, might be a source of concern because they
2588 might be used to infer the nationality of the sender, and thus
2589 identify potential targets for surveillance.

2591 This is a special case of the general problem that anything sent is
2592 visible to the receiving party and possibly to third parties as well.
2593 It is useful to be aware that such concerns can exist in some cases.

2595 The evaluation of the exact magnitude of the threat, and any possible
2596 countermeasures, is left to each application protocol (see BCP 72
2597 [RFC3552] for best current practice guidance on security threats and
2598 defenses).

2600 The language tag associated with a particular information item is of
2601 no consequence whatsoever in determining whether that content might
2602 contain possible homographs. The fact that a text is tagged as being
2603 in one language or using a particular script subtag provides no
2604 assurance whatsoever that it does not contain characters from scripts
2605 other than the one(s) associated with or specified by that language
2606 tag.

2608 Since there is no limit to the number of variant, private use, and
2609 extension subtags, and consequently no limit on the possible length
2610 of a tag, implementations need to guard against buffer overflow
2611 attacks. See Section 4.3 for details on language tag truncation,
2612 which can occur as a consequence of defenses against buffer overflow.

2614 Although the specification of valid subtags for an extension (see
2615 Section 3.7) MUST be available over the Internet, implementations
2616 SHOULD NOT mechanically depend on it being always accessible, to
2617 prevent denial-of-service attacks.

2619 7. Character Set Considerations

2621 The syntax in this document requires that language tags use only the
2622 characters A-Z, a-z, 0-9, and HYPHEN-MINUS, which are present in most
2623 character sets, so the composition of language tags should not have
2624 any character set issues.

2626 Rendering of characters based on the content of a language tag is not
2627 addressed in this memo.
Historically, some languages have relied on
2628 the use of specific character sets or other information in order to
2629 infer how a specific character should be rendered (notably this
2630 applies to language- and culture-specific variations of Han
2631 ideographs as used in Japanese, Chinese, and Korean). When language
2632 tags are applied to spans of text, rendering engines sometimes use
2633 that information in deciding which font to use in the absence of
2634 other information, particularly where languages with distinct writing
2635 traditions use the same characters.

2637 8. Changes from RFC 4646

2639 The main goal for this revision of this document was to incorporate
2640 ISO 639-3 and its attendant set of language codes into the IANA
2641 Language Subtag Registry, permitting the identification of many more
2642 languages and dialects than previously supported.

2644 The specific changes in this document to meet this goal are:

2646 o Defined the incorporation of ISO 639-3 codes as language and
2647 extlang subtags. Extlangs are now permitted in language tags.
2648 The changes necessary to achieve this were:

2650 * something

2652 o Changed the ABNF related to grandfathered tags. The irregular
2653 tags are now listed. Well-formed grandfathered tags are now
2654 described by the 'langtag' production and the 'grandfathered'
2655 production was removed as a result. Also: added description of
2656 both types of grandfathered tags to Section 2.2.8.

2658 o Added the paragraph on "collections" to Section 4.1.

2660 o Changed the capitalization rules for 'Tag' fields in Section 3.1.

2662 o Split Section 3.1 up into subsections.

2664 o Modified Section 3.5 to allow Suppress-Script fields to be added,
2665 modified, or removed via the registration process. This was an
2666 erratum from RFC 4646.

2668 o Modified examples that used region code 'CS' (formerly Serbia and
2669 Montenegro) to use 'RS' (Serbia) instead.

2671 o Modified the rules for creating and maintaining record
2672 'Description' fields to prevent duplicates, including inverted
2673 duplicates.

2675 o Removed the lengthy description of why RFC 4646 was created from
2676 this section, which also caused the removal of the reference to
2677 XML Schema.

2679 o Modified the text in Section 2.1 to place more emphasis on the
2680 fact that language tags are not case sensitive.

2682 o Replaced the example "fr-Latn-CA" in Section 2.1 with "sr-Latn-RS"
2683 and "az-Arab-IR" because "fr-Latn-CA" doesn't respect the
2684 Suppress-Script on 'Latn' with 'fr'.

2686 o Changed the requirements for well-formedness to make singleton
2687 repetition checking optional (it is required for validity
2688 checking) in Section 2.2.9.

2690 o Changed the text in Section 2.2.9 referring to grandfathered
2691 checking to note that the list is now included in the ABNF.

2693 o Modified and added text to Section 3.2. The job description was
2694 placed first. A note was added making clear that the Language
2695 Subtag Reviewer may delegate various non-critical duties,
2696 including list moderation. Finally, additional text was added to
2697 make the appointment process clear and to clarify that decisions
2698 and performance of the reviewer are appealable.

2700 o Added text to Section 3.5 clarifying that the ietf-languages list
2701 is operated by whomever the IESG appoints.

2703 o Added text to Section 3.1.4 clarifying that the first Description
2704 in a 'language' or 'extlang' record matches the corresponding
2705 Reference Name for the language in ISO 639-3.

2707 o Modified Section 2.2.9 to define classes of conformance related to
2708 specific tags (formerly 'well-formed' and 'valid' referred to
2709 implementations).

2711 o Added text to the end of Section 3.1.2 noting that future versions
2712 of this document might add new field types and recommending that
2713 implementations ignore any unrecognized fields.

2715 o Modified the 'extlang' examples in Appendix B to use valid subtags
2716 and removed the note saying that they were only examples.

2718 o Added text about what the lack of a Suppress-Script field means in
2719 a record to Section 3.1.8.

2721 o Added text allowing the correction of misspellings and typographic
2722 errors to Section 3.1.4.

2724 o Added text to Section 3.1.7 disallowing Prefix field conflicts
2725 (such as circular prefix references).

2727 o Modified text in Section 3.5 to require the subtag reviewer to
2728 announce his/her decision (or extension) following the two-week
2729 period. Also clarified that any decision or failure to decide can
2730 be appealed.

2732 o Modified text in Section 4.1 to include the (heretofore anecdotal)
2733 guiding principle of tag choice, and to clarify the non-use of
2734 script subtags in non-written applications. Also updated examples
2735 in this section to use Chamic languages as an example of language
2736 collections.

2738 o Prohibited multiple use of the same variant in a tag (e.g.,
2739 "de-1901-1901"). Previously this was only a recommendation
2740 ("SHOULD").

2742 o Removed inappropriate [RFC2119] language from the illustration in
2743 Section 4.3.1.

2745 o Replaced the example of "zh-guoyu" with "zh-hakka"->"zh-hak" in
2746 Section 4.4, noting that it was this document that caused the
2747 change.

2749 o Replaced the text in Section 4.1 dealing with "mul"/"und" to
2750 include the subtags 'zxx' and 'mis', as well as the tag
2751 "i-default". A normative reference to RFC 2277 was added, along
2752 with an informative reference to MARC21.

2754 o Added text to Section 3.5 clarifying that any modifications of a
2755 registration request must be sent to the ietf-languages list
2756 before submission to IANA.

2758 o Changed the ABNF for the record-jar format from using the LWSP
2759 production to use a folding whitespace production similar to
2760 obs-FWS in [RFC4234].
This effectively prevents unintentional blank
2761 lines inside a field.

2763 o Revised text in Section 3.3, Section 3.5, and
2764 Section 5.1 to clarify that the Language Subtag Reviewer sends the
2765 complete registration forms to IANA, that IANA extracts the record
2766 from the form, and that the forms must also be archived separately
2767 from the registry.

2769 o Added text to Section 5 requiring IANA to send an announcement to
2770 the ietf-languages-announcements list whenever the registry is updated.

2772 o Modified the registry to use UTF-8 as its character
2773 encoding. This also entails additional instructions to IANA and
2774 the Language Subtag Reviewer in the registration process.

2776 [[Ed.Note: Open issues in this version:

2778 Whether encompassed language rules for the creation of extlang
2779 records in the registry should be retained or modified.

2781 Inclusion of additional information related to Suppress-Script in
2782 the registry (e.g. that it wasn't assigned on purpose)

2784 ]]

2786 9. References

2788 9.1. Normative References

2790 [ISO10646]
2791 International Organization for Standardization, "ISO/IEC
2792 10646:2003. Information technology -- Universal
2793 Multiple-Octet Coded Character Set (UCS)", 2003.

2795 [ISO15924]
2796 International Organization for Standardization, "ISO
2797 15924:2004. Information and documentation -- Codes for the
2798 representation of names of scripts", January 2004.

2800 [ISO3166-1]
2801 International Organization for Standardization,
2802 "ISO 3166-1:1997. Codes for the representation of names of countries
2803 and their subdivisions -- Part 1: Country codes", 1997.

2805 [ISO639-1]
2806 International Organization for Standardization,
2807 "ISO 639-1:2002. Codes for the representation of names of languages
2808 -- Part 1: Alpha-2 code", 2002.

2810 [ISO639-2]
2811 International Organization for Standardization,
2812 "ISO 639-2:1998.
Codes for the representation of names of languages
2813 -- Part 2: Alpha-3 code, first edition", 1998.

2815 [ISO639-3]
2816 International Organization for Standardization,
2817 "ISO 639-3:2007. Codes for the representation of names of languages
2818 -- Part 3: Alpha-3 code for comprehensive coverage of
2819 languages", 2007.

2821 [ISO646] International Organization for Standardization, "ISO/IEC
2822 646:1991, Information technology -- ISO 7-bit coded
2823 character set for information interchange.", 1991.

2825 [RFC2026] Bradner, S., "The Internet Standards Process -- Revision
2826 3", BCP 9, RFC 2026, October 1996.

2828 [RFC2028] Hovey, R. and S. Bradner, "The Organizations Involved in
2829 the IETF Standards Process", BCP 11, RFC 2028,
2830 October 1996.

2832 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
2833 Requirement Levels", BCP 14, RFC 2119, March 1997.

2835 [RFC2277] Alvestrand, H., "IETF Policy on Character Sets and
2836 Languages", BCP 18, RFC 2277, January 1998.

2838 [RFC2434] Narten, T. and H. Alvestrand, "Guidelines for Writing an
2839 IANA Considerations Section in RFCs", BCP 26, RFC 2434,
2840 October 1998.

2842 [RFC2860] Carpenter, B., Baker, F., and M. Roberts, "Memorandum of
2843 Understanding Concerning the Technical Work of the
2844 Internet Assigned Numbers Authority", RFC 2860, June 2000.

2846 [RFC3339] Klyne, G. and C. Newman, "Date and Time on the Internet:
2847 Timestamps", RFC 3339, July 2002.

2849 [RFC4234] Crocker, D. and P. Overell, "Augmented BNF for Syntax
2850 Specifications: ABNF", RFC 4234, October 2005.

2852 [RFC4645] Ewell, D., Ed., "Initial Language Subtag Registry",
2853 September 2006, .

2855 [RFC4647] Phillips, A., Ed. and M. Davis, Ed., "Matching of Language
2856 Tags", September 2006,
2857 .

2859 [UN_M.49] Statistics Division, United Nations, "Standard Country or
2860 Area Codes for Statistical Use", UN Standard Country or
2861 Area Codes for Statistical Use, Revision 4 (United Nations
2862 publication, Sales No. 98.XVII.9), June 1999.

2864 9.2. Informative References

2866 [RFC1766] Alvestrand, H., "Tags for the Identification of
2867 Languages", RFC 1766, March 1995.

2869 [RFC2047] Moore, K., "MIME (Multipurpose Internet Mail Extensions)
2870 Part Three: Message Header Extensions for Non-ASCII Text",
2871 RFC 2047, November 1996.

2873 [RFC2231] Freed, N. and K. Moore, "MIME Parameter Value and Encoded
2874 Word Extensions: Character Sets, Languages, and
2875 Continuations", RFC 2231, November 1997.

2877 [RFC2781] Hoffman, P. and F. Yergeau, "UTF-16, an encoding of ISO
2878 10646", RFC 2781, February 2000.

2880 [RFC3066] Alvestrand, H., "Tags for the Identification of
2881 Languages", BCP 47, RFC 3066, January 2001.

2883 [RFC3552] Rescorla, E. and B. Korver, "Guidelines for Writing RFC
2884 Text on Security Considerations", BCP 72, RFC 3552,
2885 July 2003.

2887 [RFC3629] Yergeau, F., "UTF-8, a transformation format of ISO
2888 10646", STD 63, RFC 3629, November 2003.

2890 [RFC4646] Phillips, A., Ed. and M. Davis, Ed., "Tags for the
2891 Identification of Languages", September 2006,
2892 .

2894 [Unicode] Unicode Consortium, "The Unicode Consortium. The Unicode
2895 Standard, Version 5.0, (Boston, MA, Addison-Wesley, 2003.
2896 ISBN 0-321-49081-0)", January 2007.

2898 [iso639.prin]
2899 ISO 639 Joint Advisory Committee, "ISO 639 Joint Advisory
2900 Committee: Working principles for ISO 639 maintenance",
2901 March 2000,
2902 .

2905 [record-jar]
2906 Raymond, E., "The Art of Unix Programming", 2003,
2907 .

2909 [registry-update]
2910 Ewell, D., Ed., "Update to the Language Subtag Registry",
2911 September 2006, .

2914 Appendix A.
Acknowledgements

2916 Any list of contributors is bound to be incomplete; please regard the
2917 following as only a selection from the group of people who have
2918 contributed to make this document what it is today.

2920 The contributors to RFC 4646, RFC 4647, RFC 3066, and RFC 1766, the
2921 precursors of this document, made enormous contributions directly or
2922 indirectly to this document and are generally responsible for the
2923 success of language tags.

2925 The following people contributed to this document:

2927 Stephane Bortzmeyer, Karen Broome, Peter Constable, John Cowan,
2928 Martin Duerst, Frank Ellerman, Doug Ewell, Deborah Garside, Marion
2929 Gunn, Kent Karlsson, Chris Newman, Randy Presuhn, Stephen Silver, and
2930 many, many others.

2932 Very special thanks must go to Harald Tveit Alvestrand, who
2933 originated RFCs 1766 and 3066, and without whom this document would
2934 not have been possible.

2936 Special thanks go to Michael Everson, who served as the Language Tag
2937 Reviewer for almost the entire RFC 1766/RFC 3066 period, as well as
2938 the Language Subtag Reviewer since the adoption of RFC 4646.

2940 Special thanks also to Doug Ewell, for his production of the first
2941 complete subtag registry, his work to support and maintain new
2942 registrations, and his careful editorship of both RFC 4645 and
2943 [registry-update].

2945 Appendix B.
Examples of Language Tags (Informative)

2947 Simple language subtag:

2949 de (German)

2951 fr (French)

2953 ja (Japanese)

2955 i-enochian (example of a grandfathered tag)

2957 Language subtag plus Script subtag:

2959 zh-Hant (Chinese written using the Traditional Chinese script)

2961 zh-Hans (Chinese written using the Simplified Chinese script)

2963 sr-Cyrl (Serbian written using the Cyrillic script)

2965 sr-Latn (Serbian written using the Latin script)

2967 Language-Script-Region:

2969 zh-Hans-CN (Chinese written using the Simplified script as used in
2970 mainland China)

2972 sr-Latn-RS (Serbian written using the Latin script as used in
2973 Serbia)

2975 Language-Variant:

2977 sl-rozaj (Resian dialect of Slovenian)

2979 sl-nedis (Nadiza dialect of Slovenian)

2981 Language-Region-Variant:

2983 de-CH-1901 (German as used in Switzerland using the 1901 variant
2984 [orthography])

2986 sl-IT-nedis (Slovenian as used in Italy, Nadiza dialect)

2988 Language-Script-Region-Variant:

2990 hy-Latn-IT-arevela (Eastern Armenian written in Latin script, as
2991 used in Italy)

2993 Language-Region:

2995 de-DE (German for Germany)

2997 en-US (English as used in the United States)

2999 es-419 (Spanish appropriate for the Latin America and Caribbean
3000 region using the UN region code)

3002 Private use subtags:

3004 de-CH-x-phonebk

3006 az-Arab-x-AZE-derbend

3008 Extended language subtags:

3010 zh-cmn

3012 zh-cmn-Hant-CN

3014 Private use registry values:

3016 x-whatever (private use using the singleton 'x')

3018 qaa-Qaaa-QM-x-southern (all private tags)

3020 de-Qaaa (German, with a private script)

3022 sr-Latn-QM (Serbian, Latin-script, private region)

3024 sr-Qaaa-RS (Serbian, private script, for Serbia)

3026 Tags that use extensions (examples ONLY: extensions MUST be defined
3027 by revision or update to this document or by RFC):

3029 en-US-u-islamCal

3031 zh-CN-a-myExt-x-private

3033 en-a-myExt-b-another

3035 Some Invalid Tags:

3037 de-419-DE (two region tags)

3039 a-DE (use of
a single-character subtag in primary position; note
3040 that there are a few grandfathered tags that start with "i-" that
3041 are valid)

3043 ar-a-aaa-b-bbb-a-ccc (two extensions with same single-letter
3044 prefix)

3046 Appendix C. Examples of Registration Forms
3047 LANGUAGE SUBTAG REGISTRATION FORM
3048 1. Name of requester: Han Steenwijk
3049 2. E-mail address of requester: han.steenwijk @ unipd.it
3050 3. Record Requested:

3052 Type: variant
3053 Subtag: biske
3054 Description: The San Giorgio dialect of Resian
3055 Description: The Bila dialect of Resian
3056 Prefix: sl-rozaj
3057 Comments: The dialect of San Giorgio/Bila is one of the
3058 four major local dialects of Resian

3060 4. Intended meaning of the subtag: The local variety of Resian as
3061 spoken in San Giorgio/Bila

3063 5. Reference to published description of the language (book or
3064 article):
3065 -- Jan I.N. Baudouin de Courtenay - Opyt fonetiki rez'janskich
3066 govorov, Varsava - Peterburg: Vende - Kozancikov, 1875.

3068 LANGUAGE SUBTAG REGISTRATION FORM
3069 1. Name of requester: Jaska Zedlik
3070 2. E-mail address of requester: jz53 @ zedlik.com
3071 3. Record Requested:

3073 Type: variant
3074 Subtag: tarask
3075 Description: Belarusian in Taraskievica orthography
3076 Prefix: be
3077 Comments: The subtag represents Branislau Taraskievic's Belarusian
3078 orthography as published in "Bielaruski klasycny pravapis" by Juras
3079 Buslakou, Vincuk Viacorka, Zmicier Sanko, and Zmicier Sauka
3080 (Vilnia-Miensk 2005).

3082 4. Intended meaning of the subtag:

3084 The subtag is intended to represent the Belarusian orthography as
3085 published in "Bielaruski klasycny pravapis" by Juras Buslakou, Vincuk
3086 Viacorka, Zmicier Sanko, and Zmicier Sauka (Vilnia-Miensk 2005).

3088 5. Reference to published description of the language (book or article):

3090 Taraskievic, Branislau. Bielaruskaja gramatyka dla skol. Vilnia: Vyd.
3091 "Bielaruskaha kamitetu", 1929, 5th edition.

3093 Buslakou, Juras; Viacorka, Vincuk; Sanko, Zmicier; Sauka, Zmicier.
3094 Bielaruski klasycny pravapis. Vilnia-Miensk, 2005.

3096 6. Any other relevant information:

3098 Belarusian in Taraskievica orthography became widely used, especially in
3099 Belarusian-speaking Internet segment, but besides this some books and
3100 newspapers are also printed using this orthography of Belarusian.

3102 Authors' Addresses

3104 Addison Phillips (editor)
3105 Yahoo! Inc.

3107 Email: addison@inter-locale.com
3108 URI: http://www.inter-locale.com

3110 Mark Davis (editor)
3111 Google

3113 Email: mark.davis@macchiato.com or mark.davis@google.com

3115 Full Copyright Statement

3117 Copyright (C) The IETF Trust (2007).

3119 This document is subject to the rights, licenses and restrictions
3120 contained in BCP 78, and except as set forth therein, the authors
3121 retain all their rights.

3123 This document and the information contained herein are provided on an
3124 "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
3125 OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
3126 THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
3127 OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
3128 THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
3129 WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

3131 Intellectual Property

3133 The IETF takes no position regarding the validity or scope of any
3134 Intellectual Property Rights or other rights that might be claimed to
3135 pertain to the implementation or use of the technology described in
3136 this document or the extent to which any license under such rights
3137 might or might not be available; nor does it represent that it has
3138 made any independent effort to identify any such rights. Information
3139 on the procedures with respect to rights in RFC documents can be
3140 found in BCP 78 and BCP 79.

3142 Copies of IPR disclosures made to the IETF Secretariat and any
3143 assurances of licenses to be made available, or the result of an
3144 attempt made to obtain a general license or permission for the use of
3145 such proprietary rights by implementers or users of this
3146 specification can be obtained from the IETF on-line IPR repository at
3147 http://www.ietf.org/ipr.

3149 The IETF invites any interested party to bring to its attention any
3150 copyrights, patents or patent applications, or other proprietary
3151 rights that may cover technology that may be required to implement
3152 this standard. Please address the information to the IETF at
3153 ietf-ipr@ietf.org.

3155 Acknowledgment

3157 Funding for the RFC Editor function is provided by the IETF
3158 Administrative Support Activity (IASA).