idnits 2.17.1

draft-ietf-ltru-4646bis-09.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** It looks like you're using RFC 3978 boilerplate.  You should update this
     to the boilerplate described in the IETF Trust License Policy document
     (see https://trustee.ietf.org/license-info), which is required now.

  -- Found old boilerplate from RFC 3978, Section 5.1 on line 17.

  -- Found old boilerplate from RFC 3978, Section 5.5, updated by RFC 4748 on
     line 3148.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 3159.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 3166.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 3172.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  -- The draft header indicates that this document obsoletes RFC4646, but the
     abstract doesn't seem to mention this, which it should.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust Copyright Line does not match the
     current year

  == The document seems to use 'NOT RECOMMENDED' as an RFC 2119 keyword, but
     does not include the phrase in its RFC 2119 key words list.

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer.
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (November 14, 2007) is 6007 days in the past.  Is this
     intentional?


  Checking references for intended status: Best Current Practice
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Unused Reference: 'ISO10646' is defined on line 2809, but no explicit
     reference was found in the text

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO10646'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO15924'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO3166-1'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO639-1'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO639-2'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO639-3'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO646'

  ** Obsolete normative reference: RFC 2028 (Obsoleted by RFC 9281)

  ** Obsolete normative reference: RFC 2434 (Obsoleted by RFC 5226)

  ** Downref: Normative reference to an Informational RFC: RFC 2860

  ** Obsolete normative reference: RFC 4234 (Obsoleted by RFC 5234)

  ** Downref: Normative reference to an Informational RFC: RFC 4645

  -- Obsolete informational reference (is this intentional?): RFC 1766
     (Obsoleted by RFC 3066, RFC 3282)

  -- Obsolete informational reference (is this intentional?): RFC 3066
     (Obsoleted by RFC 4646, RFC 4647)

  -- Obsolete informational reference (is this intentional?): RFC 4646
     (Obsoleted by RFC 5646)


     Summary: 6 errors (**), 0 flaws (~~), 3 warnings (==), 18 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

2   Network Working Group                                   A. Phillips, Ed.
3   Internet-Draft                                               Yahoo! Inc.
4 Obsoletes: 4646 (if approved) M. Davis, Ed. 5 Intended status: Best Current Google 6 Practice November 14, 2007 7 Expires: May 17, 2008 9 Tags for Identifying Languages 10 draft-ietf-ltru-4646bis-09 12 Status of this Memo 14 By submitting this Internet-Draft, each author represents that any 15 applicable patent or other IPR claims of which he or she is aware 16 have been or will be disclosed, and any of which he or she becomes 17 aware will be disclosed, in accordance with Section 6 of BCP 79. 19 Internet-Drafts are working documents of the Internet Engineering 20 Task Force (IETF), its areas, and its working groups. Note that 21 other groups may also distribute working documents as Internet- 22 Drafts. 24 Internet-Drafts are draft documents valid for a maximum of six months 25 and may be updated, replaced, or obsoleted by other documents at any 26 time. It is inappropriate to use Internet-Drafts as reference 27 material or to cite them other than as "work in progress." 29 The list of current Internet-Drafts can be accessed at 30 http://www.ietf.org/ietf/1id-abstracts.txt. 32 The list of Internet-Draft Shadow Directories can be accessed at 33 http://www.ietf.org/shadow.html. 35 This Internet-Draft will expire on May 17, 2008. 37 Copyright Notice 39 Copyright (C) The IETF Trust (2007). 41 Abstract 43 This document describes the structure, content, construction, and 44 semantics of language tags for use in cases where it is desirable to 45 indicate the language used in an information object. It also 46 describes how to register values for use in language tags and the 47 creation of user-defined extensions for private interchange. 49 Table of Contents 51 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 52 2. The Language Tag . . . . . . . . . . . . . . . . . . . . . . . 5 53 2.1. Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . 5 54 2.2. Language Subtag Sources and Interpretation . . . . . . . . 8 55 2.2.1. Primary Language Subtag . . . 
. . . . . . . . . . . . 9 56 2.2.2. Extended Language Subtags . . . . . . . . . . . . . . 11 57 2.2.3. Script Subtag . . . . . . . . . . . . . . . . . . . . 12 58 2.2.4. Region Subtag . . . . . . . . . . . . . . . . . . . . 13 59 2.2.5. Variant Subtags . . . . . . . . . . . . . . . . . . . 15 60 2.2.6. Extension Subtags . . . . . . . . . . . . . . . . . . 16 61 2.2.7. Private Use Subtags . . . . . . . . . . . . . . . . . 17 62 2.2.8. Grandfathered Registrations . . . . . . . . . . . . . 18 63 2.2.9. Classes of Conformance . . . . . . . . . . . . . . . . 18 64 3. Registry Format and Maintenance . . . . . . . . . . . . . . . 20 65 3.1. Format of the IANA Language Subtag Registry . . . . . . . 20 66 3.1.1. File Format . . . . . . . . . . . . . . . . . . . . . 20 67 3.1.2. Record Definitions . . . . . . . . . . . . . . . . . . 21 68 3.1.3. Subtag and Tag Fields . . . . . . . . . . . . . . . . 24 69 3.1.4. Description Field . . . . . . . . . . . . . . . . . . 24 70 3.1.5. Deprecated Field . . . . . . . . . . . . . . . . . . . 25 71 3.1.6. Preferred-Value Field . . . . . . . . . . . . . . . . 25 72 3.1.7. Prefix Field . . . . . . . . . . . . . . . . . . . . . 26 73 3.1.8. Suppress-Script Field . . . . . . . . . . . . . . . . 27 74 3.1.9. Macrolanguage Field . . . . . . . . . . . . . . . . . 27 75 3.1.10. Comments Field . . . . . . . . . . . . . . . . . . . . 28 76 3.2. Language Subtag Reviewer . . . . . . . . . . . . . . . . . 28 77 3.3. Maintenance of the Registry . . . . . . . . . . . . . . . 29 78 3.4. Stability of IANA Registry Entries . . . . . . . . . . . . 29 79 3.5. Registration Procedure for Subtags . . . . . . . . . . . . 34 80 3.6. Possibilities for Registration . . . . . . . . . . . . . . 38 81 3.7. Extensions and Extensions Registry . . . . . . . . . . . . 40 82 3.8. Update of the Language Subtag Registry . . . . . . . . . . 43 83 4. Formation and Processing of Language Tags . . . . . . . . . . 44 84 4.1. Choice of Language Tag . . . . . . . . . . . . . . . 
. . . 44 85 4.2. Meaning of the Language Tag . . . . . . . . . . . . . . . 48 86 4.3. Length Considerations . . . . . . . . . . . . . . . . . . 50 87 4.3.1. Working with Limited Buffer Sizes . . . . . . . . . . 50 88 4.3.2. Truncation of Language Tags . . . . . . . . . . . . . 52 89 4.4. Canonicalization of Language Tags . . . . . . . . . . . . 52 90 4.5. Considerations for Private Use Subtags . . . . . . . . . . 54 91 5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 56 92 5.1. Language Subtag Registry . . . . . . . . . . . . . . . . . 56 93 5.2. Extensions Registry . . . . . . . . . . . . . . . . . . . 57 94 6. Security Considerations . . . . . . . . . . . . . . . . . . . 58 95 7. Character Set Considerations . . . . . . . . . . . . . . . . . 59 96 8. Changes from RFC 4646 . . . . . . . . . . . . . . . . . . . . 60 97 9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 64 98 9.1. Normative References . . . . . . . . . . . . . . . . . . . 64 99 9.2. Informative References . . . . . . . . . . . . . . . . . . 65 100 Appendix A. Acknowledgements . . . . . . . . . . . . . . . . . . 67 101 Appendix B. Examples of Language Tags (Informative) . . . . . . . 68 102 Appendix C. Examples of Registration Forms . . . . . . . . . . . 71 103 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 73 104 Intellectual Property and Copyright Statements . . . . . . . . . . 74 106 1. Introduction 108 Human beings on our planet have, past and present, used a number of 109 languages. There are many reasons why one would want to identify the 110 language used when presenting or requesting information. 112 A user's language preferences often need to be identified so that 113 appropriate processing can be applied. For example, the user's 114 language preferences in a Web browser can be used to select Web pages 115 appropriately. 
Language preferences can also be used to select among 116 tools (such as dictionaries) to assist in the processing or 117 understanding of content in different languages. 119 In addition, knowledge about the particular language used by some 120 piece of information content might be useful or even required by some 121 types of processing; for example, spell-checking, computer- 122 synthesized speech, Braille transcription, or high-quality print 123 renderings. 125 One means of indicating the language used is by labeling the 126 information content with an identifier or "tag". These tags can be 127 used to specify user preferences when selecting information content, 128 or for labeling additional attributes of content and associated 129 resources. 131 Tags can also be used to indicate additional language attributes of 132 content. For example, indicating specific information about the 133 dialect, writing system, or orthography used in a document or 134 resource may enable the user to obtain information in a form that 135 they can understand, or it can be important in processing or 136 rendering the given content into an appropriate form or style. 138 This document specifies a particular identifier mechanism (the 139 language tag) and a registration function for values to be used to 140 form tags. It also defines a mechanism for private use values and 141 future extension. 143 This document replaces [RFC4646], which replaced [RFC3066] and its 144 predecessor [RFC1766]. For a list of changes in this document, see 145 Section 8. 147 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 148 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 149 document are to be interpreted as described in [RFC2119]. 151 2. The Language Tag 153 Language tags are used to help identify languages, whether spoken, 154 written, signed, or otherwise signaled, for the purpose of 155 communication. 
This includes constructed and artificial languages, 156 but excludes languages not intended primarily for human 157 communication, such as programming languages. 159 2.1. Syntax 161 The language tag is composed of one or more parts, known as 162 "subtags". Each subtag consists of a sequence of alphanumeric 163 characters. Subtags are distinguished and separated from one another 164 by a hyphen ("-", ABNF [RFC4234] %x2D). A language tag consists of a 165 "primary language" subtag and a (possibly empty) series of subsequent 166 subtags, each of which refines or narrows the range of languages 167 identified by the overall tag. 169 Usually, each type of subtag is distinguished by length, position in 170 the tag, and content: subtags can be recognized solely by these 171 features. The only exception to this is a fixed list of 172 grandfathered tags registered under RFC 3066 [RFC3066]. This makes 173 it possible to construct a parser that can extract and assign some 174 semantic information to the subtags, even if the specific subtag 175 values are not recognized. Thus, a parser need not have an up-to- 176 date copy (or any copy at all) of the subtag registry to perform most 177 searching and matching operations. 
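As an illustration of the point above, the following Python sketch labels the subtags of a tag using only position, length, and content, without consulting the registry. It follows the productions of Figure 1 as closely as a short example can. The function name classify, the label strings, and the choice to return lowercased values are assumptions made for this example and are not defined by this document.

```python
import re

# Irregular grandfathered tags, lowercased, taken from the 'irregular'
# production in Figure 1.
IRREGULAR = {
    "en-gb-oed", "i-ami", "i-bnn", "i-default", "i-enochian", "i-hak",
    "i-klingon", "i-lux", "i-mingo", "i-navajo", "i-pwn", "i-tao",
    "i-tay", "i-tsu", "sgn-be-fr", "sgn-be-nl", "sgn-ch-de",
}


def classify(tag):
    """Return (label, value) pairs for a well-formed tag (hypothetical helper).

    Raises ValueError when the tag does not fit the ABNF. No registry
    lookup is performed, so the result says nothing about validity.
    """
    lowered = tag.lower()  # tags are case insensitive
    if lowered in IRREGULAR:
        return [("grandfathered", lowered)]
    parts = lowered.split("-")
    if any(not re.fullmatch(r"[a-z0-9]{1,8}", p) for p in parts):
        raise ValueError("subtags must be 1 to 8 alphanumeric characters")

    labels = []
    i = 0
    if parts[0] != "x":
        if not re.fullmatch(r"[a-z]{2,8}", parts[0]):
            raise ValueError("bad primary language subtag")
        labels.append(("language", parts[0]))
        i = 1
        # Up to three extlang subtags, only after a 2 or 3 letter language.
        count = 0
        while (len(parts[0]) <= 3 and count < 3 and i < len(parts)
               and re.fullmatch(r"[a-z]{3}", parts[i])):
            labels.append(("extlang", parts[i]))
            i += 1
            count += 1
        if i < len(parts) and re.fullmatch(r"[a-z]{4}", parts[i]):
            labels.append(("script", parts[i]))
            i += 1
        if i < len(parts) and re.fullmatch(r"[a-z]{2}|[0-9]{3}", parts[i]):
            labels.append(("region", parts[i]))
            i += 1
        while i < len(parts) and re.fullmatch(
                r"[a-z0-9]{5,8}|[0-9][a-z0-9]{3}", parts[i]):
            labels.append(("variant", parts[i]))
            i += 1
        # Extensions: a singleton other than 'x' plus at least one subtag.
        while i < len(parts) and len(parts[i]) == 1 and parts[i] != "x":
            j = i + 1
            while j < len(parts) and re.fullmatch(r"[a-z0-9]{2,8}", parts[j]):
                j += 1
            if j == i + 1:
                raise ValueError("extension singleton without subtags")
            labels.append(("extension", "-".join(parts[i:j])))
            i = j
    if i < len(parts) and parts[i] == "x":
        if i + 1 == len(parts):
            raise ValueError("'x' must be followed by a private use subtag")
        labels.append(("privateuse", "-".join(parts[i:])))
        i = len(parts)
    if i != len(parts):
        raise ValueError("unexpected subtag: " + parts[i])
    return labels
```

For instance, classify("mn-Cyrl-MN") labels 'mn' as the language, 'cyrl' as the script, and 'mn' as the region, even though the function knows nothing about Mongolian or Cyrillic.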
179    The syntax of the language tag in ABNF [RFC4234] is:

181    Language-Tag  = langtag
182                  / privateuse             ; private use tag
183                  / irregular              ; tags grandfathered by rule

185    langtag       = (language
186                     ["-" script]
187                     ["-" region]
188                     *("-" variant)
189                     *("-" extension)
190                     ["-" privateuse])

192    language      = (2*3ALPHA [ extlang ]) ; shortest ISO 639 code
193                  / 4ALPHA                 ; reserved for future use
194                  / 5*8ALPHA               ; registered language subtag

196    extlang       = *3("-" 3ALPHA)         ; specific ISO 639-3 codes

198    script        = 4ALPHA                 ; ISO 15924 code

200    region        = 2ALPHA                 ; ISO 3166 code
201                  / 3DIGIT                 ; UN M.49 code

203    variant       = 5*8alphanum            ; registered variants
204                  / (DIGIT 3alphanum)

206    extension     = singleton 1*("-" (2*8alphanum))

208    singleton     = %x41-57 / %x59-5A / %x61-77 / %x79-7A / DIGIT
209                  ; "a"-"w" / "y"-"z" / "A"-"W" / "Y"-"Z" / "0"-"9"
210                  ; Single alphanumerics
211                  ; "x" is reserved for private use

213    privateuse    = "x" 1*("-" (1*8alphanum))

215    irregular     = "en-GB-oed" / "i-ami" / "i-bnn" / "i-default"
216                  / "i-enochian" / "i-hak" / "i-klingon" / "i-lux"
217                  / "i-mingo" / "i-navajo" / "i-pwn" / "i-tao"
218                  / "i-tay" / "i-tsu" / "sgn-BE-fr" / "sgn-BE-nl"
219                  / "sgn-CH-de"

221    alphanum      = (ALPHA / DIGIT)       ; letters and numbers

223                         Figure 1: Language Tag ABNF

225    All subtags have a maximum length of eight characters and whitespace
226    is not permitted in a language tag.  There is a subtlety in the ABNF
227    production 'variant': variants starting with a digit MAY be four
228    characters long, while those starting with a letter MUST be at least
229    five characters long.  For examples of language tags, see Appendix B.

231    Note Well: the ABNF syntax does not distinguish between upper and
232    lowercase.  The appearance of upper and lowercase letters in the
233    various ABNF productions above does not affect how implementations
234    interpret tags.  That is, the tag "I-AMI" matches the item "i-ami" in
235    the 'irregular' production.
At all times, the tags and their 236 subtags, including private use and extensions, are to be treated as 237 case insensitive: there exist conventions for the capitalization of 238 some of the subtags, but these MUST NOT be taken to carry meaning. 240 For example: 242 o [ISO639-1] recommends that language codes be written in lowercase 243 ('mn' Mongolian). 245 o [ISO3166-1] recommends that country codes be capitalized ('MN' 246 Mongolia). 248 o [ISO15924] recommends that script codes use lowercase with the 249 initial letter capitalized ('Cyrl' Cyrillic). 251 However, in the tags defined by this document, the uppercase US-ASCII 252 letters in the range 'A' through 'Z' are considered equivalent and 253 mapped directly to their US-ASCII lowercase equivalents in the range 254 'a' through 'z'. Thus, the tag "mn-Cyrl-MN" is not distinct from 255 "MN-cYRL-mn" or "mN-cYrL-Mn" (or any other combination), and each of 256 these variations conveys the same meaning: Mongolian written in the 257 Cyrillic script as used in Mongolia. 259 Although case distinctions do not carry meaning in language tags, 260 consistent formatting and presentation of the tags will aid users. 261 The format of the tags and subtags in the registry is RECOMMENDED. 262 In this format, all non-initial two-letter subtags are uppercase, all 263 non-initial four-letter subtags are titlecase, and all other subtags 264 are lowercase. 266 Note that although [RFC4234] refers to octets, the language tags 267 described in this document are sequences of characters from the US- 268 ASCII [ISO646] repertoire. Language tags MAY be used in documents 269 and applications that use other encodings, so long as these encompass 270 the US-ASCII repertoire. An example of this would be an XML document 271 that uses the UTF-16LE [RFC2781] encoding of [Unicode]. 273 2.2. 
Language Subtag Sources and Interpretation 275 The namespace of language tags and their subtags is administered by 276 the Internet Assigned Numbers Authority (IANA) [RFC2860] according to 277 the rules in Section 5 of this document. The Language Subtag 278 Registry maintained by IANA is the source for valid subtags: other 279 standards referenced in this section provide the source material for 280 that registry. 282 Terminology used in this document: 284 o Tag or tags refers to a complete language tag, such as 285 "sr-Latn-RS" or "az-Arab-IR". Examples of tags in this document 286 are enclosed in double-quotes ("en-US"). 288 o Subtag refers to a specific section of a tag, delimited by hyphen, 289 such as the subtag 'Hant' in "zh-Hant-CN". Examples of subtags in 290 this document are enclosed in single quotes ('Hant'). 292 o Code or codes refers to values defined in external standards (and 293 which are used as subtags in this document). For example, 'Hant' 294 is an [ISO15924] script code that was used to define the 'Hant' 295 script subtag for use in a language tag. Examples of codes in 296 this document are enclosed in single quotes ('en', 'Hant'). 298 The definitions in this section apply to the various subtags within 299 the language tags defined by this document, excepting those 300 "grandfathered" tags defined in Section 2.2.8. 302 Language tags are designed so that each subtag type has unique length 303 and content restrictions. These make identification of the subtag's 304 type possible, even if the content of the subtag itself is 305 unrecognized. This allows tags to be parsed and processed without 306 reference to the latest version of the underlying standards or the 307 IANA registry and makes the associated exception handling when 308 parsing tags simpler. 310 Subtags in the IANA registry that do not come from an underlying 311 standard can only appear in specific positions in a tag. 
312 Specifically, they can only occur as primary language subtags or as 313 variant subtags. 315 Note that sequences of private use and extension subtags MUST occur 316 at the end of the sequence of subtags and MUST NOT be interspersed 317 with subtags defined elsewhere in this document. 319 Single-letter and single-digit subtags are reserved for current or 320 future use. These include the following current uses: 322 o The single-letter subtag 'x' is reserved to introduce a sequence 323 of private use subtags. The interpretation of any private use 324 subtags is defined solely by private agreement and is not defined 325 by the rules in this section or in any standard or registry 326 defined in this document. 328 o All other single-letter subtags are reserved to introduce 329 standardized extension subtag sequences as described in 330 Section 3.7. 332 The single-letter subtag 'i' is used by some grandfathered tags, such 333 as "i-default", where it always appears in the first position and 334 cannot be confused with an extension. 336 2.2.1. Primary Language Subtag 338 The primary language subtag is the first subtag in a language tag 339 (with the exception of private use and certain grandfathered tags) 340 and cannot be omitted. The following rules apply to the primary 341 language subtag: 343 1. All two-character primary language subtags were defined in the 344 IANA registry according to the assignments found in the standard 345 ISO 639 Part 1, "ISO 639-1:2002, Codes for the representation of 346 names of languages -- Part 1: Alpha-2 code" [ISO639-1], or using 347 assignments subsequently made by the ISO 639-1 registration 348 authority (RA) or governing standardization bodies. 350 2. 
All three-character primary language subtags were defined in the 351 IANA registry according to the assignments found in either ISO 352 639 Part 2, "ISO 639-2:1998 - Codes for the representation of 353 names of languages -- Part 2: Alpha-3 code - edition 1" 354 [ISO639-2], ISO 639 Part 3, "Codes for the representation of 355 names of languages -- Part 3: Alpha-3 code for comprehensive 356 coverage of languages" [ISO639-3], or assignments subsequently 357 made by the relevant ISO 639 registration authorities or 358 governing standardization bodies. 360 3. The subtags in the range 'qaa' through 'qtz' are reserved for 361 private use in language tags. These subtags correspond to codes 362 reserved by ISO 639-2 for private use. These codes MAY be used 363 for non-registered primary language subtags (instead of using 364 private use subtags following 'x-'). Please refer to Section 4.5 365 for more information on private use subtags. 367 4. All four-character language subtags are reserved for possible 368 future standardization. 370 5. All language subtags of 5 to 8 characters in length in the IANA 371 registry were defined via the registration process in Section 3.5 372 and MAY be used to form the primary language subtag. At the time 373 this document was created, there were no examples of this kind of 374 subtag and future registrations of this type will be discouraged: 375 primary languages are strongly RECOMMENDED for registration with 376 ISO 639, and proposals rejected by ISO 639/RA-JAC will be closely 377 scrutinized before they are registered with IANA. 379 6. The single-character subtag 'x' as the primary subtag indicates 380 that the language tag consists solely of subtags whose meaning is 381 defined by private agreement. 
For example, in the tag "x-fr-CH", 382 the subtags 'fr' and 'CH' SHOULD NOT be taken to represent the 383 French language or the country of Switzerland (or any other value 384 in the IANA registry) unless there is a private agreement in 385 place to do so. See Section 4.5. 387 7. The single-character subtag 'i' is used by some grandfathered 388 tags (see Section 2.2.8) such as "i-klingon" and "i-bnn". (Other 389 grandfathered tags have a primary language subtag in their first 390 position.) 392 8. Other values MUST NOT be assigned to the primary subtag except by 393 revision or update of this document. 395 Note: For languages that have both an ISO 639-1 two-character code 396 and a three character code assigned by either ISO 639-2 or ISO 639-3, 397 only the ISO 639-1 two-character code is defined in the IANA 398 registry. 400 Note: For languages that have no ISO 639-1 two-character code and for 401 which the ISO 639-2/T (Terminology) code and the ISO 639-2/B 402 (Bibliographic) codes differ, only the Terminology code is defined in 403 the IANA registry. At the time this document was created, all 404 languages that had both kinds of three-character code were also 405 assigned a two-character code; it is expected that future assignments 406 of this nature will not occur. 408 Note: To avoid problems with versioning and subtag choice as 409 experienced during the transition between RFC 1766 and RFC 3066, as 410 well as the canonical nature of subtags defined by this document, the 411 ISO 639 Registration Authority Joint Advisory Committee (ISO 639/ 412 RA-JAC) has included the following statement in [iso639.prin]: 414 "A language code already in ISO 639-2 at the point of freezing ISO 415 639-1 shall not later be added to ISO 639-1. This is to ensure 416 consistency in usage over time, since users are directed in 417 Internet applications to employ the alpha-3 code when an alpha-2 418 code for that language is not available." 
420    In order to avoid instability in the canonical form of tags, if a
421    two-character code is added to ISO 639-1 for a language for which a
422    three-character code was already included in either ISO 639-2 or ISO
423    639-3, the two-character code MUST NOT be registered.  See
424    Section 3.4.

426    For example, if some content were tagged with 'haw' (Hawaiian), which
427    currently has no two-character code, the tag would not be invalidated
428    if ISO 639-1 were to assign a two-character code to the Hawaiian
429    language at a later date.

431    Note: As an example of independent primary language subtag
432    registration, consider the grandfathered IANA registration
433    "i-enochian".  The subtag 'enochian' could be registered in the IANA
434    registry as a primary language subtag (assuming that ISO 639 does not
435    register this language first), making tags such as "enochian-AQ" and
436    "enochian-Latn" valid.

438 2.2.2.  Extended Language Subtags

440    Extended language subtags are used to identify languages that are
441    encompassed by a "macrolanguage".  ISO 639-3 defines certain
442    languages to be "macrolanguages"; that is, they are groups of very
443    closely related languages which are treated as a single language in
444    certain contexts.  In order to improve matching behavior and tagging
445    consistency, each language encompassed by an ISO 639-3 macrolanguage
446    is represented in the IANA registry using an extended language
447    subtag, provided that it is not already represented using a language
448    subtag.  The following rules apply to the extended language subtags:

450    1.  These subtags were defined in the IANA registry according to
451        assignments found in ISO 639 Part 3.

453    2.  A sequence of up to three extended language subtags MAY appear in
454        a language tag.  This sequence MUST follow the primary language
455        subtag and precede any other subtags.

457    3.
Each extended language subtag MUST only appear in a tag 458 immediately following the exact sequence of subtags that appears 459 in the 'Prefix' field in its registry record. 461 4. Other values MUST NOT be assigned to the extended language subtag 462 except by revision or update of this document. 464 Extended language subtag records MUST include exactly one 'Prefix' 465 field indicating an appropriate subtag or sequence of subtags for 466 that extended language subtag. 468 For example, the 'gan' and 'cmn' subtags represent the languages Gan 469 Chinese and Mandarin Chinese. Each is encompassed by the 470 macrolanguage 'zh' (Chinese). Therefore, they both have the prefix 471 "zh" in their registry records. Consequently, Gan Chinese is 472 represented as "zh-gan" and Mandarin Chinese as "zh-cmn". The 473 language subtag 'zh' can still be used without an extended language 474 subtag to label a resource as some unspecified variety of Chinese 475 (which in practice will usually be Mandarin, the dominant variety of 476 Chinese, but might also be some other variety). 478 Now suppose that, in the future, the ISO 639-3 Registration Authority 479 were to decide that Gan Chinese is actually two different closely 480 related languages: it might reclassify 'gan' as a macrolanguage and 481 introduce two new code elements. In that case, these code elements 482 would be added to the IANA registry as extended language subtags with 483 prefixes of "zh-gan". No change would be made to the registry record 484 for 'gan'. 486 2.2.3. Script Subtag 488 Script subtags are used to indicate the script or writing system 489 variations that distinguish the written forms of a language or its 490 dialects. The following rules apply to the script subtags: 492 1. 
All four-character subtags were defined according to 493 [ISO15924]--"Codes for the representation of the names of 494 scripts": alpha-4 script codes, or subsequently assigned by the 495 ISO 15924 maintenance agency or governing standardization bodies, 496 denoting the script or writing system used in conjunction with 497 this language. 499 2. Script subtags MUST immediately follow the primary language 500 subtag and all extended language subtags and MUST occur before 501 any other type of subtag described below. 503 3. The script subtags 'Qaaa' through 'Qabx' are reserved for private 504 use in language tags. These subtags correspond to codes reserved 505 by ISO 15924 for private use. These codes MAY be used for non- 506 registered script values. Please refer to Section 4.5 for more 507 information on private use subtags. 509 4. Script subtags MUST NOT be registered using the process in 510 Section 3.5 of this document. Variant subtags MAY be considered 511 for registration for that purpose. 513 5. There MUST be at most one script subtag in a language tag, and 514 the script subtag SHOULD be omitted when it adds no 515 distinguishing value to the tag or when the primary language 516 subtag's record includes a Suppress-Script field listing the 517 applicable script subtag. 519 Example: "sr-Latn" represents Serbian written using the Latin script. 521 2.2.4. Region Subtag 523 Region subtags are used to indicate linguistic variations associated 524 with or appropriate to a specific country, territory, or region. 525 Typically, a region subtag is used to indicate regional dialects or 526 usage, or region-specific spelling conventions. A region subtag can 527 also be used to indicate that content is expressed in a way that is 528 appropriate for use throughout a region, for instance, Spanish 529 content tailored to be useful throughout Latin America. 531 The following rules apply to the region subtags: 533 1. 
Region subtags MUST follow any language, extended language, or 534 script subtags and MUST precede all other subtags. 536 2. All two-character subtags following the primary subtag were 537 defined in the IANA registry according to the assignments found 538 in [ISO3166-1] ("Codes for the representation of names of 539 countries and their subdivisions -- Part 1: Country codes") using 540 the list of alpha-2 country codes, or using assignments 541 subsequently made by the ISO 3166 maintenance agency or governing 542 standardization bodies. In addition, the codes that are 543 "exceptionally reserved" (as opposed to "assigned") in ISO 3166-1 544 were also defined in the registry, with the exception of 'UK', 545 which is an exact synonym for the assigned code 'GB'. 547 3. All three-character subtags consisting of digit (numeric) 548 characters following the primary subtag were defined in the IANA 549 registry according to the assignments found in UN Standard 550 Country or Area Codes for Statistical Use [UN_M.49] or 551 assignments subsequently made by the governing standards body. 552 Note that not all of the UN M.49 codes are defined in the IANA 553 registry. The following rules define which codes are entered 554 into the registry as valid subtags: 556 A. UN numeric codes assigned to 'macro-geographical 557 (continental)' or sub-regions MUST be registered in the 558 registry. These codes are not associated with an assigned 559 ISO 3166 alpha-2 code and represent supra-national areas, 560 usually covering more than one nation, state, province, or 561 territory. 563 B. UN numeric codes for 'economic groupings' or 'other 564 groupings' MUST NOT be registered in the IANA registry and 565 MUST NOT be used to form language tags. 567 C. 
UN numeric codes for countries or areas with ambiguous ISO 568 3166 alpha-2 codes, when entered into the registry, MUST be 569 defined according to the rules in Section 3.4 and MUST be 570 used to form language tags that represent the country or 571 region for which they are defined. 573 D. UN numeric codes for countries or areas for which there is an 574 associated ISO 3166 alpha-2 code in the registry MUST NOT be 575 entered into the registry and MUST NOT be used to form 576 language tags. Note that the ISO 3166-based subtag in the 577 registry MUST actually be associated with the UN M.49 code in 578 question. 580 E. UN numeric codes and ISO 3166 alpha-2 codes for countries or 581 areas listed as eligible for registration in [RFC4645] but 582 not presently registered MAY be entered into the IANA 583 registry via the process described in Section 3.5. Once 584 registered, these codes MAY be used to form language tags. 586 F. All other UN numeric codes for countries or areas that do not 587 have an associated ISO 3166 alpha-2 code MUST NOT be entered 588 into the registry and MUST NOT be used to form language tags. 589 For more information about these codes, see Section 3.4. 591 4. Note: The alphanumeric codes in Appendix X of the UN document 592 MUST NOT be entered into the registry and MUST NOT be used to 593 form language tags. (At the time this document was created, 594 these values matched the ISO 3166 alpha-2 codes.) 596 5. There MUST be at most one region subtag in a language tag and the 597 region subtag MAY be omitted, as when it adds no distinguishing 598 value to the tag. 600 6. The region subtags 'AA', 'QM'-'QZ', 'XA'-'XZ', and 'ZZ' are 601 reserved for private use in language tags. These subtags 602 correspond to codes reserved by ISO 3166 for private use. These 603 codes MAY be used for private use region subtags (instead of 604 using a private use subtag sequence). Please refer to 605 Section 4.5 for more information on private use subtags. 
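The private use ranges in rule 6 above can be checked mechanically. The following Python sketch is one way to do so; the function name is_private_use_region is an assumption made for this example and is not defined by this document.

```python
import re


def is_private_use_region(subtag):
    """Return True if subtag falls in a private use region range.

    Hypothetical helper covering the ranges listed in rule 6:
    'AA', 'QM'-'QZ', 'XA'-'XZ', and 'ZZ'. Comparison is case insensitive.
    """
    code = subtag.upper()
    if not re.fullmatch(r"[A-Z]{2}", code):
        return False  # numeric (UN M.49) regions have no private use range here
    # For two-letter strings, lexicographic order matches the ranges above.
    return (code in ("AA", "ZZ")
            or "QM" <= code <= "QZ"
            or "XA" <= code <= "XZ")
```

A caller would treat a region subtag for which this returns True as having no meaning beyond what is privately agreed (see Section 4.5).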
607 "de-CH" represents German ('de') as used in Switzerland ('CH'). 609 "sr-Latn-RS" represents Serbian ('sr') written using Latin script 610 ('Latn') as used in Serbia ('RS'). 612 "es-419" represents Spanish ('es') appropriate to the UN-defined 613 Latin America and Caribbean region ('419'). 615 2.2.5. Variant Subtags 617 Variant subtags are used to indicate additional, well-recognized 618 variations that define a language or its dialects that are not 619 covered by other available subtags. The following rules apply to the 620 variant subtags: 622 1. Variant subtags are not associated with any external standard. 623 Variant subtags and their meanings are defined by the 624 registration process defined in Section 3.5. 626 2. Variant subtags MUST follow all of the other defined subtags, but 627 precede any extension or private use subtag sequences. 629 3. More than one variant MAY be used to form the language tag. 631 4. Variant subtags MUST be registered with IANA according to the 632 rules in Section 3.5 of this document before being used to form 633 language tags. In order to distinguish variants from other types 634 of subtags, registrations MUST meet the following length and 635 content restrictions: 637 1. Variant subtags that begin with a letter (a-z, A-Z) MUST be 638 at least five characters long. 640 2. Variant subtags that begin with a digit (0-9) MUST be at 641 least four characters long. 643 Variant subtag records in the language subtag registry MAY include 644 one or more 'Prefix' fields. The 'Prefix' indicates the language tag 645 or tags that would make a suitable prefix (with other subtags, as 646 appropriate) in forming a language tag with the variant. That is, 647 each of the subtags in the prefix SHOULD appear before the variant. 648 For example, the subtag 'nedis' has a Prefix of "sl", making it 649 suitable to form language tags such as "sl-nedis" and "sl-IT-nedis", 650 but not suitable for use in a tag such as "zh-nedis" or "it-IT- 651 nedis". 
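The length restrictions and the 'Prefix' guidance above might be checked as follows. This is a sketch under stated assumptions: the function names are hypothetical, the upper bound of eight characters is taken from the subtag ABNF in Section 2.1 rather than from this section, and the prefix test only verifies that each prefix subtag appears, in order, before the variant. It is not a full implementation of the matching scheme in [RFC4647].

```python
def looks_like_variant(subtag: str, max_len: int = 8) -> bool:
    """Shape check only; registration with IANA is still required.

    max_len is an assumption taken from the ABNF in Section 2.1.
    """
    if not subtag or not subtag.isascii() or not subtag.isalnum():
        return False
    if len(subtag) > max_len:
        return False
    if subtag[0].isdigit():
        return len(subtag) >= 4   # digit-initial: at least four characters
    return len(subtag) >= 5       # letter-initial: at least five characters


def prefix_is_suitable(tag: str, variant: str, prefix: str) -> bool:
    """True if every subtag of `prefix` occurs, in order, before `variant`."""
    subtags = tag.lower().split("-")
    variant = variant.lower()
    if variant not in subtags:
        return False
    before = subtags[: subtags.index(variant)]
    pos = 0
    for wanted in prefix.lower().split("-"):
        try:
            pos = before.index(wanted, pos) + 1
        except ValueError:
            return False
    return True
```

With the 'nedis' record described above (Prefix "sl"), `prefix_is_suitable("sl-IT-nedis", "nedis", "sl")` is true and `prefix_is_suitable("zh-nedis", "nedis", "sl")` is false.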
653 "sl-nedis" represents the Natisone or Nadiza dialect of Slovenian. 655 "de-CH-1996" represents German as used in Switzerland and as written 656 using the spelling reform beginning in the year 1996 C.E. 658 Most variants that share a prefix are mutually exclusive. For 659 example, the German orthographic variations '1996' and '1901' SHOULD 660 NOT be used in the same tag, as they represent the dates of different 661 spelling reforms. A variant that can meaningfully be used in 662 combination with another variant SHOULD include a 'Prefix' field in 663 its registry record that lists that other variant. For example, if 664 another German variant 'example' were created that made sense to use 665 with '1996', then 'example' should include two Prefix fields: "de" 666 and "de-1996". 668 2.2.6. Extension Subtags 670 Extensions provide a mechanism for extending language tags for use in 671 various applications. See Section 3.7. The following rules apply to 672 extensions: 674 1. Extension subtags are separated from the other subtags defined 675 in this document by a single-character subtag ("singleton"). 676 The singleton MUST be one allocated to a registration authority 677 via the mechanism described in Section 3.7 and MUST NOT be the 678 letter 'x', which is reserved for private use subtag sequences. 680 2. Note: Private use subtag sequences starting with the singleton 681 subtag 'x' are described in Section 2.2.7 below. 683 3. An extension MUST follow at least a primary language subtag. 684 That is, a language tag cannot begin with an extension. 685 Extensions extend language tags, they do not override or replace 686 them. For example, "a-value" is not a well-formed language tag, 687 while "de-a-value" is. 689 4. Each singleton subtag MUST appear at most one time in each tag 690 (other than as a private use subtag). That is, singleton 691 subtags MUST NOT be repeated. For example, the tag "en-a-bbb-a- 692 ccc" is invalid because the subtag 'a' appears twice. 
Note that 693 the tag "en-a-bbb-x-a-ccc" is valid because the second 694 appearance of the singleton 'a' is in a private use sequence. 696 5. Extension subtags MUST meet all of the requirements for the 697 content and format of subtags defined in this document. 699 6. Extension subtags MUST meet whatever requirements are set by the 700 document that defines their singleton prefix and whatever 701 requirements are provided by the maintaining authority. 703 7. Each extension subtag MUST be from two to eight characters long 704 and consist solely of letters or digits, with each subtag 705 separated by a single '-'. 707 8. Each singleton MUST be followed by at least one extension 708 subtag. For example, the tag "tlh-a-b-foo" is invalid because 709 the first singleton 'a' is followed immediately by another 710 singleton 'b'. 712 9. Extension subtags MUST follow all language, extended language, 713 script, region, and variant subtags in a tag. 715 10. All subtags following the singleton and before another singleton 716 are part of the extension. Example: In the tag "fr-a-Latn", the 717 subtag 'Latn' does not represent the script subtag 'Latn' 718 defined in the IANA Language Subtag Registry. Its meaning is 719 defined by the extension 'a'. 721 11. In the event that more than one extension appears in a single 722 tag, the tag SHOULD be canonicalized as described in 723 Section 4.4. 725 For example, if the prefix singleton 'r' and the shown subtags were 726 defined, then the following tag would be a valid example: "en-Latn- 727 GB-boont-r-extended-sequence-x-private" 729 2.2.7. Private Use Subtags 731 Private use subtags are used to indicate distinctions in language 732 important in a given context by private agreement. The following 733 rules apply to private use subtags: 735 1. Private use subtags are separated from the other subtags defined 736 in this document by the reserved single-character subtag 'x'. 738 2. 
Private use subtags MUST conform to the format and content 739 constraints defined in the ABNF for all subtags. 741 3. Private use subtags MUST follow all language, extended language, 742 script, region, variant, and extension subtags in the tag. 743 Another way of saying this is that all subtags following the 744 singleton 'x' MUST be considered private use. Example: The 745 subtag 'US' in the tag "en-x-US" is a private use subtag. 747 4. A tag MAY consist entirely of private use subtags. 749 5. No source is defined for private use subtags. Use of private use 750 subtags is by private agreement only. 752 6. Private use subtags are NOT RECOMMENDED where alternatives exist 753 or for general interchange. See Section 4.5 for more information 754 on private use subtag choice. 756 For example: Users who wished to utilize codes from the Ethnologue 757 publication of SIL International for language identification might 758 agree to exchange tags such as "az-Arab-x-AZE-derbend". This example 759 contains two private use subtags. The first is 'AZE' and the second 760 is 'derbend'. 762 2.2.8. Grandfathered Registrations 764 Prior to RFC 4646, whole language tags were registered according to 765 the rules in RFC 1766 and/or RFC 3066. These registered tags 766 maintain their validity. Of those tags, those that were made 767 obsolete or redundant by the advent of RFC 4646, by this document, or 768 by subsequent registration of subtags are maintained in the registry 769 in records as "redundant" records. Those tags that do not match the 770 'langtag' production in the ABNF in this document or that contain 771 subtags that do not individually appear in the registry are 772 maintained in the registry in records of the "grandfathered" type. 774 Grandfathered tags contain one or more subtags that are not defined 775 in the Language Subtag Registry (see Section 3). 
   Redundant tags consist entirely of subtags defined above; their
   independent registration was superseded by [RFC4646]. For more
   information see Section 3.8.

   Some grandfathered tags are "regular" in that they match the
   'langtag' production in Figure 1. In some cases, these tags could
   become redundant if their (currently unregistered) subtags were to
   be registered (as variants, for example). In other cases, although
   the subtags match the language tag pattern, the meaning assigned to
   the various subtags is prohibited by rules elsewhere in this
   document. Those tags can never become redundant.

   The remaining grandfathered tags are "irregular" and do not match
   the 'langtag' production. These are listed in the 'irregular'
   production in Figure 1. These grandfathered tags can never become
   redundant. Many of these tags have been superseded by other
   registrations: their record contains a Preferred-Value field that
   really ought to be used to form language tags representing that
   value.

2.2.9. Classes of Conformance

   Implementations sometimes need to describe their capabilities with
   regard to the rules and practices described in this document. Tags
   can be checked or verified in a number of ways, but two particular
   classes of tag conformance are formally defined here.

   A tag is considered "well-formed" if it conforms to the ABNF
   (Section 2.1). Note that irregular grandfathered tags are now
   listed in the 'irregular' production.

   A tag is considered "valid" if it is well-formed and it also
   satisfies these conditions:

   o  The tag is either a grandfathered tag, or all of its language,
      extended language, script, region, and variant subtags appear in
      the IANA language subtag registry as of the particular registry
      date.

   o  There are no duplicate singleton (extension) subtags and no
      duplicate variant subtags.
817 o For each subtag that has a 'Prefix' field in the registry, the 818 Prefix matches the language tag using Extended Filtering 819 [RFC4647]. That is, each subtag in the Prefix is present in the 820 tag and in the same order. Furthermore, all of the Prefix's 821 subtags MUST appear before the subtag. For example, the Prefix 822 "zh-TW" matches the tag "zh-Hant-TW". 824 Note that a tag's validity depends on the date of the registry used 825 to validate the tag. A more-recent copy of the registry might 826 contain a subtag that an older version does not. 828 A tag is considered "valid" for a given extension (Section 3.7) (as 829 of a particular version, revision, and date) if it meets the criteria 830 for "valid" above and also satisfies this condition: 832 Each subtag used in the extension part of the tag is valid 833 according to the extension. 835 3. Registry Format and Maintenance 837 This section defines the Language Subtag Registry and the maintenance 838 and update procedures associated with it, as well as a registry for 839 extensions to language tags (Section 3.7). 841 The Language Subtag Registry contains a comprehensive list of all of 842 the subtags valid in language tags. This allows implementers a 843 straightforward and reliable way to validate language tags. The 844 Language Subtag Registry will be maintained so that, except for 845 extension subtags, it is possible to validate all of the subtags that 846 appear in a language tag under the provisions of this document or its 847 revisions or successors. In addition, the meaning of the various 848 subtags will be unambiguous and stable over time. (The meaning of 849 private use subtags, of course, is not defined by the IANA registry.) 851 3.1. 
Format of the IANA Language Subtag Registry 853 The IANA Language Subtag Registry ("the registry") is a machine- 854 readable file in the format described in this section, plus copies of 855 the registration forms approved in accordance with the process 856 described in Section 3.5. The existing registration forms for 857 grandfathered and redundant tags taken from RFC 3066 will be 858 maintained as part of the obsolete RFC 3066 registry. The remaining 859 set of subtags created by either [RFC4645] or [registry-update] will 860 not have registration forms created for them. 862 3.1.1. File Format 864 The registry consists of a series of records stored in the record-jar 865 format (described in [record-jar]). Each record, in turn, consists 866 of a series of fields that describe the various subtags and tags. 867 The registry is a Unicode [Unicode] text file, using the UTF-8 868 [RFC3629] character encoding. 870 Each field can be considered a single, logical line of Unicode 871 [Unicode] characters, comprising a field-name and a field-body 872 separated by a COLON character (%x3A). Each field is terminated by 873 the newline sequence CRLF. The text in each field MUST be in Unicode 874 Normalization Form C (NFC). 876 A collection of fields forms a 'record'. Records are separated by 877 lines containing only the sequence "%%" (%x25.25). 879 Although fields are logically a single line of text, each line of 880 text in the file format is limited to 72 bytes in length. To 881 accommodate this, the field-body can be split into a multiple-line 882 representation; this is called "folding". Folding is always done on 883 Unicode code point boundaries (never in the middle of a multibyte 884 UTF-8 sequence) and MUST NOT occur just prior to a combining mark. 886 Although the file format uses the UTF-8 encoding, unless otherwise 887 indicated, fields are restricted to the printable characters from the 888 US-ASCII [ISO646] repertoire. 
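As an illustration of the record and folding rules just described, a reader of the file might be sketched as below. The names `parse_registry` and `overlong_lines` are hypothetical. The sketch assumes that a folded continuation line begins with whitespace and that unfolding joins the pieces with a single space; this section does not spell out the unfolding step, so treat that as an assumption. Fields are kept as lists because some fields can repeat within a record.

```python
from typing import Dict, List


def parse_registry(text: str) -> List[Dict[str, List[str]]]:
    """Split record-jar text into records of field-name -> list of bodies."""
    records: List[Dict[str, List[str]]] = []
    current: Dict[str, List[str]] = {}
    last_name = None
    for line in text.replace("\r\n", "\n").split("\n"):
        if line.strip() == "%%":              # record separator
            if current:
                records.append(current)
            current, last_name = {}, None
            continue
        if not line.strip():                  # ignore blank lines
            continue
        if line[0] in " \t" and last_name is not None:
            # Folded continuation: append to the previous field-body.
            # Joining with one space is an assumption of this sketch.
            current[last_name][-1] += " " + line.strip()
            continue
        name, sep, body = line.partition(":")
        if not sep:
            raise ValueError("line is not a field: %r" % line)
        last_name = name.strip()
        current.setdefault(last_name, []).append(body.strip())
    if current:
        records.append(current)
    return records


def overlong_lines(text: str, limit: int = 72) -> List[int]:
    """Return 1-based numbers of physical lines longer than `limit` bytes."""
    lines = text.replace("\r\n", "\n").split("\n")
    return [n for n, line in enumerate(lines, 1)
            if len(line.encode("utf-8")) > limit]
```

A short usage example follows. The File-Date record is the one shown in Figure 3; the second record is illustrative only and is not copied from the registry.

```python
sample = (
    "File-Date: 2004-06-28\r\n"
    "%%\r\n"
    "Type: variant\r\n"
    "Subtag: nedis\r\n"
    "Prefix: sl\r\n"
    "Comments: An illustrative comment that is long enough\r\n"
    "  to need folding\r\n"
    "%%\r\n"
)
records = parse_registry(sample)
```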
890 The format of the registry is described by the following ABNF (per 891 [RFC4234]): 893 registry = record *("%%" CRLF record) 894 record = 1*( field-name *SP ":" *SP field-body CRLF ) 895 field-name = (ALPHA / DIGIT) [*(ALPHA / DIGIT / "-") (ALPHA / DIGIT)] 896 field-body = *([[*SP CRLF] 1*SP] 1*CHARS) 897 CHARS = (%x21-10FFFF) ; Unicode code points 899 Figure 2: Registry Format ABNF 901 The sequence '..' (%x2E.2E) in a field-body denotes a range of 902 values. Such a range represents all subtags of the same length that 903 are in alphabetic or numeric order within that range, including the 904 values explicitly mentioned. For example 'a..c' denotes the values 905 'a', 'b', and 'c' and '11..13' denotes the values '11', '12', and 906 '13'. 908 All fields whose field-body contains a date value use the "full-date" 909 format specified in [RFC3339]. For example: "2004-06-28" represents 910 June 28, 2004, in the Gregorian calendar. 912 3.1.2. Record Definitions 914 There are three types of records in the registry: "File-Date", 915 "Subtag", and "Tag" records. 917 The first record in the registry is a "File-Date" record. This 918 record contains the single field whose field-name is "File-Date" (see 919 Figure 2). The field-body of this record contains the last 920 modification date of this copy of the registry, making it possible to 921 compare different versions of the registry. The registry on the IANA 922 website is the most current. Versions with an older date than that 923 one are not up-to-date. 925 File-Date: 2004-06-28 926 %% 928 Figure 3: Example of the File-Date Record 930 Subsequent records represent either subtags or tags in the registry. 932 "Subtag" records contain a field with a field-name of "Subtag", 933 while, unsurprisingly, "Tag" records contain a field with a field- 934 name of "Tag". Each of the fields in each record MUST occur no more 935 than once, unless otherwise noted below. 
   Each record MUST contain the following fields:

   o  'Type'

      *  Type's field-body MUST consist of one of the following
         strings: "language", "extlang", "script", "region",
         "variant", "grandfathered", or "redundant". It denotes the
         type of tag or subtag.

   o  Either 'Subtag' or 'Tag'

      *  Subtag's field-body contains the subtag being defined. This
         field MUST only appear in records whose 'Type' has one of
         these values: "language", "extlang", "script", "region", or
         "variant".

      *  Tag's field-body contains a complete language tag. This
         field MUST only appear in records whose 'Type' has one of
         these values: "grandfathered" or "redundant". Note that the
         field-body will always follow the 'grandfathered' production
         in the ABNF in Section 2.1.

   o  Description

      *  Description's field-body contains a non-normative description
         of the subtag or tag.

   o  Added

      *  Added's field-body contains the date the record was added to
         the registry.

   Each record MAY also contain the following fields:

   o  Preferred-Value

      *  For records of type 'script', 'region', and 'variant',
         'Preferred-Value' contains the subtag of the same 'Type' that
         is preferred for forming the language tag.

      *  For records of type 'language' and 'extlang',
         'Preferred-Value' contains the language production (see
         Figure 1) that is preferred when forming the language tag.
         This can be simply a 'language' subtag, or it can be a
         'language' subtag followed by an extended language sequence.

      *  For records of type 'grandfathered' and 'redundant',
         'Preferred-Value' contains a canonical mapping to a complete
         language tag.

   o  Deprecated

      *  Deprecated's field-body contains the date the record was
         deprecated.

   o  Prefix

      *  Prefix's field-body contains a language tag with which this
         subtag MAY be used to form a new language tag, perhaps with
         other subtags as well. The Prefix's subtags appear before the
         subtag.
         This field MUST only appear in records whose 'Type'
         field-body is 'variant' or 'extlang'. For example, the
         'Prefix' for the variant 'nedis' is 'sl', meaning that the
         tags "sl-nedis" and "sl-IT-nedis" might be appropriate while
         the tag "is-nedis" is not.

   o  Comments

      *  Comments contains additional information about the subtag, as
         deemed appropriate for understanding the registry and
         implementing language tags using the subtag or tag.

   o  Suppress-Script

      *  Suppress-Script contains a script subtag that SHOULD NOT be
         used to form language tags with the associated primary
         language subtag. This field MUST only appear in records
         whose 'Type' field-body is 'language'. See Section 4.1.

   o  Macrolanguage

      *  Macrolanguage contains a primary or extended language subtag
         defined by ISO 639 as a "macrolanguage" that encompasses this
         language subtag. This field MUST only appear in records
         whose 'Type' field-body is 'language' or 'extlang'.

   Future versions of this document might add additional fields to the
   registry, so implementations SHOULD ignore fields found in the
   registry that are not defined in this document.

3.1.3. Subtag and Tag Fields

   The 'Subtag' field MUST use lowercase letters to form the subtag,
   with two exceptions. Subtags whose 'Type' field is 'script' (in
   other words, subtags defined by ISO 15924) MUST use titlecase.
   Subtags whose 'Type' field is 'region' (in other words, the
   non-numeric region subtags defined by ISO 3166) MUST use uppercase.
   These exceptions mirror the use of case in the underlying
   standards.

   Each subtag in the tags contained in a 'Tag' field MUST be formatted
   using the rules in the preceding paragraph. That is, all subtags
   are lowercase except for subtags that represent script or region
   codes.

3.1.4.
Description Field 1041 The field 'Description' contains a description of the tag or subtag 1042 in the record. The 'Description' field MAY appear more than once per 1043 record, that is, there can be multiple descriptions for a given 1044 record. The 'Description' field MAY include the full range of 1045 Unicode characters. At least one of the 'Description' fields MUST be 1046 written or transcribed into the Latin script; additional 1047 'Description' fields MAY also include a description in a non-Latin 1048 script. Each 'Description' field MUST be unique, both within the 1049 record in which it appears and for the collection of records of the 1050 same type. Moreover, formatting variations of the same description 1051 MUST NOT occur in that specific record or in any other record of the 1052 same type. For example, while the ISO 639-1 code 'fy' contains both 1053 the descriptions "Western Frisian" and "Frisian, Western", only one 1054 of these descriptions appears in the registry. 1056 The 'Description' field is used for identification purposes and 1057 SHOULD NOT be taken to represent the actual native name of the 1058 language or variation or to be in any particular language. 1060 For records taken from a source standard (such as ISO 639 or ISO 1061 3166), the 'Description' value(s) SHOULD also be taken from the 1062 source standard. Multiple descriptions in the source standard MUST 1063 be split into separate 'Description' fields. The source standard's 1064 descriptions MAY be edited, either prior to insertion or via the 1065 registration process. For fields of type 'language' or 'extlang', 1066 the first 'Description' field appearing in the Registry corresponds 1067 to the Reference Name assigned by ISO 639-3. This helps facilitate 1068 cross-referencing between ISO 639 and the registry. 
1070 When creating or updating a record due to the action of one of the 1071 source standards, the Language Subtag Reviewer SHOULD remove 1072 duplicate or redundant descriptions and MAY edit descriptions to 1073 correct irregularities in formatting (such as misspellings, 1074 inappropriate apostrophes or other punctuation, or excessive or 1075 missing spaces) prior to submitting the proposed record to the ietf- 1076 languages list. 1078 Note: Descriptions in registry entries that correspond to ISO 639, 1079 ISO 15924, ISO 3166, or UN M.49 codes are intended only to indicate 1080 the meaning of that identifier as defined in the source standard at 1081 the time it was added to the registry. The description does not 1082 replace the content of the source standard itself. The descriptions 1083 are not intended to be the English localized names for the subtags. 1084 Localization or translation of language tag and subtag descriptions 1085 is out of scope of this document. 1087 3.1.5. Deprecated Field 1089 The field 'Deprecated' MAY be added to any record via the maintenance 1090 process described in Section 3.3 or via the registration process 1091 described in Section 3.5. Usually, the addition of a 'Deprecated' 1092 field is due to the action of one of the standards bodies, such as 1093 ISO 3166, withdrawing a code. In some historical cases, it might not 1094 have been possible to reconstruct the original deprecation date. For 1095 these cases, an approximate date appears in the registry. Although 1096 valid in language tags, subtags and tags with a 'Deprecated' field 1097 are deprecated and validating processors SHOULD NOT generate these 1098 subtags. Note that a record that contains a 'Deprecated' field and 1099 no corresponding 'Preferred-Value' field has no replacement mapping. 1101 3.1.6. Preferred-Value Field 1103 The field 'Preferred-Value' contains a mapping between the record in 1104 which it appears and another tag or subtag. 
   The value in this field is strongly RECOMMENDED as the best choice
   to represent the value of this record when selecting a language
   tag. These values form three groups:

   1.  ISO 639 language codes that were later withdrawn in favor of
       other codes. These values are mostly a historical curiosity.

   2.  ISO 3166 region codes that have been withdrawn in favor of a
       new code. This sometimes happens when a country changes its
       name or administration in such a way that warrants a new region
       code.

   3.  Grandfathered or redundant tags from RFC 3066. In many cases,
       these tags have become obsolete because the values they
       represent were later encoded by ISO 639.

   Records that contain a 'Preferred-Value' field MUST also have a
   'Deprecated' field. The 'Deprecated' field contains the date of
   deprecation. Thus, a language tag processor can use the registry
   to construct the valid, non-deprecated set of subtags for a given
   date. In addition, for any given tag, a processor can construct
   the set of valid language tags that correspond to that tag for all
   dates up to the date of the registry. The ability to do these
   mappings MAY be beneficial to applications that are matching,
   selecting, or filtering content based on its language tags.

   Note that 'Preferred-Value' mappings in records of type 'region'
   sometimes do not represent exactly the same meaning as the original
   value. There are many reasons for a country code to be changed,
   and the effect this has on the formation of language tags will
   depend on the nature of the change in question.

   In particular, the 'Preferred-Value' field does not imply retagging
   content that uses the affected subtag.

   The field 'Preferred-Value' MUST NOT be modified once created in the
   registry. The field MAY be added to records according to the rules
   in Section 3.3.
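The date-based use of 'Deprecated' and 'Preferred-Value' described above can be sketched as follows. This is an illustration only: the function names are hypothetical, records are modeled as plain dictionaries with one value per field, and the dates in the usage example are placeholders rather than registry data. The sketch treats a record as deprecated on and after its 'Deprecated' date, which is an assumption about the boundary.

```python
from datetime import date
from typing import Dict, List, Set, Tuple

Record = Dict[str, str]  # simplified: one value per field-name


def _key(rec: Record) -> Tuple[str, str]:
    """Identify a record by its Type and its Subtag or Tag value."""
    return rec["Type"], rec.get("Subtag") or rec["Tag"]


def non_deprecated(records: List[Record], as_of: date) -> Set[Tuple[str, str]]:
    """Entries that were registered and not yet deprecated on `as_of`."""
    result = set()
    for rec in records:
        if date.fromisoformat(rec["Added"]) > as_of:
            continue                      # not yet in the registry
        deprecated = rec.get("Deprecated")
        if deprecated and date.fromisoformat(deprecated) <= as_of:
            continue                      # already deprecated
        result.add(_key(rec))
    return result


def preferred_value(records: List[Record], type_: str, value: str) -> str:
    """Return the Preferred-Value for an entry, or the entry itself."""
    for rec in records:
        if _key(rec) == (type_, value):
            return rec.get("Preferred-Value", value)
    return value
```

A usage example, with placeholder dates:

```python
records = [
    {"Type": "language", "Subtag": "nn", "Added": "2000-01-01"},
    {"Type": "grandfathered", "Tag": "no-nyn", "Added": "2000-01-01",
     "Deprecated": "2001-01-01", "Preferred-Value": "nn"},  # dates invented
]
current = non_deprecated(records, date(2002, 1, 1))
replacement = preferred_value(records, "grandfathered", "no-nyn")
```

Note that, as the text says, a mapping of this kind does not imply that existing content should be retagged.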
   The 'Preferred-Value' field in records of type "grandfathered" and
   "redundant" contains whole language tags that are strongly
   RECOMMENDED for use in place of the record's value. In many cases,
   the mappings were created by deprecation of the tags during the
   period before this document was adopted. For example, the tag
   "no-nyn" was deprecated in favor of the ISO 639-1-defined language
   code 'nn'.

3.1.7. Prefix Field

   The 'Prefix' field contains an extended language range whose subtags
   are appropriate to use with this subtag: each of the subtags in one
   of the subtag's Prefix fields MUST appear before the variant in a
   valid tag. For example, the variant subtag '1996' has a 'Prefix'
   field of "de". This means that tags starting with the sequence
   "de-" are appropriate with this subtag, so "de-Latg-1996" and
   "de-CH-1996" are both acceptable, while the tag "fr-1996" is an
   inappropriate choice.

   The field of type 'Prefix' MUST NOT be removed from any record. The
   field-body for this type of field MAY be modified, but only if the
   modification broadens the meaning of the subtag. That is, the
   field-body can be replaced only by a prefix of itself. For example,
   the Prefix "be-Latn" (Belarusian, Latin script) could be replaced by
   the Prefix "be" (Belarusian) but not by the Prefix "ru-Latn"
   (Russian, Latin script).

   Records of type 'variant' MAY have more than one field of type
   'Prefix'. Additional fields of this type MAY be added to a
   'variant' record via the registration process.

   The field-body of the 'Prefix' field MUST NOT conflict with any
   'Prefix' already registered for a given record. Such a conflict
   would occur when no valid tag could be constructed that would
   contain the prefix, such as when two subtags each have a 'Prefix'
   that contains the other subtag.
   For example, suppose that the subtag 'avariant' has the prefix
   "es-bvariant". Then the subtag 'bvariant' cannot be given the
   prefix 'avariant', for that would require a tag of the form
   "es-avariant-bvariant-avariant", which would not be valid.

   Records of type 'extlang' MUST have _exactly_ one 'Prefix' field.

3.1.8. Suppress-Script Field

   The field 'Suppress-Script' contains a script subtag (whose record
   appears in the registry). The field 'Suppress-Script' MUST only
   appear in records whose 'Type' field-body is 'language'. This field
   MUST NOT appear more than one time in a record. This field
   indicates a script used to write the overwhelming majority of
   documents for the given language. This script code therefore adds
   no distinguishing information to a language tag. This helps ensure
   greater compatibility between the language tags generated according
   to the rules in this document and language tags and tag processors
   or consumers based on RFC 3066 by indicating that the script subtag
   SHOULD NOT be used for most documents in that language. For
   example, virtually all Icelandic documents are written in the Latin
   script, making the subtag 'Latn' redundant in the tag "is-Latn".

   Many language subtag records do not have a Suppress-Script field.
   The lack of a Suppress-Script might indicate that the language is
   customarily written in more than one script or that the language is
   not customarily written at all. It might also mean that sufficient
   information was not available when the record was created, in which
   case the field remains a candidate for future registration.

3.1.9. Macrolanguage Field

   The Macrolanguage field contains a primary or extended language
   subtag that encompasses this subtag's language. That is, the
   language subtag whose record this field appears in is sometimes
   considered to be a sub-language of the Macrolanguage.
Macrolanguage 1215 values are defined by ISO 639-3 and the exact nature of the 1216 relationship between the encompassed and encompassing languages 1217 varies on a case-by-case basis. 1219 This field can be useful to applications or users when selecting 1220 language tags or as additional metadata useful in matching. The 1221 Macrolanguage field can only occur in records of type 'language' or 1222 'extlang'. Only values assigned by ISO 639-3 will be considered for 1223 inclusion. Macrolanguage fields MAY be added or removed via the 1224 normal registration process whenever ISO 639-3 defines new values. 1225 Macrolanguages are informational, and MAY be removed or changed if 1226 ISO 639-3 changes the values. 1228 For example, the language subtags 'nb' (Norwegian Bokmal) and 'nn' 1229 (Norwegian Nynorsk) each have a Macrolanguage entry of 'no' 1230 (Norwegian). For more information see Section 4.1. 1232 3.1.10. Comments Field 1234 The field 'Comments' conveys additional information about the record 1235 and MAY appear more than once per record. The field-body MAY include 1236 the full range of Unicode characters and is not restricted to any 1237 particular script. This field MAY be inserted or changed via the 1238 registration process and no guarantee of stability is provided. The 1239 content of this field is not restricted, except by the need to 1240 register the information, the suitability of the request, and by 1241 reasonable practical size limitations. 1243 3.2. Language Subtag Reviewer 1245 The Language Subtag Reviewer moderates the ietf-languages mailing 1246 list, responds to requests for registration, and performs the other 1247 registry maintenance duties described in Section 3.3. Only the 1248 Language Subtag Reviewer is permitted to request IANA to change, 1249 update, or add records to the Language Subtag Registry. The Language 1250 Subtag Reviewer MAY delegate list moderation and other clerical 1251 duties as needed. 
1253 The Language Subtag Reviewer is appointed by the IESG for an 1254 indefinite term, subject to removal or replacement at the IESG's 1255 discretion. The IESG will solicit nominees for the position (upon 1256 adoption of this document or upon a vacancy) and then solicit 1257 feedback on the nominees' qualifications. Qualified candidates 1258 should be familiar with BCP 47 and its requirements; be willing to 1259 fairly, responsively, and judiciously administer the registration 1260 process; and be suitably informed about the issues of language 1261 identification so that they can draw upon and assess the claim and 1262 contributions of language experts and subtag requesters. 1264 The subsequent performance or decisions of the Language Subtag 1265 Reviewer MAY be appealed to the IESG under the same rules as other 1266 IETF decisions (see [RFC2026]). The IESG can reverse or overturn the 1267 decision of the Language Subtag Reviewer, provide guidance, or take 1268 other appropriate actions. 1270 3.3. Maintenance of the Registry 1272 Maintenance of the registry requires that as codes are assigned or 1273 withdrawn by ISO 639, ISO 15924, ISO 3166, and UN M.49, the Language 1274 Subtag Reviewer MUST evaluate each change and determine the 1275 appropriate course of action according to the rules in this document. 1276 Such updates follow the registration process described in 1277 Section 3.5. Usually the Language Subtag Reviewer will start the 1278 process for the new or updated record by filling in the registration 1279 form and submitting it. If a change to one of these standards takes 1280 place and the Language Subtag Reviewer does not do this in a timely 1281 manner, then any interested party MAY submit the form. Thereafter 1282 the registration process continues normally. 
1284 The Language Subtag Reviewer MUST ensure that new subtags meet the 1285 requirements elsewhere in this document (and most especially in 1286 Section 3.4) or submit an appropriate registration form for an 1287 alternate subtag as described in that section. Each individual 1288 subtag affected by a change MUST be sent to the ietf-languages list 1289 with its own registration form and in a separate message. 1291 3.4. Stability of IANA Registry Entries 1293 The stability of entries and their meaning in the registry is 1294 critical to the long-term stability of language tags. The rules in 1295 this section guarantee that a specific language tag's meaning is 1296 stable over time and will not change. 1298 These rules specifically deal with how changes to codes (including 1299 withdrawal and deprecation of codes) maintained by ISO 639, ISO 1300 15924, ISO 3166, and UN M.49 are reflected in the IANA Language 1301 Subtag Registry. Assignments to the IANA Language Subtag Registry 1302 MUST follow the following stability rules: 1304 1. Values in the fields 'Type', 'Subtag', 'Tag', 'Added', 1305 'Deprecated' and 'Preferred-Value' MUST NOT be changed and are 1306 guaranteed to be stable over time. 1308 2. Values in the 'Description' field MUST NOT be changed in a way 1309 that would invalidate previously-existing tags. They MAY be 1310 broadened somewhat in scope, changed to add information, or 1311 adapted to the most common modern usage. For example, countries 1312 occasionally change their official names; a historical example 1313 of this would be "Upper Volta" changing to "Burkina Faso". 1315 3. Values in the field 'Prefix' MAY be added to records of type 1316 'variant' via the registration process. If a prefix is added to 1317 a variant record, 'Comment' fields SHOULD be used to explain 1318 different usages with the various prefixes. 1320 4. 
Values in the field 'Prefix' in records of type 'variant' MAY be 1321 modified, so long as the modifications broaden the set of 1322 prefixes. That is, a prefix MAY be replaced by one of its own 1323 prefixes. For example, the prefix "en-US" could be replaced by 1324 "en", but not by the prefixes "en-Latn", "fr", or "en-US-boont". 1325 If one of those prefixes were needed, a new Prefix SHOULD be 1326 registered. 1328 5. Values in the field 'Prefix' in records of type 'extlang' MUST 1329 NOT be modified. 1331 6. Values in the field 'Prefix' MUST NOT be removed. 1333 7. The field 'Comments' MAY be added, changed, modified, or removed 1334 via the registration process or any of the processes or 1335 considerations described in this section. 1337 8. The field 'Suppress-Script' MAY be added or removed via the 1338 registration process. 1340 9. The field 'Macrolanguage' MAY be added or removed via the 1341 registration process, but only in response to changes made by 1342 ISO 639. The Macrolanguage field appears whenever a language 1343 has a corresponding Macrolanguage in ISO 639. That is, the 1344 macrolanguage fields in the registry exactly match those of ISO 1345 639. No other macrolanguage mappings will be considered for 1346 registration. 1348 10. Codes assigned by ISO 639-1 that do not conflict with existing 1349 two-letter primary language subtags and which have no 1350 corresponding three-letter primary or extended language subtags 1351 defined in the registry are entered into the IANA registry as 1352 new records of type 'language'. 1354 11. Codes assigned by ISO 639-2 that do not conflict with existing 1355 three-letter primary or extended language subtags are entered 1356 into the IANA registry as new records of type 'language'. 1358 12. Codes assigned by ISO 639-3 that do not conflict with existing 1359 three-letter primary or extended language subtags are entered 1360 into the IANA registry as new records. 1362 1. 
Codes that have a defined "macrolanguage" mapping at the
time of their registration MUST be entered into the registry
as records of type 'extlang' with a 'Prefix' field
containing the appropriate prefix tag. They MUST also
include a "Macrolanguage" field in their record.

        2.  Codes that represent sign languages MUST be entered into
            the registry as records of type 'extlang' with a 'Prefix'
            field that matches the Basic Language Range "sgn" (see
            Section 3.3.1 "Basic Filtering" in [RFC4647]).

        3.  All other codes MUST be entered into the registry as
            records of type 'language'.

   13.  A record of type 'language' or 'extlang' MUST NOT be registered
        if there exists a record of either type with the same subtag
        value. For example, if an 'extlang' subtag 'foo' exists in the
        registry, all attempts to register a 'language' subtag 'foo'
        will be rejected.

   14.  Codes assigned by ISO 15924 and ISO 3166 that do not conflict
        with existing subtags of the associated type and whose meaning
        is not the same as an existing subtag of the same type are
        entered into the IANA registry as new records.

   15.  Codes assigned by ISO 639, ISO 15924, or ISO 3166 that are
        withdrawn by their respective maintenance or registration
        authority remain valid in language tags. A 'Deprecated' field
        containing the date of withdrawal MUST be added to the record.
        If a new record of the same type is added that represents a
        replacement value, then a 'Preferred-Value' field MAY also be
        added. The registration process MAY be used to add comments
        about the withdrawal of the code by the respective standard.

        Example: The region code 'TL' was assigned to the country
        'Timor-Leste', replacing the code 'TP' (which was assigned to
        'East Timor' when it was under administration by Portugal).
        The subtag 'TP' remains valid in language tags, but its
        record contains a 'Preferred-Value' of 'TL' and its
        'Deprecated' field contains the date the new code was assigned
        ('2004-07-06').

   16.  Codes assigned by ISO 639, ISO 15924, or ISO 3166 that conflict
        with existing subtags of the associated type, including subtags
        that are deprecated, MUST NOT be entered into the registry. The
        following additional considerations apply to subtag values that
        are reassigned:

        A.  For ISO 639 codes, if the newly assigned code's meaning is
            not represented by a subtag in the IANA registry, the
            Language Subtag Reviewer, as described in Section 3.5, SHALL
            prepare a proposal for entering in the IANA registry as soon
            as practical a registered language subtag as an alternate
            value for the new code. The form of the registered language
            subtag will be at the discretion of the Language Subtag
            Reviewer and MUST conform to other restrictions on language
            subtags in this document.

        B.  For all subtags whose meaning is derived from an external
            standard (that is, by ISO 639, ISO 15924, ISO 3166, or UN
            M.49), if a new meaning is assigned to an existing code and
            the new meaning broadens the meaning of that code, then the
            meaning for the associated subtag MAY be changed to match.
            The meaning of a subtag MUST NOT be narrowed, however, as
            this can result in an unknown proportion of the existing
            uses of a subtag becoming invalid. Note: The ISO 639
            maintenance agency/registration authority (MA/RA) has
            adopted a similar stability policy.

        C.
For ISO 15924 codes, if the newly assigned code's meaning is 1432 not represented by a subtag in the IANA registry, the 1433 Language Subtag Reviewer, as described in Section 3.5, SHALL 1434 prepare a proposal for entering in the IANA registry as soon 1435 as practical a registered variant subtag as an alternate 1436 value for the new code. The form of the registered variant 1437 subtag will be at the discretion of the Language Subtag 1438 Reviewer and MUST conform to other restrictions on variant 1439 subtags in this document. 1441 D. For ISO 3166 codes, if the newly assigned code's meaning is 1442 associated with the same UN M.49 code as another 'region' 1443 subtag, then the existing region subtag remains as the 1444 preferred value for that region and no new entry is created. 1445 A comment MAY be added to the existing region subtag 1446 indicating the relationship to the new ISO 3166 code. 1448 E. For ISO 3166 codes, if the newly assigned code's meaning is 1449 associated with a UN M.49 code that is not represented by an 1450 existing region subtag, then the Language Subtag Reviewer, 1451 as described in Section 3.5, SHALL prepare a proposal for 1452 entering the appropriate UN M.49 country code as an entry in 1453 the IANA registry. 1455 F. For ISO 3166 codes, if there is no associated UN numeric 1456 code, then the Language Subtag Reviewer SHALL petition the 1457 UN to create one. If there is no response from the UN 1458 within ninety days of the request being sent, the Language 1459 Subtag Reviewer SHALL prepare a proposal for entering in the 1460 IANA registry as soon as practical a registered variant 1461 subtag as an alternate value for the new code. The form of 1462 the registered variant subtag will be at the discretion of 1463 the Language Subtag Reviewer and MUST conform to other 1464 restrictions on variant subtags in this document. This 1465 situation is very unlikely to ever occur. 1467 17. 
UN M.49 has codes for both countries and areas (such as '276' 1468 for Germany) and geographical regions and sub-regions (such as 1469 '150' for Europe). UN M.49 country or area codes for which 1470 there is no corresponding ISO 3166 code SHOULD NOT be 1471 registered, except as a surrogate for an ISO 3166 code that is 1472 blocked from registration by an existing subtag. If such a code 1473 becomes necessary, then the registration authority for ISO 3166 1474 SHOULD first be petitioned to assign a code to the region. If 1475 the petition for a code assignment by ISO 3166 is refused or not 1476 acted on in a timely manner, the registration process described 1477 in Section 3.5 MAY then be used to register the corresponding UN 1478 M.49 code. This way, UN M.49 codes remain available as the 1479 value of last resort in cases where ISO 3166 reassigns a 1480 deprecated value in the registry. 1482 18. Stability provisions apply to grandfathered tags with this 1483 exception: should it be possible to compose one of the 1484 grandfathered tags from registered subtags, then the field 1485 'Type' in that record is changed from 'grandfathered' to 1486 'redundant'. Note that this will not affect language tags that 1487 match the grandfathered tag, since these tags will now match 1488 valid generative subtag sequences. For example, this document 1489 caused the ISO 639-3 code 'gan', used in the redundant tag "zh- 1490 gan", to be registered as an extended language subtag. The 1491 formerly-grandfathered tag "zh-gan" became a redundant tag as a 1492 result (but existing content or implementations that use "zh- 1493 gan" remain valid). 1495 Note: The redundant and grandfathered entries together are the 1496 complete list of tags registered under [RFC3066]. The redundant tags 1497 are those that can now be formed using the subtags defined in the 1498 registry together with the rules of Section 2.2. 
The grandfathered entries include those that can never be legal under
those same provisions plus those tags that contain subtags not yet
registered or, perhaps, inappropriate for registration.

The set of redundant and grandfathered tags is permanent and stable:
new entries in this section MUST NOT be added and existing entries
MUST NOT be removed. Records of type 'grandfathered' MAY have their
type converted to 'redundant'; see item 18 in Section 3.4 for more
information. The decision-making process about which tags were
initially grandfathered and which were made redundant is described in
[RFC4645].

RFC 3066 tags that were deprecated prior to the adoption of [RFC4646]
are part of the list of grandfathered tags, and their component
subtags were not included as registered variants (although they
remain eligible for registration). For example, the tag "art-lojban"
was deprecated in favor of the language subtag 'jbo'.

3.5.  Registration Procedure for Subtags

The procedure given here MUST be used by anyone who wants to use a
subtag not currently in the IANA Language Subtag Registry.

Only subtags of type 'language' and 'variant' will be considered for
independent registration of new subtags. Subtags needed for
stability and subtags necessary to keep the registry synchronized
with ISO 639, ISO 15924, ISO 3166, and UN M.49 within the limits
defined by this document also use this process, as described in
Section 3.3. Stability provisions are described in Section 3.4.

This procedure MAY also be used to register or alter the information
for the 'Description', 'Comments', 'Deprecated', 'Prefix', or
'Suppress-Script' fields in a subtag's record as described in
Section 3.4. Changes to all other fields in the IANA registry are
NOT permitted.
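The rule in Section 3.4 (item 4) that a variant's 'Prefix' value may
only be modified in a way that broadens it can be checked
mechanically. The following is a minimal sketch in Python, not part of
the registration procedure: the function name is hypothetical, and the
subtag-by-subtag, case-insensitive comparison and the requirement that
the replacement be strictly shorter are assumptions of this sketch.

```python
def is_broadening_replacement(old_prefix: str, new_prefix: str) -> bool:
    """Return True if new_prefix is a proper prefix of old_prefix.

    Prefixes are compared subtag by subtag (split on '-') and without
    regard to case, so "en" broadens "en-US" but "en-Latn" does not.
    """
    old = old_prefix.lower().split("-")
    new = new_prefix.lower().split("-")
    # The replacement must be shorter and match the leading subtags.
    return len(new) < len(old) and old[:len(new)] == new
```

With the example from Section 3.4, replacing "en-US" by "en" passes
this check, while "en-Latn", "fr", and "en-US-boont" do not; each of
those would need to be registered as a new Prefix instead.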
Registering a new subtag or requesting modifications to an existing
tag or subtag starts with the requester filling out the registration
form reproduced below. Note that each response is not limited in
size so that the request can adequately describe the registration.
The fields in the "Record Requested" section SHOULD follow the
requirements in Section 3.1.

   LANGUAGE SUBTAG REGISTRATION FORM
   1. Name of requester:
   2. E-mail address of requester:
   3. Record Requested:

      Type:
      Subtag:
      Description:
      Prefix:
      Preferred-Value:
      Deprecated:
      Suppress-Script:
      Macrolanguage:
      Comments:

   4. Intended meaning of the subtag:
   5. Reference to published description
      of the language (book or article):
   6. Any other relevant information:

          Figure 4: The Language Subtag Registration Form

Examples of completed registration forms can be found in Appendix C
or online at http://www.iana.org/assignments/lang-subtags-templates/.

The subtag registration form MUST be sent to the ietf-languages list
for a two-week review period before it can be submitted to IANA. If
modifications are made to the request during the course of the
registration process (such as corrections to meet the requirements in
Section 3.1), the modified form MUST also be sent to the
ietf-languages list at least one week prior to submission to IANA.

Whenever an entry is created or modified in the registry, the
'File-Date' record at the start of the registry is updated to reflect
the most recent modification date in the [RFC3339] "full-date" format.

Before forwarding a new registration to IANA, the Language Subtag
Reviewer MUST ensure that values in the 'Subtag' field match case
according to the description in Section 3.1.

The ietf-languages list is an open list and can be joined by sending
a request to .
The list can be 1585 hosted by IANA or by any third party at the request of IESG. 1587 Some fields in both the registration form as well as the registry 1588 record itself permit the use of non-ASCII characters. Registration 1589 requests SHOULD use the UTF-8 encoding for consistency and clarity. 1591 However, since some mail clients do not support this encoding, other 1592 encodings MAY be used for the registration request. The Language 1593 Subtag Reviewer is responsible for ensuring that the proper Unicode 1594 characters appear in both the archived request form and the registry 1595 record. In the case of a transcription or encoding error by IANA, 1596 the Language Subtag Reviewer will request that the registry be 1597 repaired, providing any necessary information to assist IANA. 1599 Variant subtags are usually registered for use with a particular 1600 range of language tags. For example, the subtag 'rozaj' is intended 1601 for use with language tags that start with the primary language 1602 subtag "sl", since Resian is a dialect of Slovenian. Thus, the 1603 subtag 'rozaj' would be appropriate in tags such as "sl-Latn-rozaj" 1604 or "sl-IT-rozaj". This information is stored in the 'Prefix' field 1605 in the registry. Variant registration requests SHOULD include at 1606 least one 'Prefix' field in the registration form. 1608 Extended language subtags MUST include exactly one 'Prefix' field. 1610 The 'Prefix' field for a given registered subtag exists in the IANA 1611 registry as a guide to usage. Additional prefixes MAY be added by 1612 filing an additional registration form. In that form, the "Any other 1613 relevant information:" field MUST indicate that it is the addition of 1614 a prefix. 1616 Requests to add a prefix to a variant subtag that imply a different 1617 semantic meaning will probably be rejected. 
For example, a request to add the prefix "de" to the subtag 'nedis'
so that the tag "de-nedis" represented some German dialect would be
rejected. The 'nedis' subtag represents a particular Slovenian
dialect and the additional registration would change the semantic
meaning assigned to the subtag. A separate subtag SHOULD be proposed
instead.

The 'Description' field MUST contain a description of the tag being
registered written or transcribed into the Latin script; it MAY also
include a description in a non-Latin script. The 'Description' field
is used for identification purposes; it is not necessarily the actual
native name of the language or variation, nor is it required to be in
any particular language.

While the 'Description' field itself is not guaranteed to be stable
and errata corrections MAY be undertaken from time to time, attempts
to provide translations or transcriptions of entries in the registry
itself will probably be frowned upon by the community or rejected
outright, as changes of this nature have an impact on the provisions
in Section 3.4.

When the two-week period has passed, the Language Subtag Reviewer
MUST take one of the following actions:

   o  Explicitly accept the request and forward the form containing the
      record to be inserted or modified to iana@iana.org according to
      the procedure described in Section 3.3.

   o  Explicitly reject the request because of significant objections
      raised on the list or due to problems with constraints in this
      document (which MUST be explicitly cited).

   o  Extend the review period by granting an additional two-week
      increment to permit further discussion. After each two-week
      increment, the Language Subtag Reviewer MUST indicate on the list
      whether the registration has been accepted, rejected, or extended.
1654 Note that the Language Subtag Reviewer MAY raise objections on the 1655 list if he or she so desires. The important thing is that the 1656 objection MUST be made publicly. 1658 Sometimes the request needs to be modified as a result of discussion 1659 during the review period or due to requirements in this document. 1660 The applicant, Language Subtag Reviewer, or others are free to submit 1661 a modified version of the completed registration form, which will be 1662 considered in lieu of the original request with the explicit approval 1663 of the applicant. Such changes do not restart the two-week 1664 discussion period, although an application containing the final 1665 record submitted to IANA MUST appear on the list at least one week 1666 prior to the Language Subtag Reviewer forwarding the record to IANA. 1667 The applicant is also free to modify a rejected application with 1668 additional information and submit it again; this starts a new two- 1669 week comment period. 1671 Registrations initiated due to the provisions of Section 3.3 or 1672 Section 3.4 SHALL NOT be rejected altogether (since they have to 1673 ultimately appear in the registry) and SHOULD be completed as quickly 1674 as possible. The review process allows list members to comment on 1675 the specific information in the form and the record it contains and 1676 thus help ensure that it is correct and consistent. The Language 1677 Subtag Reviewer MAY reject a specific version of the form, but MUST 1678 include in the rejection a suitable replacement, extending the review 1679 period as described above, until the form is in a format worthy of 1680 reviewer's approval. 1682 Decisions made by the Language Subtag Reviewer MAY be appealed to the 1683 IESG [RFC2028] under the same rules as other IETF decisions 1684 [RFC2026]. This includes a decision to extend the review period or 1685 the failure to announce a decision in a clear and timely manner. 
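The timing rules in this section (a two-week review period, plus at
least one week on the list for the final version of the record) can be
combined into a single "earliest forwarding date". The sketch below is
illustrative only: the function and constant names are hypothetical,
it assumes the Language Subtag Reviewer grants no extensions, and it
counts whole calendar days.

```python
from datetime import date, timedelta

REVIEW_PERIOD = timedelta(weeks=2)      # initial review on the list
FINAL_FORM_NOTICE = timedelta(weeks=1)  # final record must be visible this long


def earliest_forward_date(first_posted: date, final_form_posted: date) -> date:
    """Earliest date on which the record could be forwarded to IANA.

    Both conditions have to hold, so the later of the two dates wins.
    A modified form does not restart the two-week period; it only adds
    the one-week notice requirement for the final record.
    """
    return max(first_posted + REVIEW_PERIOD,
               final_form_posted + FINAL_FORM_NOTICE)
```

For a request that is never modified, pass the same date for both
arguments; the two-week review period then determines the result.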
1687 The approved records appear in the Language Subtag Registry. The 1688 approved registration forms are available online under 1689 http://www.iana.org/assignments/lang-subtags-templates/. 1691 Updates or changes to existing records follow the same procedure as 1692 new registrations. The Language Subtag Reviewer decides whether 1693 there is consensus to update the registration following the two week 1694 review period; normally, objections by the original registrant will 1695 carry extra weight in forming such a consensus. 1697 Registrations are permanent and stable. Once registered, subtags 1698 will not be removed from the registry and will remain a valid way in 1699 which to specify a specific language or variant. 1701 Note: The purpose of the "Reference to published description" section 1702 in the registration form is to aid in verifying whether a language is 1703 registered or what language or language variation a particular subtag 1704 refers to. In most cases, reference to an authoritative grammar or 1705 dictionary of that language will be useful; in cases where no such 1706 work exists, other well-known works describing that language or in 1707 that language MAY be appropriate. The Language Subtag Reviewer 1708 decides what constitutes "good enough" reference material. This 1709 requirement is not intended to exclude particular languages or 1710 dialects due to the size of the speaker population or lack of a 1711 standardized orthography. Minority languages will be considered 1712 equally on their own merits. 1714 3.6. Possibilities for Registration 1716 Possibilities for registration of subtags or information about 1717 subtags include: 1719 o Primary language subtags for languages not listed in ISO 639 that 1720 are not variants of any listed or registered language MAY be 1721 registered. At the time this document was created, there were no 1722 examples of this form of subtag. 
Before attempting to register a 1723 language subtag, there MUST be an attempt to register the language 1724 with ISO 639. Subtags MUST NOT be registered for languages 1725 defined by codes that exist in ISO 639-1, ISO 639-2, or ISO 639-3, 1726 or that are under consideration by the ISO 639 registration 1727 authorities, or that have never been attempted for registration 1728 with those authorities. If ISO 639 has previously rejected a 1729 language for registration, it is reasonable to assume that there 1730 must be additional, very compelling evidence of need before it 1731 will be registered as a primary language subtag in the IANA 1732 registry (to the extent that it is very unlikely that any subtags 1733 will be registered of this type). 1735 o Dialect or other divisions or variations within a language, its 1736 orthography, writing system, regional or historical usage, 1737 transliteration or other transformation, or distinguishing 1738 variation MAY be registered as variant subtags. An example is the 1739 'rozaj' subtag (the Resian dialect of Slovenian). 1741 o The addition or maintenance of fields (generally of an 1742 informational nature) in Tag or Subtag records as described in 1743 Section 3.1 and subject to the stability provisions in 1744 Section 3.4. This includes descriptions, comments, deprecation 1745 and preferred values for obsolete or withdrawn codes, or the 1746 addition of script or extlang information to primary language 1747 subtags. 1749 o The addition of records and related field value changes necessary 1750 to reflect assignments made by ISO 639, ISO 15924, ISO 3166, and 1751 UN M.49 as described in Section 3.4. 1753 Subtags proposed for registration that would cause all or part of a 1754 grandfathered tag to become redundant but whose meaning conflicts 1755 with or alters the meaning of the grandfathered tag MUST be rejected. 
1757 This document leaves the decision on what subtags or changes to 1758 subtags are appropriate (or not) to the registration process 1759 described in Section 3.5. 1761 Note: four-character primary language subtags are reserved to allow 1762 for the possibility of alpha4 codes in some future addition to the 1763 ISO 639 family of standards. 1765 ISO 639 defines a maintenance agency for additions to and changes in 1766 the list of languages in ISO 639. This agency is: 1768 International Information Centre for Terminology (Infoterm) 1769 Aichholzgasse 6/12, AT-1120 1770 Wien, Austria 1771 Phone: +43 1 26 75 35 Ext. 312 Fax: +43 1 216 32 72 1773 ISO 639-2 defines a maintenance agency for additions to and changes 1774 in the list of languages in ISO 639-2. This agency is: 1776 Library of Congress 1777 Network Development and MARC Standards Office 1778 Washington, D.C. 20540 USA 1779 Phone: +1 202 707 6237 Fax: +1 202 707 0115 1780 URL: http://www.loc.gov/standards/iso639-2 1782 ISO 639-3 defines a maintenance agency for additions to and changes 1783 in the list of languages in ISO 639-3. This agency is: 1785 SIL International 1786 ISO 639-3 Registrar 1787 7500 W. Camp Wisdom Rd. 1788 Dallas, TX 75236 USA 1789 Phone: +1 972 708 7400, ext. 
2293 Fax: +1 972 708 7546 1790 Email: iso639-3@sil.org 1791 URL: http://www.sil.org/iso639-3 1793 The maintenance agency for ISO 3166 (country codes) is: 1795 ISO 3166 Maintenance Agency 1796 c/o International Organization for Standardization 1797 Case postale 56 1798 CH-1211 Geneva 20 Switzerland 1799 Phone: +41 22 749 72 33 Fax: +41 22 749 73 49 1800 URL: http://www.iso.org/iso/en/prods-services/iso3166ma/index.html 1802 The registration authority for ISO 15924 (script codes) is: 1804 Unicode Consortium Box 391476 1805 Mountain View, CA 94039-1476, USA 1806 URL: http://www.unicode.org/iso15924 1808 The Statistics Division of the United Nations Secretariat maintains 1809 the Standard Country or Area Codes for Statistical Use and can be 1810 reached at: 1812 Statistical Services Branch 1813 Statistics Division 1814 United Nations, Room DC2-1620 1815 New York, NY 10017, USA 1817 Fax: +1-212-963-0623 1818 E-mail: statistics@un.org 1819 URL: http://unstats.un.org/unsd/methods/m49/m49alpha.htm 1821 3.7. Extensions and Extensions Registry 1823 Extension subtags are those introduced by single-character subtags 1824 ("singletons") other than 'x'. They are reserved for the generation 1825 of identifiers that contain a language component and are compatible 1826 with applications that understand language tags. 1828 The structure and form of extensions are defined by this document so 1829 that implementations can be created that are forward compatible with 1830 applications that might be created using singletons in the future. 1832 In addition, defining a mechanism for maintaining singletons will 1833 lend stability to this document by reducing the likely need for 1834 future revisions or updates. 1836 Single-character subtags are assigned by IANA using the "IETF 1837 Consensus" policy defined by [RFC2434]. This policy requires the 1838 development of an RFC, which SHALL define the name, purpose, 1839 processes, and procedures for maintaining the subtags. 
The 1840 maintaining or registering authority, including name, contact email, 1841 discussion list email, and URL location of the registry, MUST be 1842 indicated clearly in the RFC. The RFC MUST specify or include each 1843 of the following: 1845 o The specification MUST reference the specific version or revision 1846 of this document that governs its creation and MUST reference this 1847 section of this document. 1849 o The specification and all subtags defined by the specification 1850 MUST follow the ABNF and other rules for the formation of tags and 1851 subtags as defined in this document. In particular, it MUST 1852 specify that case is not significant and that subtags MUST NOT 1853 exceed eight characters in length. 1855 o The specification MUST specify a canonical representation. 1857 o The specification of valid subtags MUST be available over the 1858 Internet and at no cost. 1860 o The specification MUST be in the public domain or available via a 1861 royalty-free license acceptable to the IETF and specified in the 1862 RFC. 1864 o The specification MUST be versioned, and each version of the 1865 specification MUST be numbered, dated, and stable. 1867 o The specification MUST be stable. That is, extension subtags, 1868 once defined by a specification, MUST NOT be retracted or change 1869 in meaning in any substantial way. 1871 o The specification MUST include in a separate section the 1872 registration form reproduced in this section (below) to be used in 1873 registering the extension upon publication as an RFC. 1875 o IANA MUST be informed of changes to the contact information and 1876 URL for the specification. 1878 IANA will maintain a registry of allocated single-character 1879 (singleton) subtags. This registry MUST use the record-jar format 1880 described by the ABNF in Section 3.1. 
Upon publication of an extension as an RFC, the maintaining authority
defined in the RFC MUST forward this registration form to
iesg@ietf.org, who MUST forward the request to iana@iana.org. The
maintaining authority of the extension MUST maintain the accuracy of
the record by sending an updated full copy of the record to
iana@iana.org with the subject line "LANGUAGE TAG EXTENSION UPDATE"
whenever content changes. Only the 'Comments', 'Contact_Email',
'Mailing_List', and 'URL' fields MAY be modified in these updates.

Failure to maintain this record, maintain the corresponding registry,
or meet other conditions imposed by this section of this document MAY
be appealed to the IESG [RFC2028] under the same rules as other IETF
decisions (see [RFC2026]) and MAY result in the authority to maintain
the extension being withdrawn or reassigned by the IESG.

   %%
   Identifier:
   Description:
   Comments:
   Added:
   RFC:
   Authority:
   Contact_Email:
   Mailing_List:
   URL:
   %%

   Figure 5: Format of Records in the Language Tag Extensions Registry

'Identifier' contains the single-character subtag (singleton)
assigned to the extension. The Internet-Draft submitted to define
the extension SHOULD specify which letter or digit to use, although
the IESG MAY change the assignment when approving the RFC.

'Description' contains the name and description of the extension.

'Comments' is an OPTIONAL field and MAY contain a broader description
of the extension.

'Added' contains the date the RFC was published in the "full-date"
format specified in [RFC3339]. For example: 2004-06-28 represents
June 28, 2004, in the Gregorian calendar.

'RFC' contains the RFC number assigned to the extension.

'Authority' contains the name of the maintaining authority for the
extension.
'Contact_Email' contains the email address used to contact the
maintaining authority.

'Mailing_List' contains the URL or subscription email address of the
mailing list used by the maintaining authority.

'URL' contains the URL of the registry for this extension.

The determination of whether an Internet-Draft meets the above
conditions and the decision to grant or withhold such authority rests
solely with the IESG and is subject to the normal review and appeals
process associated with the RFC process.

Extension authors are strongly cautioned that many (including most
well-formed) processors will be unaware of any special relationships
or meaning inherent in the order of extension subtags. Extension
authors SHOULD avoid subtag relationships or canonicalization
mechanisms that interfere with matching or with length restrictions
that sometimes exist in common protocols where the extension is used.
In particular, applications MAY truncate the subtags in doing
matching or in fitting into limited lengths, so it is RECOMMENDED
that the most significant information be in the most significant
(left-most) subtags and that the specification gracefully handle
truncated subtags.

When a language tag is to be used in a specific, known protocol, it
is RECOMMENDED that the language tag not contain extensions not
supported by that protocol. In addition, note that some protocols
MAY impose upper limits on the length of the strings used to store or
transport the language tag.

3.8.  Update of the Language Subtag Registry

Upon adoption of this document the IANA Language Subtag Registry will
need an update so that it contains the complete set of subtags valid
in a language tag. This collection of subtags, along with a
description of the process used to create it, is described by
[registry-update].
IANA will publish the updated version of the 1966 registry described by this document using the instructions and 1967 content of [registry-update]. Once published by IANA, the 1968 maintenance procedures, rules, and registration processes described 1969 in this document will be available for new registrations or updates. 1971 Registrations that are in process under the rules defined in 1972 [RFC4646] when this document is adopted MUST be completed under the 1973 rules contained in this document. 1975 4. Formation and Processing of Language Tags 1977 This section addresses how to use the information in the registry 1978 with the tag syntax to choose, form, and process language tags. 1980 4.1. Choice of Language Tag 1982 The guiding principle in forming language tags is to "tag content 1983 wisely." Sometimes there is a choice between several possible tags 1984 for the same content. The choice of which tag to use depends on the 1985 content and application in question and some amount of judgment might 1986 be necessary when selecting a tag. 1988 Interoperability is best served when the same language tag is used 1989 consistently to represent the same language. If an application has 1990 requirements that make the rules here inapplicable, then that 1991 application risks damaging interoperability. It is strongly 1992 RECOMMENDED that users not define their own rules for language tag 1993 choice. 1995 A subtag SHOULD only be used when it adds useful distinguishing 1996 information to the tag. Extraneous subtags interfere with the 1997 meaning, understanding, and processing of language tags. In 1998 particular, users and implementations SHOULD follow the 'Prefix' and 1999 'Suppress-Script' fields in the registry (defined in Section 3.1): 2000 these fields provide guidance on when specific additional subtags 2001 SHOULD be used or avoided in a language tag. 
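As an illustration of the 'Suppress-Script' guidance, the following minimal sketch drops a script subtag when it matches the value recorded for the primary language. The table SUPPRESS_SCRIPT and the function name drop_suppressed_script are hypothetical and exist only for this example; a real implementation would load the values from the registry. The sketch also assumes that the script subtag, if present, directly follows the primary language subtag (that is, no extended language subtags are present).

```python
# Hypothetical excerpt of 'Suppress-Script' values, for illustration only;
# a real implementation would read these from the IANA registry.
SUPPRESS_SCRIPT = {"en": "Latn", "de": "Latn", "ru": "Cyrl"}


def drop_suppressed_script(tag: str) -> str:
    """Remove the script subtag when the registry marks it as redundant."""
    subtags = tag.split("-")
    if len(subtags) < 2:
        return tag
    language, candidate = subtags[0].lower(), subtags[1]
    # Assumption: a four-letter subtag in second position is a script subtag.
    is_script = len(candidate) == 4 and candidate.isalpha()
    suppressed = SUPPRESS_SCRIPT.get(language, "")
    # Subtag comparisons are case-insensitive.
    if is_script and candidate.lower() == suppressed.lower():
        del subtags[1]
    return "-".join(subtags)
```

Under these assumptions, "en-Latn-US" would be reduced to "en-US", while "zh-Hant-TW" would be left unchanged because no suppression is recorded for 'zh' in the sample table.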
2003 Some applications can benefit from the use of script subtags in 2004 language tags, as long as the use is consistent for a given context. 2005 Script subtags are never appropriate for unwritten content (such as 2006 audio recordings). 2008 Script subtags were not formally defined in [RFC3066] and their use 2009 can affect matching and subtag identification for implementations of 2010 RFC 3066, as these subtags appear between the primary language and 2011 region subtags. For example, if an implementation selects content 2012 using Basic Filtering [RFC4647] (originally described in Section 2.5 2013 of [RFC3066]) and the user requested the language range "en-US", 2014 content labeled "en-Latn-US" will not match the request and thus not 2015 be selected. Therefore, it is important to know when script subtags 2016 will customarily be used and when they ought not be used. In the 2017 registry, the Suppress-Script field helps ensure greater 2018 compatibility between the language tags by defining when users SHOULD 2019 NOT include a script subtag with a particular primary language 2020 subtag. 2022 Extended language subtags (type 'extlang' in the registry; see 2023 Section 3.1) also appear between the primary language and subsequent 2024 (script, region, or variant) subtags. In most cases, use the 2025 Macrolanguage (indicated by the Prefix) by itself to form the 2026 language tag in preference to including the extended language subtag. 2027 Only use the extended language subtag if it adds useful 2028 distinguishing information to the tag within your application. 2030 The choice of subtags used to form a language tag SHOULD be guided by 2031 the following rules: 2033 1. Use as precise a tag as possible, but no more specific than is 2034 justified. Avoid using subtags that are not important for 2035 distinguishing content in an application.
2037 * For example, 'de' might suffice for tagging an email written 2038 in German, while "de-CH-1996" is probably unnecessarily 2039 precise for such a task. 2041 2. The script subtag SHOULD NOT be used to form language tags unless 2042 the script adds some distinguishing information to the tag. The 2043 field 'Suppress-Script' in the primary language record in the 2044 registry indicates script subtags that do not add distinguishing 2045 information for most applications. For example: 2047 * The subtag 'Latn' should not be used with the primary language 2048 'en' because nearly all English documents are written in the 2049 Latin script and it adds no distinguishing information. 2050 However, if a document were written in English mixing Latin 2051 script with another script such as Braille ('Brai'), then it 2052 might be appropriate to choose to indicate both scripts to aid 2053 in content selection, such as the application of a style 2054 sheet. 2056 * When labeling content that is unwritten (such as a recording 2057 of human speech), the script subtag should not be used, even 2058 if the language is customarily written in several scripts. 2059 Thus the subtitles to a movie might use the tag "zh-cmn-Hant" 2060 (Chinese, Mandarin, Traditional script), but the audio track 2061 for the same language would be tagged "zh-cmn". 2063 3. If a tag or subtag has a 'Preferred-Value' field in its registry 2064 entry, then the value of that field SHOULD be used to form the 2065 language tag in preference to the tag or subtag in which the 2066 preferred value appears. 2068 * For example, use 'he' for Hebrew in preference to 'iw'. 2070 4. [ISO639-2] has defined several codes included in the subtag 2071 registry that require additional care when choosing language 2072 tags. In most of these cases, where omitting the language tag is 2073 permitted, such omission is preferable to using these codes. 
2074 Language tags SHOULD NOT incorporate these subtags as a prefix, 2075 unless the additional information conveys some value to the 2076 application. 2078 1. Use specific language subtags or subtag sequences in 2079 preference to subtags for language collections. A "language 2080 collection" is a subtag derived from one of the [ISO639-2] 2081 codes that represents multiple related languages. These 2082 codes are included as primary language subtags in the 2083 registry. For example, the code 'cmc' represents "Chamic 2084 languages". The registry contains values for each of the 2085 approximately ten individual languages represented by this 2086 collective code. Other examples include the subtags for 2087 Germanic languages ('gem') and Algonquian languages ('alg'). 2088 Since these codes are interpreted inclusively, content tagged 2089 with "en" (English), "de" (German), or "gsw" (Swiss German, 2090 Alemannic) could also (but SHOULD NOT) be tagged with "gem" 2091 (Germanic languages). Subtags derived from collection codes 2092 SHOULD NOT be used unless more specific language 2093 information is not available. Note that matching 2094 implementations generally do not understand the relationship 2095 between the collection and its encompassed languages, and so 2096 users ought not assume a subtag based on a language 2097 collection is a useful means for selecting content in its 2098 encompassed languages. 2100 2. The 'mul' (Multiple) primary language subtag identifies 2101 content in multiple languages. It SHOULD NOT be used when a 2102 list of languages (such as Content-Language) or individual 2103 tags for each content element can be used instead. 2105 3. The 'und' (Undetermined) primary language subtag identifies 2106 linguistic content whose language is not known. It SHOULD 2107 NOT be used unless a language tag is required and language 2108 information is not available or cannot be determined.
2109 Omitting the language tag (where permitted) is preferred. 2110 The 'und' subtag MAY be useful for protocols that require a 2111 language tag to be provided or where a primary language 2112 subtag is required (such as in "und-Latn"). The 'und' subtag 2113 MAY also be useful when matching language tags in certain 2114 situations. 2116 4. The 'zxx' (Non-Linguistic) primary language subtag identifies 2117 content that has no language. Some examples might include 2118 instrumental or electronic music; sound recordings consisting 2119 of nonverbal sounds; audiovisual materials with no narration, 2120 printed titles, or subtitles; machine-readable data files 2121 consisting of machine languages or character codes; or 2122 programming source code. Note: where there are fragments of 2123 linguistic content, such as programming source code 2124 containing comments written in English, the subtag 'zxx' 2125 might still be used to indicate the primary status of the 2126 content, just as 'en' can be applied to a predominantly 2127 English text that contains a few French phrases. 2129 5. The 'mis' (Uncoded) primary language subtag identifies 2130 content whose language is known but which does not currently 2131 have a corresponding subtag. This subtag SHOULD NOT be used. 2132 Because the addition of other codes in the future can render 2133 its application invalid, it is inherently unstable and hence 2134 incompatible with the stability goals of BCP 47. It is 2135 always preferable to use other subtags: either 'und' or (with 2136 prior agreement) private use subtags. 2138 6. The grandfathered tag "i-default" (Default Language) was 2139 originally registered according to [RFC1766] to meet the 2140 needs of [RFC2277]. It is used to indicate not a specific 2141 language, but rather, it identifies the condition or content 2142 used where the language preferences of the user cannot be 2143 established. 
It SHOULD NOT be used except as a means of 2144 labeling the default content for applications or protocols 2145 that require default language content to be labeled with that 2146 specific tag. It MAY also be used by an application or 2147 protocol to identify when the default language content is 2148 being returned. 2150 5. The same variant subtag MUST NOT be used more than once within a 2151 language tag. 2153 * For example, the tag "de-DE-1901-1901" is not valid. 2155 Languages with a Macrolanguage field in the registry sometimes can be 2156 usefully referenced using their Macrolanguage. However, the 2157 Macrolanguage field doesn't define what the relationship is between 2158 the language subtag whose record it appears in and its encompassed 2159 language or languages. Nor does it define how the encompassed 2160 languages are related to one another. In some cases, the 2161 Macrolanguage has a standard form as well as a variety of less-common 2162 dialects. In other cases there is no particular standard form and 2163 the encompassed subtags describe specific variations within the 2164 parent language. 2166 Applications MAY use Macrolanguage information to improve matching or 2167 language negotiation. For example, the information that 'sr' 2168 (Serbian) and 'hr' (Croatian) share a Macrolanguage expresses a 2169 closer relation between those languages than between, say, 'sr' 2170 (Serbian) and 'mk' (Macedonian). It is valid to use the encompassed 2171 language or just its Macrolanguage to form language tags. However, 2172 many matching applications will not be aware of the relationship 2173 between the languages. Care in selecting which subtags are used is 2174 crucial to interoperability. In general, use the most specific tag. 2175 However, where the standard form of an encompassed language is 2176 captured by the Macrolanguage, the Macrolanguage SHOULD be used in 2177 preference to one of its sublanguages unless there is a specific 2178 reason not to.
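One way an application might apply Macrolanguage information during matching is sketched below. This is an illustrative sketch only: the MACROLANGUAGE table is a hypothetical excerpt that stands in for the registry's 'Macrolanguage' fields, and the function name closely_related is invented for this example.

```python
# Hypothetical excerpt of 'Macrolanguage' fields, for illustration only;
# a real implementation would read these from the IANA registry.
MACROLANGUAGE = {"sr": "sh", "hr": "sh", "bs": "sh", "yue": "zh", "cmn": "zh"}


def closely_related(tag_a: str, tag_b: str) -> bool:
    """Report whether two tags share a primary language or a Macrolanguage."""
    primary_a = tag_a.split("-")[0].lower()
    primary_b = tag_b.split("-")[0].lower()
    # A language with no Macrolanguage entry forms a group by itself, and a
    # Macrolanguage groups with the languages it encompasses.
    group_a = MACROLANGUAGE.get(primary_a, primary_a)
    group_b = MACROLANGUAGE.get(primary_b, primary_b)
    return group_a == group_b
```

With the sample table, closely_related("sr", "hr") holds, while closely_related("sr", "fr") does not. Such a test says nothing about mutual intelligibility; it only identifies candidates that a matching process might rank more highly.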
2180 In particular, the Chinese family of languages calls for special 2181 consideration. Because the written form is very similar for most 2182 languages having 'zh' as a Macrolanguage (and because historically 2183 subtags for the various sub-languages and dialects were not 2184 available), languages such as 'yue' (Cantonese) have usually used 2185 tags beginning with the subtag 'zh'. This means that Macrolanguage 2186 information can be usefully applied when searching for content or 2187 when providing fallbacks in language negotiation. For example, the 2188 information that 'yue' has a Macrolanguage of 'zh' could be used in 2189 the Lookup algorithm to fall back from a request for "yue-Hans-CN" to 2190 "zh-Hans-CN" without losing the script and region information (even 2191 though the user did not specify "zh-Hans-CN" in their request). 2193 To ensure consistent backward compatibility, this document contains 2194 several provisions to account for potential instability in the 2195 standards used to define the subtags that make up language tags. 2196 These provisions mean that no language tag created under the rules in 2197 this document will become invalid. 2199 Standards, protocols, and applications that reference this document 2200 normatively but apply different rules to the ones given in this 2201 section MUST specify how language tag selection varies from the 2202 guidelines given here. 2204 4.2. Meaning of the Language Tag 2206 The meaning of a language tag is related to the meaning of the 2207 subtags that it contains. Each subtag, in turn, implies a certain 2208 range of expectations one might have for related content, although it 2209 is not a guarantee. For example, the use of a script subtag such as 2210 'Arab' (Arabic script) does not mean that the content contains only 2211 Arabic characters. It does mean that the language involved is 2212 predominantly in the Arabic script.
Thus a language tag and its 2213 subtags can encompass a very wide range of variation and yet remain 2214 valid in each particular instance. 2216 Validity of a tag is not everything. While every valid tag has a 2217 meaning, it might not represent any real-world language usage. This 2218 is unavoidable in a system in which subtags can be combined freely. 2219 For example, tags such as "ar-Cyrl-CO" (Arabic, Cyrillic script, as 2220 used in Colombia) or "tlh-Kore-AQ-fonipa" (Klingon, Korean script, 2221 as used in Antarctica, IPA phonetic transcription) are both valid and 2222 unlikely to represent a useful combination of language attributes. 2224 The relationship between the tag and the information it identifies is 2225 defined by the context in which the tag appears. Accordingly, this 2226 section gives only possible examples of its usage. 2228 o For a single information object, the associated language tags 2229 might be interpreted as the set of languages that is necessary for 2230 a complete comprehension of the complete object. Example: Plain 2231 text documents. 2233 o For an aggregation of information objects, the associated language 2234 tags could be taken as the set of languages used inside components 2235 of that aggregation. Examples: Document stores and libraries. 2237 o For information objects whose purpose is to provide alternatives, 2238 the associated language tags could be regarded as a hint that the 2239 content is provided in several languages and that one has to 2240 inspect each of the alternatives in order to find its language or 2241 languages. In this case, the presence of multiple tags might not 2242 mean that one needs to be multi-lingual to get complete 2243 understanding of the document. Example: MIME multipart/ 2244 alternative. 2246 o In markup languages, such as HTML and XML, language information 2247 can be added to each part of the document identified by the markup 2248 structure (including the whole document itself).
For example, one 2249 could write <span lang="fr">C'est la vie.</span> inside a 2250 Norwegian document; the Norwegian-speaking user could then access 2251 a French-Norwegian dictionary to find out what the marked section 2252 meant. If the user were listening to that document through a 2253 speech synthesis interface, this information could be used to signal 2254 the synthesizer to appropriately apply French text-to-speech 2255 pronunciation rules to that span of text, instead of applying the 2256 inappropriate Norwegian rules. 2258 Language tags are related when they contain a similar sequence of 2259 subtags. For example, if a language tag B contains language tag A as 2260 a prefix, then B is typically "narrower" or "more specific" than A. 2261 Thus, "zh-Hant-TW" is more specific than "zh-Hant". 2263 This relationship is not guaranteed in all cases: specifically, 2264 languages that begin with the same sequence of subtags are NOT 2265 guaranteed to be mutually intelligible, although they might be. For 2266 example, the tag "az" shares a prefix with both "az-Latn" 2267 (Azerbaijani written using the Latin script) and "az-Cyrl" 2268 (Azerbaijani written using the Cyrillic script). A person fluent in 2269 one script might not be able to read the other, even though the text 2270 might be identical. Content tagged as "az" most probably is written 2271 in just one script and thus might not be intelligible to a reader 2272 familiar with the other script. 2274 4.3. Length Considerations 2276 There is no defined upper limit on the size of language tags. While 2277 historically most language tags have consisted of language and region 2278 subtags with a combined total length of up to six characters, larger 2279 tags have always been both possible and actually appeared in use. 2281 Neither the language tag syntax nor other requirements in this 2282 document impose a fixed upper limit on the number of subtags in a 2283 language tag (and thus an upper bound on the size of a tag).
The 2284 language tag syntax suggests that, depending on the specific 2285 language, more subtags (and thus a longer tag) are sometimes 2286 necessary to completely identify the language for certain 2287 applications; thus, it is possible to envision long or complex subtag 2288 sequences. 2290 4.3.1. Working with Limited Buffer Sizes 2292 Some applications and protocols are forced to allocate fixed buffer 2293 sizes or otherwise limit the length of a language tag. A conformant 2294 implementation or specification MAY refuse to support the storage of 2295 language tags that exceed a specified length. Any such limitation 2296 SHOULD be clearly documented, and such documentation SHOULD include 2297 what happens to longer tags (for example, whether an error value is 2298 generated or the language tag is truncated). A protocol that allows 2299 tags to be truncated at an arbitrary limit, without giving any 2300 indication of what that limit is, has the potential for causing harm 2301 by changing the meaning of tags in substantial ways. 2303 In practice, most language tags do not require more than a few 2304 subtags and will not approach reasonably sized buffer limitations; 2305 see Section 4.1. 2307 Some specifications or protocols have limits on tag length but do not 2308 have a fixed length limitation. For example, [RFC2231] has no 2309 explicit length limitation: the length available for the language tag 2310 is constrained by the length of other header components (such as the 2311 charset's name) coupled with the 76-character limit in [RFC2047]. 2312 Thus, the "limit" might be 50 or more characters, but it could 2313 potentially be quite small. 2315 The considerations for assigning a buffer limit are: 2317 Implementations SHOULD NOT truncate language tags unless the 2318 meaning of the tag is purposefully being changed, or unless the 2319 tag does not fit into a limited buffer size specified by a 2320 protocol for storage or transmission. 
2322 Implementations SHOULD warn the user when a tag is truncated since 2323 truncation changes the semantic meaning of the tag. 2325 Implementations of protocols or specifications that are space 2326 constrained but do not have a fixed limit SHOULD use the longest 2327 possible tag in preference to truncation. 2329 Protocols or specifications that specify limited buffer sizes for 2330 language tags MUST allow for language tags of up to 33 characters. 2332 Protocols or specifications that specify limited buffer sizes for 2333 language tags SHOULD allow for language tags of at least 42 2334 characters. 2336 The following illustration shows how the 42-character recommendation 2337 was derived. The combination of language and extended language 2338 subtags was chosen for future compatibility. At up to 15 characters, 2339 this combination is longer than the longest possible primary language 2340 subtag (8 characters): 2342 language = 3 (ISO 639-2; ISO 639-1 requires 2) 2343 extlang1 = 4 (each subsequent subtag includes '-') 2344 extlang2 = 4 (unlikely: needs prefix="language-extlang1") 2345 extlang3 = 4 (extremely unlikely) 2346 script = 5 (if not suppressed: see Section 4.1) 2347 region = 4 (UN M.49; ISO 3166 requires 3) 2348 variant1 = 9 (needs 'language' as a prefix) 2349 variant2 = 9 (needs 'language-variant1' as a prefix) 2351 total = 42 characters 2353 Figure 6: Derivation of the Limit on Tag Length 2355 4.3.2. Truncation of Language Tags 2357 Truncation of a language tag alters the meaning of the tag, and thus 2358 SHOULD be avoided. However, truncation of language tags is sometimes 2359 necessary due to limited buffer sizes. Such truncation MUST NOT 2360 permit a subtag to be chopped off in the middle or the formation of 2361 invalid tags (for example, one ending with the "-" character). 
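A truncation routine that respects these constraints might look like the following minimal sketch. The function name truncate_tag and the choice to raise an error when no subtag fits are assumptions made for this example and are not defined by this document. The sketch removes whole subtags from the right and also drops any single-character subtag left at the end, so that the result never ends with "-" or with a dangling singleton.

```python
def truncate_tag(tag: str, limit: int) -> str:
    """Shorten a tag to at most `limit` characters by dropping whole subtags."""
    subtags = tag.split("-")
    while subtags and len("-".join(subtags)) > limit:
        # Remove the right-most subtag together with its preceding "-".
        subtags.pop()
        # A single-character subtag (such as 'a' or 'x') cannot end a tag.
        while subtags and len(subtags[-1]) == 1:
            subtags.pop()
    if not subtags:
        # Assumption for this sketch: report failure rather than return "".
        raise ValueError("no subtag of %r fits in %d characters" % (tag, limit))
    return "-".join(subtags)
```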
2363 This means that applications or protocols that truncate tags MUST do 2364 so by progressively removing subtags along with their preceding "-" 2365 from the right side of the language tag until the tag is short enough 2366 for the given buffer. If the resulting tag ends with a single- 2367 character subtag, that subtag and its preceding "-" MUST also be 2368 removed. For example: 2370 Tag to truncate: zh-Latn-CN-variant1-a-extend1-x-wadegile-private1 2371 1. zh-Latn-CN-variant1-a-extend1-x-wadegile 2372 2. zh-Latn-CN-variant1-a-extend1 2373 3. zh-Latn-CN-variant1 2374 4. zh-Latn-CN 2375 5. zh-Latn 2376 6. zh 2378 Figure 7: Example of Tag Truncation 2380 4.4. Canonicalization of Language Tags 2382 Since a particular language tag is sometimes used by many processes, 2383 language tags SHOULD always be created or generated in a canonical 2384 form. 2386 A language tag is in canonical form when: 2388 1. The tag is well-formed according to the rules in Section 2.1 and 2389 Section 2.2. 2391 2. Subtags of type 'Region' that have a Preferred-Value mapping in 2392 the IANA registry (see Section 3.1) SHOULD be replaced with their 2393 mapped value. Note: In rare cases, the mapped value will also 2394 have a Preferred-Value. 2396 3. Redundant or grandfathered tags that have a Preferred-Value 2397 mapping in the IANA registry (see Section 3.1) MUST be replaced 2398 with their mapped value. These items either are deprecated 2399 mappings created before the adoption of this document (such as 2400 the mapping of "no-nyn" to "nn" or "i-klingon" to "tlh") or are 2401 the result of later registrations or additions to this document 2402 (for example, "zh-hakka" was deprecated in favor of the language- 2403 extlang combination "zh-hak" when this document was adopted). 2405 4. Other subtags that have a Preferred-Value mapping in the IANA 2406 registry (see Section 3.1) MUST be replaced with their mapped 2407 value.
These items consist entirely of clerical corrections to 2408 ISO 639-1 in which the deprecated subtags have been maintained 2409 for compatibility purposes. 2411 5. If more than one extension subtag sequence exists, the extension 2412 sequences are ordered into case-insensitive ASCII order by 2413 singleton subtag. 2415 Example: The language tag "en-A-aaa-B-ccc-bbb-x-xyz" is in canonical 2416 form, while "en-B-ccc-bbb-A-aaa-X-xyz" is well-formed but not in 2417 canonical form. 2419 Example: The language tag "en-BU" (English as used in Burma) is not 2420 canonical because the 'BU' subtag has a canonical mapping to 'MM' 2421 (Myanmar), although the tag "en-BU" maintains its validity. 2423 Canonicalization of language tags does not imply anything about the 2424 use of upper or lowercase letters when processing or comparing 2425 subtags (and as described in Section 2.1). All comparisons MUST be 2426 performed in a case-insensitive manner. 2428 When performing canonicalization of language tags, processors MAY 2429 regularize the case of the subtags (that is, this process is 2430 OPTIONAL), following the case used in the registry. Note that this 2431 corresponds to the following casing rules: uppercase all non-initial 2432 two-letter subtags; titlecase all non-initial four-letter subtags; 2433 lowercase everything else. 2435 Note: Case folding of ASCII letters in certain locales, unless 2436 carefully handled, sometimes produces non-ASCII character values. 2437 The Unicode Character Database file "SpecialCasing.txt" defines the 2438 specific cases that are known to cause problems with this. In 2439 particular, the letter 'i' (U+0069) in Turkish and Azerbaijani is 2440 uppercased to U+0130 (LATIN CAPITAL LETTER I WITH DOT ABOVE). 2441 Implementers SHOULD specify a locale-neutral casing operation to 2442 ensure that case folding of subtags does not produce this value, 2443 which is illegal in language tags. 
For example, if one were to 2444 uppercase the region subtag 'in' using Turkish locale rules, the 2445 sequence U+0130 U+004E would result instead of the expected 'IN'. 2447 Note: if the field 'Deprecated' appears in a registry record without 2448 an accompanying 'Preferred-Value' field, then that tag or subtag is 2449 deprecated without a replacement. Validating processors SHOULD NOT 2450 generate tags that include these values, although the values are 2451 canonical when they appear in a language tag. 2453 An extension MUST define any relationships that exist between the 2454 various subtags in the extension and thus MAY define an alternate 2455 canonicalization scheme for the extension's subtags. Extensions MAY 2456 define how the order of the extension's subtags is interpreted. For 2457 example, an extension could define that its subtags are in canonical 2458 order when the subtags are placed into ASCII order: that is, "en-a- 2459 aaa-bbb-ccc" instead of "en-a-ccc-bbb-aaa". Another extension might 2460 define that the order of the subtags influences their semantic 2461 meaning (so that "en-b-ccc-bbb-aaa" has a different value from "en-b- 2462 aaa-bbb-ccc"). However, extension specifications SHOULD be designed 2463 so that they are tolerant of the typical processes described in 2464 Section 3.7. 2466 4.5. Considerations for Private Use Subtags 2468 Private use subtags, like all other subtags, MUST conform to the 2469 format and content constraints in the ABNF. Private use subtags have 2470 no meaning outside the private agreement between the parties that 2471 intend to use or exchange language tags that employ them. The same 2472 subtags MAY be used with a different meaning under a separate private 2473 agreement. They SHOULD NOT be used where alternatives exist and 2474 SHOULD NOT be used in content or protocols intended for general use. 2476 Private use subtags are simply useless for information exchange 2477 without prior arrangement.
The value and semantic meaning of private 2478 use tags and of the subtags used within such a language tag are not 2479 defined by this document. 2481 Subtags defined in the IANA registry as having a specific private use 2482 meaning convey more information than a purely private use tag 2483 prefixed by the singleton subtag 'x'. For applications, this 2484 additional information MAY be useful. 2486 For example, the region subtags 'AA', 'ZZ', and in the ranges 2487 'QM'-'QZ' and 'XA'-'XZ' (derived from ISO 3166 private use codes) MAY 2488 be used to form a language tag. A tag such as "zh-Hans-XQ" conveys a 2489 great deal of public, interchangeable information about the language 2490 material (that it is Chinese in the simplified Chinese script and is 2491 suitable for some geographic region 'XQ'). While the precise 2492 geographic region is not known outside of private agreement, the tag 2493 conveys far more information than an opaque tag such as "x-someLang", 2494 which contains no information about the language subtag or script 2495 subtag outside of the private agreement. 2497 However, in some cases content tagged with private use subtags MAY 2498 interact with other systems in a different and possibly unsuitable 2499 manner compared to tags that use opaque, privately defined subtags, 2500 so the choice of the best approach sometimes depends on the 2501 particular domain in question. 2503 5. IANA Considerations 2505 This section deals with the processes and requirements necessary for 2506 IANA to undertake to maintain the subtag and extension registries as 2507 defined by this document and in accordance with the requirements of 2508 [RFC2434]. 2510 The impact on the IANA maintainers of the two registries defined by 2511 this document will be a small increase in the frequency of new 2512 entries or updates. 2514 5.1.
Language Subtag Registry 2516 Upon adoption of this document, IANA will update the registry using 2517 instructions and content provided in a companion document: 2518 [registry-update]. The criteria and process for selecting the 2519 updated set of records are described in that document. The updated 2520 set of records represents no impact on IANA, since the work to create 2521 it will be performed externally. 2523 Future work on the Language Subtag Registry has been limited to 2524 inserting or replacing whole records preformatted for IANA by the 2525 Language Subtag Reviewer as described in Section 3.3 of this document 2526 and archiving and making publicly available the forwarded 2527 registration form. 2529 Each registration form sent to IANA contains a single record for 2530 incorporation into the registry. The form MUST be sent to 2531 iana@iana.org by the Language Subtag Reviewer. It will have a 2532 subject line indicating whether the enclosed form represents an 2533 insertion of a new record (indicated by the word "INSERT" in the 2534 subject line) or a replacement of an existing record (indicated by 2535 the word "MODIFY" in the subject line). Records MUST NOT be deleted 2536 from the registry. 2538 IANA MUST extract the record from the form and place the inserted or 2539 modified record into the appropriate section of the language subtag 2540 registry, grouping the records by their 'Type' field. Inserted 2541 records MAY be placed anywhere in the appropriate section; there is 2542 no guarantee of the order of the records beyond grouping them 2543 together by 'Type'. Modified records MUST overwrite the record they 2544 replace. 2546 IANA MUST update the File-Date record to contain the most recent 2547 modification date when performing any insertion or modification: 2548 included in any request to insert or modify records will be a new 2549 File-Date record indicating the acceptance date of the record.
This 2550 record MUST be placed first in the registry, replacing the existing 2551 File-Date record. In the event that the File-Date record present in 2552 the registry has a later date than the record being inserted or 2553 modified, then the latest (most recent) record MUST be preserved. 2554 IANA SHOULD process multiple registration requests in order according 2555 to the File-Date in the form, since one registration could otherwise 2556 cause a more recent change to be overwritten. 2558 The updated registry file MUST use the UTF-8 character encoding and 2559 IANA MUST check the registry file for proper encoding. Non-ASCII 2560 characters can be sent to IANA by attaching the registration form to 2561 the email message or by using various encodings in the mail message 2562 body (UTF-8 is recommended). IANA will verify any unclear or 2563 corrupted characters with the Language Subtag Reviewer prior to 2564 posting the updated registry. 2566 The registration form sent to IANA MUST be archived and made publicly 2567 available from 2568 "http://www.iana.org/assignments/lang-subtags-templates/". Note that 2569 multiple registrations can pertain to the same record in the 2570 registry. 2572 Developers who are dependent upon the language subtag registry 2573 sometimes would like to be informed of changes in the registry so 2574 that they can update their implementations. When any change is made 2575 to the language subtag registry, IANA MUST send an announcement 2576 message to ietf-languages-announcements@iana.org (a self-subscribing 2577 list that only IANA can post to). 2579 5.2. Extensions Registry 2581 The Language Tag Extensions Registry can contain at most 35 records 2582 and thus changes to this registry are expected to be very infrequent. 2584 Future work by IANA on the Language Tag Extensions Registry is 2585 limited to two cases. First, the IESG MAY request that new records 2586 be inserted into this registry from time to time. 
   These requests MUST include the record to insert in the exact format
   described in Section 3.7. In addition, there MAY be occasional
   requests from the maintaining authority for a specific extension to
   update the contact information or URLs in the record. These requests
   MUST include the complete, updated record. IANA is not responsible
   for validating the information provided, only for ensuring that it
   is properly formatted. The request should reasonably be seen to come
   from the maintaining authority named in the record present in the
   registry.

6. Security Considerations

   Language tags used in content negotiation, like any other information
   exchanged on the Internet, might be a source of concern because they
   might be used to infer the nationality of the sender, and thus
   identify potential targets for surveillance.

   This is a special case of the general problem that anything sent is
   visible to the receiving party and possibly to third parties as well.
   It is useful to be aware that such concerns can exist in some cases.

   The evaluation of the exact magnitude of the threat, and any possible
   countermeasures, is left to each application protocol (see BCP 72
   [RFC3552] for best current practice guidance on security threats and
   defenses).

   The language tag associated with a particular information item is of
   no consequence whatsoever in determining whether that content might
   contain possible homographs. The fact that a text is tagged as being
   in one language or using a particular script subtag provides no
   assurance whatsoever that it does not contain characters from scripts
   other than the one(s) associated with or specified by that language
   tag.

   Since there is no limit to the number of variant, private use, and
   extension subtags, and consequently no limit on the possible length
   of a tag, implementations need to guard against buffer overflow
   attacks.
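   As an illustration only, the following Python sketch shows one way
   an implementation might bound tag length without corrupting the tag.
   It assumes the truncation rule of Section 4.3: whole subtags are
   removed from the right until the tag fits, and a tag is never left
   ending in a single-character subtag. The function name and the
   caller-supplied limit are hypothetical, and the sketch does not
   special-case grandfathered tags.

```python
def truncate_language_tag(tag: str, max_len: int) -> str:
    """Shorten `tag` to at most `max_len` characters.

    Assumed rule (see lead-in): drop whole subtags from the right and
    never leave a trailing single-character subtag behind.
    """
    if max_len < 1:
        raise ValueError("max_len must be positive")

    subtags = tag.split("-")

    # Remove trailing subtags until the joined tag fits the limit.
    while len(subtags) > 1 and len("-".join(subtags)) > max_len:
        subtags.pop()
        # A trailing singleton (such as 'x' or 'a') introduces subtags
        # that are now gone, so it is removed as well.
        while len(subtags) > 1 and len(subtags[-1]) == 1:
            subtags.pop()

    result = "-".join(subtags)
    if len(result) > max_len:
        # Even the primary subtag does not fit; refuse rather than
        # cut a subtag in the middle.
        raise ValueError("primary subtag alone exceeds max_len")
    return result
```

   For example, with a 12-character limit the tag
   "zh-CN-a-myExt-x-private" would be shortened to "zh-CN": removing
   "private" leaves a dangling 'x', and removing "myExt" leaves a
   dangling 'a', so both singletons are removed too.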
   See Section 4.3 for details on language tag truncation, which can
   occur as a consequence of defenses against buffer overflow.

   Although the specification of valid subtags for an extension (see
   Section 3.7) MUST be available over the Internet, implementations
   SHOULD NOT mechanically depend on it being always accessible, to
   prevent denial-of-service attacks.

7. Character Set Considerations

   The syntax in this document requires that language tags use only the
   characters A-Z, a-z, 0-9, and HYPHEN-MINUS, which are present in most
   character sets, so the composition of language tags should not have
   any character set issues.

   Rendering of characters based on the content of a language tag is not
   addressed in this memo. Historically, some languages have relied on
   the use of specific character sets or other information in order to
   infer how a specific character should be rendered (notably this
   applies to language- and culture-specific variations of Han
   ideographs as used in Japanese, Chinese, and Korean). When language
   tags are applied to spans of text, rendering engines sometimes use
   that information in deciding which font to use in the absence of
   other information, particularly where languages with distinct writing
   traditions use the same characters.

8. Changes from RFC 4646

   The main goal for this revision of this document was to incorporate
   ISO 639-3 and its attendant set of language codes into the IANA
   Language Subtag Registry, permitting the identification of many more
   languages and dialects than previously supported.

   The specific changes in this document to meet these goals are:

   o  Defines the incorporation of ISO 639-3 codes as language and
      extlang subtags. Extlangs are now permitted in language tags.
      The changes necessary to achieve this were:

      *  something

   o  Changed the ABNF related to grandfathered tags.
      The irregular tags are now listed. Well-formed grandfathered tags
      are now described by the 'langtag' production and the
      'grandfathered' production was removed as a result. Also added a
      description of both types of grandfathered tags to Section 2.2.8.

   o  Added the paragraph on "collections" to Section 4.1.

   o  Changed the capitalization rules for 'Tag' fields in Section 3.1.

   o  Split Section 3.1 up into subsections.

   o  Modified Section 3.5 to allow Suppress-Script fields to be added,
      modified, or removed via the registration process. This was an
      erratum from RFC 4646.

   o  Modified examples that used region code 'CS' (formerly Serbia and
      Montenegro) to use 'RS' (Serbia) instead.

   o  Modified the rules for creating and maintaining record
      'Description' fields to prevent duplicates, including inverted
      duplicates.

   o  Removed the lengthy description of why RFC 4646 was created from
      this section, which also caused the removal of the reference to
      XML Schema.

   o  Modified the text in Section 2.1 to place more emphasis on the
      fact that language tags are not case sensitive.

   o  Replaced the example "fr-Latn-CA" in Section 2.1 with "sr-Latn-RS"
      and "az-Arab-IR" because "fr-Latn-CA" doesn't respect the
      Suppress-Script on 'Latn' with 'fr'.

   o  Changed the requirements for well-formedness to make singleton
      repetition checking optional (it is required for validity
      checking) in Section 2.2.9.

   o  Changed the text in Section 2.2.9 referring to grandfathered
      checking to note that the list is now included in the ABNF.

   o  Modified and added text to Section 3.2. The job description was
      placed first. A note was added making clear that the Language
      Subtag Reviewer may delegate various non-critical duties,
      including list moderation.
      Finally, additional text was added to make the appointment process
      clear and to clarify that decisions and performance of the
      reviewer are appealable.

   o  Added text to Section 3.5 clarifying that the ietf-languages list
      is operated by whomever the IESG appoints.

   o  Added text to Section 3.1.4 clarifying that the first Description
      in a 'language' or 'extlang' record matches the corresponding
      Reference Name for the language in ISO 639-3.

   o  Modified Section 2.2.9 to define classes of conformance related to
      specific tags (formerly 'well-formed' and 'valid' referred to
      implementations).

   o  Added text to the end of Section 3.1.2 noting that future versions
      of this document might add new field types and recommending that
      implementations ignore any unrecognized fields.

   o  Modified the 'extlang' examples in Appendix B to use valid subtags
      and removed the note saying that they were only examples.

   o  Added text about what the lack of a Suppress-Script field means in
      a record to Section 3.1.8.

   o  Added text allowing the correction of misspellings and typographic
      errors to Section 3.1.4.

   o  Added text to Section 3.1.7 disallowing Prefix field conflicts
      (such as circular prefix references).

   o  Modified text in Section 3.5 to require the subtag reviewer to
      announce his/her decision (or extension) following the two-week
      period. Also clarified that any decision or failure to decide can
      be appealed.

   o  Modified text in Section 4.1 to include the (heretofore anecdotal)
      guiding principle of tag choice, and to clarify the non-use of
      script subtags in non-written applications. Also updated examples
      in this section to use Chamic languages as an example of language
      collections.

   o  Prohibited multiple use of the same variant in a tag (i.e.
      "de-1901-1901"). Previously this was only a recommendation
      ("SHOULD").

   o  Removed inappropriate [RFC2119] language from the illustration in
      Section 4.3.1.

   o  Replaced the example of "zh-guoyu" with "zh-hakka"->"zh-hak" in
      Section 4.4, noting that it was this document that caused the
      change.

   o  Replaced the text in Section 4.1 dealing with "mul"/"und" to
      include the subtags 'zxx' and 'mis', as well as the tag
      "i-default". A normative reference to RFC 2277 was added, along
      with an informative reference to MARC21.

   o  Added text to Section 3.5 clarifying that any modifications of a
      registration request must be sent to the ietf-languages list
      before submission to IANA.

   o  Changed the ABNF for the record-jar format from using the LWSP
      production to use a folding whitespace production similar to
      obs-FWS in [RFC4234]. This effectively prevents unintentional
      blank lines inside a field.

   o  Revised text in Section 3.3, Section 3.5, and Section 5.1 to
      clarify that the Language Subtag Reviewer sends the complete
      registration forms to IANA, that IANA extracts the record from
      the form, and that the forms must also be archived separately
      from the registry.

   o  Added text to Section 5 requiring IANA to send an announcement to
      an ietf-languages-announce list whenever the registry is updated.

   o  Modification of the registry to use UTF-8 as its character
      encoding. This also entails additional instructions to IANA and
      the Language Subtag Reviewer in the registration process.

   o  Modified the rules in Section 2.2.4 so that "exceptionally
      reserved" ISO 3166-1 codes other than 'UK' are included in the
      registry. In particular, this allows the code 'EU' (European
      Union) to be used to form language tags or (more commonly) for
      applications that use the registry for region codes to reference
      this subtag.

   [[Ed.Note: Open issues in this version:

      Whether encompassed language rules for the creation of extlang
      records in the registry should be retained or modified.

      Inclusion of additional information related to Suppress-Script in
      the registry (e.g. that it wasn't assigned on purpose)

   ]]

9. References

9.1. Normative References

   [ISO10646]
              International Organization for Standardization, "ISO/IEC
              10646:2003. Information technology -- Universal
              Multiple-Octet Coded Character Set (UCS)", 2003.

   [ISO15924]
              International Organization for Standardization, "ISO
              15924:2004. Information and documentation -- Codes for the
              representation of names of scripts", January 2004.

   [ISO3166-1]
              International Organization for Standardization, "ISO
              3166-1:1997. Codes for the representation of names of
              countries and their subdivisions -- Part 1: Country
              codes", 1997.

   [ISO639-1]
              International Organization for Standardization, "ISO
              639-1:2002. Codes for the representation of names of
              languages -- Part 1: Alpha-2 code", 2002.

   [ISO639-2]
              International Organization for Standardization, "ISO
              639-2:1998. Codes for the representation of names of
              languages -- Part 2: Alpha-3 code, first edition", 1998.

   [ISO639-3]
              International Organization for Standardization, "ISO
              639-3:2007. Codes for the representation of names of
              languages -- Part 3: Alpha-3 code for comprehensive
              coverage of languages", 2007.

   [ISO646]   International Organization for Standardization, "ISO/IEC
              646:1991, Information technology -- ISO 7-bit coded
              character set for information interchange.", 1991.

   [RFC2026]  Bradner, S., "The Internet Standards Process -- Revision
              3", BCP 9, RFC 2026, October 1996.

   [RFC2028]  Hovey, R. and S. Bradner, "The Organizations Involved in
              the IETF Standards Process", BCP 11, RFC 2028,
              October 1996.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2277]  Alvestrand, H., "IETF Policy on Character Sets and
              Languages", BCP 18, RFC 2277, January 1998.

   [RFC2434]  Narten, T. and H. Alvestrand, "Guidelines for Writing an
              IANA Considerations Section in RFCs", BCP 26, RFC 2434,
              October 1998.

   [RFC2860]  Carpenter, B., Baker, F., and M. Roberts, "Memorandum of
              Understanding Concerning the Technical Work of the
              Internet Assigned Numbers Authority", RFC 2860, June 2000.

   [RFC3339]  Klyne, G. and C. Newman, "Date and Time on the Internet:
              Timestamps", RFC 3339, July 2002.

   [RFC4234]  Crocker, D. and P. Overell, "Augmented BNF for Syntax
              Specifications: ABNF", RFC 4234, October 2005.

   [RFC4645]  Ewell, D., Ed., "Initial Language Subtag Registry",
              September 2006.

   [RFC4647]  Phillips, A., Ed. and M. Davis, Ed., "Matching of Language
              Tags", September 2006.

   [UN_M.49]  Statistics Division, United Nations, "Standard Country or
              Area Codes for Statistical Use", UN Standard Country or
              Area Codes for Statistical Use, Revision 4 (United Nations
              publication, Sales No. 98.XVII.9), June 1999.

9.2. Informative References

   [RFC1766]  Alvestrand, H., "Tags for the Identification of
              Languages", RFC 1766, March 1995.

   [RFC2047]  Moore, K., "MIME (Multipurpose Internet Mail Extensions)
              Part Three: Message Header Extensions for Non-ASCII Text",
              RFC 2047, November 1996.

   [RFC2231]  Freed, N. and K. Moore, "MIME Parameter Value and Encoded
              Word Extensions: Character Sets, Languages, and
              Continuations", RFC 2231, November 1997.

   [RFC2781]  Hoffman, P. and F. Yergeau, "UTF-16, an encoding of ISO
              10646", RFC 2781, February 2000.

   [RFC3066]  Alvestrand, H., "Tags for the Identification of
              Languages", BCP 47, RFC 3066, January 2001.

   [RFC3552]  Rescorla, E. and B.
              Korver, "Guidelines for Writing RFC Text on Security
              Considerations", BCP 72, RFC 3552, July 2003.

   [RFC3629]  Yergeau, F., "UTF-8, a transformation format of ISO
              10646", STD 63, RFC 3629, November 2003.

   [RFC4646]  Phillips, A., Ed. and M. Davis, Ed., "Tags for the
              Identification of Languages", September 2006.

   [Unicode]  Unicode Consortium, "The Unicode Consortium. The Unicode
              Standard, Version 5.0, (Boston, MA, Addison-Wesley, 2003.
              ISBN 0-321-49081-0)", January 2007.

   [iso639.prin]
              ISO 639 Joint Advisory Committee, "ISO 639 Joint Advisory
              Committee: Working principles for ISO 639 maintenance",
              March 2000.

   [record-jar]
              Raymond, E., "The Art of Unix Programming", 2003.

   [registry-update]
              Ewell, D., Ed., "Update to the Language Subtag Registry",
              September 2006.

Appendix A. Acknowledgements

   Any list of contributors is bound to be incomplete; please regard the
   following as only a selection from the group of people who have
   contributed to make this document what it is today.

   The contributors to RFC 4646, RFC 4647, RFC 3066, and RFC 1766, the
   precursors of this document, made enormous contributions directly or
   indirectly to this document and are generally responsible for the
   success of language tags.

   The following people contributed to this document:

   Stephane Bortzmeyer, Karen Broome, Peter Constable, John Cowan,
   Martin Duerst, Frank Ellerman, Doug Ewell, Deborah Garside, Marion
   Gunn, Kent Karlsson, Chris Newman, Randy Presuhn, Stephen Silver, and
   many, many others.

   Very special thanks must go to Harald Tveit Alvestrand, who
   originated RFCs 1766 and 3066, and without whom this document would
   not have been possible.

   Special thanks go to Michael Everson, who served as the Language Tag
   Reviewer for almost the entire RFC 1766/RFC 3066 period, as well as
   the Language Subtag Reviewer since the adoption of RFC 4646.

   Special thanks also to Doug Ewell, for his production of the first
   complete subtag registry, his work to support and maintain new
   registrations, and his careful editorship of both RFC 4645 and
   [registry-update].

Appendix B. Examples of Language Tags (Informative)

   Simple language subtag:

      de (German)

      fr (French)

      ja (Japanese)

      i-enochian (example of a grandfathered tag)

   Language subtag plus Script subtag:

      zh-Hant (Chinese written using the Traditional Chinese script)

      zh-Hans (Chinese written using the Simplified Chinese script)

      sr-Cyrl (Serbian written using the Cyrillic script)

      sr-Latn (Serbian written using the Latin script)

   Language-Script-Region:

      zh-Hans-CN (Chinese written using the Simplified script as used in
      mainland China)

      sr-Latn-RS (Serbian written using the Latin script as used in
      Serbia)

   Language-Variant:

      sl-rozaj (Resian dialect of Slovenian)

      sl-nedis (Nadiza dialect of Slovenian)

   Language-Region-Variant:

      de-CH-1901 (German as used in Switzerland using the 1901 variant
      [orthography])

      sl-IT-nedis (Slovenian as used in Italy, Nadiza dialect)

   Language-Script-Region-Variant:

      hy-Latn-IT-arevela (Eastern Armenian written in Latin script, as
      used in Italy)

   Language-Region:

      de-DE (German for Germany)

      en-US (English as used in the United States)

      es-419 (Spanish appropriate for the Latin America and Caribbean
      region using the UN region code)

   Private use subtags:

      de-CH-x-phonebk

      az-Arab-x-AZE-derbend

   Extended language subtags:

      zh-cmn

      zh-cmn-Hant-CN

   Private use registry values:

      x-whatever (private use using the singleton 'x')

      qaa-Qaaa-QM-x-southern (all private tags)

      de-Qaaa (German, with a private script)

      sr-Latn-QM (Serbian, Latin-script, private region)

      sr-Qaaa-RS (Serbian, private script, for Serbia)

   Tags that use extensions (examples ONLY: extensions MUST be defined
   by revision or update to this document or by RFC):

      en-US-u-islamCal

      zh-CN-a-myExt-x-private

      en-a-myExt-b-another

   Some Invalid Tags:

      de-419-DE (two region tags)

      a-DE (use of a single-character subtag in primary position; note
      that there are a few grandfathered tags that start with "i-" that
      are valid)

      ar-a-aaa-b-bbb-a-ccc (two extensions with same single-letter
      prefix)

Appendix C. Examples of Registration Forms

   LANGUAGE SUBTAG REGISTRATION FORM
   1. Name of requester: Han Steenwijk
   2. E-mail address of requester: han.steenwijk @ unipd.it
   3. Record Requested:

   Type: variant
   Subtag: biske
   Description: The San Giorgio dialect of Resian
   Description: The Bila dialect of Resian
   Prefix: sl-rozaj
   Comments: The dialect of San Giorgio/Bila is one of the
     four major local dialects of Resian

   4. Intended meaning of the subtag: The local variety of Resian as
   spoken in San Giorgio/Bila

   5. Reference to published description of the language (book or
   article):
   -- Jan I.N. Baudouin de Courtenay - Opyt fonetiki rez'janskich
   govorov, Varsava - Peterburg: Vende - Kozancikov, 1875.

   LANGUAGE SUBTAG REGISTRATION FORM
   1. Name of requester: Jaska Zedlik
   2. E-mail address of requester: jz53 @ zedlik.com
   3. Record Requested:

   Type: variant
   Subtag: tarask
   Description: Belarusian in Taraskievica orthography
   Prefix: be
   Comments: The subtag represents Branislau Taraskievic's Belarusian
     orthography as published in "Bielaruski klasycny pravapis" by Juras
     Buslakou, Vincuk Viacorka, Zmicier Sanko, and Zmicier Sauka
     (Vilnia-Miensk 2005).

   4. Intended meaning of the subtag:

   The subtag is intended to represent the Belarusian orthography as
   published in "Bielaruski klasycny pravapis" by Juras Buslakou, Vincuk
   Viacorka, Zmicier Sanko, and Zmicier Sauka (Vilnia-Miensk 2005).

   5. Reference to published description of the language (book or
   article):

   Taraskievic, Branislau. Bielaruskaja gramatyka dla skol. Vilnia: Vyd.
   "Bielaruskaha kamitetu", 1929, 5th edition.

   Buslakou, Juras; Viacorka, Vincuk; Sanko, Zmicier; Sauka, Zmicier.
   Bielaruski klasycny pravapis. Vilnia-Miensk, 2005.

   6. Any other relevant information:

   Belarusian in Taraskievica orthography became widely used, especially
   in Belarusian-speaking Internet segment, but besides this some books
   and newspapers are also printed using this orthography of Belarusian.

Authors' Addresses

   Addison Phillips (editor)
   Yahoo! Inc.

   Email: addison@inter-locale.com
   URI:   http://www.inter-locale.com

   Mark Davis (editor)
   Google

   Email: mark.davis@macchiato.com or mark.davis@google.com

Full Copyright Statement

   Copyright (C) The IETF Trust (2007).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Intellectual Property

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights. Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard. Please address the information to the IETF at
   ietf-ipr@ietf.org.

Acknowledgment

   Funding for the RFC Editor function is provided by the IETF
   Administrative Support Activity (IASA).