Network Working Group                                   A. Phillips, Ed.
Internet-Draft                                               Yahoo! Inc.
Obsoletes: 4646 (if approved)                              M. Davis, Ed.
Intended status: Best Current                                     Google
Practice                                                  March 14, 2008
Expires: September 15, 2008


                     Tags for Identifying Languages
                       draft-ietf-ltru-4646bis-12

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on September 15, 2008.

Copyright Notice

   Copyright (C) The IETF Trust (2008).

Abstract

   This document describes the structure, content, construction, and
   semantics of language tags for use in cases where it is desirable to
   indicate the language used in an information object.  It also
   describes how to register values for use in language tags and the
   creation of user-defined extensions for private interchange.

Table of Contents

   1.  Introduction
   2.  The Language Tag
     2.1.  Syntax
     2.2.  Language Subtag Sources and Interpretation
       2.2.1.  Primary Language Subtag
       2.2.2.  Extended Language Subtags
       2.2.3.  Script Subtag
       2.2.4.  Region Subtag
       2.2.5.  Variant Subtags
       2.2.6.  Extension Subtags
       2.2.7.  Private Use Subtags
       2.2.8.  Grandfathered Registrations
       2.2.9.  Classes of Conformance
   3.  Registry Format and Maintenance
     3.1.  Format of the IANA Language Subtag Registry
       3.1.1.  File Format
       3.1.2.  Record Definitions
       3.1.3.  Subtag and Tag Fields
       3.1.4.  Description Field
       3.1.5.  Deprecated Field
       3.1.6.  Preferred-Value Field
       3.1.7.  Prefix Field
       3.1.8.  Suppress-Script Field
       3.1.9.  Macrolanguage Field
       3.1.10. Comments Field
     3.2.  Language Subtag Reviewer
     3.3.  Maintenance of the Registry
     3.4.  Stability of IANA Registry Entries
     3.5.  Registration Procedure for Subtags
     3.6.  Possibilities for Registration
     3.7.  Extensions and the Extensions Registry
     3.8.  Update of the Language Subtag Registry
   4.  Formation and Processing of Language Tags
     4.1.  Choice of Language Tag
     4.2.  Meaning of the Language Tag
     4.3.  Length Considerations
       4.3.1.  Working with Limited Buffer Sizes
       4.3.2.  Truncation of Language Tags
     4.4.  Canonicalization of Language Tags
     4.5.  Considerations for Private Use Subtags
   5.  IANA Considerations
     5.1.  Language Subtag Registry
     5.2.  Extensions Registry
   6.  Security Considerations
   7.  Character Set Considerations
   8.  Changes from RFC 4646
   9.  References
     9.1.  Normative References
     9.2.  Informative References
   Appendix A.  Acknowledgements
   Appendix B.  Examples of Language Tags (Informative)
   Appendix C.  Examples of Registration Forms
   Authors' Addresses
   Intellectual Property and Copyright Statements

1.  Introduction

   Human beings on our planet have, past and present, used a number of
   languages.  There are many reasons why one would want to identify the
   language used when presenting or requesting information.

   A user's language preferences often need to be identified so that
   appropriate processing can be applied.  For example, the user's
   language preferences in a Web browser can be used to select Web pages
   appropriately.
   Language preferences can also be used to select among tools (such as
   dictionaries) to assist in the processing or understanding of content
   in different languages.

   In addition, knowledge about the particular language used by some
   piece of information content might be useful or even required by some
   types of processing; for example, spell-checking,
   computer-synthesized speech, Braille transcription, or high-quality
   print renderings.

   One means of indicating the language used is by labeling the
   information content with an identifier or "tag".  These tags can be
   used to specify user preferences when selecting information content,
   or for labeling additional attributes of content and associated
   resources.

   Tags can also be used to indicate additional language attributes of
   content.  For example, indicating specific information about the
   dialect, writing system, or orthography used in a document or
   resource may enable the user to obtain information in a form that
   they can understand, or it can be important in processing or
   rendering the given content into an appropriate form or style.

   This document specifies a particular identifier mechanism (the
   language tag) and a registration function for values to be used to
   form tags.  It also defines a mechanism for private use values and
   future extension.

   This document replaces [RFC4646], which replaced [RFC3066] and its
   predecessor [RFC1766].  For a list of changes in this document, see
   Section 8.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

2.  The Language Tag

   Language tags are used to help identify languages, whether spoken,
   written, signed, or otherwise signaled, for the purpose of
   communication.
   This includes constructed and artificial languages, but excludes
   languages not intended primarily for human communication, such as
   programming languages.

2.1.  Syntax

   The language tag is composed of one or more parts, known as
   "subtags".  Each subtag consists of a sequence of alphanumeric
   characters.  Subtags are distinguished and separated from one another
   by a hyphen ("-", ABNF [RFC5234] %x2D).  A language tag consists of a
   "primary language" subtag and a (possibly empty) series of subsequent
   subtags, each of which refines or narrows the range of languages
   identified by the overall tag.

   Usually, each type of subtag is distinguished by length, position in
   the tag, and content: subtags can be recognized solely by these
   features.  The only exception to this is a fixed list of
   grandfathered tags registered under RFC 3066 [RFC3066].  This
   property makes it possible to construct a parser that can extract
   and assign some semantic information to the subtags, even if the
   specific subtag values are not recognized.  Thus, a parser need not
   have an up-to-date copy (or any copy at all) of the subtag registry
   to perform common searching and matching operations.
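   As an illustration only, the following Python sketch shows how such
   a parser might label each subtag using nothing but its length,
   position, and content, following the productions in Figure 1.  The
   function name classify_subtags() is hypothetical.  The sketch does
   not handle the grandfathered 'irregular' tags (a real parser would
   check that fixed list first), and it does not enforce validity rules
   such as the ban on repeated singletons.

```python
import re

# Patterns for each subtag type, transcribed from the ABNF in Figure 1.
# Explicit ASCII character classes ensure non-ASCII letters never match.
LANGUAGE = re.compile(r"[A-Za-z]{2,8}")
SCRIPT = re.compile(r"[A-Za-z]{4}")
REGION = re.compile(r"[A-Za-z]{2}|[0-9]{3}")
VARIANT = re.compile(r"[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}")
SINGLETON = re.compile(r"[0-9A-WYZa-wyz]")  # any one alphanumeric except 'x'
EXT_SUBTAG = re.compile(r"[A-Za-z0-9]{2,8}")
PRIV_SUBTAG = re.compile(r"[A-Za-z0-9]{1,8}")


def classify_subtags(tag):
    """Return (type, subtag) pairs; raise ValueError if the tag is malformed.

    Hypothetical helper: grandfathered 'irregular' tags are not handled.
    """
    subtags = tag.split("-")
    result = []
    pos = 0

    def take(kind, pattern):
        # Consume the subtag at `pos` if it matches `pattern`.
        nonlocal pos
        if pos < len(subtags) and pattern.fullmatch(subtags[pos]):
            result.append((kind, subtags[pos]))
            pos += 1
            return True
        return False

    if subtags[0].lower() != "x":
        if not take("language", LANGUAGE):
            raise ValueError(f"no primary language subtag in {tag!r}")
        take("script", SCRIPT)  # optional, at most one
        take("region", REGION)  # optional, at most one
        while take("variant", VARIANT):
            pass
        while pos < len(subtags) and SINGLETON.fullmatch(subtags[pos]):
            kind = "extension-" + subtags[pos].lower()
            pos += 1
            if not take(kind, EXT_SUBTAG):
                raise ValueError(f"empty extension in {tag!r}")
            while take(kind, EXT_SUBTAG):
                pass

    # A private use sequence may follow a langtag or form the whole tag.
    if pos < len(subtags) and subtags[pos].lower() == "x":
        pos += 1
        if not take("privateuse", PRIV_SUBTAG):
            raise ValueError(f"empty private use sequence in {tag!r}")
        while take("privateuse", PRIV_SUBTAG):
            pass

    if pos != len(subtags):
        raise ValueError(f"unexpected subtag {subtags[pos]!r} in {tag!r}")
    return result
```

   For example, classify_subtags("de-CH-1996") labels 'de' as the
   language, 'CH' as the region, and '1996' as a variant without
   looking anything up in the registry.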

   The syntax of the language tag in ABNF [RFC5234] is:

   Language-Tag  = langtag
                 / privateuse             ; private use tag
                 / irregular              ; tags grandfathered by rule

   langtag       = (language
                    ["-" script]
                    ["-" region]
                    *("-" variant)
                    *("-" extension)
                    ["-" privateuse])

   language      = (2*3ALPHA)             ; shortest ISO 639 code
                 / 4ALPHA                 ; reserved for future use
                 / 5*8ALPHA               ; registered language subtag

   script        = 4ALPHA                 ; ISO 15924 code

   region        = 2ALPHA                 ; ISO 3166-1 code
                 / 3DIGIT                 ; UN M.49 code

   variant       = 5*8alphanum            ; registered variants
                 / (DIGIT 3alphanum)

   extension     = singleton 1*("-" (2*8alphanum))

   singleton     = %x41-57 / %x59-5A / %x61-77 / %x79-7A / DIGIT
                 ; "a"-"w" / "y"-"z" / "A"-"W" / "Y"-"Z" / "0"-"9"
                 ; Single alphanumerics
                 ; "x" is reserved for private use

   privateuse    = "x" 1*("-" (1*8alphanum))

   irregular     = "en-GB-oed" / "i-ami" / "i-bnn" / "i-default"
                 / "i-enochian" / "i-hak" / "i-klingon" / "i-lux"
                 / "i-mingo" / "i-navajo" / "i-pwn" / "i-tao"
                 / "i-tay" / "i-tsu" / "no-bok" / "no-nyn"
                 / "sgn-BE-fr" / "sgn-BE-nl" / "sgn-CH-de" / "zh-cmn"
                 / "zh-cmn-Hans" / "zh-cmn-Hant" / "zh-gan"
                 / "zh-min" / "zh-min-nan" / "zh-wuu" / "zh-yue"

   alphanum      = (ALPHA / DIGIT)        ; letters and numbers

                       Figure 1: Language Tag ABNF

   All subtags have a maximum length of eight characters and whitespace
   is not permitted in a language tag.  There is a subtlety in the ABNF
   production 'variant': variants starting with a digit MAY be four
   characters long, while those starting with a letter MUST be at least
   five characters long.  For examples of language tags, see Appendix B.

   Note Well: the ABNF syntax does not distinguish between upper and
   lowercase.  The appearance of upper and lowercase letters in the
   various ABNF productions above does not affect how implementations
   interpret tags.  That is, the tag "I-AMI" matches the item "i-ami" in
   the 'irregular' production.
   At all times, the tags and their subtags, including private use and
   extensions, are to be treated as case insensitive: there exist
   conventions for the capitalization of some of the subtags, but these
   MUST NOT be taken to carry meaning.

   For example:

   o  [ISO639-1] recommends that language codes be written in lowercase
      ('mn' Mongolian).

   o  [ISO3166-1] recommends that country codes be capitalized ('MN'
      Mongolia).

   o  [ISO15924] recommends that script codes use lowercase with the
      initial letter capitalized ('Cyrl' Cyrillic).

   However, in the tags defined by this document, the uppercase US-ASCII
   letters in the range 'A' through 'Z' are considered equivalent and
   mapped directly to their US-ASCII lowercase equivalents in the range
   'a' through 'z'.  Thus, the tag "mn-Cyrl-MN" is not distinct from
   "MN-cYRL-mn" or "mN-cYrL-Mn" (or any other combination), and each of
   these variations conveys the same meaning: Mongolian written in the
   Cyrillic script as used in Mongolia.

   Although case distinctions do not carry meaning in language tags,
   consistent formatting and presentation of the tags will aid users.
   The format of the tags and subtags in the registry is RECOMMENDED.
   In this format, all non-initial two-letter subtags are uppercase, all
   non-initial four-letter subtags are titlecase, and all other subtags
   are lowercase.

   Note that although [RFC5234] refers to octets, the language tags
   described in this document are sequences of characters from the
   US-ASCII [ISO646] repertoire.  Language tags MAY be used in documents
   and applications that use other encodings, so long as these encompass
   the US-ASCII repertoire.  An example of this would be an XML document
   that uses the UTF-16LE [RFC2781] encoding of [Unicode].

2.2.  Language Subtag Sources and Interpretation

   The namespace of language tags and their subtags is administered by
   the Internet Assigned Numbers Authority (IANA) [RFC2860] according to
   the rules in Section 5 of this document.  The Language Subtag
   Registry maintained by IANA is the source for valid subtags: other
   standards referenced in this section provide the source material for
   that registry.

   Terminology used in this document:

   o  "Tag" refers to a complete language tag, such as "sr-Latn-RS" or
      "az-Arab-IR".  Examples of tags in this document are enclosed in
      double-quotes ("en-US").

   o  "Subtag" refers to a specific section of a tag, delimited by a
      hyphen, such as the subtag 'Hant' in "zh-Hant-CN".  Examples of
      subtags in this document are enclosed in single quotes ('Hant').

   o  "Code" refers to values defined in external standards (and which
      are used as subtags in this document).  For example, 'Hant' is an
      [ISO15924] script code that was used to define the 'Hant' script
      subtag for use in a language tag.  Examples of codes in this
      document are enclosed in single quotes ('en', 'Hant').

   The definitions in this section apply to the various subtags within
   the language tags defined by this document, except for those
   "grandfathered" tags defined in Section 2.2.8.

   Language tags are designed so that each subtag type has unique length
   and content restrictions.  These make identification of the subtag's
   type possible, even if the content of the subtag itself is
   unrecognized.  This allows tags to be parsed and processed without
   reference to the latest version of the underlying standards or the
   IANA registry and makes the associated exception handling when
   parsing tags simpler.

   Subtags in the IANA registry that do not come from an underlying
   standard can only appear in specific positions in a tag.
   Specifically, they can only occur as primary language subtags or as
   variant subtags.

   Note that sequences of private use and extension subtags MUST occur
   at the end of the sequence of subtags and MUST NOT be interspersed
   with subtags defined elsewhere in this document.

   Single-letter and single-digit subtags are reserved for current or
   future use.  These include the following current uses:

   o  The single-letter subtag 'x' is reserved to introduce a sequence
      of private use subtags.  The interpretation of any private use
      subtags is defined solely by private agreement and is not defined
      by the rules in this section or in any standard or registry
      defined in this document.

   o  All other single-letter subtags are reserved to introduce
      standardized extension subtag sequences as described in
      Section 3.7.

   o  The single-letter subtag 'i' is used by some grandfathered tags,
      such as "i-default", where it always appears in the first position
      and cannot be confused with an extension.

2.2.1.  Primary Language Subtag

   The primary language subtag is the first subtag in a language tag
   (with the exception of private use and certain grandfathered tags)
   and cannot be omitted.  The following rules apply to the primary
   language subtag:

   1.  All two-character primary language subtags were defined in the
       IANA registry according to the assignments found in the standard
       ISO 639 Part 1, "ISO 639-1:2002, Codes for the representation of
       names of languages -- Part 1: Alpha-2 code" [ISO639-1], or using
       assignments subsequently made by the ISO 639-1 registration
       authority (RA) or governing standardization bodies.

   2.  All three-character primary language subtags were defined in the
       IANA registry according to the assignments found in either ISO
       639 Part 2, "ISO 639-2:1998 - Codes for the representation of
       names of languages -- Part 2: Alpha-3 code - edition 1"
       [ISO639-2], ISO 639 Part 3, "Codes for the representation of
       names of languages -- Part 3: Alpha-3 code for comprehensive
       coverage of languages" [ISO639-3], or assignments subsequently
       made by the relevant ISO 639 registration authorities or
       governing standardization bodies.

   3.  The subtags in the range 'qaa' through 'qtz' are reserved for
       private use in language tags.  These subtags correspond to codes
       reserved by ISO 639-2 for private use.  These codes MAY be used
       for non-registered primary language subtags (instead of using
       private use subtags following 'x-').  Please refer to Section 4.5
       for more information on private use subtags.

   4.  All four-character language subtags are reserved for possible
       future standardization.

   5.  All language subtags of 5 to 8 characters in length in the IANA
       registry were defined via the registration process in Section 3.5
       and MAY be used to form the primary language subtag.  At the time
       this document was created, there were no examples of this kind of
       subtag and future registrations of this type will be discouraged:
       primary languages are strongly RECOMMENDED for registration with
       ISO 639, and proposals rejected by ISO 639/RA-JAC will be closely
       scrutinized before they are registered with IANA.

   6.  The single-character subtag 'x' as the primary subtag indicates
       that the language tag consists solely of subtags whose meaning is
       defined by private agreement.
       For example, in the tag "x-fr-CH", the subtags 'fr' and 'CH'
       SHOULD NOT be taken to represent the French language or the
       country of Switzerland (or any other value in the IANA registry)
       unless there is a private agreement in place to do so.  See
       Section 4.5.

   7.  The single-character subtag 'i' is used by some grandfathered
       tags (see Section 2.2.8) such as "i-klingon" and "i-bnn".  (Other
       grandfathered tags have a primary language subtag in their first
       position.)

   8.  Other values MUST NOT be assigned to the primary subtag except by
       revision or update of this document.

   Note: For languages that have both an ISO 639-1 two-character code
   and a three-character code assigned by either ISO 639-2 or ISO 639-3,
   only the ISO 639-1 two-character code is defined in the IANA
   registry.

   Note: For languages that have no ISO 639-1 two-character code and for
   which the ISO 639-2/T (Terminology) code and the ISO 639-2/B
   (Bibliographic) codes differ, only the Terminology code is defined in
   the IANA registry.  At the time this document was created, all
   languages that had both kinds of three-character code were also
   assigned a two-character code; it is expected that future assignments
   of this nature will not occur.

   Note: To avoid problems with versioning and subtag choice as
   experienced during the transition between RFC 1766 and RFC 3066, and
   to preserve the canonical nature of subtags defined by this document,
   the ISO 639 Registration Authority Joint Advisory Committee (ISO
   639/RA-JAC) has included the following statement in [iso639.prin]:

      "A language code already in ISO 639-2 at the point of freezing ISO
      639-1 shall not later be added to ISO 639-1.  This is to ensure
      consistency in usage over time, since users are directed in
      Internet applications to employ the alpha-3 code when an alpha-2
      code for that language is not available."
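   The numbered rules for the primary language subtag can be summarized,
   as a sketch only, by the hypothetical Python function below, which
   inspects just the length and content of the subtag.  Because it does
   not consult the IANA registry, a result such as "ISO 639-1" means
   only that a subtag of that shape would have come from that source,
   not that the subtag is actually registered.

```python
def primary_language_kind(subtag):
    """Describe a primary language subtag by its length and content.

    Hypothetical helper summarizing the rules in Section 2.2.1.  The
    registry is not consulted, so the result says only where a subtag of
    this shape would come from, not whether it is actually registered.
    """
    # Check the original string: some non-ASCII characters (such as the
    # Kelvin sign) lowercase to ASCII letters.
    if not (subtag.isascii() and subtag.isalpha()):
        raise ValueError(f"not an alphabetic subtag: {subtag!r}")
    s = subtag.lower()  # case carries no meaning
    if s == "x":
        return "private use tag"             # rule 6
    if s == "i":
        return "grandfathered"               # rule 7
    if len(s) == 2:
        return "ISO 639-1"                   # rule 1
    if len(s) == 3:
        # Rule 3: 'qaa'..'qtz' is private use.  Plain string comparison
        # works because the bounds and `s` are three lowercase letters.
        if "qaa" <= s <= "qtz":
            return "private use"
        return "ISO 639-2 or ISO 639-3"      # rule 2
    if len(s) == 4:
        return "reserved"                    # rule 4
    if 5 <= len(s) <= 8:
        return "registered"                  # rule 5
    raise ValueError(f"not a valid primary language subtag: {subtag!r}")  # rule 8
```

   Under these assumptions, 'haw' is reported as a three-letter ISO 639
   code and 'enochian' as a registered 5-to-8-letter subtag, whether or
   not either is currently present in the registry.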

   In order to avoid instability in the canonical form of tags, if a
   two-character code is added to ISO 639-1 for a language for which a
   three-character code was already included in either ISO 639-2 or ISO
   639-3, the two-character code MUST NOT be registered.  See
   Section 3.4.

   For example, if some content were tagged with 'haw' (Hawaiian), which
   currently has no two-character code, the tag would not be invalidated
   if ISO 639-1 were to assign a two-character code to the Hawaiian
   language at a later date.

   Note: As an example of independent primary language subtag
   registration, consider the grandfathered IANA registration
   "i-enochian".  The subtag 'enochian' could be registered in the IANA
   registry as a primary language subtag (assuming that ISO 639 does not
   register this language first), making tags such as "enochian-AQ" and
   "enochian-Latn" valid.

2.2.2.  Extended Language Subtags

   [RFC4646] contained an additional type of subtag called the 'extended
   language subtag' to allow for certain kinds of compatibility
   mappings, which ultimately were not used.  These subtags were
   reserved for future use in RFC 4646 and have been removed from the
   ABNF in this document.  They MUST NOT be registered or used to form
   language tags.  See also Section 2.2.9 for a discussion of the
   consequences of removing the 'extlang' production from the grammar.

   Note: a few grandfathered tags (Section 2.2.8) matched the 'extlang'
   production in RFC 4646, and thus were not considered 'irregular'.
   These tags are still valid and were added to the 'irregular'
   production in the ABNF.

2.2.3.  Script Subtag

   Script subtags are used to indicate the script or writing system
   variations that distinguish the written forms of a language or its
   dialects.  The following rules apply to the script subtags:

   1.  Script subtags MUST follow the primary language subtag and MUST
       precede any other type of subtag.

   2.  All four-character subtags were defined according to the alpha-4
       script codes of [ISO15924] ("Codes for the representation of the
       names of scripts"), or according to assignments subsequently made
       by the ISO 15924 maintenance agency or governing standardization
       bodies, and denote the script or writing system used in
       conjunction with this language.

   3.  The script subtags 'Qaaa' through 'Qabx' are reserved for private
       use in language tags.  These subtags correspond to codes reserved
       by ISO 15924 for private use.  These codes MAY be used for
       non-registered script values.  Please refer to Section 4.5 for
       more information on private use subtags.

   4.  Script subtags MUST NOT be registered using the process in
       Section 3.5 of this document.  Variant subtags MAY be considered
       for registration for that purpose.

   5.  There MUST be at most one script subtag in a language tag, and
       the script subtag SHOULD be omitted when it adds no
       distinguishing value to the tag or when the primary language
       subtag's record includes a Suppress-Script field listing the
       applicable script subtag.

   Example: "sr-Latn" represents Serbian written using the Latin script.

2.2.4.  Region Subtag

   Region subtags are used to indicate linguistic variations associated
   with or appropriate to a specific country, territory, or region.
   Typically, a region subtag is used to indicate regional dialects or
   usage, or region-specific spelling conventions.  A region subtag can
   also be used to indicate that content is expressed in a way that is
   appropriate for use throughout a region, for instance, Spanish
   content tailored to be useful throughout Latin America.

   The following rules apply to the region subtags:

   1.  Region subtags MUST follow any language or script subtags and
       MUST precede any other type of subtag.

   2.  All two-character subtags following the primary subtag were
       defined in the IANA registry according to the assignments found
       in [ISO3166-1] ("Codes for the representation of names of
       countries and their subdivisions -- Part 1: Country codes") using
       the list of alpha-2 country codes, or using assignments
       subsequently made by the ISO 3166 maintenance agency or governing
       standardization bodies.  In addition, the codes that are
       "exceptionally reserved" (as opposed to "assigned") in ISO 3166-1
       were also defined in the registry, with the exception of 'UK',
       which is an exact synonym for the assigned code 'GB'.

   3.  All three-character subtags consisting of digit (numeric)
       characters following the primary subtag were defined in the IANA
       registry according to the assignments found in UN Standard
       Country or Area Codes for Statistical Use [UN_M.49] or
       assignments subsequently made by the governing standards body.
       Note that not all of the UN M.49 codes are defined in the IANA
       registry.  The following rules define which codes are entered
       into the registry as valid subtags:

       A.  UN numeric codes assigned to 'macro-geographical
           (continental)' or sub-regions MUST be registered in the
           registry.  These codes are not associated with an assigned
           ISO 3166 alpha-2 code and represent supra-national areas,
           usually covering more than one nation, state, province, or
           territory.

       B.  UN numeric codes for 'economic groupings' or 'other
           groupings' MUST NOT be registered in the IANA registry and
           MUST NOT be used to form language tags.

       C.  UN numeric codes for countries or areas that are assigned
           ISO 3166 alpha-2 codes already present in the registry MUST
           be defined according to the rules in Section 3.4 and MUST be
           used to form language tags that represent the country or
           region for which they are defined.
           This happens when ISO 3166 reassigns a code already included
           in the registry and formerly used for one country to another.

       D.  UN numeric codes for countries or areas for which there is an
           associated ISO 3166 alpha-2 code in the registry MUST NOT be
           entered into the registry and MUST NOT be used to form
           language tags.  Note that the ISO 3166-based subtag in the
           registry MUST actually be associated with the UN M.49 code in
           question.

       E.  UN numeric codes and ISO 3166 alpha-2 codes for countries or
           areas listed as eligible for registration in [RFC4645] but
           not presently registered MAY be entered into the IANA
           registry via the process described in Section 3.5.  Once
           registered, these codes MAY be used to form language tags.

       F.  All other UN numeric codes for countries or areas that do not
           have an associated ISO 3166 alpha-2 code MUST NOT be entered
           into the registry and MUST NOT be used to form language tags.
           For more information about these codes, see Section 3.4.

   4.  Note: The alphanumeric codes in Appendix X of the UN document
       MUST NOT be entered into the registry and MUST NOT be used to
       form language tags.  (At the time this document was created,
       these values matched the ISO 3166 alpha-2 codes.)

   5.  There MUST be at most one region subtag in a language tag and the
       region subtag MAY be omitted, as when it adds no distinguishing
       value to the tag.

   6.  The region subtags 'AA', 'QM'-'QZ', 'XA'-'XZ', and 'ZZ' are
       reserved for private use in language tags.  These subtags
       correspond to codes reserved by ISO 3166 for private use.  These
       codes MAY be used for private use region subtags (instead of
       using a private use subtag sequence).  Please refer to
       Section 4.5 for more information on private use subtags.

   "de-CH" represents German ('de') as used in Switzerland ('CH').

   "sr-Latn-RS" represents Serbian ('sr') written using Latin script
   ('Latn') as used in Serbia ('RS').

   "es-419" represents Spanish ('es') appropriate to the UN-defined
   Latin America and Caribbean region ('419').

2.2.5.  Variant Subtags

   Variant subtags are used to indicate additional, well-recognized
   variations that define a language or its dialects that are not
   covered by other available subtags.  The following rules apply to the
   variant subtags:

   1.  Variant subtags MUST follow any language, script, or region
       subtags, but MUST precede any extension or private use subtag
       sequences.

   2.  Variant subtags, as a collection, are not associated with any
       particular external standard.  The meaning of variant subtags in
       the registry is established during the registration process
       described in Section 3.5.  Note that any particular variant
       subtag might be associated with some external standard.
       However, association with a standard is not required for
       registration.

   3.  More than one variant MAY be used to form the language tag.

   4.  Variant subtags MUST be registered with IANA according to the
       rules in Section 3.5 of this document before being used to form
       language tags.  In order to distinguish variants from other types
       of subtags, registrations MUST meet the following length and
       content restrictions:

       1.  Variant subtags that begin with a letter (a-z, A-Z) MUST be
           at least five characters long.

       2.  Variant subtags that begin with a digit (0-9) MUST be at
           least four characters long.

   Variant subtag records in the language subtag registry MAY include
   one or more 'Prefix' fields.  The 'Prefix' indicates the language tag
   or tags that would make a suitable prefix (with other subtags, as
   appropriate) in forming a language tag with the variant.  That is,
   each of the subtags in the prefix SHOULD appear, in order, before the
   variant.
   For example, the subtag 'nedis' has a Prefix of "sl", making it suitable for forming language tags such as "sl-nedis" and "sl-IT-nedis", but not suitable for use in a tag such as "zh-nedis" or "it-IT-nedis".

   "sl-nedis" represents the Natisone or Nadiza dialect of Slovenian.

   "de-CH-1996" represents German as used in Switzerland and as written using the spelling reform beginning in the year 1996 C.E.

   Most variants that share a prefix are mutually exclusive. For example, the German orthographic variations '1996' and '1901' SHOULD NOT be used in the same tag, as they represent the dates of different spelling reforms. A variant that can meaningfully be used in combination with another variant SHOULD include a 'Prefix' field in its registry record that lists that other variant. For example, if another German variant 'example' were created that made sense to use with '1996', then 'example' should include two Prefix fields: "de" and "de-1996".

2.2.6.  Extension Subtags

   Extensions provide a mechanism for extending language tags for use in various applications. They are intended to identify information which is commonly used in association with languages or language tags, but which is not part of language identification. See Section 3.7. The following rules apply to extensions:

   1.  An extension MUST follow at least a primary language subtag. That is, a language tag cannot begin with an extension. Extensions extend language tags; they do not override or replace them. For example, "a-value" is not a well-formed language tag, while "de-a-value" is.

   2.  Extension subtags are separated from the other subtags defined in this document by a single-character subtag ("singleton").
       The singleton MUST be one allocated to a registration authority via the mechanism described in Section 3.7 and MUST NOT be the letter 'x', which is reserved for private use subtag sequences.

   3.  Note: Private use subtag sequences starting with the singleton subtag 'x' are described in Section 2.2.7 below.

   4.  Each singleton subtag MUST appear at most one time in each tag (other than as a private use subtag). That is, singleton subtags MUST NOT be repeated. For example, the tag "en-a-bbb-a-ccc" is invalid because the subtag 'a' appears twice. Note that the tag "en-a-bbb-x-a-ccc" is valid because the second appearance of the singleton 'a' is in a private use sequence.

   5.  Extension subtags MUST meet all of the requirements for the content and format of subtags defined in this document.

   6.  Extension subtags MUST meet whatever requirements are set by the document that defines their singleton prefix and whatever requirements are provided by the maintaining authority.

   7.  Each extension subtag MUST be from two to eight characters long and consist solely of letters or digits, with each subtag separated by a single '-'.

   8.  Each singleton MUST be followed by at least one extension subtag. For example, the tag "tlh-a-b-foo" is invalid because the first singleton 'a' is followed immediately by another singleton 'b'.

   9.  Extension subtags MUST follow all language, script, region, and variant subtags in a tag.

   10. All subtags following the singleton and before another singleton are part of the extension. Example: In the tag "fr-a-Latn", the subtag 'Latn' does not represent the script subtag 'Latn' defined in the IANA Language Subtag Registry. Its meaning is defined by the extension 'a'.

   11. In the event that more than one extension appears in a single tag, the tag SHOULD be canonicalized as described in Section 4.4.
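
   Several of the rules above can be checked mechanically. The Python sketch below is an illustration only, not part of this specification: the function name extension_problems and its messages are hypothetical, it covers only rules 1, 4, 8, and 10 (plus the basic 1-8 character subtag shape), and it does not handle grandfathered tags or check any subtag that precedes the first singleton.

```python
import re

# Every subtag is 1 to 8 ASCII letters or digits.
_SUBTAG = re.compile(r"[A-Za-z0-9]{1,8}")


def extension_problems(tag: str) -> list[str]:
    """Report violations of some extension rules (illustrative sketch).

    Grandfathered tags are not handled, and subtags that come before the
    first singleton are not checked at all.
    """
    subtags = tag.lower().split("-")
    if not all(_SUBTAG.fullmatch(s) for s in subtags):
        return ["contains an empty or malformed subtag"]
    if subtags[0] == "x":
        return []  # a tag may consist entirely of private use subtags
    if len(subtags[0]) == 1:
        # Rule 1: a tag cannot begin with an extension.
        return [f"tag begins with singleton '{subtags[0]}'"]

    problems = []
    seen = set()    # singletons already used in this tag
    current = None  # singleton whose extension is being read
    count = 0       # extension subtags seen after `current`
    for s in subtags[1:]:
        if len(s) == 1:
            # Rule 8: the previous singleton needs at least one subtag.
            if current is not None and count == 0:
                problems.append(f"singleton '{current}' has no subtags")
            if s == "x":
                current = None
                break  # everything after 'x' is private use
            # Rule 4: each singleton appears at most once.
            if s in seen:
                problems.append(f"singleton '{s}' is repeated")
            seen.add(s)
            current, count = s, 0
        elif current is not None:
            # Rule 10: every subtag after a singleton belongs to it.
            count += 1
    if current is not None and count == 0:
        problems.append(f"singleton '{current}' has no subtags")
    return problems
```

   For instance, "tlh-a-b-foo" yields one complaint about 'a', while "en-a-bbb-x-a-ccc" yields none because the second 'a' falls inside the private use sequence.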

   For example, if the prefix singleton 'r' and the shown subtags were defined, then the following tag would be a valid example: "en-Latn-GB-boont-r-extended-sequence-x-private".

2.2.7.  Private Use Subtags

   Private use subtags are used to indicate distinctions in language important in a given context by private agreement. The following rules apply to private use subtags:

   1.  Private use subtags are separated from the other subtags defined in this document by the reserved single-character subtag 'x'.

   2.  Private use subtags MUST conform to the format and content constraints defined in the ABNF for all subtags.

   3.  Private use subtags MUST follow all language, script, region, variant, and extension subtags in the tag. Another way of saying this is that all subtags following the singleton 'x' MUST be considered private use. Example: The subtag 'US' in the tag "en-x-US" is a private use subtag.

   4.  A tag MAY consist entirely of private use subtags.

   5.  No source is defined for private use subtags. Use of private use subtags is by private agreement only.

   6.  Private use subtags are NOT RECOMMENDED where alternatives exist or for general interchange. See Section 4.5 for more information on private use subtag choice.

   For example: The Unicode Consortium defines a set of private use extensions in LDML ([UTS35], Locale Data Markup Language, a standard for defining locale data). One LDML-defined use of private use subtags might be the tag "en-US-x-ldml-POSIX-k-calendar-islamic-k-colStren-secondar", which, in addition to indicating that the language "en-US" is being used, indicates locale-related variations, such as that the (non-standard) POSIX variant is being used; that formatting of dates might use the Islamic calendar; and that case is being ignored in sorted lists.

2.2.8.  Grandfathered Registrations

   Prior to RFC 4646, whole language tags were registered according to the rules in RFC 1766 and/or RFC 3066. These registered tags maintain their validity. Of those tags, those that were made obsolete or redundant by the advent of RFC 4646, by this document, or by subsequent registration of subtags are maintained in the registry as records of the "redundant" type. Those tags that do not match the 'langtag' production in the ABNF in this document or that contain subtags that do not individually appear in the registry are maintained in the registry in records of the "grandfathered" type.

   Grandfathered tags contain one or more subtags that are not defined in the Language Subtag Registry (see Section 3). Redundant tags consist entirely of subtags defined above and whose independent registration was superseded by [RFC4646]. For more information see Section 3.8.

   Some grandfathered tags are "regular" in that they match the 'langtag' production in Figure 1. In some cases, these tags could become redundant if their (currently unregistered) subtags were to be registered (as variants, for example). In other cases, although the subtags match the language tag pattern, the meaning assigned to the various subtags is prohibited by rules elsewhere in this document. Those tags can never become redundant.

   The remaining grandfathered tags are "irregular" and do not match the 'langtag' production. These are listed in the 'irregular' production in Figure 1. These grandfathered tags can never become redundant. Many of these tags have been superseded by other registrations: their record contains a Preferred-Value field that ought to be used to form language tags representing that value.

2.2.9.  Classes of Conformance

   Implementations sometimes need to describe their capabilities with regard to the rules and practices described in this document. Tags can be checked or verified in a number of ways, but two particular classes of tag conformance are formally defined here.

   A tag is considered "well-formed" if it conforms to the ABNF (Section 2.1). Note that irregular grandfathered tags are now listed in the 'irregular' production.

   A tag is considered "valid" if it is well-formed and it also satisfies these conditions:

   o  The tag is either a grandfathered tag, or all of its language, script, region, and variant subtags appear in the IANA language subtag registry as of the particular registry date.

   o  There are no duplicate singleton (extension) subtags and no duplicate variant subtags.

   o  For each subtag that has a 'Prefix' field in the registry, the Prefix matches the language tag using Extended Filtering [RFC4647]. That is, each subtag in the Prefix is present in the tag and in the same order. Furthermore, all of the Prefix's subtags MUST appear before the subtag. For example, the Prefix "zh-TW" matches the tag "zh-Hant-TW".

   Note that a tag's validity depends on the date of the registry used to validate the tag. A more recent copy of the registry might contain a subtag that an older version does not.

   A tag is considered "valid" for a given extension (Section 3.7) (as of a particular version, revision, and date) if it meets the criteria for "valid" above and also satisfies this condition:

      Each subtag used in the extension part of the tag is valid according to the extension.

   Some older implementations consider a tag "well-formed" if it matches the ABNF in [RFC4646]. In that version, a well-formed tag could contain a sequence matching the obsolete 'extlang' production.
   Other than a few grandfathered tags (which are handled separately), no valid tags have ever matched that pattern. The difference between that ABNF and Figure 1 is that the language production is replaced as follows:

   obs-language  = (2*3ALPHA [ extlang ]) ; shortest ISO 639 code
                 / 4ALPHA                 ; reserved for future use
                 / 5*8ALPHA               ; registered language subtag

   extlang       = *3("-" 3ALPHA)         ; removed in this version

                     Figure 2: Obsolete Language ABNF

   Older language tag implementations sometimes reference [RFC3066]. Again, all valid tags under that version also match this document's language tag ABNF. However, a wider array of tags could be considered "well-formed" under that document. The grammar used in that document was:

   Language-Tag   = Primary-subtag *( "-" Subtag )

   Primary-subtag = 1*8ALPHA

   Subtag         = 1*8(ALPHA / DIGIT)

                  Figure 3: RFC 3066 Language Tag Syntax

   Language tags may be well-formed in terms of syntax but not valid in terms of content. Users MUST NOT assign and use their own subtags, other than by using private-use sequences (such as "en-x-personal") or subtags designated as private-use in the registry (such as "no-QQ", where 'QQ' is one of a range of private-use ISO 3166 codes). Otherwise they risk finding later that their previously unassigned subtag was assigned a meaning that conflicts with their chosen usage.

3.  Registry Format and Maintenance

   This section defines the Language Subtag Registry and the maintenance and update procedures associated with it, as well as a registry for extensions to language tags (Section 3.7).

   The Language Subtag Registry contains a comprehensive list of all of the subtags valid in language tags. This gives implementers a straightforward and reliable way to validate language tags.
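
   To make the idea of registry-based validation concrete, the Python sketch below approximates the "valid" class of Section 2.2.9: it looks each language, script, region, and variant subtag up in a small hand-built table and rejects duplicate variants. The table contents and the names REGISTRY, subtag_type, and is_valid are hypothetical stand-ins. The sketch assumes the input is already known to be well-formed, guesses each subtag's type from its length and characters alone, and ignores grandfathered tags, 'Prefix' fields, and range notation such as 'QM..QZ'. A real validator would load the full registry file described in Section 3.1.

```python
import re
from typing import Optional

# Hypothetical subset standing in for the IANA Language Subtag Registry.
# Stored in lowercase because language tags compare case-insensitively.
REGISTRY = {
    "language": {"de", "en", "es", "sl", "sr", "zh"},
    "script": {"hant", "latn"},
    "region": {"419", "ch", "gb", "it", "rs", "tw", "us"},
    "variant": {"1901", "1996", "boont", "nedis"},
}


def subtag_type(subtag: str) -> Optional[str]:
    """Guess the type of a non-initial subtag from its shape alone."""
    if re.fullmatch(r"[a-z]{4}", subtag):
        return "script"
    if re.fullmatch(r"[a-z]{2}|[0-9]{3}", subtag):
        return "region"
    if re.fullmatch(r"[a-z0-9]{5,8}|[0-9][a-z0-9]{3}", subtag):
        return "variant"
    return None


def is_valid(tag: str) -> bool:
    """Check registry membership and duplicate variants (sketch only)."""
    subtags = tag.lower().split("-")
    if subtags[0] == "x":
        return True  # entirely private use: nothing to look up
    if subtags[0] not in REGISTRY["language"]:
        return False
    variants = set()
    for s in subtags[1:]:
        if len(s) == 1:
            break  # extensions and private use are not in this registry
        kind = subtag_type(s)
        if kind is None or s not in REGISTRY[kind]:
            return False
        if kind == "variant":
            if s in variants:
                return False  # duplicate variant subtag
            variants.add(s)
    return True
```

   With this table, "de-CH-1996" and "zh-Hant-TW" are accepted, while "de-CH-1996-1996" is rejected for its repeated variant.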

   The Language Subtag Registry will be maintained so that, except for extension subtags, it is possible to validate all of the subtags that appear in a language tag under the provisions of this document or its revisions or successors. In addition, the meaning of the various subtags will be unambiguous and stable over time. (The meaning of private use subtags, of course, is not defined by the IANA registry.)

3.1.  Format of the IANA Language Subtag Registry

   The IANA Language Subtag Registry ("the registry") is a machine-readable file in the format described in this section, plus copies of the registration forms approved in accordance with the process described in Section 3.5. The existing registration forms for grandfathered and redundant tags taken from RFC 3066 will be maintained as part of the obsolete RFC 3066 registry. The remaining set of subtags created by either [RFC4645] or [registry-update] will not have registration forms created for them.

3.1.1.  File Format

   The registry consists of a series of records stored in the record-jar format (described in [record-jar]). Each record, in turn, consists of a series of fields that describe the various subtags and tags. The registry is a Unicode [Unicode] text file, using the UTF-8 [RFC3629] character encoding.

   Each field can be considered a single, logical line of Unicode [Unicode] characters, comprising a field-name and a field-body separated by a COLON character (%x3A). Each field is terminated by the newline sequence CRLF. The text in each field MUST be in Unicode Normalization Form C (NFC).

   A collection of fields forms a 'record'. Records are separated by lines containing only the sequence "%%" (%x25.25).

   Although fields are logically a single line of text, each line of text in the file format is limited to 72 bytes in length.
   To accommodate this, the field-body can be split into a multiple-line representation; this is called "folding". Folding is done according to customary conventions for line-wrapping. This is typically on whitespace boundaries, but can occur between other characters when the value does not include spaces, such as when a language does not use whitespace between words. In any event, there MUST NOT be breaks inside a multibyte UTF-8 sequence nor in the middle of a combining character sequence. For more information, see [UAX14].

   Although the file format uses the UTF-8 encoding, unless otherwise indicated, fields are restricted to the printable characters from the US-ASCII [ISO646] repertoire.

   The format of the registry is described by the following ABNF (per [RFC5234]):

   registry   = record *("%%" CRLF record)
   record     = 1*( field-name *SP ":" *SP field-body CRLF )
   field-name = (ALPHA / DIGIT) [*(ALPHA / DIGIT / "-") (ALPHA / DIGIT)]
   field-body = *([[*SP CRLF] 1*SP] 1*CHARS)
   CHARS      = (%x21-10FFFF)      ; Unicode code points

                     Figure 4: Registry Format ABNF

   The sequence '..' (%x2E.2E) in a field-body denotes a range of values. Such a range represents all subtags of the same length that are in alphabetic or numeric order within that range, including the values explicitly mentioned. For example, 'a..c' denotes the values 'a', 'b', and 'c', and '11..13' denotes the values '11', '12', and '13'.

   All fields whose field-body contains a date value use the "full-date" format specified in [RFC3339]. For example: "2004-06-28" represents June 28, 2004, in the Gregorian calendar.

3.1.2.  Record Definitions

   There are three types of records in the registry: "File-Date", "Subtag", and "Tag" records.

   The first record in the registry is a "File-Date" record. This record contains the single field whose field-name is "File-Date" (see Figure 4).
   The field-body of this record contains the last modification date of this copy of the registry, making it possible to compare different versions of the registry. The registry on the IANA website is the most current. Versions with an older date than that one are not up-to-date.

   File-Date: 2004-06-28
   %%

                 Figure 5: Example of the File-Date Record

   Subsequent records represent either subtags or tags in the registry. "Subtag" records contain a field with a field-name of "Subtag", while, unsurprisingly, "Tag" records contain a field with a field-name of "Tag". Each of the fields in each record MUST occur no more than once, unless otherwise noted below. Each record MUST contain the following fields:

   o  'Type'

      *  Type's field-body MUST consist of one of the following strings: "language", "script", "region", "variant", "grandfathered", or "redundant", and denotes the type of tag or subtag.

   o  Either 'Subtag' or 'Tag'

      *  Subtag's field-body contains the subtag being defined. This field MUST only appear in records whose 'Type' has one of these values: "language", "script", "region", or "variant".

      *  Tag's field-body contains a complete language tag. This field MUST only appear in records whose 'Type' has one of these values: "grandfathered" or "redundant". Note that the field-body will always follow the 'grandfathered' production in the ABNF in Section 2.1.

   o  Description

      *  Description's field-body contains a non-normative description of the subtag or tag.

   o  Added

      *  Added's field-body contains the date the record was added to the registry.

   Each record MAY also contain the following fields:

   o  Preferred-Value

      *  For fields of type 'script', 'region', and 'variant', 'Preferred-Value' contains the subtag of the same 'Type' that is preferred for forming the language tag.
      *  For fields of type 'language', 'Preferred-Value' contains the primary language subtag that is preferred when forming the language tag.

      *  For fields of type 'grandfathered' and 'redundant', 'Preferred-Value' contains a canonical mapping to a complete language tag.

   o  Deprecated

      *  The field-body of the Deprecated field contains the date the record was deprecated.

   o  Prefix

      *  Prefix's field-body contains a language tag with which this subtag MAY be used to form a new language tag, perhaps with other subtags as well. The Prefix's subtags appear before the subtag. This field MUST only appear in records whose 'Type' field-body is 'variant'. For example, the 'Prefix' for the variant 'nedis' is 'sl', meaning that the tags "sl-nedis" and "sl-IT-nedis" are appropriate while the tag "is-nedis" is not.

   o  Comments

      *  Comments contains additional information about the subtag, as deemed appropriate for understanding the registry and implementing language tags using the subtag or tag.

   o  Suppress-Script

      *  Suppress-Script contains a script subtag that SHOULD NOT be used to form language tags with the associated primary language subtag. This field MUST only appear in records whose 'Type' field-body is 'language'. See Section 4.1.

   o  Macrolanguage

      *  Macrolanguage contains a primary language subtag defined by ISO 639 as a "macrolanguage" that encompasses this language subtag. This field MUST only appear in records whose 'Type' field-body is 'language'.

   Future versions of this document might add additional fields to the registry, so implementations SHOULD ignore fields found in the registry that are not defined in this document.

3.1.3.  Subtag and Tag Fields

   The 'Subtag' field MUST NOT use uppercase letters to form the subtag, with two exceptions.
   Subtags whose 'Type' field is 'script' (in other words, subtags defined by ISO 15924) MUST use titlecase. Subtags whose 'Type' field is 'region' (in other words, the non-numeric region subtags defined by ISO 3166) MUST use all uppercase. These exceptions mirror the use of case in the underlying standards.

   Each subtag in the tags contained in a 'Tag' field MUST be formatted using the rules in the preceding paragraph. That is, all subtags are lowercase except for subtags that represent script or region codes.

3.1.4.  Description Field

   The field 'Description' contains a description of the tag or subtag in the record. The 'Description' field MAY appear more than once per record; that is, there can be multiple descriptions for a given record. The 'Description' field MAY include the full range of Unicode characters. At least one of the 'Description' fields MUST be written or transcribed into the Latin script; additional 'Description' fields MAY also include a description in a non-Latin script. Each 'Description' field MUST be unique, both within the record in which it appears and for the collection of records of the same type. Moreover, formatting variations of the same description MUST NOT occur in that specific record or in any other record of the same type. For example, while the ISO 639-1 code 'fy' contains both the descriptions "Western Frisian" and "Frisian, Western", only one of these descriptions appears in the registry.

   The 'Description' field is used for identification purposes. It doesn't necessarily represent the actual native name of the item in the record, nor are any of the descriptions guaranteed to be in any particular language (such as English or French, for example).

   For subtags taken from a source standard (such as ISO 639 or ISO 3166), the 'Description' value(s) SHOULD also be taken from the source standard.
   Multiple descriptions in the source standard MUST be split into separate 'Description' fields. The source standard's descriptions MAY be edited, either prior to insertion or via the registration process. For fields of type 'language', the first 'Description' field appearing in the Registry corresponds to the Reference Name assigned by ISO 639-3. This helps facilitate cross-referencing between ISO 639 and the registry.

   When creating or updating a record due to the action of one of the source standards, the Language Subtag Reviewer SHOULD remove duplicate or redundant descriptions and MAY edit descriptions to correct irregularities in formatting (such as misspellings, inappropriate apostrophes or other punctuation, or excessive or missing spaces) prior to submitting the proposed record to the ietf-languages list.

   Note: Descriptions in registry entries that correspond to ISO 639, ISO 15924, ISO 3166, or UN M.49 codes are intended only to indicate the meaning of that identifier as defined in the source standard at the time it was added to the registry. The description does not replace the content of the source standard itself. The descriptions are not intended to be the localized English names for the subtags. Localization or translation of language tag and subtag descriptions is out of scope of this document.

3.1.5.  Deprecated Field

   The field 'Deprecated' MAY be added to any record via the maintenance process described in Section 3.3 or via the registration process described in Section 3.5. Usually, the addition of a 'Deprecated' field is due to the action of one of the standards bodies, such as ISO 3166, withdrawing a code. In some historical cases, it might not have been possible to reconstruct the original deprecation date. For these cases, an approximate date appears in the registry.
   Although valid in language tags, subtags and tags with a 'Deprecated' field are deprecated and validating processors SHOULD NOT generate these subtags. Note that a record that contains a 'Deprecated' field and no corresponding 'Preferred-Value' field has no replacement mapping.

3.1.6.  Preferred-Value Field

   The field 'Preferred-Value' contains a mapping between the record in which it appears and another tag or subtag. The value in this field is strongly RECOMMENDED as the best choice to represent the value of this record when selecting a language tag. These values form three groups:

   1.  ISO 639 language codes that were later withdrawn in favor of other codes. These values are mostly a historical curiosity.

   2.  ISO 3166 region codes that have been withdrawn in favor of a new code. This sometimes happens when a country changes its name or administration in such a way that warrants a new region code.

   3.  Grandfathered or redundant tags from RFC 3066. In many cases, these tags have become obsolete because the values they represent were later encoded by ISO 639.

   Records that contain a 'Preferred-Value' field MUST also have a 'Deprecated' field. This field contains a date of deprecation. Thus, a language tag processor can use the registry to construct the valid, non-deprecated set of subtags for a given date. In addition, for any given tag, a processor can construct the set of valid language tags that correspond to that tag for all dates up to the date of the registry. The ability to do these mappings MAY be beneficial to applications that are matching, selecting, or filtering content based on its language tags.

   Note that 'Preferred-Value' mappings in records of type 'region' sometimes do not represent exactly the same meaning as the original value.
   There are many reasons for a country code to be changed, and the effect this has on the formation of language tags will depend on the nature of the change in question.

   In particular, the 'Preferred-Value' field does not imply retagging content that uses the affected subtag.

   The field 'Preferred-Value' MUST NOT be modified once created in the registry. The field MAY be added to records according to the rules in Section 3.3.

   The 'Preferred-Value' field in records of type "grandfathered" and "redundant" contains whole language tags that are strongly RECOMMENDED for use in place of the record's value. In many cases, the mappings were created by deprecation of the tags during the period before this document was adopted. For example, the tag "no-nyn" was deprecated in favor of the ISO 639-1-defined language code 'nn'.

3.1.7.  Prefix Field

   The 'Prefix' field contains an extended language range whose subtags are appropriate to use with this subtag: each of the subtags in one of the subtag's Prefix fields MUST appear before the variant in a valid tag. For example, the variant subtag '1996' has a 'Prefix' field of "de". This means that tags starting with the sequence "de-" are appropriate with this subtag, so "de-Latg-1996" and "de-CH-1996" are both acceptable, while the tag "fr-1996" is an inappropriate choice.

   The field of type 'Prefix' MUST NOT be removed from any record. The field-body for this type of field MAY be modified, but only if the modification broadens the meaning of the subtag. That is, the field-body can be replaced only by a prefix of itself. For example, the Prefix "be-Latn" (Belarusian, Latin script) could be replaced by the Prefix "be" (Belarusian) but not by the Prefix "ru-Latn" (Russian, Latin script).

   Records of type 'variant' MAY have more than one field of type 'Prefix'.
   Additional fields of this type MAY be added to a 'variant' record via the registration process.

   The field-body of the 'Prefix' field MUST NOT conflict with any 'Prefix' already registered for a given record. Such a conflict would occur when no valid tag could be constructed that would contain the prefix, such as when two subtags each have a 'Prefix' that contains the other subtag. For example, suppose that the subtag 'avariant' has the prefix "es-bvariant". Then the subtag 'bvariant' cannot be given the prefix 'avariant', for that would require a tag of the form "es-avariant-bvariant-avariant", which would not be valid.

3.1.8.  Suppress-Script Field

   The field 'Suppress-Script' contains a script subtag (whose record appears in the registry). The field 'Suppress-Script' MUST only appear in records whose 'Type' field-body is 'language'. This field MUST NOT appear more than one time in a record. This field indicates a script used to write the overwhelming majority of documents for the given language. This script code therefore adds no distinguishing information to a language tag. This helps ensure greater compatibility between the language tags generated according to the rules in this document and language tags and tag processors or consumers based on RFC 3066 by indicating that the script subtag SHOULD NOT be used for most documents in that language. For example, virtually all Icelandic documents are written in the Latin script, making the subtag 'Latn' redundant in the tag "is-Latn".

   Many language subtag records do not have a Suppress-Script field. The lack of a Suppress-Script might indicate that the language is customarily written in more than one script or that the language is not customarily written at all.
   It might also mean that sufficient information was not available when the record was created, and thus the record remains a candidate for future registration.

3.1.9.  Macrolanguage Field

   The Macrolanguage field contains a primary language subtag that encompasses this subtag's language. That is, the language subtag whose record this field appears in is sometimes considered to be a sub-language of the Macrolanguage. Macrolanguage values are defined by ISO 639-3 and the exact nature of the relationship between the encompassed and encompassing languages varies on a case-by-case basis.

   This field can be useful to applications or users when selecting language tags or as additional metadata useful in matching. The Macrolanguage field can only occur in records of type 'language'. Only values assigned by ISO 639-3 will be considered for inclusion. Macrolanguage fields MAY be added or removed via the normal registration process whenever ISO 639-3 defines new values or withdraws old values. Macrolanguages are informational and MAY be removed or changed if ISO 639-3 changes the values.

   For example, the language subtags 'nb' (Norwegian Bokmal) and 'nn' (Norwegian Nynorsk) each have a Macrolanguage entry of 'no' (Norwegian). For more information see Section 4.1.

3.1.10.  Comments Field

   The field 'Comments' conveys additional information about the record and MAY appear more than once per record. The field-body MAY include the full range of Unicode characters and is not restricted to any particular script. This field MAY be inserted or changed via the registration process and no guarantee of stability is provided. The content of this field is not restricted, except by the need to register the information, the suitability of the request, and by reasonable practical size limitations.

3.2.  Language Subtag Reviewer

   The Language Subtag Reviewer moderates the ietf-languages mailing list, responds to requests for registration, and performs the other registry maintenance duties described in Section 3.3. Only the Language Subtag Reviewer is permitted to request IANA to change, update, or add records to the Language Subtag Registry. The Language Subtag Reviewer MAY delegate list moderation and other clerical duties as needed.

   The Language Subtag Reviewer is appointed by the IESG for an indefinite term, subject to removal or replacement at the IESG's discretion. The IESG will solicit nominees for the position (upon adoption of this document or upon a vacancy) and then solicit feedback on the nominees' qualifications. Qualified candidates should be familiar with BCP 47 and its requirements; be willing to fairly, responsively, and judiciously administer the registration process; and be suitably informed about the issues of language identification so that the reviewer can assess the claims and draw upon the contributions of language experts and subtag requesters.

   The subsequent performance or decisions of the Language Subtag Reviewer MAY be appealed to the IESG under the same rules as other IETF decisions (see [RFC2026]). The IESG can reverse or overturn the decisions of the Language Subtag Reviewer, provide guidance, or take other appropriate actions.

3.3.  Maintenance of the Registry

   Maintenance of the registry requires that as codes are assigned or withdrawn by ISO 639, ISO 15924, ISO 3166, and UN M.49, the Language Subtag Reviewer MUST evaluate each change and determine the appropriate course of action according to the rules in this document. Such updates follow the registration process described in Section 3.5.
Usually the Language Subtag Reviewer will start the process for the new or updated record by filling in the registration form and submitting it. If a change to one of these standards takes place and the Language Subtag Reviewer does not do this in a timely manner, then any interested party MAY submit the form. Thereafter, the registration process continues normally.

The Language Subtag Reviewer MUST ensure that new subtags meet the requirements elsewhere in this document (and most especially in Section 3.4) or submit an appropriate registration form for an alternate subtag as described in that section. Each individual subtag affected by a change MUST be sent to the ietf-languages list with its own registration form and in a separate message.

3.4. Stability of IANA Registry Entries

The stability of entries and their meaning in the registry is critical to the long-term stability of language tags. The rules in this section guarantee that a specific language tag's meaning is stable over time and will not change.

These rules specifically deal with how changes to codes (including withdrawal and deprecation of codes) maintained by ISO 639, ISO 15924, ISO 3166, and UN M.49 are reflected in the IANA Language Subtag Registry. Assignments to the IANA Language Subtag Registry MUST follow these stability rules:

1. Values in the fields 'Type', 'Subtag', 'Tag', 'Added', 'Deprecated', and 'Preferred-Value' MUST NOT be changed and are guaranteed to be stable over time.

2. Values in the 'Description' field MUST NOT be changed in a way that would invalidate previously existing tags. They MAY be broadened somewhat in scope, changed to add information, or adapted to the most common modern usage. For example, countries occasionally change their names; a historical example of this would be "Upper Volta" changing to "Burkina Faso".
3. Values in the field 'Prefix' MAY be added to records of type 'variant' via the registration process. If a prefix is added to a variant record, 'Comments' fields SHOULD be used to explain different usages with the various prefixes.

4. Values in the field 'Prefix' in records of type 'variant' MAY be modified, so long as the modifications broaden the set of prefixes. That is, a prefix MAY be replaced by one of its own prefixes. For example, the prefix "en-US" could be replaced by "en", but not by the prefixes "en-Latn", "fr", or "en-US-boont". If one of those prefixes were needed, a new Prefix SHOULD be registered.

5. Values in the field 'Prefix' MUST NOT be removed.

6. The field 'Comments' MAY be added, changed, or removed via the registration process or any of the processes or considerations described in this section.

7. The field 'Suppress-Script' MAY be added or removed via the registration process.

8. The field 'Macrolanguage' MAY be added or removed via the registration process, but only in response to changes made by ISO 639. The Macrolanguage field appears whenever a language has a corresponding Macrolanguage in ISO 639. That is, the macrolanguage fields in the registry exactly match those of ISO 639. No other macrolanguage mappings will be considered for registration.

9. Codes assigned by ISO 639-1 that do not conflict with existing two-letter primary language subtags and that have no corresponding three-letter primary language subtag defined in the registry are entered into the IANA registry as new records of type 'language'.

10. Codes assigned by ISO 639-2 that do not conflict with existing three-letter primary language subtags are entered into the IANA registry as new records of type 'language'.

11. Codes assigned by ISO 639-3 that do not conflict with existing three-letter primary language subtags are entered into the IANA registry as new primary language records.

12. Codes assigned by ISO 15924 and ISO 3166 that do not conflict with existing subtags of the associated type and whose meaning is not the same as an existing subtag of the same type are entered into the IANA registry as new records.

13. Codes assigned by ISO 639, ISO 15924, or ISO 3166 that are withdrawn by their respective maintenance or registration authority remain valid in language tags. A 'Deprecated' field containing the date of withdrawal MUST be added to the record. If a new record of the same type is added that represents a replacement value, then a 'Preferred-Value' field MAY also be added. The registration process MAY be used to add comments about the withdrawal of the code by the respective standard.

    Example: The region code 'TL' was assigned to the country 'Timor-Leste', replacing the code 'TP' (which was assigned to 'East Timor' when it was under administration by Portugal). The subtag 'TP' remains valid in language tags, but its record contains a 'Preferred-Value' of 'TL', and its 'Deprecated' field contains the date the new code was assigned ('2004-07-06').

14. Codes assigned by ISO 639, ISO 15924, or ISO 3166 that conflict with existing subtags of the associated type, including subtags that are deprecated, MUST NOT be entered into the registry. The following additional considerations apply to subtag values that are reassigned:

    A. For ISO 639 codes, if the newly assigned code's meaning is not represented by a subtag in the IANA registry, the Language Subtag Reviewer, as described in Section 3.5, SHALL prepare, as soon as practical, a proposal for entering a registered language subtag in the IANA registry as an alternate value for the new code. The form of the registered language subtag will be at the discretion of the Language Subtag Reviewer and MUST conform to other restrictions on language subtags in this document.

    B. For all subtags whose meaning is derived from an external standard (that is, defined by ISO 639, ISO 15924, ISO 3166, or UN M.49), if a new meaning is assigned to an existing code and the new meaning broadens the meaning of that code, then the meaning for the associated subtag MAY be changed to match. The meaning of a subtag MUST NOT be narrowed, however, as this can result in an unknown proportion of the existing uses of a subtag becoming invalid. Note: The ISO 639 maintenance agency/registration authority (MA/RA) has adopted a similar stability policy.

    C. For ISO 15924 codes, if the newly assigned code's meaning is not represented by a subtag in the IANA registry, the Language Subtag Reviewer, as described in Section 3.5, SHALL prepare, as soon as practical, a proposal for entering a registered variant subtag in the IANA registry as an alternate value for the new code. The form of the registered variant subtag will be at the discretion of the Language Subtag Reviewer and MUST conform to other restrictions on variant subtags in this document.

    D. For ISO 3166 codes, if the newly assigned code's meaning is associated with the same UN M.49 code as another 'region' subtag, then the existing region subtag remains as the preferred value for that region and no new entry is created.
    A comment MAY be added to the existing region subtag indicating the relationship to the new ISO 3166 code.

    E. For ISO 3166 codes, if the newly assigned code's meaning is associated with a UN M.49 code that is not represented by an existing region subtag, then the Language Subtag Reviewer, as described in Section 3.5, SHALL prepare a proposal for entering the appropriate UN M.49 country code as an entry in the IANA registry.

    F. For ISO 3166 codes, if there is no associated UN numeric code, then the Language Subtag Reviewer SHALL petition the UN to create one. If there is no response from the UN within ninety days of the request being sent, the Language Subtag Reviewer SHALL prepare, as soon as practical, a proposal for entering a registered variant subtag in the IANA registry as an alternate value for the new code. The form of the registered variant subtag will be at the discretion of the Language Subtag Reviewer and MUST conform to other restrictions on variant subtags in this document. This situation is very unlikely to ever occur.

15. UN M.49 has codes for both countries and areas (such as '276' for Germany) and geographical regions and sub-regions (such as '150' for Europe). UN M.49 country or area codes for which there is no corresponding ISO 3166 code SHOULD NOT be registered, except as a surrogate for an ISO 3166 code that is blocked from registration by an existing subtag. If such a code becomes necessary, then the registration authority for ISO 3166 SHOULD first be petitioned to assign a code to the region. If the petition for a code assignment by ISO 3166 is refused or not acted on in a timely manner, the registration process described in Section 3.5 MAY then be used to register the corresponding UN M.49 code.
This way, UN M.49 codes remain available as the value of last resort in cases where ISO 3166 reassigns a deprecated value in the registry.

16. Stability provisions apply to grandfathered tags with this exception: should it become possible to compose one of the grandfathered tags from registered subtags, then the field 'Type' in that record is changed from 'grandfathered' to 'redundant'. Note that this will not affect language tags that match the grandfathered tag, since these tags will now match valid generative subtag sequences. For example, the variant subtag '1901' is registered, making formerly grandfathered tags such as "de-1901" and "de-AT-1901" redundant. Of course, existing content or implementations that use these tags remain valid.

Note: The redundant and grandfathered entries together are the complete list of tags registered under [RFC3066]. The redundant tags are those that can now be formed using the subtags defined in the registry together with the rules of Section 2.2. The grandfathered entries include those that can never be legal under those same provisions, plus those tags that contain subtags not yet registered or, perhaps, inappropriate for registration.

The set of redundant and grandfathered tags is permanent and stable: new entries MUST NOT be added to this set, and existing entries MUST NOT be removed. Records of type 'grandfathered' MAY have their type converted to 'redundant'; see item 16 in Section 3.4 for more information. The decision-making process about which tags were initially grandfathered and which were made redundant is described in [RFC4645].
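Item 4 of the stability rules above allows a variant's 'Prefix' value to be replaced only by one of its own prefixes. The following Python fragment is a minimal sketch of that test, not part of any registry tooling; the function name is hypothetical. It assumes the comparison is made on whole subtags and ignores case, since case is not significant in language tags:

```python
def is_permitted_prefix_change(old_prefix: str, new_prefix: str) -> bool:
    """Return True if new_prefix is a strictly shorter prefix of old_prefix.

    Hypothetical helper illustrating stability rule 4: a variant's
    'Prefix' value may only be broadened, that is, replaced by a leading
    sequence of its own subtags. Comparison is case-insensitive.
    """
    old = old_prefix.lower().split("-")
    new = new_prefix.lower().split("-")
    # The replacement must drop at least one trailing subtag and must
    # match the original subtag-for-subtag everywhere else.
    return len(new) < len(old) and old[: len(new)] == new


if __name__ == "__main__":
    # Cases taken from the text: only "en" is an acceptable replacement.
    for candidate in ["en", "en-Latn", "fr", "en-US-boont"]:
        print(candidate, is_permitted_prefix_change("en-US", candidate))
```

Comparing lists of subtags rather than raw strings is the essential design choice: a plain string-prefix test would wrongly accept a value such as "e" as a prefix of "en-US".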
RFC 3066 tags that were deprecated prior to the adoption of [RFC4646] are part of the list of grandfathered tags, and their component subtags were not included as registered variants (although they remain eligible for registration). For example, the tag "art-lojban" was deprecated in favor of the language subtag 'jbo'.

3.5. Registration Procedure for Subtags

The procedure given here MUST be used by anyone who wants to use a subtag not currently in the IANA Language Subtag Registry.

Only subtags of type 'language' and 'variant' will be considered for independent registration of new subtags. Subtags needed for stability and subtags necessary to keep the registry synchronized with ISO 639, ISO 15924, ISO 3166, and UN M.49 within the limits defined by this document also use this process, as described in Section 3.3. Stability provisions are described in Section 3.4.

This procedure MAY also be used to register or alter the information for the 'Description', 'Comments', 'Deprecated', 'Prefix', or 'Suppress-Script' fields in a subtag's record as described in Section 3.4. Changes to all other fields in the IANA registry are NOT permitted.

Registering a new subtag or requesting modifications to an existing tag or subtag starts with the requester filling out the registration form reproduced below. Responses are not limited in size, so that the request can adequately describe the registration. The fields in the "Record Requested" section SHOULD follow the requirements in Section 3.1.

   LANGUAGE SUBTAG REGISTRATION FORM
   1. Name of requester:
   2. E-mail address of requester:
   3. Record Requested:

      Type:
      Subtag:
      Description:
      Prefix:
      Preferred-Value:
      Deprecated:
      Suppress-Script:
      Macrolanguage:
      Comments:

   4. Intended meaning of the subtag:
   5. Reference to published description
      of the language (book or article):
   6. Any other relevant information:

              Figure 6: The Language Subtag Registration Form

Examples of completed registration forms can be found in Appendix C or online at http://www.iana.org/assignments/lang-subtags-templates/.

The subtag registration form MUST be sent to for a two-week review period before it can be submitted to IANA. If modifications are made to the request during the course of the registration process (such as corrections to meet the requirements in Section 3.1), the modified form MUST also be sent to at least one week prior to submission to IANA.

Whenever an entry is created or modified in the registry, the 'File-Date' record at the start of the registry is updated to reflect the most recent modification date in the [RFC3339] "full-date" format.

Before forwarding a new registration to IANA, the Language Subtag Reviewer MUST ensure that values in the 'Subtag' field match case according to the description in Section 3.1.

The ietf-languages list is an open list and can be joined by sending a request to . The list can be hosted by IANA or by any third party at the request of the IESG.

Some fields in both the registration form and the registry record itself permit the use of non-ASCII characters. Registration requests SHOULD use the UTF-8 encoding for consistency and clarity. However, since some mail clients do not support this encoding, other encodings MAY be used for the registration request. The Language Subtag Reviewer is responsible for ensuring that the proper Unicode characters appear in both the archived request form and the registry record. In the case of a transcription or encoding error by IANA, the Language Subtag Reviewer will request that the registry be repaired, providing any necessary information to assist IANA.
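The requirement that 'Subtag' values match case before a registration is forwarded to IANA lends itself to a mechanical check. The sketch below assumes the registry's casing conventions are lowercase for language and variant subtags, titlecase for script subtags, and uppercase for region subtags; the helper names are hypothetical, and record types without a listed rule are rejected rather than guessed at:

```python
# Assumed casing conventions for the 'Subtag' field, keyed by record type.
EXPECTED_CASE = {
    "language": str.lower,
    "variant": str.lower,
    "script": lambda s: s[:1].upper() + s[1:].lower(),  # e.g. "Latn"
    "region": str.upper,  # e.g. "TL"; numeric codes are unaffected
}


def registry_case(record_type: str, subtag: str) -> str:
    """Return subtag in the case expected for record_type (hypothetical helper)."""
    try:
        convert = EXPECTED_CASE[record_type]
    except KeyError:
        raise ValueError(f"no casing rule for record type {record_type!r}") from None
    return convert(subtag)


def case_matches(record_type: str, subtag: str) -> bool:
    """True if a submitted 'Subtag' value already has the expected case."""
    return subtag == registry_case(record_type, subtag)


if __name__ == "__main__":
    print(registry_case("script", "LATN"))  # Latn
    print(case_matches("region", "Tl"))     # False
```

Keeping the check separate from normalization lets a reviewer see whether a form needs correcting, rather than silently fixing it.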
Variant subtags are usually registered for use with a particular range of language tags. For example, the subtag 'rozaj' is intended for use with language tags that start with the primary language subtag "sl", since Resian is a dialect of Slovenian. Thus, the subtag 'rozaj' would be appropriate in tags such as "sl-Latn-rozaj" or "sl-IT-rozaj". This information is stored in the 'Prefix' field in the registry. Variant registration requests SHOULD include at least one 'Prefix' field in the registration form.

The 'Prefix' field for a given registered subtag exists in the IANA registry as a guide to usage. Additional prefixes MAY be added by filing an additional registration form. In that form, the "Any other relevant information:" field MUST indicate that it is the addition of a prefix.

Requests to add a prefix to a variant subtag that imply a different semantic meaning will probably be rejected. For example, a request to add the prefix "de" to the subtag 'nedis' so that the tag "de-nedis" represented some German dialect would be rejected. The 'nedis' subtag represents a particular Slovenian dialect, and the additional registration would change the semantic meaning assigned to the subtag. A separate subtag SHOULD be proposed instead.

The 'Description' field MUST contain a description of the tag being registered, written or transcribed into the Latin script; it MAY also include a description in a non-Latin script. The 'Description' field is used for identification purposes; it does not necessarily represent the actual native name of the language or variation, nor does it need to be in any particular language.
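The relationship between a variant and its 'Prefix' values can be illustrated with a small check. This is a simplified sketch, not a complete validator: it assumes a tag is appropriate when it contains the variant and begins, on a subtag boundary, with one of the registered prefixes, which is enough to cover the 'rozaj' examples above. The function names are hypothetical.

```python
def starts_with_prefix(tag: str, prefix: str) -> bool:
    """True if prefix equals the leading subtags of tag, ignoring case."""
    tag_subtags = tag.lower().split("-")
    prefix_subtags = prefix.lower().split("-")
    return tag_subtags[: len(prefix_subtags)] == prefix_subtags


def variant_fits(tag: str, variant: str, prefixes: list[str]) -> bool:
    """Hypothetical check: tag contains variant and begins with a registered prefix."""
    if variant.lower() not in tag.lower().split("-"):
        return False
    return any(starts_with_prefix(tag, p) for p in prefixes)


if __name__ == "__main__":
    rozaj_prefixes = ["sl"]  # the 'Prefix' value described in the text
    for tag in ["sl-Latn-rozaj", "sl-IT-rozaj", "de-rozaj"]:
        print(tag, variant_fits(tag, "rozaj", rozaj_prefixes))
```

Matching whole subtags keeps the prefix "sl" from accepting a tag that merely begins with the same letters, such as one whose language subtag is "slx".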
While the 'Description' field itself is not guaranteed to be stable and errata corrections MAY be undertaken from time to time, attempts to provide translations or transcriptions of entries in the registry itself will probably be frowned upon by the community or rejected outright, as changes of this nature have an impact on the provisions in Section 3.4.

When the two-week period has passed, the Language Subtag Reviewer MUST take one of the following actions:

o Explicitly accept the request and forward the form containing the record to be inserted or modified to iana@iana.org according to the procedure described in Section 3.3.

o Explicitly reject the request because of significant objections raised on the list or due to problems with constraints in this document (which MUST be explicitly cited).

o Extend the review period by granting an additional two-week increment to permit further discussion. After each two-week increment, the Language Subtag Reviewer MUST indicate on the list whether the registration has been accepted, rejected, or extended.

Note that the Language Subtag Reviewer MAY raise objections on the list if he or she so desires. Any such objection MUST be made publicly.

Sometimes the request needs to be modified as a result of discussion during the review period or due to requirements in this document. The applicant, the Language Subtag Reviewer, or others are free to submit a modified version of the completed registration form, which will be considered in lieu of the original request with the explicit approval of the applicant. Such changes do not restart the two-week discussion period, although an application containing the final record submitted to IANA MUST appear on the list at least one week prior to the Language Subtag Reviewer forwarding the record to IANA.
The applicant is also free to modify a rejected application with additional information and submit it again; this starts a new two-week comment period.

Registrations initiated due to the provisions of Section 3.3 or Section 3.4 SHALL NOT be rejected altogether (since they ultimately have to appear in the registry) and SHOULD be completed as quickly as possible. The review process allows list members to comment on the specific information in the form and the record it contains, and thus help ensure that it is correct and consistent. The Language Subtag Reviewer MAY reject a specific version of the form, but MUST include in the rejection a suitable replacement, extending the review period as described above, until the form is in a format worthy of the reviewer's approval.

Decisions made by the Language Subtag Reviewer MAY be appealed to the IESG [RFC2028] under the same rules as other IETF decisions [RFC2026]. This includes a decision to extend the review period or the failure to announce a decision in a clear and timely manner.

The approved records appear in the Language Subtag Registry. The approved registration forms are available online under http://www.iana.org/assignments/lang-subtags-templates/.

Updates or changes to existing records follow the same procedure as new registrations. The Language Subtag Reviewer decides whether there is consensus to update the registration following the two-week review period; normally, objections by the original registrant will carry extra weight in forming such a consensus.

Registrations are permanent and stable. Once registered, subtags will not be removed from the registry and will remain a valid way in which to specify a specific language or variant.
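The timing rules in this section (a two-week review of the original request, modifications that do not restart that period, and at least one week on the list for the final form) can be combined into a simple date calculation. The sketch below assumes both intervals are counted in calendar days from the day each form is posted, ignores any extensions granted by the Language Subtag Reviewer, and uses a hypothetical function name; it illustrates the arithmetic only and does not describe how the Reviewer actually schedules submissions.

```python
from datetime import date, timedelta

REVIEW_PERIOD = timedelta(weeks=2)      # initial review on the list
FINAL_FORM_NOTICE = timedelta(weeks=1)  # final form must be on the list this long


def earliest_iana_submission(first_posted: date, final_posted: date) -> date:
    """Earliest date a record could be forwarded to IANA (hypothetical sketch).

    Modifications do not restart the two-week review, but the final form
    must itself have been on the list for at least one week.
    """
    if final_posted < first_posted:
        raise ValueError("final form cannot predate the original request")
    return max(first_posted + REVIEW_PERIOD, final_posted + FINAL_FORM_NOTICE)


if __name__ == "__main__":
    # Illustrative dates only: a late revision pushes the date past two weeks.
    print(earliest_iana_submission(date(2008, 3, 3), date(2008, 3, 14)))  # 2008-03-21
```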
Note: The purpose of the "Reference to published description" section in the registration form is to aid in verifying whether a language is registered or what language or language variation a particular subtag refers to. In most cases, reference to an authoritative grammar or dictionary of that language will be useful; in cases where no such work exists, other well-known works describing that language or in that language MAY be appropriate. The Language Subtag Reviewer decides what constitutes "good enough" reference material. This requirement is not intended to exclude particular languages or dialects due to the size of the speaker population or lack of a standardized orthography. Minority languages will be considered equally on their own merits.

3.6. Possibilities for Registration

Possibilities for registration of subtags or information about subtags include:

o Primary language subtags for languages not listed in ISO 639 that are not variants of any listed or registered language MAY be registered. At the time this document was created, there were no examples of this form of subtag. Before attempting to register a language subtag, there MUST be an attempt to register the language with ISO 639. Subtags MUST NOT be registered for languages defined by codes that exist in ISO 639-1, ISO 639-2, or ISO 639-3, or that are under consideration by the ISO 639 registration authorities, or that have never been attempted for registration with those authorities. If ISO 639 has previously rejected a language for registration, it is reasonable to assume that there must be additional, very compelling evidence of need before it will be registered as a primary language subtag in the IANA registry (so much so that it is very unlikely that any subtags of this type will be registered).
o Dialects or other divisions or variations within a language, its orthography, writing system, regional or historical usage, transliteration or other transformation, or distinguishing variation MAY be registered as variant subtags. An example is the 'rozaj' subtag (the Resian dialect of Slovenian).

o The addition or maintenance of fields (generally of an informational nature) in Tag or Subtag records as described in Section 3.1 and subject to the stability provisions in Section 3.4. This includes descriptions, comments, deprecation and preferred values for obsolete or withdrawn codes, or the addition of script or macrolanguage information to primary language subtags.

o The addition of records and related field value changes necessary to reflect assignments made by ISO 639, ISO 15924, ISO 3166, and UN M.49 as described in Section 3.4.

Subtags proposed for registration that would cause all or part of a grandfathered tag to become redundant but whose meaning conflicts with or alters the meaning of the grandfathered tag MUST be rejected.

This document leaves the decision on what subtags or changes to subtags are appropriate (or not) to the registration process described in Section 3.5.

Note: Four-character primary language subtags are reserved to allow for the possibility of alpha4 codes in some future addition to the ISO 639 family of standards.

ISO 639 defines a maintenance agency for additions to and changes in the list of languages in ISO 639. This agency is:

   International Information Centre for Terminology (Infoterm)
   Aichholzgasse 6/12, AT-1120
   Wien, Austria
   Phone: +43 1 26 75 35 Ext. 312    Fax: +43 1 216 32 72

ISO 639-2 defines a maintenance agency for additions to and changes in the list of languages in ISO 639-2.
This agency is:

   Library of Congress
   Network Development and MARC Standards Office
   Washington, D.C. 20540 USA
   Phone: +1 202 707 6237    Fax: +1 202 707 0115
   URL: http://www.loc.gov/standards/iso639-2

ISO 639-3 defines a maintenance agency for additions to and changes in the list of languages in ISO 639-3. This agency is:

   SIL International
   ISO 639-3 Registrar
   7500 W. Camp Wisdom Rd.
   Dallas, TX 75236 USA
   Phone: +1 972 708 7400, ext. 2293    Fax: +1 972 708 7546
   Email: iso639-3@sil.org
   URL: http://www.sil.org/iso639-3

The maintenance agency for ISO 3166 (country codes) is:

   ISO 3166 Maintenance Agency
   c/o International Organization for Standardization
   Case postale 56
   CH-1211 Geneva 20 Switzerland
   Phone: +41 22 749 72 33    Fax: +41 22 749 73 49
   URL: http://www.iso.org/iso/en/prods-services/iso3166ma/index.html

The registration authority for ISO 15924 (script codes) is:

   Unicode Consortium
   Box 391476
   Mountain View, CA 94039-1476, USA
   URL: http://www.unicode.org/iso15924

The Statistics Division of the United Nations Secretariat maintains the Standard Country or Area Codes for Statistical Use and can be reached at:

   Statistical Services Branch
   Statistics Division
   United Nations, Room DC2-1620
   New York, NY 10017, USA

   Fax: +1-212-963-0623
   E-mail: statistics@un.org
   URL: http://unstats.un.org/unsd/methods/m49/m49alpha.htm

3.7. Extensions and the Extensions Registry

Extension subtags are those introduced by single-character subtags ("singletons") other than 'x'. They are reserved for the generation of identifiers that contain a language component and are compatible with applications that understand language tags.
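To make the notion of a singleton concrete, the sketch below collects the subtags that follow each extension singleton in a tag, stopping at 'x', which introduces private use rather than an extension. It is an illustration under stated assumptions: the input is taken to be well-formed and is not validated against the ABNF, the singletons 'a' and 'b' in the examples are hypothetical rather than registered extensions, and the function name is not from any library.

```python
def extensions(tag: str) -> dict[str, list[str]]:
    """Map each extension singleton in a tag to the subtags that follow it.

    Hypothetical helper: it assumes the tag is already well-formed and does
    not validate it. Output is lowercased because case is not significant.
    Subtags after 'x' are private use and are ignored.
    """
    result: dict[str, list[str]] = {}
    current = None
    # Skip the first subtag: it is the primary language subtag, not an extension.
    for subtag in tag.lower().split("-")[1:]:
        if subtag == "x":
            break  # start of the private-use sequence
        if len(subtag) == 1:
            current = subtag  # a new extension singleton
            result[current] = []
        elif current is not None:
            result[current].append(subtag)
    return result


if __name__ == "__main__":
    print(extensions("en-a-bbb-x-a-ccc"))  # {'a': ['bbb']}
```

Note that the 'a' appearing after 'x' is not treated as an extension: once 'x' is seen, everything that follows belongs to the private-use sequence.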
The structure and form of extensions are defined by this document so that implementations can be created that are forward compatible with applications that might be created using singletons in the future. In addition, defining a mechanism for maintaining singletons will lend stability to this document by reducing the likely need for future revisions or updates.

Single-character subtags are assigned by IANA using the "IETF Consensus" policy defined by [RFC2434]. This policy requires the development of an RFC, which SHALL define the name, purpose, processes, and procedures for maintaining the subtags. The maintaining or registering authority, including name, contact email, discussion list email, and URL location of the registry, MUST be indicated clearly in the RFC. The RFC MUST specify or include each of the following:

o The specification MUST reference the specific version or revision of this document that governs its creation and MUST reference this section of this document.

o The specification and all subtags defined by the specification MUST follow the ABNF and other rules for the formation of tags and subtags as defined in this document. In particular, it MUST specify that case is not significant and that subtags MUST NOT exceed eight characters in length.

o The specification MUST specify a canonical representation.

o The specification of valid subtags MUST be available over the Internet and at no cost.

o The specification MUST be in the public domain or available via a royalty-free license acceptable to the IETF and specified in the RFC.

o The specification MUST be versioned, and each version of the specification MUST be numbered, dated, and stable.

o The specification MUST be stable.
That is, extension subtags, once defined by a specification, MUST NOT be retracted or change in meaning in any substantial way.

o The specification MUST include in a separate section the registration form reproduced in this section (below) to be used in registering the extension upon publication as an RFC.

o IANA MUST be informed of changes to the contact information and URL for the specification.

IANA will maintain a registry of allocated single-character (singleton) subtags. This registry MUST use the record-jar format described by the ABNF in Section 3.1. Upon publication of an extension as an RFC, the maintaining authority defined in the RFC MUST forward this registration form to the IESG at iesg@ietf.org, which MUST forward the request to iana@iana.org. The maintaining authority of the extension MUST maintain the accuracy of the record by sending an updated full copy of the record to iana@iana.org with the subject line "LANGUAGE TAG EXTENSION UPDATE" whenever content changes. Only the 'Comments', 'Contact_Email', 'Mailing_List', and 'URL' fields MAY be modified in these updates.

Failure to maintain this record, maintain the corresponding registry, or meet other conditions imposed by this section of this document MAY be appealed to the IESG [RFC2028] under the same rules as other IETF decisions (see [RFC2026]) and MAY result in the authority to maintain the extension being withdrawn or reassigned by the IESG.

   %%
   Identifier:
   Description:
   Comments:
   Added:
   RFC:
   Authority:
   Contact_Email:
   Mailing_List:
   URL:
   %%

    Figure 7: Format of Records in the Language Tag Extensions Registry

'Identifier' contains the single-character subtag (singleton) assigned to the extension.
The Internet-Draft submitted to define the extension SHOULD specify which letter or digit to use, although the IESG MAY change the assignment when approving the RFC.

'Description' contains the name and description of the extension.

'Comments' is an OPTIONAL field and MAY contain a broader description of the extension.

'Added' contains the date the extension's RFC was published in the "full-date" format specified in [RFC3339]. For example, 2004-06-28 represents June 28, 2004, in the Gregorian calendar.

'RFC' contains the RFC number assigned to the extension.

'Authority' contains the name of the maintaining authority for the extension.

'Contact_Email' contains the email address used to contact the maintaining authority.

'Mailing_List' contains the URL or subscription email address of the mailing list used by the maintaining authority.

'URL' contains the URL of the registry for this extension.

The determination of whether an Internet-Draft meets the above conditions and the decision to grant or withhold such authority rest solely with the IESG and are subject to the normal review and appeals process associated with the RFC process.

Extension authors are strongly cautioned that many (including most well-formed) processors will be unaware of any special relationships or meaning inherent in the order of extension subtags. Extension authors SHOULD avoid subtag relationships or canonicalization mechanisms that interfere with matching or with length restrictions that sometimes exist in common protocols where the extension is used. In particular, applications MAY truncate the subtags in doing matching or in fitting into limited lengths, so it is RECOMMENDED that the most significant information be in the most significant (left-most) subtags and that the specification gracefully handle truncated subtags.
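Because applications may truncate subtags to fit length limits, information placed in the right-most subtags is the first to be lost. The sketch below shows one plausible truncation strategy, assumed here for illustration and similar in spirit to the progressive truncation used by Lookup in [RFC4647]: drop whole subtags from the right until the tag fits, and never leave a dangling singleton at the end. The function name and the example tag (with hypothetical singletons 'a' and 'b') are illustrative only.

```python
def truncate_tag(tag: str, max_len: int) -> str:
    """Shorten tag to at most max_len characters by dropping trailing subtags.

    Hypothetical helper: whole subtags are removed from the right, and a
    singleton left at the end (such as an extension singleton or 'x') is
    removed as well, since nothing would follow it. The first subtag is
    always kept, even if it alone exceeds max_len.
    """
    subtags = tag.split("-")
    while len(subtags) > 1 and (
        len("-".join(subtags)) > max_len or len(subtags[-1]) == 1
    ):
        subtags.pop()
    return "-".join(subtags)


if __name__ == "__main__":
    print(truncate_tag("de-DE-a-value-b-other", 15))  # de-DE-a-value
    print(truncate_tag("de-DE-a-value-b-other", 12))  # de-DE
```

Under this strategy, the second example loses the entire 'a' extension, which is why the most significant extension information belongs in the left-most subtags.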

   When a language tag is to be used in a specific, known protocol, it is RECOMMENDED that the language tag not contain extensions not supported by that protocol. In addition, note that some protocols MAY impose upper limits on the length of the strings used to store or transport the language tag.

3.8.  Update of the Language Subtag Registry

   Upon adoption of this document, the IANA Language Subtag Registry will need an update so that it contains the complete set of subtags valid in a language tag. This collection of subtags, along with a description of the process used to create it, is described by [registry-update]. IANA will publish the updated version of the registry described by this document using the instructions and content of [registry-update]. Once published by IANA, the maintenance procedures, rules, and registration processes described in this document will be available for new registrations or updates.

   Registrations that are in process under the rules defined in [RFC4646] when this document is adopted MUST be completed under the rules contained in this document.

4.  Formation and Processing of Language Tags

   This section addresses how to use the information in the registry with the tag syntax to choose, form, and process language tags.

4.1.  Choice of Language Tag

   The guiding principle in forming language tags is to "tag content wisely." Sometimes there is a choice between several possible tags for the same content. The choice of which tag to use depends on the content and application in question, and some amount of judgment might be necessary when selecting a tag.

   Interoperability is best served when the same language tag is used consistently to represent the same language. If an application has requirements that make the rules here inapplicable, then that application risks damaging interoperability.
   It is strongly RECOMMENDED that users not define their own rules for language tag choice.

   A subtag SHOULD only be used when it adds useful distinguishing information to the tag. Extraneous subtags interfere with the meaning, understanding, and processing of language tags. In particular, users and implementations SHOULD follow the 'Prefix' and 'Suppress-Script' fields in the registry (defined in Section 3.1): these fields provide guidance on when specific additional subtags SHOULD be used or avoided in a language tag.

   Some applications can benefit from the use of script subtags in language tags, as long as the use is consistent for a given context. Script subtags are never appropriate for unwritten content (such as audio recordings).

   Script subtags were not formally defined in [RFC3066], and their use can affect matching and subtag identification for implementations of RFC 3066, as these subtags appear between the primary language and region subtags. For example, if an implementation selects content using Basic Filtering [RFC4647] (originally described in Section 2.5 of [RFC3066]) and the user requested the language range "en-US", content labeled "en-Latn-US" will not match the request and thus will not be selected. Therefore, it is important to know when script subtags will customarily be used and when they ought not be used. In the registry, the Suppress-Script field helps ensure greater compatibility between language tags by defining when users SHOULD NOT include a script subtag with a particular primary language subtag.

   The choice of subtags used to form a language tag SHOULD be guided by the following rules:

   1.  Use as precise a tag as possible, but no more specific than is justified. Avoid using subtags that are not important for distinguishing content in an application.

       *  For example, 'de' might suffice for tagging an email written in German, while "de-CH-1996" is probably unnecessarily precise for such a task.

   2.  The script subtag SHOULD NOT be used to form language tags unless the script adds some distinguishing information to the tag. The field 'Suppress-Script' in the primary language record in the registry indicates script subtags that do not add distinguishing information for most applications. For example:

       *  The subtag 'Latn' should not be used with the primary language 'en' because nearly all English documents are written in the Latin script, so the subtag adds no distinguishing information. However, if a document were written in English mixing Latin script with another script such as Braille ('Brai'), then it might be appropriate to indicate both scripts to aid in content selection, such as the application of a style sheet.

       *  When labeling content that is unwritten (such as a recording of human speech), the script subtag should not be used, even if the language is customarily written in several scripts. Thus, the subtitles to a movie might use the tag "zh-cmn-Hant" (Chinese, Mandarin, Traditional script), but the audio track for the same language would be tagged "zh-cmn".

   3.  If a tag or subtag has a 'Preferred-Value' field in its registry entry, then the value of that field SHOULD be used to form the language tag in preference to the tag or subtag in which the preferred value appears.

       *  For example, use 'he' for Hebrew in preference to 'iw'.

   4.  [ISO639-2] has defined several codes included in the subtag registry that require additional care when choosing language tags. In most of these cases, where omitting the language tag is permitted, such omission is preferable to using these codes.
       Language tags SHOULD NOT incorporate these subtags as a prefix, unless the additional information conveys some value to the application.

       1.  Use specific language subtags or subtag sequences in preference to subtags for language collections. A "language collection" is a subtag derived from one of the [ISO639-2] codes that represents multiple related languages. These codes are included as primary language subtags in the registry. For example, the code 'cmc' represents "Chamic languages". The registry contains values for each of the approximately ten individual languages represented by this collective code. Other examples include the subtags for Germanic languages ('gem') and Algonquian languages ('alg'). Since these codes are interpreted inclusively, content tagged with "en" (English), "de" (German), or "gsw" (Swiss German, Alemannic) could also (but SHOULD NOT) be tagged with "gem" (Germanic languages). Subtags derived from collection codes SHOULD NOT be used unless more specific language information is not available. Note that matching implementations generally do not understand the relationship between the collection and its encompassed languages, so users ought not to assume that a subtag based on a language collection is a useful means for selecting content in its encompassed languages.

       2.  The 'mul' (Multiple) primary language subtag identifies content in multiple languages. It SHOULD NOT be used when a list of languages (such as Content-Language) or individual tags for each content element can be used instead.

       3.  The 'und' (Undetermined) primary language subtag identifies linguistic content whose language is not known. It SHOULD NOT be used unless a language tag is required and language information is not available or cannot be determined.
           Omitting the language tag (where permitted) is preferred. The 'und' subtag MAY be useful for protocols that require a language tag to be provided or where a primary language subtag is required (such as in "und-Latn"). The 'und' subtag MAY also be useful when matching language tags in certain situations.

       4.  The 'zxx' (Non-Linguistic) primary language subtag identifies content that has no language. Some examples might include instrumental or electronic music; sound recordings consisting of nonverbal sounds; audiovisual materials with no narration, printed titles, or subtitles; machine-readable data files consisting of machine languages or character codes; or programming source code. Note: where there are fragments of linguistic content, such as programming source code containing comments written in English, the subtag 'zxx' might still be used to indicate the primary status of the content, just as 'en' can be applied to a predominantly English text that contains a few French phrases.

       5.  The 'mis' (Uncoded) primary language subtag identifies content whose language is known but which does not currently have a corresponding subtag. This subtag SHOULD NOT be used. Because the addition of other codes in the future can render its application invalid, it is inherently unstable and hence incompatible with the stability goals of BCP 47. It is always preferable to use other subtags: either 'und' or (with prior agreement) private use subtags.

       6.  The grandfathered tag "i-default" (Default Language) was originally registered according to [RFC1766] to meet the needs of [RFC2277]. It does not indicate a specific language; rather, it identifies the condition or content used where the language preferences of the user cannot be established.
           It SHOULD NOT be used except as a means of labeling the default content for applications or protocols that require default language content to be labeled with that specific tag. It MAY also be used by an application or protocol to identify when the default language content is being returned.

   5.  The same variant subtag MUST NOT be used more than once within a language tag.

       *  For example, the tag "de-DE-1901-1901" is not valid.

   Some of the languages in the registry are labeled "macrolanguages" by ISO 639-3, which defines the term as "clusters of closely-related language varieties that [...] can be considered distinct individual languages, yet in certain usage contexts a single language identity for all is needed". These correspond to codes registered in ISO 639-2 as single languages that were found to correspond to more than one language in ISO 639-3. The record for each of the languages encompassed by a macrolanguage contains a 'Macrolanguage' field in the registry; the macrolanguages themselves are not specially marked.

   It is always permitted, and sometimes useful, to tag an encompassed language using the subtag for its macrolanguage. However, the Macrolanguage field doesn't define what the relationship is between the encompassed language and its macrolanguage, nor does it define how languages encompassed by the same macrolanguage are related to each other. In some cases, one of the encompassed languages serves as a standard form for the entire macrolanguage and is frequently identified with it; in other cases there is no dominant language, and the macrolanguage simply serves as a cover term for the entire group.

   Applications MAY use macrolanguage information to improve matching or language negotiation.
   For example, the information that 'sr' (Serbian) and 'hr' (Croatian) share a macrolanguage expresses a closer relation between those languages than between, say, 'sr' (Serbian) and 'mk' (Macedonian). It is valid to use either the subtag of the encompassed language or that of the macrolanguage to form language tags. However, many matching applications will not be aware of the relationship between the languages. Care in selecting which subtags are used is crucial to interoperability.

   In general, use the most specific subtag to form the language tag. However, where the macrolanguage tag has been historically used to denote a dominant encompassed language, it SHOULD be used in place of the subtag specific to that encompassed language unless it is necessary to clearly distinguish the macrolanguage as a whole from that dominant encompassed language variety.

   The pairs of macro and encompassed languages affected by this issue when this document was published were:

   Macrolanguage                  Dominant encompassed language
   -----------------------------  -------------------------------
   Arabic 'ar'                    Standard Arabic 'arb'
   Konkani (macrolanguage) 'kok'  Konkani (single language) 'knn'
   Malay (macrolanguage) 'ms'     Malay (single language) 'mly'
   Swahili (macrolanguage) 'sw'   Swahili (single language) 'swh'
   Uzbek 'uz'                     Northern Uzbek 'uzn'
   Chinese 'zh'                   Mandarin Chinese 'cmn'

                                 Figure 8

   In particular, the Chinese family of languages calls for special consideration. Because the written form is very similar for most languages having 'zh' (Chinese) as a macrolanguage (and because historically subtags for the various encompassed languages were not available), languages such as 'yue' (Cantonese) have historically used either 'zh' or a tag (now grandfathered) beginning with 'zh'. This means that macrolanguage information can be usefully applied when searching for content or when providing fallbacks in language negotiation.
   For example, the information that 'yue' has a macrolanguage of 'zh' could be used in the Lookup algorithm to fall back from a request for "yue-Hans-CN" to "zh-Hans-CN" without losing the script and region information (even though the user did not specify "zh-Hans-CN" in their request).

   To ensure consistent backward compatibility, this document contains several provisions to account for potential instability in the standards used to define the subtags that make up language tags. These provisions mean that no language tag created under the rules in this document will become invalid, nor will a language tag have a narrower scope in the future (it might have a broader scope).

   Standards, protocols, and applications that reference this document normatively but apply different rules from the ones given in this section MUST specify how language tag selection varies from the guidelines given here.

4.2.  Meaning of the Language Tag

   The meaning of a language tag is related to the meaning of the subtags that it contains. Each subtag, in turn, implies a certain range of expectations one might have for related content, although it is not a guarantee. For example, the use of a script subtag such as 'Arab' (Arabic script) does not mean that the content contains only Arabic characters. It does mean that the language involved is predominantly written in the Arabic script. Thus, a language tag and its subtags can encompass a very wide range of variation and yet remain valid in each particular instance.

   Validity of a tag is not the only factor determining its usefulness. While every valid tag has a meaning, it might not represent any real-world language usage. This is unavoidable in a system in which subtags can be combined freely.
   For example, tags such as "ar-Cyrl-CO" (Arabic, Cyrillic script, as used in Colombia) or "tlh-Kore-AQ-fonipa" (Klingon, Korean script, as used in Antarctica, IPA phonetic transcription) are both valid and unlikely to represent a useful combination of language attributes.

   The meaning of a given tag doesn't depend on the context in which it appears. The relationship between a tag's meaning and the information objects to which that tag is applied, however, can vary.

   o  For a single information object, the associated language tags might be interpreted as the set of languages that is necessary for a complete comprehension of the object. Example: Plain text documents.

   o  For an aggregation of information objects, the associated language tags could be taken as the set of languages used inside components of that aggregation. Examples: Document stores and libraries.

   o  For information objects whose purpose is to provide alternatives, the associated language tags could be regarded as a hint that the content is provided in several languages and that one has to inspect each of the alternatives in order to find its language or languages. In this case, the presence of multiple tags might not mean that one needs to be multilingual to get a complete understanding of the document. Example: MIME multipart/alternative.

   o  In markup languages, such as HTML and XML, language information can be added to each part of the document identified by the markup structure (including the whole document itself). For example, one could write <span lang="fr">C'est la vie.</span> inside a Norwegian document; the Norwegian-speaking user could then access a French-Norwegian dictionary to find out what the marked section meant.
      If the user were listening to that document through a speech synthesis interface, this information could be used to signal the synthesizer to apply French text-to-speech pronunciation rules to that span of text, instead of the inappropriate Norwegian rules.

   o  Language tags form the basis for most implementations of locale identifiers. For example, see Unicode's CLDR (Common Locale Data Repository) project.

   Language tags are related when they contain a similar sequence of subtags. For example, if a language tag B contains language tag A as a prefix, then B is typically "narrower" or "more specific" than A. Thus, "zh-Hant-TW" is more specific than "zh-Hant".

   This relationship is not guaranteed in all cases: specifically, languages whose tags begin with the same sequence of subtags are NOT guaranteed to be mutually intelligible, although they might be. For example, the tag "az" shares a prefix with both "az-Latn" (Azerbaijani written using the Latin script) and "az-Cyrl" (Azerbaijani written using the Cyrillic script). A person fluent in one script might not be able to read the other, even though the text might be identical. Content tagged as "az" is most probably written in just one script and thus might not be intelligible to a reader familiar with the other script.

   Similarly, not all subtags specify an actual distinction in language. For example, the tags "en-US" and "en-CA" mean, roughly, English with features generally thought to be characteristic of the United States and Canada, respectively. They do not imply that a significant dialectal boundary exists between any arbitrarily selected point in the United States and any arbitrarily selected point in Canada. Neither does a particular region subtag imply that linguistic distinctions do not exist within that region.

4.3.  Length Considerations

   There is no defined upper limit on the size of language tags. While historically most language tags have consisted of language and region subtags with a combined total length of up to six characters, larger tags have always been possible and have actually appeared in use.

   Neither the language tag syntax nor other requirements in this document impose a fixed upper limit on the number of subtags in a language tag (and thus an upper bound on the size of a tag). The language tag syntax suggests that, depending on the specific language, more subtags (and thus a longer tag) are sometimes necessary to completely identify the language for certain applications; thus, it is possible to envision long or complex subtag sequences.

4.3.1.  Working with Limited Buffer Sizes

   Some applications and protocols are forced to allocate fixed buffer sizes or otherwise limit the length of a language tag. A conformant implementation or specification MAY refuse to support the storage of language tags that exceed a specified length. Any such limitation SHOULD be clearly documented, and such documentation SHOULD include what happens to longer tags (for example, whether an error value is generated or the language tag is truncated). A protocol that allows tags to be truncated at an arbitrary limit, without giving any indication of what that limit is, has the potential to cause harm by changing the meaning of tags in substantial ways.

   In practice, most language tags do not require more than a few subtags and will not approach reasonably sized buffer limitations; see Section 4.1.

   Some specifications or protocols have limits on tag length but do not have a fixed length limitation.
   For example, [RFC2231] has no explicit length limitation: the length available for the language tag is constrained by the length of other header components (such as the charset's name) coupled with the 76-character limit in [RFC2047]. Thus, the "limit" might be 50 or more characters, but it could potentially be quite small.

   The considerations for assigning a buffer limit are:

      Implementations SHOULD NOT truncate language tags unless the meaning of the tag is purposefully being changed, or unless the tag does not fit into a limited buffer size specified by a protocol for storage or transmission.

      Implementations SHOULD warn the user when a tag is truncated, since truncation changes the semantic meaning of the tag.

      Implementations of protocols or specifications that are space constrained but do not have a fixed limit SHOULD use the longest possible tag in preference to truncation.

      Protocols or specifications that specify limited buffer sizes for language tags MUST allow for language tags of up to 33 characters.

      Protocols or specifications that specify limited buffer sizes for language tags SHOULD allow for language tags of at least 30 characters. Note that RFC 4646 [RFC4646] recommended a field size of 42 characters because it included the permanently reserved (and unused) 'extlang' production. The current size recommendation does not include the use of the 'extlang' field. Protocols or specifications that commonly use extensions or private use subtags might wish to reserve or recommend a longer "minimum buffer" size.
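
   For a protocol that does impose a fixed buffer limit, the considerations above combine with the truncation procedure of Section 4.3.2 (illustrated in Figure 10). The following Python sketch shows one way to do this; the function name truncate_tag is hypothetical, and the sketch assumes its input is already a well-formed tag. It removes whole subtags from the right, drops any trailing singleton, and issues a warning whenever the tag is shortened.

```python
import warnings


def truncate_tag(tag: str, limit: int) -> str:
    """Shorten a well-formed language tag to at most `limit` characters.

    Whole subtags are removed from the right together with their
    preceding '-'. Whenever the remaining tag ends in a single-character
    subtag (an extension or private use singleton), that subtag is
    removed too. Returns '' if no non-empty prefix fits.
    """
    subtags = tag.split("-")
    while subtags and len("-".join(subtags)) > limit:
        subtags.pop()
        # Never leave a dangling singleton such as '-a' or '-x' at the end.
        while subtags and len(subtags[-1]) == 1:
            subtags.pop()
    result = "-".join(subtags)
    if result != tag:
        # Truncation changes the meaning of the tag, so tell the user.
        warnings.warn(f"language tag {tag!r} truncated to {result!r}",
                      stacklevel=2)
    return result
```

   For example, truncating the tag from Figure 10 to 39 characters yields "zh-Latn-CN-variant1-a-extend1" (step 2 of that figure): removing 'wadegile' alone would leave a dangling 'x', which the inner loop then removes.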

   The following illustration shows how the 30-character recommendation was derived:

      language =  3 (ISO 639-2; ISO 639-1 requires 2)
      script   =  5 (if not suppressed: see Section 4.1)
      region   =  4 (UN M.49; ISO 3166 requires 3)
      variant1 =  9 (needs 'language' as a prefix)
      variant2 =  9 (needs 'language-variant1' as a prefix)

      total    = 30 characters

                Figure 9: Derivation of the Limit on Tag Length

4.3.2.  Truncation of Language Tags

   Truncation of a language tag alters the meaning of the tag, and thus SHOULD be avoided. However, truncation of language tags is sometimes necessary due to limited buffer sizes. Such truncation MUST NOT chop a subtag off in the middle or produce an invalid tag (for example, one ending with the "-" character).

   This means that applications or protocols that truncate tags MUST do so by progressively removing subtags along with their preceding "-" from the right side of the language tag until the tag is short enough for the given buffer. If the resulting tag ends with a single-character subtag, that subtag and its preceding "-" MUST also be removed. For example:

      Tag to truncate: zh-Latn-CN-variant1-a-extend1-x-wadegile-private1
      1. zh-Latn-CN-variant1-a-extend1-x-wadegile
      2. zh-Latn-CN-variant1-a-extend1
      3. zh-Latn-CN-variant1
      4. zh-Latn-CN
      5. zh-Latn
      6. zh

                  Figure 10: Example of Tag Truncation

4.4.  Canonicalization of Language Tags

   Since a particular language tag is sometimes used by many processes, language tags SHOULD always be created or generated in a canonical form.

   A language tag is in canonical form when:

   1.  The tag is well-formed according to the rules in Section 2.1 and Section 2.2.

   2.  Subtags of type 'Region' that have a Preferred-Value mapping in the IANA registry (see Section 3.1) MUST be replaced with their mapped value.
       Note: In rare cases, the mapped value will also have a Preferred-Value.

   3.  Redundant or grandfathered tags that have a Preferred-Value mapping in the IANA registry (see Section 3.1) MUST be replaced with their mapped value. These items either are deprecated mappings created before the adoption of this document (such as the mapping of "no-nyn" to "nn" or "i-klingon" to "tlh") or are the result of later registrations or additions to this document (for example, "zh-hakka" was deprecated in favor of the ISO 639-3 code 'hak' when this document was adopted).

   4.  Other subtags that have a Preferred-Value mapping in the IANA registry (see Section 3.1) MUST be replaced with their mapped value. These items consist entirely of clerical corrections to ISO 639-1 in which the deprecated subtags have been maintained for compatibility purposes.

   5.  If more than one extension subtag sequence exists, the extension sequences are ordered into case-insensitive ASCII order by singleton subtag.

   Example: The language tag "en-a-aaa-b-ccc-bbb-x-xyz" is in canonical form, while "en-b-ccc-bbb-a-aaa-X-xyz" is well-formed and potentially valid (extensions 'a' and 'b' are not defined as of the publication of this document) but not in canonical form (the extensions are not in alphabetical order).

   Example: The language tag "en-BU" (English as used in Burma) is not canonical because the 'BU' subtag has a canonical mapping to 'MM' (Myanmar), although the tag "en-BU" maintains its validity.

   Canonicalization of language tags does not imply anything about the use of upper or lowercase letters when processing or comparing subtags (as described in Section 2.1). All comparisons MUST be performed in a case-insensitive manner.
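
   The canonicalization rules above can be sketched in Python as follows. This is an illustration under stated assumptions, not a reference implementation: the mapping tables are a hypothetical excerpt containing only the examples named in this section (a real processor would load every Preferred-Value from the IANA registry), the input is assumed to be well-formed, whole-tag mappings are checked first as a simplification of the rule order, and all function names are hypothetical. The sketch also includes the OPTIONAL case regularization described in the following paragraph, applied (as an assumption) only to subtags before the first singleton, and a case-insensitive comparison helper. Python's str.upper() and str.lower() do not depend on the process locale, which sidesteps the Turkish dotted-I problem for ASCII input.

```python
from typing import Dict, List

# Hypothetical excerpt of registry Preferred-Value mappings, limited to the
# examples named in this section. Keys are lowercase for case-insensitive use.
TAG_MAP: Dict[str, str] = {"i-klingon": "tlh", "no-nyn": "nn", "zh-hakka": "hak"}
LANGUAGE_MAP: Dict[str, str] = {"iw": "he"}
REGION_MAP: Dict[str, str] = {"bu": "MM"}


def is_region(subtag: str) -> bool:
    """Two ASCII letters (ISO 3166) or three digits (UN M.49)."""
    return ((len(subtag) == 2 and subtag.isalpha())
            or (len(subtag) == 3 and subtag.isdigit()))


def canonicalize(tag: str) -> str:
    """Apply Preferred-Value mappings and sort extension sequences."""
    mapped = TAG_MAP.get(tag.lower())
    if mapped is not None:
        return mapped
    subtags = tag.split("-")
    if len(subtags[0]) == 1:
        return tag  # 'x-...' private use or an unmapped 'i-...' tag
    # Subtags before the first singleton: language, script, region, variants.
    split = next((i for i, s in enumerate(subtags) if len(s) == 1), len(subtags))
    head = subtags[:split]
    head[0] = LANGUAGE_MAP.get(head[0].lower(), head[0])
    head = [REGION_MAP.get(s.lower(), s) if i > 0 and is_region(s) else s
            for i, s in enumerate(head)]
    extensions: List[List[str]] = []
    private: List[str] = []
    for s in subtags[split:]:
        if private or s.lower() == "x":
            private.append(s)  # everything from 'x' onward is private use
        elif len(s) == 1:
            extensions.append([s])  # a new extension sequence starts
        else:
            extensions[-1].append(s)
    # Order extension sequences by singleton, ignoring case; 'x' stays last.
    extensions.sort(key=lambda seq: seq[0].lower())
    return "-".join(head + [s for seq in extensions for s in seq] + private)


def regularize_case(tag: str) -> str:
    """OPTIONAL registry-style casing, applied before the first singleton."""
    out: List[str] = []
    seen_singleton = False
    for i, s in enumerate(tag.split("-")):
        seen_singleton = seen_singleton or len(s) == 1
        if i == 0 or seen_singleton:
            out.append(s.lower())
        elif len(s) == 2:
            out.append(s.upper())
        elif len(s) == 4:
            out.append(s[0].upper() + s[1:].lower())
        else:
            out.append(s.lower())
    return "-".join(out)


def same_tag(a: str, b: str) -> bool:
    """Compare two tags case-insensitively, as all comparisons must be."""
    return a.lower() == b.lower()
```

   With these tables, canonicalize("en-BU") returns "en-MM", and canonicalize("en-b-ccc-bbb-a-aaa-X-xyz") returns "en-a-aaa-b-ccc-bbb-X-xyz": the extensions are reordered but the private use casing is left alone, since case regularization is a separate, optional step.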

   When performing canonicalization of language tags, processors MAY regularize the case of the subtags (that is, this process is OPTIONAL), following the case used in the registry. Note that this corresponds to the following casing rules: uppercase all non-initial two-letter subtags; titlecase all non-initial four-letter subtags; lowercase everything else.

   Note: Case folding of ASCII letters in certain locales, unless carefully handled, sometimes produces non-ASCII character values. The Unicode Character Database file "SpecialCasing.txt" defines the specific cases that are known to cause problems with this. In particular, the letter 'i' (U+0069) in Turkish and Azerbaijani is uppercased to U+0130 (LATIN CAPITAL LETTER I WITH DOT ABOVE). Implementers SHOULD specify a locale-neutral casing operation to ensure that case folding of subtags does not produce this value, which is illegal in language tags. For example, if one were to uppercase the region subtag 'in' using Turkish locale rules, the sequence U+0130 U+004E would result instead of the expected 'IN'.

   Note: if the field 'Deprecated' appears in a registry record without an accompanying 'Preferred-Value' field, then that tag or subtag is deprecated without a replacement. Validating processors SHOULD NOT generate tags that include these values, although the values are canonical when they appear in a language tag.

   An extension MUST define any relationships that exist between the various subtags in the extension and thus MAY define an alternate canonicalization scheme for the extension's subtags. Extensions MAY define how the order of the extension's subtags is interpreted. For example, an extension could define that its subtags are in canonical order when the subtags are placed into ASCII order: that is, "en-a-aaa-bbb-ccc" instead of "en-a-ccc-bbb-aaa".
   Another extension might define that the order of the subtags influences their semantic meaning (so that "en-b-ccc-bbb-aaa" has a different value from "en-b-aaa-bbb-ccc"). However, extension specifications SHOULD be designed so that they are tolerant of the typical processes described in Section 3.7.

4.5.  Considerations for Private Use Subtags

   Private use subtags, like all other subtags, MUST conform to the format and content constraints in the ABNF. Private use subtags have no meaning outside the private agreement between the parties that intend to use or exchange language tags that employ them. The same subtags MAY be used with a different meaning under a separate private agreement. They SHOULD NOT be used where alternatives exist and SHOULD NOT be used in content or protocols intended for general use.

   Private use subtags are simply useless for information exchange without prior arrangement. The value and semantic meaning of private use tags and of the subtags used within such a language tag are not defined by this document.

   Subtags defined in the IANA registry as having a specific private use meaning convey more information than a purely private use tag prefixed by the singleton subtag 'x'. For applications, this additional information MAY be useful.

   For example, the region subtags 'AA' and 'ZZ', and those in the ranges 'QM'-'QZ' and 'XA'-'XZ' (derived from ISO 3166 private use codes), MAY be used to form a language tag. A tag such as "zh-Hans-XQ" conveys a great deal of public, interchangeable information about the language material (that it is Chinese in the simplified Chinese script and is suitable for some geographic region 'XQ').
   While the precise geographic region is not known outside of private agreement, the tag conveys far more information than an opaque tag such as "x-someLang", which contains no information about the language subtag or script subtag outside of the private agreement.

   However, in some cases content tagged with registered private use subtags (such as the region 'XQ') MAY interact with other systems in a different and possibly unsuitable manner compared to tags that use opaque, privately defined subtags, so the choice of the best approach sometimes depends on the particular domain in question.

5.  IANA Considerations

   This section describes the processes and requirements that IANA needs to follow in order to maintain the subtag and extension registries as defined by this document and in accordance with the requirements of [RFC2434].

   The impact on the IANA maintainers of the two registries defined by this document will be a small increase in the frequency of new entries or updates. IANA is also required to create a new mailing list (described below in Section 5.1) to announce registry changes and updates.

5.1.  Language Subtag Registry

   Upon adoption of this document, IANA will update the registry using instructions and content provided in a companion document: [registry-update]. The criteria and process for selecting the updated set of records are described in that document. The updated set of records represents no impact on IANA, since the work to create it will be performed externally.

   Future work on the Language Subtag Registry includes the following activities:

      Inserting or replacing whole records. These records are preformatted for IANA by the Language Subtag Reviewer, as described in Section 3.3.

      Archiving and making publicly available the registration forms.
o Announcing each updated version of the registry on the "ietf-languages-announcements@iana.org" mailing list.

Each registration form sent to IANA contains a single record for incorporation into the registry. The form will be sent to "iana@iana.org" by the Language Subtag Reviewer. It will have a subject line indicating whether the enclosed form represents an insertion of a new record (indicated by the word "INSERT" in the subject line) or a replacement of an existing record (indicated by the word "MODIFY" in the subject line). At no time can a record be deleted from the registry.

IANA will extract the record from the form and place the inserted or modified record into the appropriate section of the language subtag registry, grouping the records by their 'Type' field. Inserted records can be placed anywhere in the appropriate section; there is no guarantee of the order of the records beyond grouping them together by 'Type'. Modified records overwrite the record they replace.

IANA will also update the File-Date record to contain the most recent modification date when performing any insertion or modification. Each request to insert or modify records will include a new File-Date record indicating the acceptance date of the record. This record is to be placed first in the registry, replacing the existing File-Date record. In the event that the File-Date record present in the registry has a later date than the record being inserted or modified, then the latest (most recent) record will be preserved. IANA should process multiple registration requests in order according to the File-Date in the form, since one registration could otherwise cause a more recent change to be overwritten.

The updated registry file MUST use the UTF-8 character encoding and IANA MUST check the registry file for proper encoding.
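The record-handling rules above (records grouped by 'Type', INSERT adding a record, MODIFY overwriting the record it replaces, no deletion, requests processed in File-Date order with the most recent File-Date kept, and a UTF-8 check of the file) can be sketched in Python as follows. This is a hypothetical, non-authoritative model: the Request and Registry classes, the in-memory dictionaries, and the apply_requests and check_utf8 functions are illustrative assumptions rather than IANA's actual tooling, and the duplicate and missing-record checks are this sketch's own additions.

```python
# Hypothetical model of the Section 5.1 registry-update rules.
# Class and function names are illustrative assumptions, not IANA tooling.
from dataclasses import dataclass, field


@dataclass
class Request:
    action: str     # "INSERT" or "MODIFY", as given in the email subject line
    file_date: str  # File-Date from the form, RFC 3339 full-date "YYYY-MM-DD"
    record: dict    # field name -> value, e.g. {"Type": "variant", ...}


@dataclass
class Registry:
    file_date: str
    # Records grouped by their 'Type' field. Within a group they are keyed
    # by 'Subtag' (or 'Tag'); no ordering inside a group is implied.
    sections: dict = field(default_factory=dict)


def record_key(record: dict) -> str:
    # Most records carry 'Subtag'; grandfathered and redundant records use 'Tag'.
    return record.get("Subtag") or record["Tag"]


def apply_requests(registry: Registry, requests: list) -> None:
    # Apply requests oldest first so an older request cannot overwrite a
    # more recent change. "YYYY-MM-DD" strings sort chronologically.
    for req in sorted(requests, key=lambda r: r.file_date):
        rec_type = req.record["Type"]
        key = record_key(req.record)
        existing = registry.sections.get(rec_type, {})
        if req.action == "INSERT":
            # Assumption of this sketch: reject an INSERT for an existing record.
            if key in existing:
                raise ValueError(f"{key!r} is already registered")
        elif req.action == "MODIFY":
            # Assumption of this sketch: a MODIFY must name an existing record.
            if key not in existing:
                raise ValueError(f"{key!r} is not in the registry")
        else:
            # There is no DELETE: records are never removed from the registry.
            raise ValueError(f"unsupported action {req.action!r}")
        # Insert the record, or overwrite the record it replaces.
        registry.sections.setdefault(rec_type, {})[key] = req.record
        # Keep whichever File-Date is most recent.
        registry.file_date = max(registry.file_date, req.file_date)


def check_utf8(data: bytes) -> str:
    # Strict decoding raises UnicodeDecodeError on malformed UTF-8.
    return data.decode("utf-8")
```

Because File-Date values use the RFC 3339 full-date form (YYYY-MM-DD), comparing them as strings matches chronological order, which is why sorted() and max() are enough in this sketch.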
Non-ASCII characters can be sent to IANA by attaching the registration form to the email message or by using various encodings in the mail message body (UTF-8 is recommended). IANA will verify any unclear or corrupted characters with the Language Subtag Reviewer prior to posting the updated registry.

IANA will also archive and make publicly available from "http://www.iana.org/assignments/lang-subtags-templates/" each registration form. Note that multiple registrations can pertain to the same record in the registry.

Developers who are dependent upon the language subtag registry sometimes would like to be informed of changes in the registry so that they can update their implementations. When any change is made to the language subtag registry, IANA will send an announcement message to "ietf-languages-announcements@iana.org" (a self-subscribing list that only IANA can post to).

5.2. Extensions Registry

The Language Tag Extensions Registry can contain at most 35 records and thus changes to this registry are expected to be very infrequent.

Future work by IANA on the Language Tag Extensions Registry is limited to two cases. First, the IESG MAY request that new records be inserted into this registry from time to time. These requests MUST include the record to insert in the exact format described in Section 3.7. In addition, there MAY be occasional requests from the maintaining authority for a specific extension to update the contact information or URLs in the record. These requests MUST include the complete, updated record. IANA is not responsible for validating the information provided, only for checking that it is properly formatted. The request should reasonably be seen to come from the maintaining authority named in the record present in the registry.

6. Security Considerations

Language tags used in content negotiation, like any other information exchanged on the Internet, might be a source of concern because they might be used to infer the nationality of the sender, and thus identify potential targets for surveillance.

This is a special case of the general problem that anything sent is visible to the receiving party and possibly to third parties as well. It is useful to be aware that such concerns can exist in some cases.

The evaluation of the exact magnitude of the threat, and any possible countermeasures, is left to each application protocol (see BCP 72 [RFC3552] for best current practice guidance on security threats and defenses).

The language tag associated with a particular information item is of no consequence whatsoever in determining whether that content might contain possible homographs. The fact that a text is tagged as being in one language or using a particular script subtag provides no assurance whatsoever that it does not contain characters from scripts other than the one(s) associated with or specified by that language tag.

Since there is no limit to the number of variant, private use, and extension subtags, and consequently no limit on the possible length of a tag, implementations need to guard against buffer overflow attacks. See Section 4.3 for details on language tag truncation, which can occur as a consequence of defenses against buffer overflow.

Although the specification of valid subtags for an extension (see Section 3.7) MUST be available over the Internet, implementations SHOULD NOT mechanically depend on it being always accessible, to prevent denial-of-service attacks.

7. Character Set Considerations

The syntax in this document requires that language tags use only the characters A-Z, a-z, 0-9, and HYPHEN-MINUS, which are present in most character sets, so the composition of language tags should not have any character set issues.

Rendering of characters based on the content of a language tag is not addressed in this memo. Historically, some languages have relied on the use of specific character sets or other information in order to infer how a specific character should be rendered (notably this applies to language- and culture-specific variations of Han ideographs as used in Japanese, Chinese, and Korean). When language tags are applied to spans of text, rendering engines sometimes use that information in deciding which font to use in the absence of other information, particularly where languages with distinct writing traditions use the same characters.

8. Changes from RFC 4646

The main goal for this revision of this document was to incorporate ISO 639-3 and its attendant set of language codes into the IANA Language Subtag Registry, permitting the identification of many more languages and dialects than previously supported.

The specific changes in this document to meet these goals are:

o Defined the incorporation of ISO 639-3 codes as language subtags. It also permanently reserved and disallowed the use of extlang subtags. The changes necessary to achieve this were:
   * Modified the ABNF comments.
   * Updated various registration and stability requirements sections to reference ISO 639-3 in addition to ISO 639-1 and ISO 639-2.
   * Edited the text to eliminate references to extended language subtags where they are no longer used.
   * Explained the change in the section on extended language subtags.
o Changed the ABNF related to grandfathered tags. The irregular tags are now listed.
   Well-formed grandfathered tags are now described by the 'langtag' production, and the 'grandfathered' production was removed as a result. Also added a description of both types of grandfathered tags to Section 2.2.8.
o Added the paragraph on "collections" to Section 4.1.
o Changed the capitalization rules for 'Tag' fields in Section 3.1.
o Split Section 3.1 up into subsections.
o Modified Section 3.5 to allow Suppress-Script fields to be added, modified, or removed via the registration process. This was an erratum from RFC 4646.
o Modified examples that used region code 'CS' (formerly Serbia and Montenegro) to use 'RS' (Serbia) instead.
o Modified the rules for creating and maintaining record 'Description' fields to prevent duplicates, including inverted duplicates.
o Removed the lengthy description of why RFC 4646 was created from this section, which also caused the removal of the reference to XML Schema.
o Modified the text in Section 2.1 to place more emphasis on the fact that language tags are not case sensitive.
o Replaced the example "fr-Latn-CA" in Section 2.1 with "sr-Latn-RS" and "az-Arab-IR" because "fr-Latn-CA" doesn't respect the Suppress-Script on 'Latn' with 'fr'.
o Changed the requirements for well-formedness to make singleton repetition checking optional (it is required for validity checking) in Section 2.2.9.
o Changed the text in Section 2.2.9 referring to grandfathered checking to note that the list is now included in the ABNF.
o Modified and added text to Section 3.2. The job description was placed first. A note was added making clear that the Language Subtag Reviewer may delegate various non-critical duties, including list moderation. Finally, additional text was added to make the appointment process clear and to clarify that decisions and performance of the reviewer are appealable.
o Added text to Section 3.5 clarifying that the ietf-languages list is operated by whomever the IESG appoints.
o Added text to Section 3.1.4 clarifying that the first Description in a 'language' record matches the corresponding Reference Name for the language in ISO 639-3.
o Modified Section 2.2.9 to define classes of conformance related to specific tags (formerly 'well-formed' and 'valid' referred to implementations). Notes were added about the removal of 'extlang' from the ABNF provided in RFC 4646, allowing for well-formedness using this older definition. A reference to RFC 3066 well-formedness was also added.
o Added text to the end of Section 3.1.2 noting that future versions of this document might add new field types to the Registry format and recommending that implementations ignore any unrecognized fields.
o Added text about what the lack of a Suppress-Script field means in a record to Section 3.1.8.
o Added text allowing the correction of misspellings and typographic errors to Section 3.1.4.
o Added text to Section 3.1.7 disallowing Prefix field conflicts (such as circular prefix references).
o Modified text in Section 3.5 to require the subtag reviewer to announce his/her decision (or extension) following the two-week period. Also clarified that any decision or failure to decide can be appealed.
o Modified text in Section 4.1 to include the (heretofore anecdotal) guiding principle of tag choice and to clarify the non-use of script subtags in non-written applications. Also updated examples in this section to use Chamic languages as an example of language collections.
o Prohibited multiple use of the same variant in a tag (e.g., "de-1901-1901"). Previously this was only a recommendation ("SHOULD").
o Removed inappropriate [RFC2119] language from the illustration in Section 4.3.1.
o Replaced the example of deprecating "zh-gouyu" with "zh-hakka"->"hak" in Section 4.4, noting that it was this document that caused the change.
o Replaced the text in Section 4.1 dealing with "mul"/"und" to include the subtags 'zxx' and 'mis', as well as the tag "i-default". A normative reference to RFC 2277 was added, along with an informative reference to MARC21.
o Added text to Section 3.5 clarifying that any modifications of a registration request must be sent to the ietf-languages list before submission to IANA.
o Changed the ABNF for the record-jar format from the LWSP production to a folding whitespace production similar to obs-FWS in RFC 4324. This effectively prevents unintentional blank lines inside a field.
o Revised text in Section 3.3, Section 3.5, and Section 5.1 to clarify that the Language Subtag Reviewer sends the complete registration forms to IANA, that IANA extracts the record from the form, and that the forms must also be archived separately from the registry.
o Added text to Section 5 requiring IANA to send an announcement to the ietf-languages-announcements list whenever the registry is updated.
o Modified the registry to use UTF-8 as its character encoding. This also entails additional instructions to IANA and the Language Subtag Reviewer in the registration process.
o Modified the rules in Section 2.2.4 so that "exceptionally reserved" ISO 3166-1 codes other than 'UK' are included in the registry. In particular, this allows the code 'EU' (European Union) to be used to form language tags or (more commonly) for applications that use the registry for region codes to reference this subtag.
o Modified the IANA considerations section (Section 5) to remove unnecessary normative [RFC2119] language.

9. References

9.1. Normative References

[ISO15924] International Organization for Standardization, "ISO 15924:2004. Information and documentation -- Codes for the representation of names of scripts", January 2004.
[ISO3166-1] International Organization for Standardization, "ISO 3166-1:2006. Codes for the representation of names of countries and their subdivisions -- Part 1: Country codes", November 2006.
[ISO639-1] International Organization for Standardization, "ISO 639-1:2002. Codes for the representation of names of languages -- Part 1: Alpha-2 code", 2002.
[ISO639-2] International Organization for Standardization, "ISO 639-2:1998. Codes for the representation of names of languages -- Part 2: Alpha-3 code, first edition", 1998.
[ISO639-3] International Organization for Standardization, "ISO 639-3:2007. Codes for the representation of names of languages -- Part 3: Alpha-3 code for comprehensive coverage of languages", 2007.
[ISO646] International Organization for Standardization, "ISO/IEC 646:1991, Information technology -- ISO 7-bit coded character set for information interchange.", 1991.
[RFC2026] Bradner, S., "The Internet Standards Process -- Revision 3", BCP 9, RFC 2026, October 1996.
[RFC2028] Hovey, R. and S. Bradner, "The Organizations Involved in the IETF Standards Process", BCP 11, RFC 2028, October 1996.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC2277] Alvestrand, H., "IETF Policy on Character Sets and Languages", BCP 18, RFC 2277, January 1998.
[RFC2434] Narten, T. and H. Alvestrand, "Guidelines for Writing an IANA Considerations Section in RFCs", BCP 26, RFC 2434, October 1998.
[RFC2860] Carpenter, B., Baker, F., and M. Roberts, "Memorandum of Understanding Concerning the Technical Work of the Internet Assigned Numbers Authority", RFC 2860, June 2000.
[RFC3339] Klyne, G., Ed. and C. Newman, "Date and Time on the Internet: Timestamps", RFC 3339, July 2002.
[RFC4645] Ewell, D., "Initial Language Subtag Registry", RFC 4645, September 2006.
[RFC4647] Phillips, A. and M. Davis, "Matching of Language Tags", BCP 47, RFC 4647, September 2006.
[RFC5234] Crocker, D. and P. Overell, "Augmented BNF for Syntax Specifications: ABNF", STD 68, RFC 5234, January 2008.
[UAX14] Freitag, A., "Unicode Standard Annex #14: Line Breaking Properties", August 2006, .
[UN_M.49] Statistics Division, United Nations, "Standard Country or Area Codes for Statistical Use", UN Standard Country or Area Codes for Statistical Use, Revision 4 (United Nations publication, Sales No. 98.XVII.9), June 1999.

9.2. Informative References

[RFC1766] Alvestrand, H., "Tags for the Identification of Languages", RFC 1766, March 1995.
[RFC2047] Moore, K., "MIME (Multipurpose Internet Mail Extensions) Part Three: Message Header Extensions for Non-ASCII Text", RFC 2047, November 1996.
[RFC2231] Freed, N. and K. Moore, "MIME Parameter Value and Encoded Word Extensions: Character Sets, Languages, and Continuations", RFC 2231, November 1997.
[RFC2781] Hoffman, P. and F. Yergeau, "UTF-16, an encoding of ISO 10646", RFC 2781, February 2000.
[RFC3066] Alvestrand, H., "Tags for the Identification of Languages", RFC 3066, January 2001.
[RFC3552] Rescorla, E. and B. Korver, "Guidelines for Writing RFC Text on Security Considerations", BCP 72, RFC 3552, July 2003.
[RFC3629] Yergeau, F., "UTF-8, a transformation format of ISO 10646", STD 63, RFC 3629, November 2003.
[RFC4646] Phillips, A. and M. Davis, "Tags for Identifying Languages", BCP 47, RFC 4646, September 2006.
[UTS35] Davis, M., "Unicode Technical Standard #35: Locale Data Markup Language (LDML)", December 2007, .
[Unicode] Unicode Consortium, "The Unicode Consortium. The Unicode Standard, Version 5.0, (Boston, MA, Addison-Wesley, 2003. ISBN 0-321-49081-0)", January 2007.
[iso639.prin] ISO 639 Joint Advisory Committee, "ISO 639 Joint Advisory Committee: Working principles for ISO 639 maintenance", March 2000, .
[record-jar] Raymond, E., "The Art of Unix Programming", 2003, .
[registry-update] Ewell, D., Ed., "Update to the Language Subtag Registry", September 2006, .

Appendix A. Acknowledgements

Any list of contributors is bound to be incomplete; please regard the following as only a selection from the group of people who have contributed to make this document what it is today.

The contributors to RFC 4646, RFC 4647, RFC 3066, and RFC 1766, the precursors of this document, made enormous contributions directly or indirectly to this document and are generally responsible for the success of language tags.

The following people contributed to this document:

Stephane Bortzmeyer, Karen Broome, Peter Constable, John Cowan, Martin Duerst, Frank Ellerman, Doug Ewell, Deborah Garside, Marion Gunn, Kent Karlsson, Chris Newman, Randy Presuhn, Stephen Silver, and many, many others.

Very special thanks must go to Harald Tveit Alvestrand, who originated RFCs 1766 and 3066, and without whom this document would not have been possible.

Special thanks go to Michael Everson, who served as the Language Tag Reviewer for almost the entire RFC 1766/RFC 3066 period, as well as the Language Subtag Reviewer since the adoption of RFC 4646.
Special thanks also to Doug Ewell, for his production of the first complete subtag registry, his work to support and maintain new registrations, and his careful editorship of both RFC 4645 and [registry-update].

Appendix B. Examples of Language Tags (Informative)

Simple language subtag:
   de (German)
   fr (French)
   ja (Japanese)
   i-enochian (example of a grandfathered tag)

Language subtag plus Script subtag:
   zh-Hant (Chinese written using the Traditional Chinese script)
   zh-Hans (Chinese written using the Simplified Chinese script)
   sr-Cyrl (Serbian written using the Cyrillic script)
   sr-Latn (Serbian written using the Latin script)

Language-Script-Region:
   zh-Hans-CN (Chinese written using the Simplified script as used in mainland China)
   sr-Latn-RS (Serbian written using the Latin script as used in Serbia)

Language-Variant:
   sl-rozaj (Resian dialect of Slovenian)
   sl-nedis (Nadiza dialect of Slovenian)

Language-Region-Variant:
   de-CH-1901 (German as used in Switzerland using the 1901 variant [orthography])
   sl-IT-nedis (Slovenian as used in Italy, Nadiza dialect)

Language-Script-Region-Variant:
   hy-Latn-IT-arevela (Eastern Armenian written in Latin script, as used in Italy)

Language-Region:
   de-DE (German for Germany)
   en-US (English as used in the United States)
   es-419 (Spanish appropriate for the Latin America and Caribbean region using the UN region code)

Private use subtags:
   de-CH-x-phonebk
   az-Arab-x-AZE-derbend

Private use registry values:
   x-whatever (private use using the singleton 'x')
   qaa-Qaaa-QM-x-southern (all private tags)
   de-Qaaa (German, with a private script)
   sr-Latn-QM (Serbian, Latin-script, private region)
   sr-Qaaa-RS (Serbian, private script, for Serbia)

Tags that use extensions (examples ONLY: extensions MUST be defined by revision or update to this document or by RFC):
   en-US-u-islamCal
   zh-CN-a-myExt-x-private
   en-a-myExt-b-another

Some Invalid Tags:
   de-419-DE (two region tags)
   a-DE (use of a single-character subtag in primary position; note that there are a few grandfathered tags that start with "i-" that are valid)
   ar-a-aaa-b-bbb-a-ccc (two extensions with same single-letter prefix)

Appendix C. Examples of Registration Forms

LANGUAGE SUBTAG REGISTRATION FORM
1. Name of requester: Han Steenwijk
2. E-mail address of requester: han.steenwijk @ unipd.it
3. Record Requested:

   Type: variant
   Subtag: biske
   Description: The San Giorgio dialect of Resian
   Description: The Bila dialect of Resian
   Prefix: sl-rozaj
   Comments: The dialect of San Giorgio/Bila is one of the
     four major local dialects of Resian

4. Intended meaning of the subtag: The local variety of Resian as spoken in San Giorgio/Bila

5. Reference to published description of the language (book or article):
   -- Jan I.N. Baudouin de Courtenay - Opyt fonetiki rez'janskich govorov, Varsava - Peterburg: Vende - Kozancikov, 1875.

LANGUAGE SUBTAG REGISTRATION FORM
1. Name of requester: Jaska Zedlik
2. E-mail address of requester: jz53 @ zedlik.com
3. Record Requested:

   Type: variant
   Subtag: tarask
   Description: Belarusian in Taraskievica orthography
   Prefix: be
   Comments: The subtag represents Branislau Taraskievic's Belarusian
     orthography as published in "Bielaruski klasycny pravapis" by Juras
     Buslakou, Vincuk Viacorka, Zmicier Sanko, and Zmicier Sauka
     (Vilnia-Miensk 2005).

4. Intended meaning of the subtag:

   The subtag is intended to represent the Belarusian orthography as published in "Bielaruski klasycny pravapis" by Juras Buslakou, Vincuk Viacorka, Zmicier Sanko, and Zmicier Sauka (Vilnia-Miensk 2005).

5. Reference to published description of the language (book or article):

   Taraskievic, Branislau. Bielaruskaja gramatyka dla skol. Vilnia: Vyd. "Bielaruskaha kamitetu", 1929, 5th edition.

   Buslakou, Juras; Viacorka, Vincuk; Sanko, Zmicier; Sauka, Zmicier. Bielaruski klasycny pravapis. Vilnia-Miensk, 2005.

6. Any other relevant information:

   Belarusian in Taraskievica orthography became widely used, especially in Belarusian-speaking Internet segment, but besides this some books and newspapers are also printed using this orthography of Belarusian.

Authors' Addresses

Addison Phillips (editor)
Yahoo! Inc.

Email: addison@inter-locale.com
URI: http://www.inter-locale.com

Mark Davis (editor)
Google

Email: mark.davis@macchiato.com or mark.davis@google.com

Full Copyright Statement

Copyright (C) The IETF Trust (2008).

This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.

This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Intellectual Property

The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.

Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.

The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org.

Acknowledgment

Funding for the RFC Editor function is provided by the IETF Administrative Support Activity (IASA).