idnits 2.17.1

draft-ietf-ltru-4646bis-11.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** It looks like you're using RFC 3978 boilerplate. You should update this
     to the boilerplate described in the IETF Trust License Policy document
     (see https://trustee.ietf.org/license-info), which is required now.

  -- Found old boilerplate from RFC 3978, Section 5.1 on line 17.

  -- Found old boilerplate from RFC 3978, Section 5.5, updated by RFC 4748 on
     line 3105.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 3116.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 3123.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 3129.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  -- The draft header indicates that this document obsoletes RFC4646, but the
     abstract doesn't seem to mention this, which it should.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust Copyright Line does not match the
     current year

  == The document seems to use 'NOT RECOMMENDED' as an RFC 2119 keyword, but
     does not include the phrase in its RFC 2119 key words list.

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008. If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment. If not, you may need to add the pre-RFC5378 disclaimer.
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (December 12, 2007) is 5972 days in the past. Is this
     intentional?

  Checking references for intended status: Best Current Practice
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO15924'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO3166-1'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO639-1'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO639-2'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO639-3'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO646'

  ** Obsolete normative reference: RFC 2028 (Obsoleted by RFC 9281)

  ** Obsolete normative reference: RFC 2434 (Obsoleted by RFC 5226)

  ** Downref: Normative reference to an Informational RFC: RFC 2860

  ** Obsolete normative reference: RFC 4234 (Obsoleted by RFC 5234)

  ** Downref: Normative reference to an Informational RFC: RFC 4645

  -- Obsolete informational reference (is this intentional?): RFC 1766
     (Obsoleted by RFC 3066, RFC 3282)

  -- Obsolete informational reference (is this intentional?): RFC 3066
     (Obsoleted by RFC 4646, RFC 4647)

  -- Obsolete informational reference (is this intentional?): RFC 4646
     (Obsoleted by RFC 5646)

     Summary: 6 errors (**), 0 flaws (~~), 2 warnings (==), 17 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

  2  Network Working Group                                   A. Phillips, Ed.
  3  Internet-Draft                                               Yahoo! Inc.
  4  Obsoletes: 4646 (if approved)                              M. Davis, Ed.
  5  Intended status: Best Current                                     Google
  6  Practice                                               December 12, 2007
  7  Expires: June 14, 2008

  9                       Tags for Identifying Languages
 10                         draft-ietf-ltru-4646bis-11

 12  Status of this Memo

 14     By submitting this Internet-Draft, each author represents that any
 15     applicable patent or other IPR claims of which he or she is aware
 16     have been or will be disclosed, and any of which he or she becomes
 17     aware will be disclosed, in accordance with Section 6 of BCP 79.

 19     Internet-Drafts are working documents of the Internet Engineering
 20     Task Force (IETF), its areas, and its working groups. Note that
 21     other groups may also distribute working documents as Internet-
 22     Drafts.

 24     Internet-Drafts are draft documents valid for a maximum of six months
 25     and may be updated, replaced, or obsoleted by other documents at any
 26     time. It is inappropriate to use Internet-Drafts as reference
 27     material or to cite them other than as "work in progress."

 29     The list of current Internet-Drafts can be accessed at
 30     http://www.ietf.org/ietf/1id-abstracts.txt.

 32     The list of Internet-Draft Shadow Directories can be accessed at
 33     http://www.ietf.org/shadow.html.

 35     This Internet-Draft will expire on June 14, 2008.

 37  Copyright Notice

 39     Copyright (C) The IETF Trust (2007).

 41  Abstract

 43     This document describes the structure, content, construction, and
 44     semantics of language tags for use in cases where it is desirable to
 45     indicate the language used in an information object. It also
 46     describes how to register values for use in language tags and the
 47     creation of user-defined extensions for private interchange.

 49  Table of Contents

 51     1. Introduction
 52     2. The Language Tag
 53       2.1. Syntax
 54       2.2. Language Subtag Sources and Interpretation
 55         2.2.1. Primary Language Subtag
 56         2.2.2. Extended Language Subtags
 57         2.2.3. Script Subtag
 58         2.2.4. Region Subtag
 59         2.2.5. Variant Subtags
 60         2.2.6. Extension Subtags
 61         2.2.7. Private Use Subtags
 62         2.2.8. Grandfathered Registrations
 63         2.2.9. Classes of Conformance
 64     3. Registry Format and Maintenance
 65       3.1. Format of the IANA Language Subtag Registry
 66         3.1.1. File Format
 67         3.1.2. Record Definitions
 68         3.1.3. Subtag and Tag Fields
 69         3.1.4. Description Field
 70         3.1.5. Deprecated Field
 71         3.1.6. Preferred-Value Field
 72         3.1.7. Prefix Field
 73         3.1.8. Suppress-Script Field
 74         3.1.9. Macrolanguage Field
 75         3.1.10. Comments Field
 76       3.2. Language Subtag Reviewer
 77       3.3. Maintenance of the Registry
 78       3.4. Stability of IANA Registry Entries
 79       3.5. Registration Procedure for Subtags
 80       3.6. Possibilities for Registration
 81       3.7. Extensions and the Extensions Registry
 82       3.8. Update of the Language Subtag Registry
 83     4. Formation and Processing of Language Tags
 84       4.1. Choice of Language Tag
 85       4.2. Meaning of the Language Tag
 86       4.3. Length Considerations
 87         4.3.1. Working with Limited Buffer Sizes
 88         4.3.2. Truncation of Language Tags
 89       4.4. Canonicalization of Language Tags
 90       4.5. Considerations for Private Use Subtags
 91     5. IANA Considerations
 92       5.1. Language Subtag Registry
 93       5.2. Extensions Registry
 94     6. Security Considerations
 95     7. Character Set Considerations
 96     8. Changes from RFC 4646
 97     9. References
 98       9.1. Normative References
 99       9.2. Informative References
100     Appendix A. Acknowledgements
101     Appendix B. Examples of Language Tags (Informative)
102     Appendix C. Examples of Registration Forms
103     Authors' Addresses
104     Intellectual Property and Copyright Statements

106  1. Introduction

108     Human beings on our planet have, past and present, used a number of
109     languages. There are many reasons why one would want to identify the
110     language used when presenting or requesting information.

112     A user's language preferences often need to be identified so that
113     appropriate processing can be applied. For example, the user's
114     language preferences in a Web browser can be used to select Web pages
115     appropriately. Language preferences can also be used to select among
116     tools (such as dictionaries) to assist in the processing or
117     understanding of content in different languages.

119     In addition, knowledge about the particular language used by some
120     piece of information content might be useful or even required by some
121     types of processing; for example, spell-checking, computer-
122     synthesized speech, Braille transcription, or high-quality print
123     renderings.

125     One means of indicating the language used is by labeling the
126     information content with an identifier or "tag". These tags can be
127     used to specify user preferences when selecting information content,
128     or for labeling additional attributes of content and associated
129     resources.

131     Tags can also be used to indicate additional language attributes of
132     content. For example, indicating specific information about the
133     dialect, writing system, or orthography used in a document or
134     resource may enable the user to obtain information in a form that
135     they can understand, or it can be important in processing or
136     rendering the given content into an appropriate form or style.

138     This document specifies a particular identifier mechanism (the
139     language tag) and a registration function for values to be used to
140     form tags. It also defines a mechanism for private use values and
141     future extension.

143     This document replaces [RFC4646], which replaced [RFC3066] and its
144     predecessor [RFC1766]. For a list of changes in this document, see
145     Section 8.

147     The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
148     "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
149     document are to be interpreted as described in [RFC2119].

151  2. The Language Tag

153     Language tags are used to help identify languages, whether spoken,
154     written, signed, or otherwise signaled, for the purpose of
155     communication. This includes constructed and artificial languages,
156     but excludes languages not intended primarily for human
157     communication, such as programming languages.

159  2.1. Syntax

161     The language tag is composed of one or more parts, known as
162     "subtags". Each subtag consists of a sequence of alphanumeric
163     characters. Subtags are distinguished and separated from one another
164     by a hyphen ("-", ABNF [RFC4234] %x2D). A language tag consists of a
165     "primary language" subtag and a (possibly empty) series of subsequent
166     subtags, each of which refines or narrows the range of languages
167     identified by the overall tag.

169     Usually, each type of subtag is distinguished by length, position in
170     the tag, and content: subtags can be recognized solely by these
171     features. The only exception to this is a fixed list of
172     grandfathered tags registered under RFC 3066 [RFC3066]. This makes
173     it possible to construct a parser that can extract and assign some
174     semantic information to the subtags, even if the specific subtag
175     values are not recognized. Thus, a parser need not have an up-to-
176     date copy (or any copy at all) of the subtag registry to perform most
177     searching and matching operations.
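
Editorial illustration: the claim in the preceding paragraph, that a
subtag's type can be inferred from its length, position, and content
without a registry lookup, can be sketched in Python as follows. The
function name classify_subtags and the label strings are hypothetical
and chosen only for this example. The sketch assumes a tag that is not
one of the grandfathered tags, and it does not enforce every syntax
rule (for instance, it does not reject a repeated singleton or a
singleton with nothing after it).

```python
def classify_subtags(tag):
    """Label each subtag by shape and position alone (no registry lookup).

    Hypothetical helper for illustration only. Grandfathered tags are not
    handled, and not every syntax rule is enforced.
    """
    labels = []
    stage = 0       # 0 = start, 1 = language seen, 2 = script, 3 = region, 4 = variant
    section = None  # "extension" or "privateuse" once a singleton has been seen
    for subtag in tag.split("-"):
        low = subtag.lower()  # case carries no meaning in language tags
        n = len(low)
        if not (1 <= n <= 8 and low.isascii() and low.isalnum()):
            raise ValueError(f"malformed subtag: {subtag!r}")
        if section == "privateuse":
            label = "privateuse"  # everything after 'x' is private use
        elif n == 1:
            if low != "x" and stage == 0:
                raise ValueError("a tag cannot begin with an extension")
            section = "privateuse" if low == "x" else "extension"
            label = "singleton"
        elif section == "extension":
            label = "extension"   # meaning is defined by the singleton
        elif stage == 0 and low.isalpha():
            label, stage = "language", 1
        elif stage == 1 and n == 4 and low.isalpha():
            label, stage = "script", 2
        elif 1 <= stage <= 2 and ((n == 2 and low.isalpha()) or (n == 3 and low.isdigit())):
            label, stage = "region", 3
        elif stage >= 1 and (n >= 5 or (n == 4 and low[0].isdigit())):
            label, stage = "variant", 4
        else:
            raise ValueError(f"unexpected subtag: {subtag!r}")
        labels.append((subtag, label))
    return labels
```

For example, classify_subtags("sr-Latn-RS") labels the three subtags as
language, script, and region, while classify_subtags("fr-a-Latn")
labels 'Latn' as an extension subtag rather than a script, because it
follows a singleton.
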
179     The syntax of the language tag in ABNF [RFC4234] is:

181     Language-Tag  = langtag
182                   / privateuse             ; private use tag
183                   / irregular              ; tags grandfathered by rule

185     langtag       = (language
186                      ["-" script]
187                      ["-" region]
188                      *("-" variant)
189                      *("-" extension)
190                      ["-" privateuse])

192     language      = (2*3ALPHA)             ; shortest ISO 639 code
193                   / 4ALPHA                 ; reserved for future use
194                   / 5*8ALPHA               ; registered language subtag

196     script        = 4ALPHA                 ; ISO 15924 code

198     region        = 2ALPHA                 ; ISO 3166 code
199                   / 3DIGIT                 ; UN M.49 code

201     variant       = 5*8alphanum            ; registered variants
202                   / (DIGIT 3alphanum)

204     extension     = singleton 1*("-" (2*8alphanum))

206     singleton     = %x41-57 / %x59-5A / %x61-77 / %x79-7A / DIGIT
207                   ; "a"-"w" / "y"-"z" / "A"-"W" / "Y"-"Z" / "0"-"9"
208                   ; Single alphanumerics
209                   ; "x" is reserved for private use

211     privateuse    = "x" 1*("-" (1*8alphanum))

213     irregular     = "en-GB-oed" / "i-ami" / "i-bnn" / "i-default"
214                   / "i-enochian" / "i-hak" / "i-klingon" / "i-lux"
215                   / "i-mingo" / "i-navajo" / "i-pwn" / "i-tao"
216                   / "i-tay" / "i-tsu" / "sgn-BE-fr" / "sgn-BE-nl"
217                   / "sgn-CH-de"

219     alphanum      = (ALPHA / DIGIT)        ; letters and numbers

221                        Figure 1: Language Tag ABNF

223     All subtags have a maximum length of eight characters and whitespace
224     is not permitted in a language tag. There is a subtlety in the ABNF
225     production 'variant': variants starting with a digit MAY be four
226     characters long, while those starting with a letter MUST be at least
227     five characters long. For examples of language tags, see Appendix B.

229     Note Well: the ABNF syntax does not distinguish between upper and
230     lowercase. The appearance of upper and lowercase letters in the
231     various ABNF productions above does not affect how implementations
232     interpret tags. That is, the tag "I-AMI" matches the item "i-ami" in
233     the 'irregular' production. At all times, the tags and their
234     subtags, including private use and extensions, are to be treated as
235     case insensitive: there exist conventions for the capitalization of
236     some of the subtags, but these MUST NOT be taken to carry meaning.

238     For example:

240     o  [ISO639-1] recommends that language codes be written in lowercase
241        ('mn' Mongolian).

243     o  [ISO3166-1] recommends that country codes be capitalized ('MN'
244        Mongolia).

246     o  [ISO15924] recommends that script codes use lowercase with the
247        initial letter capitalized ('Cyrl' Cyrillic).

249     However, in the tags defined by this document, the uppercase US-ASCII
250     letters in the range 'A' through 'Z' are considered equivalent and
251     mapped directly to their US-ASCII lowercase equivalents in the range
252     'a' through 'z'. Thus, the tag "mn-Cyrl-MN" is not distinct from
253     "MN-cYRL-mn" or "mN-cYrL-Mn" (or any other combination), and each of
254     these variations conveys the same meaning: Mongolian written in the
255     Cyrillic script as used in Mongolia.

257     Although case distinctions do not carry meaning in language tags,
258     consistent formatting and presentation of the tags will aid users.
259     The format of the tags and subtags in the registry is RECOMMENDED.
260     In this format, all non-initial two-letter subtags are uppercase, all
261     non-initial four-letter subtags are titlecase, and all other subtags
262     are lowercase.

264     Note that although [RFC4234] refers to octets, the language tags
265     described in this document are sequences of characters from the US-
266     ASCII [ISO646] repertoire. Language tags MAY be used in documents
267     and applications that use other encodings, so long as these encompass
268     the US-ASCII repertoire. An example of this would be an XML document
269     that uses the UTF-16LE [RFC2781] encoding of [Unicode].

271  2.2. Language Subtag Sources and Interpretation

273     The namespace of language tags and their subtags is administered by
274     the Internet Assigned Numbers Authority (IANA) [RFC2860] according to
275     the rules in Section 5 of this document. The Language Subtag
276     Registry maintained by IANA is the source for valid subtags: other
277     standards referenced in this section provide the source material for
278     that registry.

280     Terminology used in this document:

282     o  "Tag" refers to a complete language tag, such as "sr-Latn-RS" or
283        "az-Arab-IR". Examples of tags in this document are enclosed in
284        double-quotes ("en-US").

286     o  "Subtag" refers to a specific section of a tag, delimited by
287        hyphen, such as the subtag 'Hant' in "zh-Hant-CN". Examples of
288        subtags in this document are enclosed in single quotes ('Hant').

290     o  "Code" refers to values defined in external standards (and which
291        are used as subtags in this document). For example, 'Hant' is an
292        [ISO15924] script code that was used to define the 'Hant' script
293        subtag for use in a language tag. Examples of codes in this
294        document are enclosed in single quotes ('en', 'Hant').

296     The definitions in this section apply to the various subtags within
297     the language tags defined by this document, excepting those
298     "grandfathered" tags defined in Section 2.2.8.

300     Language tags are designed so that each subtag type has unique length
301     and content restrictions. These make identification of the subtag's
302     type possible, even if the content of the subtag itself is
303     unrecognized. This allows tags to be parsed and processed without
304     reference to the latest version of the underlying standards or the
305     IANA registry and makes the associated exception handling when
306     parsing tags simpler.

308     Subtags in the IANA registry that do not come from an underlying
309     standard can only appear in specific positions in a tag.
310     Specifically, they can only occur as primary language subtags or as
311     variant subtags.

313     Note that sequences of private use and extension subtags MUST occur
314     at the end of the sequence of subtags and MUST NOT be interspersed
315     with subtags defined elsewhere in this document.

317     Single-letter and single-digit subtags are reserved for current or
318     future use. These include the following current uses:

320     o  The single-letter subtag 'x' is reserved to introduce a sequence
321        of private use subtags. The interpretation of any private use
322        subtags is defined solely by private agreement and is not defined
323        by the rules in this section or in any standard or registry
324        defined in this document.

326     o  All other single-letter subtags are reserved to introduce
327        standardized extension subtag sequences as described in
328        Section 3.7.

330     o  The single-letter subtag 'i' is used by some grandfathered tags,
331        such as "i-default", where it always appears in the first position
332        and cannot be confused with an extension.

334  2.2.1. Primary Language Subtag

336     The primary language subtag is the first subtag in a language tag
337     (with the exception of private use and certain grandfathered tags)
338     and cannot be omitted. The following rules apply to the primary
339     language subtag:

341     1.  All two-character primary language subtags were defined in the
342         IANA registry according to the assignments found in the standard
343         ISO 639 Part 1, "ISO 639-1:2002, Codes for the representation of
344         names of languages -- Part 1: Alpha-2 code" [ISO639-1], or using
345         assignments subsequently made by the ISO 639-1 registration
346         authority (RA) or governing standardization bodies.

348     2.  All three-character primary language subtags were defined in the
349         IANA registry according to the assignments found in either ISO
350         639 Part 2, "ISO 639-2:1998 - Codes for the representation of
351         names of languages -- Part 2: Alpha-3 code - edition 1"
352         [ISO639-2], ISO 639 Part 3, "Codes for the representation of
353         names of languages -- Part 3: Alpha-3 code for comprehensive
354         coverage of languages" [ISO639-3], or assignments subsequently
355         made by the relevant ISO 639 registration authorities or
356         governing standardization bodies.

358     3.  The subtags in the range 'qaa' through 'qtz' are reserved for
359         private use in language tags. These subtags correspond to codes
360         reserved by ISO 639-2 for private use. These codes MAY be used
361         for non-registered primary language subtags (instead of using
362         private use subtags following 'x-'). Please refer to Section 4.5
363         for more information on private use subtags.

365     4.  All four-character language subtags are reserved for possible
366         future standardization.

368     5.  All language subtags of 5 to 8 characters in length in the IANA
369         registry were defined via the registration process in Section 3.5
370         and MAY be used to form the primary language subtag. At the time
371         this document was created, there were no examples of this kind of
372         subtag and future registrations of this type will be discouraged:
373         primary languages are strongly RECOMMENDED for registration with
374         ISO 639, and proposals rejected by ISO 639/RA-JAC will be closely
375         scrutinized before they are registered with IANA.

377     6.  The single-character subtag 'x' as the primary subtag indicates
378         that the language tag consists solely of subtags whose meaning is
379         defined by private agreement. For example, in the tag "x-fr-CH",
380         the subtags 'fr' and 'CH' SHOULD NOT be taken to represent the
381         French language or the country of Switzerland (or any other value
382         in the IANA registry) unless there is a private agreement in
383         place to do so. See Section 4.5.

385     7.  The single-character subtag 'i' is used by some grandfathered
386         tags (see Section 2.2.8) such as "i-klingon" and "i-bnn". (Other
387         grandfathered tags have a primary language subtag in their first
388         position.)

390     8.  Other values MUST NOT be assigned to the primary subtag except by
391         revision or update of this document.

393     Note: For languages that have both an ISO 639-1 two-character code
394     and a three-character code assigned by either ISO 639-2 or ISO 639-3,
395     only the ISO 639-1 two-character code is defined in the IANA
396     registry.

398     Note: For languages that have no ISO 639-1 two-character code and for
399     which the ISO 639-2/T (Terminology) code and the ISO 639-2/B
400     (Bibliographic) codes differ, only the Terminology code is defined in
401     the IANA registry. At the time this document was created, all
402     languages that had both kinds of three-character code were also
403     assigned a two-character code; it is expected that future assignments
404     of this nature will not occur.

406     Note: To avoid problems with versioning and subtag choice as
407     experienced during the transition between RFC 1766 and RFC 3066, as
408     well as the canonical nature of subtags defined by this document, the
409     ISO 639 Registration Authority Joint Advisory Committee (ISO 639/
410     RA-JAC) has included the following statement in [iso639.prin]:

412        "A language code already in ISO 639-2 at the point of freezing ISO
413        639-1 shall not later be added to ISO 639-1. This is to ensure
414        consistency in usage over time, since users are directed in
415        Internet applications to employ the alpha-3 code when an alpha-2
416        code for that language is not available."
418 In order to avoid instability in the canonical form of tags, if a 419 two-character code is added to ISO 639-1 for a language for which a 420 three-character code was already included in either ISO 639-2 or ISO 421 639-3, the two-character code MUST NOT be registered. See 422 Section 3.4. 424 For example, if some content were tagged with 'haw' (Hawaiian), which 425 currently has no two-character code, the tag would not be invalidated 426 if ISO 639-1 were to assign a two-character code to the Hawaiian 427 language at a later date. 429 Note: An example of independent primary language subtag registration 430 might include: one of the grandfathered IANA registrations is 431 "i-enochian". The subtag 'enochian' could be registered in the IANA 432 registry as a primary language subtag (assuming that ISO 639 does not 433 register this language first), making tags such as "enochian-AQ" and 434 "enochian-Latn" valid. 436 2.2.2. Extended Language Subtags 438 Extended language subtags are permanently reserved. They MUST NOT be 439 registered or used to form language tags (except in grandfathered 440 tags). They were originally created to allow for certain kinds of 441 compatibility mappings which ultimately were not used. 443 2.2.3. Script Subtag 445 Script subtags are used to indicate the script or writing system 446 variations that distinguish the written forms of a language or its 447 dialects. The following rules apply to the script subtags: 449 1. Script subtags MUST follow the primary language subtag and MUST 450 precede any other type of subtag. 452 2. All four-character subtags were defined according to 453 [ISO15924]--"Codes for the representation of the names of 454 scripts": alpha-4 script codes, or subsequently assigned by the 455 ISO 15924 maintenance agency or governing standardization bodies, 456 denoting the script or writing system used in conjunction with 457 this language. 459 3. 
The script subtags 'Qaaa' through 'Qabx' are reserved for private 460 use in language tags. These subtags correspond to codes reserved 461 by ISO 15924 for private use. These codes MAY be used for non- 462 registered script values. Please refer to Section 4.5 for more 463 information on private use subtags. 465 4. Script subtags MUST NOT be registered using the process in 466 Section 3.5 of this document. Variant subtags MAY be considered 467 for registration for that purpose. 469 5. There MUST be at most one script subtag in a language tag, and 470 the script subtag SHOULD be omitted when it adds no 471 distinguishing value to the tag or when the primary language 472 subtag's record includes a Suppress-Script field listing the 473 applicable script subtag. 475 Example: "sr-Latn" represents Serbian written using the Latin script. 477 2.2.4. Region Subtag 479 Region subtags are used to indicate linguistic variations associated 480 with or appropriate to a specific country, territory, or region. 481 Typically, a region subtag is used to indicate regional dialects or 482 usage, or region-specific spelling conventions. A region subtag can 483 also be used to indicate that content is expressed in a way that is 484 appropriate for use throughout a region, for instance, Spanish 485 content tailored to be useful throughout Latin America. 487 The following rules apply to the region subtags: 489 1. Region subtags MUST follow any language or script subtags and 490 MUST precede any other type of subtag. 492 2. All two-character subtags following the primary subtag were 493 defined in the IANA registry according to the assignments found 494 in [ISO3166-1] ("Codes for the representation of names of 495 countries and their subdivisions -- Part 1: Country codes") using 496 the list of alpha-2 country codes, or using assignments 497 subsequently made by the ISO 3166 maintenance agency or governing 498 standardization bodies. 
In addition, the codes that are 499 "exceptionally reserved" (as opposed to "assigned") in ISO 3166-1 500 were also defined in the registry, with the exception of 'UK', 501 which is an exact synonym for the assigned code 'GB'. 503 3. All three-character subtags consisting of digit (numeric) 504 characters following the primary subtag were defined in the IANA 505 registry according to the assignments found in UN Standard 506 Country or Area Codes for Statistical Use [UN_M.49] or 507 assignments subsequently made by the governing standards body. 508 Note that not all of the UN M.49 codes are defined in the IANA 509 registry. The following rules define which codes are entered 510 into the registry as valid subtags: 512 A. UN numeric codes assigned to 'macro-geographical 513 (continental)' or sub-regions MUST be registered in the 514 registry. These codes are not associated with an assigned 515 ISO 3166 alpha-2 code and represent supra-national areas, 516 usually covering more than one nation, state, province, or 517 territory. 519 B. UN numeric codes for 'economic groupings' or 'other 520 groupings' MUST NOT be registered in the IANA registry and 521 MUST NOT be used to form language tags. 523 C. UN numeric codes for countries or areas which are assigned 524 ISO 3166 alpha2 codes already present in the registry, MUST 525 be defined according to the rules in Section 3.4 and MUST be 526 used to form language tags that represent the country or 527 region for which they are defined. This happens when ISO 528 3166 reassigns a code formerly used for one country to 529 another. 531 D. UN numeric codes for countries or areas for which there is an 532 associated ISO 3166 alpha-2 code in the registry MUST NOT be 533 entered into the registry and MUST NOT be used to form 534 language tags. Note that the ISO 3166-based subtag in the 535 registry MUST actually be associated with the UN M.49 code in 536 question. 538 E. 
UN numeric codes and ISO 3166 alpha-2 codes for countries or 539 areas listed as eligible for registration in [RFC4645] but 540 not presently registered MAY be entered into the IANA 541 registry via the process described in Section 3.5. Once 542 registered, these codes MAY be used to form language tags. 544 F. All other UN numeric codes for countries or areas that do not 545 have an associated ISO 3166 alpha-2 code MUST NOT be entered 546 into the registry and MUST NOT be used to form language tags. 547 For more information about these codes, see Section 3.4. 549 4. Note: The alphanumeric codes in Appendix X of the UN document 550 MUST NOT be entered into the registry and MUST NOT be used to 551 form language tags. (At the time this document was created, 552 these values matched the ISO 3166 alpha-2 codes.) 554 5. There MUST be at most one region subtag in a language tag and the 555 region subtag MAY be omitted, as when it adds no distinguishing 556 value to the tag. 558 6. The region subtags 'AA', 'QM'-'QZ', 'XA'-'XZ', and 'ZZ' are 559 reserved for private use in language tags. These subtags 560 correspond to codes reserved by ISO 3166 for private use. These 561 codes MAY be used for private use region subtags (instead of 562 using a private use subtag sequence). Please refer to 563 Section 4.5 for more information on private use subtags. 565 "de-CH" represents German ('de') as used in Switzerland ('CH'). 567 "sr-Latn-RS" represents Serbian ('sr') written using Latin script 568 ('Latn') as used in Serbia ('RS'). 570 "es-419" represents Spanish ('es') appropriate to the UN-defined 571 Latin America and Caribbean region ('419'). 573 2.2.5. Variant Subtags 575 Variant subtags are used to indicate additional, well-recognized 576 variations that define a language or its dialects that are not 577 covered by other available subtags. The following rules apply to the 578 variant subtags: 580 1. 
Variant subtags MUST follow any language, script, or region 581 subtags, but MUST precede any extension or private use subtag 582 sequences. 584 2. Variant subtags, as a collection, are not associated with any 585 particular external standard. The meaning of variant subtags in 586 the registry is defined in the course of the registration process 587 defined in Section 3.5. Note that any particular variant subtag 588 might be associated with some external standard. However, 589 association with a standard is not required for registration. 591 3. More than one variant MAY be used to form the language tag. 593 4. Variant subtags MUST be registered with IANA according to the 594 rules in Section 3.5 of this document before being used to form 595 language tags. In order to distinguish variants from other types 596 of subtags, registrations MUST meet the following length and 597 content restrictions: 599 1. Variant subtags that begin with a letter (a-z, A-Z) MUST be 600 at least five characters long. 602 2. Variant subtags that begin with a digit (0-9) MUST be at 603 least four characters long. 605 Variant subtag records in the language subtag registry MAY include 606 one or more 'Prefix' fields. The 'Prefix' indicates the language tag 607 or tags that would make a suitable prefix (with other subtags, as 608 appropriate) in forming a language tag with the variant. That is, 609 each of the subtags in the prefix SHOULD appear before the variant. 610 For example, the subtag 'nedis' has a Prefix of "sl", making it 611 suitable for forming language tags such as "sl-nedis" and "sl-IT- 612 nedis", but not suitable for use in a tag such as "zh-nedis" or "it- 613 IT-nedis". 615 "sl-nedis" represents the Natisone or Nadiza dialect of Slovenian. 617 "de-CH-1996" represents German as used in Switzerland and as written 618 using the spelling reform beginning in the year 1996 C.E. 620 Most variants that share a prefix are mutually exclusive. 
   For example, the German orthographic variations '1996' and '1901'
   SHOULD NOT be used in the same tag, as they represent the dates of
   different spelling reforms. A variant that can meaningfully be used
   in combination with another variant SHOULD include a 'Prefix' field
   in its registry record that lists that other variant. For example,
   if another German variant 'example' were created that made sense to
   use with '1996', then 'example' should include two Prefix fields:
   "de" and "de-1996".

2.2.6.  Extension Subtags

   Extensions provide a mechanism for extending language tags for use in
   various applications. They are intended to identify information
   that is commonly used in association with languages or language
   tags but is not part of language identification. See Section 3.7.
   The following rules apply to extensions:

   1.   An extension MUST follow at least a primary language subtag.
        That is, a language tag cannot begin with an extension.
        Extensions extend language tags; they do not override or replace
        them. For example, "a-value" is not a well-formed language tag,
        while "de-a-value" is.

   2.   Extension subtags are separated from the other subtags defined
        in this document by a single-character subtag ("singleton").
        The singleton MUST be one allocated to a registration authority
        via the mechanism described in Section 3.7 and MUST NOT be the
        letter 'x', which is reserved for private use subtag sequences.

   3.   Note: Private use subtag sequences starting with the singleton
        subtag 'x' are described in Section 2.2.7 below.

   4.   Each singleton subtag MUST appear at most one time in each tag
        (other than as a private use subtag). That is, singleton
        subtags MUST NOT be repeated. For example, the tag
        "en-a-bbb-a-ccc" is invalid because the subtag 'a' appears
        twice.
        Note that the tag "en-a-bbb-x-a-ccc" is valid because the
        second appearance of the singleton 'a' is in a private use
        sequence.

   5.   Extension subtags MUST meet all of the requirements for the
        content and format of subtags defined in this document.

   6.   Extension subtags MUST meet whatever requirements are set by the
        document that defines their singleton prefix and whatever
        requirements are provided by the maintaining authority.

   7.   Each extension subtag MUST be from two to eight characters long
        and consist solely of letters or digits, with each subtag
        separated by a single '-'.

   8.   Each singleton MUST be followed by at least one extension
        subtag. For example, the tag "tlh-a-b-foo" is invalid because
        the first singleton 'a' is followed immediately by another
        singleton 'b'.

   9.   Extension subtags MUST follow all language, extended language,
        script, region, and variant subtags in a tag.

   10.  All subtags following the singleton and before another singleton
        are part of the extension. Example: In the tag "fr-a-Latn", the
        subtag 'Latn' does not represent the script subtag 'Latn'
        defined in the IANA Language Subtag Registry. Its meaning is
        defined by the extension 'a'.

   11.  In the event that more than one extension appears in a single
        tag, the tag SHOULD be canonicalized as described in
        Section 4.4.

   For example, if the prefix singleton 'r' and the shown subtags were
   defined, then the following tag would be a valid example:
   "en-Latn-GB-boont-r-extended-sequence-x-private"

2.2.7.  Private Use Subtags

   Private use subtags are used to indicate distinctions in language
   important in a given context by private agreement. The following
   rules apply to private use subtags:

   1.  Private use subtags are separated from the other subtags defined
       in this document by the reserved single-character subtag 'x'.

   2.  Private use subtags MUST conform to the format and content
       constraints defined in the ABNF for all subtags.

   3.  Private use subtags MUST follow all language, extended language,
       script, region, variant, and extension subtags in the tag.
       Another way of saying this is that all subtags following the
       singleton 'x' MUST be considered private use. Example: The
       subtag 'US' in the tag "en-x-US" is a private use subtag.

   4.  A tag MAY consist entirely of private use subtags.

   5.  No source is defined for private use subtags. Use of private use
       subtags is by private agreement only.

   6.  Private use subtags are NOT RECOMMENDED where alternatives exist
       or for general interchange. See Section 4.5 for more information
       on private use subtag choice.

   For example: Users who wished to utilize codes from the Ethnologue
   publication of SIL International for language identification might
   agree to exchange tags such as "az-Arab-x-AZE-derbend". This example
   contains two private use subtags. The first is 'AZE' and the second
   is 'derbend'.

2.2.8.  Grandfathered Registrations

   Prior to RFC 4646, whole language tags were registered according to
   the rules in RFC 1766 and/or RFC 3066. These registered tags
   maintain their validity. Of those tags, those that were made
   obsolete or redundant by the advent of RFC 4646, by this document, or
   by subsequent registration of subtags are maintained in the registry
   in records of the "redundant" type. Those tags that do not match the
   'langtag' production in the ABNF in this document or that contain
   subtags that do not individually appear in the registry are
   maintained in the registry in records of the "grandfathered" type.

   Grandfathered tags contain one or more subtags that are not defined
   in the Language Subtag Registry (see Section 3).
   Redundant tags consist entirely of subtags defined above; their
   independent registration was superseded by [RFC4646]. For more
   information see Section 3.8.

   Some grandfathered tags are "regular" in that they match the
   'langtag' production in Figure 1. In some cases, these tags could
   become redundant if their (currently unregistered) subtags were to be
   registered (as variants, for example). In other cases, although the
   subtags match the language tag pattern, the meaning assigned to the
   various subtags is prohibited by rules elsewhere in this document.
   Those tags can never become redundant.

   The remaining grandfathered tags are "irregular" and do not match the
   'langtag' production. These are listed in the 'irregular' production
   in Figure 1. These grandfathered tags can never become redundant.
   Many of these tags have been superseded by other registrations: their
   record contains a 'Preferred-Value' field whose value ought to be
   used to form language tags representing that value.

2.2.9.  Classes of Conformance

   Implementations sometimes need to describe their capabilities with
   regard to the rules and practices described in this document. Tags
   can be checked or verified in a number of ways, but two particular
   classes of tag conformance are formally defined here.

   A tag is considered "well-formed" if it conforms to the ABNF
   (Section 2.1). Note that irregular grandfathered tags are now listed
   in the 'irregular' production.

   A tag is considered "valid" if it is well-formed and it also
   satisfies these conditions:

   o  The tag is either a grandfathered tag, or all of its language,
      extended language, script, region, and variant subtags appear in
      the IANA language subtag registry as of the particular registry
      date.

   o  There are no duplicate singleton (extension) subtags and no
      duplicate variant subtags.

   o  For each subtag that has a 'Prefix' field in the registry, the
      Prefix matches the language tag using Extended Filtering
      [RFC4647]. That is, each subtag in the Prefix is present in the
      tag and in the same order. Furthermore, all of the Prefix's
      subtags MUST appear before the subtag. For example, the Prefix
      "zh-TW" matches the tag "zh-Hant-TW".

   Note that a tag's validity depends on the date of the registry used
   to validate the tag. A more-recent copy of the registry might
   contain a subtag that an older version does not.

   A tag is considered "valid" for a given extension (Section 3.7) (as
   of a particular version, revision, and date) if it meets the criteria
   for "valid" above and also satisfies this condition:

      Each subtag used in the extension part of the tag is valid
      according to the extension.

   Some older implementations consider a tag "well-formed" if it matches
   the ABNF in [RFC4646]. In that version, a well-formed tag could
   contain a sequence matching the obsolete 'extlang' production. Other
   than a few grandfathered tags (which are handled separately), no
   valid tags have ever matched that pattern. The difference between
   that ABNF and Figure 1 is that the language production is replaced as
   follows:

   obs-language = (2*3ALPHA [ extlang ]) ; shortest ISO 639 code
                / 4ALPHA                 ; reserved for future use
                / 5*8ALPHA               ; registered language subtag

   extlang      = *3("-" 3ALPHA)         ; removed in this version

                  Figure 2: Obsolete Language ABNF

   Older language tag implementations sometimes reference [RFC3066].
   Again, all valid tags under that version also match this document's
   language tag ABNF. However, a wider array of tags could be
   considered "well-formed" under that document. The grammar used in
   that document was:

   Language-Tag = Primary-subtag *( "-" Subtag )

   Primary-subtag = 1*8ALPHA

   Subtag = 1*8(ALPHA / DIGIT)

                                Figure 3

3.  Registry Format and Maintenance

   This section defines the Language Subtag Registry and the maintenance
   and update procedures associated with it, as well as a registry for
   extensions to language tags (Section 3.7).

   The Language Subtag Registry contains a comprehensive list of all of
   the subtags valid in language tags. This gives implementers a
   straightforward and reliable way to validate language tags. The
   Language Subtag Registry will be maintained so that, except for
   extension subtags, it is possible to validate all of the subtags that
   appear in a language tag under the provisions of this document or its
   revisions or successors. In addition, the meaning of the various
   subtags will be unambiguous and stable over time. (The meaning of
   private use subtags, of course, is not defined by the IANA registry.)

3.1.  Format of the IANA Language Subtag Registry

   The IANA Language Subtag Registry ("the registry") is a
   machine-readable file in the format described in this section, plus
   copies of the registration forms approved in accordance with the
   process described in Section 3.5. The existing registration forms
   for grandfathered and redundant tags taken from RFC 3066 will be
   maintained as part of the obsolete RFC 3066 registry. The remaining
   set of subtags created by either [RFC4645] or [registry-update] will
   not have registration forms created for them.

3.1.1.  File Format

   The registry consists of a series of records stored in the record-jar
   format (described in [record-jar]). Each record, in turn, consists
   of a series of fields that describe the various subtags and tags.
   The registry is a Unicode [Unicode] text file, using the UTF-8
   [RFC3629] character encoding.

   Each field can be considered a single, logical line of Unicode
   [Unicode] characters, comprising a field-name and a field-body
   separated by a COLON character (%x3A).
   Each field is terminated by the newline sequence CRLF. The text in
   each field MUST be in Unicode Normalization Form C (NFC).

   A collection of fields forms a 'record'. Records are separated by
   lines containing only the sequence "%%" (%x25.25).

   Although fields are logically a single line of text, each line of
   text in the file format is limited to 72 bytes in length. To
   accommodate this, the field-body can be split into a multiple-line
   representation; this is called "folding". Folding is always done on
   Unicode default grapheme boundaries (that is, never in the middle of
   a multibyte UTF-8 sequence nor in the middle of a combining character
   sequence).

   Although the file format uses the UTF-8 encoding, unless otherwise
   indicated, fields are restricted to the printable characters from the
   US-ASCII [ISO646] repertoire.

   The format of the registry is described by the following ABNF (per
   [RFC4234]):

   registry   = record *("%%" CRLF record)
   record     = 1*( field-name *SP ":" *SP field-body CRLF )
   field-name = (ALPHA / DIGIT) [*(ALPHA / DIGIT / "-") (ALPHA / DIGIT)]
   field-body = *([[*SP CRLF] 1*SP] 1*CHARS)
   CHARS      = (%x21-10FFFF)      ; Unicode code points

                     Figure 4: Registry Format ABNF

   The sequence '..' (%x2E.2E) in a field-body denotes a range of
   values. Such a range represents all subtags of the same length that
   are in alphabetic or numeric order within that range, including the
   values explicitly mentioned. For example 'a..c' denotes the values
   'a', 'b', and 'c' and '11..13' denotes the values '11', '12', and
   '13'.

   All fields whose field-body contains a date value use the "full-date"
   format specified in [RFC3339]. For example: "2004-06-28" represents
   June 28, 2004, in the Gregorian calendar.

3.1.2.  Record Definitions

   There are three types of records in the registry: "File-Date",
   "Subtag", and "Tag" records.
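
   The file format described in Section 3.1.1 can be illustrated with a
   small parser. The following Python sketch is illustrative only and
   is not part of this specification. The names unfold, parse_registry,
   and record_kind are hypothetical, and the field values in SAMPLE are
   made up for the example. The sketch assumes the input has already
   been decoded from UTF-8, accepts either CRLF or bare LF line endings,
   and treats any line that begins with a space as a folded
   continuation of the previous field-body.

```python
def unfold(text):
    """Join folded continuation lines into single logical lines."""
    logical = []
    for raw in text.replace("\r\n", "\n").split("\n"):
        if raw[:1] == " " and logical and logical[-1] != "%%":
            # A line starting with a space continues the previous field.
            logical[-1] += " " + raw.strip()
        elif raw.strip():
            logical.append(raw.rstrip())
    return logical


def parse_registry(text):
    """Return a list of records; each record is a list of (name, body)."""
    records, current = [], []
    for line in unfold(text):
        if line == "%%":
            # "%%" on a line by itself separates two records.
            records.append(current)
            current = []
            continue
        name, _, body = line.partition(":")
        current.append((name.strip(), body.strip()))
    if current:
        records.append(current)
    return records


def record_kind(record):
    """Classify a record as 'File-Date', 'Subtag', or 'Tag'."""
    names = {name for name, _ in record}
    for kind in ("File-Date", "Subtag", "Tag"):
        if kind in names:
            return kind
    return None


# Illustrative input only; not a copy of the real registry.
SAMPLE = (
    "File-Date: 2004-06-28\r\n"
    "%%\r\n"
    "Type: language\r\n"
    "Subtag: is\r\n"
    "Description: Icelandic\r\n"
    "Added: 2005-10-16\r\n"
    "Suppress-Script: Latn\r\n"
    "%%\r\n"
    "Type: variant\r\n"
    "Subtag: nedis\r\n"
    "Description: Natisone\r\n"
    "  dialect\r\n"
    "Added: 2005-10-16\r\n"
    "Prefix: sl\r\n"
)

records = parse_registry(SAMPLE)
kinds = [record_kind(r) for r in records]
```

   Keeping each record as a list of pairs, rather than a dictionary,
   preserves fields that occur more than once in a record.
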
   The first record in the registry is a "File-Date" record. This
   record contains the single field whose field-name is "File-Date" (see
   Figure 4). The field-body of this record contains the last
   modification date of this copy of the registry, making it possible to
   compare different versions of the registry. The registry on the IANA
   website is the most current. Versions with an older date than that
   one are not up-to-date.

   File-Date: 2004-06-28
   %%

               Figure 5: Example of the File-Date Record

   Subsequent records represent either subtags or tags in the registry.
   "Subtag" records contain a field with a field-name of "Subtag",
   while, unsurprisingly, "Tag" records contain a field with a
   field-name of "Tag". Each of the fields in each record MUST occur no
   more than once, unless otherwise noted below. Each record MUST
   contain the following fields:

   o  'Type'

      *  Type's field-body MUST consist of one of the following strings:
         "language", "script", "region", "variant", "grandfathered", or
         "redundant". It denotes the type of tag or subtag.

   o  Either 'Subtag' or 'Tag'

      *  Subtag's field-body contains the subtag being defined. This
         field MUST only appear in records whose 'Type' has one of
         these values: "language", "script", "region", or "variant".

      *  Tag's field-body contains a complete language tag. This field
         MUST only appear in records whose 'Type' has one of these
         values: "grandfathered" or "redundant". Note that the
         field-body will always follow the 'grandfathered' production
         in the ABNF in Section 2.1.

   o  Description

      *  Description's field-body contains a non-normative description
         of the subtag or tag.

   o  Added

      *  Added's field-body contains the date the record was added to
         the registry.
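
   The required fields listed above can be checked mechanically. The
   following Python sketch is illustrative only; the function name
   check_required and the wording of its messages are hypothetical. It
   assumes a record is represented as a list of (field-name,
   field-body) pairs and that the "File-Date" record is handled
   separately. The date check only tests the shape of the "full-date"
   format and does not verify that the date exists in the calendar.

```python
import re

SUBTAG_TYPES = {"language", "script", "region", "variant"}
TAG_TYPES = {"grandfathered", "redundant"}
# Shape of an RFC 3339 full-date (YYYY-MM-DD); not a calendar check.
FULL_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def check_required(record):
    """Return a list of problems found in one Subtag or Tag record."""
    problems = []
    fields = {}
    for name, body in record:
        fields.setdefault(name, []).append(body)

    types = fields.get("Type", [])
    if len(types) != 1:
        return ["exactly one Type field is required"]
    rtype = types[0]
    if rtype in SUBTAG_TYPES:
        wanted, unwanted = "Subtag", "Tag"
    elif rtype in TAG_TYPES:
        wanted, unwanted = "Tag", "Subtag"
    else:
        return ["unknown Type: " + rtype]

    if len(fields.get(wanted, [])) != 1:
        problems.append("exactly one %s field is required" % wanted)
    if unwanted in fields:
        problems.append("%s is not allowed for Type %s" % (unwanted, rtype))
    if not fields.get("Description"):
        problems.append("at least one Description field is required")
    added = fields.get("Added", [])
    if len(added) != 1 or not FULL_DATE.match(added[0]):
        problems.append("one Added field in full-date format is required")
    return problems
```

   For instance, a record of Type "region" that carries a 'Tag' field
   and no 'Added' field would be reported with three problems: the
   missing 'Subtag', the misplaced 'Tag', and the missing 'Added'.
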
   Each record MAY also contain the following fields:

   o  Preferred-Value

      *  For records of type 'script', 'region', and 'variant',
         'Preferred-Value' contains the subtag of the same 'Type' that
         is preferred for forming the language tag.

      *  For records of type 'language', 'Preferred-Value' contains the
         primary language subtag that is preferred when forming the
         language tag.

      *  For records of type 'grandfathered' and 'redundant',
         'Preferred-Value' contains a canonical mapping to a complete
         language tag.

   o  Deprecated

      *  The field-body of the Deprecated field contains the date the
         record was deprecated.

   o  Prefix

      *  Prefix's field-body contains a language tag with which this
         subtag MAY be used to form a new language tag, perhaps with
         other subtags as well. The Prefix's subtags appear before the
         subtag. This field MUST only appear in records whose 'Type'
         field-body is 'variant'. For example, the 'Prefix' for the
         variant 'nedis' is 'sl', meaning that the tags "sl-nedis" and
         "sl-IT-nedis" might be appropriate while the tag "is-nedis" is
         not.

   o  Comments

      *  Comments contains additional information about the subtag, as
         deemed appropriate for understanding the registry and
         implementing language tags using the subtag or tag.

   o  Suppress-Script

      *  Suppress-Script contains a script subtag that SHOULD NOT be
         used to form language tags with the associated primary language
         subtag. This field MUST only appear in records whose 'Type'
         field-body is 'language'. See Section 4.1.

   o  Macrolanguage

      *  Macrolanguage contains a primary language subtag defined by ISO
         639 as a "macrolanguage" that encompasses this language subtag.
         This field MUST only appear in records whose 'Type' field-body
         is 'language'.
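
   The type restrictions on the optional fields above, and the way a
   'Preferred-Value' is interpreted, can be sketched as follows. This
   Python fragment is illustrative only; the names ALLOWED_TYPES,
   record_type, check_optional, and preferred_value are hypothetical,
   and a record is again assumed to be a list of (field-name,
   field-body) pairs.

```python
# Optional fields that may appear only in records of the listed types.
ALLOWED_TYPES = {
    "Prefix": {"variant"},
    "Suppress-Script": {"language"},
    "Macrolanguage": {"language"},
}


def record_type(record):
    """Return the body of the record's Type field, or None."""
    return next((body for name, body in record if name == "Type"), None)


def check_optional(record):
    """Return the names of optional fields that are misplaced."""
    rtype = record_type(record)
    return [
        name
        for name, _ in record
        if name in ALLOWED_TYPES and rtype not in ALLOWED_TYPES[name]
    ]


def preferred_value(record):
    """Return (kind, value) for the Preferred-Value field, or None.

    kind is "tag" for grandfathered and redundant records, whose
    Preferred-Value is a complete language tag, and "subtag" otherwise.
    """
    whole_tag = record_type(record) in ("grandfathered", "redundant")
    for name, body in record:
        if name == "Preferred-Value":
            return ("tag" if whole_tag else "subtag"), body
    return None
```

   A processor might call check_optional when loading the registry and
   preferred_value when deciding which tag or subtag to generate.
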
   Future versions of this document might add additional fields to the
   registry, so implementations SHOULD ignore fields found in the
   registry that are not defined in this document.

3.1.3.  Subtag and Tag Fields

   The 'Subtag' field MUST NOT use uppercase letters to form the subtag,
   with two exceptions. Subtags whose 'Type' field is 'script' (in
   other words, subtags defined by ISO 15924) MUST use titlecase.

   Subtags whose 'Type' field is 'region' (in other words, the
   non-numeric region subtags defined by ISO 3166) MUST use all
   uppercase. These exceptions mirror the use of case in the underlying
   standards.

   Each subtag in the tags contained in a 'Tag' field MUST be formatted
   using the rules in the preceding paragraph. That is, all subtags are
   lowercase except for subtags that represent script or region codes.

3.1.4.  Description Field

   The field 'Description' contains a description of the tag or subtag
   in the record. The 'Description' field MAY appear more than once per
   record, that is, there can be multiple descriptions for a given
   record. The 'Description' field MAY include the full range of
   Unicode characters. At least one of the 'Description' fields MUST be
   written or transcribed into the Latin script; additional
   'Description' fields MAY also include a description in a non-Latin
   script. Each 'Description' field MUST be unique, both within the
   record in which it appears and for the collection of records of the
   same type. Moreover, formatting variations of the same description
   MUST NOT occur in that specific record or in any other record of the
   same type. For example, while the ISO 639-1 code 'fy' contains both
   the descriptions "Western Frisian" and "Frisian, Western", only one
   of these descriptions appears in the registry.
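
   The uniqueness rule for 'Description' fields can be approximated in
   code. The following Python sketch is illustrative only; the names
   description_key and find_duplicate_descriptions are hypothetical.
   This document does not define what counts as a "formatting
   variation", so the key used here (ignore case, punctuation, and word
   order) is an assumption that happens to catch the "Western Frisian"
   example and could produce false positives on other data.

```python
import re
from collections import defaultdict


def description_key(text):
    """Heuristic key that ignores case, punctuation, and word order."""
    words = re.findall(r"\w+", text.casefold())
    return tuple(sorted(words))


def find_duplicate_descriptions(records):
    """Map (Type, key) to the descriptions that collide on that key.

    Each record is a list of (field-name, field-body) pairs. Only
    descriptions within records of the same Type are compared.
    """
    seen = defaultdict(list)
    for record in records:
        fields = defaultdict(list)
        for name, body in record:
            fields[name].append(body)
        rtype = fields["Type"][0] if fields["Type"] else None
        ident = (fields["Subtag"] or fields["Tag"] or [None])[0]
        for text in fields["Description"]:
            seen[(rtype, description_key(text))].append((ident, text))
    return {key: hits for key, hits in seen.items() if len(hits) > 1}
```

   A reviewer could run such a check over proposed records and inspect
   the reported collisions by hand before deciding which description to
   keep.
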
   The 'Description' field is used for identification purposes and
   SHOULD NOT be taken to represent the actual native name of the
   language or variation or to be in any particular language.

   For subtags taken from a source standard (such as ISO 639 or ISO
   3166), the 'Description' value(s) SHOULD also be taken from the
   source standard. Multiple descriptions in the source standard MUST
   be split into separate 'Description' fields. The source standard's
   descriptions MAY be edited, either prior to insertion or via the
   registration process. For records of type 'language', the first
   'Description' field appearing in the Registry corresponds to the
   Reference Name assigned by ISO 639-3. This helps facilitate
   cross-referencing between ISO 639 and the registry.

   When creating or updating a record due to the action of one of the
   source standards, the Language Subtag Reviewer SHOULD remove
   duplicate or redundant descriptions and MAY edit descriptions to
   correct irregularities in formatting (such as misspellings,
   inappropriate apostrophes or other punctuation, or excessive or
   missing spaces) prior to submitting the proposed record to the
   ietf-languages list.

   Note: Descriptions in registry entries that correspond to ISO 639,
   ISO 15924, ISO 3166, or UN M.49 codes are intended only to indicate
   the meaning of that identifier as defined in the source standard at
   the time it was added to the registry. The description does not
   replace the content of the source standard itself. The descriptions
   are not intended to be the localized English names for the subtags.
   Localization or translation of language tag and subtag descriptions
   is out of scope of this document.

3.1.5.  Deprecated Field

   The field 'Deprecated' MAY be added to any record via the maintenance
   process described in Section 3.3 or via the registration process
   described in Section 3.5. Usually, the addition of a 'Deprecated'
   field is due to the action of one of the standards bodies, such as
   ISO 3166, withdrawing a code. In some historical cases, it might not
   have been possible to reconstruct the original deprecation date. For
   these cases, an approximate date appears in the registry. Although
   valid in language tags, subtags and tags with a 'Deprecated' field
   are deprecated and validating processors SHOULD NOT generate these
   subtags. Note that a record that contains a 'Deprecated' field and
   no corresponding 'Preferred-Value' field has no replacement mapping.

3.1.6.  Preferred-Value Field

   The field 'Preferred-Value' contains a mapping between the record in
   which it appears and another tag or subtag. The value in this field
   is strongly RECOMMENDED as the best choice to represent the value of
   this record when selecting a language tag. These values form three
   groups:

   1.  ISO 639 language codes that were later withdrawn in favor of
       other codes. These values are mostly a historical curiosity.

   2.  ISO 3166 region codes that have been withdrawn in favor of a new
       code. This sometimes happens when a country changes its name or
       administration in such a way that warrants a new region code.

   3.  Grandfathered or redundant tags from RFC 3066. In many cases,
       these tags have become obsolete because the values they represent
       were later encoded by ISO 639.

   Records that contain a 'Preferred-Value' field MUST also have a
   'Deprecated' field. This field contains a date of deprecation.
   Thus, a language tag processor can use the registry to construct the
   valid, non-deprecated set of subtags for a given date.
   In addition, for any given tag, a processor can construct the set of
   valid language tags that correspond to that tag for all dates up to
   the date of the registry. The ability to do these mappings MAY be
   beneficial to applications that are matching, selecting, or
   filtering content based on its language tags.

   Note that 'Preferred-Value' mappings in records of type 'region'
   sometimes do not represent exactly the same meaning as the original
   value. There are many reasons for a country code to be changed, and
   the effect this has on the formation of language tags will depend on
   the nature of the change in question.

   In particular, the 'Preferred-Value' field does not imply retagging
   content that uses the affected subtag.

   The field 'Preferred-Value' MUST NOT be modified once created in the
   registry. The field MAY be added to records according to the rules
   in Section 3.3.

   The 'Preferred-Value' field in records of type "grandfathered" and
   "redundant" contains whole language tags that are strongly
   RECOMMENDED for use in place of the record's value. In many cases,
   the mappings were created by deprecation of the tags during the
   period before this document was adopted. For example, the tag
   "no-nyn" was deprecated in favor of the ISO 639-1-defined language
   code 'nn'.

3.1.7.  Prefix Field

   The 'Prefix' field contains an extended language range whose subtags
   are appropriate to use with this subtag: each of the subtags in one
   of the subtag's Prefix fields MUST appear before the variant in a
   valid tag. For example, the variant subtag '1996' has a 'Prefix'
   field of "de". This means that tags starting with the sequence "de-"
   are appropriate with this subtag, so "de-Latg-1996" and "de-CH-1996"
   are both acceptable, while the tag "fr-1996" is an inappropriate
   choice.

   The field of type 'Prefix' MUST NOT be removed from any record.
   The field-body for this type of field MAY be modified, but only if
   the modification broadens the meaning of the subtag. That is, the
   field-body can be replaced only by a prefix of itself. For example,
   the Prefix "be-Latn" (Belarusian, Latin script) could be replaced by
   the Prefix "be" (Belarusian) but not by the Prefix "ru-Latn"
   (Russian, Latin script).

   Records of type 'variant' MAY have more than one field of type
   'Prefix'. Additional fields of this type MAY be added to a 'variant'
   record via the registration process.

   The field-body of the 'Prefix' field MUST NOT conflict with any
   'Prefix' already registered for a given record. Such a conflict
   would occur when no valid tag could be constructed that would contain
   the prefix, such as when two subtags each have a 'Prefix' that
   contains the other subtag. For example, suppose that the subtag
   'avariant' has the prefix "es-bvariant". Then the subtag 'bvariant'
   cannot be given the prefix 'avariant', for that would require a tag
   of the form "es-avariant-bvariant-avariant", which would not be
   valid.

3.1.8.  Suppress-Script Field

   The field 'Suppress-Script' contains a script subtag (whose record
   appears in the registry). The field 'Suppress-Script' MUST only
   appear in records whose 'Type' field-body is 'language'. This field
   MUST NOT appear more than one time in a record. This field indicates
   a script used to write the overwhelming majority of documents for the
   given language. This script code therefore adds no distinguishing
   information to a language tag. This helps ensure greater
   compatibility between the language tags generated according to the
   rules in this document and language tags and tag processors or
   consumers based on RFC 3066 by indicating that the script subtag
   SHOULD NOT be used for most documents in that language.
   For example, virtually all Icelandic documents are written in the
   Latin script, making the subtag 'Latn' redundant in the tag
   "is-Latn".

   Many language subtag records do not have a Suppress-Script field.
   The lack of a Suppress-Script might indicate that the language is
   customarily written in more than one script or that the language is
   not customarily written at all. It might also mean that sufficient
   information was not available when the record was created and thus
   remains a candidate for future registration.

3.1.9.  Macrolanguage Field

   The Macrolanguage field contains a primary language subtag that
   encompasses this subtag's language. That is, the language subtag
   whose record this field appears in is sometimes considered to be a
   sub-language of the Macrolanguage. Macrolanguage values are defined
   by ISO 639-3 and the exact nature of the relationship between the
   encompassed and encompassing languages varies on a case-by-case
   basis.

   This field can be useful to applications or users when selecting
   language tags or as additional metadata useful in matching. The
   Macrolanguage field can only occur in records of type 'language'.
   Only values assigned by ISO 639-3 will be considered for inclusion.
   Macrolanguage fields MAY be added or removed via the normal
   registration process whenever ISO 639-3 defines new values or
   withdraws old values. Macrolanguages are informational, and MAY be
   removed or changed if ISO 639-3 changes the values.

   For example, the language subtags 'nb' (Norwegian Bokmal) and 'nn'
   (Norwegian Nynorsk) each have a Macrolanguage entry of 'no'
   (Norwegian). For more information see Section 4.1.

3.1.10.  Comments Field

   The field 'Comments' conveys additional information about the record
   and MAY appear more than once per record.
   The field-body MAY include the full range of Unicode characters and
   is not restricted to any particular script. This field MAY be
   inserted or changed via the registration process and no guarantee of
   stability is provided. The content of this field is not restricted,
   except by the need to register the information, the suitability of
   the request, and by reasonable practical size limitations.

3.2.  Language Subtag Reviewer

   The Language Subtag Reviewer moderates the ietf-languages mailing
   list, responds to requests for registration, and performs the other
   registry maintenance duties described in Section 3.3. Only the
   Language Subtag Reviewer is permitted to request IANA to change,
   update, or add records to the Language Subtag Registry. The Language
   Subtag Reviewer MAY delegate list moderation and other clerical
   duties as needed.

   The Language Subtag Reviewer is appointed by the IESG for an
   indefinite term, subject to removal or replacement at the IESG's
   discretion. The IESG will solicit nominees for the position (upon
   adoption of this document or upon a vacancy) and then solicit
   feedback on the nominees' qualifications. Qualified candidates
   should be familiar with BCP 47 and its requirements; be willing to
   fairly, responsively, and judiciously administer the registration
   process; and be suitably informed about the issues of language
   identification so that they can draw upon and assess the claims and
   contributions of language experts and subtag requesters.

   The subsequent performance or decisions of the Language Subtag
   Reviewer MAY be appealed to the IESG under the same rules as other
   IETF decisions (see [RFC2026]). The IESG can reverse or overturn the
   decision of the Language Subtag Reviewer, provide guidance, or take
   other appropriate actions.

3.3.  Maintenance of the Registry

   Maintenance of the registry requires that as codes are assigned or
   withdrawn by ISO 639, ISO 15924, ISO 3166, and UN M.49, the Language
   Subtag Reviewer MUST evaluate each change and determine the
   appropriate course of action according to the rules in this document.
   Such updates follow the registration process described in
   Section 3.5. Usually the Language Subtag Reviewer will start the
   process for the new or updated record by filling in the registration
   form and submitting it. If a change to one of these standards takes
   place and the Language Subtag Reviewer does not do this in a timely
   manner, then any interested party MAY submit the form. Thereafter
   the registration process continues normally.

   The Language Subtag Reviewer MUST ensure that new subtags meet the
   requirements elsewhere in this document (and most especially in
   Section 3.4) or submit an appropriate registration form for an
   alternate subtag as described in that section. Each individual
   subtag affected by a change MUST be sent to the ietf-languages list
   with its own registration form and in a separate message.

3.4.  Stability of IANA Registry Entries

   The stability of entries and their meaning in the registry is
   critical to the long-term stability of language tags. The rules in
   this section guarantee that a specific language tag's meaning is
   stable over time and will not change.

   These rules specifically deal with how changes to codes (including
   withdrawal and deprecation of codes) maintained by ISO 639, ISO
   15924, ISO 3166, and UN M.49 are reflected in the IANA Language
   Subtag Registry. Assignments to the IANA Language Subtag Registry
   MUST follow these stability rules:

   1.   Values in the fields 'Type', 'Subtag', 'Tag', 'Added',
        'Deprecated' and 'Preferred-Value' MUST NOT be changed and are
        guaranteed to be stable over time.


   2.  Values in the 'Description' field MUST NOT be changed in a way
       that would invalidate previously-existing tags. They MAY be
       broadened somewhat in scope, changed to add information, or
       adapted to the most common modern usage. For example, countries
       occasionally change their names; a historical example of this
       would be "Upper Volta" changing to "Burkina Faso".

   3.  Values in the field 'Prefix' MAY be added to records of type
       'variant' via the registration process. If a prefix is added to
       a variant record, 'Comments' fields SHOULD be used to explain
       different usages with the various prefixes.

   4.  Values in the field 'Prefix' in records of type 'variant' MAY be
       modified, so long as the modifications broaden the set of
       prefixes. That is, a prefix MAY be replaced by one of its own
       prefixes. For example, the prefix "en-US" could be replaced by
       "en", but not by the prefixes "en-Latn", "fr", or "en-US-boont".
       If one of those prefixes were needed, a new Prefix SHOULD be
       registered.

   5.  Values in the field 'Prefix' MUST NOT be removed.

   6.  The field 'Comments' MAY be added, changed, modified, or removed
       via the registration process or any of the processes or
       considerations described in this section.

   7.  The field 'Suppress-Script' MAY be added or removed via the
       registration process.

   8.  The field 'Macrolanguage' MAY be added or removed via the
       registration process, but only in response to changes made by
       ISO 639. The Macrolanguage field appears whenever a language
       has a corresponding Macrolanguage in ISO 639. That is, the
       macrolanguage fields in the registry exactly match those of ISO
       639. No other macrolanguage mappings will be considered for
       registration.

   9.  Codes assigned by ISO 639-1 that do not conflict with existing
       two-letter primary language subtags and which have no
       corresponding three-letter primary or extended language subtags
       defined in the registry are entered into the IANA registry as
       new records of type 'language'.

   10. Codes assigned by ISO 639-2 that do not conflict with existing
       three-letter primary or extended language subtags are entered
       into the IANA registry as new records of type 'language'.

   11. Codes assigned by ISO 639-3 that do not conflict with existing
       three-letter primary language subtags are entered into the IANA
       registry as new primary language records.

   12. Codes assigned by ISO 15924 and ISO 3166 that do not conflict
       with existing subtags of the associated type and whose meaning
       is not the same as an existing subtag of the same type are
       entered into the IANA registry as new records.

   13. Codes assigned by ISO 639, ISO 15924, or ISO 3166 that are
       withdrawn by their respective maintenance or registration
       authority remain valid in language tags. A 'Deprecated' field
       containing the date of withdrawal MUST be added to the record.
       If a new record of the same type is added that represents a
       replacement value, then a 'Preferred-Value' field MAY also be
       added. The registration process MAY be used to add comments
       about the withdrawal of the code by the respective standard.

          Example: The region code 'TL' was assigned to the country
          'Timor-Leste', replacing the code 'TP' (which was assigned to
          'East Timor' when it was under administration by Portugal).
          The subtag 'TP' remains valid in language tags, but its
          record contains a 'Preferred-Value' of 'TL' and its field
          'Deprecated' contains the date the new code was assigned
          ('2004-07-06').

   14. Codes assigned by ISO 639, ISO 15924, or ISO 3166 that conflict
       with existing subtags of the associated type, including subtags
       that are deprecated, MUST NOT be entered into the registry. The
       following additional considerations apply to subtag values that
       are reassigned:

       A.  For ISO 639 codes, if the newly assigned code's meaning is
           not represented by a subtag in the IANA registry, the
           Language Subtag Reviewer, as described in Section 3.5, SHALL
           prepare a proposal for entering in the IANA registry as soon
           as practical a registered language subtag as an alternate
           value for the new code. The form of the registered language
           subtag will be at the discretion of the Language Subtag
           Reviewer and MUST conform to other restrictions on language
           subtags in this document.

       B.  For all subtags whose meaning is derived from an external
           standard (that is, by ISO 639, ISO 15924, ISO 3166, or UN
           M.49), if a new meaning is assigned to an existing code and
           the new meaning broadens the meaning of that code, then the
           meaning for the associated subtag MAY be changed to match.
           The meaning of a subtag MUST NOT be narrowed, however, as
           this can result in an unknown proportion of the existing
           uses of a subtag becoming invalid. Note: The ISO 639
           maintenance agency/registration authority (MA/RA) has
           adopted a similar stability policy.

       C.  For ISO 15924 codes, if the newly assigned code's meaning is
           not represented by a subtag in the IANA registry, the
           Language Subtag Reviewer, as described in Section 3.5, SHALL
           prepare a proposal for entering in the IANA registry as soon
           as practical a registered variant subtag as an alternate
           value for the new code. The form of the registered variant
           subtag will be at the discretion of the Language Subtag
           Reviewer and MUST conform to other restrictions on variant
           subtags in this document.

       D.  For ISO 3166 codes, if the newly assigned code's meaning is
           associated with the same UN M.49 code as another 'region'
           subtag, then the existing region subtag remains as the
           preferred value for that region and no new entry is created.
           A comment MAY be added to the existing region subtag
           indicating the relationship to the new ISO 3166 code.

       E.  For ISO 3166 codes, if the newly assigned code's meaning is
           associated with a UN M.49 code that is not represented by an
           existing region subtag, then the Language Subtag Reviewer,
           as described in Section 3.5, SHALL prepare a proposal for
           entering the appropriate UN M.49 country code as an entry in
           the IANA registry.

       F.  For ISO 3166 codes, if there is no associated UN numeric
           code, then the Language Subtag Reviewer SHALL petition the
           UN to create one. If there is no response from the UN
           within ninety days of the request being sent, the Language
           Subtag Reviewer SHALL prepare a proposal for entering in the
           IANA registry as soon as practical a registered variant
           subtag as an alternate value for the new code. The form of
           the registered variant subtag will be at the discretion of
           the Language Subtag Reviewer and MUST conform to other
           restrictions on variant subtags in this document. This
           situation is very unlikely to ever occur.

   15. UN M.49 has codes for both countries and areas (such as '276'
       for Germany) and geographical regions and sub-regions (such as
       '150' for Europe). UN M.49 country or area codes for which
       there is no corresponding ISO 3166 code SHOULD NOT be
       registered, except as a surrogate for an ISO 3166 code that is
       blocked from registration by an existing subtag. If such a code
       becomes necessary, then the registration authority for ISO 3166
       SHOULD first be petitioned to assign a code to the region.
       If the petition for a code assignment by ISO 3166 is refused or
       not acted on in a timely manner, the registration process
       described in Section 3.5 MAY then be used to register the
       corresponding UN M.49 code. This way, UN M.49 codes remain
       available as the value of last resort in cases where ISO 3166
       reassigns a deprecated value in the registry.

   16. Stability provisions apply to grandfathered tags with this
       exception: should it become possible to compose one of the
       grandfathered tags from registered subtags, then the field
       'Type' in that record is changed from 'grandfathered' to
       'redundant'. Note that this will not affect language tags that
       match the grandfathered tag, since these tags will now match
       valid generative subtag sequences. For example, this document
       caused the ISO 639-3 code 'gan', used in the redundant tag
       "zh-gan", to be registered as an extended language subtag. The
       formerly-grandfathered tag "zh-gan" became a redundant tag as a
       result (but existing content or implementations that use
       "zh-gan" remain valid).

   Note: The redundant and grandfathered entries together are the
   complete list of tags registered under [RFC3066]. The redundant tags
   are those that can now be formed using the subtags defined in the
   registry together with the rules of Section 2.2. The grandfathered
   entries include those that can never be legal under those same
   provisions plus those tags that contain subtags not yet registered
   or, perhaps, inappropriate for registration.

   The set of redundant and grandfathered tags is permanent and stable:
   new entries in this section MUST NOT be added and existing entries
   MUST NOT be removed. Records of type 'grandfathered' MAY have their
   type converted to 'redundant'; see item 16 in Section 3.4 for more
   information.
   The decision-making process about which tags were initially
   grandfathered and which were made redundant is described in
   [RFC4645].

   RFC 3066 tags that were deprecated prior to the adoption of [RFC4646]
   are part of the list of grandfathered tags, and their component
   subtags were not included as registered variants (although they
   remain eligible for registration). For example, the tag "art-lojban"
   was deprecated in favor of the language subtag 'jbo'.

3.5. Registration Procedure for Subtags

   The procedure given here MUST be used by anyone who wants to use a
   subtag not currently in the IANA Language Subtag Registry.

   Only subtags of type 'language' and 'variant' will be considered for
   independent registration of new subtags. Subtags needed for
   stability and subtags necessary to keep the registry synchronized
   with ISO 639, ISO 15924, ISO 3166, and UN M.49 within the limits
   defined by this document also use this process, as described in
   Section 3.3. Stability provisions are described in Section 3.4.

   This procedure MAY also be used to register or alter the information
   for the 'Description', 'Comments', 'Deprecated', 'Prefix', or
   'Suppress-Script' fields in a subtag's record as described in
   Section 3.4. Changes to all other fields in the IANA registry are
   NOT permitted.

   Registering a new subtag or requesting modifications to an existing
   tag or subtag starts with the requester filling out the registration
   form reproduced below. Note that each response is not limited in
   size so that the request can adequately describe the registration.
   The fields in the "Record Requested" section SHOULD follow the
   requirements in Section 3.1.

   LANGUAGE SUBTAG REGISTRATION FORM
   1. Name of requester:
   2. E-mail address of requester:
   3. Record Requested:

      Type:
      Subtag:
      Description:
      Prefix:
      Preferred-Value:
      Deprecated:
      Suppress-Script:
      Macrolanguage:
      Comments:

   4. Intended meaning of the subtag:
   5. Reference to published description
      of the language (book or article):
   6. Any other relevant information:

            Figure 6: The Language Subtag Registration Form

   Examples of completed registration forms can be found in Appendix C
   or online at http://www.iana.org/assignments/lang-subtags-templates/.

   The subtag registration form MUST be sent to the ietf-languages list
   for a two-week review period before it can be submitted to IANA. If
   modifications are made to the request during the course of the
   registration process (such as corrections to meet the requirements
   in Section 3.1), the modified form MUST also be sent to the
   ietf-languages list at least one week prior to submission to IANA.

   Whenever an entry is created or modified in the registry, the
   'File-Date' record at the start of the registry is updated to
   reflect the most recent modification date in the [RFC3339]
   "full-date" format.

   Before forwarding a new registration to IANA, the Language Subtag
   Reviewer MUST ensure that values in the 'Subtag' field match case
   according to the description in Section 3.1.

   The ietf-languages list is an open list and can be joined by sending
   a subscription request. The list can be hosted by IANA or by any
   third party at the request of the IESG.

   Some fields in both the registration form and the registry record
   itself permit the use of non-ASCII characters. Registration
   requests SHOULD use the UTF-8 encoding for consistency and clarity.
   However, since some mail clients do not support this encoding, other
   encodings MAY be used for the registration request.
   The Language Subtag Reviewer is responsible for ensuring that the
   proper Unicode characters appear in both the archived request form
   and the registry record. In the case of a transcription or encoding
   error by IANA, the Language Subtag Reviewer will request that the
   registry be repaired, providing any necessary information to assist
   IANA.

   Variant subtags are usually registered for use with a particular
   range of language tags. For example, the subtag 'rozaj' is intended
   for use with language tags that start with the primary language
   subtag "sl", since Resian is a dialect of Slovenian. Thus, the
   subtag 'rozaj' would be appropriate in tags such as "sl-Latn-rozaj"
   or "sl-IT-rozaj". This information is stored in the 'Prefix' field
   in the registry. Variant registration requests SHOULD include at
   least one 'Prefix' field in the registration form.

   Extended language subtags MUST include exactly one 'Prefix' field.

   The 'Prefix' field for a given registered subtag exists in the IANA
   registry as a guide to usage. Additional prefixes MAY be added by
   filing an additional registration form. In that form, the "Any other
   relevant information:" field MUST indicate that it is the addition of
   a prefix.

   Requests to add a prefix to a variant subtag that imply a different
   semantic meaning will probably be rejected. For example, a request
   to add the prefix "de" to the subtag 'nedis' so that the tag
   "de-nedis" represented some German dialect would be rejected. The
   'nedis' subtag represents a particular Slovenian dialect and the
   additional registration would change the semantic meaning assigned to
   the subtag. A separate subtag SHOULD be proposed instead.

   The 'Description' field MUST contain a description of the tag being
   registered written or transcribed into the Latin script; it MAY also
   include a description in a non-Latin script.
   The 'Description' field is used for identification purposes and does
   not necessarily represent the actual native name of the language or
   variation, nor does it need to be in any particular language.

   While the 'Description' field itself is not guaranteed to be stable
   and errata corrections MAY be undertaken from time to time, attempts
   to provide translations or transcriptions of entries in the registry
   itself will probably be frowned upon by the community or rejected
   outright, as changes of this nature have an impact on the provisions
   in Section 3.4.

   When the two-week period has passed, the Language Subtag Reviewer
   MUST take one of the following actions:

   o  Explicitly accept the request and forward the form containing the
      record to be inserted or modified to iana@iana.org according to
      the procedure described in Section 3.3.

   o  Explicitly reject the request because of significant objections
      raised on the list or due to problems with constraints in this
      document (which MUST be explicitly cited).

   o  Extend the review period by granting an additional two-week
      increment to permit further discussion. After each two-week
      increment, the Language Subtag Reviewer MUST indicate on the list
      whether the registration has been accepted, rejected, or extended.

   Note that the Language Subtag Reviewer MAY raise objections on the
   list if he or she so desires. The important thing is that the
   objection MUST be made publicly.

   Sometimes the request needs to be modified as a result of discussion
   during the review period or due to requirements in this document.
   The applicant, Language Subtag Reviewer, or others are free to submit
   a modified version of the completed registration form, which will be
   considered in lieu of the original request with the explicit approval
   of the applicant.
   Such changes do not restart the two-week discussion period, although
   an application containing the final record submitted to IANA MUST
   appear on the list at least one week prior to the Language Subtag
   Reviewer forwarding the record to IANA. The applicant is also free
   to modify a rejected application with additional information and
   submit it again; this starts a new two-week comment period.

   Registrations initiated due to the provisions of Section 3.3 or
   Section 3.4 SHALL NOT be rejected altogether (since they have to
   ultimately appear in the registry) and SHOULD be completed as quickly
   as possible. The review process allows list members to comment on
   the specific information in the form and the record it contains and
   thus help ensure that it is correct and consistent. The Language
   Subtag Reviewer MAY reject a specific version of the form, but MUST
   include in the rejection a suitable replacement, extending the review
   period as described above, until the form is in a format worthy of
   the reviewer's approval.

   Decisions made by the Language Subtag Reviewer MAY be appealed to the
   IESG [RFC2028] under the same rules as other IETF decisions
   [RFC2026]. This includes a decision to extend the review period or
   the failure to announce a decision in a clear and timely manner.

   The approved records appear in the Language Subtag Registry. The
   approved registration forms are available online under
   http://www.iana.org/assignments/lang-subtags-templates/.

   Updates or changes to existing records follow the same procedure as
   new registrations. The Language Subtag Reviewer decides whether
   there is consensus to update the registration following the two-week
   review period; normally, objections by the original registrant will
   carry extra weight in forming such a consensus.

   Registrations are permanent and stable.
   Once registered, subtags will not be removed from the registry and
   will remain a valid way in which to specify a specific language or
   variant.

   Note: The purpose of the "Reference to published description" section
   in the registration form is to aid in verifying whether a language is
   registered or what language or language variation a particular subtag
   refers to. In most cases, reference to an authoritative grammar or
   dictionary of that language will be useful; in cases where no such
   work exists, other well-known works describing that language or in
   that language MAY be appropriate. The Language Subtag Reviewer
   decides what constitutes "good enough" reference material. This
   requirement is not intended to exclude particular languages or
   dialects due to the size of the speaker population or lack of a
   standardized orthography. Minority languages will be considered
   equally on their own merits.

3.6. Possibilities for Registration

   Possibilities for registration of subtags or information about
   subtags include:

   o  Primary language subtags for languages not listed in ISO 639 that
      are not variants of any listed or registered language MAY be
      registered. At the time this document was created, there were no
      examples of this form of subtag. Before attempting to register a
      language subtag, there MUST be an attempt to register the language
      with ISO 639. Subtags MUST NOT be registered for languages
      defined by codes that exist in ISO 639-1, ISO 639-2, or ISO 639-3,
      or that are under consideration by the ISO 639 registration
      authorities, or that have never been attempted for registration
      with those authorities.
      If ISO 639 has previously rejected a language for registration,
      it is reasonable to assume that there must be additional, very
      compelling evidence of need before it will be registered as a
      primary language subtag in the IANA registry (to the extent that
      it is very unlikely that any subtags will be registered of this
      type).

   o  Dialect or other divisions or variations within a language, its
      orthography, writing system, regional or historical usage,
      transliteration or other transformation, or distinguishing
      variation MAY be registered as variant subtags. An example is the
      'rozaj' subtag (the Resian dialect of Slovenian).

   o  The addition or maintenance of fields (generally of an
      informational nature) in Tag or Subtag records as described in
      Section 3.1 and subject to the stability provisions in
      Section 3.4. This includes descriptions, comments, deprecation
      and preferred values for obsolete or withdrawn codes, or the
      addition of script or macrolanguage information to primary
      language subtags.

   o  The addition of records and related field value changes necessary
      to reflect assignments made by ISO 639, ISO 15924, ISO 3166, and
      UN M.49 as described in Section 3.4.

   Subtags proposed for registration that would cause all or part of a
   grandfathered tag to become redundant but whose meaning conflicts
   with or alters the meaning of the grandfathered tag MUST be rejected.

   This document leaves the decision on what subtags or changes to
   subtags are appropriate (or not) to the registration process
   described in Section 3.5.

   Note: four-character primary language subtags are reserved to allow
   for the possibility of alpha4 codes in some future addition to the
   ISO 639 family of standards.

   ISO 639 defines a maintenance agency for additions to and changes in
   the list of languages in ISO 639.
   This agency is:

      International Information Centre for Terminology (Infoterm)
      Aichholzgasse 6/12, AT-1120
      Wien, Austria
      Phone: +43 1 26 75 35 Ext. 312 Fax: +43 1 216 32 72

   ISO 639-2 defines a maintenance agency for additions to and changes
   in the list of languages in ISO 639-2. This agency is:

      Library of Congress
      Network Development and MARC Standards Office
      Washington, D.C. 20540 USA
      Phone: +1 202 707 6237 Fax: +1 202 707 0115
      URL: http://www.loc.gov/standards/iso639-2

   ISO 639-3 defines a maintenance agency for additions to and changes
   in the list of languages in ISO 639-3. This agency is:

      SIL International
      ISO 639-3 Registrar
      7500 W. Camp Wisdom Rd.
      Dallas, TX 75236 USA
      Phone: +1 972 708 7400, ext. 2293 Fax: +1 972 708 7546
      Email: iso639-3@sil.org
      URL: http://www.sil.org/iso639-3

   The maintenance agency for ISO 3166 (country codes) is:

      ISO 3166 Maintenance Agency
      c/o International Organization for Standardization
      Case postale 56
      CH-1211 Geneva 20 Switzerland
      Phone: +41 22 749 72 33 Fax: +41 22 749 73 49
      URL: http://www.iso.org/iso/en/prods-services/iso3166ma/index.html

   The registration authority for ISO 15924 (script codes) is:

      Unicode Consortium Box 391476
      Mountain View, CA 94039-1476, USA
      URL: http://www.unicode.org/iso15924

   The Statistics Division of the United Nations Secretariat maintains
   the Standard Country or Area Codes for Statistical Use and can be
   reached at:

      Statistical Services Branch
      Statistics Division
      United Nations, Room DC2-1620
      New York, NY 10017, USA

      Fax: +1-212-963-0623
      E-mail: statistics@un.org
      URL: http://unstats.un.org/unsd/methods/m49/m49alpha.htm

3.7. Extensions and the Extensions Registry

   Extension subtags are those introduced by single-character subtags
   ("singletons") other than 'x'.
   They are reserved for the generation of identifiers that contain a
   language component and are compatible with applications that
   understand language tags.

   The structure and form of extensions are defined by this document so
   that implementations can be created that are forward compatible with
   applications that might be created using singletons in the future.
   In addition, defining a mechanism for maintaining singletons will
   lend stability to this document by reducing the likely need for
   future revisions or updates.

   Single-character subtags are assigned by IANA using the "IETF
   Consensus" policy defined by [RFC2434]. This policy requires the
   development of an RFC, which SHALL define the name, purpose,
   processes, and procedures for maintaining the subtags. The
   maintaining or registering authority, including name, contact email,
   discussion list email, and URL location of the registry, MUST be
   indicated clearly in the RFC. The RFC MUST specify or include each
   of the following:

   o  The specification MUST reference the specific version or revision
      of this document that governs its creation and MUST reference this
      section of this document.

   o  The specification and all subtags defined by the specification
      MUST follow the ABNF and other rules for the formation of tags and
      subtags as defined in this document. In particular, it MUST
      specify that case is not significant and that subtags MUST NOT
      exceed eight characters in length.

   o  The specification MUST specify a canonical representation.

   o  The specification of valid subtags MUST be available over the
      Internet and at no cost.

   o  The specification MUST be in the public domain or available via a
      royalty-free license acceptable to the IETF and specified in the
      RFC.

   o  The specification MUST be versioned, and each version of the
      specification MUST be numbered, dated, and stable.

   o  The specification MUST be stable. That is, extension subtags,
      once defined by a specification, MUST NOT be retracted or change
      in meaning in any substantial way.

   o  The specification MUST include in a separate section the
      registration form reproduced in this section (below) to be used in
      registering the extension upon publication as an RFC.

   o  IANA MUST be informed of changes to the contact information and
      URL for the specification.

   IANA will maintain a registry of allocated single-character
   (singleton) subtags. This registry MUST use the record-jar format
   described by the ABNF in Section 3.1. Upon publication of an
   extension as an RFC, the maintaining authority defined in the RFC
   MUST forward this registration form to iesg@ietf.org, who MUST
   forward the request to iana@iana.org. The maintaining authority of
   the extension MUST maintain the accuracy of the record by sending an
   updated full copy of the record to iana@iana.org with the subject
   line "LANGUAGE TAG EXTENSION UPDATE" whenever content changes. Only
   the 'Comments', 'Contact_Email', 'Mailing_List', and 'URL' fields MAY
   be modified in these updates.

   Failure to maintain this record, maintain the corresponding registry,
   or meet other conditions imposed by this section of this document MAY
   be appealed to the IESG [RFC2028] under the same rules as other IETF
   decisions (see [RFC2026]) and MAY result in the authority to maintain
   the extension being withdrawn or reassigned by the IESG.

   %%
   Identifier:
   Description:
   Comments:
   Added:
   RFC:
   Authority:
   Contact_Email:
   Mailing_List:
   URL:
   %%

    Figure 7: Format of Records in the Language Tag Extensions Registry

   'Identifier' contains the single-character subtag (singleton)
   assigned to the extension.
   The Internet-Draft submitted to define the extension SHOULD specify
   which letter or digit to use, although the IESG MAY change the
   assignment when approving the RFC.

   'Description' contains the name and description of the extension.

   'Comments' is an OPTIONAL field and MAY contain a broader description
   of the extension.

   'Added' contains the date the RFC was published in the "full-date"
   format specified in [RFC3339]. For example: 2004-06-28 represents
   June 28, 2004, in the Gregorian calendar.

   'RFC' contains the RFC number assigned to the extension.

   'Authority' contains the name of the maintaining authority for the
   extension.

   'Contact_Email' contains the email address used to contact the
   maintaining authority.

   'Mailing_List' contains the URL or subscription email address of the
   mailing list used by the maintaining authority.

   'URL' contains the URL of the registry for this extension.

   The determination of whether an Internet-Draft meets the above
   conditions and the decision to grant or withhold such authority rests
   solely with the IESG and is subject to the normal review and appeals
   process associated with the RFC process.

   Extension authors are strongly cautioned that many (including most
   well-formed) processors will be unaware of any special relationships
   or meaning inherent in the order of extension subtags. Extension
   authors SHOULD avoid subtag relationships or canonicalization
   mechanisms that interfere with matching or with length restrictions
   that sometimes exist in common protocols where the extension is used.
   In particular, applications MAY truncate the subtags in doing
   matching or in fitting into limited lengths, so it is RECOMMENDED
   that the most significant information be in the most significant
   (left-most) subtags and that the specification gracefully handle
   truncated subtags.
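
   The truncation that extension authors are cautioned about can be
   sketched as follows. This is a minimal illustration in Python and
   not part of this specification: the function name truncate_tag and
   its max_len parameter are hypothetical, the singleton 'r' in the
   sample tag is made up, and dropping a dangling singleton together
   with its last subtag is an assumption borrowed from the progressive
   truncation used by Lookup in [RFC4647].

```python
def truncate_tag(tag: str, max_len: int) -> str:
    """Drop right-most subtags until the tag fits within max_len characters.

    Hypothetical helper: it never removes the primary language subtag, so
    the result can still exceed max_len when max_len is very small.
    """
    subtags = tag.split("-")
    while len(subtags) > 1 and len("-".join(subtags)) > max_len:
        subtags.pop()
        # A trailing single-character subtag (an extension singleton or
        # 'x') introduces nothing once its content is gone, so drop it too.
        if len(subtags) > 1 and len(subtags[-1]) == 1:
            subtags.pop()
    return "-".join(subtags)


# 'r' is a made-up extension singleton used only for illustration.
tag = "en-Latn-GB-boont-r-extended-sequence"
print(truncate_tag(tag, 30))  # en-Latn-GB-boont-r-extended
print(truncate_tag(tag, 25))  # en-Latn-GB-boont
```

   In this sketch the shortened tags still carry the language, script,
   region, and variant, because that information sits in the left-most
   subtags. An extension whose meaning depends on its right-most
   subtags would not survive this kind of processing.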

   When a language tag is to be used in a specific, known, protocol, it
   is RECOMMENDED that the language tag not contain extensions not
   supported by that protocol. In addition, note that some protocols
   MAY impose upper limits on the length of the strings used to store or
   transport the language tag.

3.8. Update of the Language Subtag Registry

   Upon adoption of this document the IANA Language Subtag Registry will
   need an update so that it contains the complete set of subtags valid
   in a language tag. This collection of subtags, along with a
   description of the process used to create it, is described by
   [registry-update]. IANA will publish the updated version of the
   registry described by this document using the instructions and
   content of [registry-update]. Once published by IANA, the
   maintenance procedures, rules, and registration processes described
   in this document will be available for new registrations or updates.

   Registrations that are in process under the rules defined in
   [RFC4646] when this document is adopted MUST be completed under the
   rules contained in this document.

4. Formation and Processing of Language Tags

   This section addresses how to use the information in the registry
   with the tag syntax to choose, form, and process language tags.

4.1. Choice of Language Tag

   The guiding principle in forming language tags is to "tag content
   wisely." Sometimes there is a choice between several possible tags
   for the same content. The choice of which tag to use depends on the
   content and application in question and some amount of judgment might
   be necessary when selecting a tag.

   Interoperability is best served when the same language tag is used
   consistently to represent the same language. If an application has
   requirements that make the rules here inapplicable, then that
   application risks damaging interoperability.
It is strongly 1954 RECOMMENDED that users not define their own rules for language tag 1955 choice. 1957 A subtag SHOULD only be used when it adds useful distinguishing 1958 information to the tag. Extraneous subtags interfere with the 1959 meaning, understanding, and processing of language tags. In 1960 particular, users and implementations SHOULD follow the 'Prefix' and 1961 'Suppress-Script' fields in the registry (defined in Section 3.1): 1962 these fields provide guidance on when specific additional subtags 1963 SHOULD be used or avoided in a language tag. 1965 Some applications can benefit from the use of script subtags in 1966 language tags, as long as the use is consistent for a given context. 1967 Script subtags are never appropriate for unwritten content (such as 1968 audio recordings). 1970 Script subtags were not formally defined in [RFC3066] and their use 1971 can affect matching and subtag identification for implementations of 1972 RFC 3066, as these subtags appear between the primary language and 1973 region subtags. For example, if an implementation selects content 1974 using Basic Filtering [RFC4647] (originally described in Section 2.5 1975 of [RFC3066]) and the user requested the language range "en-US", 1976 content labeled "en-Latn-US" will not match the request and thus not 1977 be selected. Therefore, it is important to know when script subtags 1978 will customarily be used and when they ought not be used. In the 1979 registry, the Suppress-Script field helps ensure greater 1980 compatibility between the language tags by defining when users SHOULD 1981 NOT include a script subtag with a particular primary language 1982 subtag. 1984 The choice of subtags used to form a language tag SHOULD be guided by 1985 the following rules: 1987 1. Use as precise a tag as possible, but no more specific than is 1988 justified. Avoid using subtags that are not important for 1989 distinguishing content in an application. 
1991 * For example, 'de' might suffice for tagging an email written 1992 in German, while "de-CH-1996" is probably unnecessarily 1993 precise for such a task. 1995 2. The script subtag SHOULD NOT be used to form language tags unless 1996 the script adds some distinguishing information to the tag. The 1997 field 'Suppress-Script' in the primary language record in the 1998 registry indicates script subtags that do not add distinguishing 1999 information for most applications. For example: 2001 * The subtag 'Latn' should not be used with the primary language 2002 'en' because nearly all English documents are written in the 2003 Latin script and it adds no distinguishing information. 2004 However, if a document were written in English mixing Latin 2005 script with another script such as Braille ('Brai'), then it 2006 might be appropriate to choose to indicate both scripts to aid 2007 in content selection, such as the application of a style 2008 sheet. 2010 * When labeling content that is unwritten (such as a recording 2011 of human speech), the script subtag should not be used, even 2012 if the language is customarily written in several scripts. 2013 Thus the subtitles to a movie might use the tag "zh-cmn-Hant" 2014 (Chinese, Mandarin, Traditional script), but the audio track 2015 for the same language would be tagged "zh-cmn". 2017 3. If a tag or subtag has a 'Preferred-Value' field in its registry 2018 entry, then the value of that field SHOULD be used to form the 2019 language tag in preference to the tag or subtag in which the 2020 preferred value appears. 2022 * For example, use 'he' for Hebrew in preference to 'iw'. 2024 4. [ISO639-2] has defined several codes included in the subtag 2025 registry that require additional care when choosing language 2026 tags. In most of these cases, where omitting the language tag is 2027 permitted, such omission is preferable to using these codes. 
2028 Language tags SHOULD NOT incorporate these subtags as a prefix, 2029 unless the additional information conveys some value to the 2030 application. 2032 1. Use specific language subtags or subtag sequences in 2033 preference to subtags for language collections. A "language 2034 collection" is a subtag derived from one of the [ISO639-2] 2035 codes that represents multiple related languages. These 2036 codes are included as primary language subtags in the 2037 registry. For example, the code 'cmc' represents "Chamic 2038 languages". The registry contains values for each of the 2039 approximately ten individual languages represented by this 2040 collective code. Some other examples include the subtags for 2041 Germanic languages ('gem') or Algonquian languages ('alg'). 2042 Since these codes are interpreted inclusively, content tagged 2043 with "en" (English), "de" (German), or "gsw" (Swiss German, 2044 Alemannic) could also (but SHOULD NOT) be tagged with "gem" 2045 (Germanic languages). Subtags derived from collection codes 2046 SHOULD NOT be used unless more specific language 2047 information is not available. Note that matching 2048 implementations generally do not understand the relationship 2049 between the collection and its encompassed languages, and so 2050 users ought not assume a subtag based on a language 2051 collection is a useful means for selecting content in its 2052 encompassed languages. 2054 2. The 'mul' (Multiple) primary language subtag identifies 2055 content in multiple languages. It SHOULD NOT be used when a 2056 list of languages (such as Content-Language) or individual 2057 tags for each content element can be used instead. 2059 3. The 'und' (Undetermined) primary language subtag identifies 2060 linguistic content whose language is not known. It SHOULD 2061 NOT be used unless a language tag is required and language 2062 information is not available or cannot be determined.
2063 Omitting the language tag (where permitted) is preferred. 2064 The 'und' subtag MAY be useful for protocols that require a 2065 language tag to be provided or where a primary language 2066 subtag is required (such as in "und-Latn"). The 'und' subtag 2067 MAY also be useful when matching language tags in certain 2068 situations. 2070 4. The 'zxx' (Non-Linguistic) primary language subtag identifies 2071 content that has no language. Some examples might include 2072 instrumental or electronic music; sound recordings consisting 2073 of nonverbal sounds; audiovisual materials with no narration, 2074 printed titles, or subtitles; machine-readable data files 2075 consisting of machine languages or character codes; or 2076 programming source code. Note: where there are fragments of 2077 linguistic content, such as programming source code 2078 containing comments written in English, the subtag 'zxx' 2079 might still be used to indicate the primary status of the 2080 content, just as 'en' can be applied to a predominantly 2081 English text that contains a few French phrases. 2083 5. The 'mis' (Uncoded) primary language subtag identifies 2084 content whose language is known but which does not currently 2085 have a corresponding subtag. This subtag SHOULD NOT be used. 2086 Because the addition of other codes in the future can render 2087 its application invalid, it is inherently unstable and hence 2088 incompatible with the stability goals of BCP 47. It is 2089 always preferable to use other subtags: either 'und' or (with 2090 prior agreement) private use subtags. 2092 6. The grandfathered tag "i-default" (Default Language) was 2093 originally registered according to [RFC1766] to meet the 2094 needs of [RFC2277]. It is used to indicate not a specific 2095 language, but rather, it identifies the condition or content 2096 used where the language preferences of the user cannot be 2097 established. 
It SHOULD NOT be used except as a means of 2098 labeling the default content for applications or protocols 2099 that require default language content to be labeled with that 2100 specific tag. It MAY also be used by an application or 2101 protocol to identify when the default language content is 2102 being returned. 2104 5. The same variant subtag MUST NOT be used more than once within a 2105 language tag. 2107 * For example, the tag "de-DE-1901-1901" is not valid. 2109 Some of the languages in the registry are labeled "macrolanguages" by 2110 ISO 639-3, which defines the term as "clusters of closely-related 2111 language varieties that [...] can be considered distinct individual 2112 languages, yet in certain usage contexts a single language identity 2113 for all is needed". These correspond to codes registered in ISO 2114 639-2 as single languages that were found to correspond to more than 2115 one language in ISO 639-3. The record for each of the languages 2116 encompassed by a macrolanguage contains a 'Macrolanguage' field in 2117 the registry; the macrolanguages themselves are not specially marked. 2119 It is always permitted, and sometimes useful, to tag an encompassed 2120 language using the subtag for its macrolanguage. However, the 2121 Macrolanguage field doesn't define what the relationship is between 2122 the encompassed language and its macrolanguage, nor does it define 2123 how languages encompassed by the same macrolanguage are related to 2124 each other. In some cases, one of the encompassed languages serves 2125 as a standard form for the entire macrolanguage and is frequently 2126 identified with it; in other cases there is no dominant language, and 2127 the macrolanguage simply serves as a cover term for the entire group. 2129 Applications MAY use macrolanguage information to improve matching or 2130 language negotiation. 
For example, the information that 'sr' 2131 (Serbian) and 'hr' (Croatian) share a macrolanguage expresses a 2132 closer relation between those languages than between, say, 'sr' 2133 (Serbian) and 'mk' (Macedonian). It is valid to use either the 2134 subtag of the encompassed language or of the macrolanguage to form 2135 language tags. However, many matching applications will not be aware 2136 of the relationship between the languages. Care in selecting which 2137 subtags are used is crucial to interoperability. 2139 In general, use the most specific subtag to form the language tag. 2140 However, where the macrolanguage tag has been historically used to 2141 denote a dominant encompassed language, it SHOULD be used in place of 2142 the subtag specific to that encompassed language unless it is 2143 necessary to clearly distinguish the macrolanguage as a whole from 2144 that enclosed dominant language variety. 2146 In particular, the Chinese family of languages calls for special 2147 consideration. Because the written form is very similar for most 2148 languages having 'zh' (Chinese) as a macrolanguage (and because 2149 historically subtags for the various encompassed languages were not 2150 available), languages such as 'yue' (Cantonese) have historically 2151 used either 'zh' or a tag (now grandfathered) beginning with 'zh'. 2152 This means that macrolanguage information can be usefully applied 2153 when searching for content or when providing fallbacks in language 2154 negotiation. For example, the information that 'yue' has a 2155 macrolanguage of 'zh' could be used in the Lookup algorithm to 2156 fall back from a request for "yue-Hans-CN" to "zh-Hans-CN" without 2157 losing the script and region information (even though the user did 2158 not specify "zh-Hans-CN" in their request).
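The macrolanguage fallback described above can be sketched in code. The following Python fragment is only an illustration, not part of the Lookup algorithm defined in [RFC4647]: the MACROLANGUAGE table is a hypothetical hand-written excerpt standing in for the registry's 'Macrolanguage' fields, the function names are invented for this example, and the interleaved ordering of candidates is one possible design choice among several.

```python
# Hypothetical excerpt: encompassed language -> macrolanguage.
# A real implementation would read the 'Macrolanguage' fields from the registry.
MACROLANGUAGE = {"yue": "zh", "cmn": "zh", "hak": "zh"}


def with_macrolanguage(tag):
    """Replace the primary language subtag with its macrolanguage, if any."""
    primary, _, rest = tag.partition("-")
    macro = MACROLANGUAGE.get(primary.lower())
    if macro is None:
        return None
    return macro + ("-" + rest if rest else "")


def fallback_candidates(language_range):
    """List fallback candidates, most specific first.

    Each progressively truncated form of the range is followed by the same
    form with the macrolanguage substituted, so script and region subtags
    are kept for as long as possible.
    """
    candidates = []
    subtags = language_range.split("-")
    while subtags:
        tag = "-".join(subtags)
        candidates.append(tag)
        broader = with_macrolanguage(tag)
        if broader is not None:
            candidates.append(broader)
        subtags.pop()
        # Do not leave a dangling single-character subtag at the end.
        if subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return candidates


print(fallback_candidates("yue-Hans-CN"))
```

With this ordering, a request for "yue-Hans-CN" tries "zh-Hans-CN" before it gives up the region or script. A range whose primary language has no entry in the table (for example "en-US") simply yields the usual truncation sequence.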
2160 To ensure consistent backward compatibility, this document contains 2161 several provisions to account for potential instability in the 2162 standards used to define the subtags that make up language tags. 2163 These provisions mean that no language tag created under the rules in 2164 this document will become invalid. 2166 Standards, protocols, and applications that reference this document 2167 normatively but apply different rules to the ones given in this 2168 section MUST specify how language tag selection varies from the 2169 guidelines given here. 2171 4.2. Meaning of the Language Tag 2173 The meaning of a language tag is related to the meaning of the 2174 subtags that it contains. Each subtag, in turn, implies a certain 2175 range of expectations one might have for related content, although it 2176 is not a guarantee. For example, the use of a script subtag such as 2177 'Arab' (Arabic script) does not mean that the content contains only 2178 Arabic characters. It does mean that the language involved is 2179 predominantly in the Arabic script. Thus a language tag and its 2180 subtags can encompass a very wide range of variation and yet remain 2181 valid in each particular instance. 2183 Validity of a tag is not everything. While every valid tag has a 2184 meaning, it might not represent any real-world language usage. This 2185 is unavoidable in a system in which subtags can be combined freely. 2186 For example, tags such as "ar-Cyrl-CO" (Arabic, Cyrillic script, as 2187 used in Colombia) or "tlh-Kore-AQ-fonipa" (Klingon, Korean script, 2188 as used in Antarctica, IPA phonetic transcription) are both valid and 2189 unlikely to represent a useful combination of language attributes. 2191 The relationship between the tag and the information it identifies is 2192 defined by the context in which the tag appears. Accordingly, this 2193 section gives only possible examples of its usage.
2195 o For a single information object, the associated language tags 2196 might be interpreted as the set of languages that is necessary for 2197 a complete comprehension of the complete object. Example: Plain 2198 text documents. 2200 o For an aggregation of information objects, the associated language 2201 tags could be taken as the set of languages used inside components 2202 of that aggregation. Examples: Document stores and libraries. 2204 o For information objects whose purpose is to provide alternatives, 2205 the associated language tags could be regarded as a hint that the 2206 content is provided in several languages and that one has to 2207 inspect each of the alternatives in order to find its language or 2208 languages. In this case, the presence of multiple tags might not 2209 mean that one needs to be multi-lingual to get complete 2210 understanding of the document. Example: MIME multipart/ 2211 alternative. 2213 o In markup languages, such as HTML and XML, language information 2214 can be added to each part of the document identified by the markup 2215 structure (including the whole document itself). For example, one 2216 could write <span lang="fr">C'est la vie.</span> inside a 2217 Norwegian document; the Norwegian-speaking user could then access 2218 a French-Norwegian dictionary to find out what the marked section 2219 meant. If the user were listening to that document through a 2220 speech synthesis interface, this formation could be used to signal 2221 the synthesizer to appropriately apply French text-to-speech 2222 pronunciation rules to that span of text, instead of applying the 2223 inappropriate Norwegian rules. 2225 Language tags are related when they contain a similar sequence of 2226 subtags. For example, if a language tag B contains language tag A as 2227 a prefix, then B is typically "narrower" or "more specific" than A. 2228 Thus, "zh-Hant-TW" is more specific than "zh-Hant".
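The prefix relationship between tags can be tested mechanically. The Python sketch below is an illustration only; the function name is_more_specific is invented for this example. It compares whole subtags rather than raw characters, so that a tag is never treated as a prefix of another merely because their strings overlap, and it compares case-insensitively as required in Section 4.4.

```python
def is_more_specific(tag_b, tag_a):
    """Return True if tag_a is a proper subtag-wise prefix of tag_b.

    Comparison is case-insensitive and works on whole subtags, so
    "zh-Hant" is a prefix of "zh-Hant-TW" but "zh-Han" is not.
    """
    a = tag_a.lower().split("-")
    b = tag_b.lower().split("-")
    return len(b) > len(a) and b[:len(a)] == a


print(is_more_specific("zh-Hant-TW", "zh-Hant"))  # True
print(is_more_specific("zh-Hant", "zh-Hant-TW"))  # False
```

A positive result says only that one tag narrows the other syntactically; it says nothing about whether the content behind the two tags is mutually intelligible.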
2230 This relationship is not guaranteed in all cases: specifically, 2231 languages that begin with the same sequence of subtags are NOT 2232 guaranteed to be mutually intelligible, although they might be. For 2233 example, the tag "az" shares a prefix with both "az-Latn" 2234 (Azerbaijani written using the Latin script) and "az-Cyrl" 2235 (Azerbaijani written using the Cyrillic script). A person fluent in 2236 one script might not be able to read the other, even though the text 2237 might be identical. Content tagged as "az" most probably is written 2238 in just one script and thus might not be intelligible to a reader 2239 familiar with the other script. 2241 4.3. Length Considerations 2243 There is no defined upper limit on the size of language tags. While 2244 historically most language tags have consisted of language and region 2245 subtags with a combined total length of up to six characters, larger 2246 tags have always been both possible and actually appeared in use. 2248 Neither the language tag syntax nor other requirements in this 2249 document impose a fixed upper limit on the number of subtags in a 2250 language tag (and thus an upper bound on the size of a tag). The 2251 language tag syntax suggests that, depending on the specific 2252 language, more subtags (and thus a longer tag) are sometimes 2253 necessary to completely identify the language for certain 2254 applications; thus, it is possible to envision long or complex subtag 2255 sequences. 2257 4.3.1. Working with Limited Buffer Sizes 2259 Some applications and protocols are forced to allocate fixed buffer 2260 sizes or otherwise limit the length of a language tag. A conformant 2261 implementation or specification MAY refuse to support the storage of 2262 language tags that exceed a specified length. 
Any such limitation 2263 SHOULD be clearly documented, and such documentation SHOULD include 2264 what happens to longer tags (for example, whether an error value is 2265 generated or the language tag is truncated). A protocol that allows 2266 tags to be truncated at an arbitrary limit, without giving any 2267 indication of what that limit is, has the potential for causing harm 2268 by changing the meaning of tags in substantial ways. 2270 In practice, most language tags do not require more than a few 2271 subtags and will not approach reasonably sized buffer limitations; 2272 see Section 4.1. 2274 Some specifications or protocols have limits on tag length but do not 2275 have a fixed length limitation. For example, [RFC2231] has no 2276 explicit length limitation: the length available for the language tag 2277 is constrained by the length of other header components (such as the 2278 charset's name) coupled with the 76-character limit in [RFC2047]. 2279 Thus, the "limit" might be 50 or more characters, but it could 2280 potentially be quite small. 2282 The considerations for assigning a buffer limit are: 2284 Implementations SHOULD NOT truncate language tags unless the 2285 meaning of the tag is purposefully being changed, or unless the 2286 tag does not fit into a limited buffer size specified by a 2287 protocol for storage or transmission. 2289 Implementations SHOULD warn the user when a tag is truncated since 2290 truncation changes the semantic meaning of the tag. 2292 Implementations of protocols or specifications that are space 2293 constrained but do not have a fixed limit SHOULD use the longest 2294 possible tag in preference to truncation. 2296 Protocols or specifications that specify limited buffer sizes for 2297 language tags MUST allow for language tags of up to 33 characters. 2299 Protocols or specifications that specify limited buffer sizes for 2300 language tags SHOULD allow for language tags of at least 30 2301 characters. 
Note that RFC 4646 [RFC4646] recommended a field size 2302 of 42 characters because it included the permanently reserved (and 2303 unused) 'extlang' production. The current size recommendation 2304 does not include the use of the 'extlang' field. 2306 The following illustration shows how the 30-character recommendation 2307 was derived. The combination of language and extended language 2308 subtags was chosen for future compatibility. At up to 15 characters, 2309 this combination is longer than the longest possible primary language 2310 subtag (8 characters):
2312 language = 3 (ISO 639-2; ISO 639-1 requires 2)
2313 script = 5 (if not suppressed: see Section 4.1)
2314 region = 4 (UN M.49; ISO 3166 requires 3)
2315 variant1 = 9 (needs 'language' as a prefix)
2316 variant2 = 9 (needs 'language-variant1' as a prefix)
2318 total = 30 characters
2320 Figure 8: Derivation of the Limit on Tag Length
2322 4.3.2. Truncation of Language Tags 2324 Truncation of a language tag alters the meaning of the tag, and thus 2325 SHOULD be avoided. However, truncation of language tags is sometimes 2326 necessary due to limited buffer sizes. Such truncation MUST NOT 2327 permit a subtag to be chopped off in the middle or the formation of 2328 invalid tags (for example, one ending with the "-" character). 2330 This means that applications or protocols that truncate tags MUST do 2331 so by progressively removing subtags along with their preceding "-" 2332 from the right side of the language tag until the tag is short enough 2333 for the given buffer. If the resulting tag ends with a single- 2334 character subtag, that subtag and its preceding "-" MUST also be 2335 removed. For example:
2337 Tag to truncate: zh-Latn-CN-variant1-a-extend1-x-wadegile-private1
2338 1. zh-Latn-CN-variant1-a-extend1-x-wadegile
2339 2. zh-Latn-CN-variant1-a-extend1
2340 3. zh-Latn-CN-variant1
2341 4. zh-Latn-CN
2342 5. zh-Latn
2343 6. zh
2345 Figure 9: Example of Tag Truncation
2347 4.4.
Canonicalization of Language Tags 2349 Since a particular language tag is sometimes used by many processes, 2350 language tags SHOULD always be created or generated in a canonical 2351 form. 2353 A language tag is in canonical form when: 2355 1. The tag is well-formed according to the rules in Section 2.1 and 2356 Section 2.2. 2358 2. Subtags of type 'Region' that have a Preferred-Value mapping in 2359 the IANA registry (see Section 3.1) SHOULD be replaced with their 2360 mapped value. Note: In rare cases, the mapped value will also 2361 have a Preferred-Value. 2363 3. Redundant or grandfathered tags that have a Preferred-Value 2364 mapping in the IANA registry (see Section 3.1) MUST be replaced 2365 with their mapped value. These items either are deprecated 2366 mappings created before the adoption of this document (such as 2367 the mapping of "no-nyn" to "nn" or "i-klingon" to "tlh") or are 2368 the result of later registrations or additions to this document 2369 (for example, "zh-hakka" was deprecated in favor of the ISO 639-3 2370 code 'hak' when this document was adopted). 2372 4. Other subtags that have a Preferred-Value mapping in the IANA 2373 registry (see Section 3.1) MUST be replaced with their mapped 2374 value. These items consist entirely of clerical corrections to 2375 ISO 639-1 in which the deprecated subtags have been maintained 2376 for compatibility purposes. 2378 5. If more than one extension subtag sequence exists, the extension 2379 sequences are ordered into case-insensitive ASCII order by 2380 singleton subtag. 2382 Example: The language tag "en-A-aaa-B-ccc-bbb-x-xyz" is in canonical 2383 form, while "en-B-ccc-bbb-A-aaa-X-xyz" is well-formed but not in 2384 canonical form. 2386 Example: The language tag "en-BU" (English as used in Burma) is not 2387 canonical because the 'BU' subtag has a canonical mapping to 'MM' 2388 (Myanmar), although the tag "en-BU" maintains its validity.
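Two of the canonicalization steps listed above, region replacement (item 2) and extension ordering (item 5), can be illustrated with a short Python sketch. This is a simplified illustration under stated assumptions, not a complete canonicalizer: REGION_PREFERRED is a hypothetical one-entry excerpt of the registry's Preferred-Value data, the function name canonicalize is invented here, the input is assumed to be well-formed, grandfathered and redundant tags (item 3) and other subtag mappings (item 4) are not handled, and the case of the subtags is left untouched.

```python
# Hypothetical excerpt of region Preferred-Value mappings from the registry.
REGION_PREFERRED = {"BU": "MM"}


def canonicalize(tag):
    """Apply region Preferred-Value mapping and extension ordering only."""
    subtags = tag.split("-")
    if len(subtags[0]) == 1:
        # Private-use-only ("x-...") or grandfathered ("i-...") tags are
        # outside the scope of this sketch and are returned unchanged.
        return tag

    # Everything before the first singleton: language, script, region, variants.
    i = 0
    head = []
    while i < len(subtags) and len(subtags[i]) != 1:
        head.append(subtags[i])
        i += 1

    # Split the remainder into extension sequences and a private use sequence.
    extensions, private = [], []
    while i < len(subtags):
        if subtags[i].lower() == "x":
            private = subtags[i:]  # everything after 'x' is private use
            break
        sequence = [subtags[i]]
        i += 1
        while i < len(subtags) and len(subtags[i]) != 1:
            sequence.append(subtags[i])
            i += 1
        extensions.append(sequence)

    # A non-initial two-letter subtag before the first singleton is a region.
    mapped = [head[0]]
    for subtag in head[1:]:
        if len(subtag) == 2 and subtag.isalpha():
            subtag = REGION_PREFERRED.get(subtag.upper(), subtag)
        mapped.append(subtag)

    # Order extension sequences case-insensitively by their singleton.
    extensions.sort(key=lambda sequence: sequence[0].lower())

    parts = mapped + [s for sequence in extensions for s in sequence] + private
    return "-".join(parts)


print(canonicalize("en-BU"))
print(canonicalize("en-B-ccc-bbb-A-aaa-X-xyz"))
```

Because the sketch does not regularize case, its result for the second input differs from the canonical example above only in the case of the 'X' singleton, which is not significant when tags are compared.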
2390 Canonicalization of language tags does not imply anything about the 2391 use of upper or lowercase letters when processing or comparing 2392 subtags (and as described in Section 2.1). All comparisons MUST be 2393 performed in a case-insensitive manner. 2395 When performing canonicalization of language tags, processors MAY 2396 regularize the case of the subtags (that is, this process is 2397 OPTIONAL), following the case used in the registry. Note that this 2398 corresponds to the following casing rules: uppercase all non-initial 2399 two-letter subtags; titlecase all non-initial four-letter subtags; 2400 lowercase everything else. 2402 Note: Case folding of ASCII letters in certain locales, unless 2403 carefully handled, sometimes produces non-ASCII character values. 2404 The Unicode Character Database file "SpecialCasing.txt" defines the 2405 specific cases that are known to cause problems with this. In 2406 particular, the letter 'i' (U+0069) in Turkish and Azerbaijani is 2407 uppercased to U+0130 (LATIN CAPITAL LETTER I WITH DOT ABOVE). 2408 Implementers SHOULD specify a locale-neutral casing operation to 2409 ensure that case folding of subtags does not produce this value, 2410 which is illegal in language tags. For example, if one were to 2411 uppercase the region subtag 'in' using Turkish locale rules, the 2412 sequence U+0130 U+004E would result instead of the expected 'IN'. 2414 Note: if the field 'Deprecated' appears in a registry record without 2415 an accompanying 'Preferred-Value' field, then that tag or subtag is 2416 deprecated without a replacement. Validating processors SHOULD NOT 2417 generate tags that include these values, although the values are 2418 canonical when they appear in a language tag. 2420 An extension MUST define any relationships that exist between the 2421 various subtags in the extension and thus MAY define an alternate 2422 canonicalization scheme for the extension's subtags. 
Extensions MAY 2423 define how the order of the extension's subtags is interpreted. For 2424 example, an extension could define that its subtags are in canonical 2425 order when the subtags are placed into ASCII order: that is, "en-a- 2426 aaa-bbb-ccc" instead of "en-a-ccc-bbb-aaa". Another extension might 2427 define that the order of the subtags influences their semantic 2428 meaning (so that "en-b-ccc-bbb-aaa" has a different value from "en-b- 2429 aaa-bbb-ccc"). However, extension specifications SHOULD be designed 2430 so that they are tolerant of the typical processes described in 2431 Section 3.7. 2433 4.5. Considerations for Private Use Subtags 2435 Private use subtags, like all other subtags, MUST conform to the 2436 format and content constraints in the ABNF. Private use subtags have 2437 no meaning outside the private agreement between the parties that 2438 intend to use or exchange language tags that employ them. The same 2439 subtags MAY be used with a different meaning under a separate private 2440 agreement. They SHOULD NOT be used where alternatives exist and 2441 SHOULD NOT be used in content or protocols intended for general use. 2443 Private use subtags are simply useless for information exchange 2444 without prior arrangement. The value and semantic meaning of private 2445 use tags and of the subtags used within such a language tag are not 2446 defined by this document. 2448 Subtags defined in the IANA registry as having a specific private use 2449 meaning convey more information than a purely private use tag 2450 prefixed by the singleton subtag 'x'. For applications, this 2451 additional information MAY be useful. 2453 For example, the region subtags 'AA', 'ZZ', and in the ranges 2454 'QM'-'QZ' and 'XA'-'XZ' (derived from ISO 3166 private use codes) MAY 2455 be used to form a language tag.
A tag such as "zh-Hans-XQ" conveys a 2456 great deal of public, interchangeable information about the language 2457 material (that it is Chinese in the simplified Chinese script and is 2458 suitable for some geographic region 'XQ'). While the precise 2459 geographic region is not known outside of private agreement, the tag 2460 conveys far more information than an opaque tag such as "x-someLang", 2461 which contains no information about the language subtag or script 2462 subtag outside of the private agreement. 2464 However, in some cases content tagged with private use subtags MAY 2465 interact with other systems in a different and possibly unsuitable 2466 manner compared to tags that use opaque, privately defined subtags, 2467 so the choice of the best approach sometimes depends on the 2468 particular domain in question. 2470 5. IANA Considerations 2472 This section deals with the processes and requirements necessary for 2473 IANA to undertake to maintain the subtag and extension registries as 2474 defined by this document and in accordance with the requirements of 2475 [RFC2434]. 2477 The impact on the IANA maintainers of the two registries defined by 2478 this document will be a small increase in the frequency of new 2479 entries or updates. IANA also is required to create a new mailing 2480 list (described below in Section 5.1) to announce registry changes 2481 and updates. 2483 5.1. Language Subtag Registry 2485 Upon adoption of this document, IANA will update the registry using 2486 instructions and content provided in a companion document: 2487 [registry-update]. The criteria and process for selecting the 2488 updated set of records are described in that document. The updated 2489 set of records represents no impact on IANA, since the work to create 2490 it will be performed externally. 2492 Future work on the Language Subtag Registry includes the following 2493 activities: 2495 Inserting or replacing whole records. 
These records are 2496 preformatted for IANA by the Language Subtag Reviewer, as 2497 described in Section 3.3. 2499 Archiving and making publicly available the registration forms. 2501 Announcing each updated version of the registry on the 2502 "ietf-languages-announcements@iana.org" mailing list. 2504 Each registration form sent to IANA contains a single record for 2505 incorporation into the registry. The form will be sent to 2506 "iana@iana.org" by the Language Subtag Reviewer. It will have a 2507 subject line indicating whether the enclosed form represents an 2508 insertion of a new record (indicated by the word "INSERT" in the 2509 subject line) or a replacement of an existing record (indicated by 2510 the word "MODIFY" in the subject line). At no time can a record be 2511 deleted from the registry. 2513 IANA will extract the record from the form and place the inserted or 2514 modified record into the appropriate section of the language subtag 2515 registry, grouping the records by their 'Type' field. Inserted 2516 records can be placed anywhere in the appropriate section; there is 2517 no guarantee of the order of the records beyond grouping them 2518 together by 'Type'. Modified records overwrite the record they 2519 replace. 2521 IANA will also update the File-Date record to contain the most recent 2522 modification date when performing any insertion or modification: 2523 included in any request to insert or modify records will be a new 2524 File-Date record indicating the acceptance date of the record. This 2525 record is to be placed first in the registry, replacing the existing 2526 File-Date record. In the event that the File-Date record present in 2527 the registry has a later date than the record being inserted or 2528 modified, then the latest (most recent) record will be preserved.
2529 IANA should process multiple registration requests in order according 2530 to the File-Date in the form, since one registration could otherwise 2531 cause a more recent change to be overwritten. 2533 The updated registry file MUST use the UTF-8 character encoding and 2534 IANA MUST check the registry file for proper encoding. Non-ASCII 2535 characters can be sent to IANA by attaching the registration form to 2536 the email message or by using various encodings in the mail message 2537 body (UTF-8 is recommended). IANA will verify any unclear or 2538 corrupted characters with the Language Subtag Reviewer prior to 2539 posting the updated registry. 2541 IANA will also archive and make publicly available from 2542 "http://www.iana.org/assignments/lang-subtags-templates/" each 2543 registration form. Note that multiple registrations can pertain to 2544 the same record in the registry. 2546 Developers who are dependent upon the language subtag registry 2547 sometimes would like to be informed of changes in the registry so 2548 that they can update their implementations. When any change is made 2549 to the language subtag registry, IANA will send an announcement 2550 message to "ietf-languages-announcements@iana.org" (a self- 2551 subscribing list that only IANA can post to). 2553 5.2. Extensions Registry 2555 The Language Tag Extensions Registry can contain at most 35 records 2556 and thus changes to this registry are expected to be very infrequent. 2558 Future work by IANA on the Language Tag Extensions Registry is 2559 limited to two cases. First, the IESG MAY request that new records 2560 be inserted into this registry from time to time. These requests 2561 MUST include the record to insert in the exact format described in 2562 Section 3.7. In addition, there MAY be occasional requests from the 2563 maintaining authority for a specific extension to update the contact 2564 information or URLs in the record. 
These requests MUST include the 2565 complete, updated record. IANA is not responsible for validating the 2566 information provided, only that it is properly formatted. It should 2567 reasonably be seen to come from the maintaining authority named in 2568 the record present in the registry. 2570 6. Security Considerations 2572 Language tags used in content negotiation, like any other information 2573 exchanged on the Internet, might be a source of concern because they 2574 might be used to infer the nationality of the sender, and thus 2575 identify potential targets for surveillance. 2577 This is a special case of the general problem that anything sent is 2578 visible to the receiving party and possibly to third parties as well. 2579 It is useful to be aware that such concerns can exist in some cases. 2581 The evaluation of the exact magnitude of the threat, and any possible 2582 countermeasures, is left to each application protocol (see BCP 72 2583 [RFC3552] for best current practice guidance on security threats and 2584 defenses). 2586 The language tag associated with a particular information item is of 2587 no consequence whatsoever in determining whether that content might 2588 contain possible homographs. The fact that a text is tagged as being 2589 in one language or using a particular script subtag provides no 2590 assurance whatsoever that it does not contain characters from scripts 2591 other than the one(s) associated with or specified by that language 2592 tag. 2594 Since there is no limit to the number of variant, private use, and 2595 extension subtags, and consequently no limit on the possible length 2596 of a tag, implementations need to guard against buffer overflow 2597 attacks. See Section 4.3 for details on language tag truncation, 2598 which can occur as a consequence of defenses against buffer overflow. 
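As an illustration of a length defense that does not corrupt tags, the following Python sketch shortens an over-long tag by removing whole subtags from the right. It is not a normative algorithm. It assumes the truncation behavior that Section 4.3 is understood to describe: subtags are dropped from the end together with their preceding hyphen, and the result is never left ending in a single-character subtag. The function name 'truncate_language_tag' and the 'max_len' parameter are hypothetical and are not defined by this document.

```python
def truncate_language_tag(tag: str, max_len: int) -> str:
    """Return `tag` shortened to at most `max_len` characters.

    Assumed rule (see lead-in): whole subtags are removed from the
    right, and a trailing single-character subtag (a singleton such
    as 'x' or an extension letter) is removed as well, because it
    would otherwise introduce nothing.
    """
    if max_len < 1:
        raise ValueError("max_len must be at least 1")

    subtags = tag.split("-")
    while subtags and len("-".join(subtags)) > max_len:
        # Drop the rightmost subtag; never cut inside a subtag.
        subtags.pop()
        # Drop any singleton left dangling at the end.
        while subtags and len(subtags[-1]) == 1:
            subtags.pop()

    # An empty string means no prefix of the tag fits in max_len.
    return "-".join(subtags)


if __name__ == "__main__":
    long_tag = "zh-Latn-CN-variant1-a-extend1-x-wadegile-private1"
    print(truncate_language_tag(long_tag, 30))
```

A caller with a fixed-size field would apply the function before storing the tag and treat an empty result as "no usable tag" rather than storing a partial subtag.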
2600 Although the specification of valid subtags for an extension (see 2601 Section 3.7) MUST be available over the Internet, implementations 2602 SHOULD NOT mechanically depend on it being always accessible, to 2603 prevent denial-of-service attacks. 2605 7. Character Set Considerations 2607 The syntax in this document requires that language tags use only the 2608 characters A-Z, a-z, 0-9, and HYPHEN-MINUS, which are present in most 2609 character sets, so the composition of language tags should not have 2610 any character set issues. 2612 Rendering of characters based on the content of a language tag is not 2613 addressed in this memo. Historically, some languages have relied on 2614 the use of specific character sets or other information in order to 2615 infer how a specific character should be rendered (notably this 2616 applies to language- and culture-specific variations of Han 2617 ideographs as used in Japanese, Chinese, and Korean). When language 2618 tags are applied to spans of text, rendering engines sometimes use 2619 that information in deciding which font to use in the absence of 2620 other information, particularly where languages with distinct writing 2621 traditions use the same characters. 2623 8. Changes from RFC 4646 2625 The main goal for this revision of this document was to incorporate 2626 ISO 639-3 and its attendant set of language codes into the IANA 2627 Language Subtag Registry, permitting the identification of many more 2628 languages and dialects than previously supported. 2630 The specific changes in this document to meet this goal are: 2632 o Defines the incorporation of ISO 639-3 codes as language subtags. It also 2633 permanently reserves and disallows the use of extlang subtags. 2634 The changes necessary to achieve this were: 2636 * something 2638 o Changed the ABNF related to grandfathered tags. The irregular 2639 tags are now listed.
Well-formed grandfathered tags are now 2640 described by the 'langtag' production and the 'grandfathered' 2641 production was removed as a result. Also: added description of 2642 both types of grandfathered tags to Section 2.2.8. 2644 o Added the paragraph on "collections" to Section 4.1. 2646 o Changed the capitalization rules for 'Tag' fields in Section 3.1. 2648 o Split section 3.1 up into subsections. 2650 o Modified section 3.5 to allow Suppress-Script fields to be added, 2651 modified, or removed via the registration process. This was an 2652 erratum from RFC 4646. 2654 o Modified examples that used region code 'CS' (formerly Serbia and 2655 Montenegro) to use 'RS' (Serbia) instead. 2657 o Modified the rules for creating and maintaining record 2658 'Description' fields to prevent duplicates, including inverted 2659 duplicates. 2661 o Removed the lengthy description of why RFC 4646 was created from 2662 this section, which also caused the removal of the reference to 2663 XML Schema. 2665 o Modified the text in section 2.1 to place more emphasis on the 2666 fact that language tags are not case sensitive. 2668 o Replaced the example "fr-Latn-CA" in Section 2.1 with "sr-Latn-RS" 2669 and "az-Arab-IR" because "fr-Latn-CA" doesn't respect the 2670 Suppress-Script on 'Latn' with 'fr'. 2672 o Changed the requirements for well-formedness to make singleton 2673 repetition checking optional (it is required for validity 2674 checking) in Section 2.2.9. 2676 o Changed the text in Section 2.2.9 referring to grandfathered 2677 checking to note that the list is now included in the ABNF. 2679 o Modified and added text to Section 3.2. The job description was 2680 placed first. A note was added making clear that the Language 2681 Subtag Reviewer may delegate various non-critical duties, 2682 including list moderation. Finally, additional text was added to 2683 make the appointment process clear and to clarify that decisions 2684 and performance of the reviewer are appealable. 
2686 o Added text to Section 3.5 clarifying that the ietf-languages list 2687 is operated by whomever the IESG appoints. 2689 o Added text to Section 3.1.4 clarifying that the first Description 2690 in a 'language' record matches the corresponding Reference Name 2691 for the language in ISO 639-3. 2693 o Modified Section 2.2.9 to define classes of conformance related to 2694 specific tags (formerly 'well-formed' and 'valid' referred to 2695 implementations). Notes were added about the removal of 'extlang' 2696 from the ABNF provided in RFC 4646, allowing for well-formedness 2697 using this older definition. A reference to RFC 3066 well-formedness 2698 was also added. 2700 o Added text to the end of Section 3.1.2 noting that future versions 2701 of this document might add new field types to the Registry format 2702 and recommending that implementations ignore any unrecognized 2703 fields. 2705 o Added text to Section 3.1.8 about what the lack of a Suppress-Script 2706 field in a record means. 2708 o Added text to Section 3.1.4 allowing the correction of misspellings 2709 and typographic errors. 2711 o Added text to Section 3.1.7 disallowing Prefix field conflicts 2712 (such as circular prefix references). 2714 o Modified text in Section 3.5 to require the subtag reviewer to 2715 announce his/her decision (or extension) following the two-week 2716 period. Also clarified that any decision or failure to decide can 2717 be appealed. 2719 o Modified text in Section 4.1 to include the (heretofore anecdotal) 2720 guiding principle of tag choice, and to clarify the non-use of 2721 script subtags in non-written applications. Also updated examples 2722 in this section to use Chamic languages as an example of language 2723 collections. 2725 o Prohibited multiple use of the same variant in a tag (e.g., "de-1901-1901"). 2726 Previously this was only a recommendation 2727 ("SHOULD"). 2729 o Removed inappropriate [RFC2119] language from the illustration in 2730 Section 4.3.1.
2732 o Replaced the example of deprecating "zh-guoyu" with "zh-hakka"->"hak" 2733 in Section 4.4, noting that it was this document 2734 that caused the change. 2736 o Replaced the text in Section 4.1 dealing with "mul"/"und" to 2737 include the subtags 'zxx' and 'mis', as well as the tag 2738 "i-default". A normative reference to RFC 2277 was added, along 2739 with an informative reference to MARC21. 2741 o Added text to Section 3.5 clarifying that any modifications of a 2742 registration request must be sent to the ietf-languages list 2743 before submission to IANA. 2745 o Changed the ABNF for the record-jar format from using the LWSP 2746 production to use a folding whitespace production similar to obs-FWS 2747 in [RFC4234]. This effectively prevents unintentional blank 2748 lines inside a field. 2750 o Revised text in Section 3.3, Section 3.5, and 2751 Section 5.1 to clarify that the Language Subtag Reviewer sends the 2752 complete registration forms to IANA, that IANA extracts the record 2753 from the form, and that the forms must also be archived separately 2754 from the registry. 2756 o Added text to Section 5 requiring IANA to send an announcement to 2757 the "ietf-languages-announcements@iana.org" list whenever the registry is updated. 2759 o Modified the registry to use UTF-8 as its character 2760 encoding. This also entails additional instructions to IANA and 2761 the Language Subtag Reviewer in the registration process. 2763 o Modified the rules in Section 2.2.4 so that "exceptionally 2764 reserved" ISO 3166-1 codes other than 'UK' are included in the 2765 registry. In particular, this allows the code 'EU' (European 2766 Union) to be used to form language tags or (more commonly) for 2767 applications that use the registry for region codes to reference 2768 this subtag. 2770 o Modified the IANA considerations section (Section 5) to remove 2771 unnecessary normative [RFC2119] language. 2773 9. References 2775 9.1.
Normative References 2777 [ISO15924] 2778 International Organization for Standardization, "ISO 2779 15924:2004. Information and documentation -- Codes for the 2780 representation of names of scripts", January 2004. 2782 [ISO3166-1] 2783 International Organization for Standardization, "ISO 3166-1:2006. 2784 Codes for the representation of names of countries 2785 and their subdivisions -- Part 1: Country codes", 2786 November 2006. 2788 [ISO639-1] 2789 International Organization for Standardization, "ISO 639-1:2002. 2790 Codes for the representation of names of languages 2791 -- Part 1: Alpha-2 code", 2002. 2793 [ISO639-2] 2794 International Organization for Standardization, "ISO 639-2:1998. 2795 Codes for the representation of names of languages 2796 -- Part 2: Alpha-3 code, first edition", 1998. 2798 [ISO639-3] 2799 International Organization for Standardization, "ISO 639-3:2007. 2800 Codes for the representation of names of languages 2801 -- Part 3: Alpha-3 code for comprehensive coverage of 2802 languages", 2007. 2804 [ISO646] International Organization for Standardization, "ISO/IEC 2805 646:1991, Information technology -- ISO 7-bit coded 2806 character set for information interchange", 1991. 2808 [RFC2026] Bradner, S., "The Internet Standards Process -- Revision 2809 3", BCP 9, RFC 2026, October 1996. 2811 [RFC2028] Hovey, R. and S. Bradner, "The Organizations Involved in 2812 the IETF Standards Process", BCP 11, RFC 2028, 2813 October 1996. 2815 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 2816 Requirement Levels", BCP 14, RFC 2119, March 1997. 2818 [RFC2277] Alvestrand, H., "IETF Policy on Character Sets and 2819 Languages", BCP 18, RFC 2277, January 1998. 2821 [RFC2434] Narten, T. and H. Alvestrand, "Guidelines for Writing an 2822 IANA Considerations Section in RFCs", BCP 26, RFC 2434, 2823 October 1998. 2825 [RFC2860] Carpenter, B., Baker, F., and M.
Roberts, "Memorandum of 2826 Understanding Concerning the Technical Work of the 2827 Internet Assigned Numbers Authority", RFC 2860, June 2000. 2829 [RFC3339] Klyne, G. and C. Newman, "Date and Time on the Internet: 2830 Timestamps", RFC 3339, July 2002. 2832 [RFC4234] Crocker, D. and P. Overell, "Augmented BNF for Syntax 2833 Specifications: ABNF", RFC 4234, October 2005. 2835 [RFC4645] Ewell, D., Ed., "Initial Language Subtag Registry", 2836 RFC 4645, September 2006. 2838 [RFC4647] Phillips, A., Ed. and M. Davis, Ed., "Matching of Language 2839 Tags", RFC 4647, September 2006. 2842 [UN_M.49] Statistics Division, United Nations, "Standard Country or 2843 Area Codes for Statistical Use", UN Standard Country or 2844 Area Codes for Statistical Use, Revision 4 (United Nations 2845 publication, Sales No. 98.XVII.9), June 1999. 2847 9.2. Informative References 2849 [RFC1766] Alvestrand, H., "Tags for the Identification of 2850 Languages", RFC 1766, March 1995. 2852 [RFC2047] Moore, K., "MIME (Multipurpose Internet Mail Extensions) 2853 Part Three: Message Header Extensions for Non-ASCII Text", 2854 RFC 2047, November 1996. 2856 [RFC2231] Freed, N. and K. Moore, "MIME Parameter Value and Encoded 2857 Word Extensions: Character Sets, Languages, and 2858 Continuations", RFC 2231, November 1997. 2860 [RFC2781] Hoffman, P. and F. Yergeau, "UTF-16, an encoding of ISO 2861 10646", RFC 2781, February 2000. 2863 [RFC3066] Alvestrand, H., "Tags for the Identification of 2864 Languages", BCP 47, RFC 3066, January 2001. 2866 [RFC3552] Rescorla, E. and B. Korver, "Guidelines for Writing RFC 2867 Text on Security Considerations", BCP 72, RFC 3552, 2868 July 2003. 2870 [RFC3629] Yergeau, F., "UTF-8, a transformation format of ISO 2871 10646", STD 63, RFC 3629, November 2003. 2873 [RFC4646] Phillips, A., Ed. and M. Davis, Ed., "Tags for the 2874 Identification of Languages", RFC 4646, September 2006. 2877 [Unicode] Unicode Consortium, "The Unicode Consortium.
The Unicode 2878 Standard, Version 5.0, (Boston, MA, Addison-Wesley, 2003. 2879 ISBN 0-321-49081-0)", January 2007. 2881 [iso639.prin] 2882 ISO 639 Joint Advisory Committee, "ISO 639 Joint Advisory 2883 Committee: Working principles for ISO 639 maintenance", 2884 March 2000. 2888 [record-jar] 2889 Raymond, E., "The Art of Unix Programming", 2003. 2892 [registry-update] 2893 Ewell, D., Ed., "Update to the Language Subtag Registry", 2894 September 2006. 2897 Appendix A. Acknowledgements 2899 Any list of contributors is bound to be incomplete; please regard the 2900 following as only a selection from the group of people who have 2901 contributed to make this document what it is today. 2903 The contributors to RFC 4646, RFC 4647, RFC 3066, and RFC 1766, the 2904 precursors of this document, made enormous contributions directly or 2905 indirectly to this document and are generally responsible for the 2906 success of language tags. 2908 The following people contributed to this document: 2910 Stephane Bortzmeyer, Karen Broome, Peter Constable, John Cowan, 2911 Martin Duerst, Frank Ellerman, Doug Ewell, Deborah Garside, Marion 2912 Gunn, Kent Karlsson, Chris Newman, Randy Presuhn, Stephen Silver, and 2913 many, many others. 2915 Very special thanks must go to Harald Tveit Alvestrand, who 2916 originated RFCs 1766 and 3066, and without whom this document would 2917 not have been possible. 2919 Special thanks go to Michael Everson, who served as the Language Tag 2920 Reviewer for almost the entire RFC 1766/RFC 3066 period, as well as 2921 the Language Subtag Reviewer since the adoption of RFC 4646. 2923 Special thanks also to Doug Ewell, for his production of the first 2924 complete subtag registry, his work to support and maintain new 2925 registrations, and his careful editorship of both RFC 4645 and 2926 [registry-update]. 2928 Appendix B.
Examples of Language Tags (Informative) 2930 Simple language subtag: 2932 de (German) 2934 fr (French) 2936 ja (Japanese) 2938 i-enochian (example of a grandfathered tag) 2940 Language subtag plus Script subtag: 2942 zh-Hant (Chinese written using the Traditional Chinese script) 2944 zh-Hans (Chinese written using the Simplified Chinese script) 2946 sr-Cyrl (Serbian written using the Cyrillic script) 2948 sr-Latn (Serbian written using the Latin script) 2950 Language-Script-Region: 2952 zh-Hans-CN (Chinese written using the Simplified script as used in 2953 mainland China) 2955 sr-Latn-RS (Serbian written using the Latin script as used in 2956 Serbia) 2958 Language-Variant: 2960 sl-rozaj (Resian dialect of Slovenian) 2962 sl-nedis (Nadiza dialect of Slovenian) 2964 Language-Region-Variant: 2966 de-CH-1901 (German as used in Switzerland using the 1901 variant 2967 [orthography]) 2969 sl-IT-nedis (Slovenian as used in Italy, Nadiza dialect) 2971 Language-Script-Region-Variant: 2973 hy-Latn-IT-arevela (Eastern Armenian written in Latin script, as 2974 used in Italy) 2976 Language-Region: 2978 de-DE (German for Germany) 2980 en-US (English as used in the United States) 2982 es-419 (Spanish appropriate for the Latin America and Caribbean 2983 region using the UN region code) 2985 Private use subtags: 2987 de-CH-x-phonebk 2989 az-Arab-x-AZE-derbend 2991 Private use registry values: 2993 x-whatever (private use using the singleton 'x') 2995 qaa-Qaaa-QM-x-southern (all private tags) 2997 de-Qaaa (German, with a private script) 2999 sr-Latn-QM (Serbian, Latin-script, private region) 3001 sr-Qaaa-RS (Serbian, private script, for Serbia) 3003 Tags that use extensions (examples ONLY: extensions MUST be defined 3004 by revision or update to this document or by RFC): 3006 en-US-u-islamCal 3008 zh-CN-a-myExt-x-private 3010 en-a-myExt-b-another 3012 Some Invalid Tags: 3014 de-419-DE (two region tags) 3016 a-DE (use of a single-character subtag in primary position; note 3017 that 
there are a few grandfathered tags that start with "i-" that 3018 are valid) 3019 ar-a-aaa-b-bbb-a-ccc (two extensions with same single-letter 3020 prefix) 3022 Appendix C. Examples of Registration Forms 3023 LANGUAGE SUBTAG REGISTRATION FORM 3024 1. Name of requester: Han Steenwijk 3025 2. E-mail address of requester: han.steenwijk @ unipd.it 3026 3. Record Requested: 3028 Type: variant 3029 Subtag: biske 3030 Description: The San Giorgio dialect of Resian 3031 Description: The Bila dialect of Resian 3032 Prefix: sl-rozaj 3033 Comments: The dialect of San Giorgio/Bila is one of the 3034 four major local dialects of Resian 3036 4. Intended meaning of the subtag: The local variety of Resian as 3037 spoken in San Giorgio/Bila 3039 5. Reference to published description of the language (book or 3040 article): 3041 -- Jan I.N. Baudouin de Courtenay - Opyt fonetiki rez'janskich 3042 govorov, Varsava - Peterburg: Vende - Kozancikov, 1875. 3044 LANGUAGE SUBTAG REGISTRATION FORM 3045 1. Name of requester: Jaska Zedlik 3046 2. E-mail address of requester: jz53 @ zedlik.com 3047 3. Record Requested: 3049 Type: variant 3050 Subtag: tarask 3051 Description: Belarusian in Taraskievica orthography 3052 Prefix: be 3053 Comments: The subtag represents Branislau Taraskievic's Belarusian 3054 orthography as published in "Bielaruski klasycny pravapis" by Juras 3055 Buslakou, Vincuk Viacorka, Zmicier Sanko, and Zmicier Sauka 3056 (Vilnia-Miensk 2005). 3058 4. Intended meaning of the subtag: 3060 The subtag is intended to represent the Belarusian orthography as 3061 published in "Bielaruski klasycny pravapis" by Juras Buslakou, Vincuk 3062 Viacorka, Zmicier Sanko, and Zmicier Sauka (Vilnia-Miensk 2005). 3064 5. Reference to published description of the language (book or article): 3066 Taraskievic, Branislau. Bielaruskaja gramatyka dla skol. Vilnia: Vyd. 3067 "Bielaruskaha kamitetu", 1929, 5th edition. 3069 Buslakou, Juras; Viacorka, Vincuk; Sanko, Zmicier; Sauka, Zmicier. 
3070 Bielaruski klasycny pravapis. Vilnia-Miensk, 2005. 3072 6. Any other relevant information: 3074 Belarusian in Taraskievica orthography became widely used, especially in 3075 Belarusian-speaking Internet segment, but besides this some books and 3076 newspapers are also printed using this orthography of Belarusian. 3078 Authors' Addresses 3080 Addison Phillips (editor) 3081 Yahoo! Inc. 3083 Email: addison@inter-locale.com 3084 URI: http://www.inter-locale.com 3086 Mark Davis (editor) 3087 Google 3089 Email: mark.davis@macchiato.com or mark.davis@google.com 3091 Full Copyright Statement 3093 Copyright (C) The IETF Trust (2007). 3095 This document is subject to the rights, licenses and restrictions 3096 contained in BCP 78, and except as set forth therein, the authors 3097 retain all their rights. 3099 This document and the information contained herein are provided on an 3100 "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS 3101 OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND 3102 THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS 3103 OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF 3104 THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED 3105 WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. 3107 Intellectual Property 3109 The IETF takes no position regarding the validity or scope of any 3110 Intellectual Property Rights or other rights that might be claimed to 3111 pertain to the implementation or use of the technology described in 3112 this document or the extent to which any license under such rights 3113 might or might not be available; nor does it represent that it has 3114 made any independent effort to identify any such rights. Information 3115 on the procedures with respect to rights in RFC documents can be 3116 found in BCP 78 and BCP 79. 
3118 Copies of IPR disclosures made to the IETF Secretariat and any 3119 assurances of licenses to be made available, or the result of an 3120 attempt made to obtain a general license or permission for the use of 3121 such proprietary rights by implementers or users of this 3122 specification can be obtained from the IETF on-line IPR repository at 3123 http://www.ietf.org/ipr. 3125 The IETF invites any interested party to bring to its attention any 3126 copyrights, patents or patent applications, or other proprietary 3127 rights that may cover technology that may be required to implement 3128 this standard. Please address the information to the IETF at 3129 ietf-ipr@ietf.org. 3131 Acknowledgment 3133 Funding for the RFC Editor function is provided by the IETF 3134 Administrative Support Activity (IASA).