idnits 2.17.1

draft-ietf-ltru-4646bis-02.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** It looks like you're using RFC 3978 boilerplate. You should update this
     to the boilerplate described in the IETF Trust License Policy document
     (see https://trustee.ietf.org/license-info), which is required now.

  -- Found old boilerplate from RFC 3978, Section 5.1 on line 16.

  -- Found old boilerplate from RFC 3978, Section 5.5 on line 2741.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 2718.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 2725.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 2731.

  ** This document has an original RFC 3978 Section 5.4 Copyright Line,
     instead of the newer IETF Trust Copyright according to RFC 4748.

  ** This document has an original RFC 3978 Section 5.5 Disclaimer, instead of
     the newer disclaimer which includes the IETF Trust according to RFC 4748.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  -- The draft header indicates that this document obsoletes RFC4646, but the
     abstract doesn't seem to mention this, which it should.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not
     match the current year

  == The document seems to use 'NOT RECOMMENDED' as an RFC 2119 keyword, but
     does not include the phrase in its RFC 2119 key words list.

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008. If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment. If not, you may need to add the pre-RFC5378 disclaimer.
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (December 18, 2006) is 6339 days in the past. Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO10646'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO15924'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO3166-1'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO639-1'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO639-2'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO646'

  ** Obsolete normative reference: RFC 2028 (Obsoleted by RFC 9281)

  ** Obsolete normative reference: RFC 2434 (Obsoleted by RFC 5226)

  ** Downref: Normative reference to an Informational RFC: RFC 2860

  ** Obsolete normative reference: RFC 4234 (Obsoleted by RFC 5234)

  ** Downref: Normative reference to an Informational RFC: RFC 4645

  -- Obsolete informational reference (is this intentional?): RFC 1766
     (Obsoleted by RFC 3066, RFC 3282)

  -- Obsolete informational reference (is this intentional?): RFC 3066
     (Obsoleted by RFC 4646, RFC 4647)

  -- Obsolete informational reference (is this intentional?): RFC 4646
     (Obsoleted by RFC 5646)

     Summary: 8 errors (**), 0 flaws (~~), 3 warnings (==), 17 comments (--).
     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

  2  Network Working Group                                   A. Phillips, Ed.
  3  Internet-Draft                                               Yahoo! Inc.
  4  Obsoletes: 4646 (if approved)                              M. Davis, Ed.
  5  Expires: June 21, 2007                                            Google
  6                                                         December 18, 2006

  8                       Tags for Identifying Languages
  9                         draft-ietf-ltru-4646bis-02

 11  Status of this Memo

 13     By submitting this Internet-Draft, each author represents that any
 14     applicable patent or other IPR claims of which he or she is aware
 15     have been or will be disclosed, and any of which he or she becomes
 16     aware will be disclosed, in accordance with Section 6 of BCP 79.

 18     Internet-Drafts are working documents of the Internet Engineering
 19     Task Force (IETF), its areas, and its working groups. Note that
 20     other groups may also distribute working documents as Internet-
 21     Drafts.

 23     Internet-Drafts are draft documents valid for a maximum of six months
 24     and may be updated, replaced, or obsoleted by other documents at any
 25     time. It is inappropriate to use Internet-Drafts as reference
 26     material or to cite them other than as "work in progress."

 28     The list of current Internet-Drafts can be accessed at
 29     http://www.ietf.org/ietf/1id-abstracts.txt.

 31     The list of Internet-Draft Shadow Directories can be accessed at
 32     http://www.ietf.org/shadow.html.

 34     This Internet-Draft will expire on June 21, 2007.

 36  Copyright Notice

 38     Copyright (C) The Internet Society (2006).

 40  Abstract

 42     This document describes the structure, content, construction, and
 43     semantics of language tags for use in cases where it is desirable to
 44     indicate the language used in an information object. It also
 45     describes how to register values for use in language tags and the
 46     creation of user-defined extensions for private interchange.

 48  Table of Contents

 50     1. Introduction
 51     2. The Language Tag
 52       2.1. Syntax
 53       2.2. Language Subtag Sources and Interpretation
 54         2.2.1. Primary Language Subtag
 55         2.2.2. Extended Language Subtags
 56         2.2.3. Script Subtag
 57         2.2.4. Region Subtag
 58         2.2.5. Variant Subtags
 59         2.2.6. Extension Subtags
 60         2.2.7. Private Use Subtags
 61         2.2.8. Grandfathered Registrations
 62         2.2.9. Classes of Conformance
 63     3. Registry Format and Maintenance
 64       3.1. Format of the IANA Language Subtag Registry
 65         3.1.1. File Format
 66         3.1.2. Record Definitions
 67         3.1.3. Subtag and Tag Fields
 68         3.1.4. Description Field
 69         3.1.5. Deprecated Field
 70         3.1.6. Preferred-Value Field
 71         3.1.7. Prefix Field
 72         3.1.8. Comments Field
 73         3.1.9. Suppress-Script Field
 74       3.2. Language Subtag Reviewer
 75       3.3. Maintenance of the Registry
 76       3.4. Stability of IANA Registry Entries
 77       3.5. Registration Procedure for Subtags
 78       3.6. Possibilities for Registration
 79       3.7. Extensions and Extensions Registry
 80       3.8. Update of the Language Subtag Registry
 81     4. Formation and Processing of Language Tags
 82       4.1. Choice of Language Tag
 83       4.2. Meaning of the Language Tag
 84       4.3. Length Considerations
 85         4.3.1. Working with Limited Buffer Sizes
 86         4.3.2. Truncation of Language Tags
 87       4.4. Canonicalization of Language Tags
 88       4.5. Considerations for Private Use Subtags
 89     5. IANA Considerations
 90       5.1. Language Subtag Registry
 91       5.2. Extensions Registry
 92     6. Security Considerations
 93     7. Character Set Considerations
 94     8. Changes from RFC 4646
 95     9. References
 96       9.1. Normative References
 97       9.2. Informative References
 98     Appendix A. Acknowledgements
 99     Appendix B. Examples of Language Tags (Informative)
100     Authors' Addresses
101     Intellectual Property and Copyright Statements

103  1. Introduction

105     Human beings on our planet have, past and present, used a number of
106     languages. There are many reasons why one would want to identify the
107     language used when presenting or requesting information.

109     A user's language preferences often need to be identified so that
110     appropriate processing can be applied. For example, the user's
111     language preferences in a Web browser can be used to select Web pages
112     appropriately. Language preferences can also be used to select among
113     tools (such as dictionaries) to assist in the processing or
114     understanding of content in different languages.

116     In addition, knowledge about the particular language used by some
117     piece of information content might be useful or even required by some
118     types of processing; for example, spell-checking, computer-
119     synthesized speech, Braille transcription, or high-quality print
120     renderings.

122     One means of indicating the language used is by labeling the
123     information content with an identifier or "tag". These tags can be
124     used to specify user preferences when selecting information content,
125     or for labeling additional attributes of content and associated
126     resources.

128     Tags can also be used to indicate additional language attributes of
129     content. For example, indicating specific information about the
130     dialect, writing system, or orthography used in a document or
131     resource may enable the user to obtain information in a form that
132     they can understand, or it can be important in processing or
133     rendering the given content into an appropriate form or style.

135     This document specifies a particular identifier mechanism (the
136     language tag) and a registration function for values to be used to
137     form tags. It also defines a mechanism for private use values and
138     future extension.

140     This document replaces [RFC4646], which replaced [RFC3066] and its
141     predecessor [RFC1766]. For a list of changes in this document, see
142     Section 8.

144     The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
145     "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
146     document are to be interpreted as described in [RFC2119].

148  2. The Language Tag

150     Language tags are used to help identify languages, whether spoken,
151     written, signed, or otherwise signaled, for the purpose of
152     communication. This includes constructed and artificial languages,
153     but excludes languages not intended primarily for human
154     communication, such as programming languages.

156  2.1. Syntax

158     The language tag is composed of one or more parts, known as
159     "subtags". Each subtag consists of a sequence of alphanumeric
160     characters. Subtags are distinguished and separated from one another
161     by a hyphen ("-", ABNF [RFC4234] %x2D). A language tag consists of a
162     "primary language" subtag and a (possibly empty) series of subsequent
163     subtags, each of which refines or narrows the range of languages
164     identified by the overall tag.

166     Usually, each type of subtag is distinguished by length, position in
167     the tag, and content: subtags can be recognized solely by these
168     features. The only exception to this is a fixed list of
169     grandfathered tags registered under RFC 3066 [RFC3066]. This makes
170     it possible to construct a parser that can extract and assign some
171     semantic information to the subtags, even if the specific subtag
172     values are not recognized. Thus, a parser need not have an up-to-
173     date copy (or any copy at all) of the subtag registry to perform most
174     searching and matching operations.
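The claim that subtags can be recognized by length, position, and content alone can be illustrated with a short sketch. The Python function below is an illustration only and is not part of this specification: the name classify_subtags and the label strings are hypothetical, the patterns are transcribed from the ABNF in Figure 1, and the irregular grandfathered tags (such as "i-klingon") are deliberately not handled. No registry lookup is performed, so unregistered values are labeled in the same way as registered ones.

```python
import re

# Patterns transcribed from the ABNF in Figure 1. Tags are lowercased first,
# because the ABNF (and language tags generally) are case-insensitive.
LANGUAGE = re.compile(r"[a-z]{2,8}")
EXTLANG = re.compile(r"[a-z]{3}")
SCRIPT = re.compile(r"[a-z]{4}")
REGION = re.compile(r"[a-z]{2}|[0-9]{3}")
VARIANT = re.compile(r"[a-z0-9]{5,8}|[0-9][a-z0-9]{3}")
SINGLETON = re.compile(r"[a-wy-z0-9]")      # any single alphanumeric except 'x'
EXT_SUBTAG = re.compile(r"[a-z0-9]{2,8}")
PRIVATE_SUBTAG = re.compile(r"[a-z0-9]{1,8}")


def classify_subtags(tag):
    """Return (subtag, label) pairs using only length, position, and content.

    Hypothetical helper: raises ValueError for anything that does not fit the
    'langtag' or 'privateuse' productions (irregular grandfathered tags included).
    """
    subtags = tag.lower().split("-")
    labels = []
    pos = 0

    def take(pattern, label, limit=None):
        # Consume consecutive subtags matching `pattern`, up to `limit` of them.
        nonlocal pos
        count = 0
        while (pos < len(subtags)
               and (limit is None or count < limit)
               and pattern.fullmatch(subtags[pos])):
            labels.append((subtags[pos], label))
            pos += 1
            count += 1
        return count

    if subtags[0] != "x":
        if take(LANGUAGE, "language", 1) == 0:
            raise ValueError("no primary language subtag in %r" % tag)
        if len(subtags[0]) <= 3:
            take(EXTLANG, "extlang", 3)     # only after a 2- or 3-letter language
        take(SCRIPT, "script", 1)
        take(REGION, "region", 1)
        take(VARIANT, "variant")
        while pos < len(subtags) and SINGLETON.fullmatch(subtags[pos]):
            labels.append((subtags[pos], "singleton"))
            pos += 1
            if take(EXT_SUBTAG, "extension") == 0:
                raise ValueError("singleton without extension subtags in %r" % tag)

    if pos < len(subtags) and subtags[pos] == "x":
        labels.append(("x", "privateuse"))
        pos += 1
        if take(PRIVATE_SUBTAG, "privateuse") == 0:
            raise ValueError("'x' without private use subtags in %r" % tag)

    if pos != len(subtags):
        raise ValueError("unexpected subtag %r in %r" % (subtags[pos], tag))
    return labels
```

Under these assumptions, classify_subtags("de-CH-1996") labels 'de' as a language, 'ch' as a region, and '1996' as a variant, while "a-value" is rejected because a tag cannot begin with an extension singleton. The sketch checks well-formedness only; deciding whether a subtag is actually valid still requires the registry.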
176     The syntax of the language tag in ABNF [RFC4234] is:

178     Language-Tag  = langtag
179                   / privateuse             ; private use tag
180                   / grandfathered          ; grandfathered registrations

182     langtag       = (language
183                      ["-" script]
184                      ["-" region]
185                      *("-" variant)
186                      *("-" extension)
187                      ["-" privateuse])

189     language      = (2*3ALPHA [ extlang ]) ; shortest ISO 639 code
190                   / 4ALPHA                 ; reserved for future use
191                   / 5*8ALPHA               ; registered language subtag

193     extlang       = *3("-" 3ALPHA)         ; specific ISO 639-3 codes

195     script        = 4ALPHA                 ; ISO 15924 code

197     region        = 2ALPHA                 ; ISO 3166 code
198                   / 3DIGIT                 ; UN M.49 code

200     variant       = 5*8alphanum            ; registered variants
201                   / (DIGIT 3alphanum)

203     extension     = singleton 1*("-" (2*8alphanum))

205     singleton     = "a"-"w" / "y"-"z" / "0"-"9"
206                   ; Single alphanumerics; "x" reserved for private use
207                   ; NOTE: ABNF is case-insensitive

209     privateuse    = ("x") 1*("-" (1*8alphanum))

211     grandfathered = langtag                ; well-formed grandfathered tags
212                   / irregular              ; grandfathered that don't match langtag

214     irregular     = "en-GB-oed" / "i-ami" / "i-bnn" / "i-default"
215                   / "i-enochian" / "i-hak" / "i-klingon" / "i-lux"
216                   / "i-mingo" / "i-navajo" / "i-pwn" / "i-tao"
217                   / "i-tay" / "i-tsu" / "sgn-BE-fr" / "sgn-BE-nl"
218                   / "sgn-CH-de"

220     alphanum      = (ALPHA / DIGIT)        ; letters and numbers

222                       Figure 1: Language Tag ABNF

224     All subtags have a maximum length of eight characters and whitespace
225     is not permitted in a language tag. There is a subtlety in the ABNF
226     production 'variant': variants starting with a digit MAY be four
227     characters long, while those starting with a letter MUST be at least
228     five characters long. For examples of language tags, see Appendix B.

230     Note Well: the ABNF syntax does not distinguish between upper and
231     lowercase. The appearance of upper and lowercase letters in the
232     various ABNF productions above does not affect how implementations
233     interpret tags. That is, the tag "I-AMI" matches the item "i-ami" in
234     the 'irregular' production. At all times, the tags and their
235     subtags, including private use and extensions, are to be treated as
236     case insensitive: there exist conventions for the capitalization of
237     some of the subtags, but these MUST NOT be taken to carry meaning.

239     For example:

241     o [ISO639-1] recommends that language codes be written in lowercase
242       ('mn' Mongolian).

244     o [ISO3166-1] recommends that country codes be capitalized ('MN'
245       Mongolia).

247     o [ISO15924] recommends that script codes use lowercase with the
248       initial letter capitalized ('Cyrl' Cyrillic).

250     However, in the tags defined by this document, the uppercase US-ASCII
251     letters in the range 'A' through 'Z' are considered equivalent and
252     mapped directly to their US-ASCII lowercase equivalents in the range
253     'a' through 'z'. Thus, the tag "mn-Cyrl-MN" is not distinct from
254     "MN-cYRL-mn" or "mN-cYrL-Mn" (or any other combination), and each of
255     these variations conveys the same meaning: Mongolian written in the
256     Cyrillic script as used in Mongolia.

258     Although case distinctions do not carry meaning in language tags,
259     consistent formatting and presentation of the tags will aid users.
260     The format of the tags and subtags in the registry is RECOMMENDED.
261     In this format, all non-initial two-letter subtags are uppercase, all
262     non-initial four-letter subtags are titlecase, and all other subtags
263     are lowercase.

265     Note that although [RFC4234] refers to octets, the language tags
266     described in this document are sequences of characters from the US-
267     ASCII [ISO646] repertoire. Language tags MAY be used in documents
268     and applications that use other encodings, so long as these encompass
269     the US-ASCII repertoire. An example of this would be an XML document
270     that uses the UTF-16LE [RFC2781] encoding of [Unicode].

272  2.2. Language Subtag Sources and Interpretation

274     The namespace of language tags and their subtags is administered by
275     the Internet Assigned Numbers Authority (IANA) [RFC2860] according to
276     the rules in Section 5 of this document. The Language Subtag
277     Registry maintained by IANA is the source for valid subtags: other
278     standards referenced in this section provide the source material for
279     that registry.

281     Terminology used in this document:

283     o Tag or tags refers to a complete language tag, such as
284       "sr-Latn-RS" or "az-Arab-IR". Examples of tags in this document
285       are enclosed in double-quotes ("en-US").

287     o Subtag refers to a specific section of a tag, delimited by hyphen,
288       such as the subtag 'Hant' in "zh-Hant-CN". Examples of subtags in
289       this document are enclosed in single quotes ('Hant').

291     o Code or codes refers to values defined in external standards (and
292       which are used as subtags in this document). For example, 'Hant'
293       is an [ISO15924] script code that was used to define the 'Hant'
294       script subtag for use in a language tag. Examples of codes in
295       this document are enclosed in single quotes ('en', 'Hant').

297     The definitions in this section apply to the various subtags within
298     the language tags defined by this document, excepting those
299     "grandfathered" tags defined in Section 2.2.8.

301     Language tags are designed so that each subtag type has unique length
302     and content restrictions. These make identification of the subtag's
303     type possible, even if the content of the subtag itself is
304     unrecognized. This allows tags to be parsed and processed without
305     reference to the latest version of the underlying standards or the
306     IANA registry and makes the associated exception handling when
307     parsing tags simpler.

309     Subtags in the IANA registry that do not come from an underlying
310     standard can only appear in specific positions in a tag.
311     Specifically, they can only occur as primary language subtags or as
312     variant subtags.

314     Note that sequences of private use and extension subtags MUST occur
315     at the end of the sequence of subtags and MUST NOT be interspersed
316     with subtags defined elsewhere in this document.

318     Single-letter and single-digit subtags are reserved for current or
319     future use. These include the following current uses:

321     o The single-letter subtag 'x' is reserved to introduce a sequence
322       of private use subtags. The interpretation of any private use
323       subtags is defined solely by private agreement and is not defined
324       by the rules in this section or in any standard or registry
325       defined in this document.

327     o All other single-letter subtags are reserved to introduce
328       standardized extension subtag sequences as described in
329       Section 3.7.

331     The single-letter subtag 'i' is used by some grandfathered tags, such
332     as "i-default", where it always appears in the first position and
333     cannot be confused with an extension.

335  2.2.1. Primary Language Subtag

337     The primary language subtag is the first subtag in a language tag
338     (with the exception of private use and certain grandfathered tags)
339     and cannot be omitted. The following rules apply to the primary
340     language subtag:

342     1. All two-character primary language subtags were defined in the
343        IANA registry according to the assignments found in the standard
344        ISO 639 Part 1, "ISO 639-1:2002, Codes for the representation of
345        names of languages -- Part 1: Alpha-2 code" [ISO639-1], or using
346        assignments subsequently made by the ISO 639-1 registration
347        authority (RA) or governing standardization bodies.

349     2. All three-character primary language subtags were defined in the
350        IANA registry according to the assignments found in either ISO
351        639 Part 2, "ISO 639-2:1998 - Codes for the representation of
352        names of languages -- Part 2: Alpha-3 code - edition 1"
353        [ISO639-2], ISO 639 Part 3, "ISO 639-3:200?, [[??missing official
354        title??]]", or assignments subsequently made by the relevant ISO
355        639 registration authorities or governing standardization bodies.

357     3. The subtags in the range 'qaa' through 'qtz' are reserved for
358        private use in language tags. These subtags correspond to codes
359        reserved by ISO 639-2 for private use. These codes MAY be used
360        for non-registered primary language subtags (instead of using
361        private use subtags following 'x-'). Please refer to Section 4.5
362        for more information on private use subtags.

364     4. All four-character language subtags are reserved for possible
365        future standardization.

367     5. All language subtags of 5 to 8 characters in length in the IANA
368        registry were defined via the registration process in Section 3.5
369        and MAY be used to form the primary language subtag. At the time
370        this document was created, there were no examples of this kind of
371        subtag and future registrations of this type will be discouraged:
372        primary languages are strongly RECOMMENDED for registration with
373        ISO 639, and proposals rejected by ISO 639/RA will be closely
374        scrutinized before they are registered with IANA.

376     6. The single-character subtag 'x' as the primary subtag indicates
377        that the language tag consists solely of subtags whose meaning is
378        defined by private agreement. For example, in the tag "x-fr-CH",
379        the subtags 'fr' and 'CH' SHOULD NOT be taken to represent the
380        French language or the country of Switzerland (or any other value
381        in the IANA registry) unless there is a private agreement in
382        place to do so. See Section 4.5.

384     7. The single-character subtag 'i' is used by some grandfathered
385        tags (see Section 2.2.8) such as "i-klingon" and "i-bnn". (Other
386        grandfathered tags have a primary language subtag in their first
387        position.)

389     8. Other values MUST NOT be assigned to the primary subtag except by
390        revision or update of this document.

392     Note: For languages that have both an ISO 639-1 two-character code
393     and a three-character code assigned by either ISO 639-2 or ISO 639-3,
394     only the ISO 639-1 two-character code is defined in the IANA
395     registry.

397     Note: For languages that have no ISO 639-1 two-character code and for
398     which the ISO 639-2/T (Terminology) code and the ISO 639-2/B
399     (Bibliographic) codes differ, only the Terminology code is defined in
400     the IANA registry. At the time this document was created, all
401     languages that had both kinds of three-character code were also
402     assigned a two-character code; it is expected that future assignments
403     of this nature will not occur.

405     Note: To avoid problems with versioning and subtag choice as
406     experienced during the transition between RFC 1766 and RFC 3066, as
407     well as the canonical nature of subtags defined by this document, the
408     ISO 639 Registration Authority Joint Advisory Committee (ISO 639/
409     RA-JAC) has included the following statement in [iso639.prin]:

411        "A language code already in ISO 639-2 at the point of freezing ISO
412        639-1 shall not later be added to ISO 639-1. This is to ensure
413        consistency in usage over time, since users are directed in
414        Internet applications to employ the alpha-3 code when an alpha-2
415        code for that language is not available."

417     In order to avoid instability in the canonical form of tags, if a
418     two-character code is added to ISO 639-1 for a language for which a
419     three-character code was already included in either ISO 639-2 or ISO
420     639-3, the two-character code MUST NOT be registered. See
421     Section 3.4.
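The numbered rules for the primary language subtag given earlier in this section depend only on the subtag's length and, for the private use range, on its value. The following Python sketch is an illustration under stated assumptions and is not part of this specification: the function name primary_language_kind and the returned description strings are hypothetical, and no lookup against the IANA registry is performed.

```python
import re


def primary_language_kind(subtag):
    """Describe which rule of Section 2.2.1 a primary language subtag falls under.

    Hypothetical helper; the returned strings are informal descriptions only.
    """
    s = subtag.lower()                      # subtags are case-insensitive
    if not re.fullmatch(r"[a-z]{1,8}", s):
        raise ValueError("not a possible primary language subtag: %r" % subtag)
    if s == "x":
        return "private use tag (rule 6)"
    if s == "i":
        return "grandfathered tag prefix (rule 7)"
    if len(s) == 1:
        # Rule 8: no other single-character values are assigned.
        raise ValueError("not a possible primary language subtag: %r" % subtag)
    if len(s) == 2:
        return "ISO 639-1 code (rule 1)"
    if len(s) == 3:
        # For three lowercase letters, string comparison matches the range
        # 'qaa' through 'qtz' exactly.
        if "qaa" <= s <= "qtz":
            return "private use language code (rule 3)"
        return "ISO 639-2 or ISO 639-3 code (rule 2)"
    if len(s) == 4:
        return "reserved for future standardization (rule 4)"
    return "registered language subtag (rule 5)"
```

For instance, primary_language_kind("haw") reports a three-character ISO 639 code, and that classification depends only on the subtag's length and value, not on whether a two-character code exists for the language.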
423 For example, if some content were tagged with 'haw' (Hawaiian), which 424 currently has no two-character code, the tag would not be invalidated 425 if ISO 639-1 were to assign a two-character code to the Hawaiian 426 language at a later date. 428 Note: An example of independent primary language subtag registration 429 might include: one of the grandfathered IANA registrations is 430 "i-enochian". The subtag 'enochian' could be registered in the IANA 431 registry as a primary language subtag (assuming that ISO 639 does not 432 register this language first), making tags such as "enochian-AQ" and 433 "enochian-Latn" valid. 435 2.2.2. Extended Language Subtags 437 Extended language subtags are used to identify languages or dialects 438 that are subdivisions within another language. Such an enclosing 439 language is sometimes called a "collective" or "macro" language. The 440 following rules apply to the extended language subtags: 442 1. These subtags were defined in the IANA registry according to 443 assignments found in ISO 639 Part 3. 445 2. A sequence of up to three extended language subtags MAY appear in 446 a language tag. This sequence MUST follow the primary language 447 subtag and precede any other subtags. 449 3. Each extended language subtag MUST only be used with the exact 450 sequence of subtags that appears in the 'Prefix' field in its 451 registry record. 453 4. There MAY be up to three extended language subtags. 455 5. Other values MUST NOT be assigned to the extended language subtag 456 except by revision or update of this document. 458 Extended language subtag records MUST include exactly one 'Prefix' 459 field indicating an appropriate subtag or sequence of subtags for 460 that extended language subtag. 462 For example, the 'gan' subtag, representing the 'Gan' dialect of 463 Chinese, has a prefix of "zh" in its registry record. The 'cmn' 464 subtag, representing the 'Mandarin' dialect of Chinese has the same 465 prefix. 
Thus, the tags "zh-gan-Hant" or "zh-cmn-CN" are appropriate, 466 while the tag "zh-cmn-gan" is not. 468 Now suppose that 'xxx' is a subtag that represents a dialect of 469 'Gan'. It would have a 'Prefix' field of "zh-gan", making the tag 470 "zh-gan-xxx" appropriate, while the tags "zh-xxx" and "zh-xxx-gan" 471 would not be appropriate. 473 2.2.3. Script Subtag 475 Script subtags are used to indicate the script or writing system 476 variations that distinguish the written forms of a language or its 477 dialects. The following rules apply to the script subtags: 479 1. All four-character subtags were defined according to 480 [ISO15924]--"Codes for the representation of the names of 481 scripts": alpha-4 script codes, or subsequently assigned by the 482 ISO 15924 maintenance agency or governing standardization bodies, 483 denoting the script or writing system used in conjunction with 484 this language. 486 2. Script subtags MUST immediately follow the primary language 487 subtag and all extended language subtags and MUST occur before 488 any other type of subtag described below. 490 3. The script subtags 'Qaaa' through 'Qabx' are reserved for private 491 use in language tags. These subtags correspond to codes reserved 492 by ISO 15924 for private use. These codes MAY be used for non- 493 registered script values. Please refer to Section 4.5 for more 494 information on private use subtags. 496 4. Script subtags MUST NOT be registered using the process in 497 Section 3.5 of this document. Variant subtags MAY be considered 498 for registration for that purpose. 500 5. There MUST be at most one script subtag in a language tag, and 501 the script subtag SHOULD be omitted when it adds no 502 distinguishing value to the tag or when the primary language 503 subtag's record includes a Suppress-Script field listing the 504 applicable script subtag. 506 Example: "sr-Latn" represents Serbian written using the Latin script. 508 2.2.4. 
Region Subtag 510 Region subtags are used to indicate linguistic variations associated 511 with or appropriate to a specific country, territory, or region. 512 Typically, a region subtag is used to indicate regional dialects or 513 usage, or region-specific spelling conventions. A region subtag can 514 also be used to indicate that content is expressed in a way that is 515 appropriate for use throughout a region, for instance, Spanish 516 content tailored to be useful throughout Latin America. 518 The following rules apply to the region subtags: 520 1. Region subtags MUST follow any language, extended language, or 521 script subtags and MUST precede all other subtags. 523 2. All two-character subtags following the primary subtag were 524 defined in the IANA registry according to the assignments found 525 in [ISO3166-1] ("Codes for the representation of names of 526 countries and their subdivisions -- Part 1: Country codes") using 527 the list of alpha-2 country codes, or using assignments 528 subsequently made by the ISO 3166 maintenance agency or governing 529 standardization bodies. 531 3. All three-character subtags consisting of digit (numeric) 532 characters following the primary subtag were defined in the IANA 533 registry according to the assignments found in UN Standard 534 Country or Area Codes for Statistical Use [UN_M.49] or 535 assignments subsequently made by the governing standards body. 536 Note that not all of the UN M.49 codes are defined in the IANA 537 registry. The following rules define which codes are entered 538 into the registry as valid subtags: 540 A. UN numeric codes assigned to 'macro-geographical 541 (continental)' or sub-regions MUST be registered in the 542 registry. These codes are not associated with an assigned 543 ISO 3166 alpha-2 code and represent supra-national areas, 544 usually covering more than one nation, state, province, or 545 territory. 547 B. 
UN numeric codes for 'economic groupings' or 'other 548 groupings' MUST NOT be registered in the IANA registry and 549 MUST NOT be used to form language tags. 551 C. UN numeric codes for countries or areas with ambiguous ISO 552 3166 alpha-2 codes, when entered into the registry, MUST be 553 defined according to the rules in Section 3.4 and MUST be 554 used to form language tags that represent the country or 555 region for which they are defined. 557 D. UN numeric codes for countries or areas for which there is an 558 associated ISO 3166 alpha-2 code in the registry MUST NOT be 559 entered into the registry and MUST NOT be used to form 560 language tags. Note that the ISO 3166-based subtag in the 561 registry MUST actually be associated with the UN M.49 code in 562 question. 564 E. UN numeric codes and ISO 3166 alpha-2 codes for countries or 565 areas listed as eligible for registration in [RFC4645] but 566 not presently registered MAY be entered into the IANA 567 registry via the process described in Section 3.5. Once 568 registered, these codes MAY be used to form language tags. 570 F. All other UN numeric codes for countries or areas that do not 571 have an associated ISO 3166 alpha-2 code MUST NOT be entered 572 into the registry and MUST NOT be used to form language tags. 573 For more information about these codes, see Section 3.4. 575 4. Note: The alphanumeric codes in Appendix X of the UN document 576 MUST NOT be entered into the registry and MUST NOT be used to 577 form language tags. (At the time this document was created, 578 these values matched the ISO 3166 alpha-2 codes.) 580 5. There MUST be at most one region subtag in a language tag and the 581 region subtag MAY be omitted, as when it adds no distinguishing 582 value to the tag. 584 6. The region subtags 'AA', 'QM'-'QZ', 'XA'-'XZ', and 'ZZ' are 585 reserved for private use in language tags. These subtags 586 correspond to codes reserved by ISO 3166 for private use. 
       These codes MAY be used for private use region subtags (instead
       of using a private use subtag sequence). Please refer to
       Section 4.5 for more information on private use subtags.

   "de-CH" represents German ('de') as used in Switzerland ('CH').

   "sr-Latn-RS" represents Serbian ('sr') written using Latin script
   ('Latn') as used in Serbia ('RS').

   "es-419" represents Spanish ('es') appropriate to the UN-defined
   Latin America and Caribbean region ('419').

2.2.5. Variant Subtags

   Variant subtags are used to indicate additional, well-recognized
   variations that define a language or its dialects that are not
   covered by other available subtags. The following rules apply to the
   variant subtags:

   1.  Variant subtags are not associated with any external standard.
       Variant subtags and their meanings are defined by the
       registration process defined in Section 3.5.

   2.  Variant subtags MUST follow all of the other defined subtags, but
       precede any extension or private use subtag sequences.

   3.  More than one variant MAY be used to form the language tag.

   4.  Variant subtags MUST be registered with IANA according to the
       rules in Section 3.5 of this document before being used to form
       language tags. In order to distinguish variants from other types
       of subtags, registrations MUST meet the following length and
       content restrictions:

       1.  Variant subtags that begin with a letter (a-z, A-Z) MUST be
           at least five characters long.

       2.  Variant subtags that begin with a digit (0-9) MUST be at
           least four characters long.

   Variant subtag records in the language subtag registry MAY include
   one or more 'Prefix' fields, which indicate the language tag or tags
   that would make a suitable prefix (with other subtags, as
   appropriate) in forming a language tag with the variant.
   For example, the subtag 'nedis' has a Prefix of "sl", making it
   suitable to form language tags such as "sl-nedis" and "sl-IT-nedis",
   but not suitable for use in a tag such as "zh-nedis" or
   "it-IT-nedis".

   "sl-nedis" represents the Natisone or Nadiza dialect of Slovenian.

   "de-CH-1996" represents German as used in Switzerland and as written
   using the spelling reform beginning in the year 1996 C.E.

   Most variants that share a prefix are mutually exclusive. For
   example, the German orthographic variations '1996' and '1901' SHOULD
   NOT be used in the same tag, as they represent the dates of different
   spelling reforms. A variant that can meaningfully be used in
   combination with another variant SHOULD include a 'Prefix' field in
   its registry record that lists that other variant. For example, if
   another German variant 'example' were created that made sense to use
   with '1996', then 'example' should include two Prefix fields: "de"
   and "de-1996".

2.2.6. Extension Subtags

   Extensions provide a mechanism for extending language tags for use in
   various applications. See Section 3.7. The following rules apply to
   extensions:

   1.   Extension subtags are separated from the other subtags defined
        in this document by a single-character subtag ("singleton").
        The singleton MUST be one allocated to a registration authority
        via the mechanism described in Section 3.7 and MUST NOT be the
        letter 'x', which is reserved for private use subtag sequences.

   2.   Note: Private use subtag sequences starting with the singleton
        subtag 'x' are described in Section 2.2.7 below.

   3.   An extension MUST follow at least a primary language subtag.
        That is, a language tag cannot begin with an extension.
        Extensions extend language tags; they do not override or replace
        them. For example, "a-value" is not a well-formed language tag,
        while "de-a-value" is.

   4.   Each singleton subtag MUST appear at most one time in each tag
        (other than as a private use subtag). That is, singleton
        subtags MUST NOT be repeated. For example, the tag
        "en-a-bbb-a-ccc" is invalid because the subtag 'a' appears
        twice. Note that the tag "en-a-bbb-x-a-ccc" is valid because the
        second appearance of the singleton 'a' is in a private use
        sequence.

   5.   Extension subtags MUST meet all of the requirements for the
        content and format of subtags defined in this document.

   6.   Extension subtags MUST meet whatever requirements are set by the
        document that defines their singleton prefix and whatever
        requirements are provided by the maintaining authority.

   7.   Each extension subtag MUST be from two to eight characters long
        and consist solely of letters or digits, with each subtag
        separated by a single '-'.

   8.   Each singleton MUST be followed by at least one extension
        subtag. For example, the tag "tlh-a-b-foo" is invalid because
        the first singleton 'a' is followed immediately by another
        singleton 'b'.

   9.   Extension subtags MUST follow all language, extended language,
        script, region, and variant subtags in a tag.

   10.  All subtags following the singleton and before another singleton
        are part of the extension. Example: In the tag "fr-a-Latn", the
        subtag 'Latn' does not represent the script subtag 'Latn'
        defined in the IANA Language Subtag Registry. Its meaning is
        defined by the extension 'a'.

   11.  In the event that more than one extension appears in a single
        tag, the tag SHOULD be canonicalized as described in
        Section 4.4.

   For example, if the prefix singleton 'r' and the shown subtags were
   defined, then the following tag would be a valid example:
   "en-Latn-GB-boont-r-extended-sequence-x-private"

2.2.7. Private Use Subtags

   Private use subtags are used to indicate distinctions in language
   important in a given context by private agreement.
   The following rules apply to private use subtags:

   1.  Private use subtags are separated from the other subtags defined
       in this document by the reserved single-character subtag 'x'.

   2.  Private use subtags MUST conform to the format and content
       constraints defined in the ABNF for all subtags.

   3.  Private use subtags MUST follow all language, extended language,
       script, region, variant, and extension subtags in the tag.
       Another way of saying this is that all subtags following the
       singleton 'x' MUST be considered private use. Example: The
       subtag 'US' in the tag "en-x-US" is a private use subtag.

   4.  A tag MAY consist entirely of private use subtags.

   5.  No source is defined for private use subtags. Use of private use
       subtags is by private agreement only.

   6.  Private use subtags are NOT RECOMMENDED where alternatives exist
       or for general interchange. See Section 4.5 for more information
       on private use subtag choice.

   For example: Users who wished to utilize codes from the Ethnologue
   publication of SIL International for language identification might
   agree to exchange tags such as "az-Arab-x-AZE-derbend". This example
   contains two private use subtags. The first is 'AZE' and the second
   is 'derbend'.

2.2.8. Grandfathered Registrations

   Prior to RFC 4646, whole language tags were registered according to
   the rules in RFC 1766 and/or RFC 3066. These registered tags
   maintain their validity. Of those tags, those that were made
   obsolete or redundant by the advent of RFC 4646 or by subsequent
   registration of subtags are maintained in the registry in records of
   the "redundant" type. Those tags that would not be well-formed
   according to the ABNF in this document or that contain subtags that
   do not individually appear in the registry are maintained in the
   registry in records of the "grandfathered" type.
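
   The singleton rules of Sections 2.2.6 and 2.2.7 can be illustrated
   with a short sketch. The function name split_tag and its return
   shape are assumptions made for this illustration only; the sketch
   does not check subtag lengths or any other part of the ABNF, and it
   is not a conformance test.

```python
def split_tag(tag):
    """Split a tag into (main subtags, extensions, private use subtags).

    Hypothetical helper: raises ValueError when a singleton repeats,
    when a singleton has no subtag after it, or when an extension
    starts the tag.
    """
    subtags = tag.split("-")
    main, extensions, private = [], {}, []
    i = 0
    # Everything before the first single-character subtag is a
    # language, extended language, script, region, or variant subtag.
    while i < len(subtags) and len(subtags[i]) != 1:
        main.append(subtags[i])
        i += 1
    while i < len(subtags):
        singleton = subtags[i].lower()
        i += 1
        if singleton == "x":
            # All subtags after 'x' are private use, even single letters.
            private = subtags[i:]
            if not private:
                raise ValueError("'x' must be followed by a subtag")
            break
        if singleton in extensions:
            raise ValueError("singleton %r is repeated" % singleton)
        body = []
        while i < len(subtags) and len(subtags[i]) != 1:
            body.append(subtags[i])
            i += 1
        if not body:
            raise ValueError("singleton %r has no extension subtag" % singleton)
        extensions[singleton] = body
    if extensions and not main:
        raise ValueError("an extension must follow a primary language subtag")
    return main, extensions, private
```

   Applied to the examples in the text, "en-a-bbb-x-a-ccc" yields one
   extension 'a' and the private use subtags 'a' and 'ccc', while
   "en-a-bbb-a-ccc", "tlh-a-b-foo", and "a-value" are rejected.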

   Grandfathered tags contain one or more subtags that are not defined
   in the Language Subtag Registry (see Section 3). Redundant tags
   consist entirely of subtags defined above and whose independent
   registration was superseded by [RFC4646]. For more information see
   Section 3.8.

   Some grandfathered tags are "regular" in that they match the
   'langtag' production in Figure 1. In some cases, these tags could
   become redundant if their (currently unregistered) subtags were to be
   registered (as variants, for example). In other cases, although the
   subtags match the language tag pattern, the meaning assigned to the
   various subtags is prohibited by rules elsewhere in this document.
   Those tags can never become redundant.

   The remaining grandfathered tags are "irregular" and do not match the
   'langtag' production. These are listed in the 'irregular' production
   in Figure 1. These grandfathered tags can never become redundant.
   Many of these tags have been superseded by other registrations: their
   record contains a Preferred-Value field that really ought to be used
   to form language tags representing that value.

2.2.9. Classes of Conformance

   Implementations sometimes need to describe their capabilities with
   regard to the rules and practices described in this document. There
   are two classes of conforming implementations described by this
   document: "well-formed" processors and "validating" processors.
   Claims of conformance SHOULD explicitly reference one of these
   definitions.

   An implementation of "well-formed" checking of language tags MUST
   check that the tag and all of its subtags, including extension and
   private use subtags, conform to the ABNF. Note that irregular
   grandfathered tags are now listed in the 'irregular' production.

   An implementation of "well-formed" checking SHOULD check that
   singleton subtags that identify extensions do not repeat.
   For example, the tag "en-a-xx-b-yy-a-zz" is not well-formed.
   Well-formed processors are strongly encouraged to implement the
   canonicalization rules contained in Section 4.4.

   An implementation that claims to be validating MUST:

   o  Check that the tag is well-formed.

   o  Check that singleton subtags that identify extensions do not
      repeat.

   o  Specify the particular registry date for which the implementation
      performs validation of subtags.

   o  Check that either the tag is a grandfathered tag, or that all
      language, script, region, and variant subtags consist of valid
      codes for use in language tags according to the IANA registry as
      of the particular date specified by the implementation.

   o  Specify which, if any, extension RFCs as defined in Section 3.7
      are supported, including version, revision, and date.

   o  For any such extensions supported, check that all subtags used in
      that extension are valid.

   o  For extended language subtags, check that the tag matches the
      'Prefix' field associated with the subtag. The tag matches if the
      'Prefix' exactly matches the start of the tag. For example, the
      prefix "sgn-ase" matches the tag "sgn-ase-US" but does not match
      the tag "sgn-bvs-ase-US".

3. Registry Format and Maintenance

   This section defines the Language Subtag Registry and the maintenance
   and update procedures associated with it, as well as a registry for
   extensions to language tags (Section 3.7).

   The Language Subtag Registry contains a comprehensive list of all of
   the subtags valid in language tags. This allows implementers a
   straightforward and reliable way to validate language tags. The
   Language Subtag Registry will be maintained so that, except for
   extension subtags, it is possible to validate all of the subtags that
   appear in a language tag under the provisions of this document or its
   revisions or successors.
   In addition, the meaning of the various subtags will be unambiguous
   and stable over time. (The meaning of private use subtags, of
   course, is not defined by the IANA registry.)

3.1. Format of the IANA Language Subtag Registry

   The IANA Language Subtag Registry ("the registry") consists of a text
   file that is machine readable in the format described in this
   section, plus copies of the registration forms approved in accordance
   with the process described in Section 3.5. The existing registration
   forms for grandfathered and redundant tags taken from RFC 3066 will
   be maintained as part of the obsolete RFC 3066 registry. The
   remaining set of initial subtags will not have registration forms
   created for them.

3.1.1. File Format

   The registry is in the text format described below. This format was
   based on the record-jar format described in [record-jar].

   Each line of text is limited to 72 characters, including all
   whitespace. Records are separated by lines containing only the
   sequence "%%" (%x25.25).

   Each field can be viewed as a single, logical line of ASCII
   characters, comprising a field-name and a field-body separated by a
   COLON character (%x3A). For convenience, the field-body portion of
   this conceptual entity can be split into a multiple-line
   representation; this is called "folding". The format of the registry
   is described by the following ABNF (per [RFC4234]):

   registry   = record *("%%" CRLF record)
   record     = 1*( field-name *SP ":" *SP field-body CRLF )
   field-name = (ALPHA / DIGIT) [*(ALPHA / DIGIT / "-") (ALPHA / DIGIT)]
   field-body = *(ASCCHAR/LWSP)
   ASCCHAR    = %x21-25 / %x27-7E / UNICHAR ; Note: AMPERSAND is %x26
   UNICHAR    = "&#x" 2*6HEXDIG ";"

                     Figure 2: Registry Format ABNF

   The sequence '..' (%x2E.2E) in a field-body denotes a range of
   values.
   Such a range represents all subtags of the same length that are in
   alphabetic or numeric order within that range, including the values
   explicitly mentioned. For example, 'a..c' denotes the values 'a',
   'b', and 'c' and '11..13' denotes the values '11', '12', and '13'.

   Characters from outside the US-ASCII [ISO646] repertoire, as well as
   the AMPERSAND character ("&", %x26) when it occurs in a field-body,
   are represented by a "Numeric Character Reference" using hexadecimal
   notation in the style used by [XML10]. This consists of the
   sequence "&#x" (%x26.23.78) followed by a hexadecimal representation
   of the character's code point in [ISO10646] followed by a closing
   semicolon (%x3B). For example, the EURO SIGN, U+20AC, would be
   represented by the sequence "&#x20AC;". Note that the hexadecimal
   notation MAY have between two and six digits.

   All fields whose field-body contains a date value use the "full-date"
   format specified in [RFC3339]. For example: "2004-06-28" represents
   June 28, 2004, in the Gregorian calendar.

3.1.2. Record Definitions

   There are three types of records in the registry: "File-Date",
   "Subtag", and "Tag" records.

   The first record in the registry is a "File-Date" record. This
   record contains the single field whose field-name is "File-Date" (see
   Figure 3). The field-body of this record contains the last
   modification date of this copy of the registry, making it possible to
   compare different versions of the registry. The registry on the IANA
   website is the most current. Versions with an older date than that
   one are not up-to-date.

   File-Date: 2004-06-28
   %%

                Figure 3: Example of the File-Date Record

   Subsequent records represent either subtags or tags in the registry.
   "Subtag" records contain a field with a field-name of "Subtag",
   while, unsurprisingly, "Tag" records contain a field with a
   field-name of "Tag".
   Each of the fields in each record MUST occur no more than once,
   unless otherwise noted below. Each record MUST contain the following
   fields:

   o  'Type'

      *  Type's field-body MUST consist of one of the following strings:
         "language", "extlang", "script", "region", "variant",
         "grandfathered", or "redundant" and denotes the type of tag or
         subtag.

   o  Either 'Subtag' or 'Tag'

      *  Subtag's field-body contains the subtag being defined. This
         field MUST only appear in records whose 'Type' has one of
         these values: "language", "extlang", "script", "region", or
         "variant".

      *  Tag's field-body contains a complete language tag. This field
         MUST only appear in records whose 'Type' has one of these
         values: "grandfathered" or "redundant". Note that the
         field-body will always follow the 'grandfathered' production in
         the ABNF in Section 2.1.

   o  Description

      *  Description's field-body contains a non-normative description
         of the subtag or tag.

   o  Added

      *  Added's field-body contains the date the record was added to
         the registry.

   Each record MAY also contain the following fields:

   o  Preferred-Value

      *  For records of type 'script', 'region', and 'variant',
         'Preferred-Value' contains the subtag of the same 'Type' that
         is preferred for forming the language tag.

      *  For records of type 'language' and 'extlang', 'Preferred-Value'
         contains the language production (see Figure 1) that is
         preferred when forming the language tag. This can be simply a
         'language' subtag, or it can be a 'language' subtag followed by
         an extended language sequence.

      *  For records of type 'grandfathered' and 'redundant',
         'Preferred-Value' contains a canonical mapping to a complete
         language tag.

   o  Deprecated

      *  Deprecated's field-body contains the date the record was
         deprecated.

   o  Prefix

      *  Prefix's field-body contains a language tag with which this
         subtag MAY be used to form a new language tag, perhaps with
         other subtags as well. This field MUST only appear in records
         whose 'Type' field-body is 'variant' or 'extlang'. For
         example, the 'Prefix' for the variant 'nedis' is 'sl', meaning
         that the tags "sl-nedis" and "sl-IT-nedis" might be appropriate
         while the tag "is-nedis" is not.

   o  Comments

      *  Comments contains additional information about the subtag or
         tag, as deemed appropriate for understanding the registry and
         implementing language tags using the subtag or tag.

   o  Suppress-Script

      *  Suppress-Script contains a script subtag that SHOULD NOT be
         used to form language tags with the associated primary language
         subtag. This field MUST only appear in records whose 'Type'
         field-body is 'language'. See Section 4.1.

   Future versions of this document might add additional fields to the
   registry, so implementations SHOULD ignore fields found in the
   registry that are not defined in this document.

3.1.3. Subtag and Tag Fields

   The 'Subtag' field MUST use lowercase letters to form the subtag,
   with two exceptions. Subtags whose 'Type' field is 'script' (in
   other words, subtags defined by ISO 15924) MUST use titlecase.
   Subtags whose 'Type' field is 'region' (in other words, subtags
   defined by ISO 3166) MUST use uppercase. These exceptions mirror the
   use of case in the underlying standards.

   Each subtag in the tags contained in a 'Tag' field MUST be formatted
   using the rules in the preceding paragraph. That is, all subtags
   are lowercase except for subtags that represent script or region
   codes.

3.1.4. Description Field

   The field 'Description' contains a description of the tag or subtag
   in the record.
   The 'Description' field MAY appear more than once per record, that
   is, there can be multiple descriptions for a given record. At least
   one of the 'Description' fields MUST be written or transcribed into
   the Latin script; additional 'Description' fields MAY also include a
   description in a non-Latin script. Each 'Description' field MUST be
   unique, both within the record in which it appears and for the
   collection of records of the same type. Moreover, formatting
   variations of the same description MUST NOT occur in that specific
   record or in any other record of the same type. For example, while
   the ISO 639-1 code 'fy' contains both the descriptions "Western
   Frisian" and "Frisian, Western", only one of these descriptions
   appears in the registry.

   The 'Description' field is used for identification purposes and
   SHOULD NOT be taken to represent the actual native name of the
   language or variation or to be in any particular language.

   For records taken from a source standard (such as ISO 639 or ISO
   3166), the 'Description' value(s) SHOULD also be taken from the
   source standard. Multiple descriptions in the source standard MUST
   be split into separate 'Description' fields. The source standard's
   descriptions MAY be edited, either prior to insertion or via the
   registration process. For fields of type 'language' or 'extlang',
   the first 'Description' field appearing in the Registry corresponds
   to the Reference Name assigned by ISO 639-3. This helps facilitate
   cross-referencing between ISO 639 and the registry.

   When creating or updating a record due to the action of one of the
   source standards, the Language Subtag Reviewer SHOULD remove
   duplicate or redundant descriptions and MAY edit descriptions to
   correct irregularities in formatting prior to submitting the proposed
   record to the ietf-languages list.
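
   As a rough illustration of the record layout in Section 3.1.1 and of
   the repeated 'Description' fields described above, the following
   sketch reads registry text into records. The names parse_registry,
   decode_body, descriptions, and SAMPLE are hypothetical. The sketch
   assumes that a folded continuation line begins with whitespace, and
   it does not enforce the 72-character limit or the required-field
   rules of Section 3.1.2.

```python
import re

# "&#x" followed by two to six hexadecimal digits and ";" (Section 3.1.1).
NCR = re.compile(r"&#x([0-9A-Fa-f]{2,6});")

def decode_body(body):
    """Replace each Numeric Character Reference with its character."""
    return NCR.sub(lambda m: chr(int(m.group(1), 16)), body)

def parse_registry(text):
    """Return a list of records; each record is a list of (name, body)."""
    records, fields = [], []
    for line in text.splitlines():
        if line.strip() == "%%":
            # "%%" on a line by itself separates records.
            records.append(fields)
            fields = []
        elif line[:1] in (" ", "\t") and fields:
            # Folded line: it continues the previous field-body.
            name, body = fields[-1]
            fields[-1] = (name, body + " " + line.strip())
        elif ":" in line:
            name, body = line.split(":", 1)
            fields.append((name.strip(), body.strip()))
    if fields:
        records.append(fields)
    return [[(n, decode_body(b)) for n, b in rec] for rec in records]

def descriptions(record):
    """Collect every 'Description' field-body, in registry order."""
    return [body for name, body in record if name == "Description"]

SAMPLE = """File-Date: 2005-01-02
%%
Type: variant
Subtag: nedis
Description: Natisone dialect
Description: Nadiza dialect
Added: 2003-10-09
Prefix: sl
Comments: This is a comment shown
  as an example.
%%
"""
```

   Keeping each record as a list of pairs, rather than a dictionary,
   preserves repeated fields such as 'Description' and the order in
   which they appear.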

   Note: Descriptions in registry entries that correspond to ISO 639,
   ISO 15924, ISO 3166, or UN M.49 codes are intended only to indicate
   the meaning of that identifier as defined in the source standard at
   the time it was added to the registry. The description does not
   replace the content of the source standard itself. The descriptions
   are not intended to be the English localized names for the subtags.
   Localization or translation of language tag and subtag descriptions
   is out of scope of this document.

3.1.5. Deprecated Field

   The field 'Deprecated' MAY be added to any record via the maintenance
   process described in Section 3.3 or via the registration process
   described in Section 3.5. Usually, the addition of a 'Deprecated'
   field is due to the action of one of the standards bodies, such as
   ISO 3166, withdrawing a code. In some historical cases, it might not
   have been possible to reconstruct the original deprecation date. For
   these cases, an approximate date appears in the registry. Although
   valid in language tags, subtags and tags with a 'Deprecated' field
   are deprecated and validating processors SHOULD NOT generate these
   subtags. Note that a record that contains a 'Deprecated' field and
   no corresponding 'Preferred-Value' field has no replacement mapping.

3.1.6. Preferred-Value Field

   The field 'Preferred-Value' contains a mapping between the record in
   which it appears and another tag or subtag. The value in this field
   is strongly RECOMMENDED as the best choice to represent the value of
   this record when selecting a language tag. These values form three
   groups:

   1.  ISO 639 language codes that were later withdrawn in favor of
       other codes. These values are mostly a historical curiosity.

   2.  ISO 3166 region codes that have been withdrawn in favor of a new
       code.
       This sometimes happens when a country changes its name or
       administration in a way that warrants a new region code.

   3.  Grandfathered or redundant tags from RFC 3066. In many cases,
       these tags have become obsolete because the values they represent
       were later encoded by ISO 639.

   Records that contain a 'Preferred-Value' field MUST also have a
   'Deprecated' field. This field contains a date of deprecation.
   Thus, a language tag processor can use the registry to construct the
   valid, non-deprecated set of subtags for a given date. In addition,
   for any given tag, a processor can construct the set of valid
   language tags that correspond to that tag for all dates up to the
   date of the registry. The ability to do these mappings MAY be
   beneficial to applications that are matching, selecting, or
   filtering content based on its language tags.

   Note that 'Preferred-Value' mappings in records of type 'region'
   sometimes do not represent exactly the same meaning as the original
   value. There are many reasons for a country code to be changed, and
   the effect this has on the formation of language tags will depend on
   the nature of the change in question.

   In particular, the 'Preferred-Value' field does not imply retagging
   content that uses the affected subtag.

   The field 'Preferred-Value' MUST NOT be modified once created in the
   registry. The field MAY be added to records according to the rules
   in Section 3.3.

   The 'Preferred-Value' field in records of type "grandfathered" and
   "redundant" contains whole language tags that are strongly
   RECOMMENDED for use in place of the record's value. In many cases,
   the mappings were created by deprecation of the tags during the
   period before this document was adopted. For example, the tag
   "no-nyn" was deprecated in favor of the ISO 639-1-defined language
   code 'nn'.

3.1.7. Prefix Field

   The field of type 'Prefix' MUST NOT be removed from any record. The
   field-body for this type of field MUST NOT be modified.

   The field-body of the 'Prefix' field consists of a language tag whose
   subtags are appropriate to use with this subtag. For example, the
   variant subtag '1996' has a 'Prefix' field of "de". This means that
   tags starting with the sequence "de-" are appropriate with this
   subtag, so "de-Latg-1996" and "de-CH-1996" are both acceptable, while
   the tag "fr-1996" is an inappropriate choice.

   Records of type 'variant' MAY have more than one field of type
   'Prefix'. Additional fields of this type MAY be added to a 'variant'
   record via the registration process.

   Records of type 'extlang' MUST have _exactly_ one 'Prefix' field.

3.1.8. Comments Field

   The field 'Comments' MAY appear more than once per record. This
   field MAY be inserted or changed via the registration process and no
   guarantee of stability is provided. The content of this field is not
   restricted, except by the need to register the information, the
   suitability of the request, and by reasonable practical size
   limitations.

3.1.9. Suppress-Script Field

   The field 'Suppress-Script' MUST only appear in records whose 'Type'
   field-body is 'language'. This field MUST NOT appear more than one
   time in a record. This field indicates a script used to write the
   overwhelming majority of documents for the given language and that
   therefore adds no distinguishing information to a language tag. It
   helps ensure greater compatibility between the language tags
   generated according to the rules in this document and language tags
   and tag processors or consumers based on RFC 3066. For example,
   virtually all Icelandic documents are written in the Latin script,
   making the subtag 'Latn' redundant in the tag "is-Latn".

3.2. Language Subtag Reviewer

   The Language Subtag Reviewer is appointed by the IESG for an
   indefinite term, subject to removal or replacement at the IESG's
   discretion. The Language Subtag Reviewer moderates the
   ietf-languages mailing list, responds to requests for registration,
   and performs the other registry maintenance duties described in
   Section 3.3. Only the Language Subtag Reviewer is permitted to
   request IANA to change, update, or add records to the Language Subtag
   Registry. The Language Subtag Reviewer MAY delegate list moderation
   and other clerical duties as needed.

   The performance or decisions of the Language Subtag Reviewer MAY be
   appealed to the IESG under the same rules as other IETF decisions
   (see [RFC2026]). The IESG can reverse or overturn the decision of
   the Language Subtag Reviewer, provide guidance, or take other
   appropriate actions.

3.3. Maintenance of the Registry

   Maintenance of the registry requires that as codes are assigned or
   withdrawn by ISO 639, ISO 15924, ISO 3166, and UN M.49, the Language
   Subtag Reviewer MUST evaluate each change, determine whether it
   conflicts with existing registry entries, and submit the information
   to IANA for inclusion in the registry. If a change takes place and
   the Language Subtag Reviewer does not do this in a timely manner,
   then any interested party MAY use the procedure in Section 3.5 to
   register the appropriate update.

   Note: The redundant and grandfathered entries together are the
   complete list of tags registered under [RFC3066]. The redundant tags
   are those that can now be formed using the subtags defined in the
   registry together with the rules of Section 2.2. The grandfathered
   entries include those that can never be legal under those same
   provisions plus those tags that contain subtags not yet registered
   or, perhaps, inappropriate for registration.

   The set of redundant and grandfathered tags is permanent and stable:
   new entries in this section MUST NOT be added and existing entries
   MUST NOT be removed. Records of type 'grandfathered' MAY have their
   type converted to 'redundant'; see item 12 in Section 3.6 for more
   information. The decision-making process about which tags were
   initially grandfathered and which were made redundant is described in
   [RFC4645].

   RFC 3066 tags that were deprecated prior to the adoption of [RFC4646]
   are part of the list of grandfathered tags, and their component
   subtags were not included as registered variants (although they
   remain eligible for registration). For example, the tag "art-lojban"
   was deprecated in favor of the language subtag 'jbo'.

   The Language Subtag Reviewer MUST ensure that new subtags meet the
   requirements in Section 4.1 or submit an appropriate alternate subtag
   as described in that section. When either a change or addition to
   the registry is needed, the Language Subtag Reviewer MUST prepare the
   complete record, including all fields, and forward it to IANA for
   insertion into the registry. Each record being modified or inserted
   MUST be forwarded in a separate message.

   If a record represents a new subtag that does not currently exist in
   the registry, then the message's subject line MUST include the word
   "INSERT". If the record represents a change to an existing subtag,
   then the subject line of the message MUST include the word "MODIFY".
   The message MUST contain both the record for the subtag being
   inserted or modified and the new File-Date record.
   Here is an example of what the body of the message might contain:

   LANGUAGE SUBTAG MODIFICATION
   File-Date: 2005-01-02
   %%
   Type: variant
   Subtag: nedis
   Description: Natisone dialect
   Description: Nadiza dialect
   Added: 2003-10-09
   Prefix: sl
   Comments: This is a comment shown
     as an example.
   %%

       Figure 4: Example of a Language Subtag Modification Form

   Whenever an entry is created or modified in the registry, the
   'File-Date' record at the start of the registry is updated to reflect
   the most recent modification date in the [RFC3339] "full-date"
   format.

   Before forwarding a new registration to IANA, the Language Subtag
   Reviewer MUST ensure that values in the 'Subtag' field match case
   according to the description in Section 3.1.3.

3.4. Stability of IANA Registry Entries

   The stability of entries and their meaning in the registry is
   critical to the long-term stability of language tags. The rules in
   this section guarantee that a specific language tag's meaning is
   stable over time and will not change.

   These rules specifically deal with how changes to codes (including
   withdrawal and deprecation of codes) maintained by ISO 639, ISO
   15924, ISO 3166, and UN M.49 are reflected in the IANA Language
   Subtag Registry. Assignments to the IANA Language Subtag Registry
   MUST follow these stability rules:

   1.   Values in the fields 'Type', 'Subtag', 'Tag', 'Added',
        'Deprecated' and 'Preferred-Value' MUST NOT be changed and are
        guaranteed to be stable over time.

   2.   Values in the 'Description' field MUST NOT be changed in a way
        that would invalidate previously-existing tags. They MAY be
        broadened somewhat in scope, changed to add information, or
        adapted to the most common modern usage.
For example, countries 1275 occasionally change their official names; a historical example 1276 of this would be "Upper Volta" changing to "Burkina Faso". 1278 3. Values in the field 'Prefix' MAY be added to records of type 1279 'variant' via the registration process. If a prefix is added to 1280 a variant record, 'Comments' fields SHOULD be used to explain 1281 different usages with the various prefixes. 1283 4. Values in the field 'Prefix' in records of type 'variant' MAY be 1284 modified, so long as the modifications broaden the set of 1285 prefixes. That is, a prefix MAY be replaced by one of its own 1286 prefixes. For example, the prefix "en-US" could be replaced by 1287 "en", but not by the prefixes "en-Latn", "fr", or "en-US-boont". 1288 If one of those prefixes were needed, a new Prefix SHOULD be 1289 registered. 1291 5. Values in the field 'Prefix' in records of type 'extlang' MUST 1292 NOT be modified. 1294 6. Values in the field 'Prefix' MUST NOT be removed. 1296 7. The field 'Comments' MAY be added, changed, modified, or removed 1297 via the registration process or any of the processes or 1298 considerations described in this section. 1300 8. The field 'Suppress-Script' MAY be added or removed via the 1301 registration process. 1303 9. Codes assigned by ISO 639-1 that do not conflict with existing 1304 two-letter primary language subtags and which have no 1305 corresponding three-letter primary or extended language subtags 1306 defined in the registry are entered into the IANA registry as 1307 new records of type 'language'. 1309 10. Codes assigned by ISO 639-2 that do not conflict with existing 1310 three-letter primary or extended language subtags are entered 1311 into the IANA registry as new records of type 'language'. 1313 11. Codes assigned by ISO 639-3 that do not conflict with existing 1314 three-letter primary or extended language subtags are entered 1315 into the IANA registry as new records. 1317 1. 
Codes that have a defined "macro-language" mapping at the 1318 time of their registration MUST be entered into the registry 1319 as records of type 'extlang' with a 'Prefix' field 1320 containing the appropriate prefix tag. 1322 2. Codes that represent sign languages MUST be entered into the 1323 registry as records of type 'extlang' with a 'Prefix' field 1324 that matches the Basic Language Range "sgn" (see Section 1325 3.3.1 "Basic Filtering" in [RFC4647]). 1327 3. All other codes MUST be entered into the registry as records 1328 of type 'language'. 1330 12. A record of type 'language' or 'extlang' MUST NOT be registered 1331 if there exists a record of either type with the same subtag 1332 value. For example, if an 'extlang' subtag 'foo' exists in the 1333 registry, all attempts to register a 'language' subtag 'foo' 1334 will be rejected. 1336 13. Codes assigned by ISO 15924 and ISO 3166 that do not conflict 1337 with existing subtags of the associated type and whose meaning 1338 is not the same as an existing subtag of the same type are 1339 entered into the IANA registry as new records. 1341 14. Codes assigned by ISO 639, ISO 15924, or ISO 3166 that are 1342 withdrawn by their respective maintenance or registration 1343 authority remain valid in language tags. A 'Deprecated' field 1344 containing the date of withdrawal MUST be added to the record. 1345 If a new record of the same type is added that represents a 1346 replacement value, then a 'Preferred-Value' field MAY also be 1347 added. The registration process MAY be used to add comments 1348 about the withdrawal of the code by the respective standard. 1350 Example: The region code 'TL' was assigned to the country 'Timor- 1351 Leste', replacing the code 'TP' (which was assigned to 'East 1352 Timor' when it was under administration by Portugal). 
The 1353 subtag 'TP' remains valid in language tags, but its record 1354 contains a 'Preferred-Value' of 'TL' and its field 1355 'Deprecated' contains the date the new code was assigned 1356 ('2004-07-06'). 1358 15. Codes assigned by ISO 639, ISO 15924, or ISO 3166 that conflict 1359 with existing subtags of the associated type, including subtags 1360 that are deprecated, MUST NOT be entered into the registry. The 1361 following additional considerations apply to subtag values that 1362 are reassigned: 1364 A. For ISO 639 codes, if the newly assigned code's meaning is 1365 not represented by a subtag in the IANA registry, the 1366 Language Subtag Reviewer, as described in Section 3.5, SHALL 1367 prepare a proposal for entering in the IANA registry as soon 1368 as practical a registered language subtag as an alternate 1369 value for the new code. The form of the registered language 1370 subtag will be at the discretion of the Language Subtag 1371 Reviewer and MUST conform to other restrictions on language 1372 subtags in this document. 1374 B. For all subtags whose meaning is derived from an external 1375 standard (that is, by ISO 639, ISO 15924, ISO 3166, or UN 1376 M.49), if a new meaning is assigned to an existing code and 1377 the new meaning broadens the meaning of that code, then the 1378 meaning for the associated subtag MAY be changed to match. 1379 The meaning of a subtag MUST NOT be narrowed, however, as 1380 this can result in an unknown proportion of the existing 1381 uses of a subtag becoming invalid. Note: The ISO 639 1382 maintenance agency/registration authority (MA/RA) has 1383 adopted a similar stability policy. 1385 C. 
For ISO 15924 codes, if the newly assigned code's meaning is 1386 not represented by a subtag in the IANA registry, the 1387 Language Subtag Reviewer, as described in Section 3.5, SHALL 1388 prepare a proposal for entering in the IANA registry as soon 1389 as practical a registered variant subtag as an alternate 1390 value for the new code. The form of the registered variant 1391 subtag will be at the discretion of the Language Subtag 1392 Reviewer and MUST conform to other restrictions on variant 1393 subtags in this document. 1395 D. For ISO 3166 codes, if the newly assigned code's meaning is 1396 associated with the same UN M.49 code as another 'region' 1397 subtag, then the existing region subtag remains as the 1398 preferred value for that region and no new entry is created. 1399 A comment MAY be added to the existing region subtag 1400 indicating the relationship to the new ISO 3166 code. 1402 E. For ISO 3166 codes, if the newly assigned code's meaning is 1403 associated with a UN M.49 code that is not represented by an 1404 existing region subtag, then the Language Subtag Reviewer, 1405 as described in Section 3.5, SHALL prepare a proposal for 1406 entering the appropriate UN M.49 country code as an entry in 1407 the IANA registry. 1409 F. For ISO 3166 codes, if there is no associated UN numeric 1410 code, then the Language Subtag Reviewer SHALL petition the 1411 UN to create one. If there is no response from the UN 1412 within ninety days of the request being sent, the Language 1413 Subtag Reviewer SHALL prepare a proposal for entering in the 1414 IANA registry as soon as practical a registered variant 1415 subtag as an alternate value for the new code. The form of 1416 the registered variant subtag will be at the discretion of 1417 the Language Subtag Reviewer and MUST conform to other 1418 restrictions on variant subtags in this document. This 1419 situation is very unlikely to ever occur. 1421 16. 
UN M.49 has codes for both countries and areas (such as '276' 1422 for Germany) and geographical regions and sub-regions (such as 1423 '150' for Europe). UN M.49 country or area codes for which 1424 there is no corresponding ISO 3166 code SHOULD NOT be 1425 registered, except as a surrogate for an ISO 3166 code that is 1426 blocked from registration by an existing subtag. If such a code 1427 becomes necessary, then the registration authority for ISO 3166 1428 SHOULD first be petitioned to assign a code to the region. If 1429 the petition for a code assignment by ISO 3166 is refused or not 1430 acted on in a timely manner, the registration process described 1431 in Section 3.5 MAY then be used to register the corresponding UN 1432 M.49 code. This way, UN M.49 codes remain available as the 1433 value of last resort in cases where ISO 3166 reassigns a 1434 deprecated value in the registry. 1436 17. Stability provisions apply to grandfathered tags with this 1437 exception: should it be possible to compose one of the 1438 grandfathered tags from registered subtags, then the field 1439 'Type' in that record is changed from 'grandfathered' to 1440 'redundant'. Note that this will not affect language tags that 1441 match the grandfathered tag, since these tags will now match 1442 valid generative subtag sequences. For example, this document 1443 caused the ISO 639-3 code 'gan', used in the redundant tag "zh- 1444 gan", to be registered as an extended language subtag. The 1445 formerly-grandfathered tag "zh-gan" became a redundant tag as a 1446 result (but existing content or implementations that use "zh- 1447 gan" remain valid). 1449 3.5. Registration Procedure for Subtags 1451 The procedure given here MUST be used by anyone who wants to use a 1452 subtag not currently in the IANA Language Subtag Registry. 1454 Only subtags of type 'language' and 'variant' will be considered for 1455 independent registration of new subtags. 
Handling of subtags needed 1456 for stability and subtags necessary to keep the registry synchronized 1457 with ISO 639, ISO 15924, ISO 3166, and UN M.49 within the limits 1458 defined by this document are described in Section 3.3. Stability 1459 provisions are described in Section 3.4. 1461 This procedure MAY also be used to register or alter the information 1462 for the 'Description', 'Comments', 'Deprecated', 'Prefix', or 1463 'Suppress-Script' fields in a subtag's record as described in 1464 Section 3.4. Changes to all other fields in the IANA registry are 1465 NOT permitted. 1467 Registering a new subtag or requesting modifications to an existing 1468 tag or subtag starts with the requester filling out the registration 1469 form reproduced below. Note that each response is not limited in 1470 size so that the request can adequately describe the registration. 1471 The fields in the "Record Requested" section SHOULD follow the 1472 requirements in Section 3.1. 1474 LANGUAGE SUBTAG REGISTRATION FORM 1475 1. Name of requester: 1476 2. E-mail address of requester: 1477 3. Record Requested: 1479 Type: 1480 Subtag: 1481 Description: 1482 Prefix: 1483 Preferred-Value: 1484 Deprecated: 1485 Suppress-Script: 1486 Comments: 1488 4. Intended meaning of the subtag: 1489 5. Reference to published description 1490 of the language (book or article): 1491 6. Any other relevant information: 1493 Figure 5: The Language Subtag Registration Form 1494 The subtag registration form MUST be sent to 1495 for a two-week review period before it can 1496 be submitted to IANA. (This is an open list and can be joined by 1497 sending a request to . The list can 1498 be hosted by IANA or by any third party at the request of IESG.) 1500 Variant subtags are usually registered for use with a particular 1501 range of language tags. 
For example, the subtag 'rozaj' is intended 1502 for use with language tags that start with the primary language 1503 subtag "sl", since Resian is a dialect of Slovenian. Thus, the 1504 subtag 'rozaj' would be appropriate in tags such as "sl-Latn-rozaj" 1505 or "sl-IT-rozaj". This information is stored in the 'Prefix' field 1506 in the registry. Variant registration requests SHOULD include at 1507 least one 'Prefix' field in the registration form. 1509 Extended language subtags MUST include exactly one 'Prefix' field. 1511 The 'Prefix' field for a given registered subtag exists in the IANA 1512 registry as a guide to usage. Additional prefixes MAY be added by 1513 filing an additional registration form. In that form, the "Any other 1514 relevant information:" field MUST indicate that it is the addition of 1515 a prefix. 1517 Requests to add a prefix to a variant subtag that imply a different 1518 semantic meaning will probably be rejected. For example, a request 1519 to add the prefix "de" to the subtag 'nedis' so that the tag "de- 1520 nedis" represented some German dialect would be rejected. The 1521 'nedis' subtag represents a particular Slovenian dialect and the 1522 additional registration would change the semantic meaning assigned to 1523 the subtag. A separate subtag SHOULD be proposed instead. 1525 The 'Description' field MUST contain a description of the tag being 1526 registered written or transcribed into the Latin script; it MAY also 1527 include a description in a non-Latin script. Non-ASCII characters 1528 MUST be escaped using the syntax described in Section 3.1. The 1529 'Description' field is used for identification purposes; it does not 1530 necessarily represent the actual native name of the language or 1531 variation, nor does it need to be in any particular language. 
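The 'Prefix' guidance for variant subtags described above can be approximated by a simple mechanical check. The following Python sketch is illustrative only: the function names matches_prefix and variant_fits_prefix are hypothetical and are not defined by this document, and the list of prefixes is supplied by hand instead of being read from the registry. Because the 'Prefix' field is a guide to usage, a negative result flags an unusual tag; it does not by itself make the tag invalid.

```python
from typing import Iterable


def matches_prefix(tag: str, prefix: str) -> bool:
    """Return True if `tag` equals `prefix` or starts with `prefix` plus '-'.

    Comparison is case-insensitive, since case is not significant in tags.
    """
    tag_l, prefix_l = tag.lower(), prefix.lower()
    return tag_l == prefix_l or tag_l.startswith(prefix_l + "-")


def variant_fits_prefix(tag: str, variant: str, prefixes: Iterable[str]) -> bool:
    """Check whether `variant` appears in `tag` after one of its prefixes.

    `prefixes` stands in for the 'Prefix' values of the variant's registry
    record (an assumption of this sketch; no registry lookup is performed).
    """
    subtags = tag.lower().split("-")
    wanted = variant.lower()
    if wanted not in subtags[1:]:
        return False  # the variant is not used in this tag at all
    position = subtags.index(wanted, 1)
    head = "-".join(subtags[:position])  # everything before the variant
    return any(matches_prefix(head, p) for p in prefixes)


if __name__ == "__main__":
    print(variant_fits_prefix("sl-Latn-rozaj", "rozaj", ["sl"]))
    print(variant_fits_prefix("sl-IT-rozaj", "rozaj", ["sl"]))
    print(variant_fits_prefix("de-nedis", "nedis", ["sl"]))
```

With the prefix "sl", the first two calls accept "sl-Latn-rozaj" and "sl-IT-rozaj", while "de-nedis" is flagged because "de" does not begin with any listed prefix.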
1533 While the 'Description' field itself is not guaranteed to be stable 1534 and errata corrections MAY be undertaken from time to time, attempts 1535 to provide translations or transcriptions of entries in the registry 1536 itself will probably be frowned upon by the community or rejected 1537 outright, as changes of this nature have an impact on the provisions 1538 in Section 3.4. 1540 When the two-week period has passed, the Language Subtag Reviewer 1541 either forwards the record to be inserted or modified to 1542 iana@iana.org according to the procedure described in Section 3.3, or 1543 rejects the request because of significant objections raised on the 1544 list or due to problems with constraints in this document (which MUST 1545 be explicitly cited). The Language Subtag Reviewer MAY also extend 1546 the review period in two-week increments to permit further 1547 discussion. The Language Subtag Reviewer MUST indicate on the list 1548 whether the registration has been accepted, rejected, or extended 1549 following each two-week period. 1551 Note that the Language Subtag Reviewer MAY raise objections on the 1552 list if he or she so desires. The important thing is that the 1553 objection MUST be made publicly. 1555 The applicant is free to modify a rejected application with 1556 additional information and submit it again; this restarts the two- 1557 week comment period. 1559 Decisions made by the Language Subtag Reviewer MAY be appealed to the 1560 IESG [RFC2028] under the same rules as other IETF decisions 1561 [RFC2026]. 1563 All approved registration forms are available online in the directory 1564 http://www.iana.org/numbers.html under "languages". 1566 Updates or changes to existing records follow the same procedure as 1567 new registrations. 
The Language Subtag Reviewer decides whether 1568 there is consensus to update the registration following the two week 1569 review period; normally, objections by the original registrant will 1570 carry extra weight in forming such a consensus. 1572 Registrations are permanent and stable. Once registered, subtags 1573 will not be removed from the registry and will remain a valid way in 1574 which to specify a specific language or variant. 1576 Note: The purpose of the "Reference to published description" section 1577 in the registration form is to aid in verifying whether a language is 1578 registered or what language or language variation a particular subtag 1579 refers to. In most cases, reference to an authoritative grammar or 1580 dictionary of that language will be useful; in cases where no such 1581 work exists, other well-known works describing that language or in 1582 that language MAY be appropriate. The Language Subtag Reviewer 1583 decides what constitutes "good enough" reference material. This 1584 requirement is not intended to exclude particular languages or 1585 dialects due to the size of the speaker population or lack of a 1586 standardized orthography. Minority languages will be considered 1587 equally on their own merits. 1589 3.6. Possibilities for Registration 1591 Possibilities for registration of subtags or information about 1592 subtags include: 1594 o Primary language subtags for languages not listed in ISO 639 that 1595 are not variants of any listed or registered language MAY be 1596 registered. At the time this document was created, there were no 1597 examples of this form of subtag. Before attempting to register a 1598 language subtag, there MUST be an attempt to register the language 1599 with ISO 639. 
Subtags MUST NOT be registered for languages 1600 defined by codes that exist in ISO 639-1, ISO 639-2, or ISO 639-3, 1601 or that are under consideration by the ISO 639 registration 1602 authorities, or that have never been attempted for registration 1603 with those authorities. If ISO 639 has previously rejected a 1604 language for registration, it is reasonable to assume that there 1605 must be additional, very compelling evidence of need before it 1606 will be registered as a primary language subtag in the IANA 1607 registry (to the extent that it is very unlikely that any subtags 1608 will be registered of this type). 1610 o Dialect or other divisions or variations within a language, its 1611 orthography, writing system, regional or historical usage, 1612 transliteration or other transformation, or distinguishing 1613 variation MAY be registered as variant subtags. An example is the 1614 'rozaj' subtag (the Resian dialect of Slovenian). 1616 o The addition or maintenance of fields (generally of an 1617 informational nature) in Tag or Subtag records as described in 1618 Section 3.1 and subject to the stability provisions in 1619 Section 3.4. This includes descriptions, comments, deprecation 1620 and preferred values for obsolete or withdrawn codes, or the 1621 addition of script or extlang information to primary language 1622 subtags. 1624 o The addition of records and related field value changes necessary 1625 to reflect assignments made by ISO 639, ISO 15924, ISO 3166, and 1626 UN M.49 as described in Section 3.4. 1628 Subtags proposed for registration that would cause all or part of a 1629 grandfathered tag to become redundant but whose meaning conflicts 1630 with or alters the meaning of the grandfathered tag MUST be rejected. 1632 This document leaves the decision on what subtags or changes to 1633 subtags are appropriate (or not) to the registration process 1634 described in Section 3.5. 
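The stability provisions of Section 3.4 that bound these changes include the rule (item 4) that a 'Prefix' value in a variant record may only be replaced by one of its own prefixes. The Python sketch below illustrates that single rule; the function name is_broadening is hypothetical, and treating an unchanged value as "not a replacement" is a choice made for this sketch, not something the document specifies.

```python
def is_broadening(old_prefix: str, new_prefix: str) -> bool:
    """Return True if `new_prefix` is a proper leading part of `old_prefix`.

    Both values are split into subtags and compared case-insensitively, so
    that "en" broadens "en-US" but "en-Latn", "fr", and "en-US-boont" do not.
    """
    old = old_prefix.lower().split("-")
    new = new_prefix.lower().split("-")
    # The replacement must be strictly shorter and match subtag by subtag.
    return len(new) < len(old) and old[: len(new)] == new


if __name__ == "__main__":
    for candidate in ("en", "en-Latn", "fr", "en-US-boont"):
        print(candidate, is_broadening("en-US", candidate))
```

Comparing whole subtags, not raw strings, avoids treating a value such as "e" as a prefix of "en-US".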
1636 Note: four-character primary language subtags are reserved to allow 1637 for the possibility of alpha4 codes in some future addition to the 1638 ISO 639 family of standards. 1640 ISO 639 defines a maintenance agency for additions to and changes in 1641 the list of languages in ISO 639. This agency is: 1643 International Information Centre for Terminology (Infoterm) 1644 Aichholzgasse 6/12, AT-1120 1645 Wien, Austria 1646 Phone: +43 1 26 75 35 Ext. 312 Fax: +43 1 216 32 72 1648 ISO 639-2 defines a maintenance agency for additions to and changes 1649 in the list of languages in ISO 639-2. This agency is: 1651 Library of Congress 1652 Network Development and MARC Standards Office 1653 Washington, D.C. 20540 USA 1654 Phone: +1 202 707 6237 Fax: +1 202 707 0115 1655 URL: http://www.loc.gov/standards/iso639-2 1657 ISO 639-3 defines a maintenance agency for additions to and changes 1658 in the list of languages in ISO 639-3. This agency is: 1660 SIL International 1661 ISO 639-3 Registrar 1662 7500 W. Camp Wisdom Rd. 1663 Dallas, TX 75236 USA 1664 Phone: +1 972 708 7400, ext. 
2293 Fax: +1 972 708 7546 1665 Email: iso639-3@sil.org 1666 URL: http://www.sil.org/iso639-3 1668 The maintenance agency for ISO 3166 (country codes) is: 1670 ISO 3166 Maintenance Agency 1671 c/o International Organization for Standardization 1672 Case postale 56 1673 CH-1211 Geneva 20 Switzerland 1674 Phone: +41 22 749 72 33 Fax: +41 22 749 73 49 1675 URL: http://www.iso.org/iso/en/prods-services/iso3166ma/index.html 1677 The registration authority for ISO 15924 (script codes) is: 1679 Unicode Consortium Box 391476 1680 Mountain View, CA 94039-1476, USA 1681 URL: http://www.unicode.org/iso15924 1683 The Statistics Division of the United Nations Secretariat maintains 1684 the Standard Country or Area Codes for Statistical Use and can be 1685 reached at: 1687 Statistical Services Branch 1688 Statistics Division 1689 United Nations, Room DC2-1620 1690 New York, NY 10017, USA 1692 Fax: +1-212-963-0623 1693 E-mail: statistics@un.org 1694 URL: http://unstats.un.org/unsd/methods/m49/m49alpha.htm 1696 3.7. Extensions and Extensions Registry 1698 Extension subtags are those introduced by single-character subtags 1699 ("singletons") other than 'x'. They are reserved for the generation 1700 of identifiers that contain a language component and are compatible 1701 with applications that understand language tags. 1703 The structure and form of extensions are defined by this document so 1704 that implementations can be created that are forward compatible with 1705 applications that might be created using singletons in the future. 1706 In addition, defining a mechanism for maintaining singletons will 1707 lend stability to this document by reducing the likely need for 1708 future revisions or updates. 1710 Single-character subtags are assigned by IANA using the "IETF 1711 Consensus" policy defined by [RFC2434]. This policy requires the 1712 development of an RFC, which SHALL define the name, purpose, 1713 processes, and procedures for maintaining the subtags. 
The 1714 maintaining or registering authority, including name, contact email, 1715 discussion list email, and URL location of the registry, MUST be 1716 indicated clearly in the RFC. The RFC MUST specify or include each 1717 of the following: 1719 o The specification MUST reference the specific version or revision 1720 of this document that governs its creation and MUST reference this 1721 section of this document. 1723 o The specification and all subtags defined by the specification 1724 MUST follow the ABNF and other rules for the formation of tags and 1725 subtags as defined in this document. In particular, it MUST 1726 specify that case is not significant and that subtags MUST NOT 1727 exceed eight characters in length. 1729 o The specification MUST specify a canonical representation. 1731 o The specification of valid subtags MUST be available over the 1732 Internet and at no cost. 1734 o The specification MUST be in the public domain or available via a 1735 royalty-free license acceptable to the IETF and specified in the 1736 RFC. 1738 o The specification MUST be versioned, and each version of the 1739 specification MUST be numbered, dated, and stable. 1741 o The specification MUST be stable. That is, extension subtags, 1742 once defined by a specification, MUST NOT be retracted or change 1743 in meaning in any substantial way. 1745 o The specification MUST include in a separate section the 1746 registration form reproduced in this section (below) to be used in 1747 registering the extension upon publication as an RFC. 1749 o IANA MUST be informed of changes to the contact information and 1750 URL for the specification. 1752 IANA will maintain a registry of allocated single-character 1753 (singleton) subtags. This registry MUST use the record-jar format 1754 described by the ABNF in Section 3.1. 
Upon publication of an 1755 extension as an RFC, the maintaining authority defined in the RFC 1756 MUST forward this registration form to iesg@ietf.org, who MUST 1757 forward the request to iana@iana.org. The maintaining authority of 1758 the extension MUST maintain the accuracy of the record by sending an 1759 updated full copy of the record to iana@iana.org with the subject 1760 line "LANGUAGE TAG EXTENSION UPDATE" whenever content changes. Only 1761 the 'Comments', 'Contact_Email', 'Mailing_List', and 'URL' fields MAY 1762 be modified in these updates. 1764 Failure to maintain this record, maintain the corresponding registry, 1765 or meet other conditions imposed by this section of this document MAY 1766 be appealed to the IESG [RFC2028] under the same rules as other IETF 1767 decisions (see [RFC2026]) and MAY result in the authority to maintain 1768 the extension being withdrawn or reassigned by the IESG. 1770 %% 1771 Identifier: 1772 Description: 1773 Comments: 1774 Added: 1775 RFC: 1776 Authority: 1777 Contact_Email: 1778 Mailing_List: 1779 URL: 1780 %% 1782 Figure 6: Format of Records in the Language Tag Extensions Registry 1784 'Identifier' contains the single-character subtag (singleton) 1785 assigned to the extension. The Internet-Draft submitted to define 1786 the extension SHOULD specify which letter or digit to use, although 1787 the IESG MAY change the assignment when approving the RFC. 1789 'Description' contains the name and description of the extension. 1791 'Comments' is an OPTIONAL field and MAY contain a broader description 1792 of the extension. 1794 'Added' contains the date the RFC was published in the "full-date" 1795 format specified in [RFC3339]. For example: 2004-06-28 represents 1796 June 28, 2004, in the Gregorian calendar. 1798 'RFC' contains the RFC number assigned to the extension. 1800 'Authority' contains the name of the maintaining authority for the 1801 extension. 
1803 'Contact_Email' contains the email address used to contact the 1804 maintaining authority. 1806 'Mailing_List' contains the URL or subscription email address of the 1807 mailing list used by the maintaining authority. 1809 'URL' contains the URL of the registry for this extension. 1811 The determination of whether an Internet-Draft meets the above 1812 conditions and the decision to grant or withhold such authority rest 1813 solely with the IESG and are subject to the normal review and appeals 1814 process associated with the RFC process. 1816 Extension authors are strongly cautioned that many (including most 1817 well-formed) processors will be unaware of any special relationships 1818 or meaning inherent in the order of extension subtags. Extension 1819 authors SHOULD avoid subtag relationships or canonicalization 1820 mechanisms that interfere with matching or with length restrictions 1821 that sometimes exist in common protocols where the extension is used. 1822 In particular, applications MAY truncate the subtags in doing 1823 matching or in fitting into limited lengths, so it is RECOMMENDED 1824 that the most significant information be in the most significant 1825 (left-most) subtags and that the specification gracefully handle 1826 truncated subtags. 1828 When a language tag is to be used in a specific, known protocol, it 1829 is RECOMMENDED that the language tag not contain extensions not 1830 supported by that protocol. In addition, note that some protocols 1831 MAY impose upper limits on the length of the strings used to store or 1832 transport the language tag. 1834 3.8. Update of the Language Subtag Registry 1836 Upon adoption of this document the IANA Language Subtag Registry will 1837 need an update so that it contains the complete set of subtags valid 1838 in a language tag. This collection of subtags, along with a 1839 description of the process used to create it, is described by 1840 [registry-update]. 
IANA will publish the updated version of the 1841 registry described by this document using the instructions and 1842 content of [registry-update]. Once published by IANA, the 1843 maintenance procedures, rules, and registration processes described 1844 in this document will be available for new registrations or updates. 1846 Registrations that are in process under the rules defined in 1847 [RFC4646] when this document is adopted MUST be completed under the 1848 rules contained in this document. 1850 4. Formation and Processing of Language Tags 1852 This section addresses how to use the information in the registry 1853 with the tag syntax to choose, form, and process language tags. 1855 4.1. Choice of Language Tag 1857 One is sometimes faced with the choice between several possible tags 1858 for the same body of text. 1860 Interoperability is best served when all users use the same language 1861 tag in order to represent the same language. If an application has 1862 requirements that make the rules here inapplicable, then that 1863 application risks damaging interoperability. It is strongly 1864 RECOMMENDED that users not define their own rules for language tag 1865 choice. 1867 Subtags SHOULD only be used where they add useful distinguishing 1868 information; extraneous subtags interfere with the meaning, 1869 understanding, and processing of language tags. In particular, users 1870 and implementations SHOULD follow the 'Prefix' and 'Suppress-Script' 1871 fields in the registry (defined in Section 3.1): these fields provide 1872 guidance on when specific additional subtags SHOULD (and SHOULD NOT) 1873 be used in a language tag. 1875 Of particular note, many applications can benefit from the use of 1876 script subtags in language tags, as long as the use is consistent for 1877 a given context. 
Script subtags were not formally defined in RFC 1878 3066 and their use can affect matching and subtag identification by 1879 implementations of RFC 3066, as these subtags appear between the 1880 primary language and region subtags. For example, if a user requests 1881 content in an implementation of Section 2.5 of [RFC3066] using the 1882 language range "en-US", content labeled "en-Latn-US" will not match 1883 the request. Therefore, it is important to know when script subtags 1884 will customarily be used and when they ought not be used. In the 1885 registry, the Suppress-Script field helps ensure greater 1886 compatibility between the language tags generated according to the 1887 rules in this document and language tags and tag processors or 1888 consumers based on RFC 3066 by defining when users SHOULD NOT include 1889 a script subtag with a particular primary language subtag. 1891 Extended language subtags (type 'extlang' in the registry; see 1892 Section 3.1) also appear between the primary language and region 1893 subtags. Applications might benefit from their judicious use in 1894 forming language tags. [[ guidelines here?? ]] 1896 Standards, protocols, and applications that reference this document 1897 normatively but apply different rules to the ones given in this 1898 section MUST specify how the procedure varies from the one given 1899 here. 1901 The choice of subtags used to form a language tag SHOULD be guided by 1902 the following rules: 1904 1. Use as precise a tag as possible, but no more specific than is 1905 justified. Avoid using subtags that are not important for 1906 distinguishing content in an application. 1908 * For example, 'de' might suffice for tagging an email written 1909 in German, while "de-CH-1996" is probably unnecessarily 1910 precise for such a task. 1912 2. The script subtag SHOULD NOT be used to form language tags unless 1913 the script adds some distinguishing information to the tag. 
The 1914 field 'Suppress-Script' in the primary language record in the 1915 registry indicates script subtags that do not add distinguishing 1916 information for most applications. 1918 * For example, the subtag 'Latn' should not be used with the 1919 primary language 'en' because nearly all English documents are 1920 written in the Latin script and it adds no distinguishing 1921 information. However, if a document were written in English 1922 mixing Latin script with another script such as Braille 1923 ('Brai'), then it might be appropriate to choose to indicate 1924 both scripts to aid in content selection, such as the 1925 application of a style sheet. 1927 3. Use specific language subtags or subtag sequences in preference 1928 to subtags for language collections. A "language collection" is 1929 a subtag derived from one of the ISO 639-2 codes that represents 1930 multiple related languages. For example, the code 'nai' 1931 represents "North American languages". The registry contains 1932 values for the specific languages represented by this collective 1933 code. For example 'xxx' (language1) and 'yyy' (language2). Note 1934 that the languages contained in a collection (such as the two 1935 examples shown) are often unrelated except for their inclusion in 1936 the collection. 1938 4. If a tag or subtag has a 'Preferred-Value' field in its registry 1939 entry, then the value of that field SHOULD be used to form the 1940 language tag in preference to the tag or subtag in which the 1941 preferred value appears. 1943 * For example, use 'he' for Hebrew in preference to 'iw'. 1945 5. The 'und' (Undetermined) primary language subtag SHOULD NOT be 1946 used to label content, even if the language is unknown. Omitting 1947 the language tag altogether is preferred to using a tag with a 1948 primary language subtag of 'und'. The 'und' subtag MAY be useful 1949 for protocols that require a language tag to be provided. 
    The 'und' subtag MAY also be useful when matching language tags in certain situations.

6.  The 'mul' (Multiple) primary language subtag SHOULD NOT be used whenever the protocol allows separate tags for multiple languages, as is the case for the Content-Language header in HTTP. The 'mul' subtag conveys little useful information: content in multiple languages SHOULD individually tag the languages where they appear or otherwise indicate the actual language in preference to the 'mul' subtag.

7.  The same variant subtag SHOULD NOT be used more than once within a language tag.

    *  For example, do not use "de-DE-1901-1901".

To ensure consistent backward compatibility, this document contains several provisions to account for potential instability in the standards used to define the subtags that make up language tags. These provisions mean that no language tag created under the rules in this document will become obsolete.

4.2.  Meaning of the Language Tag

The relationship between the tag and the information it relates to is defined by the context in which the tag appears. Accordingly, this section gives only possible examples of its usage.

o  For a single information object, the associated language tags might be interpreted as the set of languages that is necessary for complete comprehension of the whole object. Example: Plain text documents.

o  For an aggregation of information objects, the associated language tags could be taken as the set of languages used inside components of that aggregation. Examples: Document stores and libraries.

o  For information objects whose purpose is to provide alternatives, the associated language tags could be regarded as a hint that the content is provided in several languages and that one has to inspect each of the alternatives in order to find its language or languages.
   In this case, the presence of multiple tags might not mean that one needs to be multi-lingual to get complete understanding of the document. Example: MIME multipart/alternative.

o  In markup languages, such as HTML and XML, language information can be added to each part of the document identified by the markup structure (including the whole document itself). For example, one could write <span lang="fr">C'est la vie.</span> inside a Norwegian document; the Norwegian-speaking user could then access a French-Norwegian dictionary to find out what the marked section meant. If the user were listening to that document through a speech synthesis interface, this information could be used to signal the synthesizer to appropriately apply French text-to-speech pronunciation rules to that span of text, instead of applying the inappropriate Norwegian rules.

Language tags are related when they contain a similar sequence of subtags. For example, if a language tag B contains language tag A as a prefix, then B is typically "narrower" or "more specific" than A. Thus, "zh-Hant-TW" is more specific than "zh-Hant".

This relationship is not guaranteed in all cases: specifically, languages that begin with the same sequence of subtags are NOT guaranteed to be mutually intelligible, although they might be. For example, the tag "az" shares a prefix with both "az-Latn" (Azerbaijani written using the Latin script) and "az-Cyrl" (Azerbaijani written using the Cyrillic script). A person fluent in one script might not be able to read the other, even though the text might be identical. Content tagged as "az" most probably is written in just one script and thus might not be intelligible to a reader familiar with the other script.

4.3.  Length Considerations

There is no defined upper limit on the size of language tags.
While historically most language tags have consisted of language and region subtags with a combined total length of up to six characters, larger tags have always been possible and have actually appeared in use.

Neither the language tag syntax nor other requirements in this document impose a fixed upper limit on the number of subtags in a language tag (and thus an upper bound on the size of a tag). The language tag syntax suggests that, depending on the specific language, more subtags (and thus a longer tag) are sometimes necessary to completely identify the language for certain applications; thus, it is possible to envision long or complex subtag sequences.

4.3.1.  Working with Limited Buffer Sizes

Some applications and protocols are forced to allocate fixed buffer sizes or otherwise limit the length of a language tag. A conformant implementation or specification MAY refuse to support the storage of language tags that exceed a specified length. Any such limitation SHOULD be clearly documented, and such documentation SHOULD include what happens to longer tags (for example, whether an error value is generated or the language tag is truncated). A protocol that allows tags to be truncated at an arbitrary limit, without giving any indication of what that limit is, has the potential for causing harm by changing the meaning of tags in substantial ways.

In practice, most language tags do not require more than a few subtags and will not approach reasonably sized buffer limitations; see Section 4.1.

Some specifications or protocols have limits on tag length but do not have a fixed length limitation.
For example, [RFC2231] has no explicit length limitation: the length available for the language tag is constrained by the length of other header components (such as the charset's name) coupled with the 76-character limit in [RFC2047]. Thus, the "limit" might be 50 or more characters, but it could potentially be quite small.

The considerations for assigning a buffer limit are:

   Implementations SHOULD NOT truncate language tags unless the meaning of the tag is purposefully being changed, or unless the tag does not fit into a limited buffer size specified by a protocol for storage or transmission.

   Implementations SHOULD warn the user when a tag is truncated since truncation changes the semantic meaning of the tag.

   Implementations of protocols or specifications that are space constrained but do not have a fixed limit SHOULD use the longest possible tag in preference to truncation.

   Protocols or specifications that specify limited buffer sizes for language tags MUST allow for language tags of up to 33 characters.

   Protocols or specifications that specify limited buffer sizes for language tags SHOULD allow for language tags of at least 42 characters.

The following illustration shows how the 42-character recommendation was derived. The combination of language and extended language subtags was chosen for future compatibility.
At up to 15 characters, this combination is longer than the longest possible primary language subtag (8 characters):

   language      =  3 (ISO 639-2; ISO 639-1 requires 2)
   extlang1      =  4 (each subsequent subtag includes '-')
   extlang2      =  4 (unlikely: needs prefix="language-extlang1")
   extlang3      =  4 (extremely unlikely)
   script        =  5 (if not suppressed: see Section 4.1)
   region        =  4 (UN M.49; ISO 3166 requires 3)
   variant1      =  9 (MUST have language as a prefix)
   variant2      =  9 (MUST have language-variant1 as a prefix)

   total         = 42 characters

          Figure 7: Derivation of the Limit on Tag Length

4.3.2.  Truncation of Language Tags

Truncation of a language tag alters the meaning of the tag, and thus SHOULD be avoided. However, truncation of language tags is sometimes necessary due to limited buffer sizes. Such truncation MUST NOT permit a subtag to be chopped off in the middle or the formation of invalid tags (for example, one ending with the "-" character).

This means that applications or protocols that truncate tags MUST do so by progressively removing subtags along with their preceding "-" from the right side of the language tag until the tag is short enough for the given buffer. If the resulting tag ends with a single-character subtag, that subtag and its preceding "-" MUST also be removed. For example:

   Tag to truncate: zh-Latn-CN-variant1-a-extend1-x-wadegile-private1
   1. zh-Latn-CN-variant1-a-extend1-x-wadegile
   2. zh-Latn-CN-variant1-a-extend1
   3. zh-Latn-CN-variant1
   4. zh-Latn-CN
   5. zh-Latn
   6. zh

               Figure 8: Example of Tag Truncation

4.4.  Canonicalization of Language Tags

Since a particular language tag is sometimes used by many processes, language tags SHOULD always be created or generated in a canonical form.

A language tag is in canonical form when:

1.  The tag is well-formed according to the rules in Section 2.1 and Section 2.2.

2.  Subtags of type 'Region' that have a Preferred-Value mapping in the IANA registry (see Section 3.1) SHOULD be replaced with their mapped value. Note: In rare cases, the mapped value will also have a Preferred-Value.

3.  Redundant or grandfathered tags that have a Preferred-Value mapping in the IANA registry (see Section 3.1) MUST be replaced with their mapped value. These items either are deprecated mappings created before the adoption of this document (such as the mapping of "no-nyn" to "nn" or "i-klingon" to "tlh") or are the result of later registrations or additions to this document (for example, "zh-guoyu" might be mapped to a language-extlang combination such as "zh-cmn" by some future update of this document).

4.  Other subtags that have a Preferred-Value mapping in the IANA registry (see Section 3.1) MUST be replaced with their mapped value. These items consist entirely of clerical corrections to ISO 639-1 in which the deprecated subtags have been maintained for compatibility purposes.

5.  If more than one extension subtag sequence exists, the extension sequences are ordered into case-insensitive ASCII order by singleton subtag.

Example: The language tag "en-A-aaa-B-ccc-bbb-x-xyz" is in canonical form, while "en-B-ccc-bbb-A-aaa-X-xyz" is well-formed but not in canonical form.

Example: The language tag "en-BU" (English as used in Burma) is not canonical because the 'BU' subtag has a canonical mapping to 'MM' (Myanmar), although the tag "en-BU" maintains its validity.

Canonicalization of language tags does not imply anything about the use of upper or lowercase letters when processing or comparing subtags (as described in Section 2.1). All comparisons MUST be performed in a case-insensitive manner.
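
The canonical-form rules and the two examples above can be sketched in code. The following Python fragment is an illustration only, not a normative algorithm: the function names (same_tag, order_extensions, canonicalize) and the two small mapping tables are hypothetical, the tables contain only the mappings mentioned in this section rather than real registry data, and the input is assumed to be well-formed already. It does not regularize case.

```python
# Hypothetical stand-ins for Preferred-Value data; a real implementation
# would load these mappings from the IANA Language Subtag Registry.
WHOLE_TAG_PREFERRED = {"no-nyn": "nn", "i-klingon": "tlh"}
REGION_PREFERRED = {"bu": "MM"}


def same_tag(a: str, b: str) -> bool:
    """Compare tags case-insensitively; str.lower() ignores the locale."""
    return a.lower() == b.lower()


def order_extensions(tag: str) -> str:
    """Order extension sequences by singleton; private use ('x') stays last."""
    subtags = tag.split("-")
    head, extensions, private = [], [], []
    i = 0
    # Subtags before the first single-character subtag are left untouched.
    while i < len(subtags) and len(subtags[i]) != 1:
        head.append(subtags[i])
        i += 1
    while i < len(subtags):
        if subtags[i].lower() == "x":
            private = subtags[i:]  # everything after 'x' is private use
            break
        sequence = [subtags[i]]
        i += 1
        while i < len(subtags) and len(subtags[i]) != 1:
            sequence.append(subtags[i])
            i += 1
        extensions.append(sequence)
    # Case-insensitive ASCII order by singleton (rule 5).
    extensions.sort(key=lambda sequence: sequence[0].lower())
    ordered = [s for sequence in extensions for s in sequence]
    return "-".join(head + ordered + private)


def canonicalize(tag: str) -> str:
    """Apply whole-tag and region Preferred-Values, then order extensions."""
    preferred = WHOLE_TAG_PREFERRED.get(tag.lower())
    if preferred is not None:
        return preferred
    subtags = tag.split("-")
    for i, subtag in enumerate(subtags):
        if len(subtag) == 1:
            break  # a singleton starts the extension or private use part
        if i > 0 and len(subtag) == 2:
            # A non-initial two-letter subtag before any singleton is a region.
            subtags[i] = REGION_PREFERRED.get(subtag.lower(), subtag)
    return order_extensions("-".join(subtags))


print(canonicalize("en-BU"))                     # en-MM
print(canonicalize("en-B-ccc-bbb-A-aaa-X-xyz"))  # en-A-aaa-B-ccc-bbb-X-xyz
```

The second result differs from the canonical example only in the case of the private use singleton, which same_tag treats as equal; this matches the statement that canonical form does not depend on case.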

When performing canonicalization of language tags, processors MAY regularize the case of the subtags (that is, this process is OPTIONAL), following the case used in the registry. Note that this corresponds to the following casing rules: uppercase all non-initial two-letter subtags; titlecase all non-initial four-letter subtags; lowercase everything else.

Note: Case folding of ASCII letters in certain locales, unless carefully handled, sometimes produces non-ASCII character values. The Unicode Character Database file "SpecialCasing.txt" defines the specific cases that are known to cause problems with this. In particular, the letter 'i' (U+0069) in Turkish and Azerbaijani is uppercased to U+0130 (LATIN CAPITAL LETTER I WITH DOT ABOVE). Implementers SHOULD specify a locale-neutral casing operation to ensure that case folding of subtags does not produce this value, which is illegal in language tags. For example, if one were to uppercase the region subtag 'in' using Turkish locale rules, the sequence U+0130 U+004E would result instead of the expected 'IN'.

Note: if the field 'Deprecated' appears in a registry record without an accompanying 'Preferred-Value' field, then that tag or subtag is deprecated without a replacement. Validating processors SHOULD NOT generate tags that include these values, although the values are canonical when they appear in a language tag.

An extension MUST define any relationships that exist between the various subtags in the extension and thus MAY define an alternate canonicalization scheme for the extension's subtags. Extensions MAY define how the order of the extension's subtags is interpreted. For example, an extension could define that its subtags are in canonical order when the subtags are placed into ASCII order: that is, "en-a-aaa-bbb-ccc" instead of "en-a-ccc-bbb-aaa".
Another extension might define that the order of the subtags influences their semantic meaning (so that "en-b-ccc-bbb-aaa" has a different value from "en-b-aaa-bbb-ccc"). However, extension specifications SHOULD be designed so that they are tolerant of the typical processes described in Section 3.7.

4.5.  Considerations for Private Use Subtags

Private use subtags, like all other subtags, MUST conform to the format and content constraints in the ABNF. Private use subtags have no meaning outside the private agreement between the parties that intend to use or exchange language tags that employ them. The same subtags MAY be used with a different meaning under a separate private agreement. They SHOULD NOT be used where alternatives exist and SHOULD NOT be used in content or protocols intended for general use.

Private use subtags are simply useless for information exchange without prior arrangement. The value and semantic meaning of private use tags and of the subtags used within such a language tag are not defined by this document.

Subtags defined in the IANA registry as having a specific private use meaning convey more information than a purely private use tag prefixed by the singleton subtag 'x'. For applications, this additional information MAY be useful.

For example, the region subtags 'AA', 'ZZ', and in the ranges 'QM'-'QZ' and 'XA'-'XZ' (derived from ISO 3166 private use codes) MAY be used to form a language tag. A tag such as "zh-Hans-XQ" conveys a great deal of public, interchangeable information about the language material (that it is Chinese in the simplified Chinese script and is suitable for some geographic region 'XQ').
While the precise geographic region is not known outside of private agreement, the tag conveys far more information than an opaque tag such as "x-someLang", which contains no information about the language subtag or script subtag outside of the private agreement.

However, in some cases content tagged with private use subtags MAY interact with other systems in a different and possibly unsuitable manner compared to tags that use opaque, privately defined subtags, so the choice of the best approach sometimes depends on the particular domain in question.

5.  IANA Considerations

This section deals with the processes and requirements necessary for IANA to undertake to maintain the subtag and extension registries as defined by this document and in accordance with the requirements of [RFC2434].

The impact on the IANA maintainers of the two registries defined by this document will be a small increase in the frequency of new entries or updates.

5.1.  Language Subtag Registry

Upon adoption of this document, IANA will update the registry using instructions and content provided in a companion document: [registry-update]. The criteria and process for selecting the updated set of records are described in that document. The updated set of records has no impact on IANA, since the work to create it will be performed externally.

Future work on the Language Subtag Registry is limited to inserting or replacing whole records preformatted for IANA by the Language Subtag Reviewer as described in Section 3.3 of this document and archiving the forwarded registration form.

Each record MUST be sent to iana@iana.org with a subject line indicating whether the enclosed record is an insertion of a new record (indicated by the word "INSERT" in the subject line) or a replacement of an existing record (indicated by the word "MODIFY" in the subject line). Records MUST NOT be deleted from the registry. IANA MUST place any inserted or modified records into the appropriate section of the language subtag registry, grouping the records by their 'Type' field. Inserted records MAY be placed anywhere in the appropriate section; there is no guarantee of the order of the records beyond grouping them together by 'Type'. Modified records MUST overwrite the record they replace.

Included in any request to insert or modify records MUST be a new File-Date record. This record MUST be placed first in the registry. In the event that the File-Date record present in the registry has a later date than the record being inserted or modified, the existing record MUST be preserved.

5.2.  Extensions Registry

The Language Tag Extensions Registry can contain at most 35 records and thus changes to this registry are expected to be very infrequent.

Future work by IANA on the Language Tag Extensions Registry is limited to two cases. First, the IESG MAY request that new records be inserted into this registry from time to time. These requests MUST include the record to insert in the exact format described in Section 3.7. In addition, there MAY be occasional requests from the maintaining authority for a specific extension to update the contact information or URLs in the record. These requests MUST include the complete, updated record. IANA is not responsible for validating the information provided, only for verifying that it is properly formatted.
Such a request should reasonably be seen to come from the maintaining authority named in the record present in the registry.

6.  Security Considerations

Language tags used in content negotiation, like any other information exchanged on the Internet, might be a source of concern because they might be used to infer the nationality of the sender, and thus identify potential targets for surveillance.

This is a special case of the general problem that anything sent is visible to the receiving party and possibly to third parties as well. It is useful to be aware that such concerns can exist in some cases.

The evaluation of the exact magnitude of the threat, and any possible countermeasures, is left to each application protocol (see BCP 72 [RFC3552] for best current practice guidance on security threats and defenses).

The language tag associated with a particular information item is of no consequence whatsoever in determining whether that content might contain possible homographs. The fact that a text is tagged as being in one language or using a particular script subtag provides no assurance whatsoever that it does not contain characters from scripts other than the one(s) associated with or specified by that language tag.

Since there is no limit to the number of variant, private use, and extension subtags, and consequently no limit on the possible length of a tag, implementations need to guard against buffer overflow attacks. See Section 4.3 for details on language tag truncation, which can occur as a consequence of defenses against buffer overflow.

Although the specification of valid subtags for an extension (see Section 3.7) MUST be available over the Internet, implementations SHOULD NOT mechanically depend on it being always accessible, to prevent denial-of-service attacks.

7.  Character Set Considerations

The syntax in this document requires that language tags use only the characters A-Z, a-z, 0-9, and HYPHEN-MINUS, which are present in most character sets, so the composition of language tags should not have any character set issues.

Rendering of characters based on the content of a language tag is not addressed in this memo. Historically, some languages have relied on the use of specific character sets or other information in order to infer how a specific character should be rendered (notably this applies to language- and culture-specific variations of Han ideographs as used in Japanese, Chinese, and Korean). When language tags are applied to spans of text, rendering engines sometimes use that information in deciding which font to use in the absence of other information, particularly where languages with distinct writing traditions use the same characters.

8.  Changes from RFC 4646

The main goal for this revision of this document was to incorporate ISO 639-3 and its attendant set of language codes into the IANA Language Subtag Registry, permitting the identification of many more languages and dialects than previously supported.

The specific changes in this document to meet these goals are:

o  Defines the incorporation of ISO 639-3 codes as language and extlang subtags. Extlangs are now permitted in language tags. The changes necessary to achieve this were:

   *  something

o  Changed the ABNF related to grandfathered tags. The irregular tags are now listed. Users of RFC 4646 sometimes made the mistake of implementing the grandfathered ABNF without checking the actual list of tags, thus allowing some illegal tags. Also: added description of both types of grandfathered tags to Section 2.2.8.

o  Added the paragraph on "collections" to Section 4.1.

o  Changed the capitalization rules for 'Tag' fields in Section 3.1.

o  Split Section 3.1 up into subsections.

o  Modified Section 3.5 to allow Suppress-Script fields to be added, modified, or removed via the registration process. This was an erratum from RFC 4646.

o  Modified examples that used region code 'CS' (formerly Serbia and Montenegro) to use 'RS' (Serbia) instead.

o  Modified the rules for creating and maintaining record 'Description' fields to prevent duplicates, including inverted duplicates.

o  Removed the lengthy description of why RFC 4646 was created from this section, which also caused the removal of the reference to XML Schema.

o  Modified the ABNF to eliminate the redundant use of upper and lowercase letters in the productions (for example, the sequence ("x"/"X") in the private-use production is now ("x")). The text in Section 2.1 was also edited and rearranged to place more emphasis on this fact.

o  Replaced the example "fr-Latn-CA" in Section 2.1 with "sr-Latn-RS" and "az-Arab-IR" because "fr-Latn-CA" doesn't respect the Suppress-Script on 'Latn' with 'fr'.

o  Changed the requirements for well-formedness to make singleton repetition checking optional (it is required for validity checking) in Section 2.2.9.

o  Changed the note about irregular grandfathered tags in the ABNF to say 'grandfathered tags that don't match langtag'.

o  Changed the text in Section 2.2.9 referring to grandfathered checking to note that the list is now included in the ABNF.

o  Added text to Section 3.2 making clear that the Language Subtag Reviewer may delegate various non-critical duties, including list moderation.

o  Added text to Section 3.5 clarifying that the ietf-languages list is operated by whomever the IESG appoints.

o  Added text to Section 3.1.4 clarifying that the first Description in a 'language' or 'extlang' record matches the corresponding Reference Name for the language in ISO 639-3.

o  Added text to the end of Section 3.1.2 noting that future versions of this document might add new field types and recommending that implementations ignore any unrecognized fields.

9.  References

9.1.  Normative References

[ISO10646]  International Organization for Standardization, "ISO/IEC 10646:2003. Information technology -- Universal Multiple-Octet Coded Character Set (UCS)", 2003.

[ISO15924]  International Organization for Standardization, "ISO 15924:2004. Information and documentation -- Codes for the representation of names of scripts", January 2004.

[ISO3166-1]  International Organization for Standardization, "ISO 3166-1:1997. Codes for the representation of names of countries and their subdivisions -- Part 1: Country codes", 1997.

[ISO639-1]  International Organization for Standardization, "ISO 639-1:2002. Codes for the representation of names of languages -- Part 1: Alpha-2 code", 2002.

[ISO639-2]  International Organization for Standardization, "ISO 639-2:1998. Codes for the representation of names of languages -- Part 2: Alpha-3 code, first edition", 1998.

[ISO646]  International Organization for Standardization, "ISO/IEC 646:1991, Information technology -- ISO 7-bit coded character set for information interchange.", 1991.

[RFC2026]  Bradner, S., "The Internet Standards Process -- Revision 3", BCP 9, RFC 2026, October 1996.

[RFC2028]  Hovey, R. and S. Bradner, "The Organizations Involved in the IETF Standards Process", BCP 11, RFC 2028, October 1996.

[RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC2434]  Narten, T. and H. Alvestrand, "Guidelines for Writing an IANA Considerations Section in RFCs", BCP 26, RFC 2434, October 1998.

[RFC2860]  Carpenter, B., Baker, F., and M. Roberts, "Memorandum of Understanding Concerning the Technical Work of the Internet Assigned Numbers Authority", RFC 2860, June 2000.

[RFC3339]  Klyne, G. and C. Newman, "Date and Time on the Internet: Timestamps", RFC 3339, July 2002.

[RFC4234]  Crocker, D. and P. Overell, "Augmented BNF for Syntax Specifications: ABNF", RFC 4234, October 2005.

[RFC4645]  Ewell, D., Ed., "Initial Language Subtag Registry", September 2006, .

[RFC4647]  Phillips, A., Ed. and M. Davis, Ed., "Matching of Language Tags", September 2006, .

[UN_M.49]  Statistics Division, United Nations, "Standard Country or Area Codes for Statistical Use", UN Standard Country or Area Codes for Statistical Use, Revision 4 (United Nations publication, Sales No. 98.XVII.9), June 1999.

9.2.  Informative References

[RFC1766]  Alvestrand, H., "Tags for the Identification of Languages", RFC 1766, March 1995.

[RFC2047]  Moore, K., "MIME (Multipurpose Internet Mail Extensions) Part Three: Message Header Extensions for Non-ASCII Text", RFC 2047, November 1996.

[RFC2231]  Freed, N. and K. Moore, "MIME Parameter Value and Encoded Word Extensions: Character Sets, Languages, and Continuations", RFC 2231, November 1997.

[RFC2781]  Hoffman, P. and F. Yergeau, "UTF-16, an encoding of ISO 10646", RFC 2781, February 2000.

[RFC3066]  Alvestrand, H., "Tags for the Identification of Languages", BCP 47, RFC 3066, January 2001.

[RFC3552]  Rescorla, E. and B. Korver, "Guidelines for Writing RFC Text on Security Considerations", BCP 72, RFC 3552, July 2003.

[RFC4646]  Phillips, A., Ed. and M. Davis, Ed., "Tags for the Identification of Languages", September 2006, .

[Unicode]  Unicode Consortium, "The Unicode Consortium. The Unicode Standard, Version 5.0, (Boston, MA, Addison-Wesley, 2003. ISBN 0-321-49081-0)", January 2007.

[XML10]  Bray (et al), T., "Extensible Markup Language (XML) 1.0", February 2004.

[iso639.prin]  ISO 639 Joint Advisory Committee, "ISO 639 Joint Advisory Committee: Working principles for ISO 639 maintenance", March 2000, .

[record-jar]  Raymond, E., "The Art of Unix Programming", 2003, .

[registry-update]  Ewell, D., Ed., "Update to the Language Subtag Registry", September 2006, .

Appendix A.  Acknowledgements

Any list of contributors is bound to be incomplete; please regard the following as only a selection from the group of people who have contributed to make this document what it is today.

The contributors to RFC 4646, RFC 4647, RFC 3066, and RFC 1766, the precursors of this document, made enormous contributions directly or indirectly to this document and are generally responsible for the success of language tags.

The following people contributed to this document:

Stephane Bortzmeyer, Karen Broome, Peter Constable, John Cowan, Martin Duerst, Frank Ellerman, Doug Ewell, Marion Gunn, Randy Presuhn, and many, many others.

Very special thanks must go to Harald Tveit Alvestrand, who originated RFCs 1766 and 3066, and without whom this document would not have been possible.

Special thanks go to Michael Everson, who served as the Language Tag Reviewer for almost the entire RFC 1766/RFC 3066 period, as well as the Language Subtag Reviewer since the adoption of RFC 4646.

Special thanks also to Doug Ewell ("the Official Doug"), for his production of the first complete subtag registry, his work to support and maintain new registrations, and his careful editorship of both RFC 4645 and [registry-update].

Appendix B.  Examples of Language Tags (Informative)

Simple language subtag:

   de (German)

   fr (French)

   ja (Japanese)

   i-enochian (example of a grandfathered tag)

Language subtag plus Script subtag:

   zh-Hant (Chinese written using the Traditional Chinese script)

   zh-Hans (Chinese written using the Simplified Chinese script)

   sr-Cyrl (Serbian written using the Cyrillic script)

   sr-Latn (Serbian written using the Latin script)

Language-Script-Region:

   zh-Hans-CN (Chinese written using the Simplified script as used in mainland China)

   sr-Latn-RS (Serbian written using the Latin script as used in Serbia)

Language-Variant:

   sl-rozaj (Resian dialect of Slovenian)

   sl-nedis (Nadiza dialect of Slovenian)

Language-Region-Variant:

   de-CH-1901 (German as used in Switzerland using the 1901 variant [orthography])

   sl-IT-nedis (Slovenian as used in Italy, Nadiza dialect)

Language-Script-Region-Variant:

   sl-Latn-IT-nedis (Nadiza dialect of Slovenian written using the Latin script as used in Italy.
   Note that this tag is NOT RECOMMENDED because subtag 'sl' has a Suppress-Script value of 'Latn')

Language-Region:

   de-DE (German for Germany)

   en-US (English as used in the United States)

   es-419 (Spanish appropriate for the Latin America and Caribbean region using the UN region code)

Private use subtags:

   de-CH-x-phonebk

   az-Arab-x-AZE-derbend

Extended language subtags (examples ONLY: extended languages MUST be defined by revision or update to this document):

   zh-min

   zh-min-nan-Hant-CN

Private use registry values:

   x-whatever (private use using the singleton 'x')

   qaa-Qaaa-QM-x-southern (all private tags)

   de-Qaaa (German, with a private script)

   sr-Latn-QM (Serbian, Latin-script, private region)

   sr-Qaaa-RS (Serbian, private script, for Serbia)

Tags that use extensions (examples ONLY: extensions MUST be defined by revision or update to this document or by RFC):

   en-US-u-islamCal

   zh-CN-a-myExt-x-private

   en-a-myExt-b-another

Some Invalid Tags:

   de-419-DE (two region tags)

   a-DE (use of a single-character subtag in primary position; note that there are a few grandfathered tags that start with "i-" that are valid)

   ar-a-aaa-b-bbb-a-ccc (two extensions with same single-letter prefix)

Authors' Addresses

Addison Phillips (editor)
Yahoo! Inc.

2701     Email: addison@inter-locale.com
2702     URI:   http://www.inter-locale.com

2704     Mark Davis (editor)
2705     Google

2707     Email: mark.davis@macchiato.com or mark.davis@google.com

2709  Intellectual Property Statement

2711     The IETF takes no position regarding the validity or scope of any
2712     Intellectual Property Rights or other rights that might be claimed to
2713     pertain to the implementation or use of the technology described in
2714     this document or the extent to which any license under such rights
2715     might or might not be available; nor does it represent that it has
2716     made any independent effort to identify any such rights.  Information
2717     on the procedures with respect to rights in RFC documents can be
2718     found in BCP 78 and BCP 79.

2720     Copies of IPR disclosures made to the IETF Secretariat and any
2721     assurances of licenses to be made available, or the result of an
2722     attempt made to obtain a general license or permission for the use of
2723     such proprietary rights by implementers or users of this
2724     specification can be obtained from the IETF on-line IPR repository at
2725     http://www.ietf.org/ipr.

2727     The IETF invites any interested party to bring to its attention any
2728     copyrights, patents or patent applications, or other proprietary
2729     rights that may cover technology that may be required to implement
2730     this standard.  Please address the information to the IETF at
2731     ietf-ipr@ietf.org.

2733  Disclaimer of Validity

2735     This document and the information contained herein are provided on an
2736     "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
2737     OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
2738     ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
2739     INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
2740     INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
2741     WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

2743  Copyright Statement

2745     Copyright (C) The Internet Society (2006).  This document is subject
2746     to the rights, licenses and restrictions contained in BCP 78, and
2747     except as set forth therein, the authors retain all their rights.

2749  Acknowledgment

2751     Funding for the RFC Editor function is currently provided by the
2752     Internet Society.