Network Working Group                                   A. Phillips, Ed.
Internet-Draft                                               Yahoo! Inc.
Obsoletes: 4646 (if approved)                              M. Davis, Ed.
Expires: June 9, 2007                                             Google
                                                        December 6, 2006


                     Tags for Identifying Languages
                       draft-ietf-ltru-4646bis-01

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on June 9, 2007.

Copyright Notice

   Copyright (C) The Internet Society (2006).

Abstract

   This document describes the structure, content, construction, and
   semantics of language tags for use in cases where it is desirable to
   indicate the language used in an information object. It also
   describes how to register values for use in language tags and the
   creation of user-defined extensions for private interchange.

Table of Contents

   1.  Introduction
   2.  The Language Tag
     2.1.  Syntax
     2.2.  Language Subtag Sources and Interpretation
       2.2.1.  Primary Language Subtag
       2.2.2.  Extended Language Subtags
       2.2.3.  Script Subtag
       2.2.4.  Region Subtag
       2.2.5.  Variant Subtags
       2.2.6.  Extension Subtags
       2.2.7.  Private Use Subtags
       2.2.8.  Grandfathered Registrations
       2.2.9.  Classes of Conformance
   3.  Registry Format and Maintenance
     3.1.  Format of the IANA Language Subtag Registry
       3.1.1.  File Format
       3.1.2.  Record Definitions
       3.1.3.  Subtag and Tag Fields
       3.1.4.  Description Field
       3.1.5.  Deprecated Field
       3.1.6.  Preferred-Value Field
       3.1.7.  Prefix Field
       3.1.8.  Comments Field
       3.1.9.  Suppress-Script Field
     3.2.  Language Subtag Reviewer
     3.3.  Maintenance of the Registry
     3.4.  Stability of IANA Registry Entries
     3.5.  Registration Procedure for Subtags
     3.6.  Possibilities for Registration
     3.7.  Extensions and Extensions Registry
     3.8.  Update of the Language Subtag Registry
   4.  Formation and Processing of Language Tags
     4.1.  Choice of Language Tag
     4.2.  Meaning of the Language Tag
     4.3.  Length Considerations
       4.3.1.  Working with Limited Buffer Sizes
       4.3.2.  Truncation of Language Tags
     4.4.  Canonicalization of Language Tags
     4.5.  Considerations for Private Use Subtags
   5.  IANA Considerations
     5.1.  Language Subtag Registry
     5.2.  Extensions Registry
   6.  Security Considerations
   7.  Character Set Considerations
   8.  Changes from RFC 4646
   9.  References
     9.1.  Normative References
     9.2.  Informative References
   Appendix A.  Acknowledgements
   Appendix B.  Examples of Language Tags (Informative)
   Authors' Addresses
   Intellectual Property and Copyright Statements

1.  Introduction

   Human beings on our planet have, past and present, used a number of
   languages. There are many reasons why one would want to identify the
   language used when presenting or requesting information.

   A user's language preferences often need to be identified so that
   appropriate processing can be applied. For example, the user's
   language preferences in a Web browser can be used to select Web pages
   appropriately.
   Language preferences can also be used to select among
   tools (such as dictionaries) to assist in the processing or
   understanding of content in different languages.

   In addition, knowledge about the particular language used by some
   piece of information content might be useful or even required by some
   types of processing; for example, spell-checking,
   computer-synthesized speech, Braille transcription, or high-quality
   print renderings.

   One means of indicating the language used is by labeling the
   information content with an identifier or "tag". These tags can be
   used to specify user preferences when selecting information content,
   or for labeling additional attributes of content and associated
   resources.

   Tags can also be used to indicate additional language attributes of
   content. For example, indicating specific information about the
   dialect, writing system, or orthography used in a document or
   resource may enable the user to obtain information in a form that
   they can understand, or it can be important in processing or
   rendering the given content into an appropriate form or style.

   This document specifies a particular identifier mechanism (the
   language tag) and a registration function for values to be used to
   form tags. It also defines a mechanism for private use values and
   future extension.

   This document replaces [RFC4646], which replaced [RFC3066] and its
   predecessor [RFC1766]. For a list of changes in this document, see
   Section 8.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

2.  The Language Tag

   Language tags are used to help identify languages, whether spoken,
   written, signed, or otherwise signaled, for the purpose of
   communication.
   This includes constructed and artificial languages,
   but excludes languages not intended primarily for human
   communication, such as programming languages.

2.1.  Syntax

   The language tag is composed of one or more parts, known as
   "subtags". Each subtag consists of a sequence of alphanumeric
   characters. Subtags are distinguished and separated from one another
   by a hyphen ("-", ABNF [RFC4234] %x2D). A language tag consists of a
   "primary language" subtag and a (possibly empty) series of subsequent
   subtags, each of which refines or narrows the range of languages
   identified by the overall tag.

   Usually, each type of subtag is distinguished by length, position in
   the tag, and content: subtags can be recognized solely by these
   features. The only exception to this is a fixed list of
   grandfathered tags registered under RFC 3066 [RFC3066]. This makes
   it possible to construct a parser that can extract and assign some
   semantic information to the subtags, even if the specific subtag
   values are not recognized. Thus, a parser need not have an
   up-to-date copy (or any copy at all) of the subtag registry to
   perform most searching and matching operations.
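
   As an informal illustration of such a registry-free parser, the
   following Python sketch labels each subtag using only its length,
   position, and content, following the 'langtag' and 'privateuse'
   productions of Figure 1. The function name 'classify' and the label
   strings are assumptions made for this example; irregular
   grandfathered tags are not handled, and repeated singletons are not
   rejected.

```python
def _alpha(s, lo, hi):
    # True for an ASCII-letter string whose length is in [lo, hi].
    return lo <= len(s) <= hi and s.isascii() and s.isalpha()


def _alnum(s, lo, hi):
    # True for an ASCII letter/digit string whose length is in [lo, hi].
    return lo <= len(s) <= hi and s.isascii() and s.isalnum()


def _is_variant(s):
    # 5*8alphanum, or a digit followed by exactly three alphanum.
    return _alnum(s, 5, 8) or (_alnum(s, 4, 4) and s[0].isdigit())


def classify(tag):
    """Return (subtag, kind) pairs; raise ValueError if the shape is wrong."""
    parts = tag.split("-")
    n = len(parts)
    out = []
    i = 0
    if parts[0].lower() != "x":
        if not _alpha(parts[0], 2, 8):
            raise ValueError("bad primary language subtag: %r" % parts[0])
        out.append((parts[0], "language"))
        i = 1
        # Up to three 3-letter extlang subtags after a 2-3 letter language.
        if len(parts[0]) <= 3:
            while i < n and i <= 3 and _alpha(parts[i], 3, 3):
                out.append((parts[i], "extlang"))
                i += 1
        if i < n and _alpha(parts[i], 4, 4):
            out.append((parts[i], "script"))
            i += 1
        if i < n and (_alpha(parts[i], 2, 2)
                      or (_alnum(parts[i], 3, 3) and parts[i].isdigit())):
            out.append((parts[i], "region"))
            i += 1
        while i < n and _is_variant(parts[i]):
            out.append((parts[i], "variant"))
            i += 1
        # Each extension: a singleton other than x, then 1+ subtags of 2-8.
        while i < n and _alnum(parts[i], 1, 1) and parts[i].lower() != "x":
            out.append((parts[i], "singleton"))
            i += 1
            start = i
            while i < n and _alnum(parts[i], 2, 8):
                out.append((parts[i], "extension"))
                i += 1
            if i == start:
                raise ValueError("singleton with no extension subtags")
    # Private use sequence: x followed by 1+ subtags of 1-8 characters.
    if i < n and parts[i].lower() == "x":
        out.append((parts[i], "privateuse"))
        i += 1
        start = i
        while i < n and _alnum(parts[i], 1, 8):
            out.append((parts[i], "privateuse"))
            i += 1
        if i == start:
            raise ValueError("'x' with no private use subtags")
    if i != n:
        raise ValueError("unexpected subtag: %r" % parts[i])
    return out
```

   For instance, classify("mn-Cyrl-MN") labels 'mn' as a language,
   'Cyrl' as a script, and 'MN' as a region without consulting any
   registry data.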

   The syntax of the language tag in ABNF [RFC4234] is:

   Language-Tag  = langtag
                 / privateuse             ; private use tag
                 / grandfathered          ; grandfathered registrations

   langtag       = (language
                    ["-" script]
                    ["-" region]
                    *("-" variant)
                    *("-" extension)
                    ["-" privateuse])

   language      = (2*3ALPHA [ extlang ]) ; shortest ISO 639 code
                 / 4ALPHA                 ; reserved for future use
                 / 5*8ALPHA               ; registered language subtag

   extlang       = *3("-" 3ALPHA)         ; specific ISO 639-3 codes

   script        = 4ALPHA                 ; ISO 15924 code

   region        = 2ALPHA                 ; ISO 3166 code
                 / 3DIGIT                 ; UN M.49 code

   variant       = 5*8alphanum            ; registered variants
                 / (DIGIT 3alphanum)

   extension     = singleton 1*("-" (2*8alphanum))

   singleton     = %x41-57 / %x59-5A / %x61-77 / %x79-7A / DIGIT
                 ; "a"-"w" / "y"-"z" / "A"-"W" / "Y"-"Z" / "0"-"9"
                 ; Single letters: x/X is reserved for private use

   privateuse    = ("x"/"X") 1*("-" (1*8alphanum))

   grandfathered = langtag                ; well-formed grandfathered tags
                 / irregular              ; tags that are not well-formed

   irregular     = "en-GB-oed" / "i-ami" / "i-bnn" / "i-default"
                 / "i-enochian" / "i-hak" / "i-klingon" / "i-lux"
                 / "i-mingo" / "i-navajo" / "i-pwn" / "i-tao"
                 / "i-tay" / "i-tsu" / "sgn-BE-fr" / "sgn-BE-nl"
                 / "sgn-CH-de"

   alphanum      = (ALPHA / DIGIT)        ; letters and numbers

                      Figure 1: Language Tag ABNF

   Note: There is a subtlety in the ABNF for 'variant': variants
   starting with a digit MAY be four characters long, while those
   starting with a letter MUST be at least five characters long.

   All subtags have a maximum length of eight characters and whitespace
   is not permitted in a language tag. For examples of language tags,
   see Appendix B.

   Note that although [RFC4234] refers to octets, the language tags
   described in this document are sequences of characters from the
   US-ASCII [ISO646] repertoire.
   Language tags MAY be used in documents
   and applications that use other encodings, so long as these encompass
   the US-ASCII repertoire. An example of this would be an XML document
   that uses the UTF-16LE [RFC2781] encoding of [Unicode].

   The tags and their subtags, including private use and extensions, are
   to be treated as case insensitive: there exist conventions for the
   capitalization of some of the subtags, but these MUST NOT be taken to
   carry meaning.

   For example:

   o  [ISO639-1] recommends that language codes be written in lowercase
      ('mn' Mongolian).

   o  [ISO3166-1] recommends that country codes be capitalized ('MN'
      Mongolia).

   o  [ISO15924] recommends that script codes use lowercase with the
      initial letter capitalized ('Cyrl' Cyrillic).

   However, in the tags defined by this document, the uppercase US-ASCII
   letters in the range 'A' through 'Z' are considered equivalent and
   mapped directly to their US-ASCII lowercase equivalents in the range
   'a' through 'z'. Thus, the tag "mn-Cyrl-MN" is not distinct from
   "MN-cYRL-mn" or "mN-cYrL-Mn" (or any other combination), and each of
   these variations conveys the same meaning: Mongolian written in the
   Cyrillic script as used in Mongolia.

   Although case distinctions do not carry meaning in language tags,
   consistent formatting and presentation of the tags will aid users.
   The format of the tags and subtags in the registry is RECOMMENDED.
   In this format, all non-initial two-letter subtags are uppercase, all
   non-initial four-letter subtags are titlecase, and all other subtags
   are lowercase.

2.2.  Language Subtag Sources and Interpretation

   The namespace of language tags and their subtags is administered by
   the Internet Assigned Numbers Authority (IANA) [RFC2860] according to
   the rules in Section 5 of this document.
   The Language Subtag
   Registry maintained by IANA is the source for valid subtags: other
   standards referenced in this section provide the source material for
   that registry.

   Terminology used in this document:

   o  Tag or tags refers to a complete language tag, such as
      "fr-Latn-CA". Examples of tags in this document are enclosed in
      double-quotes ("en-US").

   o  Subtag refers to a specific section of a tag, delimited by hyphen,
      such as the subtag 'Hant' in "zh-Hant-CN". Examples of subtags in
      this document are enclosed in single quotes ('Hant').

   o  Code or codes refers to values defined in external standards (and
      which are used as subtags in this document). For example, 'Hant'
      is an [ISO15924] script code that was used to define the 'Hant'
      script subtag for use in a language tag. Examples of codes in
      this document are enclosed in single quotes ('en', 'Hant').

   The definitions in this section apply to the various subtags within
   the language tags defined by this document, excepting those
   "grandfathered" tags defined in Section 2.2.8.

   Language tags are designed so that each subtag type has unique length
   and content restrictions. These make identification of the subtag's
   type possible, even if the content of the subtag itself is
   unrecognized. This allows tags to be parsed and processed without
   reference to the latest version of the underlying standards or the
   IANA registry and makes the associated exception handling when
   parsing tags simpler.

   Subtags in the IANA registry that do not come from an underlying
   standard can only appear in specific positions in a tag.
   Specifically, they can only occur as primary language subtags or as
   variant subtags.

   Note that sequences of private use and extension subtags MUST occur
   at the end of the sequence of subtags and MUST NOT be interspersed
   with subtags defined elsewhere in this document.
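
   One consequence of this ordering can be sketched in a few lines of
   Python. In Figure 1, every language, script, region, and variant
   subtag is at least two characters long, so the first
   single-character subtag marks where the extension and private use
   sequences begin. The function name 'split_tail' is an assumption
   made for this example, and grandfathered tags such as "i-default"
   would need a separate lookup.

```python
def split_tail(tag):
    """Split a tag into (leading subtags, extension/private use subtags).

    The split point is the first single-character subtag. A tag that
    starts with 'x' is therefore returned entirely as the tail.
    """
    subtags = tag.split("-")
    for i, subtag in enumerate(subtags):
        if len(subtag) == 1:
            return subtags[:i], subtags[i:]
    return subtags, []
```

   Under these assumptions, split_tail("en-Latn-US-a-ext-x-priv")
   separates 'en', 'Latn', and 'US' from the trailing 'a' and 'x'
   sequences.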

   Single-letter and single-digit subtags are reserved for current or
   future use. These include the following current uses:

   o  The single-letter subtag 'x' is reserved to introduce a sequence
      of private use subtags. The interpretation of any private use
      subtags is defined solely by private agreement and is not defined
      by the rules in this section or in any standard or registry
      defined in this document.

   o  All other single-letter subtags are reserved to introduce
      standardized extension subtag sequences as described in
      Section 3.7.

   The single-letter subtag 'i' is used by some grandfathered tags, such
   as "i-default", where it always appears in the first position and
   cannot be confused with an extension.

2.2.1.  Primary Language Subtag

   The primary language subtag is the first subtag in a language tag
   (with the exception of private use and certain grandfathered tags)
   and cannot be omitted. The following rules apply to the primary
   language subtag:

   1.  All two-character primary language subtags were defined in the
       IANA registry according to the assignments found in the standard
       ISO 639 Part 1, "ISO 639-1:2002, Codes for the representation of
       names of languages -- Part 1: Alpha-2 code" [ISO639-1], or using
       assignments subsequently made by the ISO 639-1 registration
       authority (RA) or governing standardization bodies.

   2.  All three-character primary language subtags were defined in the
       IANA registry according to the assignments found in either ISO
       639 Part 2, "ISO 639-2:1998 - Codes for the representation of
       names of languages -- Part 2: Alpha-3 code - edition 1"
       [ISO639-2], ISO 639 Part 3, "ISO 639-3:200?, [[??missing official
       title??]]", or assignments subsequently made by the relevant ISO
       639 registration authorities or governing standardization bodies.

   3.  The subtags in the range 'qaa' through 'qtz' are reserved for
       private use in language tags.
       These subtags correspond to codes
       reserved by ISO 639-2 for private use. These codes MAY be used
       for non-registered primary language subtags (instead of using
       private use subtags following 'x-'). Please refer to Section 4.5
       for more information on private use subtags.

   4.  All four-character language subtags are reserved for possible
       future standardization.

   5.  All language subtags of 5 to 8 characters in length in the IANA
       registry were defined via the registration process in Section 3.5
       and MAY be used to form the primary language subtag. At the time
       this document was created, there were no examples of this kind of
       subtag and future registrations of this type will be discouraged:
       primary languages are strongly RECOMMENDED for registration with
       ISO 639, and proposals rejected by ISO 639/RA will be closely
       scrutinized before they are registered with IANA.

   6.  The single-character subtag 'x' as the primary subtag indicates
       that the language tag consists solely of subtags whose meaning is
       defined by private agreement. For example, in the tag "x-fr-CH",
       the subtags 'fr' and 'CH' SHOULD NOT be taken to represent the
       French language or the country of Switzerland (or any other value
       in the IANA registry) unless there is a private agreement in
       place to do so. See Section 4.5.

   7.  The single-character subtag 'i' is used by some grandfathered
       tags (see Section 2.2.8) such as "i-klingon" and "i-bnn". (Other
       grandfathered tags have a primary language subtag in their first
       position.)

   8.  Other values MUST NOT be assigned to the primary subtag except by
       revision or update of this document.

   Note: For languages that have both an ISO 639-1 two-character code
   and a three-character code assigned by either ISO 639-2 or ISO 639-3,
   only the ISO 639-1 two-character code is defined in the IANA
   registry.
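
   The numbered rules above sort a primary subtag into a category by
   its length and content alone. A minimal Python sketch follows; the
   function name 'primary_language_kind' and the returned labels are
   assumptions made for this example, and the sketch does not check
   whether a subtag actually appears in the IANA registry.

```python
def primary_language_kind(subtag):
    """Categorize a primary subtag by shape, per rules 1-7 above."""
    s = subtag.lower()  # tags are case insensitive
    if s == "x":
        return "private-use-tag"      # rule 6
    if s == "i":
        return "grandfathered"        # rule 7
    if not (s.isascii() and s.isalpha()):
        return "invalid"
    if len(s) == 2:
        return "iso639-1"             # rule 1
    if len(s) == 3:
        # Rule 3: 'qaa' through 'qtz' are reserved for private use.
        if "qaa" <= s <= "qtz":
            return "private-use"
        return "iso639-2/3"           # rule 2
    if len(s) == 4:
        return "reserved"             # rule 4
    if 5 <= len(s) <= 8:
        return "registered"           # rule 5
    return "invalid"
```

   The range test relies on ordinary string comparison, which is
   sufficient here because all three-letter lowercase strings compare
   in alphabetical order.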

   Note: For languages that have no ISO 639-1 two-character code and for
   which the ISO 639-2/T (Terminology) code and the ISO 639-2/B
   (Bibliographic) codes differ, only the Terminology code is defined in
   the IANA registry. At the time this document was created, all
   languages that had both kinds of three-character code were also
   assigned a two-character code; it is expected that future assignments
   of this nature will not occur.

   Note: To avoid problems with versioning and subtag choice as
   experienced during the transition between RFC 1766 and RFC 3066, as
   well as the canonical nature of subtags defined by this document, the
   ISO 639 Registration Authority Joint Advisory Committee
   (ISO 639/RA-JAC) has included the following statement in
   [iso639.prin]:

      "A language code already in ISO 639-2 at the point of freezing ISO
      639-1 shall not later be added to ISO 639-1. This is to ensure
      consistency in usage over time, since users are directed in
      Internet applications to employ the alpha-3 code when an alpha-2
      code for that language is not available."

   In order to avoid instability in the canonical form of tags, if a
   two-character code is added to ISO 639-1 for a language for which a
   three-character code was already included in either ISO 639-2 or ISO
   639-3, the two-character code MUST NOT be registered. See
   Section 3.4.

   For example, if some content were tagged with 'haw' (Hawaiian), which
   currently has no two-character code, the tag would not be invalidated
   if ISO 639-1 were to assign a two-character code to the Hawaiian
   language at a later date.

   Note: An example of independent primary language subtag registration
   might include: one of the grandfathered IANA registrations is
   "i-enochian".
   The subtag 'enochian' could be registered in the IANA
   registry as a primary language subtag (assuming that ISO 639 does not
   register this language first), making tags such as "enochian-AQ" and
   "enochian-Latn" valid.

2.2.2.  Extended Language Subtags

   Extended language subtags are used to identify languages or dialects
   that are subdivisions within another language. Such an enclosing
   language is sometimes called a "collective" or "macro" language. The
   following rules apply to the extended language subtags:

   1.  These subtags were defined in the IANA registry according to
       assignments found in ISO 639 Part 3.

   2.  A sequence of up to three extended language subtags MAY appear in
       a language tag. This sequence MUST follow the primary language
       subtag and precede any other subtags.

   3.  Each extended language subtag MUST only be used with the exact
       sequence of subtags that appears in the 'Prefix' field in its
       registry record.

   4.  There MAY be up to three extended language subtags.

   5.  Other values MUST NOT be assigned to the extended language subtag
       except by revision or update of this document.

   Extended language subtag records MUST include exactly one 'Prefix'
   field indicating an appropriate subtag or sequence of subtags for
   that extended language subtag.

   For example, the 'gan' subtag, representing the 'Gan' dialect of
   Chinese, has a prefix of "zh" in its registry record. The 'cmn'
   subtag, representing the 'Mandarin' dialect of Chinese has the same
   prefix. Thus, the tags "zh-gan-Hant" or "zh-cmn-CN" are appropriate,
   while the tag "zh-cmn-gan" is not.

   Now suppose that 'xxx' is a subtag that represents a dialect of
   'Gan'. It would have a 'Prefix' field of "zh-gan", making the tag
   "zh-gan-xxx" appropriate, while the tags "zh-xxx" and "zh-xxx-gan"
   would not be appropriate.

2.2.3.  Script Subtag

   Script subtags are used to indicate the script or writing system
   variations that distinguish the written forms of a language or its
   dialects. The following rules apply to the script subtags:

   1.  All four-character subtags were defined according to
       [ISO15924]--"Codes for the representation of the names of
       scripts": alpha-4 script codes, or subsequently assigned by the
       ISO 15924 maintenance agency or governing standardization bodies,
       denoting the script or writing system used in conjunction with
       this language.

   2.  Script subtags MUST immediately follow the primary language
       subtag and all extended language subtags and MUST occur before
       any other type of subtag described below.

   3.  The script subtags 'Qaaa' through 'Qabx' are reserved for private
       use in language tags. These subtags correspond to codes reserved
       by ISO 15924 for private use. These codes MAY be used for
       non-registered script values. Please refer to Section 4.5 for
       more information on private use subtags.

   4.  Script subtags MUST NOT be registered using the process in
       Section 3.5 of this document. Variant subtags MAY be considered
       for registration for that purpose.

   5.  There MUST be at most one script subtag in a language tag, and
       the script subtag SHOULD be omitted when it adds no
       distinguishing value to the tag or when the primary language
       subtag's record includes a Suppress-Script field listing the
       applicable script subtag.

   Example: "sr-Latn" represents Serbian written using the Latin script.

2.2.4.  Region Subtag

   Region subtags are used to indicate linguistic variations associated
   with or appropriate to a specific country, territory, or region.
   Typically, a region subtag is used to indicate regional dialects or
   usage, or region-specific spelling conventions.
   A region subtag can
   also be used to indicate that content is expressed in a way that is
   appropriate for use throughout a region, for instance, Spanish
   content tailored to be useful throughout Latin America.

   The following rules apply to the region subtags:

   1.  Region subtags MUST follow any language, extended language, or
       script subtags and MUST precede all other subtags.

   2.  All two-character subtags following the primary subtag were
       defined in the IANA registry according to the assignments found
       in [ISO3166-1] ("Codes for the representation of names of
       countries and their subdivisions -- Part 1: Country codes") using
       the list of alpha-2 country codes, or using assignments
       subsequently made by the ISO 3166 maintenance agency or governing
       standardization bodies.

   3.  All three-character subtags consisting of digit (numeric)
       characters following the primary subtag were defined in the IANA
       registry according to the assignments found in UN Standard
       Country or Area Codes for Statistical Use [UN_M.49] or
       assignments subsequently made by the governing standards body.
       Note that not all of the UN M.49 codes are defined in the IANA
       registry. The following rules define which codes are entered
       into the registry as valid subtags:

       A.  UN numeric codes assigned to 'macro-geographical
           (continental)' or sub-regions MUST be registered in the
           registry. These codes are not associated with an assigned
           ISO 3166 alpha-2 code and represent supra-national areas,
           usually covering more than one nation, state, province, or
           territory.

       B.  UN numeric codes for 'economic groupings' or 'other
           groupings' MUST NOT be registered in the IANA registry and
           MUST NOT be used to form language tags.

       C.  UN numeric codes for countries or areas with ambiguous ISO
           3166 alpha-2 codes, when entered into the registry, MUST be
           defined according to the rules in Section 3.4 and MUST be
           used to form language tags that represent the country or
           region for which they are defined.

       D.  UN numeric codes for countries or areas for which there is an
           associated ISO 3166 alpha-2 code in the registry MUST NOT be
           entered into the registry and MUST NOT be used to form
           language tags. Note that the ISO 3166-based subtag in the
           registry MUST actually be associated with the UN M.49 code in
           question.

       E.  UN numeric codes and ISO 3166 alpha-2 codes for countries or
           areas listed as eligible for registration in [RFC4645] but
           not presently registered MAY be entered into the IANA
           registry via the process described in Section 3.5. Once
           registered, these codes MAY be used to form language tags.

       F.  All other UN numeric codes for countries or areas that do not
           have an associated ISO 3166 alpha-2 code MUST NOT be entered
           into the registry and MUST NOT be used to form language tags.
           For more information about these codes, see Section 3.4.

   4.  Note: The alphanumeric codes in Appendix X of the UN document
       MUST NOT be entered into the registry and MUST NOT be used to
       form language tags. (At the time this document was created,
       these values matched the ISO 3166 alpha-2 codes.)

   5.  There MUST be at most one region subtag in a language tag and the
       region subtag MAY be omitted, as when it adds no distinguishing
       value to the tag.

   6.  The region subtags 'AA', 'QM'-'QZ', 'XA'-'XZ', and 'ZZ' are
       reserved for private use in language tags. These subtags
       correspond to codes reserved by ISO 3166 for private use. These
       codes MAY be used for private use region subtags (instead of
       using a private use subtag sequence). Please refer to
       Section 4.5 for more information on private use subtags.
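
   The shape rules and the private use ranges in rule 6 can be sketched
   in Python as follows. The function name 'region_kind' and its labels
   are assumptions made for this example. The sketch looks only at the
   form of the subtag: whether a particular UN M.49 or ISO 3166 code is
   actually valid depends on rules A through F and on the contents of
   the IANA registry, which are not modeled here.

```python
def region_kind(subtag):
    """Categorize a region subtag by shape and private use range."""
    s = subtag.upper()  # tags are case insensitive
    if not s.isascii():
        return "not-a-region"
    if len(s) == 3 and s.isdigit():
        return "un-m49"               # 3DIGIT form
    if len(s) == 2 and s.isalpha():
        # Rule 6: 'AA', 'QM'-'QZ', 'XA'-'XZ', and 'ZZ' are private use.
        if s in ("AA", "ZZ") or "QM" <= s <= "QZ" or "XA" <= s <= "XZ":
            return "private-use"
        return "iso3166-1"            # 2ALPHA form
    return "not-a-region"
```

   A caller that needs validity rather than well-formedness would still
   have to look the subtag up in the registry.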

"de-CH" represents German ('de') as used in Switzerland ('CH').

"sr-Latn-CS" represents Serbian ('sr') written using Latin script
('Latn') as used in Serbia and Montenegro ('CS').

"es-419" represents Spanish ('es') appropriate to the UN-defined
Latin America and Caribbean region ('419').

2.2.5.  Variant Subtags

Variant subtags are used to indicate additional, well-recognized
variations that define a language or its dialects that are not
covered by other available subtags.  The following rules apply to the
variant subtags:

1.  Variant subtags are not associated with any external standard.
    Variant subtags and their meanings are defined by the
    registration process defined in Section 3.5.

2.  Variant subtags MUST follow all of the other defined subtags, but
    precede any extension or private use subtag sequences.

3.  More than one variant MAY be used to form the language tag.

4.  Variant subtags MUST be registered with IANA according to the
    rules in Section 3.5 of this document before being used to form
    language tags.  In order to distinguish variants from other types
    of subtags, registrations MUST meet the following length and
    content restrictions:

    1.  Variant subtags that begin with a letter (a-z, A-Z) MUST be
        at least five characters long.

    2.  Variant subtags that begin with a digit (0-9) MUST be at
        least four characters long.

Variant subtag records in the language subtag registry MAY include
one or more 'Prefix' fields, which indicate the language tag or tags
that would make a suitable prefix (with other subtags, as
appropriate) in forming a language tag with the variant.  For
example, the subtag 'nedis' has a Prefix of "sl", making it suitable
to form language tags such as "sl-nedis" and "sl-IT-nedis", but not
suitable for use in a tag such as "zh-nedis" or "it-IT-nedis".

"sl-nedis" represents the Natisone or Nadiza dialect of Slovenian.
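
The length restrictions and the 'Prefix' relationship described above
can be sketched as follows.  This is an illustration under stated
assumptions, not a normative algorithm: the function names are
hypothetical, the upper bound of eight characters is assumed from the
general subtag length limit rather than from the rules in this
section, and a prefix is treated as matching when it equals the tag or
is followed in the tag by a hyphen.

```python
import re

# Letter-initial variants: at least 5 characters; digit-initial: at
# least 4.  The maximum of 8 is an assumption (general subtag limit).
_VARIANT_SHAPE = re.compile(
    r"[A-Za-z][A-Za-z0-9]{4,7}|[0-9][A-Za-z0-9]{3,7}"
)


def is_variant_shaped(subtag):
    """Return True if the subtag meets the variant length rules."""
    return _VARIANT_SHAPE.fullmatch(subtag) is not None


def prefix_matches(prefix, tag):
    """Return True if 'prefix' is a whole-subtag prefix of 'tag'.

    Comparison is case-insensitive.  "sl" matches "sl-IT-nedis" but
    not "slv-nedis", because the match must end on a subtag boundary.
    """
    p, t = prefix.lower(), tag.lower()
    return t == p or t.startswith(p + "-")
```

With the 'nedis' record above, prefix_matches("sl", "sl-IT-nedis")
holds while prefix_matches("sl", "it-IT-nedis") does not.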

"de-CH-1996" represents German as used in Switzerland and as written
using the spelling reform beginning in the year 1996 C.E.

Most variants that share a prefix are mutually exclusive.  For
example, the German orthographic variations '1996' and '1901' SHOULD
NOT be used in the same tag, as they represent the dates of different
spelling reforms.  A variant that can meaningfully be used in
combination with another variant SHOULD include a 'Prefix' field in
its registry record that lists that other variant.  For example, if
another German variant 'example' were created that made sense to use
with '1996', then 'example' should include two Prefix fields: "de"
and "de-1996".

2.2.6.  Extension Subtags

Extensions provide a mechanism for extending language tags for use in
various applications.  See Section 3.7.  The following rules apply to
extensions:

1.  Extension subtags are separated from the other subtags defined
    in this document by a single-character subtag ("singleton").
    The singleton MUST be one allocated to a registration authority
    via the mechanism described in Section 3.7 and MUST NOT be the
    letter 'x', which is reserved for private use subtag sequences.

2.  Note: Private use subtag sequences starting with the singleton
    subtag 'x' are described in Section 2.2.7 below.

3.  An extension MUST follow at least a primary language subtag.
    That is, a language tag cannot begin with an extension.
    Extensions extend language tags, they do not override or replace
    them.  For example, "a-value" is not a well-formed language tag,
    while "de-a-value" is.

4.  Each singleton subtag MUST appear at most one time in each tag
    (other than as a private use subtag).  That is, singleton
    subtags MUST NOT be repeated.  For example, the tag
    "en-a-bbb-a-ccc" is invalid because the subtag 'a' appears twice.
    Note that the tag "en-a-bbb-x-a-ccc" is valid because the second
    appearance of the singleton 'a' is in a private use sequence.

5.  Extension subtags MUST meet all of the requirements for the
    content and format of subtags defined in this document.

6.  Extension subtags MUST meet whatever requirements are set by the
    document that defines their singleton prefix and whatever
    requirements are provided by the maintaining authority.

7.  Each extension subtag MUST be from two to eight characters long
    and consist solely of letters or digits, with each subtag
    separated by a single '-'.

8.  Each singleton MUST be followed by at least one extension
    subtag.  For example, the tag "tlh-a-b-foo" is invalid because
    the first singleton 'a' is followed immediately by another
    singleton 'b'.

9.  Extension subtags MUST follow all language, extended language,
    script, region, and variant subtags in a tag.

10. All subtags following the singleton and before another singleton
    are part of the extension.  Example: In the tag "fr-a-Latn", the
    subtag 'Latn' does not represent the script subtag 'Latn'
    defined in the IANA Language Subtag Registry.  Its meaning is
    defined by the extension 'a'.

11. In the event that more than one extension appears in a single
    tag, the tag SHOULD be canonicalized as described in
    Section 4.4.

For example, if the prefix singleton 'r' and the shown subtags were
defined, then the following tag would be a valid example:
"en-Latn-GB-boont-r-extended-sequence-x-private"

2.2.7.  Private Use Subtags

Private use subtags are used to indicate distinctions in language
important in a given context by private agreement.  The following
rules apply to private use subtags:

1.  Private use subtags are separated from the other subtags defined
    in this document by the reserved single-character subtag 'x'.

2.
    Private use subtags MUST conform to the format and content
    constraints defined in the ABNF for all subtags.

3.  Private use subtags MUST follow all language, extended language,
    script, region, variant, and extension subtags in the tag.
    Another way of saying this is that all subtags following the
    singleton 'x' MUST be considered private use.  Example: The
    subtag 'US' in the tag "en-x-US" is a private use subtag.

4.  A tag MAY consist entirely of private use subtags.

5.  No source is defined for private use subtags.  Use of private use
    subtags is by private agreement only.

6.  Private use subtags are NOT RECOMMENDED where alternatives exist
    or for general interchange.  See Section 4.5 for more information
    on private use subtag choice.

For example: Users who wished to utilize codes from the Ethnologue
publication of SIL International for language identification might
agree to exchange tags such as "az-Arab-x-AZE-derbend".  This example
contains two private use subtags.  The first is 'AZE' and the second
is 'derbend'.

2.2.8.  Grandfathered Registrations

Prior to RFC 4646, whole language tags were registered according to
the rules in RFC 1766 and/or RFC 3066.  These registered tags
maintain their validity.  Of those tags, those that were made
obsolete or redundant by the advent of RFC 4646 or by subsequent
registration of subtags are maintained in the registry in records as
"redundant" tag records.  Those tags that would not be well-formed
according to the ABNF in this document or that contain subtags that
do not individually appear in the registry are maintained in the
registry in records of the "grandfathered" type.

Grandfathered tags contain one or more subtags that are not defined
in the Language Subtag Registry (see Section 3).  Redundant tags
consist entirely of subtags defined above and whose independent
registration was superseded by [RFC4646].
For more information see Section 3.8.

Some grandfathered tags are "well-formed" in that they match the
'langtag' production in Figure 1.  In some cases, the tags could
become redundant if their unregistered subtags were to be registered
(as variants, for example).  In other cases, although the subtags
match the language tag pattern, the meaning assigned to the various
subtags is prohibited by rules elsewhere in this document.  Those
tags can never become redundant.

The remaining grandfathered tags, listed in the 'irregular'
production in Figure 1, do not match the language tag syntax and can
never become redundant.  Many of these tags have been superseded by
other registrations: their record contains a Preferred-Value field
that really ought to be used to form language tags representing that
value.

2.2.9.  Classes of Conformance

Implementations sometimes need to describe their capabilities with
regard to the rules and practices described in this document.  There
are two classes of conforming implementations described by this
document: "well-formed" processors and "validating" processors.
Claims of conformance SHOULD explicitly reference one of these
definitions.

An implementation that claims to check for well-formed language tags
MUST:

o  Check that the tag and all of its subtags, including extension and
   private use subtags, conform to the ABNF or that the tag is on the
   list of grandfathered tags.

o  Check that singleton subtags that identify extensions do not
   repeat.  For example, the tag "en-a-xx-b-yy-a-zz" is not
   well-formed.

Well-formed processors are strongly encouraged to implement the
canonicalization rules contained in Section 4.4.

An implementation that claims to be validating MUST:

o  Check that the tag is well-formed.

o  Specify the particular registry date for which the implementation
   performs validation of subtags.

o  Check that either the tag is a grandfathered tag, or that all
   language, script, region, and variant subtags consist of valid
   codes for use in language tags according to the IANA registry as
   of the particular date specified by the implementation.

o  Specify which, if any, extension RFCs as defined in Section 3.7
   are supported, including version, revision, and date.

o  For any such extensions supported, check that all subtags used in
   that extension are valid.

o  For extended language subtags, check that the tag matches the
   'Prefix' field associated with the subtag.  The tag matches if the
   'Prefix' exactly matches the start of the tag.  For example, the
   prefix "sgn-ase" matches the tag "sgn-ase-US" but does not match
   the tag "sgn-bvs-ase-US".

3.  Registry Format and Maintenance

This section defines the Language Subtag Registry and the maintenance
and update procedures associated with it, as well as a registry for
extensions to language tags (Section 3.7).

The Language Subtag Registry contains a comprehensive list of all of
the subtags valid in language tags.  This allows implementers a
straightforward and reliable way to validate language tags.  The
Language Subtag Registry will be maintained so that, except for
extension subtags, it is possible to validate all of the subtags that
appear in a language tag under the provisions of this document or its
revisions or successors.  In addition, the meaning of the various
subtags will be unambiguous and stable over time.  (The meaning of
private use subtags, of course, is not defined by the IANA registry.)

3.1.  Format of the IANA Language Subtag Registry

The IANA Language Subtag Registry ("the registry") consists of a text
file that is machine readable in the format described in this
section, plus copies of the registration forms approved in accordance
with the process described in Section 3.5.
The existing registration forms for grandfathered and redundant tags
taken from RFC 3066 will be maintained as part of the obsolete
RFC 3066 registry.  The remaining set of initial subtags will not
have registration forms created for them.

3.1.1.  File Format

The registry is in the text format described below.  This format was
based on the record-jar format described in [record-jar].

Each line of text is limited to 72 characters, including all
whitespace.  Records are separated by lines containing only the
sequence "%%" (%x25.25).

Each field can be viewed as a single, logical line of ASCII
characters, comprising a field-name and a field-body separated by a
COLON character (%x3A).  For convenience, the field-body portion of
this conceptual entity can be split into a multiple-line
representation; this is called "folding".  The format of the registry
is described by the following ABNF (per [RFC4234]):

   registry   = record *("%%" CRLF record)
   record     = 1*( field-name *SP ":" *SP field-body CRLF )
   field-name = (ALPHA / DIGIT) [*(ALPHA / DIGIT / "-") (ALPHA / DIGIT)]
   field-body = *(ASCCHAR/LWSP)
   ASCCHAR    = %x21-25 / %x27-7E / UNICHAR ; Note: AMPERSAND is %x26
   UNICHAR    = "&#x" 2*6HEXDIG ";"

                   Figure 2: Registry Format ABNF

The sequence '..' (%x2E.2E) in a field-body denotes a range of
values.  Such a range represents all subtags of the same length that
are in alphabetic or numeric order within that range, including the
values explicitly mentioned.  For example 'a..c' denotes the values
'a', 'b', and 'c' and '11..13' denotes the values '11', '12', and
'13'.

Characters from outside the US-ASCII [ISO646] repertoire, as well as
the AMPERSAND character ("&", %x26) when it occurs in a field-body,
are represented by a "Numeric Character Reference" using hexadecimal
notation in the style used by [XML10] (see ).
This consists of the sequence "&#x" (%x26.23.78) followed by a
hexadecimal representation of the character's code point in
[ISO10646] followed by a closing semicolon (%x3B).  For example, the
EURO SIGN, U+20AC, would be represented by the sequence "&#x20AC;".
Note that the hexadecimal notation MAY have between two and six
digits.

All fields whose field-body contains a date value use the "full-date"
format specified in [RFC3339].  For example: "2004-06-28" represents
June 28, 2004, in the Gregorian calendar.

3.1.2.  Record Definitions

There are three types of records in the registry: "File-Date",
"Subtag", and "Tag" records.

The first record in the registry is a "File-Date" record.  This
record contains the single field whose field-name is "File-Date" (see
Figure 2).  The field-body of this record contains the last
modification date of this copy of the registry, making it possible to
compare different versions of the registry.  The registry on the IANA
website is the most current.  Versions with an older date than that
one are not up-to-date.

   File-Date: 2004-06-28
   %%

             Figure 3: Example of the File-Date Record

Subsequent records represent either subtags or tags in the registry.
"Subtag" records contain a field with a field-name of "Subtag",
while, unsurprisingly, "Tag" records contain a field with a
field-name of "Tag".  Each of the fields in each record MUST occur no
more than once, unless otherwise noted below.  Each record MUST
contain the following fields:

o  'Type'

   *  Type's field-body MUST consist of one of the following strings:
      "language", "extlang", "script", "region", "variant",
      "grandfathered", and "redundant" and denotes the type of tag or
      subtag.

o  Either 'Subtag' or 'Tag'

   *  Subtag's field-body contains the subtag being defined.
      This field MUST only appear in records whose 'Type' has one of
      these values: "language", "extlang", "script", "region", or
      "variant".

   *  Tag's field-body contains a complete language tag.  This field
      MUST only appear in records whose 'Type' has one of these
      values: "grandfathered" or "redundant".  Note that the
      field-body will always follow the 'grandfathered' production in
      the ABNF in Section 2.1.

o  Description

   *  Description's field-body contains a non-normative description
      of the subtag or tag.

o  Added

   *  Added's field-body contains the date the record was added to
      the registry.

Each record MAY also contain the following fields:

o  Preferred-Value

   *  For fields of type 'script', 'region', and 'variant',
      'Preferred-Value' contains the subtag of the same 'Type' that
      is preferred for forming the language tag.

   *  For fields of type 'language' and 'extlang', 'Preferred-Value'
      contains the language production (see Figure 1) that is
      preferred when forming the language tag.  This can be simply a
      'language' subtag, or it can be a 'language' subtag followed by
      an extended language sequence.

   *  For fields of type 'grandfathered' and 'redundant', a canonical
      mapping to a complete language tag.

o  Deprecated

   *  Deprecated's field-body contains the date the record was
      deprecated.

o  Prefix

   *  Prefix's field-body contains a language tag with which this
      subtag MAY be used to form a new language tag, perhaps with
      other subtags as well.  This field MUST only appear in records
      whose 'Type' field-body is 'variant' or 'extlang'.  For
      example, the 'Prefix' for the variant 'nedis' is 'sl', meaning
      that the tags "sl-nedis" and "sl-IT-nedis" might be appropriate
      while the tag "is-nedis" is not.

o  Comments

   *  Comments contains additional information about the subtag, as
      deemed appropriate for understanding the registry and
      implementing language tags using the subtag or tag.

o  Suppress-Script

   *  Suppress-Script contains a script subtag that SHOULD NOT be
      used to form language tags with the associated primary language
      subtag.  This field MUST only appear in records whose 'Type'
      field-body is 'language'.  See Section 4.1.

3.1.3.  Subtag and Tag Fields

The 'Subtag' field MUST use lowercase letters to form the subtag,
with two exceptions.  Subtags whose 'Type' field is 'script' (in
other words, subtags defined by ISO 15924) MUST use titlecase.
Subtags whose 'Type' field is 'region' (in other words, subtags
defined by ISO 3166) MUST use uppercase.  These exceptions mirror the
use of case in the underlying standards.

Each subtag in the tags contained in a 'Tag' field MUST be formatted
using the rules in the preceding paragraph.  That is, all subtags
are lowercase except for subtags that represent script or region
codes.

3.1.4.  Description Field

The field 'Description' contains a description of the tag or subtag
in the record.  The 'Description' field MAY appear more than once per
record, that is, there can be multiple descriptions for a given
record.  At least one of the 'Description' fields MUST be written or
transcribed into the Latin script; additional 'Description' fields
MAY also include a description in a non-Latin script.  Each
'Description' field MUST be unique, both within the record in which
it appears and for the collection of records of the same type.
Moreover, formatting variations of the same description MUST NOT
occur in that specific record or in any other record of the same
type.
For example, while the ISO 639-1 code 'fy' contains both the
descriptions "Western Frisian" and "Frisian, Western", only one of
these descriptions appears in the registry.

The 'Description' field is used for identification purposes and
SHOULD NOT be taken to represent the actual native name of the
language or variation or to be in any particular language.

For records taken from a source standard (such as ISO 639 or ISO
3166), the 'Description' value(s) SHOULD be taken from the source
standard.  Multiple descriptions in the source standard MUST be split
into separate 'Description' fields.  The source standard's
descriptions MAY be edited, either prior to insertion or via the
registration process.

When creating a new registry entry, duplicate, redundant,
conflicting, or otherwise problematic descriptions MUST either be
corrected or omitted.  Parenthetical comments, inverted names, and
other irregularities SHOULD be regularized according to the
guidelines used to update the registry in [registry-update].

Note: Descriptions in registry entries that correspond to ISO 639,
ISO 15924, ISO 3166, or UN M.49 codes are intended only to indicate
the meaning of that identifier as defined in the source standard at
the time it was added to the registry.  The description does not
replace the content of the source standard itself.  The descriptions
are not intended to be the English localized names for the subtags.
Localization or translation of language tag and subtag descriptions
is out of scope of this document.

3.1.5.  Deprecated Field

The field 'Deprecated' MAY be added to any record via the maintenance
process described in Section 3.3 or via the registration process
described in Section 3.5.  Usually, the addition of a 'Deprecated'
field is due to the action of one of the standards bodies, such as
ISO 3166, withdrawing a code.
In some historical cases, it might not have been possible to
reconstruct the original deprecation date.  For these cases, an
approximate date appears in the registry.  Although valid in language
tags, subtags and tags with a 'Deprecated' field are deprecated and
validating processors SHOULD NOT generate these subtags.  Note that a
record that contains a 'Deprecated' field and no corresponding
'Preferred-Value' field has no replacement mapping.

3.1.6.  Preferred-Value Field

The field 'Preferred-Value' contains a mapping between the record in
which it appears and another tag or subtag.  The value in this field
is strongly RECOMMENDED as the best choice to represent the value of
this record when selecting a language tag.  These values form three
groups:

1.  ISO 639 language codes that were later withdrawn in favor of
    other codes.  These values are mostly a historical curiosity.

2.  ISO 3166 region codes that have been withdrawn in favor of a new
    code.  This sometimes happens when a country changes its name or
    administration in such a way that warrants a new region code.

3.  Grandfathered or redundant tags from RFC 3066.  In many cases,
    these tags have become obsolete because the values they represent
    were later encoded by ISO 639.

Records that contain a 'Preferred-Value' field MUST also have a
'Deprecated' field.  This field contains a date of deprecation.
Thus, a language tag processor can use the registry to construct the
valid, non-deprecated set of subtags for a given date.  In addition,
for any given tag, a processor can construct the set of valid
language tags that correspond to that tag for all dates up to the
date of the registry.  The ability to do these mappings MAY be
beneficial to applications that are matching, selecting, or
filtering content based on its language tags.
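
One way a processor might construct the non-deprecated set for a given
date is sketched below.  This is only an illustration: records are
modeled as plain dictionaries keyed by field-name, the function names
are hypothetical, and the sample data uses the 'TP'/'TL' region
example from Section 3.4 with placeholder 'Added' dates that are not
taken from the registry.

```python
from datetime import date


def non_deprecated(records, as_of):
    """Return subtags already added and not yet deprecated on 'as_of'."""
    result = []
    for rec in records:
        if date.fromisoformat(rec["Added"]) > as_of:
            continue  # record did not exist yet on that date
        deprecated = rec.get("Deprecated")
        if deprecated is None or date.fromisoformat(deprecated) > as_of:
            result.append(rec["Subtag"])
    return result


def preferred(records, subtag):
    """Return the 'Preferred-Value' for a subtag, or the subtag itself."""
    for rec in records:
        if rec["Subtag"] == subtag:
            return rec.get("Preferred-Value", subtag)
    return subtag


# Illustrative data only; the 'Added' dates are placeholders.
SAMPLE_RECORDS = [
    {"Type": "region", "Subtag": "TP", "Added": "2000-01-01",
     "Deprecated": "2004-07-06", "Preferred-Value": "TL"},
    {"Type": "region", "Subtag": "TL", "Added": "2004-07-06"},
]
```

Because 'Preferred-Value' does not imply retagging, a mapping such as
preferred() is better suited to matching and selection than to
rewriting stored content.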

Note that 'Preferred-Value' mappings in records of type 'region'
sometimes do not represent exactly the same meaning as the original
value.  There are many reasons for a country code to be changed, and
the effect this has on the formation of language tags will depend on
the nature of the change in question.

In particular, the 'Preferred-Value' field does not imply retagging
content that uses the affected subtag.

The field 'Preferred-Value' MUST NOT be modified once created in the
registry.  The field MAY be added to records according to the rules
in Section 3.3.

The 'Preferred-Value' field in records of type "grandfathered" and
"redundant" contains whole language tags that are strongly
RECOMMENDED for use in place of the record's value.  In many cases,
the mappings were created by deprecation of the tags during the
period before this document was adopted.  For example, the tag
"no-nyn" was deprecated in favor of the ISO 639-1-defined language
code 'nn'.

3.1.7.  Prefix Field

The field of type 'Prefix' MUST NOT be removed from any record.  The
field-body for this type of field MUST NOT be modified.

The field-body of the 'Prefix' field consists of a language tag whose
subtags are appropriate to use with this subtag.  For example, the
variant subtag '1996' has a 'Prefix' field of "de".  This means that
tags starting with the sequence "de-" are appropriate with this
subtag, so "de-Latg-1996" and "de-CH-1996" are both acceptable, while
the tag "fr-1996" is an inappropriate choice.

Records of type 'variant' MAY have more than one field of type
'Prefix'.  Additional fields of this type MAY be added to a 'variant'
record via the registration process.

Records of type 'extlang' MUST have _exactly_ one 'Prefix' field.

3.1.8.  Comments Field

The field 'Comments' MAY appear more than once per record.
This field MAY be inserted or changed via the registration process
and no guarantee of stability is provided.  The content of this field
is not restricted, except by the need to register the information,
the suitability of the request, and by reasonable practical size
limitations.

3.1.9.  Suppress-Script Field

The field 'Suppress-Script' MUST only appear in records whose 'Type'
field-body is 'language'.  This field MUST NOT appear more than one
time in a record.  This field indicates a script used to write the
overwhelming majority of documents for the given language and that
therefore adds no distinguishing information to a language tag.  It
helps ensure greater compatibility between the language tags
generated according to the rules in this document and language tags
and tag processors or consumers based on RFC 3066.  For example,
virtually all Icelandic documents are written in the Latin script,
making the subtag 'Latn' redundant in the tag "is-Latn".

3.2.  Language Subtag Reviewer

The Language Subtag Reviewer is appointed by the IESG for an
indefinite term, subject to removal or replacement at the IESG's
discretion.  The Language Subtag Reviewer moderates the
ietf-languages mailing list, responds to requests for registration,
and performs the other registry maintenance duties described in
Section 3.3.  Only the Language Subtag Reviewer is permitted to
request IANA to change, update, or add records to the Language Subtag
Registry.

The performance or decisions of the Language Subtag Reviewer MAY be
appealed to the IESG under the same rules as other IETF decisions
(see [RFC2026]).  The IESG can reverse or overturn the decision of
the Language Subtag Reviewer, provide guidance, or take other
appropriate actions.

3.3.
Maintenance of the Registry

Maintenance of the registry requires that as codes are assigned or
withdrawn by ISO 639, ISO 15924, ISO 3166, and UN M.49, the Language
Subtag Reviewer MUST evaluate each change, determine whether it
conflicts with existing registry entries, and submit the information
to IANA for inclusion in the registry.  If a change takes place and
the Language Subtag Reviewer does not do this in a timely manner,
then any interested party MAY use the procedure in Section 3.5 to
register the appropriate update.

Note: The redundant and grandfathered entries together are the
complete list of tags registered under [RFC3066].  The redundant tags
are those that can now be formed using the subtags defined in the
registry together with the rules of Section 2.2.  The grandfathered
entries include those that can never be legal under those same
provisions plus those tags that contain subtags not yet registered
or, perhaps, inappropriate for registration.

The set of redundant and grandfathered tags is permanent and stable:
new entries in this section MUST NOT be added and existing entries
MUST NOT be removed.  Records of type 'grandfathered' MAY have their
type converted to 'redundant'; see item 12 in Section 3.6 for more
information.  The decision-making process about which tags were
initially grandfathered and which were made redundant is described in
[RFC4645].

RFC 3066 tags that were deprecated prior to the adoption of [RFC4646]
are part of the list of grandfathered tags, and their component
subtags were not included as registered variants (although they
remain eligible for registration).  For example, the tag "art-lojban"
was deprecated in favor of the language subtag 'jbo'.
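
Tooling that compares a proposed change against existing registry
entries first has to read the record format of Section 3.1.1.  The
following is a minimal sketch of such a reader, not a reference
implementation.  The function name is hypothetical, folded lines are
assumed to begin with a space or tab, and neither numeric character
references nor '..' ranges are expanded.

```python
def parse_registry(text):
    """Parse record-jar style text into a list of records.

    Each record is a dict mapping a field-name to a list of
    field-bodies, because fields such as 'Description' may repeat.
    """
    records = []
    current = {}
    last_name = None
    for raw in text.splitlines():
        line = raw.rstrip()
        if line == "%%":
            # Record separator: close the record collected so far.
            if current:
                records.append(current)
            current, last_name = {}, None
        elif line[:1] in (" ", "\t") and last_name is not None:
            # Folded continuation of the previous field-body.
            current[last_name][-1] += " " + line.strip()
        elif ":" in line:
            name, _, body = line.partition(":")
            last_name = name.strip()
            current.setdefault(last_name, []).append(body.strip())
    if current:
        records.append(current)
    return records


SAMPLE = """File-Date: 2005-01-02
%%
Type: variant
Subtag: nedis
Description: Natisone dialect
Description: Nadiza dialect
Added: 2003-10-09
Prefix: sl
Comments: This is a comment shown
  as an example.
%%
"""

PARSED = parse_registry(SAMPLE)
```

Keeping every field-body in a list avoids special-casing the fields
that are allowed to occur more than once.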

The Language Subtag Reviewer MUST ensure that new subtags meet the
requirements in Section 4.1 or submit an appropriate alternate subtag
as described in that section.  When either a change or addition to
the registry is needed, the Language Subtag Reviewer MUST prepare the
complete record, including all fields, and forward it to IANA for
insertion into the registry.  Each record being modified or inserted
MUST be forwarded in a separate message.

If a record represents a new subtag that does not currently exist in
the registry, then the message's subject line MUST include the word
"INSERT".  If the record represents a change to an existing subtag,
then the subject line of the message MUST include the word "MODIFY".
The message MUST contain both the record for the subtag being
inserted or modified and the new File-Date record.  Here is an
example of what the body of the message might contain:

   LANGUAGE SUBTAG MODIFICATION
   File-Date: 2005-01-02
   %%
   Type: variant
   Subtag: nedis
   Description: Natisone dialect
   Description: Nadiza dialect
   Added: 2003-10-09
   Prefix: sl
   Comments: This is a comment shown
     as an example.
   %%

      Figure 4: Example of a Language Subtag Modification Form

Whenever an entry is created or modified in the registry, the
'File-Date' record at the start of the registry is updated to reflect
the most recent modification date in the [RFC3339] "full-date"
format.

Before forwarding a new registration to IANA, the Language Subtag
Reviewer MUST ensure that values in the 'Subtag' field match case
according to the description in Section 3.1.

3.4.  Stability of IANA Registry Entries

The stability of entries and their meaning in the registry is
critical to the long-term stability of language tags.
The rules in this section guarantee that a specific language tag's
meaning is stable over time and will not change.

These rules specifically deal with how changes to codes (including
withdrawal and deprecation of codes) maintained by ISO 639, ISO
15924, ISO 3166, and UN M.49 are reflected in the IANA Language
Subtag Registry.  Assignments to the IANA Language Subtag Registry
MUST follow the following stability rules:

1.  Values in the fields 'Type', 'Subtag', 'Tag', 'Added',
    'Deprecated' and 'Preferred-Value' MUST NOT be changed and are
    guaranteed to be stable over time.

2.  Values in the 'Description' field MUST NOT be changed in a way
    that would invalidate previously-existing tags.  They MAY be
    broadened somewhat in scope, changed to add information, or
    adapted to the most common modern usage.  For example, countries
    occasionally change their official names; a historical example
    of this would be "Upper Volta" changing to "Burkina Faso".

3.  Values in the field 'Prefix' MAY be added to records of type
    'variant' via the registration process.  If a prefix is added to
    a variant record, 'Comments' fields SHOULD be used to explain
    different usages with the various prefixes.

4.  Values in the field 'Prefix' in records of type 'variant' MAY be
    modified, so long as the modifications broaden the set of
    prefixes.  That is, a prefix MAY be replaced by one of its own
    prefixes.  For example, the prefix "en-US" could be replaced by
    "en", but not by the prefixes "en-Latn", "fr", or "en-US-boont".
    If one of those prefixes were needed, a new Prefix SHOULD be
    registered.

5.  Values in the field 'Prefix' in records of type 'extlang' MUST
    NOT be modified.

6.  Values in the field 'Prefix' MUST NOT be removed.

7.
The field 'Comments' MAY be added, changed, or removed 1288 via the registration process or any of the processes or 1289 considerations described in this section. 1291 8. The field 'Suppress-Script' MAY be added or removed via the 1292 registration process. 1294 9. Codes assigned by ISO 639-1 that do not conflict with existing 1295 two-letter primary language subtags and that have no 1296 corresponding three-letter primary or extended language subtags 1297 defined in the registry are entered into the IANA registry as 1298 new records of type 'language'. 1300 10. Codes assigned by ISO 639-2 that do not conflict with existing 1301 three-letter primary or extended language subtags are entered 1302 into the IANA registry as new records of type 'language'. 1304 11. Codes assigned by ISO 639-3 that do not conflict with existing 1305 three-letter primary or extended language subtags are entered 1306 into the IANA registry as new records. 1308 1. Codes that have a defined "macro-language" mapping at the 1309 time of their registration MUST be entered into the registry 1310 as records of type 'extlang' with a 'Prefix' field 1311 containing the appropriate prefix tag. 1313 2. Codes that represent sign languages MUST be entered into the 1314 registry as records of type 'extlang' with a 'Prefix' field 1315 that matches the Basic Language Range "sgn" (see Section 1316 3.3.1 "Basic Filtering" in [RFC4647]). 1318 3. All other codes MUST be entered into the registry as records 1319 of type 'language'. 1321 12. A record of type 'language' or 'extlang' MUST NOT be registered 1322 if there exists a record of either type with the same subtag 1323 value. For example, if an 'extlang' subtag 'foo' exists in the 1324 registry, all attempts to register a 'language' subtag 'foo' 1325 will be rejected. 1327 13.
Codes assigned by ISO 15924 and ISO 3166 that do not conflict 1328 with existing subtags of the associated type and whose meaning 1329 is not the same as an existing subtag of the same type are 1330 entered into the IANA registry as new records. 1332 14. Codes assigned by ISO 639, ISO 15924, or ISO 3166 that are 1333 withdrawn by their respective maintenance or registration 1334 authority remain valid in language tags. A 'Deprecated' field 1335 containing the date of withdrawal MUST be added to the record. 1336 If a new record of the same type is added that represents a 1337 replacement value, then a 'Preferred-Value' field MAY also be 1338 added. The registration process MAY be used to add comments 1339 about the withdrawal of the code by the respective standard. 1341 Example: The region code 'TL' was assigned to the country 'Timor- 1342 Leste', replacing the code 'TP' (which was assigned to 'East 1343 Timor' when it was under administration by Portugal). The 1344 subtag 'TP' remains valid in language tags, but its record 1345 contains a 'Preferred-Value' of 'TL' and its field 1346 'Deprecated' contains the date the new code was assigned 1347 ('2004-07-06'). 1349 15. Codes assigned by ISO 639, ISO 15924, or ISO 3166 that conflict 1350 with existing subtags of the associated type, including subtags 1351 that are deprecated, MUST NOT be entered into the registry. The 1352 following additional considerations apply to subtag values that 1353 are reassigned: 1355 A. For ISO 639 codes, if the newly assigned code's meaning is 1356 not represented by a subtag in the IANA registry, the 1357 Language Subtag Reviewer, as described in Section 3.5, SHALL 1358 prepare a proposal for entering in the IANA registry as soon 1359 as practical a registered language subtag as an alternate 1360 value for the new code.
The form of the registered language 1361 subtag will be at the discretion of the Language Subtag 1362 Reviewer and MUST conform to other restrictions on language 1363 subtags in this document. 1365 B. For all subtags whose meaning is derived from an external 1366 standard (that is, by ISO 639, ISO 15924, ISO 3166, or UN 1367 M.49), if a new meaning is assigned to an existing code and 1368 the new meaning broadens the meaning of that code, then the 1369 meaning for the associated subtag MAY be changed to match. 1370 The meaning of a subtag MUST NOT be narrowed, however, as 1371 this can result in an unknown proportion of the existing 1372 uses of a subtag becoming invalid. Note: ISO 639 1373 maintenance agency/registration authority (MA/RA) has 1374 adopted a similar stability policy. 1376 C. For ISO 15924 codes, if the newly assigned code's meaning is 1377 not represented by a subtag in the IANA registry, the 1378 Language Subtag Reviewer, as described in Section 3.5, SHALL 1379 prepare a proposal for entering in the IANA registry as soon 1380 as practical a registered variant subtag as an alternate 1381 value for the new code. The form of the registered variant 1382 subtag will be at the discretion of the Language Subtag 1383 Reviewer and MUST conform to other restrictions on variant 1384 subtags in this document. 1386 D. For ISO 3166 codes, if the newly assigned code's meaning is 1387 associated with the same UN M.49 code as another 'region' 1388 subtag, then the existing region subtag remains as the 1389 preferred value for that region and no new entry is created. 1390 A comment MAY be added to the existing region subtag 1391 indicating the relationship to the new ISO 3166 code. 1393 E. 
For ISO 3166 codes, if the newly assigned code's meaning is 1394 associated with a UN M.49 code that is not represented by an 1395 existing region subtag, then the Language Subtag Reviewer, 1396 as described in Section 3.5, SHALL prepare a proposal for 1397 entering the appropriate UN M.49 country code as an entry in 1398 the IANA registry. 1400 F. For ISO 3166 codes, if there is no associated UN numeric 1401 code, then the Language Subtag Reviewer SHALL petition the 1402 UN to create one. If there is no response from the UN 1403 within ninety days of the request being sent, the Language 1404 Subtag Reviewer SHALL prepare a proposal for entering in the 1405 IANA registry as soon as practical a registered variant 1406 subtag as an alternate value for the new code. The form of 1407 the registered variant subtag will be at the discretion of 1408 the Language Subtag Reviewer and MUST conform to other 1409 restrictions on variant subtags in this document. This 1410 situation is very unlikely to ever occur. 1412 16. UN M.49 has codes for both countries and areas (such as '276' 1413 for Germany) and geographical regions and sub-regions (such as 1414 '150' for Europe). UN M.49 country or area codes for which 1415 there is no corresponding ISO 3166 code SHOULD NOT be 1416 registered, except as a surrogate for an ISO 3166 code that is 1417 blocked from registration by an existing subtag. If such a code 1418 becomes necessary, then the registration authority for ISO 3166 1419 SHOULD first be petitioned to assign a code to the region. If 1420 the petition for a code assignment by ISO 3166 is refused or not 1421 acted on in a timely manner, the registration process described 1422 in Section 3.5 MAY then be used to register the corresponding UN 1423 M.49 code. This way, UN M.49 codes remain available as the 1424 value of last resort in cases where ISO 3166 reassigns a 1425 deprecated value in the registry. 1427 17. 
Stability provisions apply to grandfathered tags with this 1428 exception: should it be possible to compose one of the 1429 grandfathered tags from registered subtags, then the field 1430 'Type' in that record is changed from 'grandfathered' to 1431 'redundant'. Note that this will not affect language tags that 1432 match the grandfathered tag, since these tags will now match 1433 valid generative subtag sequences. For example, this document 1434 caused the ISO 639-3 code 'gan', used in the redundant tag "zh- 1435 gan", to be registered as an extended language subtag. The 1436 formerly-grandfathered tag "zh-gan" became a redundant tag as a 1437 result (but existing content or implementations that use "zh- 1438 gan" remain valid). 1440 3.5. Registration Procedure for Subtags 1442 The procedure given here MUST be used by anyone who wants to use a 1443 subtag not currently in the IANA Language Subtag Registry. 1445 Only subtags of type 'language' and 'variant' will be considered for 1446 independent registration of new subtags. Handling of subtags needed 1447 for stability and subtags necessary to keep the registry synchronized 1448 with ISO 639, ISO 15924, ISO 3166, and UN M.49 within the limits 1449 defined by this document are described in Section 3.3. Stability 1450 provisions are described in Section 3.4. 1452 This procedure MAY also be used to register or alter the information 1453 for the 'Description', 'Comments', 'Deprecated', 'Prefix', or 1454 'Suppress-Script' fields in a subtag's record as described in 1455 Section 3.4. Changes to all other fields in the IANA registry are 1456 NOT permitted. 1458 Registering a new subtag or requesting modifications to an existing 1459 tag or subtag starts with the requester filling out the registration 1460 form reproduced below. Note that each response is not limited in 1461 size so that the request can adequately describe the registration. 
1462 The fields in the "Record Requested" section SHOULD follow the 1463 requirements in Section 3.1. 1465 LANGUAGE SUBTAG REGISTRATION FORM 1466 1. Name of requester: 1467 2. E-mail address of requester: 1468 3. Record Requested: 1470 Type: 1471 Subtag: 1472 Description: 1473 Prefix: 1474 Preferred-Value: 1475 Deprecated: 1476 Suppress-Script: 1477 Comments: 1479 4. Intended meaning of the subtag: 1480 5. Reference to published description 1481 of the language (book or article): 1482 6. Any other relevant information: 1484 Figure 5: The Language Subtag Registration Form 1486 The subtag registration form MUST be sent to 1487 <ietf-languages@iana.org> for a two-week review period before it can 1488 be submitted to IANA. (This is an open list and can be joined by 1489 sending a request to <ietf-languages-request@iana.org>.) 1491 Variant subtags are usually registered for use with a particular 1492 range of language tags. For example, the subtag 'rozaj' is intended 1493 for use with language tags that start with the primary language 1494 subtag "sl", since Resian is a dialect of Slovenian. Thus, the 1495 subtag 'rozaj' would be appropriate in tags such as "sl-Latn-rozaj" 1496 or "sl-IT-rozaj". This information is stored in the 'Prefix' field 1497 in the registry. Variant registration requests SHOULD include at 1498 least one 'Prefix' field in the registration form. 1500 Extended language subtags MUST include exactly one 'Prefix' field. 1502 The 'Prefix' field for a given registered subtag exists in the IANA 1503 registry as a guide to usage. Additional prefixes MAY be added by 1504 filing an additional registration form. In that form, the "Any other 1505 relevant information:" field MUST indicate that it is the addition of 1506 a prefix. 1508 Requests to add a prefix to a variant subtag that imply a different 1509 semantic meaning will probably be rejected. For example, a request 1510 to add the prefix "de" to the subtag 'nedis' so that the tag "de- 1511 nedis" represented some German dialect would be rejected.
The 1512 'nedis' subtag represents a particular Slovenian dialect and the 1513 additional registration would change the semantic meaning assigned to 1514 the subtag. A separate subtag SHOULD be proposed instead. 1516 The 'Description' field MUST contain a description of the tag being 1517 registered written or transcribed into the Latin script; it MAY also 1518 include a description in a non-Latin script. Non-ASCII characters 1519 MUST be escaped using the syntax described in Section 3.1. The 1520 'Description' field is used for identification purposes and does not 1521 necessarily represent the actual native name of the language or 1522 variation, nor does it need to be in any particular language. 1524 While the 'Description' field itself is not guaranteed to be stable 1525 and errata corrections MAY be undertaken from time to time, attempts 1526 to provide translations or transcriptions of entries in the registry 1527 itself will probably be frowned upon by the community or rejected 1528 outright, as changes of this nature have an impact on the provisions 1529 in Section 3.4. 1531 When the two-week period has passed, the Language Subtag Reviewer 1532 either forwards the record to be inserted or modified to 1533 iana@iana.org according to the procedure described in Section 3.3, or 1534 rejects the request because of significant objections raised on the 1535 list or due to problems with constraints in this document (which MUST 1536 be explicitly cited). The Language Subtag Reviewer MAY also extend 1537 the review period in two-week increments to permit further 1538 discussion. The Language Subtag Reviewer MUST indicate on the list 1539 whether the registration has been accepted, rejected, or extended 1540 following each two-week period. 1542 Note that the Language Subtag Reviewer MAY raise objections on the 1543 list if he or she so desires. The important thing is that the 1544 objection MUST be made publicly.
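As an informal illustration of how the 'Prefix' field discussed in this section can guide the use of a variant subtag, the following Python sketch checks whether a language tag begins with one of a variant's registered prefixes. It is not part of the registration procedure. The function name and the small in-memory table are hypothetical; the table contains only the two variants used as examples in this document, and the comparison (case-insensitive, whole subtags only) is an assumption about how an implementation might apply the field.

```python
# Hypothetical excerpt of the registry: variant subtag -> 'Prefix' values.
# Only the two variants mentioned in this document are listed.
VARIANT_PREFIXES = {
    "rozaj": ["sl"],
    "nedis": ["sl"],
}


def starts_with_registered_prefix(tag, prefixes):
    """Return True if the leading subtags of `tag` equal some prefix.

    Comparison is case-insensitive and works on whole subtags, so the
    prefix "sl" matches "sl-IT-rozaj" but not "slx-rozaj".
    """
    subtags = tag.lower().split("-")
    for prefix in prefixes:
        wanted = prefix.lower().split("-")
        if subtags[:len(wanted)] == wanted:
            return True
    return False


if __name__ == "__main__":
    for candidate in ("sl-Latn-rozaj", "sl-IT-rozaj", "de-nedis"):
        variant = candidate.split("-")[-1].lower()
        ok = starts_with_registered_prefix(candidate, VARIANT_PREFIXES[variant])
        print(candidate, "follows a registered prefix:", ok)
```

Because the 'Prefix' field is described as a guide to usage, a check of this kind is probably better suited to producing a warning than to rejecting a tag outright.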
1546 The applicant is free to modify a rejected application with 1547 additional information and submit it again; this restarts the two- 1548 week comment period. 1550 Decisions made by the Language Subtag Reviewer MAY be appealed to the 1551 IESG [RFC2028] under the same rules as other IETF decisions 1552 [RFC2026]. 1554 All approved registration forms are available online in the directory 1555 http://www.iana.org/numbers.html under "languages". 1557 Updates or changes to existing records follow the same procedure as 1558 new registrations. The Language Subtag Reviewer decides whether 1559 there is consensus to update the registration following the two week 1560 review period; normally, objections by the original registrant will 1561 carry extra weight in forming such a consensus. 1563 Registrations are permanent and stable. Once registered, subtags 1564 will not be removed from the registry and will remain a valid way in 1565 which to specify a specific language or variant. 1567 Note: The purpose of the "Reference to published description" section 1568 in the registration form is to aid in verifying whether a language is 1569 registered or what language or language variation a particular subtag 1570 refers to. In most cases, reference to an authoritative grammar or 1571 dictionary of that language will be useful; in cases where no such 1572 work exists, other well-known works describing that language or in 1573 that language MAY be appropriate. The Language Subtag Reviewer 1574 decides what constitutes "good enough" reference material. This 1575 requirement is not intended to exclude particular languages or 1576 dialects due to the size of the speaker population or lack of a 1577 standardized orthography. Minority languages will be considered 1578 equally on their own merits. 1580 3.6. 
Possibilities for Registration 1582 Possibilities for registration of subtags or information about 1583 subtags include: 1585 o Primary language subtags for languages not listed in ISO 639 that 1586 are not variants of any listed or registered language MAY be 1587 registered. At the time this document was created, there were no 1588 examples of this form of subtag. Before attempting to register a 1589 language subtag, there MUST be an attempt to register the language 1590 with ISO 639. Subtags MUST NOT be registered for languages 1591 defined by codes that exist in ISO 639-1, ISO 639-2, or ISO 639-3, 1592 or that are under consideration by the ISO 639 registration 1593 authorities, or that have never been attempted for registration 1594 with those authorities. If ISO 639 has previously rejected a 1595 language for registration, it is reasonable to assume that there 1596 must be additional, very compelling evidence of need before it 1597 will be registered as a primary language subtag in the IANA 1598 registry (to the extent that it is very unlikely that any subtags 1599 will be registered of this type). 1601 o Dialect or other divisions or variations within a language, its 1602 orthography, writing system, regional or historical usage, 1603 transliteration or other transformation, or distinguishing 1604 variation MAY be registered as variant subtags. An example is the 1605 'rozaj' subtag (the Resian dialect of Slovenian). 1607 o The addition or maintenance of fields (generally of an 1608 informational nature) in Tag or Subtag records as described in 1609 Section 3.1 and subject to the stability provisions in 1610 Section 3.4. This includes descriptions, comments, deprecation 1611 and preferred values for obsolete or withdrawn codes, or the 1612 addition of script or extlang information to primary language 1613 subtags. 
1615 o The addition of records and related field value changes necessary 1616 to reflect assignments made by ISO 639, ISO 15924, ISO 3166, and 1617 UN M.49 as described in Section 3.4. 1619 Subtags proposed for registration that would cause all or part of a 1620 grandfathered tag to become redundant but whose meaning conflicts 1621 with or alters the meaning of the grandfathered tag MUST be rejected. 1623 This document leaves the decision on what subtags or changes to 1624 subtags are appropriate (or not) to the registration process 1625 described in Section 3.5. 1627 Note: four-character primary language subtags are reserved to allow 1628 for the possibility of alpha4 codes in some future addition to the 1629 ISO 639 family of standards. 1631 ISO 639 defines a maintenance agency for additions to and changes in 1632 the list of languages in ISO 639. This agency is: 1634 International Information Centre for Terminology (Infoterm) 1635 Aichholzgasse 6/12, AT-1120 1636 Wien, Austria 1637 Phone: +43 1 26 75 35 Ext. 312 Fax: +43 1 216 32 72 1639 ISO 639-2 defines a maintenance agency for additions to and changes 1640 in the list of languages in ISO 639-2. This agency is: 1642 Library of Congress 1643 Network Development and MARC Standards Office 1644 Washington, D.C. 20540 USA 1645 Phone: +1 202 707 6237 Fax: +1 202 707 0115 1646 URL: http://www.loc.gov/standards/iso639-2 1648 ISO 639-3 defines a maintenance agency for additions to and changes 1649 in the list of languages in ISO 639-3. This agency is: 1651 SIL International 1652 ISO 639-3 Registrar 1653 7500 W. Camp Wisdom Rd. 1654 Dallas, TX 75236 USA 1655 Phone: +1 972 708 7400, ext. 
2293 Fax: +1 972 708 7546 1656 Email: iso639-3@sil.org 1657 URL: http://www.sil.org/iso639-3 1659 The maintenance agency for ISO 3166 (country codes) is: 1661 ISO 3166 Maintenance Agency 1662 c/o International Organization for Standardization 1663 Case postale 56 1664 CH-1211 Geneva 20 Switzerland 1665 Phone: +41 22 749 72 33 Fax: +41 22 749 73 49 1666 URL: http://www.iso.org/iso/en/prods-services/iso3166ma/index.html 1668 The registration authority for ISO 15924 (script codes) is: 1670 Unicode Consortium Box 391476 1671 Mountain View, CA 94039-1476, USA 1672 URL: http://www.unicode.org/iso15924 1674 The Statistics Division of the United Nations Secretariat maintains 1675 the Standard Country or Area Codes for Statistical Use and can be 1676 reached at: 1678 Statistical Services Branch 1679 Statistics Division 1680 United Nations, Room DC2-1620 1681 New York, NY 10017, USA 1682 Fax: +1-212-963-0623 1683 E-mail: statistics@un.org 1684 URL: http://unstats.un.org/unsd/methods/m49/m49alpha.htm 1686 3.7. Extensions and Extensions Registry 1688 Extension subtags are those introduced by single-character subtags 1689 ("singletons") other than 'x'. They are reserved for the generation 1690 of identifiers that contain a language component and are compatible 1691 with applications that understand language tags. 1693 The structure and form of extensions are defined by this document so 1694 that implementations can be created that are forward compatible with 1695 applications that might be created using singletons in the future. 1696 In addition, defining a mechanism for maintaining singletons will 1697 lend stability to this document by reducing the likely need for 1698 future revisions or updates. 1700 Single-character subtags are assigned by IANA using the "IETF 1701 Consensus" policy defined by [RFC2434]. This policy requires the 1702 development of an RFC, which SHALL define the name, purpose, 1703 processes, and procedures for maintaining the subtags. 
The 1704 maintaining or registering authority, including name, contact email, 1705 discussion list email, and URL location of the registry, MUST be 1706 indicated clearly in the RFC. The RFC MUST specify or include each 1707 of the following: 1709 o The specification MUST reference the specific version or revision 1710 of this document that governs its creation and MUST reference this 1711 section of this document. 1713 o The specification and all subtags defined by the specification 1714 MUST follow the ABNF and other rules for the formation of tags and 1715 subtags as defined in this document. In particular, it MUST 1716 specify that case is not significant and that subtags MUST NOT 1717 exceed eight characters in length. 1719 o The specification MUST specify a canonical representation. 1721 o The specification of valid subtags MUST be available over the 1722 Internet and at no cost. 1724 o The specification MUST be in the public domain or available via a 1725 royalty-free license acceptable to the IETF and specified in the 1726 RFC. 1728 o The specification MUST be versioned, and each version of the 1729 specification MUST be numbered, dated, and stable. 1731 o The specification MUST be stable. That is, extension subtags, 1732 once defined by a specification, MUST NOT be retracted or change 1733 in meaning in any substantial way. 1735 o The specification MUST include in a separate section the 1736 registration form reproduced in this section (below) to be used in 1737 registering the extension upon publication as an RFC. 1739 o IANA MUST be informed of changes to the contact information and 1740 URL for the specification. 1742 IANA will maintain a registry of allocated single-character 1743 (singleton) subtags. This registry MUST use the record-jar format 1744 described by the ABNF in Section 3.1. 
Upon publication of an 1745 extension as an RFC, the maintaining authority defined in the RFC 1746 MUST forward this registration form to iesg@ietf.org, who MUST 1747 forward the request to iana@iana.org. The maintaining authority of 1748 the extension MUST maintain the accuracy of the record by sending an 1749 updated full copy of the record to iana@iana.org with the subject 1750 line "LANGUAGE TAG EXTENSION UPDATE" whenever content changes. Only 1751 the 'Comments', 'Contact_Email', 'Mailing_List', and 'URL' fields MAY 1752 be modified in these updates. 1754 Failure to maintain this record, maintain the corresponding registry, 1755 or meet other conditions imposed by this section of this document MAY 1756 be appealed to the IESG [RFC2028] under the same rules as other IETF 1757 decisions (see [RFC2026]) and MAY result in the authority to maintain 1758 the extension being withdrawn or reassigned by the IESG. 1759 %% 1760 Identifier: 1761 Description: 1762 Comments: 1763 Added: 1764 RFC: 1765 Authority: 1766 Contact_Email: 1767 Mailing_List: 1768 URL: 1769 %% 1771 Figure 6: Format of Records in the Language Tag Extensions Registry 1773 'Identifier' contains the single-character subtag (singleton) 1774 assigned to the extension. The Internet-Draft submitted to define 1775 the extension SHOULD specify which letter or digit to use, although 1776 the IESG MAY change the assignment when approving the RFC. 1778 'Description' contains the name and description of the extension. 1780 'Comments' is an OPTIONAL field and MAY contain a broader description 1781 of the extension. 1783 'Added' contains the date the RFC was published in the "full-date" 1784 format specified in [RFC3339]. For example: 2004-06-28 represents 1785 June 28, 2004, in the Gregorian calendar. 1787 'RFC' contains the RFC number assigned to the extension. 1789 'Authority' contains the name of the maintaining authority for the 1790 extension. 
1792 'Contact_Email' contains the email address used to contact the 1793 maintaining authority. 1795 'Mailing_List' contains the URL or subscription email address of the 1796 mailing list used by the maintaining authority. 1798 'URL' contains the URL of the registry for this extension. 1800 The determination of whether an Internet-Draft meets the above 1801 conditions and the decision to grant or withhold such authority rests 1802 solely with the IESG and is subject to the normal review and appeals 1803 process associated with the RFC process. 1805 Extension authors are strongly cautioned that many (including most 1806 well-formed) processors will be unaware of any special relationships 1807 or meaning inherent in the order of extension subtags. Extension 1808 authors SHOULD avoid subtag relationships or canonicalization 1809 mechanisms that interfere with matching or with length restrictions 1810 that sometimes exist in common protocols where the extension is used. 1811 In particular, applications MAY truncate the subtags in doing 1812 matching or in fitting into limited lengths, so it is RECOMMENDED 1813 that the most significant information be in the most significant 1814 (left-most) subtags and that the specification gracefully handle 1815 truncated subtags. 1817 When a language tag is to be used in a specific, known protocol, it 1818 is RECOMMENDED that the language tag not contain extensions not 1819 supported by that protocol. In addition, note that some protocols 1820 MAY impose upper limits on the length of the strings used to store or 1821 transport the language tag. 1823 3.8. Update of the Language Subtag Registry 1825 Upon adoption of this document the IANA Language Subtag Registry will 1826 need an update so that it contains the complete set of subtags valid 1827 in a language tag. This collection of subtags, along with a 1828 description of the process used to create it, is described by 1829 [registry-update].
IANA will publish the updated version of the 1830 registry described by this document using the instructions and 1831 content of [registry-update]. Once published by IANA, the 1832 maintenance procedures, rules, and registration processes described 1833 in this document will be available for new registrations or updates. 1835 Registrations that are in process under the rules defined in 1836 [RFC4646] when this document is adopted MUST be completed under the 1837 rules contained in this document. 1839 4. Formation and Processing of Language Tags 1841 This section addresses how to use the information in the registry 1842 with the tag syntax to choose, form, and process language tags. 1844 4.1. Choice of Language Tag 1846 One is sometimes faced with the choice between several possible tags 1847 for the same body of text. 1849 Interoperability is best served when all users use the same language 1850 tag in order to represent the same language. If an application has 1851 requirements that make the rules here inapplicable, then that 1852 application risks damaging interoperability. It is strongly 1853 RECOMMENDED that users not define their own rules for language tag 1854 choice. 1856 Subtags SHOULD only be used where they add useful distinguishing 1857 information; extraneous subtags interfere with the meaning, 1858 understanding, and processing of language tags. In particular, users 1859 and implementations SHOULD follow the 'Prefix' and 'Suppress-Script' 1860 fields in the registry (defined in Section 3.1): these fields provide 1861 guidance on when specific additional subtags SHOULD (and SHOULD NOT) 1862 be used in a language tag. 1864 Of particular note, many applications can benefit from the use of 1865 script subtags in language tags, as long as the use is consistent for 1866 a given context. 
Script subtags were not formally defined in RFC 1867 3066 and their use can affect matching and subtag identification by 1868 implementations of RFC 3066, as these subtags appear between the 1869 primary language and region subtags. For example, if a user requests 1870 content in an implementation of Section 2.5 of [RFC3066] using the 1871 language range "en-US", content labeled "en-Latn-US" will not match 1872 the request. Therefore, it is important to know when script subtags 1873 will customarily be used and when they ought not be used. In the 1874 registry, the Suppress-Script field helps ensure greater 1875 compatibility between the language tags generated according to the 1876 rules in this document and language tags and tag processors or 1877 consumers based on RFC 3066 by defining when users SHOULD NOT include 1878 a script subtag with a particular primary language subtag. 1880 Extended language subtags (type 'extlang' in the registry; see 1881 Section 3.1) also appear between the primary language and region 1882 subtags. Applications might benefit from their judicious use in 1883 forming language tags. [[ guidelines here?? ]] 1885 Standards, protocols, and applications that reference this document 1886 normatively but apply different rules to the ones given in this 1887 section MUST specify how the procedure varies from the one given 1888 here. 1890 The choice of subtags used to form a language tag SHOULD be guided by 1891 the following rules: 1893 1. Use as precise a tag as possible, but no more specific than is 1894 justified. Avoid using subtags that are not important for 1895 distinguishing content in an application. 1897 * For example, 'de' might suffice for tagging an email written 1898 in German, while "de-CH-1996" is probably unnecessarily 1899 precise for such a task. 1901 2. The script subtag SHOULD NOT be used to form language tags unless 1902 the script adds some distinguishing information to the tag. 
The 1903 field 'Suppress-Script' in the primary language record in the 1904 registry indicates script subtags that do not add distinguishing 1905 information for most applications. 1907 * For example, the subtag 'Latn' should not be used with the 1908 primary language 'en' because nearly all English documents are 1909 written in the Latin script and it adds no distinguishing 1910 information. However, if a document were written in English 1911 mixing Latin script with another script such as Braille 1912 ('Brai'), then it might be appropriate to choose to indicate 1913 both scripts to aid in content selection, such as the 1914 application of a style sheet. 1916 3. Use specific language subtags or subtag sequences in preference 1917 to subtags for language collections. A "language collection" is 1918 a subtag derived from one of the ISO 639-2 codes that represents 1919 multiple related languages. For example, the code 'nai' 1920 represents "North American languages". The registry contains 1921 values for the specific languages represented by this collective 1922 code. For example 'xxx' (language1) and 'yyy' (language2). Note 1923 that the languages contained in a collection (such as the two 1924 examples shown) are often unrelated except for their inclusion in 1925 the collection. 1927 4. If a tag or subtag has a 'Preferred-Value' field in its registry 1928 entry, then the value of that field SHOULD be used to form the 1929 language tag in preference to the tag or subtag in which the 1930 preferred value appears. 1932 * For example, use 'he' for Hebrew in preference to 'iw'. 1934 5. The 'und' (Undetermined) primary language subtag SHOULD NOT be 1935 used to label content, even if the language is unknown. Omitting 1936 the language tag altogether is preferred to using a tag with a 1937 primary language subtag of 'und'. The 'und' subtag MAY be useful 1938 for protocols that require a language tag to be provided. 
The 1939 'und' subtag MAY also be useful when matching language tags in 1940 certain situations. 1942 6. The 'mul' (Multiple) primary language subtag SHOULD NOT be used 1943 whenever the protocol allows the separate tags for multiple 1944 languages, as is the case for the Content-Language header in 1945 HTTP. The 'mul' subtag conveys little useful information: 1946 content in multiple languages SHOULD individually tag the 1947 languages where they appear or otherwise indicate the actual 1948 language in preference to the 'mul' subtag. 1950 7. The same variant subtag SHOULD NOT be used more than once within 1951 a language tag. 1953 * For example, do not use "de-DE-1901-1901". 1955 To ensure consistent backward compatibility, this document contains 1956 several provisions to account for potential instability in the 1957 standards used to define the subtags that make up language tags. 1958 These provisions mean that no language tag created under the rules in 1959 this document will become obsolete. 1961 4.2. Meaning of the Language Tag 1963 The relationship between the tag and the information it relates to is 1964 defined by the context in which the tag appears. Accordingly, this 1965 section gives only possible examples of its usage. 1967 o For a single information object, the associated language tags 1968 might be interpreted as the set of languages that is necessary for 1969 a complete comprehension of the complete object. Example: Plain 1970 text documents. 1972 o For an aggregation of information objects, the associated language 1973 tags could be taken as the set of languages used inside components 1974 of that aggregation. Examples: Document stores and libraries. 1976 o For information objects whose purpose is to provide alternatives, 1977 the associated language tags could be regarded as a hint that the 1978 content is provided in several languages and that one has to 1979 inspect each of the alternatives in order to find its language or 1980 languages. 
In this case, the presence of multiple tags might not 1981 mean that one needs to be multi-lingual to get complete 1982 understanding of the document. Example: MIME multipart/ 1983 alternative. 1985 o In markup languages, such as HTML and XML, language information 1986 can be added to each part of the document identified by the markup 1987 structure (including the whole document itself). For example, one 1988 could write <span lang="fr">C'est la vie.</span> inside a 1989 Norwegian document; the Norwegian-speaking user could then access 1990 a French-Norwegian dictionary to find out what the marked section 1991 meant. If the user were listening to that document through a 1992 speech synthesis interface, this information could be used to signal 1993 the synthesizer to appropriately apply French text-to-speech 1994 pronunciation rules to that span of text, instead of applying the 1995 inappropriate Norwegian rules. 1997 Language tags are related when they contain a similar sequence of 1998 subtags. For example, if a language tag B contains language tag A as 1999 a prefix, then B is typically "narrower" or "more specific" than A. 2000 Thus, "zh-Hant-TW" is more specific than "zh-Hant". 2002 This relationship is not guaranteed in all cases: specifically, 2003 languages that begin with the same sequence of subtags are NOT 2004 guaranteed to be mutually intelligible, although they might be. For 2005 example, the tag "az" shares a prefix with both "az-Latn" 2006 (Azerbaijani written using the Latin script) and "az-Cyrl" 2007 (Azerbaijani written using the Cyrillic script). A person fluent in 2008 one script might not be able to read the other, even though the text 2009 might be identical. Content tagged as "az" most probably is written 2010 in just one script and thus might not be intelligible to a reader 2011 familiar with the other script. 2013 4.3. Length Considerations 2015 There is no defined upper limit on the size of language tags.
While 2016 historically most language tags have consisted of language and region 2017 subtags with a combined total length of up to six characters, larger 2018 tags have always been both possible and actually appeared in use. 2020 Neither the language tag syntax nor other requirements in this 2021 document impose a fixed upper limit on the number of subtags in a 2022 language tag (and thus an upper bound on the size of a tag). The 2023 language tag syntax suggests that, depending on the specific 2024 language, more subtags (and thus a longer tag) are sometimes 2025 necessary to completely identify the language for certain 2026 applications; thus, it is possible to envision long or complex subtag 2027 sequences. 2029 4.3.1. Working with Limited Buffer Sizes 2031 Some applications and protocols are forced to allocate fixed buffer 2032 sizes or otherwise limit the length of a language tag. A conformant 2033 implementation or specification MAY refuse to support the storage of 2034 language tags that exceed a specified length. Any such limitation 2035 SHOULD be clearly documented, and such documentation SHOULD include 2036 what happens to longer tags (for example, whether an error value is 2037 generated or the language tag is truncated). A protocol that allows 2038 tags to be truncated at an arbitrary limit, without giving any 2039 indication of what that limit is, has the potential for causing harm 2040 by changing the meaning of tags in substantial ways. 2042 In practice, most language tags do not require more than a few 2043 subtags and will not approach reasonably sized buffer limitations; 2044 see Section 4.1. 2046 Some specifications or protocols have limits on tag length but do not 2047 have a fixed length limitation. 
For example, [RFC2231] has no 2048 explicit length limitation: the length available for the language tag 2049 is constrained by the length of other header components (such as the 2050 charset's name) coupled with the 76-character limit in [RFC2047]. 2051 Thus, the "limit" might be 50 or more characters, but it could 2052 potentially be quite small. 2054 The considerations for assigning a buffer limit are: 2056 Implementations SHOULD NOT truncate language tags unless the 2057 meaning of the tag is purposefully being changed, or unless the 2058 tag does not fit into a limited buffer size specified by a 2059 protocol for storage or transmission. 2061 Implementations SHOULD warn the user when a tag is truncated since 2062 truncation changes the semantic meaning of the tag. 2064 Implementations of protocols or specifications that are space 2065 constrained but do not have a fixed limit SHOULD use the longest 2066 possible tag in preference to truncation. 2068 Protocols or specifications that specify limited buffer sizes for 2069 language tags MUST allow for language tags of up to 33 characters. 2071 Protocols or specifications that specify limited buffer sizes for 2072 language tags SHOULD allow for language tags of at least 42 2073 characters. 2075 The following illustration shows how the 42-character recommendation 2076 was derived. The combination of language and extended language 2077 subtags was chosen for future compatibility. 
At up to 15 characters, 2078 this combination is longer than the longest possible primary language 2079 subtag (8 characters): 2081 language = 3 (ISO 639-2; ISO 639-1 requires 2) 2082 extlang1 = 4 (each subsequent subtag includes '-') 2083 extlang2 = 4 (unlikely: needs prefix="language-extlang1") 2084 extlang3 = 4 (extremely unlikely) 2085 script = 5 (if not suppressed: see Section 4.1) 2086 region = 4 (UN M.49; ISO 3166 requires 3) 2087 variant1 = 9 (MUST have language as a prefix) 2088 variant2 = 9 (MUST have language-variant1 as a prefix) 2090 total = 42 characters 2092 Figure 7: Derivation of the Limit on Tag Length 2094 4.3.2. Truncation of Language Tags 2096 Truncation of a language tag alters the meaning of the tag, and thus 2097 SHOULD be avoided. However, truncation of language tags is sometimes 2098 necessary due to limited buffer sizes. Such truncation MUST NOT 2099 permit a subtag to be chopped off in the middle or the formation of 2100 invalid tags (for example, one ending with the "-" character). 2102 This means that applications or protocols that truncate tags MUST do 2103 so by progressively removing subtags along with their preceding "-" 2104 from the right side of the language tag until the tag is short enough 2105 for the given buffer. If the resulting tag ends with a single- 2106 character subtag, that subtag and its preceding "-" MUST also be 2107 removed. For example: 2109 Tag to truncate: zh-Latn-CN-variant1-a-extend1-x-wadegile-private1 2110 1. zh-Latn-CN-variant1-a-extend1-x-wadegile 2111 2. zh-Latn-CN-variant1-a-extend1 2112 3. zh-Latn-CN-variant1 2113 4. zh-Latn-CN 2114 5. zh-Latn 2115 6. zh 2117 Figure 8: Example of Tag Truncation 2119 4.4. Canonicalization of Language Tags 2121 Since a particular language tag is sometimes used by many processes, 2122 language tags SHOULD always be created or generated in a canonical 2123 form. 2125 A language tag is in canonical form when: 2127 1. 
The tag is well-formed according to the rules in Section 2.1 and 2128 Section 2.2. 2130 2. Subtags of type 'Region' that have a Preferred-Value mapping in 2131 the IANA registry (see Section 3.1) SHOULD be replaced with their 2132 mapped value. Note: In rare cases, the mapped value will also 2133 have a Preferred-Value. 2135 3. Redundant or grandfathered tags that have a Preferred-Value 2136 mapping in the IANA registry (see Section 3.1) MUST be replaced 2137 with their mapped value. These items either are deprecated 2138 mappings created before the adoption of this document (such as 2139 the mapping of "no-nyn" to "nn" or "i-klingon" to "tlh") or are 2140 the result of later registrations or additions to this document 2141 (for example, "zh-guoyu" might be mapped to a language-extlang 2142 combination such as "zh-cmn" by some future update of this 2143 document). 2145 4. Other subtags that have a Preferred-Value mapping in the IANA 2146 registry (see Section 3.1) MUST be replaced with their mapped 2147 value. These items consist entirely of clerical corrections to 2148 ISO 639-1 in which the deprecated subtags have been maintained 2149 for compatibility purposes. 2151 5. If more than one extension subtag sequence exists, the extension 2152 sequences are ordered into case-insensitive ASCII order by 2153 singleton subtag. 2155 Example: The language tag "en-A-aaa-B-ccc-bbb-x-xyz" is in canonical 2156 form, while "en-B-ccc-bbb-A-aaa-X-xyz" is well-formed but not in 2157 canonical form. 2159 Example: The language tag "en-BU" (English as used in Burma) is not 2160 canonical because the 'BU' subtag has a canonical mapping to 'MM' 2161 (Myanmar), although the tag "en-BU" maintains its validity. 2163 Canonicalization of language tags does not imply anything about the 2164 use of upper or lowercase letters when processing or comparing 2165 subtags (as described in Section 2.1). All comparisons MUST be 2166 performed in a case-insensitive manner.
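As an informal illustration of rule 5 and of case-insensitive comparison, the following Python sketch reorders the extension sequences of a tag by their singleton. It is not part of this specification. The function names order_extensions and same_tag are hypothetical, and the sketch assumes that its input is already a well-formed tag; it performs no validation and does not apply any Preferred-Value mappings from the registry.

```python
def order_extensions(tag):
    """Reorder extension sequences by singleton, ignoring case.

    Assumes 'tag' is well-formed; no validation is attempted.
    """
    subtags = tag.split("-")
    # Tags that begin with a single-character subtag (private use
    # "x-..." or grandfathered "i-...") are returned unchanged here.
    if len(subtags[0]) == 1:
        return tag

    prefix = []        # language, script, region, and variant subtags
    extensions = []    # each entry is [singleton, subtag, ...]
    private_use = []   # the "x" singleton and everything after it

    i = 0
    # Everything before the first singleton is left in place.
    while i < len(subtags) and len(subtags[i]) != 1:
        prefix.append(subtags[i])
        i += 1
    # From here on, each singleton starts a new sequence; "x" starts
    # the private use sequence, which always runs to the end of the tag.
    while i < len(subtags):
        if subtags[i].lower() == "x":
            private_use = subtags[i:]
            break
        if len(subtags[i]) == 1:
            extensions.append([subtags[i]])
        else:
            extensions[-1].append(subtags[i])
        i += 1

    # Rule 5: case-insensitive ASCII order by singleton subtag.
    extensions.sort(key=lambda seq: seq[0].lower())
    ordered = prefix + [s for seq in extensions for s in seq] + private_use
    return "-".join(ordered)


def same_tag(a, b):
    """Compare two tags without regard to case."""
    return a.lower() == b.lower()
```

Applied to the non-canonical example above, order_extensions("en-B-ccc-bbb-A-aaa-X-xyz") yields "en-A-aaa-B-ccc-bbb-X-xyz", which same_tag treats as equal to the canonical form "en-A-aaa-B-ccc-bbb-x-xyz". The sketch leaves the order of subtags inside each extension untouched, since that order is for the extension itself to define.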
2168 When performing canonicalization of language tags, processors MAY 2169 regularize the case of the subtags (that is, this process is 2170 OPTIONAL), following the case used in the registry. Note that this 2171 corresponds to the following casing rules: uppercase all non-initial 2172 two-letter subtags; titlecase all non-initial four-letter subtags; 2173 lowercase everything else. 2175 Note: Case folding of ASCII letters in certain locales, unless 2176 carefully handled, sometimes produces non-ASCII character values. 2177 The Unicode Character Database file "SpecialCasing.txt" defines the 2178 specific cases that are known to cause problems with this. In 2179 particular, the letter 'i' (U+0069) in Turkish and Azerbaijani is 2180 uppercased to U+0130 (LATIN CAPITAL LETTER I WITH DOT ABOVE). 2181 Implementers SHOULD specify a locale-neutral casing operation to 2182 ensure that case folding of subtags does not produce this value, 2183 which is illegal in language tags. For example, if one were to 2184 uppercase the region subtag 'in' using Turkish locale rules, the 2185 sequence U+0130 U+004E would result instead of the expected 'IN'. 2187 Note: if the field 'Deprecated' appears in a registry record without 2188 an accompanying 'Preferred-Value' field, then that tag or subtag is 2189 deprecated without a replacement. Validating processors SHOULD NOT 2190 generate tags that include these values, although the values are 2191 canonical when they appear in a language tag. 2193 An extension MUST define any relationships that exist between the 2194 various subtags in the extension and thus MAY define an alternate 2195 canonicalization scheme for the extension's subtags. Extensions MAY 2196 define how the order of the extension's subtags are interpreted. For 2197 example, an extension could define that its subtags are in canonical 2198 order when the subtags are placed into ASCII order: that is, "en-a- 2199 aaa-bbb-ccc" instead of "en-a-ccc-bbb-aaa". 
Another extension might 2200 define that the order of the subtags influences their semantic 2201 meaning (so that "en-b-ccc-bbb-aaa" has a different value from "en-b- 2202 aaa-bbb-ccc"). However, extension specifications SHOULD be designed 2203 so that they are tolerant of the typical processes described in 2204 Section 3.7. 2206 4.5. Considerations for Private Use Subtags 2208 Private use subtags, like all other subtags, MUST conform to the 2209 format and content constraints in the ABNF. Private use subtags have 2210 no meaning outside the private agreement between the parties that 2211 intend to use or exchange language tags that employ them. The same 2212 subtags MAY be used with a different meaning under a separate private 2213 agreement. They SHOULD NOT be used where alternatives exist and 2214 SHOULD NOT be used in content or protocols intended for general use. 2216 Private use subtags are simply useless for information exchange 2217 without prior arrangement. The value and semantic meaning of private 2218 use tags and of the subtags used within such a language tag are not 2219 defined by this document. 2221 Subtags defined in the IANA registry as having a specific private use 2222 meaning convey more information than a purely private use tag 2223 prefixed by the singleton subtag 'x'. For applications, this 2224 additional information MAY be useful. 2226 For example, the region subtags 'AA', 'ZZ', and those in the ranges 2227 'QM'-'QZ' and 'XA'-'XZ' (derived from ISO 3166 private use codes) MAY 2228 be used to form a language tag. A tag such as "zh-Hans-XQ" conveys a 2229 great deal of public, interchangeable information about the language 2230 material (that it is Chinese in the simplified Chinese script and is 2231 suitable for some geographic region 'XQ').
While the precise 2232 geographic region is not known outside of private agreement, the tag 2233 conveys far more information than an opaque tag such as "x-someLang", 2234 which contains no information about the language subtag or script 2235 subtag outside of the private agreement. 2237 However, in some cases content tagged with private use subtags MAY 2238 interact with other systems in a different and possibly unsuitable 2239 manner compared to tags that use opaque, privately defined subtags, 2240 so the choice of the best approach sometimes depends on the 2241 particular domain in question. 2243 5. IANA Considerations 2245 This section deals with the processes and requirements necessary for 2246 IANA to undertake to maintain the subtag and extension registries as 2247 defined by this document and in accordance with the requirements of 2248 [RFC2434]. 2250 The impact on the IANA maintainers of the two registries defined by 2251 this document will be a small increase in the frequency of new 2252 entries or updates. 2254 5.1. Language Subtag Registry 2256 Upon adoption of this document, IANA will update the registry using 2257 instructions and content provided in a companion document: [registry- 2258 update]. The criteria and process for selecting the updated set of 2259 records are described in that document. The updated set of records 2260 represents no impact on IANA, since the work to create it will be 2261 performed externally. 2263 Future work on the Language Subtag Registry has been limited to 2264 inserting or replacing whole records preformatted for IANA by the 2265 Language Subtag Reviewer as described in Section 3.3 of this document 2266 and archiving the forwarded registration form. 
2268 Each record MUST be sent to iana@iana.org with a subject line 2269 indicating whether the enclosed record is an insertion of a new 2270 record (indicated by the word "INSERT" in the subject line) or a 2271 replacement of an existing record (indicated by the word "MODIFY" in 2272 the subject line). Records MUST NOT be deleted from the registry. 2273 IANA MUST place any inserted or modified records into the appropriate 2274 section of the language subtag registry, grouping the records by 2275 their 'Type' field. Inserted records MAY be placed anywhere in the 2276 appropriate section; there is no guarantee of the order of the 2277 records beyond grouping them together by 'Type'. Modified records 2278 MUST overwrite the record they replace. 2280 Included in any request to insert or modify records MUST be a new 2281 File-Date record. This record MUST be placed first in the registry. 2282 In the event that the File-Date record present in the registry has a 2283 later date than the record being inserted or modified, the existing 2284 record MUST be preserved. 2286 5.2. Extensions Registry 2288 The Language Tag Extensions Registry can contain at most 35 records 2289 and thus changes to this registry are expected to be very infrequent. 2291 Future work by IANA on the Language Tag Extensions Registry is 2292 limited to two cases. First, the IESG MAY request that new records 2293 be inserted into this registry from time to time. These requests 2294 MUST include the record to insert in the exact format described in 2295 Section 3.7. In addition, there MAY be occasional requests from the 2296 maintaining authority for a specific extension to update the contact 2297 information or URLs in the record. These requests MUST include the 2298 complete, updated record. IANA is not responsible for validating the 2299 information provided, only that it is properly formatted. 
It should 2300 reasonably be seen to come from the maintaining authority named in 2301 the record present in the registry. 2303 6. Security Considerations 2305 Language tags used in content negotiation, like any other information 2306 exchanged on the Internet, might be a source of concern because they 2307 might be used to infer the nationality of the sender, and thus 2308 identify potential targets for surveillance. 2310 This is a special case of the general problem that anything sent is 2311 visible to the receiving party and possibly to third parties as well. 2312 It is useful to be aware that such concerns can exist in some cases. 2314 The evaluation of the exact magnitude of the threat, and any possible 2315 countermeasures, is left to each application protocol (see BCP 72 2316 [RFC3552] for best current practice guidance on security threats and 2317 defenses). 2319 The language tag associated with a particular information item is of 2320 no consequence whatsoever in determining whether that content might 2321 contain possible homographs. The fact that a text is tagged as being 2322 in one language or using a particular script subtag provides no 2323 assurance whatsoever that it does not contain characters from scripts 2324 other than the one(s) associated with or specified by that language 2325 tag. 2327 Since there is no limit to the number of variant, private use, and 2328 extension subtags, and consequently no limit on the possible length 2329 of a tag, implementations need to guard against buffer overflow 2330 attacks. See Section 4.3 for details on language tag truncation, 2331 which can occur as a consequence of defenses against buffer overflow. 2333 Although the specification of valid subtags for an extension (see 2334 Section 3.7) MUST be available over the Internet, implementations 2335 SHOULD NOT mechanically depend on it being always accessible, to 2336 prevent denial-of-service attacks. 2338 7. 
Character Set Considerations 2340 The syntax in this document requires that language tags use only the 2341 characters A-Z, a-z, 0-9, and HYPHEN-MINUS, which are present in most 2342 character sets, so the composition of language tags should not have 2343 any character set issues. 2345 Rendering of characters based on the content of a language tag is not 2346 addressed in this memo. Historically, some languages have relied on 2347 the use of specific character sets or other information in order to 2348 infer how a specific character should be rendered (notably this 2349 applies to language- and culture-specific variations of Han 2350 ideographs as used in Japanese, Chinese, and Korean). When language 2351 tags are applied to spans of text, rendering engines sometimes use 2352 that information in deciding which font to use in the absence of 2353 other information, particularly where languages with distinct writing 2354 traditions use the same characters. 2356 8. Changes from RFC 4646 2358 The main goal for this revision of this document was to incorporate 2359 ISO 639-3 and its attendant set of language codes into the IANA 2360 Language Subtag Registry, permitting the identification of many more 2361 languages and dialects than previously supported. 2363 The specific changes in this document to meet this goal are: 2365 o Defines the incorporation of ISO 639-3 codes as language and 2366 extlang subtags. Extlangs are now permitted in language tags. 2367 The changes necessary to achieve this were: 2369 * something 2371 o Changed the ABNF related to grandfathered tags. The irregular 2372 tags are now listed. Users of RFC 4646 sometimes made the mistake 2373 of implementing the grandfathered ABNF without checking the actual 2374 list of tags, thus allowing some illegal tags. Also: added 2375 description of both types of grandfathered tags to Section 2.2.8. 2377 o Added the paragraph on "collections" to Section 4.1.
2379 o Changed the capitalization rules for 'Tag' fields in Section 3.1. 2381 o Split section 3.1 up into subsections. 2383 o Modified section 3.5 to allow Suppress-Script fields to be added, 2384 modified, or removed via the registration process. This was an 2385 erratum from RFC 4646. 2387 o Modified examples that used region code 'CS' (formerly Serbia and 2388 Montenegro) to use 'RS' (Serbia) instead. 2390 o Modified the rules for creating and maintaining record 2391 'Description' fields to prevent duplicates, including inverted 2392 duplicates. 2394 o Removed the lengthy description of why RFC 4646 was created from 2395 this section, which also caused the removal of the reference to 2396 XML Schema. 2398 9. References 2400 9.1. Normative References 2402 [ISO10646] 2403 International Organization for Standardization, "ISO/IEC 2404 10646:2003. Information technology -- Universal Multiple- 2405 Octet Coded Character Set (UCS)", 2003. 2407 [ISO15924] 2408 International Organization for Standardization, "ISO 2409 15924:2004. Information and documentation -- Codes for the 2410 representation of names of scripts", January 2004. 2412 [ISO3166-1] 2413 International Organization for Standardization, "ISO 3166- 2414 1:1997. Codes for the representation of names of countries 2415 and their subdivisions -- Part 1: Country codes", 1997. 2417 [ISO639-1] 2418 International Organization for Standardization, "ISO 639- 2419 1:2002. Codes for the representation of names of languages 2420 -- Part 1: Alpha-2 code", 2002. 2422 [ISO639-2] 2423 International Organization for Standardization, "ISO 639- 2424 2:1998. Codes for the representation of names of languages 2425 -- Part 2: Alpha-3 code, first edition", 1998. 2427 [ISO646] International Organization for Standardization, "ISO/IEC 2428 646:1991, Information technology -- ISO 7-bit coded 2429 character set for information interchange.", 1991. 
2431 [RFC2026] Bradner, S., "The Internet Standards Process -- Revision 2432 3", BCP 9, RFC 2026, October 1996. 2434 [RFC2028] Hovey, R. and S. Bradner, "The Organizations Involved in 2435 the IETF Standards Process", BCP 11, RFC 2028, 2436 October 1996. 2438 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 2439 Requirement Levels", BCP 14, RFC 2119, March 1997. 2441 [RFC2434] Narten, T. and H. Alvestrand, "Guidelines for Writing an 2442 IANA Considerations Section in RFCs", BCP 26, RFC 2434, 2443 October 1998. 2445 [RFC2860] Carpenter, B., Baker, F., and M. Roberts, "Memorandum of 2446 Understanding Concerning the Technical Work of the 2447 Internet Assigned Numbers Authority", RFC 2860, June 2000. 2449 [RFC3339] Klyne, G. and C. Newman, "Date and Time on the Internet: 2450 Timestamps", RFC 3339, July 2002. 2452 [RFC4234] Crocker, D. and P. Overell, "Augmented BNF for Syntax 2453 Specifications: ABNF", RFC 4234, October 2005. 2455 [RFC4645] Ewell, D., Ed., "Initial Language Subtag Registry", 2456 RFC 4645, September 2006. 2458 [RFC4647] Phillips, A., Ed. and M. Davis, Ed., "Matching of Language 2459 Tags", RFC 4647, September 2006. 2462 [UN_M.49] Statistics Division, United Nations, "Standard Country or 2463 Area Codes for Statistical Use", UN Standard Country or 2464 Area Codes for Statistical Use, Revision 4 (United Nations 2465 publication, Sales No. 98.XVII.9), June 1999. 2467 9.2. Informative References 2469 [RFC1766] Alvestrand, H., "Tags for the Identification of 2470 Languages", RFC 1766, March 1995. 2472 [RFC2047] Moore, K., "MIME (Multipurpose Internet Mail Extensions) 2473 Part Three: Message Header Extensions for Non-ASCII Text", 2474 RFC 2047, November 1996. 2476 [RFC2231] Freed, N. and K. Moore, "MIME Parameter Value and Encoded 2477 Word Extensions: Character Sets, Languages, and 2478 Continuations", RFC 2231, November 1997. 2480 [RFC2781] Hoffman, P. and F. Yergeau, "UTF-16, an encoding of ISO 2481 10646", RFC 2781, February 2000.
2483 [RFC3066] Alvestrand, H., "Tags for the Identification of 2484 Languages", BCP 47, RFC 3066, January 2001. 2486 [RFC3552] Rescorla, E. and B. Korver, "Guidelines for Writing RFC 2487 Text on Security Considerations", BCP 72, RFC 3552, 2488 July 2003. 2490 [RFC4646] Phillips, A., Ed. and M. Davis, Ed., "Tags for the 2491 Identification of Languages", RFC 4646, September 2006. 2494 [Unicode] Unicode Consortium, "The Unicode Consortium. The Unicode 2495 Standard, Version 5.0, (Boston, MA, Addison-Wesley, 2003. 2496 ISBN 0-321-49081-0)", January 2007. 2498 [XML10] Bray (et al), T., "Extensible Markup Language (XML) 1.0", 2499 February 2004. 2501 [iso639.prin] 2502 ISO 639 Joint Advisory Committee, "ISO 639 Joint Advisory 2503 Committee: Working principles for ISO 639 maintenance", 2504 March 2000. 2508 [record-jar] 2509 Raymond, E., "The Art of Unix Programming", 2003. 2512 [registry-update] 2513 Ewell, D., Ed., "Update to the Language Subtag Registry", 2514 September 2006. 2517 Appendix A. Acknowledgements 2519 Any list of contributors is bound to be incomplete; please regard the 2520 following as only a selection from the group of people who have 2521 contributed to make this document what it is today. 2523 The contributors to RFC 4646, RFC 4647, RFC 3066, and RFC 1766, the 2524 precursors of this document, made enormous contributions directly or 2525 indirectly to this document and are generally responsible for the 2526 success of language tags. 2528 The following people contributed to this document: 2530 Stephane Bortzmeyer, Karen Broome, Peter Constable, John Cowan, 2531 Martin Duerst, Frank Ellerman, Doug Ewell, Marion Gunn, Randy 2532 Presuhn, and many, many others. 2534 Very special thanks must go to Harald Tveit Alvestrand, who 2535 originated RFCs 1766 and 3066, and without whom this document would 2536 not have been possible.
2538 Special thanks go to Michael Everson, who served as the Language Tag 2539 Reviewer for almost the entire RFC 1766/RFC 3066 period, as well as 2540 the Language Subtag Reviewer since the adoption of RFC 4646. 2542 Special thanks also to Doug Ewell, for his production of the first 2543 complete subtag registry, his work to support and maintain new 2544 registrations, and his careful editorship of both RFC 4645 and 2545 [draft-initial]. 2547 Appendix B. Examples of Language Tags (Informative) 2549 Simple language subtag: 2551 de (German) 2553 fr (French) 2555 ja (Japanese) 2557 i-enochian (example of a grandfathered tag) 2559 Language subtag plus Script subtag: 2561 zh-Hant (Chinese written using the Traditional Chinese script) 2563 zh-Hans (Chinese written using the Simplified Chinese script) 2565 sr-Cyrl (Serbian written using the Cyrillic script) 2567 sr-Latn (Serbian written using the Latin script) 2569 Language-Script-Region: 2571 zh-Hans-CN (Chinese written using the Simplified script as used in 2572 mainland China) 2574 sr-Latn-RS (Serbian written using the Latin script as used in 2575 Serbia) 2577 Language-Variant: 2579 sl-rozaj (Resian dialect of Slovenian) 2581 sl-nedis (Nadiza dialect of Slovenian) 2583 Language-Region-Variant: 2585 de-CH-1901 (German as used in Switzerland using the 1901 variant 2586 [orthography]) 2588 sl-IT-nedis (Slovenian as used in Italy, Nadiza dialect) 2590 Language-Script-Region-Variant: 2592 sl-Latn-IT-nedis (Nadiza dialect of Slovenian written using the 2593 Latin script as used in Italy.
Note that this tag is NOT 2594 RECOMMENDED because subtag 'sl' has a Suppress-Script value of 2595 'Latn') 2597 Language-Region: 2599 de-DE (German for Germany) 2601 en-US (English as used in the United States) 2603 es-419 (Spanish appropriate for the Latin America and Caribbean 2604 region using the UN region code) 2606 Private use subtags: 2608 de-CH-x-phonebk 2610 az-Arab-x-AZE-derbend 2612 Extended language subtags (examples ONLY: extended languages MUST be 2613 defined by revision or update to this document): 2615 zh-min 2617 zh-min-nan-Hant-CN 2619 Private use registry values: 2621 x-whatever (private use using the singleton 'x') 2623 qaa-Qaaa-QM-x-southern (all private tags) 2625 de-Qaaa (German, with a private script) 2627 sr-Latn-QM (Serbian, Latin-script, private region) 2629 sr-Qaaa-RS (Serbian, private script, for Serbia) 2631 Tags that use extensions (examples ONLY: extensions MUST be defined 2632 by revision or update to this document or by RFC): 2634 en-US-u-islamCal 2636 zh-CN-a-myExt-x-private 2637 en-a-myExt-b-another 2639 Some Invalid Tags: 2641 de-419-DE (two region tags) 2643 a-DE (use of a single-character subtag in primary position; note 2644 that there are a few grandfathered tags that start with "i-" that 2645 are valid) 2647 ar-a-aaa-b-bbb-a-ccc (two extensions with same single-letter 2648 prefix) 2650 Authors' Addresses 2652 Addison Phillips (editor) 2653 Yahoo! Inc. 
2655 Email: addison@inter-locale.com 2656 URI: http://www.inter-locale.com 2658 Mark Davis (editor) 2659 Google 2661 Email: mark.davis@macchiato.com or mark.davis@google.com 2663 Intellectual Property Statement 2665 The IETF takes no position regarding the validity or scope of any 2666 Intellectual Property Rights or other rights that might be claimed to 2667 pertain to the implementation or use of the technology described in 2668 this document or the extent to which any license under such rights 2669 might or might not be available; nor does it represent that it has 2670 made any independent effort to identify any such rights. Information 2671 on the procedures with respect to rights in RFC documents can be 2672 found in BCP 78 and BCP 79. 2674 Copies of IPR disclosures made to the IETF Secretariat and any 2675 assurances of licenses to be made available, or the result of an 2676 attempt made to obtain a general license or permission for the use of 2677 such proprietary rights by implementers or users of this 2678 specification can be obtained from the IETF on-line IPR repository at 2679 http://www.ietf.org/ipr. 2681 The IETF invites any interested party to bring to its attention any 2682 copyrights, patents or patent applications, or other proprietary 2683 rights that may cover technology that may be required to implement 2684 this standard. Please address the information to the IETF at 2685 ietf-ipr@ietf.org. 2687 Disclaimer of Validity 2689 This document and the information contained herein are provided on an 2690 "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS 2691 OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET 2692 ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, 2693 INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE 2694 INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED 2695 WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. 
2697 Copyright Statement 2699 Copyright (C) The Internet Society (2006). This document is subject 2700 to the rights, licenses and restrictions contained in BCP 78, and 2701 except as set forth therein, the authors retain all their rights. 2703 Acknowledgment 2705 Funding for the RFC Editor function is currently provided by the 2706 Internet Society.