idnits 2.17.1 draft-ietf-ltru-4646bis-00.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** It looks like you're using RFC 3978 boilerplate. You should update this to the boilerplate described in the IETF Trust License Policy document (see https://trustee.ietf.org/license-info), which is required now. -- Found old boilerplate from RFC 3978, Section 5.1 on line 16. -- Found old boilerplate from RFC 3978, Section 5.5 on line 2563. -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 2540. -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 2547. -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 2553. ** This document has an original RFC 3978 Section 5.4 Copyright Line, instead of the newer IETF Trust Copyright according to RFC 4748. ** This document has an original RFC 3978 Section 5.5 Disclaimer, instead of the newer disclaimer which includes the IETF Trust according to RFC 4748. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- == No 'Intended status' indicated for this document; assuming Proposed Standard Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- -- The draft header indicates that this document obsoletes RFC4646, but the abstract doesn't seem to mention this, which it should. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not match the current year == The document seems to lack the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. 
(The document does seem to have the reference to RFC 2119 which the ID-Checklist requires). -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (September 11, 2006) is 6436 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Unused Reference: 'RFC1766' is defined on line 2338, but no explicit reference was found in the text == Unused Reference: 'XMLSchema' is defined on line 2370, but no explicit reference was found in the text -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO10646' -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO15924' -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO3166-1' -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO639-1' -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO639-2' -- Possible downref: Non-RFC (?) normative reference: ref. 
'ISO646' ** Obsolete normative reference: RFC 2028 (Obsoleted by RFC 9281) ** Obsolete normative reference: RFC 2434 (Obsoleted by RFC 5226) ** Downref: Normative reference to an Informational RFC: RFC 2860 ** Obsolete normative reference: RFC 4234 (Obsoleted by RFC 5234) -- Obsolete informational reference (is this intentional?): RFC 1766 (Obsoleted by RFC 3066, RFC 3282) -- Obsolete informational reference (is this intentional?): RFC 3066 (Obsoleted by RFC 4646, RFC 4647) -- Obsolete informational reference (is this intentional?): RFC 4646 (Obsoleted by RFC 5646) Summary: 7 errors (**), 0 flaws (~~), 5 warnings (==), 17 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group A. Phillips, Ed. 3 Internet-Draft Yahoo! Inc. 4 Obsoletes: 4646 (if approved) M. Davis, Ed. 5 Expires: March 15, 2007 Google 6 September 11, 2006 8 Tags for Identifying Languages 9 draft-ietf-ltru-4646bis-00 11 Status of this Memo 13 By submitting this Internet-Draft, each author represents that any 14 applicable patent or other IPR claims of which he or she is aware 15 have been or will be disclosed, and any of which he or she becomes 16 aware will be disclosed, in accordance with Section 6 of BCP 79. 18 Internet-Drafts are working documents of the Internet Engineering 19 Task Force (IETF), its areas, and its working groups. Note that 20 other groups may also distribute working documents as Internet- 21 Drafts. 23 Internet-Drafts are draft documents valid for a maximum of six months 24 and may be updated, replaced, or obsoleted by other documents at any 25 time. It is inappropriate to use Internet-Drafts as reference 26 material or to cite them other than as "work in progress." 28 The list of current Internet-Drafts can be accessed at 29 http://www.ietf.org/ietf/1id-abstracts.txt. 
31 The list of Internet-Draft Shadow Directories can be accessed at 32 http://www.ietf.org/shadow.html. 34 This Internet-Draft will expire on March 15, 2007. 36 Copyright Notice 38 Copyright (C) The Internet Society (2006). 40 Abstract 42 This document describes the structure, content, construction, and 43 semantics of language tags for use in cases where it is desirable to 44 indicate the language used in an information object. It also 45 describes how to register values for use in language tags and the 46 creation of user-defined extensions for private interchange. 48 Table of Contents 50 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 51 2. The Language Tag . . . . . . . . . . . . . . . . . . . . . . . 4 52 2.1. Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . 4 53 2.2. Language Subtag Sources and Interpretation . . . . . . . . 6 54 2.2.1. Primary Language Subtag . . . . . . . . . . . . . . . 8 55 2.2.2. Extended Language Subtags . . . . . . . . . . . . . . 10 56 2.2.3. Script Subtag . . . . . . . . . . . . . . . . . . . . 11 57 2.2.4. Region Subtag . . . . . . . . . . . . . . . . . . . . 11 58 2.2.5. Variant Subtags . . . . . . . . . . . . . . . . . . . 13 59 2.2.6. Extension Subtags . . . . . . . . . . . . . . . . . . 14 60 2.2.7. Private Use Subtags . . . . . . . . . . . . . . . . . 16 61 2.2.8. Grandfathered Registrations . . . . . . . . . . . . . 16 62 2.2.9. Classes of Conformance . . . . . . . . . . . . . . . . 17 63 3. Registry Format and Maintenance . . . . . . . . . . . . . . . 18 64 3.1. Format of the IANA Language Subtag Registry . . . . . . . 18 65 3.2. Language Subtag Reviewer . . . . . . . . . . . . . . . . . 23 66 3.3. Maintenance of the Registry . . . . . . . . . . . . . . . 24 67 3.4. Stability of IANA Registry Entries . . . . . . . . . . . . 25 68 3.5. Registration Procedure for Subtags . . . . . . . . . . . . 29 69 3.6. Possibilities for Registration . . . . . . . . . . . . . . 32 70 3.7. 
Extensions and Extensions Registry . . . . . . . . . . . . 34 71 3.8. Update of the Language Subtag Registry . . . . . . . . . . 37 72 4. Formation and Processing of Language Tags . . . . . . . . . . 38 73 4.1. Choice of Language Tag . . . . . . . . . . . . . . . . . . 38 74 4.2. Meaning of the Language Tag . . . . . . . . . . . . . . . 40 75 4.3. Length Considerations . . . . . . . . . . . . . . . . . . 41 76 4.3.1. Working with Limited Buffer Sizes . . . . . . . . . . 41 77 4.3.2. Truncation of Language Tags . . . . . . . . . . . . . 43 78 4.4. Canonicalization of Language Tags . . . . . . . . . . . . 43 79 4.5. Considerations for Private Use Subtags . . . . . . . . . . 45 80 5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 47 81 5.1. Language Subtag Registry . . . . . . . . . . . . . . . . . 47 82 5.2. Extensions Registry . . . . . . . . . . . . . . . . . . . 47 83 6. Security Considerations . . . . . . . . . . . . . . . . . . . 49 84 7. Character Set Considerations . . . . . . . . . . . . . . . . . 50 85 8. Changes from RFC 4646 . . . . . . . . . . . . . . . . . . . . 51 86 9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 52 87 9.1. Normative References . . . . . . . . . . . . . . . . . . . 52 88 9.2. Informative References . . . . . . . . . . . . . . . . . . 53 89 Appendix A. Acknowledgements . . . . . . . . . . . . . . . . . . 55 90 Appendix B. Examples of Language Tags (Informative) . . . . . . . 56 91 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 59 92 Intellectual Property and Copyright Statements . . . . . . . . . . 60 94 1. Introduction 96 Human beings on our planet have, past and present, used a number of 97 languages. There are many reasons why one would want to identify the 98 language used when presenting or requesting information. 100 A user's language preferences often need to be identified so that 101 appropriate processing can be applied. 
For example, the user's 102 language preferences in a Web browser can be used to select Web pages 103 appropriately. Language preferences can also be used to select among 104 tools (such as dictionaries) to assist in the processing or 105 understanding of content in different languages. 107 In addition, knowledge about the particular language used by some 108 piece of information content might be useful or even required by some 109 types of processing; for example, spell-checking, computer- 110 synthesized speech, Braille transcription, or high-quality print 111 renderings. 113 One means of indicating the language used is by labeling the 114 information content with an identifier or "tag". These tags can be 115 used to specify user preferences when selecting information content, 116 or for labeling additional attributes of content and associated 117 resources. 119 Tags can also be used to indicate additional language attributes of 120 content. For example, indicating specific information about the 121 dialect, writing system, or orthography used in a document or 122 resource may enable the user to obtain information in a form that 123 they can understand, or it can be important in processing or 124 rendering the given content into an appropriate form or style. 126 This document specifies a particular identifier mechanism (the 127 language tag) and a registration function for values to be used to 128 form tags. It also defines a mechanism for private use values and 129 future extension. 131 This document replaces [RFC4646], which replaced [RFC3066]. For a 132 list of changes in this document, see Section 8. 134 The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 135 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 136 document are to be interpreted as described in [RFC2119]. 138 2. 
The Language Tag 140 Language tags are used to help identify languages, whether spoken, 141 written, signed, or otherwise signaled, for the purpose of 142 communication. This includes constructed and artificial languages, 143 but excludes languages not intended primarily for human 144 communication, such as programming languages. 146 2.1. Syntax 148 The language tag is composed of one or more parts, known as 149 "subtags". Each subtag consists of a sequence of alphanumeric 150 characters. Subtags are distinguished and separated from one another 151 by a hyphen ("-", ABNF [RFC4234] %x2D). A language tag consists of a 152 "primary language" subtag and a (possibly empty) series of subsequent 153 subtags, each of which refines or narrows the range of languages 154 identified by the overall tag. 156 Usually, each type of subtag is distinguished by length, position in 157 the tag, and content: subtags can be recognized solely by these 158 features. The only exception to this is a fixed list of 159 grandfathered tags registered under RFC 3066 [RFC3066]. This makes 160 it possible to construct a parser that can extract and assign some 161 semantic information to the subtags, even if the specific subtag 162 values are not recognized. Thus, a parser need not have an up-to- 163 date copy (or any copy at all) of the subtag registry to perform most 164 searching and matching operations. 
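As an illustration only, the following Python sketch shows how a parser might label subtags purely by length, position, and content, without consulting the registry. It assumes the shapes given by the 'langtag' and 'privateuse' productions in Figure 1. The function name label_subtags and the label strings are hypothetical and are not defined by this document. Grandfathered tags and the rule against repeated singletons are deliberately not handled.

```python
def _alpha(s, lo, hi):
    """True if s consists of lo..hi US-ASCII letters."""
    return lo <= len(s) <= hi and s.isascii() and s.isalpha()


def _alnum(s, lo, hi):
    """True if s consists of lo..hi US-ASCII letters or digits."""
    return lo <= len(s) <= hi and s.isascii() and s.isalnum()


def _variant(s):
    """5 to 8 alphanumerics, or 4 when the first character is a digit."""
    return _alnum(s, 5, 8) or (_alnum(s, 4, 4) and s[0].isdigit())


def label_subtags(tag):
    """Return (label, subtag) pairs for a tag; raise ValueError on bad shape."""
    parts = tag.split("-")
    out, i, n = [], 0, len(parts)

    def take(label):
        nonlocal i
        out.append((label, parts[i]))
        i += 1

    if parts[0].lower() != "x":
        if not _alpha(parts[0], 2, 8):
            raise ValueError("no primary language subtag: %r" % tag)
        take("language")
        if len(parts[0]) <= 3:
            # Up to three 3-letter extended language subtags may follow.
            extlangs = 0
            while i < n and extlangs < 3 and _alpha(parts[i], 3, 3):
                take("extlang")
                extlangs += 1
        if i < n and _alpha(parts[i], 4, 4):
            take("script")
        if i < n and (_alpha(parts[i], 2, 2)
                      or (_alnum(parts[i], 3, 3) and parts[i].isdigit())):
            take("region")
        while i < n and _variant(parts[i]):
            take("variant")
        # Each extension is a singleton (not 'x') plus one or more subtags.
        while i < n and _alnum(parts[i], 1, 1) and parts[i].lower() != "x":
            take("singleton")
            if not (i < n and _alnum(parts[i], 2, 8)):
                raise ValueError("singleton without extension subtag: %r" % tag)
            while i < n and _alnum(parts[i], 2, 8):
                take("extension")
    if i < n and parts[i].lower() == "x":
        take("singleton")
        if not (i < n and _alnum(parts[i], 1, 8)):
            raise ValueError("'x' without private use subtag: %r" % tag)
        while i < n and _alnum(parts[i], 1, 8):
            take("privateuse")
    if i != n:
        raise ValueError("unexpected subtag %r in %r" % (parts[i], tag))
    return out
```

For instance, label_subtags("zh-gan-Hant") labels 'zh' as the language, 'gan' as an extended language subtag, and 'Hant' as the script, even though the sketch has no knowledge of what any of those values mean. Whether the values are actually registered is a separate question that only the registry can answer.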
166   The syntax of the language tag in ABNF [RFC4234] is:

168   Language-Tag  = langtag
169                 / privateuse             ; private use tag
170                 / grandfathered          ; grandfathered registrations

172   langtag       = (language
173                    ["-" script]
174                    ["-" region]
175                    *("-" variant)
176                    *("-" extension)
177                    ["-" privateuse])

179   language      = (2*3ALPHA [ extlang ]) ; shortest ISO 639 code
180                 / 4ALPHA                 ; reserved for future use
181                 / 5*8ALPHA               ; registered language subtag

183   extlang       = *3("-" 3ALPHA)         ; specific ISO 639-3 codes

185   script        = 4ALPHA                 ; ISO 15924 code

187   region        = 2ALPHA                 ; ISO 3166 code
188                 / 3DIGIT                 ; UN M.49 code

190   variant       = 5*8alphanum            ; registered variants
191                 / (DIGIT 3alphanum)

193   extension     = singleton 1*("-" (2*8alphanum))

195   singleton     = %x41-57 / %x59-5A / %x61-77 / %x79-7A / DIGIT
196                 ; "A"-"W" / "Y"-"Z" / "a"-"w" / "y"-"z" / "0"-"9"
197                 ; Single letters: x/X is reserved for private use

199   privateuse    = ("x"/"X") 1*("-" (1*8alphanum))

201   grandfathered = 1*3ALPHA 1*2("-" (2*8alphanum))
202                   ; grandfathered registration
203                   ; Note: i is the only singleton
204                   ; that starts a grandfathered tag

206   alphanum      = (ALPHA / DIGIT)       ; letters and numbers

208                      Figure 1: Language Tag ABNF

210   Note: There is a subtlety in the ABNF for 'variant': variants
211   starting with a digit MAY be four characters long, while those
212   starting with a letter MUST be at least five characters long.

214   All subtags have a maximum length of eight characters and whitespace
215   is not permitted in a language tag.  For examples of language tags,
216   see Appendix B.

218   Note that although [RFC4234] refers to octets, the language tags
219   described in this document are sequences of characters from the US-
220   ASCII [ISO646] repertoire.  Language tags MAY be used in documents
221   and applications that use other encodings, so long as these encompass
222   the US-ASCII repertoire.  An example of this would be an XML document
223   that uses the UTF-16LE [RFC2781] encoding of [Unicode].
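The ABNF in Figure 1 can be transcribed fairly directly into regular expressions. The following Python sketch is one such transcription, offered as an aid to reading the grammar rather than as a reference implementation. The names LANGTAG, PRIVATEUSE, GRANDFATHERED, and matched_production are hypothetical. One assumption goes beyond the literal production: the 'grandfathered' pattern applies the ABNF comment that 'i' is the only singleton that starts a grandfathered tag, so a one-letter first subtag other than 'i' is rejected.

```python
import re

# 'langtag' production of Figure 1 as a verbose regular expression.
LANGTAG = re.compile(r"""
    (?: [A-Za-z]{2,3} (?: -[A-Za-z]{3} ){0,3}   # language, optional extlang
      | [A-Za-z]{4}                              # reserved for future use
      | [A-Za-z]{5,8} )                          # registered language subtag
    (?: -[A-Za-z]{4} )?                          # script
    (?: -(?: [A-Za-z]{2} | [0-9]{3} ) )?         # region
    (?: -(?: [A-Za-z0-9]{5,8}                    # variant: 5 to 8 characters,
           | [0-9][A-Za-z0-9]{3} ) )*            #   or 4 when digit-initial
    (?: -[A-WY-Za-wy-z0-9]                       # extension singleton, never x
        (?: -[A-Za-z0-9]{2,8} )+ )*
    (?: -[xX] (?: -[A-Za-z0-9]{1,8} )+ )?        # trailing private use
""", re.VERBOSE)

# 'privateuse' production: the whole tag is private use.
PRIVATEUSE = re.compile(r"[xX](?:-[A-Za-z0-9]{1,8})+")

# 'grandfathered' production, with the assumption described above:
# a one-letter first subtag has to be 'i'.
GRANDFATHERED = re.compile(
    r"(?:[iI]|[A-Za-z]{2,3})(?:-[A-Za-z0-9]{2,8}){1,2}"
)


def matched_production(tag):
    """Return the name of the first Figure 1 production that the whole
    tag matches, or None if it matches none of them."""
    for name, pattern in (
        ("langtag", LANGTAG),
        ("privateuse", PRIVATEUSE),
        ("grandfathered", GRANDFATHERED),
    ):
        if pattern.fullmatch(tag):
            return name
    return None
```

A match says only that the tag has an acceptable shape. Matching the 'grandfathered' pattern does not mean that the tag is on the fixed list of grandfathered registrations, and rules stated in prose are not captured: for example, "en-a-bbb-a-ccc" matches the 'langtag' pattern even though it repeats the singleton 'a'.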
225 The tags and their subtags, including private use and extensions, are 226 to be treated as case insensitive: there exist conventions for the 227 capitalization of some of the subtags, but these MUST NOT be taken to 228 carry meaning. 230 For example: 232 o [ISO639-1] recommends that language codes be written in lowercase 233 ('mn' Mongolian). 235 o [ISO3166-1] recommends that country codes be capitalized ('MN' 236 Mongolia). 238 o [ISO15924] recommends that script codes use lowercase with the 239 initial letter capitalized ('Cyrl' Cyrillic). 241 However, in the tags defined by this document, the uppercase US-ASCII 242 letters in the range 'A' through 'Z' are considered equivalent and 243 mapped directly to their US-ASCII lowercase equivalents in the range 244 'a' through 'z'. Thus, the tag "mn-Cyrl-MN" is not distinct from 245 "MN-cYRL-mn" or "mN-cYrL-Mn" (or any other combination), and each of 246 these variations conveys the same meaning: Mongolian written in the 247 Cyrillic script as used in Mongolia. 249 Although case distinctions do not carry meaning in language tags, 250 consistent formatting and presentation of the tags will aid users. 251 The format of the tags and subtags in the registry is RECOMMENDED. 252 In this format, all non-initial two-letter subtags are uppercase, all 253 non-initial four-letter subtags are titlecase, and all other subtags 254 are lowercase. 256 2.2. Language Subtag Sources and Interpretation 258 The namespace of language tags and their subtags is administered by 259 the Internet Assigned Numbers Authority (IANA) [RFC2860] according to 260 the rules in Section 5 of this document. The Language Subtag 261 Registry maintained by IANA is the source for valid subtags: other 262 standards referenced in this section provide the source material for 263 that registry. 265 Terminology used in this document: 267 o Tag or tags refers to a complete language tag, such as 268 "fr-Latn-CA". 
Examples of tags in this document are enclosed in 269 double-quotes ("en-US"). 271 o Subtag refers to a specific section of a tag, delimited by hyphen, 272 such as the subtag 'Latn' in "fr-Latn-CA". Examples of subtags in 273 this document are enclosed in single quotes ('Latn'). 275 o Code or codes refers to values defined in external standards (and 276 which are used as subtags in this document). For example, 'Latn' 277 is an [ISO15924] script code that was used to define the 'Latn' 278 script subtag for use in a language tag. Examples of codes in 279 this document are enclosed in single quotes ('en', 'Latn'). 281 The definitions in this section apply to the various subtags within 282 the language tags defined by this document, excepting those 283 "grandfathered" tags defined in Section 2.2.8. 285 Language tags are designed so that each subtag type has unique length 286 and content restrictions. These make identification of the subtag's 287 type possible, even if the content of the subtag itself is 288 unrecognized. This allows tags to be parsed and processed without 289 reference to the latest version of the underlying standards or the 290 IANA registry and makes the associated exception handling when 291 parsing tags simpler. 293 Subtags in the IANA registry that do not come from an underlying 294 standard can only appear in specific positions in a tag. 295 Specifically, they can only occur as primary language subtags or as 296 variant subtags. 298 Note that sequences of private use and extension subtags MUST occur 299 at the end of the sequence of subtags and MUST NOT be interspersed 300 with subtags defined elsewhere in this document. 302 Single-letter and single-digit subtags are reserved for current or 303 future use. These include the following current uses: 305 o The single-letter subtag 'x' is reserved to introduce a sequence 306 of private use subtags. 
The interpretation of any private use 307 subtags is defined solely by private agreement and is not defined 308 by the rules in this section or in any standard or registry 309 defined in this document. 311 o All other single-letter subtags are reserved to introduce 312 standardized extension subtag sequences as described in 313 Section 3.7. 315 The single-letter subtag 'i' is used by some grandfathered tags, such 316 as "i-enochian", where it always appears in the first position and 317 cannot be confused with an extension. 319 2.2.1. Primary Language Subtag 321 The primary language subtag is the first subtag in a language tag 322 (with the exception of private use and certain grandfathered tags) 323 and cannot be omitted. The following rules apply to the primary 324 language subtag: 326 1. All two-character primary language subtags were defined in the 327 IANA registry according to the assignments found in the standard 328 ISO 639 Part 1, "ISO 639-1:2002, Codes for the representation of 329 names of languages -- Part 1: Alpha-2 code" [ISO639-1], or using 330 assignments subsequently made by the ISO 639 Part 1 maintenance 331 agency or governing standardization bodies. 333 2. All three-character primary language subtags were defined in the 334 IANA registry according to the assignments found in either ISO 335 639 Part 2, "ISO 639-2:1998 - Codes for the representation of 336 names of languages -- Part 2: Alpha-3 code - edition 1" 337 [ISO639-2], ISO 639 Part 3, "ISO 639-3:200?, [[??missing official 338 title??]]", or assignments subsequently made by the relevant ISO 339 639 maintenance agencies or governing standardization bodies. 341 3. The subtags in the range 'qaa' through 'qtz' are reserved for 342 private use in language tags. These subtags correspond to codes 343 reserved by ISO 639-2 for private use. These codes MAY be used 344 for non-registered primary language subtags (instead of using 345 private use subtags following 'x-'). 
Please refer to Section 4.5 346 for more information on private use subtags. 348 4. All four-character language subtags are reserved for possible 349 future standardization. 351 5. All language subtags of 5 to 8 characters in length in the IANA 352 registry were defined via the registration process in Section 3.5 353 and MAY be used to form the primary language subtag. At the time 354 this document was created, there were no examples of this kind of 355 subtag and future registrations of this type will be discouraged: 356 primary languages are strongly RECOMMENDED for registration with 357 ISO 639, and proposals rejected by ISO 639/RA will be closely 358 scrutinized before they are registered with IANA. 360 6. The single-character subtag 'x' as the primary subtag indicates 361 that the language tag consists solely of subtags whose meaning is 362 defined by private agreement. For example, in the tag "x-fr-CH", 363 the subtags 'fr' and 'CH' SHOULD NOT be taken to represent the 364 French language or the country of Switzerland (or any other value 365 in the IANA registry) unless there is a private agreement in 366 place to do so. See Section 4.5. 368 7. The single-character subtag 'i' is used by some grandfathered 369 tags (see Section 2.2.8) such as "i-klingon" and "i-bnn". (Other 370 grandfathered tags have a primary language subtag in their first 371 position.) 373 8. Other values MUST NOT be assigned to the primary subtag except by 374 revision or update of this document. 376 Note: For languages that have both an ISO 639-1 two-character code 377 and a three-character code assigned by either ISO 639-2 or ISO 639-3, 378 only the ISO 639-1 two-character code is defined in the IANA 379 registry. 381 Note: For languages that have no ISO 639-1 two-character code and for 382 which the ISO 639-2/T (Terminology) code and the ISO 639-2/B 383 (Bibliographic) codes differ, only the Terminology code is defined in 384 the IANA registry. 
At the time this document was created, all 385 languages that had both kinds of three-character code were also 386 assigned a two-character code; it is not expected that future 387 assignments of this nature will occur. 389 Note: To avoid problems with versioning and subtag choice as 390 experienced during the transition between RFC 1766 and RFC 3066, as 391 well as the canonical nature of subtags defined by this document, the 392 ISO 639 Registration Authority Joint Advisory Committee (ISO 639/ 393 RA-JAC) has included the following statement in [iso639.prin]: 395 "A language code already in ISO 639-2 at the point of freezing ISO 396 639-1 shall not later be added to ISO 639-1. This is to ensure 397 consistency in usage over time, since users are directed in Internet 398 applications to employ the alpha-3 code when an alpha-2 code for that 399 language is not available." 401 In order to avoid instability in the canonical form of tags, if a 402 two-character code is added to ISO 639-1 for a language for which a 403 three-character code was already included in either ISO 639-2 or ISO 404 639-3, the two-character code MUST NOT be registered. See 405 Section 3.4. 407 For example, if some content were tagged with 'haw' (Hawaiian), which 408 currently has no two-character code, the tag would not be invalidated 409 if ISO 639-1 were to assign a two-character code to the Hawaiian 410 language at a later date. 412 For example, one of the grandfathered IANA registrations is 413 "i-enochian". The subtag 'enochian' could be registered in the IANA 414 registry as a primary language subtag (assuming that ISO 639 does not 415 register this language first), making tags such as "enochian-AQ" and 416 "enochian-Latn" valid. 418 2.2.2. Extended Language Subtags 420 Extended language subtags are used to identify languages or dialects 421 that are subdivisions within another language. Such an enclosing 422 language is sometimes called a "collective" or "macro" language. 
The 423 following rules apply to the extended language subtags: 425 1. These subtags were defined in the IANA registry according to 426 assignments found in ISO 639 Part 3. 428 2. A sequence of up to three extended language subtags MAY appear in 429 a language tag. This sequence MUST follow the primary language 430 subtag and precede any other subtags. 432 3. Each extended language subtag MUST only be used with the exact 433 sequence of subtags that appears in the 'Prefix' field in its 434 registry record. 436 4. There MAY be up to three extended language subtags. 438 5. Other values MUST NOT be assigned to the extended language subtag 439 except by revision or update of this document. 441 Extended language subtag records MUST include exactly one 'Prefix' 442 field indicating an appropriate subtag or sequence of subtags for 443 that extended language subtag. 445 For example, the 'gan' subtag, representing the 'Gan' dialect of 446 Chinese, has a prefix of "zh" in its registry record. The 'cmn' 447 subtag, representing the 'Mandarin' dialect of Chinese has the same 448 prefix. Thus, the tags "zh-gan-Hant" or "zh-cmn-CN" are appropriate, 449 while the tag "zh-cmn-gan" is not. 451 Now suppose that 'xxx' is a subtag that represents a dialect of 452 'Gan'. It would have a 'Prefix' field of "zh-gan", making the tag 453 "zh-gan-xxx" appropriate, while the tags "zh-xxx" and "zh-xxx-gan" 454 would not be appropriate. 456 2.2.3. Script Subtag 458 Script subtags are used to indicate the script or writing system 459 variations that distinguish the written forms of a language or its 460 dialects. The following rules apply to the script subtags: 462 1. 
All four-character subtags were defined according to 463 [ISO15924]--"Codes for the representation of the names of 464 scripts": alpha-4 script codes, or subsequently assigned by the 465 ISO 15924 maintenance agency or governing standardization bodies, 466 denoting the script or writing system used in conjunction with 467 this language. 469 2. Script subtags MUST immediately follow the primary language 470 subtag and all extended language subtags and MUST occur before 471 any other type of subtag described below. 473 3. The script subtags 'Qaaa' through 'Qabx' are reserved for private 474 use in language tags. These subtags correspond to codes reserved 475 by ISO 15924 for private use. These codes MAY be used for non- 476 registered script values. Please refer to Section 4.5 for more 477 information on private use subtags. 479 4. Script subtags MUST NOT be registered using the process in 480 Section 3.5 of this document. Variant subtags MAY be considered 481 for registration for that purpose. 483 5. There MUST be at most one script subtag in a language tag, and 484 the script subtag SHOULD be omitted when it adds no 485 distinguishing value to the tag or when the primary language 486 subtag's record includes a Suppress-Script field listing the 487 applicable script subtag. 489 Example: "sr-Latn" represents Serbian written using the Latin script. 491 2.2.4. Region Subtag 493 Region subtags are used to indicate linguistic variations associated 494 with or appropriate to a specific country, territory, or region. 495 Typically, a region subtag is used to indicate regional dialects or 496 usage, or region-specific spelling conventions. A region subtag can 497 also be used to indicate that content is expressed in a way that is 498 appropriate for use throughout a region, for instance, Spanish 499 content tailored to be useful throughout Latin America. 501 The following rules apply to the region subtags: 503 1. 
Region subtags MUST follow any language, extended language, or 504 script subtags and MUST precede all other subtags. 506 2. All two-character subtags following the primary subtag were 507 defined in the IANA registry according to the assignments found 508 in [ISO3166-1] ("Codes for the representation of names of 509 countries and their subdivisions -- Part 1: Country codes") using 510 the list of alpha-2 country codes, or using assignments 511 subsequently made by the ISO 3166 maintenance agency or governing 512 standardization bodies. 514 3. All three-character subtags consisting of digit (numeric) 515 characters following the primary subtag were defined in the IANA 516 registry according to the assignments found in UN Standard 517 Country or Area Codes for Statistical Use [UN_M.49] or 518 assignments subsequently made by the governing standards body. 519 Note that not all of the UN M.49 codes are defined in the IANA 520 registry. The following rules define which codes are entered 521 into the registry as valid subtags: 523 A. UN numeric codes assigned to 'macro-geographical 524 (continental)' or sub-regions MUST be registered in the 525 registry. These codes are not associated with an assigned 526 ISO 3166 alpha-2 code and represent supra-national areas, 527 usually covering more than one nation, state, province, or 528 territory. 530 B. UN numeric codes for 'economic groupings' or 'other 531 groupings' MUST NOT be registered in the IANA registry and 532 MUST NOT be used to form language tags. 534 C. UN numeric codes for countries or areas with ambiguous ISO 535 3166 alpha-2 codes, when entered into the registry, MUST be 536 defined according to the rules in Section 3.4 and MUST be 537 used to form language tags that represent the country or 538 region for which they are defined. 540 D. 
UN numeric codes for countries or areas for which there is an 541 associated ISO 3166 alpha-2 code in the registry MUST NOT be 542 entered into the registry and MUST NOT be used to form 543 language tags. Note that the ISO 3166-based subtag in the 544 registry MUST actually be associated with the UN M.49 code in 545 question. 547 E. UN numeric codes and ISO 3166 alpha-2 codes for countries or 548 areas listed as eligible for registration in [initial- 549 registry] but not presently registered MAY be entered into 550 the IANA registry via the process described in Section 3.5. 552 Once registered, these codes MAY be used to form language 553 tags. 555 F. All other UN numeric codes for countries or areas that do not 556 have an associated ISO 3166 alpha-2 code MUST NOT be entered 557 into the registry and MUST NOT be used to form language tags. 558 For more information about these codes, see Section 3.4. 560 4. Note: The alphanumeric codes in Appendix X of the UN document 561 MUST NOT be entered into the registry and MUST NOT be used to 562 form language tags. (At the time this document was created, 563 these values matched the ISO 3166 alpha-2 codes.) 565 5. There MUST be at most one region subtag in a language tag and the 566 region subtag MAY be omitted, as when it adds no distinguishing 567 value to the tag. 569 6. The region subtags 'AA', 'QM'-'QZ', 'XA'-'XZ', and 'ZZ' are 570 reserved for private use in language tags. These subtags 571 correspond to codes reserved by ISO 3166 for private use. These 572 codes MAY be used for private use region subtags (instead of 573 using a private use subtag sequence). Please refer to 574 Section 4.5 for more information on private use subtags. 576 "de-CH" represents German ('de') as used in Switzerland ('CH'). 578 "sr-Latn-CS" represents Serbian ('sr') written using Latin script 579 ('Latn') as used in Serbia and Montenegro ('CS'). 
581 "es-419" represents Spanish ('es') appropriate to the UN-defined 582 Latin America and Caribbean region ('419'). 584 2.2.5. Variant Subtags 586 Variant subtags are used to indicate additional, well-recognized 587 variations that define a language or its dialects that are not 588 covered by other available subtags. The following rules apply to the 589 variant subtags: 591 1. Variant subtags are not associated with any external standard. 592 Variant subtags and their meanings are defined by the 593 registration process defined in Section 3.5. 595 2. Variant subtags MUST follow all of the other defined subtags, but 596 precede any extension or private use subtag sequences. 598 3. More than one variant MAY be used to form the language tag. 600 4. Variant subtags MUST be registered with IANA according to the 601 rules in Section 3.5 of this document before being used to form 602 language tags. In order to distinguish variants from other types 603 of subtags, registrations MUST meet the following length and 604 content restrictions: 606 1. Variant subtags that begin with a letter (a-z, A-Z) MUST be 607 at least five characters long. 609 2. Variant subtags that begin with a digit (0-9) MUST be at 610 least four characters long. 612 Variant subtag records in the language subtag registry MAY include 613 one or more 'Prefix' fields, which indicate the language tag or tags 614 that would make a suitable prefix (with other subtags, as 615 appropriate) in forming a language tag with the variant. For 616 example, the subtag 'nedis' has a Prefix of "sl", making it suitable 617 to form language tags such as "sl-nedis" and "sl-IT-nedis", but not 618 suitable for use in a tag such as "zh-nedis" or "it-IT-nedis". 620 "sl-nedis" represents the Natisone or Nadiza dialect of Slovenian. 622 "de-CH-1996" represents German as used in Switzerland and as written 623 using the spelling reform beginning in the year 1996 C.E. 625 Most variants that share a prefix are mutually exclusive. 
   For example, the German orthographic variations '1996' and '1901'
   SHOULD NOT be used in the same tag, as they represent the dates of
   different spelling reforms. A variant that can meaningfully be used
   in combination with another variant SHOULD include a 'Prefix' field
   in its registry record that lists that other variant. For example,
   if another German variant 'example' were created that made sense to
   use with '1996', then 'example' should include two Prefix fields:
   "de" and "de-1996".

2.2.6. Extension Subtags

   Extensions provide a mechanism for extending language tags for use in
   various applications. See Section 3.7. The following rules apply to
   extensions:

   1.  Extension subtags are separated from the other subtags defined
       in this document by a single-character subtag ("singleton").
       The singleton MUST be one allocated to a registration authority
       via the mechanism described in Section 3.7 and MUST NOT be the
       letter 'x', which is reserved for private use subtag sequences.

   2.  Note: Private use subtag sequences starting with the singleton
       subtag 'x' are described in Section 2.2.7 below.

   3.  An extension MUST follow at least a primary language subtag.
       That is, a language tag cannot begin with an extension.
       Extensions extend language tags; they do not override or replace
       them. For example, "a-value" is not a well-formed language tag,
       while "de-a-value" is.

   4.  Each singleton subtag MUST appear at most one time in each tag
       (other than as a private use subtag). That is, singleton
       subtags MUST NOT be repeated. For example, the tag
       "en-a-bbb-a-ccc" is invalid because the subtag 'a' appears
       twice. Note that the tag "en-a-bbb-x-a-ccc" is valid because the
       second appearance of the singleton 'a' is in a private use
       sequence.

   5.  Extension subtags MUST meet all of the requirements for the
       content and format of subtags defined in this document.

   6.  Extension subtags MUST meet whatever requirements are set by the
       document that defines their singleton prefix and whatever
       requirements are provided by the maintaining authority.

   7.  Each extension subtag MUST be from two to eight characters long
       and consist solely of letters or digits, with each subtag
       separated by a single '-'.

   8.  Each singleton MUST be followed by at least one extension
       subtag. For example, the tag "tlh-a-b-foo" is invalid because
       the first singleton 'a' is followed immediately by another
       singleton 'b'.

   9.  Extension subtags MUST follow all language, extended language,
       script, region, and variant subtags in a tag.

   10. All subtags following the singleton and before another singleton
       are part of the extension. Example: In the tag "fr-a-Latn", the
       subtag 'Latn' does not represent the script subtag 'Latn'
       defined in the IANA Language Subtag Registry. Its meaning is
       defined by the extension 'a'.

   11. In the event that more than one extension appears in a single
       tag, the tag SHOULD be canonicalized as described in
       Section 4.4.

   For example, if the prefix singleton 'r' and the shown subtags were
   defined, then the following tag would be a valid example:
   "en-Latn-GB-boont-r-extended-sequence-x-private"

2.2.7. Private Use Subtags

   Private use subtags are used to indicate distinctions in language
   important in a given context by private agreement. The following
   rules apply to private use subtags:

   1.  Private use subtags are separated from the other subtags defined
       in this document by the reserved single-character subtag 'x'.

   2.  Private use subtags MUST conform to the format and content
       constraints defined in the ABNF for all subtags.

   3.  Private use subtags MUST follow all language, extended language,
       script, region, variant, and extension subtags in the tag.
       Another way of saying this is that all subtags following the
       singleton 'x' MUST be considered private use. Example: The
       subtag 'US' in the tag "en-x-US" is a private use subtag.

   4.  A tag MAY consist entirely of private use subtags.

   5.  No source is defined for private use subtags. Use of private use
       subtags is by private agreement only.

   6.  Private use subtags are NOT RECOMMENDED where alternatives exist
       or for general interchange. See Section 4.5 for more information
       on private use subtag choice.

   For example: Users who wished to utilize codes from the Ethnologue
   publication of SIL International for language identification might
   agree to exchange tags such as "az-Arab-x-AZE-derbend". This example
   contains two private use subtags. The first is 'AZE' and the second
   is 'derbend'.

2.2.8. Grandfathered Registrations

   Prior to RFC 4646, whole language tags were registered according to
   the rules in RFC 1766 and/or RFC 3066. These registered tags
   maintain their validity. Those tags that were made obsolete or
   redundant by the advent of RFC 4646 or by subsequent registration of
   subtags are maintained in the registry in records of the "redundant"
   type. Those that would not be well-formed according to the ABNF in
   this document or that contain subtags that do not individually
   appear in the registry are maintained in the registry in records of
   the "grandfathered" type. Grandfathered tags contain one or more
   subtags that are not defined in the Language Subtag Registry (see
   Section 3). Redundant tags consist entirely of subtags defined
   above; their independent registration is superseded by this
   document. For more information see Section 3.8.

2.2.9. Classes of Conformance

   Implementations sometimes need to describe their capabilities with
   regard to the rules and practices described in this document.
   There are two classes of conforming implementations described by
   this document: "well-formed" processors and "validating" processors.
   Claims of conformance SHOULD explicitly reference one of these
   definitions.

   An implementation that claims to check for well-formed language tags
   MUST:

   o  Check that the tag and all of its subtags, including extension and
      private use subtags, conform to the ABNF or that the tag is on the
      list of grandfathered tags.

   o  Check that singleton subtags that identify extensions do not
      repeat. For example, the tag "en-a-xx-b-yy-a-zz" is not
      well-formed.

   Well-formed processors are strongly encouraged to implement the
   canonicalization rules contained in Section 4.4.

   An implementation that claims to be validating MUST:

   o  Check that the tag is well-formed.

   o  Specify the particular registry date for which the implementation
      performs validation of subtags.

   o  Check that either the tag is a grandfathered tag, or that all
      language, script, region, and variant subtags consist of valid
      codes for use in language tags according to the IANA registry as
      of the particular date specified by the implementation.

   o  Specify which, if any, extension RFCs as defined in Section 3.7
      are supported, including version, revision, and date.

   o  For any such extensions supported, check that all subtags used in
      that extension are valid.

   o  For extended language subtags, check that the tag matches at least
      one 'Prefix' field associated with the subtag. The tag matches if
      all the subtags in the 'Prefix' also appear in the tag. For
      example, the prefix "es-CO" matches the tag "es-Latn-CO-x-private"
      because both the 'es' language subtag and 'CO' region subtag
      appear in the tag.

3. Registry Format and Maintenance

   This section defines the Language Subtag Registry and the maintenance
   and update procedures associated with it, as well as a registry for
   extensions to language tags (Section 3.7).

   The Language Subtag Registry contains a comprehensive list of all of
   the subtags valid in language tags. This allows implementers a
   straightforward and reliable way to validate language tags. The
   Language Subtag Registry will be maintained so that, except for
   extension subtags, it is possible to validate all of the subtags that
   appear in a language tag under the provisions of this document or its
   revisions or successors. In addition, the meaning of the various
   subtags will be unambiguous and stable over time. (The meaning of
   private use subtags, of course, is not defined by the IANA registry.)

3.1. Format of the IANA Language Subtag Registry

   The IANA Language Subtag Registry ("the registry") consists of a text
   file that is machine readable in the format described in this
   section, plus copies of the registration forms approved in accordance
   with the process described in Section 3.5. The existing registration
   forms for grandfathered and redundant tags taken from RFC 3066 will
   be maintained as part of the obsolete RFC 3066 registry. The
   remaining set of initial subtags will not have registration forms
   created for them.

   The registry is in the text format described below. This format was
   based on the record-jar format described in [record-jar].

   Each line of text is limited to 72 characters, including all
   whitespace. Records are separated by lines containing only the
   sequence "%%" (%x25.25).

   Each field can be viewed as a single, logical line of ASCII
   characters, comprising a field-name and a field-body separated by a
   COLON character (%x3A).
   For convenience, the field-body portion of this conceptual entity
   can be split into a multiple-line representation; this is called
   "folding". The format of the registry is described by the following
   ABNF (per [RFC4234]):

   registry   = record *("%%" CRLF record)
   record     = 1*( field-name *SP ":" *SP field-body CRLF )
   field-name = (ALPHA / DIGIT) [*(ALPHA / DIGIT / "-") (ALPHA / DIGIT)]
   field-body = *(ASCCHAR/LWSP)
   ASCCHAR    = %x21-25 / %x27-7E / UNICHAR ; Note: AMPERSAND is %x26
   UNICHAR    = "&#x" 2*6HEXDIG ";"

                     Figure 2: Registry Format ABNF

   The sequence '..' (%x2E.2E) in a field-body denotes a range of
   values. Such a range represents all subtags of the same length that
   are in alphabetic or numeric order within that range, including the
   values explicitly mentioned. For example 'a..c' denotes the values
   'a', 'b', and 'c' and '11..13' denotes the values '11', '12', and
   '13'.

   Characters from outside the US-ASCII [ISO646] repertoire, as well as
   the AMPERSAND character ("&", %x26) when it occurs in a field-body,
   are represented by a "Numeric Character Reference" using hexadecimal
   notation in the style used by [XML10]. This consists of the
   sequence "&#x" (%x26.23.78) followed by a hexadecimal representation
   of the character's code point in [ISO10646] followed by a closing
   semicolon (%x3B). For example, the EURO SIGN, U+20AC, would be
   represented by the sequence "&#x20AC;". Note that the hexadecimal
   notation MAY have between two and six digits.

   All fields whose field-body contains a date value use the "full-date"
   format specified in [RFC3339]. For example: "2004-06-28" represents
   June 28, 2004, in the Gregorian calendar.

   The first record in the file contains the single field whose
   field-name is "File-Date" (see Figure 3).
   The field-body of this record contains the last modification date of
   this copy of the registry, making it possible to compare different
   versions of the registry. The registry on the IANA website is the
   most current. Versions with an older date than that one are not
   up-to-date.

   File-Date: 2004-06-28
   %%

                Figure 3: Example of the File-Date Record

   Subsequent records represent subtags in the registry. Each of the
   fields in each record MUST occur no more than once, unless otherwise
   noted below. Each record MUST contain the following fields:

   o  'Type'

      *  Type's field-value MUST consist of one of the following
         strings: "language", "extlang", "script", "region", "variant",
         "grandfathered", or "redundant". It denotes the type of tag or
         subtag.

   o  Either 'Subtag' or 'Tag'

      *  Subtag's field-value contains the subtag being defined. This
         field MUST only appear in records whose 'Type' has one of
         these values: "language", "extlang", "script", "region", or
         "variant".

      *  Tag's field-value contains a complete language tag. This field
         MUST only appear in records whose 'Type' has one of these
         values: "grandfathered" or "redundant". Note that the
         field-value will always follow the 'grandfathered' production
         in the ABNF in Section 2.1.

   o  Description

      *  Description's field-value contains a non-normative description
         of the subtag or tag.

   o  Added

      *  Added's field-value contains the date the record was added to
         the registry.

   The 'Subtag' or 'Tag' field MUST use lowercase letters to form the
   subtag or tag, with two exceptions. Subtags whose 'Type' field is
   'script' (in other words, subtags defined by ISO 15924) MUST use
   titlecase. Subtags whose 'Type' field is 'region' (in other words,
   subtags defined by ISO 3166) MUST use uppercase. These exceptions
   mirror the use of case in the underlying standards.
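
   The record format and case conventions described above can be
   illustrated with a short, non-normative Python sketch. It assumes
   that a folded continuation line begins with whitespace (as in the
   record-jar style); the function names parse_registry and
   expected_case are hypothetical and are not defined by this document.
   Ranges such as 'a..c' and numeric character references are not
   handled.

```python
def parse_registry(text):
    """Split registry text into records.

    Each record is a list of (field-name, field-body) pairs, in file
    order, so repeated fields such as 'Description' are preserved.
    """
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "%%":
            # A line containing only "%%" separates records.
            records.append(current)
            current = []
        elif line[:1] in (" ", "\t") and current:
            # Assumption: a folded continuation starts with whitespace
            # and extends the body of the most recent field.
            name, body = current[-1]
            current[-1] = (name, (body + " " + line.strip()).strip())
        elif line.strip():
            # field-name and field-body are separated by the first colon.
            name, _, body = line.partition(":")
            current.append((name.strip(), body.strip()))
    if current:
        records.append(current)
    return records


def expected_case(record_type, subtag):
    """Return the subtag in the case required for its record 'Type'."""
    if record_type == "script":
        return subtag.capitalize()   # titlecase, e.g. 'Latn'
    if record_type == "region":
        return subtag.upper()        # uppercase, e.g. 'CH'
    return subtag.lower()            # all other types use lowercase
```

   A caller might compare each record's 'Subtag' value against
   expected_case() for that record's 'Type' and report any mismatch.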

   The field 'Description' MAY appear more than one time and contains a
   description of the tag or subtag in the record. At least one of the
   'Description' fields MUST be written or transcribed into the Latin
   script; the same or additional fields MAY also include a description
   in a non-Latin script. The 'Description' field is used for
   identification purposes and SHOULD NOT be taken to represent the
   actual native name of the language or variation or to be in any
   particular language. Most descriptions are taken directly from
   source standards such as ISO 639 or ISO 3166.

   Note: Descriptions in registry entries that correspond to ISO 639,
   ISO 15924, ISO 3166, or UN M.49 codes are intended only to indicate
   the meaning of that identifier as defined in the source standard at
   the time it was added to the registry. The description does not
   replace the content of the source standard itself. The descriptions
   are not intended to be the English localized names for the subtags.
   Localization or translation of language tag and subtag descriptions
   is out of scope of this document.

   Each record MAY also contain the following fields:

   o  Preferred-Value

      *  For records of type 'language', 'extlang', 'script', 'region',
         and 'variant', 'Preferred-Value' contains the subtag of the
         same 'Type' that is preferred for forming the language tag.

      *  For records of type 'grandfathered' and 'redundant',
         'Preferred-Value' contains a canonical mapping to a complete
         language tag.

   o  Deprecated

      *  Deprecated's field-value contains the date the record was
         deprecated.

   o  Prefix

      *  Prefix's field-value contains a language tag with which this
         subtag MAY be used to form a new language tag, perhaps with
         other subtags as well. This field MUST only appear in records
         whose 'Type' field-value is 'variant' or 'extlang'.
         For example, the 'Prefix' for the variant 'nedis' is 'sl',
         meaning that the tags "sl-nedis" and "sl-IT-nedis" might be
         appropriate while the tag "is-nedis" is not.

   o  Comments

      *  Comments contains additional information about the subtag, as
         deemed appropriate for understanding the registry and
         implementing language tags using the subtag or tag.

   o  Suppress-Script

      *  Suppress-Script contains a script subtag that SHOULD NOT be
         used to form language tags with the associated primary language
         subtag. This field MUST only appear in records whose 'Type'
         field-value is 'language'. See Section 4.1.

   The field 'Deprecated' MAY be added to any record via the maintenance
   process described in Section 3.3 or via the registration process
   described in Section 3.5. Usually, the addition of a 'Deprecated'
   field is due to the action of one of the standards bodies, such as
   ISO 3166, withdrawing a code. In some historical cases, it might not
   have been possible to reconstruct the original deprecation date. For
   these cases, an approximate date appears in the registry. Although
   valid in language tags, subtags and tags with a 'Deprecated' field
   are deprecated and validating processors SHOULD NOT generate these
   subtags. Note that a record that contains a 'Deprecated' field and
   no corresponding 'Preferred-Value' field has no replacement mapping.

   The field 'Preferred-Value' contains a mapping between the record in
   which it appears and another tag or subtag. The value in this field
   is STRONGLY RECOMMENDED as the best choice to represent the value of
   this record when selecting a language tag. These values form three
   groups:

   1.  ISO 639 language codes that were later withdrawn in favor of
       other codes. These values are mostly a historical curiosity.

   2.  ISO 3166 region codes that have been withdrawn in favor of a new
       code.
       This sometimes happens when a country changes its name or
       administration in such a way that warrants a new region code.

   3.  Tags grandfathered from RFC 3066. In many cases, these tags have
       become obsolete because the values they represent were later
       encoded by ISO 639.

   Records that contain a 'Preferred-Value' field MUST also have a
   'Deprecated' field. This field contains a date of deprecation.
   Thus, a language tag processor can use the registry to construct the
   valid, non-deprecated set of subtags for a given date. In addition,
   for any given tag, a processor can construct the set of valid
   language tags that correspond to that tag for all dates up to the
   date of the registry. The ability to do these mappings MAY be
   beneficial to applications that are matching, selecting, or
   filtering content based on its language tags.

   Note that 'Preferred-Value' mappings in records of type 'region'
   sometimes do not represent exactly the same meaning as the original
   value. There are many reasons for a country code to be changed, and
   the effect this has on the formation of language tags will depend on
   the nature of the change in question.

   In particular, the 'Preferred-Value' field does not imply retagging
   content that uses the affected subtag.

   The field 'Preferred-Value' MUST NOT be modified once created in the
   registry. The field MAY be added to records of type "grandfathered"
   and "region" according to the rules in Section 3.3. Otherwise the
   field MUST NOT be added to any record already in the registry.

   The 'Preferred-Value' field in records of type "grandfathered" and
   "redundant" contains whole language tags that are strongly
   RECOMMENDED for use in place of the record's value. In many cases,
   the mappings were created by deprecation of the tags during the
   period before this document was adopted.
   For example, the tag "no-nyn" was deprecated in favor of the
   ISO 639-1-defined language code 'nn'.

   Records of type 'variant' MAY have more than one field of type
   'Prefix'. Additional fields of this type MAY be added to a 'variant'
   record via the registration process.

   Records of type 'extlang' MUST have _exactly_ one 'Prefix' field.

   The field-value of the 'Prefix' field consists of a language tag
   whose subtags are appropriate to use with this subtag. For example,
   the variant subtag '1996' has a 'Prefix' field of "de". This means
   that tags starting with the sequence "de-" are appropriate with this
   subtag, so "de-Latg-1996" and "de-CH-1996" are both acceptable, while
   the tag "fr-1996" is an inappropriate choice.

   The field of type 'Prefix' MUST NOT be removed from any record. The
   field-value for this type of field MUST NOT be modified.

   The field 'Comments' MAY appear more than once per record. This
   field MAY be inserted or changed via the registration process and no
   guarantee of stability is provided. The content of this field is not
   restricted, except by the need to register the information, the
   suitability of the request, and by reasonable practical size
   limitations.

   The field 'Suppress-Script' MUST only appear in records whose 'Type'
   field-value is 'language'. This field MUST NOT appear more than one
   time in a record. This field indicates a script used to write the
   overwhelming majority of documents for the given language and that
   therefore adds no distinguishing information to a language tag. It
   helps ensure greater compatibility between the language tags
   generated according to the rules in this document and language tags
   and tag processors or consumers based on RFC 3066. For example,
   virtually all Icelandic documents are written in the Latin script,
   making the subtag 'Latn' redundant in the tag "is-Latn".

3.2. Language Subtag Reviewer

   The Language Subtag Reviewer is appointed by the IESG for an
   indefinite term, subject to removal or replacement at the IESG's
   discretion. The Language Subtag Reviewer moderates the
   ietf-languages mailing list, responds to requests for registration,
   and performs the other registry maintenance duties described in
   Section 3.3. Only the Language Subtag Reviewer is permitted to
   request IANA to change, update, or add records to the Language Subtag
   Registry.

   The performance or decisions of the Language Subtag Reviewer MAY be
   appealed to the IESG under the same rules as other IETF decisions
   (see [RFC2026]). The IESG can reverse or overturn the decision of
   the Language Subtag Reviewer, provide guidance, or take other
   appropriate actions.

3.3. Maintenance of the Registry

   Maintenance of the registry requires that as codes are assigned or
   withdrawn by ISO 639, ISO 15924, ISO 3166, and UN M.49, the Language
   Subtag Reviewer MUST evaluate each change, determine whether it
   conflicts with existing registry entries, and submit the information
   to IANA for inclusion in the registry. If a change takes place and
   the Language Subtag Reviewer does not do this in a timely manner,
   then any interested party MAY use the procedure in Section 3.5 to
   register the appropriate update.

   Note: The redundant and grandfathered entries together are the
   complete list of tags registered under [RFC3066]. The redundant tags
   are those that can now be formed using the subtags defined in the
   registry together with the rules of Section 2.2. The grandfathered
   entries include those that can never be legal under those same
   provisions plus those tags that contain subtags not yet registered
   or, perhaps, inappropriate for registration.
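
   The distinction drawn in the note above can be approximated in code.
   The following non-normative Python sketch makes a deliberate
   simplification: it treats a tag as "redundant" when every one of its
   subtags appears in a given set of registered subtags, and as
   "grandfathered" otherwise. It does not check subtag order, subtag
   type, or the ABNF, all of which a real classification would need.
   The function name classify_rfc3066_tag and the subtag sets in the
   usage example are hypothetical and do not reflect actual registry
   contents.

```python
def classify_rfc3066_tag(tag, registered_subtags):
    """Return 'redundant' or 'grandfathered' for an RFC 3066 tag.

    Simplification: only membership of each subtag in the registered
    set is tested; position, subtag type, and ABNF well-formedness
    are ignored.
    """
    known = {subtag.lower() for subtag in registered_subtags}
    subtags = tag.lower().split("-")
    if all(subtag in known for subtag in subtags):
        return "redundant"
    return "grandfathered"


# Illustrative sets only: before and after a hypothetical
# registration of the subtag 'gan'.
before = classify_rfc3066_tag("zh-gan", {"zh"})
after = classify_rfc3066_tag("zh-gan", {"zh", "gan"})
```

   Under these assumptions, the tag "zh-gan" is classified as
   grandfathered while 'gan' is unregistered and as redundant once
   every subtag in it is registered.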

   The set of redundant and grandfathered tags is permanent and stable:
   new entries in this section MUST NOT be added and existing entries
   MUST NOT be removed. Records of type 'grandfathered' MAY have their
   type converted to 'redundant'; see item 17 in Section 3.4 for more
   information. The decision-making process about which tags were
   initially grandfathered and which were made redundant is described in
   [initial-registry].

   RFC 3066 tags that were deprecated prior to the adoption of [RFC4646]
   are part of the list of grandfathered tags, and their component
   subtags were not included as registered variants (although they
   remain eligible for registration). For example, the tag "art-lojban"
   was deprecated in favor of the language subtag 'jbo'.

   The Language Subtag Reviewer MUST ensure that new subtags meet the
   requirements in Section 4.1 or submit an appropriate alternate subtag
   as described in that section. When either a change or addition to
   the registry is needed, the Language Subtag Reviewer MUST prepare the
   complete record, including all fields, and forward it to IANA for
   insertion into the registry. Each record being modified or inserted
   MUST be forwarded in a separate message.

   If a record represents a new subtag that does not currently exist in
   the registry, then the message's subject line MUST include the word
   "INSERT". If the record represents a change to an existing subtag,
   then the subject line of the message MUST include the word "MODIFY".
   The message MUST contain both the record for the subtag being
   inserted or modified and the new File-Date record.
   Here is an example of what the body of the message might contain:

   LANGUAGE SUBTAG MODIFICATION
   File-Date: 2005-01-02
   %%
   Type: variant
   Subtag: nedis
   Description: Natisone dialect
   Description: Nadiza dialect
   Added: 2003-10-09
   Prefix: sl
   Comments: This is a comment shown
     as an example.
   %%

        Figure 4: Example of a Language Subtag Modification Form

   Whenever an entry is created or modified in the registry, the
   'File-Date' record at the start of the registry is updated to
   reflect the most recent modification date in the [RFC3339]
   "full-date" format.

   Before forwarding a new registration to IANA, the Language Subtag
   Reviewer MUST ensure that values in the 'Subtag' field match case
   according to the description in Section 3.1.

3.4. Stability of IANA Registry Entries

   The stability of entries and their meaning in the registry is
   critical to the long-term stability of language tags. The rules in
   this section guarantee that a specific language tag's meaning is
   stable over time and will not change.

   These rules specifically deal with how changes to codes (including
   withdrawal and deprecation of codes) maintained by ISO 639, ISO
   15924, ISO 3166, and UN M.49 are reflected in the IANA Language
   Subtag Registry. Assignments to the IANA Language Subtag Registry
   MUST follow the following stability rules:

   1.  Values in the fields 'Type', 'Subtag', 'Tag', 'Added',
       'Deprecated' and 'Preferred-Value' MUST NOT be changed and are
       guaranteed to be stable over time.

   2.  Values in the 'Description' field MUST NOT be changed in a way
       that would invalidate previously-existing tags. They MAY be
       broadened somewhat in scope, changed to add information, or
       adapted to the most common modern usage.
       For example, countries occasionally change their official names;
       a historical example of this would be "Upper Volta" changing to
       "Burkina Faso".

   3.  Values in the field 'Prefix' MAY be added to records of type
       'variant' via the registration process. If a prefix is added to
       a record that does not contain the same primary language subtag
       as an existing prefix, one 'Comments' field per prefix SHOULD be
       added to the record explaining the different usages.

   4.  Values in the field 'Prefix' in records of type 'variant' MAY be
       modified, so long as the modifications broaden the set of
       prefixes. That is, a prefix MAY be replaced by one of its own
       prefixes. For example, the prefix "en-US" could be replaced by
       "en", but not by the prefixes "en-Latn", "fr", or "en-US-boont".
       If one of those prefixes were needed, a new Prefix SHOULD be
       registered.

   5.  Values in the field 'Prefix' in records of type 'extlang' MUST
       NOT be modified.

   6.  Values in the field 'Prefix' MUST NOT be removed.

   7.  The field 'Comments' MAY be added, changed, modified, or removed
       via the registration process or any of the processes or
       considerations described in this section.

   8.  The field 'Suppress-Script' MAY be added or removed via the
       registration process.

   9.  Codes assigned by ISO 639-1 that do not conflict with existing
       two-letter primary language subtags and which have no
       corresponding three-letter primary or extended language subtags
       defined in the registry are entered into the IANA registry as
       new records of type 'language'.

   10. Codes assigned by ISO 639-2 that do not conflict with existing
       three-letter primary or extended language subtags are entered
       into the IANA registry as new records of type 'language'.

   11. Codes assigned by ISO 639-3 that do not conflict with existing
       three-letter primary or extended language subtags are entered
       into the IANA registry as new records.

       1.  Codes that have a defined "macro-language" mapping at the
           time of their registration MUST be entered into the registry
           as records of type 'extlang' with a 'Prefix' field
           containing the appropriate prefix tag.

       2.  Codes that represent sign languages MUST be entered into the
           registry as records of type 'extlang' with a 'Prefix' field
           that matches the Basic Language Range "sgn" (see Section
           3.3.1 "Basic Filtering" in [RFC4647]).

       3.  All other codes MUST be entered into the registry as records
           of type 'language'.

   12. A record of type 'language' or 'extlang' MUST NOT be registered
       if there exists a record of either type with the same subtag
       value. For example, if an 'extlang' subtag 'foo' exists, all
       attempts to register a 'language' subtag 'foo' will be rejected.

   13. Codes assigned by ISO 15924 and ISO 3166 that do not conflict
       with existing subtags of the associated type and whose meaning
       is not the same as an existing subtag of the same type are
       entered into the IANA registry as new records.

   14. Codes assigned by ISO 639, ISO 15924, or ISO 3166 that are
       withdrawn by their respective maintenance or registration
       authority remain valid in language tags. A 'Deprecated' field
       containing the date of withdrawal MUST be added to the record.
       If a new record of the same type is added that represents a
       replacement value, then a 'Preferred-Value' field MAY also be
       added. The registration process MAY be used to add comments
       about the withdrawal of the code by the respective standard.

        Example: The region code 'TL' was assigned to the country
        'Timor-Leste', replacing the code 'TP' (which was assigned to
        'East Timor' when it was under administration by Portugal).
       The subtag 'TP' remains valid in language tags, but its record
       contains a 'Preferred-Value' of 'TL' and its 'Deprecated' field
       contains the date the new code was assigned ('2004-07-06').

   15. Codes assigned by ISO 639, ISO 15924, or ISO 3166 that conflict
       with existing subtags of the associated type, including subtags
       that are deprecated, MUST NOT be entered into the registry. The
       following additional considerations apply to subtag values that
       are reassigned:

       A. For ISO 639 codes, if the newly assigned code's meaning is
          not represented by a subtag in the IANA registry, the
          Language Subtag Reviewer, as described in Section 3.5, SHALL
          prepare a proposal for entering in the IANA registry as soon
          as practical a registered language subtag as an alternate
          value for the new code. The form of the registered language
          subtag will be at the discretion of the Language Subtag
          Reviewer and MUST conform to other restrictions on language
          subtags in this document.

       B. For all subtags whose meaning is derived from an external
          standard (that is, from ISO 639, ISO 15924, ISO 3166, or UN
          M.49), if a new meaning is assigned to an existing code and
          the new meaning broadens the meaning of that code, then the
          meaning for the associated subtag MAY be changed to match.
          The meaning of a subtag MUST NOT be narrowed, however, as
          this can result in an unknown proportion of the existing
          uses of a subtag becoming invalid. Note: The ISO 639
          maintenance agency/registration authority (MA/RA) has
          adopted a similar stability policy.

       C. For ISO 15924 codes, if the newly assigned code's meaning is
          not represented by a subtag in the IANA registry, the
          Language Subtag Reviewer, as described in Section 3.5, SHALL
          prepare a proposal for entering in the IANA registry as soon
          as practical a registered variant subtag as an alternate
          value for the new code. The form of the registered variant
          subtag will be at the discretion of the Language Subtag
          Reviewer and MUST conform to other restrictions on variant
          subtags in this document.

       D. For ISO 3166 codes, if the newly assigned code's meaning is
          associated with the same UN M.49 code as another 'region'
          subtag, then the existing region subtag remains as the
          preferred value for that region and no new entry is created.
          A comment MAY be added to the existing region subtag
          indicating the relationship to the new ISO 3166 code.

       E. For ISO 3166 codes, if the newly assigned code's meaning is
          associated with a UN M.49 code that is not represented by an
          existing region subtag, then the Language Subtag Reviewer,
          as described in Section 3.5, SHALL prepare a proposal for
          entering the appropriate UN M.49 country code as an entry in
          the IANA registry.

       F. For ISO 3166 codes, if there is no associated UN numeric
          code, then the Language Subtag Reviewer SHALL petition the
          UN to create one. If there is no response from the UN
          within ninety days of the request being sent, the Language
          Subtag Reviewer SHALL prepare a proposal for entering in the
          IANA registry as soon as practical a registered variant
          subtag as an alternate value for the new code. The form of
          the registered variant subtag will be at the discretion of
          the Language Subtag Reviewer and MUST conform to other
          restrictions on variant subtags in this document. This
          situation is very unlikely to ever occur.

   16. UN M.49 has codes for both countries and areas (such as '276'
       for Germany) and geographical regions and sub-regions (such as
       '150' for Europe). UN M.49 country or area codes for which
       there is no corresponding ISO 3166 code SHOULD NOT be
       registered, except as a surrogate for an ISO 3166 code that is
       blocked from registration by an existing subtag. If such a code
       becomes necessary, then the registration authority for ISO 3166
       SHOULD first be petitioned to assign a code to the region. If
       the petition for a code assignment by ISO 3166 is refused or not
       acted on in a timely manner, the registration process described
       in Section 3.5 MAY then be used to register the corresponding UN
       M.49 code. This way, UN M.49 codes remain available as the
       value of last resort in cases where ISO 3166 reassigns a
       deprecated value in the registry.

   17. Stability provisions apply to grandfathered tags with this
       exception: should all of the subtags in a grandfathered tag
       become valid subtags in the IANA registry, then the field 'Type'
       in that record is changed from 'grandfathered' to 'redundant'.
       Note that this will not affect language tags that match the
       grandfathered tag, since these tags will now match valid
       generative subtag sequences. For example, if the subtag 'gan'
       in the language tag "zh-gan" were to be registered as an
       extended language subtag, then the grandfathered tag "zh-gan"
       would be deprecated (but existing content or implementations
       that use "zh-gan" would remain valid).

3.5. Registration Procedure for Subtags

   The procedure given here MUST be used by anyone who wants to use a
   subtag not currently in the IANA Language Subtag Registry.

   Only subtags of type 'language' and 'variant' will be considered for
   independent registration of new subtags.
   Handling of subtags needed for stability and subtags necessary to
   keep the registry synchronized with ISO 639, ISO 15924, ISO 3166,
   and UN M.49 within the limits defined by this document is described
   in Section 3.3. Stability provisions are described in Section 3.4.

   This procedure MAY also be used to register or alter the information
   for the 'Description', 'Comments', 'Deprecated', or 'Prefix' fields
   in a subtag's record as described in Section 3.4. Changes to all
   other fields in the IANA registry are NOT permitted.

   Registering a new subtag or requesting modifications to an existing
   tag or subtag starts with the requester filling out the registration
   form reproduced below. Note that each response is not limited in
   size so that the request can adequately describe the registration.
   The fields in the "Record Requested" section SHOULD follow the
   requirements in Section 3.1.

   LANGUAGE SUBTAG REGISTRATION FORM
   1. Name of requester:
   2. E-mail address of requester:
   3. Record Requested:

      Type:
      Subtag:
      Description:
      Prefix:
      Preferred-Value:
      Deprecated:
      Suppress-Script:
      Comments:

   4. Intended meaning of the subtag:
   5. Reference to published description
      of the language (book or article):
   6. Any other relevant information:

          Figure 5: The Language Subtag Registration Form

   The subtag registration form MUST be sent to
   for a two-week review period before it can
   be submitted to IANA. (This is an open list and can be joined by
   sending a request to .)

   Variant subtags are usually registered for use with a particular
   range of language tags. For example, the subtag 'rozaj' is intended
   for use with language tags that start with the primary language
   subtag "sl", since Resian is a dialect of Slovenian.
   Thus, the subtag 'rozaj' would be appropriate in tags such as
   "sl-Latn-rozaj" or "sl-IT-rozaj". This information is stored in the
   'Prefix' field in the registry. Variant registration requests SHOULD
   include at least one 'Prefix' field in the registration form.

   Extended language subtags are reserved for future standardization.
   These subtags will be REQUIRED to include exactly one 'Prefix' field
   once they are allowed for registration.

   The 'Prefix' field for a given registered subtag exists in the IANA
   registry as a guide to usage. Additional prefixes MAY be added by
   filing an additional registration form. In that form, the "Any other
   relevant information:" field MUST indicate that it is the addition of
   a prefix.

   Requests to add a prefix to a variant subtag that imply a different
   semantic meaning will probably be rejected. For example, a request
   to add the prefix "de" to the subtag 'nedis' so that the tag
   "de-nedis" represented some German dialect would be rejected. The
   'nedis' subtag represents a particular Slovenian dialect and the
   additional registration would change the semantic meaning assigned to
   the subtag. A separate subtag SHOULD be proposed instead.

   The 'Description' field MUST contain a description of the tag being
   registered written or transcribed into the Latin script; it MAY also
   include a description in a non-Latin script. Non-ASCII characters
   MUST be escaped using the syntax described in Section 3.1. The
   'Description' field is used for identification purposes; it need not
   represent the actual native name of the language or variation, nor
   be in any particular language.
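
   As an illustration of the escaping requirement, the following Python
   sketch assumes that the syntax of Section 3.1 is the XML-style
   hexadecimal numeric character reference (for example, "&#xE5;").
   The function name escape_description is hypothetical and is not
   defined by this document.

```python
def escape_description(text):
    """Escape non-ASCII characters for a 'Description' field.

    Assumption: Section 3.1 uses hexadecimal numeric character
    references of the form "&#x" + hex code point + ";".
    """
    return "".join(
        ch if ord(ch) < 128 else "&#x%X;" % ord(ch)
        for ch in text
    )


# Example: a Latin-script description containing one non-ASCII letter.
print(escape_description("Norwegian Bokm\u00e5l"))
```

   ASCII characters pass through unchanged, so a description that is
   already plain ASCII is not altered by this step.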

   While the 'Description' field itself is not guaranteed to be stable
   and errata corrections MAY be undertaken from time to time, attempts
   to provide translations or transcriptions of entries in the registry
   itself will probably be frowned upon by the community or rejected
   outright, as changes of this nature have an impact on the provisions
   in Section 3.4.

   When the two-week period has passed, the Language Subtag Reviewer
   either forwards the record to be inserted or modified to
   iana@iana.org according to the procedure described in Section 3.3, or
   rejects the request because of significant objections raised on the
   list or due to problems with constraints in this document (which MUST
   be explicitly cited). The Language Subtag Reviewer MAY also extend
   the review period in two-week increments to permit further
   discussion. The Language Subtag Reviewer MUST indicate on the list
   whether the registration has been accepted, rejected, or extended
   following each two-week period.

   Note that the Language Subtag Reviewer MAY raise objections on the
   list if he or she so desires. The important thing is that the
   objection MUST be made publicly.

   The applicant is free to modify a rejected application with
   additional information and submit it again; this restarts the
   two-week comment period.

   Decisions made by the Language Subtag Reviewer MAY be appealed to the
   IESG [RFC2028] under the same rules as other IETF decisions
   [RFC2026].

   All approved registration forms are available online in the directory
   http://www.iana.org/numbers.html under "languages".

   Updates or changes to existing records follow the same procedure as
   new registrations.
   The Language Subtag Reviewer decides whether there is consensus to
   update the registration following the two-week review period;
   normally, objections by the original registrant will carry extra
   weight in forming such a consensus.

   Registrations are permanent and stable. Once registered, subtags
   will not be removed from the registry and will remain a valid way in
   which to specify a specific language or variant.

   Note: The purpose of the "Description" in the registration form is to
   aid people trying to verify whether a language is registered or
   what language or language variation a particular subtag refers to.
   In most cases, reference to an authoritative grammar or dictionary of
   that language will be useful; in cases where no such work exists,
   other well-known works describing that language or in that language
   MAY be appropriate. The Language Subtag Reviewer decides what
   constitutes "good enough" reference material. This requirement is
   not intended to exclude particular languages or dialects due to the
   size of the speaker population or lack of a standardized orthography.
   Minority languages will be considered equally on their own merits.

3.6. Possibilities for Registration

   Possibilities for registration of subtags or information about
   subtags include:

   o  Primary language subtags for languages not listed in ISO 639 that
      are not variants of any listed or registered language MAY be
      registered. At the time this document was created, there were no
      examples of this form of subtag. Before attempting to register a
      language subtag, there MUST be an attempt to register the language
      with ISO 639.
      Subtags MUST NOT be registered for languages
      defined by codes that exist in ISO 639-1, ISO 639-2, or ISO 639-3,
      or that are under consideration by the ISO 639 maintenance or
      registration authorities, or that have never been attempted for
      registration with those authorities. If ISO 639 has previously
      rejected a language for registration, it is reasonable to assume
      that there must be additional, very compelling evidence of need
      before it will be registered as a primary language subtag in the
      IANA registry (to the extent that it is very unlikely that any
      subtags will be registered of this type).

   o  Dialect or other divisions or variations within a language, its
      orthography, writing system, regional or historical usage,
      transliteration or other transformation, or distinguishing
      variation MAY be registered as variant subtags. An example is the
      'rozaj' subtag (the Resian dialect of Slovenian).

   o  The addition or maintenance of fields (generally of an
      informational nature) in Tag or Subtag records as described in
      Section 3.1 and subject to the stability provisions in
      Section 3.4. This includes descriptions, comments, deprecation
      and preferred values for obsolete or withdrawn codes, or the
      addition of script or extlang information to primary language
      subtags.

   o  The addition of records and related field value changes necessary
      to reflect assignments made by ISO 639, ISO 15924, ISO 3166, and
      UN M.49 as described in Section 3.4.

   Subtags proposed for registration that would cause all or part of a
   grandfathered tag to become redundant but whose meaning conflicts
   with or alters the meaning of the grandfathered tag MUST be rejected.

   This document leaves the decision on what subtags or changes to
   subtags are appropriate (or not) to the registration process
   described in Section 3.5.

   Note: four-character primary language subtags are reserved to allow
   for the possibility of alpha4 codes in some future addition to the
   ISO 639 family of standards.

   ISO 639 defines a maintenance agency for additions to and changes in
   the list of languages in ISO 639. This agency is:

      International Information Centre for Terminology (Infoterm)
      Aichholzgasse 6/12, AT-1120
      Wien, Austria
      Phone: +43 1 26 75 35 Ext. 312 Fax: +43 1 216 32 72

   ISO 639-2 defines a maintenance agency for additions to and changes
   in the list of languages in ISO 639-2. This agency is:

      Library of Congress
      Network Development and MARC Standards Office
      Washington, D.C. 20540 USA
      Phone: +1 202 707 6237 Fax: +1 202 707 0115
      URL: http://www.loc.gov/standards/iso639-2

   ISO 639-3 defines a maintenance agency for additions to and changes
   in the list of languages in ISO 639-3. This agency is:

      SIL International
      ISO 639-3 Registrar
      7500 W. Camp Wisdom Rd.
      Dallas, TX 75236 USA
      Phone: +1 972 708 7400, ext. 2293 Fax: +1 972 708 7546
      Email: iso639-3@sil.org
      URL: http://www.sil.org/iso639-3

   The maintenance agency for ISO 3166 (country codes) is:

      ISO 3166 Maintenance Agency
      c/o International Organization for Standardization
      Case postale 56
      CH-1211 Geneva 20 Switzerland
      Phone: +41 22 749 72 33 Fax: +41 22 749 73 49
      URL: http://www.iso.org/iso/en/prods-services/iso3166ma/index.html

   The registration authority for ISO 15924 (script codes) is:

      Unicode Consortium Box 391476
      Mountain View, CA 94039-1476, USA
      URL: http://www.unicode.org/iso15924

   The Statistics Division of the United Nations Secretariat maintains
   the Standard Country or Area Codes for Statistical Use and can be
   reached at:

      Statistical Services Branch
      Statistics Division
      United Nations, Room DC2-1620
      New York, NY 10017, USA

      Fax: +1-212-963-0623
      E-mail: statistics@un.org
      URL: http://unstats.un.org/unsd/methods/m49/m49alpha.htm

3.7. Extensions and Extensions Registry

   Extension subtags are those introduced by single-character subtags
   ("singletons") other than 'x'. They are reserved for the generation
   of identifiers that contain a language component and are compatible
   with applications that understand language tags.

   The structure and form of extensions are defined by this document so
   that implementations can be created that are forward compatible with
   applications that might be created using singletons in the future.
   In addition, defining a mechanism for maintaining singletons will
   lend stability to this document by reducing the likely need for
   future revisions or updates.

   Single-character subtags are assigned by IANA using the "IETF
   Consensus" policy defined by [RFC2434]. This policy requires the
   development of an RFC, which SHALL define the name, purpose,
   processes, and procedures for maintaining the subtags.
   The maintaining or registering authority, including name, contact
   email, discussion list email, and URL location of the registry, MUST
   be indicated clearly in the RFC. The RFC MUST specify or include
   each of the following:

   o  The specification MUST reference the specific version or revision
      of this document that governs its creation and MUST reference this
      section of this document.

   o  The specification and all subtags defined by the specification
      MUST follow the ABNF and other rules for the formation of tags and
      subtags as defined in this document. In particular, it MUST
      specify that case is not significant and that subtags MUST NOT
      exceed eight characters in length.

   o  The specification MUST specify a canonical representation.

   o  The specification of valid subtags MUST be available over the
      Internet and at no cost.

   o  The specification MUST be in the public domain or available via a
      royalty-free license acceptable to the IETF and specified in the
      RFC.

   o  The specification MUST be versioned, and each version of the
      specification MUST be numbered, dated, and stable.

   o  The specification MUST be stable. That is, extension subtags,
      once defined by a specification, MUST NOT be retracted or change
      in meaning in any substantial way.

   o  The specification MUST include in a separate section the
      registration form reproduced in this section (below) to be used in
      registering the extension upon publication as an RFC.

   o  IANA MUST be informed of changes to the contact information and
      URL for the specification.

   IANA will maintain a registry of allocated single-character
   (singleton) subtags. This registry MUST use the record-jar format
   described by the ABNF in Section 3.1.
   Upon publication of an extension as an RFC, the maintaining
   authority defined in the RFC MUST forward this registration form to
   iesg@ietf.org, who MUST forward the request to iana@iana.org. The
   maintaining authority of the extension MUST maintain the accuracy of
   the record by sending an updated full copy of the record to
   iana@iana.org with the subject line "LANGUAGE TAG EXTENSION UPDATE"
   whenever content changes. Only the 'Comments', 'Contact_Email',
   'Mailing_List', and 'URL' fields MAY be modified in these updates.

   Failure to maintain this record, maintain the corresponding registry,
   or meet other conditions imposed by this section of this document MAY
   be appealed to the IESG [RFC2028] under the same rules as other IETF
   decisions (see [RFC2026]) and MAY result in the authority to maintain
   the extension being withdrawn or reassigned by the IESG.

   %%
   Identifier:
   Description:
   Comments:
   Added:
   RFC:
   Authority:
   Contact_Email:
   Mailing_List:
   URL:
   %%

   Figure 6: Format of Records in the Language Tag Extensions Registry

   'Identifier' contains the single-character subtag (singleton)
   assigned to the extension. The Internet-Draft submitted to define
   the extension SHOULD specify which letter or digit to use, although
   the IESG MAY change the assignment when approving the RFC.

   'Description' contains the name and description of the extension.

   'Comments' is an OPTIONAL field and MAY contain a broader description
   of the extension.

   'Added' contains the date the RFC was published in the "full-date"
   format specified in [RFC3339]. For example: 2004-06-28 represents
   June 28, 2004, in the Gregorian calendar.

   'RFC' contains the RFC number assigned to the extension.

   'Authority' contains the name of the maintaining authority for the
   extension.

   'Contact_Email' contains the email address used to contact the
   maintaining authority.

   'Mailing_List' contains the URL or subscription email address of the
   mailing list used by the maintaining authority.

   'URL' contains the URL of the registry for this extension.

   The determination of whether an Internet-Draft meets the above
   conditions and the decision to grant or withhold such authority rests
   solely with the IESG and is subject to the normal review and appeals
   process associated with the RFC process.

   Extension authors are strongly cautioned that many (including most
   well-formed) processors will be unaware of any special relationships
   or meaning inherent in the order of extension subtags. Extension
   authors SHOULD avoid subtag relationships or canonicalization
   mechanisms that interfere with matching or with length restrictions
   that sometimes exist in common protocols where the extension is used.
   In particular, applications MAY truncate the subtags in doing
   matching or in fitting into limited lengths, so it is RECOMMENDED
   that the most significant information be in the most significant
   (left-most) subtags and that the specification gracefully handle
   truncated subtags.

   When a language tag is to be used in a specific, known protocol, it
   is RECOMMENDED that the language tag not contain extensions not
   supported by that protocol. In addition, note that some protocols
   MAY impose upper limits on the length of the strings used to store or
   transport the language tag.

3.8. Update of the Language Subtag Registry

   Upon adoption of this document the IANA Language Subtag Registry will
   need an update so that it contains the complete set of subtags valid
   in a language tag. This collection of subtags, along with a
   description of the process used to create it, is described by
   [initial-registry].
   IANA SHALL publish the updated version of the registry described by
   this document using the instructions and content of
   [initial-registry]. Once published by IANA, the maintenance
   procedures, rules, and registration processes described in this
   document will be available for new registrations or updates.

   Registrations that are in process under the rules defined in
   [RFC4646] when this document is adopted MUST be completed under the
   rules contained in this document.

4. Formation and Processing of Language Tags

   This section addresses how to use the information in the registry
   with the tag syntax to choose, form, and process language tags.

4.1. Choice of Language Tag

   One is sometimes faced with the choice between several possible tags
   for the same body of text.

   Interoperability is best served when all users use the same language
   tag in order to represent the same language. If an application has
   requirements that make the rules here inapplicable, then that
   application risks damaging interoperability. It is strongly
   RECOMMENDED that users not define their own rules for language tag
   choice.

   Subtags SHOULD only be used where they add useful distinguishing
   information; extraneous subtags interfere with the meaning,
   understanding, and processing of language tags. In particular, users
   and implementations SHOULD follow the 'Prefix' and 'Suppress-Script'
   fields in the registry (defined in Section 3.1): these fields provide
   guidance on when specific additional subtags SHOULD (and SHOULD NOT)
   be used in a language tag.

   Of particular note, many applications can benefit from the use of
   script subtags in language tags, as long as the use is consistent for
   a given context.
   Script subtags were not formally defined in RFC 3066 and their use
   can affect matching and subtag identification by implementations of
   RFC 3066, as these subtags appear between the primary language and
   region subtags. For example, if a user requests content in an
   implementation of Section 2.5 of [RFC3066] using the language range
   "en-US", content labeled "en-Latn-US" will not match the request.
   Therefore, it is important to know when script subtags will
   customarily be used and when they ought not be used. In the
   registry, the Suppress-Script field helps ensure greater
   compatibility between the language tags generated according to the
   rules in this document and language tags and tag processors or
   consumers based on RFC 3066 by defining when users SHOULD NOT include
   a script subtag with a particular primary language subtag.

   Extended language subtags (type 'extlang' in the registry; see
   Section 3.1) also appear between the primary language and region
   subtags. Applications might benefit from their judicious use in
   forming language tags. [[ guidelines here?? ]]

   Standards, protocols, and applications that reference this document
   normatively but apply rules different from the ones given in this
   section MUST specify how the procedure varies from the one given
   here.

   The choice of subtags used to form a language tag SHOULD be guided by
   the following rules:

   1. Use as precise a tag as possible, but no more specific than is
      justified. Avoid using subtags that are not important for
      distinguishing content in an application.

      *  For example, 'de' might suffice for tagging an email written
         in German, while "de-CH-1996" is probably unnecessarily
         precise for such a task.

   2. The script subtag SHOULD NOT be used to form language tags unless
      the script adds some distinguishing information to the tag.
      The field 'Suppress-Script' in the primary language record in the
      registry indicates which script subtags do not add distinguishing
      information for most applications.

      *  For example, the subtag 'Latn' should not be used with the
         primary language 'en' because nearly all English documents are
         written in the Latin script and it adds no distinguishing
         information. However, if a document were written in English
         mixing Latin script with another script such as Braille
         ('Brai'), then it might be appropriate to choose to indicate
         both scripts to aid in content selection, such as the
         application of a style sheet.

   3. If a tag or subtag has a 'Preferred-Value' field in its registry
      entry, then the value of that field SHOULD be used to form the
      language tag in preference to the tag or subtag in which the
      preferred value appears.

      *  For example, use 'he' for Hebrew in preference to 'iw'.

   4. The 'und' (Undetermined) primary language subtag SHOULD NOT be
      used to label content, even if the language is unknown. Omitting
      the language tag altogether is preferred to using a tag with a
      primary language subtag of 'und'. The 'und' subtag MAY be useful
      for protocols that require a language tag to be provided. The
      'und' subtag MAY also be useful when matching language tags in
      certain situations.

   5. The 'mul' (Multiple) primary language subtag SHOULD NOT be used
      whenever the protocol allows separate tags for multiple
      languages, as is the case for the Content-Language header in
      HTTP. The 'mul' subtag conveys little useful information:
      content in multiple languages SHOULD individually tag the
      languages where they appear or otherwise indicate the actual
      language in preference to the 'mul' subtag.

   6. The same variant subtag SHOULD NOT be used more than once within
      a language tag.

      *  For example, do not use "de-DE-1901-1901".
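
   Rules 2 through 6 lend themselves to a mechanical check. The Python
   sketch below is illustrative only: the two dictionaries hold just
   the examples given in the rules (a real tool would load the IANA
   registry), the names PREFERRED_VALUE, SUPPRESS_SCRIPT, is_variant,
   and tag_choice_warnings are hypothetical, and the variant shape
   test (five to eight alphanumerics, or four starting with a digit)
   is an assumption about the tag syntax in Section 2.1.

```python
# Hypothetical excerpts of registry data, limited to the examples in the
# rules above; a real tool would load the IANA Language Subtag Registry.
PREFERRED_VALUE = {"iw": "he"}
SUPPRESS_SCRIPT = {"en": "latn"}


def is_variant(subtag):
    """Assumed variant shape: 5-8 alphanumerics, or 4 starting with a digit."""
    if not subtag.isalnum():
        return False
    return 5 <= len(subtag) <= 8 or (len(subtag) == 4 and subtag[0].isdigit())


def tag_choice_warnings(tag):
    """Return advisory messages for rules 2-6; an empty list means none."""
    subtags = tag.lower().split("-")
    language = subtags[0]
    warnings = []
    if language in PREFERRED_VALUE:  # rule 3
        warnings.append("prefer '%s' to '%s'"
                        % (PREFERRED_VALUE[language], language))
    if language in ("und", "mul"):  # rules 4 and 5
        warnings.append("avoid '%s' where possible" % language)
    seen_variants = set()
    for subtag in subtags[1:]:
        if len(subtag) == 1:
            break  # extension or private-use subtags follow; stop here
        if len(subtag) == 4 and subtag.isalpha():  # script subtag, rule 2
            if SUPPRESS_SCRIPT.get(language) == subtag:
                warnings.append("script '%s' adds nothing to '%s'"
                                % (subtag, language))
        elif is_variant(subtag):  # rule 6
            if subtag in seen_variants:
                warnings.append("variant '%s' is repeated" % subtag)
            seen_variants.add(subtag)
    return warnings


for example in ("de", "iw", "en-Latn-US", "de-DE-1901-1901"):
    print(example, tag_choice_warnings(example))
```

   The messages are advisory because the rules are stated with SHOULD:
   a protocol that requires a tag, for instance, may still have a
   legitimate use for 'und'.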

   To ensure consistent backward compatibility, this document contains
   several provisions to account for potential instability in the
   standards used to define the subtags that make up language tags.
   These provisions mean that no language tag created under the rules in
   this document will become obsolete.

4.2. Meaning of the Language Tag

   The relationship between the tag and the information it relates to is
   defined by the context in which the tag appears. Accordingly, this
   section gives only possible examples of its usage.

   o  For a single information object, the associated language tags
      might be interpreted as the set of languages that is necessary for
      a complete comprehension of the complete object. Example: Plain
      text documents.

   o  For an aggregation of information objects, the associated language
      tags could be taken as the set of languages used inside components
      of that aggregation. Examples: Document stores and libraries.

   o  For information objects whose purpose is to provide alternatives,
      the associated language tags could be regarded as a hint that the
      content is provided in several languages and that one has to
      inspect each of the alternatives in order to find its language or
      languages. In this case, the presence of multiple tags might not
      mean that one needs to be multi-lingual to get complete
      understanding of the document. Example: MIME
      multipart/alternative.

   o  In markup languages, such as HTML and XML, language information
      can be added to each part of the document identified by the markup
      structure (including the whole document itself). For example, one
      could write C'est la vie. inside a
      Norwegian document; the Norwegian-speaking user could then access
      a French-Norwegian dictionary to find out what the marked section
      meant.
      If the user were listening to that document through a
      speech synthesis interface, this formation could be used to signal
      the synthesizer to appropriately apply French text-to-speech
      pronunciation rules to that span of text, instead of applying the
      inappropriate Norwegian rules.

   Language tags are related when they contain a similar sequence of
   subtags. For example, if a language tag B contains language tag A as
   a prefix, then B is typically "narrower" or "more specific" than A.
   Thus, "zh-Hant-TW" is more specific than "zh-Hant".

   This relationship is not guaranteed in all cases: specifically,
   languages that begin with the same sequence of subtags are NOT
   guaranteed to be mutually intelligible, although they might be. For
   example, the tag "az" shares a prefix with both "az-Latn"
   (Azerbaijani written using the Latin script) and "az-Cyrl"
   (Azerbaijani written using the Cyrillic script). A person fluent in
   one script might not be able to read the other, even though the text
   might be identical. Content tagged as "az" most probably is written
   in just one script and thus might not be intelligible to a reader
   familiar with the other script.

4.3. Length Considerations

   There is no defined upper limit on the size of language tags. While
   historically most language tags have consisted of language and region
   subtags with a combined total length of up to six characters, larger
   tags have always been possible and have actually appeared in use.

   Neither the language tag syntax nor other requirements in this
   document impose a fixed upper limit on the number of subtags in a
   language tag (and thus an upper bound on the size of a tag).
   The language tag syntax suggests that, depending on the specific
   language, more subtags (and thus a longer tag) are sometimes
   necessary to completely identify the language for certain
   applications; thus, it is possible to envision long or complex subtag
   sequences.

4.3.1. Working with Limited Buffer Sizes

   Some applications and protocols are forced to allocate fixed buffer
   sizes or otherwise limit the length of a language tag. A conformant
   implementation or specification MAY refuse to support the storage of
   language tags that exceed a specified length. Any such limitation
   SHOULD be clearly documented, and such documentation SHOULD include
   what happens to longer tags (for example, whether an error value is
   generated or the language tag is truncated). A protocol that allows
   tags to be truncated at an arbitrary limit, without giving any
   indication of what that limit is, has the potential for causing harm
   by changing the meaning of tags in substantial ways.

   In practice, most language tags do not require more than a few
   subtags and will not approach reasonably sized buffer limitations;
   see Section 4.1.

   Some specifications or protocols have limits on tag length but do not
   have a fixed length limitation. For example, [RFC2231] has no
   explicit length limitation: the length available for the language tag
   is constrained by the length of other header components (such as the
   charset's name) coupled with the 76-character limit in [RFC2047].
   Thus, the "limit" might be 50 or more characters, but it could
   potentially be quite small.

   The considerations for assigning a buffer limit are:

      Implementations SHOULD NOT truncate language tags unless the
      meaning of the tag is purposefully being changed, or unless the
      tag does not fit into a limited buffer size specified by a
      protocol for storage or transmission.

      Implementations SHOULD warn the user when a tag is truncated since
      truncation changes the semantic meaning of the tag.

      Implementations of protocols or specifications that are space
      constrained but do not have a fixed limit SHOULD use the longest
      possible tag in preference to truncation.

      Protocols or specifications that specify limited buffer sizes for
      language tags MUST allow for language tags of up to 33 characters.

      Protocols or specifications that specify limited buffer sizes for
      language tags SHOULD allow for language tags of at least 42
      characters.

   The following illustration shows how the 42-character recommendation
   was derived. The combination of language and extended language
   subtags was chosen for future compatibility. At up to 15 characters,
   this combination is longer than the longest possible primary language
   subtag (8 characters):

   language      =  3  (ISO 639-2; ISO 639-1 requires 2)
   extlang1      =  4  (each subsequent subtag includes '-')
   extlang2      =  4  (unlikely: needs prefix="language-extlang1")
   extlang3      =  4  (extremely unlikely)
   script        =  5  (if not suppressed: see Section 4.1)
   region        =  4  (UN M.49; ISO 3166 requires 3)
   variant1      =  9  (MUST have language as a prefix)
   variant2      =  9  (MUST have language-variant1 as a prefix)

   total         = 42 characters

             Figure 7: Derivation of the Limit on Tag Length

4.3.2. Truncation of Language Tags

   Truncation of a language tag alters the meaning of the tag, and thus
   SHOULD be avoided. However, truncation of language tags is sometimes
   necessary due to limited buffer sizes. Such truncation MUST NOT
   permit a subtag to be chopped off in the middle or the formation of
   invalid tags (for example, one ending with the "-" character).
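
   As a non-normative illustration, truncation at subtag boundaries
   might be sketched as follows. The function name
   truncate_language_tag and its max_length parameter are hypothetical
   and are not defined by this document. The sketch assumes the input
   is already a well-formed tag, removes whole subtags from the right,
   and also drops a trailing single-character subtag so that the result
   never ends in a dangling singleton. Returning an empty string when
   nothing fits is a choice made for this sketch only.

```python
def truncate_language_tag(tag: str, max_length: int) -> str:
    """Shorten a well-formed tag to at most max_length characters.

    Illustrative only: whole subtags are removed from the right, so a
    subtag is never cut in the middle and the result never ends in "-".
    An empty string means that no usable prefix fits in max_length.
    """
    subtags = tag.split("-")
    while subtags and len("-".join(subtags)) > max_length:
        # Drop the rightmost subtag together with its preceding "-".
        subtags.pop()
        # A single-character subtag (for example "a" or "x") introduces
        # an extension or private use sequence; left at the end of the
        # tag it would be meaningless, so it is removed as well.
        while subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return "-".join(subtags)


if __name__ == "__main__":
    example = "zh-Latn-CN-variant1-a-extend1-x-wadegile-private1"
    for limit in (49, 40, 30, 20, 10, 8, 2):
        print(limit, truncate_language_tag(example, limit))
```

   A caller that receives a shorter tag than it supplied would still be
   expected to warn the user, since the truncated tag no longer carries
   the same meaning.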

   This means that applications or protocols that truncate tags MUST do
   so by progressively removing subtags along with their preceding "-"
   from the right side of the language tag until the tag is short enough
   for the given buffer. If the resulting tag ends with a
   single-character subtag, that subtag and its preceding "-" MUST also
   be removed. For example:

   Tag to truncate: zh-Latn-CN-variant1-a-extend1-x-wadegile-private1
   1. zh-Latn-CN-variant1-a-extend1-x-wadegile
   2. zh-Latn-CN-variant1-a-extend1
   3. zh-Latn-CN-variant1
   4. zh-Latn-CN
   5. zh-Latn
   6. zh

                  Figure 8: Example of Tag Truncation

4.4. Canonicalization of Language Tags

   Since a particular language tag is sometimes used by many processes,
   language tags SHOULD always be created or generated in a canonical
   form.

   A language tag is in canonical form when:

   1.  The tag is well-formed according to the rules in Section 2.1 and
       Section 2.2.

   2.  Subtags of type 'Region' that have a Preferred-Value mapping in
       the IANA registry (see Section 3.1) SHOULD be replaced with their
       mapped value. Note: In rare cases, the mapped value will also
       have a Preferred-Value.

   3.  Redundant or grandfathered tags that have a Preferred-Value
       mapping in the IANA registry (see Section 3.1) MUST be replaced
       with their mapped value. These items either are deprecated
       mappings created before the adoption of this document (such as
       the mapping of "no-nyn" to "nn" or "i-klingon" to "tlh") or are
       the result of later registrations or additions to this document
       (for example, "zh-guoyu" might be mapped to a language-extlang
       combination such as "zh-cmn" by some future update of this
       document).

   4.  Other subtags that have a Preferred-Value mapping in the IANA
       registry (see Section 3.1) MUST be replaced with their mapped
       value.
       These items consist entirely of clerical corrections to
       ISO 639-1 in which the deprecated subtags have been maintained
       for compatibility purposes.

   5.  If more than one extension subtag sequence exists, the extension
       sequences are ordered into case-insensitive ASCII order by
       singleton subtag.

   Example: The language tag "en-A-aaa-B-ccc-bbb-x-xyz" is in canonical
   form, while "en-B-ccc-bbb-A-aaa-X-xyz" is well-formed but not in
   canonical form.

   Example: The language tag "en-BU" (English as used in Burma) is not
   canonical because the 'BU' subtag has a canonical mapping to 'MM'
   (Myanmar), although the tag "en-BU" maintains its validity.

   Canonicalization of language tags does not imply anything about the
   use of upper or lowercase letters when processing or comparing
   subtags (as described in Section 2.1). All comparisons MUST be
   performed in a case-insensitive manner.

   When performing canonicalization of language tags, processors MAY
   regularize the case of the subtags (that is, this process is
   OPTIONAL), following the case used in the registry. Note that this
   corresponds to the following casing rules: uppercase all non-initial
   two-letter subtags; titlecase all non-initial four-letter subtags;
   lowercase everything else.

   Note: Case folding of ASCII letters in certain locales, unless
   carefully handled, sometimes produces non-ASCII character values.
   The Unicode Character Database file "SpecialCasing.txt" defines the
   specific cases that are known to cause problems with this. In
   particular, the letter 'i' (U+0069) in Turkish and Azerbaijani is
   uppercased to U+0130 (LATIN CAPITAL LETTER I WITH DOT ABOVE).
   Implementers SHOULD specify a locale-neutral casing operation to
   ensure that case folding of subtags does not produce this value,
   which is illegal in language tags.
   For example, if one were to uppercase the region subtag 'in' using
   Turkish locale rules, the sequence U+0130 U+004E would result instead
   of the expected 'IN'.

   Note: if the field 'Deprecated' appears in a registry record without
   an accompanying 'Preferred-Value' field, then that tag or subtag is
   deprecated without a replacement. Validating processors SHOULD NOT
   generate tags that include these values, although the values are
   canonical when they appear in a language tag.

   An extension MUST define any relationships that exist between the
   various subtags in the extension and thus MAY define an alternate
   canonicalization scheme for the extension's subtags. Extensions MAY
   define how the order of the extension's subtags is interpreted. For
   example, an extension could define that its subtags are in canonical
   order when the subtags are placed into ASCII order: that is,
   "en-a-aaa-bbb-ccc" instead of "en-a-ccc-bbb-aaa". Another extension
   might define that the order of the subtags influences their semantic
   meaning (so that "en-b-ccc-bbb-aaa" has a different value from
   "en-b-aaa-bbb-ccc"). However, extension specifications SHOULD be
   designed so that they are tolerant of the typical processes described
   in Section 3.7.

4.5. Considerations for Private Use Subtags

   Private use subtags, like all other subtags, MUST conform to the
   format and content constraints in the ABNF. Private use subtags have
   no meaning outside the private agreement between the parties that
   intend to use or exchange language tags that employ them. The same
   subtags MAY be used with a different meaning under a separate private
   agreement. They SHOULD NOT be used where alternatives exist and
   SHOULD NOT be used in content or protocols intended for general use.

   Private use subtags are simply useless for information exchange
   without prior arrangement.
   The value and semantic meaning of private use tags and of the
   subtags used within such a language tag are not defined by this
   document.

   Subtags defined in the IANA registry as having a specific private use
   meaning convey more information than a purely private use tag
   prefixed by the singleton subtag 'x'. For applications, this
   additional information MAY be useful.

   For example, the region subtags 'AA', 'ZZ', and those in the ranges
   'QM'-'QZ' and 'XA'-'XZ' (derived from ISO 3166 private use codes) MAY
   be used to form a language tag. A tag such as "zh-Hans-XQ" conveys a
   great deal of public, interchangeable information about the language
   material (that it is Chinese in the simplified Chinese script and is
   suitable for some geographic region 'XQ'). While the precise
   geographic region is not known outside of private agreement, the tag
   conveys far more information than an opaque tag such as "x-someLang",
   which contains no information about the language subtag or script
   subtag outside of the private agreement.

   However, in some cases content tagged with private use subtags MAY
   interact with other systems in a different and possibly unsuitable
   manner compared to tags that use opaque, privately defined subtags,
   so the choice of the best approach sometimes depends on the
   particular domain in question.

5. IANA Considerations

   This section deals with the processes and requirements necessary for
   IANA to undertake to maintain the subtag and extension registries as
   defined by this document and in accordance with the requirements of
   [RFC2434].

   The impact on the IANA maintainers of the two registries defined by
   this document will be a small increase in the frequency of new
   entries or updates.

5.1. Language Subtag Registry

   Upon adoption of this document, IANA SHALL update the registry using
   instructions and content provided in a companion document:
   [initial-registry]. The criteria and process for selecting the
   updated set of records are described in that document. The updated
   set of records represents no impact on IANA, since the work to create
   it will be performed externally.

   Future work on the Language Subtag Registry SHALL be limited to
   inserting or replacing whole records preformatted for IANA by the
   Language Subtag Reviewer as described in Section 3.3 of this document
   and archiving the forwarded registration form.

   Each record MUST be sent to iana@iana.org with a subject line
   indicating whether the enclosed record is an insertion of a new
   record (indicated by the word "INSERT" in the subject line) or a
   replacement of an existing record (indicated by the word "MODIFY" in
   the subject line). Records MUST NOT be deleted from the registry.
   IANA MUST place any inserted or modified records into the appropriate
   section of the language subtag registry, grouping the records by
   their 'Type' field. Inserted records MAY be placed anywhere in the
   appropriate section; there is no guarantee of the order of the
   records beyond grouping them together by 'Type'. Modified records
   MUST overwrite the record they replace.

   Included in any request to insert or modify records MUST be a new
   File-Date record. This record MUST be placed first in the registry.
   In the event that the File-Date record present in the registry has a
   later date than the record being inserted or modified, the existing
   record MUST be preserved.

5.2. Extensions Registry

   The Language Tag Extensions Registry can contain at most 35 records
   and thus changes to this registry are expected to be very infrequent.

   Future work by IANA on the Language Tag Extensions Registry is
   limited to two cases. First, the IESG MAY request that new records
   be inserted into this registry from time to time. These requests
   MUST include the record to insert in the exact format described in
   Section 3.7. In addition, there MAY be occasional requests from the
   maintaining authority for a specific extension to update the contact
   information or URLs in the record. These requests MUST include the
   complete, updated record. IANA is not responsible for validating the
   information provided, only that it is properly formatted. It should
   reasonably be seen to come from the maintaining authority named in
   the record present in the registry.

6. Security Considerations

   Language tags used in content negotiation, like any other information
   exchanged on the Internet, might be a source of concern because they
   might be used to infer the nationality of the sender, and thus
   identify potential targets for surveillance.

   This is a special case of the general problem that anything sent is
   visible to the receiving party and possibly to third parties as well.
   It is useful to be aware that such concerns can exist in some cases.

   The evaluation of the exact magnitude of the threat, and any possible
   countermeasures, is left to each application protocol (see BCP 72
   [RFC3552] for best current practice guidance on security threats and
   defenses).

   The language tag associated with a particular information item is of
   no consequence whatsoever in determining whether that content might
   contain possible homographs. The fact that a text is tagged as being
   in one language or using a particular script subtag provides no
   assurance whatsoever that it does not contain characters from scripts
   other than the one(s) associated with or specified by that language
   tag.

   Since there is no limit to the number of variant, private use, and
   extension subtags, and consequently no limit on the possible length
   of a tag, implementations need to guard against buffer overflow
   attacks. See Section 4.3 for details on language tag truncation,
   which can occur as a consequence of defenses against buffer overflow.

   Although the specification of valid subtags for an extension (see
   Section 3.7) MUST be available over the Internet, implementations
   SHOULD NOT mechanically depend on it being always accessible, to
   prevent denial-of-service attacks.

7. Character Set Considerations

   The syntax in this document requires that language tags use only the
   characters A-Z, a-z, 0-9, and HYPHEN-MINUS, which are present in most
   character sets, so the composition of language tags should not have
   any character set issues.

   Rendering of characters based on the content of a language tag is not
   addressed in this memo. Historically, some languages have relied on
   the use of specific character sets or other information in order to
   infer how a specific character should be rendered (notably this
   applies to language- and culture-specific variations of Han
   ideographs as used in Japanese, Chinese, and Korean). When language
   tags are applied to spans of text, rendering engines sometimes use
   that information in deciding which font to use in the absence of
   other information, particularly where languages with distinct writing
   traditions use the same characters.

8. Changes from RFC 4646

   The main goal for this revision of this document was to incorporate
   ISO 639-3 and its attendant set of language codes into the IANA
   Language Subtag Registry, permitting the identification of many more
   languages and dialects than previously supported.

   The specific changes in this document to meet these goals are:

   o  Defines the incorporation of ISO 639-3.

9. References

9.1. Normative References

   [ISO10646]
              International Organization for Standardization, "ISO/IEC
              10646:2003. Information technology -- Universal
              Multiple-Octet Coded Character Set (UCS)", 2003.

   [ISO15924]
              International Organization for Standardization, "ISO
              15924:2004. Information and documentation -- Codes for the
              representation of names of scripts", January 2004.

   [ISO3166-1]
              International Organization for Standardization, "ISO
              3166-1:1997. Codes for the representation of names of
              countries and their subdivisions -- Part 1: Country
              codes", 1997.

   [ISO639-1]
              International Organization for Standardization, "ISO
              639-1:2002. Codes for the representation of names of
              languages -- Part 1: Alpha-2 code", 2002.

   [ISO639-2]
              International Organization for Standardization, "ISO
              639-2:1998. Codes for the representation of names of
              languages -- Part 2: Alpha-3 code, first edition", 1998.

   [ISO646]   International Organization for Standardization, "ISO/IEC
              646:1991, Information technology -- ISO 7-bit coded
              character set for information interchange.", 1991.

   [RFC2026]  Bradner, S., "The Internet Standards Process -- Revision
              3", BCP 9, RFC 2026, October 1996.

   [RFC2028]  Hovey, R. and S. Bradner, "The Organizations Involved in
              the IETF Standards Process", BCP 11, RFC 2028,
              October 1996.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2434]  Narten, T. and H. Alvestrand, "Guidelines for Writing an
              IANA Considerations Section in RFCs", BCP 26, RFC 2434,
              October 1998.

   [RFC2860]  Carpenter, B., Baker, F., and M. Roberts, "Memorandum of
              Understanding Concerning the Technical Work of the
              Internet Assigned Numbers Authority", RFC 2860, June 2000.

   [RFC3339]  Klyne, G. and C. Newman, "Date and Time on the Internet:
              Timestamps", RFC 3339, July 2002.

   [RFC4234]  Crocker, D. and P. Overell, "Augmented BNF for Syntax
              Specifications: ABNF", RFC 4234, October 2005.

   [RFC4647]  Phillips, A., Ed. and M. Davis, Ed., "Matching of Language
              Tags", September 2006,
              .

   [UN_M.49]  Statistics Division, United Nations, "Standard Country or
              Area Codes for Statistical Use", UN Standard Country or
              Area Codes for Statistical Use, Revision 4 (United Nations
              publication, Sales No. 98.XVII.9, June 1999).

9.2. Informative References

   [RFC1766]  Alvestrand, H., "Tags for the Identification of
              Languages", RFC 1766, March 1995.

   [RFC2047]  Moore, K., "MIME (Multipurpose Internet Mail Extensions)
              Part Three: Message Header Extensions for Non-ASCII Text",
              RFC 2047, November 1996.

   [RFC2231]  Freed, N. and K. Moore, "MIME Parameter Value and Encoded
              Word Extensions: Character Sets, Languages, and
              Continuations", RFC 2231, November 1997.

   [RFC2781]  Hoffman, P. and F. Yergeau, "UTF-16, an encoding of ISO
              10646", RFC 2781, February 2000.

   [RFC3066]  Alvestrand, H., "Tags for the Identification of
              Languages", BCP 47, RFC 3066, January 2001.

   [RFC3552]  Rescorla, E. and B. Korver, "Guidelines for Writing RFC
              Text on Security Considerations", BCP 72, RFC 3552,
              July 2003.

   [RFC4646]  Phillips, A., Ed. and M. Davis, Ed., "Tags for the
              Identification of Languages", September 2006,
              .

   [Unicode]  Unicode Consortium, "The Unicode Consortium. The Unicode
              Standard, Version 5.0, (Boston, MA, Addison-Wesley, 2003.
              ISBN 0-321-49081-0)", January 2007.

   [XML10]    Bray (et al), T., "Extensible Markup Language (XML) 1.0",
              02 2004.

   [XMLSchema]
              Biron, P., Ed. and A. Malhotra, Ed., "XML Schema Part 2:
              Datatypes Second Edition", 10 2004,
              <http://www.w3.org/TR/xmlschema-2/>.

   [initial-registry]
              Ewell, D., Ed., "Initial Language Subtag Registry",
              June 2005, .

   [iso639.prin]
              ISO 639 Joint Advisory Committee, "ISO 639 Joint Advisory
              Committee: Working principles for ISO 639 maintenance",
              March 2000,
              .

   [record-jar]
              Raymond, E., "The Art of Unix Programming", 2003,
              .

Appendix A. Acknowledgements

   Any list of contributors is bound to be incomplete; please regard the
   following as only a selection from the group of people who have
   contributed to make this document what it is today.

   The contributors to RFC 4646, RFC 4647, RFC 3066, and RFC 1766, the
   precursors of this document, made enormous contributions directly or
   indirectly to this document and are generally responsible for the
   success of language tags.

   The following people contributed to this document:

   Stephane Bortzmeyer, Peter Constable, John Cowan, Frank Ellerman,
   Randy Presuhn, and many, many others.

   Very special thanks must go to Harald Tveit Alvestrand, who
   originated RFCs 1766 and 3066, and without whom this document would
   not have been possible. Special thanks must go to Michael Everson,
   who has served as Language Subtag Reviewer for almost the complete
   period since the publication of RFC 1766. Special thanks to Doug
   Ewell, for his production of the first complete subtag registry, and
   his work in producing a test parser for verifying language tags.

Appendix B. Examples of Language Tags (Informative)

   Simple language subtag:

      de (German)

      fr (French)

      ja (Japanese)

      i-enochian (example of a grandfathered tag)

   Language subtag plus Script subtag:

      zh-Hant (Chinese written using the Traditional Chinese script)

      zh-Hans (Chinese written using the Simplified Chinese script)

      sr-Cyrl (Serbian written using the Cyrillic script)

      sr-Latn (Serbian written using the Latin script)

   Language-Script-Region:

      zh-Hans-CN (Chinese written using the Simplified script as used in
      mainland China)

      sr-Latn-CS (Serbian written using the Latin script as used in
      Serbia and Montenegro)

   Language-Variant:

      sl-rozaj (Resian dialect of Slovenian)

      sl-nedis (Nadiza dialect of Slovenian)

   Language-Region-Variant:

      de-CH-1901 (German as used in Switzerland using the 1901 variant
      [orthography])

      sl-IT-nedis (Slovenian as used in Italy, Nadiza dialect)

   Language-Script-Region-Variant:

      sl-Latn-IT-nedis (Nadiza dialect of Slovenian written using the
      Latin script as used in Italy.
      Note that this tag is NOT RECOMMENDED because subtag 'sl' has a
      Suppress-Script value of 'Latn')

   Language-Region:

      de-DE (German for Germany)

      en-US (English as used in the United States)

      es-419 (Spanish appropriate for the Latin America and Caribbean
      region using the UN region code)

   Private use subtags:

      de-CH-x-phonebk

      az-Arab-x-AZE-derbend

   Extended language subtags (examples ONLY: extended languages MUST be
   defined by revision or update to this document):

      zh-min

      zh-min-nan-Hant-CN

   Private use registry values:

      x-whatever (private use using the singleton 'x')

      qaa-Qaaa-QM-x-southern (all private tags)

      de-Qaaa (German, with a private script)

      sr-Latn-QM (Serbian, Latin-script, private region)

      sr-Qaaa-CS (Serbian, private script, for Serbia and Montenegro)

   Tags that use extensions (examples ONLY: extensions MUST be defined
   by revision or update to this document or by RFC):

      en-US-u-islamCal

      zh-CN-a-myExt-x-private

      en-a-myExt-b-another

   Some Invalid Tags:

      de-419-DE (two region tags)

      a-DE (use of a single-character subtag in primary position; note
      that there are a few grandfathered tags that start with "i-" that
      are valid)

      ar-a-aaa-b-bbb-a-ccc (two extensions with same single-letter
      prefix)

Authors' Addresses

   Addison Phillips (editor)
   Yahoo! Inc.

   Email: addison@inter-locale.com
   URI:   http://www.inter-locale.com

   Mark Davis (editor)
   Google

   Email: mark.davis@macchiato.com or mark.davis@google.com

Intellectual Property Statement

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights. Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard. Please address the information to the IETF at
   ietf-ipr@ietf.org.

Disclaimer of Validity

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Copyright Statement

   Copyright (C) The Internet Society (2006). This document is subject
   to the rights, licenses and restrictions contained in BCP 78, and
   except as set forth therein, the authors retain all their rights.

Acknowledgment

   Funding for the RFC Editor function is currently provided by the
   Internet Society.