idnits 2.17.1 draft-ietf-ltru-registry-06.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** It looks like you're using RFC 3978 boilerplate. You should update this to the boilerplate described in the IETF Trust License Policy document (see https://trustee.ietf.org/license-info), which is required now. -- Found old boilerplate from RFC 3978, Section 5.1 on line 16. -- Found old boilerplate from RFC 3978, Section 5.5 on line 2804. -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 2781. -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 2788. -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 2794. ** This document has an original RFC 3978 Section 5.4 Copyright Line, instead of the newer IETF Trust Copyright according to RFC 4748. ** This document has an original RFC 3978 Section 5.5 Disclaimer, instead of the newer disclaimer which includes the IETF Trust according to RFC 4748. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- == No 'Intended status' indicated for this document; assuming Proposed Standard Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- -- The abstract seems to indicate that this document obsoletes RFC3066, but the header doesn't have an 'Obsoletes:' line to match this. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not match the current year == Line 139 has weird spacing: '... being spoke...' == Line 767 has weird spacing: '...logical line ...' == Line 768 has weird spacing: '...prising a fie...' 
== Line 769 has weird spacing: '...ld-body porti...' == Line 770 has weird spacing: '... this conce...' == (14 more instances...) == The document seems to lack the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. (The document does seem to have the reference to RFC 2119 which the ID-Checklist requires). -- The exact meaning of the all-uppercase expression 'NOT REQUIRED' is not defined in RFC 2119. If it is intended as a requirements expression, it should be rewritten using one of the combinations defined in RFC 2119; otherwise it should not be all-uppercase. == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD', or 'RECOMMENDED' is not an accepted usage according to RFC 2119. Please use uppercase 'NOT' together with RFC 2119 keywords (if that is what you mean). Found 'MUST not' in this paragraph: The tags and their subtags, including private-use and extensions, are to be treated as case insensitive: there exist conventions for the capitalization of some of the subtags, but these MUST not be taken to carry meaning. == The expression 'MAY NOT', while looking like RFC 2119 requirements text, is not defined in RFC 2119, and should not be used. Consider using 'MUST NOT' instead (if that is what you mean). Found 'MAY NOT' in this paragraph: Note that 'Preferred-Value' mappings in records of type 'region' MAY NOT represent exactly the same meaning as the original value. There are many reasons for a country code to be changed and the effect this has on the formation of language tags will depend on the nature of the change in question. -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. 
(See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (June 23, 2005) is 6881 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Unused Reference: 'RFC1766' is defined on line 2469, but no explicit reference was found in the text -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO639-1' -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO639-2' -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO15924' -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO3166' -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO10646' ** Obsolete normative reference: RFC 2028 (Obsoleted by RFC 9281) ** Obsolete normative reference: RFC 2434 (Obsoleted by RFC 5226) ** Downref: Normative reference to an Informational RFC: RFC 2781 ** Downref: Normative reference to an Informational RFC: RFC 2860 -- Obsolete informational reference (is this intentional?): RFC 1766 (Obsoleted by RFC 3066, RFC 3282) -- Obsolete informational reference (is this intentional?): RFC 3066 (Obsoleted by RFC 4646, RFC 4647) Summary: 7 errors (**), 0 flaws (~~), 12 warnings (==), 16 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group A. Phillips, Ed. 3 Internet-Draft Quest Software 4 Expires: December 25, 2005 M. Davis, Ed. 
5 IBM 6 June 23, 2005 8 Tags for Identifying Languages 9 draft-ietf-ltru-registry-06 11 Status of this Memo 13 By submitting this Internet-Draft, each author represents that any 14 applicable patent or other IPR claims of which he or she is aware 15 have been or will be disclosed, and any of which he or she becomes 16 aware will be disclosed, in accordance with Section 6 of BCP 79. 18 Internet-Drafts are working documents of the Internet Engineering 19 Task Force (IETF), its areas, and its working groups. Note that 20 other groups may also distribute working documents as Internet- 21 Drafts. 23 Internet-Drafts are draft documents valid for a maximum of six months 24 and may be updated, replaced, or obsoleted by other documents at any 25 time. It is inappropriate to use Internet-Drafts as reference 26 material or to cite them other than as "work in progress." 28 The list of current Internet-Drafts can be accessed at 29 http://www.ietf.org/ietf/1id-abstracts.txt. 31 The list of Internet-Draft Shadow Directories can be accessed at 32 http://www.ietf.org/shadow.html. 34 This Internet-Draft will expire on December 25, 2005. 36 Copyright Notice 38 Copyright (C) The Internet Society (2005). 40 Abstract 42 This document describes the structure, content, construction, and 43 semantics of language tags for use in cases where it is desirable to 44 indicate the language used in an information object. It also 45 describes how to register values for use in language tags and the 46 creation of user defined extensions for private interchange. This 47 document obsoletes RFC 3066 (which replaced RFC 1766). 49 Table of Contents 51 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 52 2. The Language Tag . . . . . . . . . . . . . . . . . . . . . . . 4 53 2.1 Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . 4 54 2.2 Language Subtag Sources and Interpretation . . . . . . . . 6 55 2.2.1 Primary Language Subtag . . . . . . . . . . . . . . . 
7 56 2.2.2 Extended Language Subtags . . . . . . . . . . . . . . 9 57 2.2.3 Script Subtag . . . . . . . . . . . . . . . . . . . . 10 58 2.2.4 Region Subtag . . . . . . . . . . . . . . . . . . . . 11 59 2.2.5 Variant Subtags . . . . . . . . . . . . . . . . . . . 12 60 2.2.6 Extension Subtags . . . . . . . . . . . . . . . . . . 13 61 2.2.7 Private Use Subtags . . . . . . . . . . . . . . . . . 15 62 2.2.8 Pre-Existing RFC 3066 Registrations . . . . . . . . . 15 63 2.2.9 Classes of Conformance . . . . . . . . . . . . . . . . 15 64 3. Registry Format and Maintenance . . . . . . . . . . . . . . . 17 65 3.1 Format of the IANA Language Subtag Registry . . . . . . . 17 66 3.2 Maintenance of the Registry . . . . . . . . . . . . . . . 22 67 3.3 Stability of IANA Registry Entries . . . . . . . . . . . . 23 68 3.4 Registration Procedure for Subtags . . . . . . . . . . . . 27 69 3.5 Possibilities for Registration . . . . . . . . . . . . . . 30 70 3.6 Extensions and Extensions Namespace . . . . . . . . . . . 31 71 3.7 Initialization of the Registry . . . . . . . . . . . . . . 34 72 4. Formation and Processing of Language Tags . . . . . . . . . . 38 73 4.1 Choice of Language Tag . . . . . . . . . . . . . . . . . . 38 74 4.2 Meaning of the Language Tag . . . . . . . . . . . . . . . 40 75 4.3 Length Considerations . . . . . . . . . . . . . . . . . . 41 76 4.3.1 Working with Limited Buffer Sizes . . . . . . . . . . 41 77 4.3.2 Truncation of Language Tags . . . . . . . . . . . . . 43 78 4.4 Canonicalization of Language Tags . . . . . . . . . . . . 43 79 4.5 Considerations for Private Use Subtags . . . . . . . . . . 45 80 5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 47 81 6. Security Considerations . . . . . . . . . . . . . . . . . . . 48 82 7. Character Set Considerations . . . . . . . . . . . . . . . . . 49 83 8. Changes from RFC 3066 . . . . . . . . . . . . . . . . . . . . 50 84 9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 
54 85 9.1 Normative References . . . . . . . . . . . . . . . . . . . 54 86 9.2 Informative References . . . . . . . . . . . . . . . . . . 55 87 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . 56 88 A. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 57 89 B. Examples of Language Tags (Informative) . . . . . . . . . . . 58 90 C. Example Registry . . . . . . . . . . . . . . . . . . . . . . . 61 91 Intellectual Property and Copyright Statements . . . . . . . . 64 93 1. Introduction 95 Human beings on our planet have, past and present, used a number of 96 languages. There are many reasons why one would want to identify the 97 language used when presenting or requesting information. 99 Users' language preferences often need to be identified so that 100 appropriate processing can be applied. For example, the user's 101 language preferences in a Web browser can be used to select Web pages 102 appropriately. Language preferences can also be used to select among 103 tools (such as dictionaries) to assist in the processing or 104 understanding of content in different languages. 106 In addition, knowledge about the particular language used by some 107 piece of information content might be useful or even required by some 108 types of processing; for example, spell-checking, computer-synthesized 109 speech, Braille transcription, or high-quality print renderings. 111 One means of indicating the language used is by labeling the 112 information content with an identifier or "tag". These tags can be 113 used to specify user preferences when selecting information content, 114 or for labeling additional attributes of content and associated 115 resources. 117 Tags can also be used to indicate additional language attributes of 118 content. 
For example, indicating specific information about the 119 dialect, writing system, or orthography used in a document or 120 resource may enable the user to obtain information in a form that 121 they can understand, or it may be important in processing or rendering the 122 given content into an appropriate form or style. 124 This document specifies a particular identifier mechanism (the 125 language tag) and a registration function for values to be used to 126 form tags. It also defines a mechanism for private use values and 127 future extension. 129 This document replaces RFC 3066, which replaced RFC 1766. For a list 130 of changes in this document, see Section 8. 132 The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 133 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 134 document are to be interpreted as described in [RFC2119]. 136 2. The Language Tag 138 The language tag always defines a language as used (which includes 139 being spoken, written, signed, or otherwise signaled) by human 140 beings for communication of information to other human beings. 141 Computer languages such as programming languages are explicitly 142 excluded. 144 2.1 Syntax 146 The language tag is composed of one or more parts or "subtags". Each 147 subtag consists of a sequence of alpha-numeric characters. Subtags 148 are distinguished and separated from one another by a hyphen ("-"). 149 A language tag consists of a "primary language" subtag and a 150 (possibly empty) series of subsequent subtags, each of which refines 151 or narrows the range of language identified by the overall tag. 153 Each type of subtag is distinguished by length, position in the tag, 154 and content: subtags can be recognized solely by these features. 155 This makes it possible to construct a parser that can extract and 156 assign some semantic information to the subtags, even if the specific 157 subtag values are not recognized. 
Thus a parser need not have an up- 158 to-date copy (or any copy at all) of the subtag registry to perform 159 most searching and matching operations. 161 The syntax of the language tag in ABNF [RFC2234bis] is:

163    Language-Tag    = (lang
164                       *3("-" extlang)
165                       ["-" script]
166                       ["-" region]
167                       *("-" variant)
168                       *("-" extension)
169                       ["-" privateuse])
170                    / privateuse          ; private-use tag
171                    / grandfathered       ; grandfathered registrations

173    lang            = 2*4ALPHA            ; shortest ISO 639 code
174                    / registered-lang
175    extlang         = 3ALPHA              ; reserved for future use
176    script          = 4ALPHA              ; ISO 15924 code
177    region          = 2ALPHA              ; ISO 3166 code
178                    / 3DIGIT              ; UN country number
179    variant         = 5*8alphanum         ; registered variants
180                    / ( DIGIT 3alphanum )
181    extension       = singleton 1*("-" (2*8alphanum))
182    privateuse      = ("x"/"X") 1*("-" (1*8alphanum))
183    singleton       = %x41-57 / %x59-5A / %x61-77 / %x79-7A / DIGIT
184                    ; "a"-"w" / "y"-"z" / "A"-"W" / "Y"-"Z" / "0"-"9"
185                    ; Single letters: x/X is reserved for private use
186    registered-lang = 4*8ALPHA            ; registered language subtag
187    grandfathered   = 1*3ALPHA 1*2("-" (2*8alphanum))
188                    ; grandfathered registration
189                    ; Note: i is the only singleton
190                    ; that starts a grandfathered tag
191    alphanum        = (ALPHA / DIGIT)     ; letters and numbers

193 Figure 1: Language Tag ABNF

195 The character "-" is HYPHEN-MINUS (ABNF: %x2D). All subtags have a 196 maximum length of eight characters. Note that there is a subtlety in 197 the ABNF for 'variant': variants starting with a digit MAY be four 198 characters long, while those starting with a letter MUST be at least 199 five characters long. 201 Whitespace is not permitted in a language tag. For examples of 202 language tags, see Appendix B. 204 Note that although [RFC2234bis] refers to octets, the language tags 205 described in this document are sequences of characters from the US- 206 ASCII repertoire. 
Language tags MAY be used in documents and 207 applications that use other encodings, so long as these encompass the 208 US-ASCII repertoire. An example of this would be an XML document 209 that uses the UTF-16LE [RFC2781] encoding of [Unicode]. 211 The tags and their subtags, including private-use and extensions, are 212 to be treated as case insensitive: there exist conventions for the 213 capitalization of some of the subtags, but these MUST not be taken to 214 carry meaning. 216 For example: 218 o [ISO639-1] recommends that language codes be written in lower case 219 ('mn' Mongolian). 221 o [ISO3166] recommends that country codes be capitalized ('MN' 222 Mongolia). 224 o [ISO15924] recommends that script codes use lower case with the 225 initial letter capitalized ('Cyrl' Cyrillic). 227 However, in the tags defined by this document, the uppercase US-ASCII 228 letters in the range 'A' through 'Z' are considered equivalent and 229 mapped directly to their US-ASCII lowercase equivalents in the range 230 'a' through 'z'. Thus the tag "mn-Cyrl-MN" is not distinct from "MN- 231 cYRL-mn" or "mN-cYrL-Mn" (or any other combination) and each of these 232 variations conveys the same meaning: Mongolian written in the 233 Cyrillic script as used in Mongolia. 235 2.2 Language Subtag Sources and Interpretation 237 The namespace of language tags and their subtags is administered by 238 the Internet Assigned Numbers Authority (IANA) [RFC2860] according to 239 the rules in Section 5 of this document. The registry maintained by 240 IANA is the source for valid subtags: other standards referenced in 241 this section provide the source material for that registry. 243 Terminology in this section: 245 o Tag or tags refers to a complete language tag, such as 246 "fr-Latn-CA". Examples of tags in this document are enclosed in 247 double-quotes ("en-US"). 249 o Subtag refers to a specific section of a tag, delimited by hyphen, 250 such as the subtag 'Latn' in "fr-Latn-CA". 
Examples of subtags in 251 this document are enclosed in single quotes ('Latn'). 253 o Code or codes refers to values defined in external standards (and 254 which are used as subtags in this document). For example, 'Latn' 255 is an [ISO15924] script code which was used to define the 'Latn' 256 script subtag for use in a language tag. Examples of codes in 257 this document are enclosed in single quotes ('en', 'Latn'). 259 The definitions in this section apply to the various subtags within 260 the language tags defined by this document, excepting those 261 "grandfathered" tags defined in Section 2.2.8. 263 Language tags are designed so that each subtag type has unique length 264 and content restrictions. These make identification of the subtag's 265 type possible, even if the content of the subtag itself is 266 unrecognized. This allows tags to be parsed and processed without 267 reference to the latest version of the underlying standards or the 268 IANA registry and makes the associated exception handling when 269 parsing tags simpler. 271 Subtags in the IANA registry that do not come from an underlying 272 standard can only appear in specific positions in a tag. 273 Specifically, they can only occur as primary language subtags or as 274 variant subtags. 276 Note that sequences of private-use and extension subtags MUST occur 277 at the end of the sequence of subtags and MUST NOT be interspersed 278 with subtags defined elsewhere in this document. 280 Single letter and digit subtags are reserved for current or future 281 use. These include the following current uses: 283 o The single letter subtag 'x' is reserved to introduce a sequence 284 of private-use subtags. The interpretation of any private-use 285 subtags is defined solely by private agreement and is not defined 286 by the rules in this section or in any standard or registry 287 defined in this document. 
289 o All other single letter subtags are reserved to introduce 290 standardized extension subtag sequences as described in 291 Section 3.6. 293 The single letter subtag 'i' is used by some grandfathered tags, such 294 as "i-enochian", where it always appears in the first position and 295 cannot be confused with an extension. 297 2.2.1 Primary Language Subtag 299 The primary language subtag is the first subtag in a language tag 300 (with the exception of private-use and certain grandfathered tags) 301 and cannot be omitted. The following rules apply to the primary 302 language subtag: 304 1. All two character language subtags were defined in the IANA 305 registry according to the assignments found in the standard ISO 306 639 Part 1, "ISO 639-1:2002, Codes for the representation of 307 names of languages -- Part 1: Alpha-2 code" [ISO639-1], or using 308 assignments subsequently made by the ISO 639 Part 1 maintenance 309 agency or governing standardization bodies. 311 2. All three character language subtags were defined in the IANA 312 registry according to the assignments found in ISO 639 Part 2, 313 "ISO 639-2:1998 - Codes for the representation of names of 314 languages -- Part 2: Alpha-3 code - edition 1" [ISO639-2], or 315 assignments subsequently made by the ISO 639 Part 2 maintenance 316 agency or governing standardization bodies. 318 3. The subtags in the range 'qaa' through 'qtz' are reserved for 319 private use in language tags. These subtags correspond to codes 320 reserved by ISO 639-2 for private use. These codes MAY be used 321 for non-registered primary-language subtags (instead of using 322 private-use subtags following 'x-'). Please refer to Section 4.5 323 for more information on private use subtags. 325 4. All four character language subtags are reserved for possible 326 future standardization. 328 5. 
All language subtags of 5 to 8 characters in length in the IANA 329 registry were defined via the registration process in Section 3.4 330 and MAY be used to form the primary language subtag. At the time 331 this document was created, there were no examples of this kind of 332 subtag and future registrations of this type will be discouraged: 333 primary languages are strongly RECOMMENDED for registration with 334 ISO 639 and proposals rejected by ISO 639/RA will be closely 335 scrutinized before they are registered with IANA. 337 6. The single character subtag 'x' as the primary subtag indicates 338 that the language tag consists solely of subtags whose meaning is 339 defined by private agreement. For example, in the tag "x-fr-CH", 340 the subtags 'fr' and 'CH' SHOULD NOT be taken to represent the 341 French language or the country of Switzerland (or any other value 342 in the IANA registry) unless there is a private agreement in 343 place to do so. See Section 4.5. 345 7. The single character subtag 'i' is used by some grandfathered 346 tags (see Section 2.2.8) such as "i-klingon" and "i-bnn". (Other 347 grandfathered tags have a primary language subtag in their first 348 position) 350 8. Other values MUST NOT be assigned to the primary subtag except by 351 revision or update of this document. 353 Note: For languages that have both an ISO 639-1 two character code 354 and an ISO 639-2 three character code, only the ISO 639-1 two 355 character code is defined in the IANA registry. 357 Note: For languages that have no ISO 639-1 two character code and for 358 which the ISO 639-2/T (Terminology) code and the ISO 639-2/B 359 (Bibliographic) codes differ, only the Terminology code is defined in 360 the IANA registry. At the time this document was created, all 361 languages that had both kinds of three character code were also 362 assigned a two character code; it is not expected that future 363 assignments of this nature will occur. 
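The numbered rules for the primary language subtag given earlier in this section can be summarized as a small classifier keyed on the length and content of the first subtag. The Python sketch below is an illustration only, not part of the draft: the function name classify_primary_subtag and the labels it returns are hypothetical. It inspects syntax alone and does not consult the IANA registry, so it cannot tell whether a particular two or three letter value is actually assigned.

```python
def classify_primary_subtag(tag):
    """Classify the first subtag of `tag` by length and content only.

    Hypothetical helper: the returned labels are illustrative names,
    not terms defined by the draft or the registry.
    """
    # Tags are case insensitive, so compare in lowercase.
    first = tag.split("-")[0].lower()
    if not first.isascii() or not first.isalpha():
        return "invalid"
    if first == "x":
        return "private-use tag"        # rule 6
    if first == "i":
        return "grandfathered"          # rule 7
    n = len(first)
    if n == 2:
        return "ISO 639-1"              # rule 1
    if n == 3:
        # Rule 3: 'qaa' through 'qtz' are reserved for private use.
        if "qaa" <= first <= "qtz":
            return "private-use language"
        return "ISO 639-2"              # rule 2
    if n == 4:
        return "reserved"               # rule 4
    if 5 <= n <= 8:
        return "registered"             # rule 5
    return "invalid"                    # rule 8: nothing else is allowed


for example in ("mn-Cyrl-MN", "haw", "qaa", "x-fr-CH", "i-enochian"):
    print(example, "->", classify_primary_subtag(example))
```

Because the decision depends only on length and content, a sketch like this works without a registry copy, which is the property the syntax was designed to provide.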
365 Note: To avoid problems with versioning and subtag choice as 366 experienced during the transition between RFC 1766 and RFC 3066, as 367 well as the canonical nature of subtags defined by this document, the 368 ISO 639 Registration Authority Joint Advisory Committee (ISO 639/ 369 RA-JAC) has included the following statement in [iso639.principles]: 371 "A language code already in ISO 639-2 at the point of freezing ISO 372 639-1 shall not later be added to ISO 639-1. This is to ensure 373 consistency in usage over time, since users are directed in Internet 374 applications to employ the alpha-3 code when an alpha-2 code for that 375 language is not available." 377 In order to avoid instability of the canonical form of tags, if a two 378 character code is added to ISO 639-1 for a language for which a three 379 character code was already included in ISO 639-2, the two character 380 code will not be added as a subtag in the registry. See Section 3.3. 382 For example, if some content were tagged with 'haw' (Hawaiian), which 383 currently has no two character code, the tag would not be invalidated 384 if ISO 639-1 were to assign a two character code to the Hawaiian 385 language at a later date. 387 For example, one of the grandfathered IANA registrations is 388 "i-enochian". The subtag 'enochian' could be registered in the IANA 389 registry as a primary language subtag (assuming that ISO 639 does not 390 register this language first), making tags such as "enochian-AQ" and 391 "enochian-Latn" valid. 393 2.2.2 Extended Language Subtags 395 The following rules apply to the extended language subtags: 397 1. Three letter subtags immediately following the primary subtag are 398 reserved for future standardization, anticipating work that is 399 currently under way on ISO 639. 401 2. Extended language subtags MUST follow the primary subtag and 402 precede any other subtags. 404 3. There MAY be up to three extended language subtags. 406 4. 
Extended language subtags MUST NOT be registered or used to form 407 language tags. Their syntax is described here so that 408 implementations can be compatible with any future revision of 409 this document which does provide for their registration. 411 Extended language subtag records, once they appear in the registry, 412 MUST include exactly one 'Prefix' field indicating an appropriate 413 language subtag or sequence of subtags that MUST always appear as a 414 prefix to the extended language subtag. 416 Example: In a future revision or update of this document, the tag 417 "zh-gan" (registered under RFC 3066) might become a valid non- 418 grandfathered (that is, redundant) tag in which the subtag 'gan' 419 might represent the Chinese dialect 'Gan'. 421 2.2.3 Script Subtag 423 Script subtags are used to indicate the script or writing system 424 variations that distinguish the written forms of a language or its 425 dialects. The following rules apply to the script subtags: 427 1. All four character subtags were defined according to 428 [ISO15924]--"Codes for the representation of the names of 429 scripts": alpha-4 script codes, or subsequently assigned by the 430 ISO 15924 maintenance agency or governing standardization bodies, 431 denoting the script or writing system used in conjunction with 432 this language. 434 2. Script subtags MUST immediately follow the primary language 435 subtag and all extended language subtags and MUST occur before 436 any other type of subtag described below. 438 3. The script subtags 'Qaaa' through 'Qabx' are reserved for private 439 use in language tags. These subtags correspond to codes reserved 440 by ISO 15924 for private use. These codes MAY be used for non- 441 registered script values. Please refer to Section 4.5 for more 442 information on private-use subtags. 444 4. Script subtags cannot be registered using the process in 445 Section 3.4 of this document. 
Variant subtags MAY be considered 446 for registration for that purpose. 448 5. There MUST be at most one script subtag in a language tag and the 449 script subtag SHOULD be omitted when it adds no distinguishing 450 value to the tag or when the primary language subtag's record 451 includes a Suppress-Script field listing the applicable script 452 subtag. 454 Example: "sr-Latn" represents Serbian written using the Latin script. 456 2.2.4 Region Subtag 458 Region subtags are used to indicate regional or geographical 459 variations that define a language or its dialects. The following 460 rules apply to the region subtags: 462 1. The region subtag defines language variations used in a specific 463 region, geographic, or political area. Region subtags MUST 464 follow any language, extended language, or script subtags and 465 MUST precede all other subtags. 467 2. All two character subtags following the primary subtag were 468 defined in the IANA registry according to the assignments found 469 in [ISO3166]--"Codes for the representation of names of countries 470 and their subdivisions - Part 1: Country codes"--alpha-2 country 471 codes or assignments subsequently made by the ISO 3166 472 maintenance agency or governing standardization bodies. 474 3. All three character subtags consisting of digit (numeric) 475 characters following the primary subtag were defined in the IANA 476 registry according to the assignments found in UN Standard 477 Country or Area Codes for Statistical Use [UN_M.49] or 478 assignments subsequently made by the governing standards body. 479 Note that not all of the UN M.49 codes are defined in the IANA 480 registry. The following rules define which codes are entered 481 into the registry as valid subtags: 483 A. UN numeric codes assigned to 'macro-geographical 484 (continental)' or sub-regions MUST be registered in the 485 registry. 
These codes are not associated with an assigned 486 ISO 3166 alpha-2 code and represent supra-national areas, 487 usually covering more than one nation, state, province, or 488 territory. 490 B. UN numeric codes for 'economic groupings' or 'other 491 groupings' MUST NOT be registered in the IANA registry and 492 MUST NOT be used to form language tags. 494 C. UN numeric codes for countries or areas with ambiguous ISO 495 3166 alpha-2 codes, when entered into the registry, MUST be 496 defined according to the rules in Section 3.3 and MUST be 497 used to form language tags that represent the country or 498 region for which they are defined. 500 D. UN numeric codes for countries or areas for which there is an 501 associated ISO 3166 alpha-2 code in the registry MUST NOT be 502 entered into the registry and MUST NOT be used to form 503 language tags. Note that the ISO 3166-based subtag in the 504 registry MUST actually be associated with the UN M.49 code in 505 question. 507 E. All other UN numeric codes for countries or areas which do 508 not have an associated ISO 3166 alpha-2 code MUST NOT be 509 entered into the registry and MUST NOT be used to form 510 language tags. For more information about these codes, see 511 Section 3.3. 513 4. Note: The alphanumeric codes in Appendix X of the UN document 514 MUST NOT be entered into the registry and MUST NOT be used to 515 form language tags. (At the time this document was created these 516 values match the ISO 3166 alpha-2 codes.) 518 5. There MUST be at most one region subtag in a language tag and the 519 region subtag MAY be omitted, as when it adds no distinguishing 520 value to the tag. 522 6. The region subtags 'AA', 'QM'-'QZ', 'XA'-'XZ', and 'ZZ' are 523 reserved for private use in language tags. These subtags 524 correspond to codes reserved by ISO 3166 for private use. These 525 codes MAY be used for private use region subtags (instead of 526 using a private-use subtag sequence). 
Please refer to 527 Section 4.5 for more information on private use subtags. 529 "de-CH" represents German ('de') as used in Switzerland ('CH'). 531 "sr-Latn-CS" represents Serbian ('sr') written using Latin script 532 ('Latn') as used in Serbia and Montenegro ('CS'). 534 "es-419" represents Spanish ('es') as used in the UN-defined Latin 535 America and Caribbean region ('419'). 537 2.2.5 Variant Subtags 539 Variant subtags are used to indicate additional, well-recognized 540 variations that define a language or its dialects which are not 541 covered by other available subtags. The following rules apply to the 542 variant subtags: 544 1. Variant subtags are not associated with any external standard. 545 Variant subtags and their meanings are defined by the 546 registration process defined in Section 3.4. 548 2. Variant subtags MUST follow all of the other defined subtags, but 549 precede any extension or private-use subtag sequences. 551 3. More than one variant MAY be used to form the language tag. 553 4. Variant subtags MUST be registered with IANA according to the 554 rules in Section 3.4 of this document before being used to form 555 language tags. In order to distinguish variants from other types 556 of subtags, registrations MUST meet the following length and 557 content restrictions: 559 1. Variant subtags that begin with a letter (a-z, A-Z) MUST be 560 at least five characters long. 562 2. Variant subtags that begin with a digit (0-9) MUST be at 563 least four characters long. 565 Variant subtag records in the language subtag registry MAY include 566 one or more 'Prefix' fields, which indicate the language tag or tags 567 that would make a suitable prefix (with other subtags, as 568 appropriate) in forming a language tag with the variant. For 569 example, the subtag 'nedis' has a Prefix of "sl", making it suitable 570 to form language tags such as "sl-nedis" and "sl-IT-nedis", but not 571 suitable for use in a tag such as "zh-nedis" or "it-IT-nedis". 
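The 'Prefix' check in the 'nedis' example above can be sketched in Python. This is an illustration under stated assumptions, not a normative algorithm: the function name variant_prefix_ok and the in-memory VARIANT_PREFIXES table are hypothetical stand-ins for a lookup against the IANA registry. The matching rule used here (the subtags before the variant must begin with all of the subtags of some Prefix value) is one reading of "suitable prefix". Variants registered with no Prefix field are simply rejected, which is a simplification.

```python
# Hypothetical excerpt of registry data: variant subtag -> its Prefix fields.
# Only 'nedis' with Prefix "sl" is taken from the text of the draft.
VARIANT_PREFIXES = {"nedis": ["sl"]}


def variant_prefix_ok(tag, variant, prefixes=VARIANT_PREFIXES):
    """Return True if the subtags preceding `variant` in `tag` begin with
    one of the variant's Prefix values, compared case-insensitively."""
    subtags = tag.lower().split("-")
    variant = variant.lower()
    if variant not in subtags[1:]:
        return False  # variant is absent or sits in the primary position
    preceding = subtags[:subtags.index(variant)]
    for prefix in prefixes.get(variant, []):
        wanted = prefix.lower().split("-")
        # Compare whole subtags so that a Prefix of "sl" never matches "slx".
        if preceding[:len(wanted)] == wanted:
            return True
    return False


for tag in ("sl-nedis", "sl-IT-nedis", "zh-nedis", "it-IT-nedis"):
    print(tag, variant_prefix_ok(tag, "nedis"))
```

Comparing lists of whole subtags rather than raw strings avoids accidental matches on partial subtags and keeps the check independent of letter case.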
573 "sl-nedis" represents the Natisone or Nadiza dialect of Slovenian. 575 "de-CH-1996" represents German as used in Switzerland and as written 576 using the spelling reform beginning in the year 1996 C.E. 578 Most variants that share a prefix are mutually exclusive. For 579 example, the German orthographic variations '1996' and '1901' SHOULD 580 NOT be used in the same tag, as they represent the dates of different 581 spelling reforms. A variant that can meaningfully be used in 582 combination with another variant SHOULD include a 'Prefix' field in 583 its registry record that lists that other variant. For example, if 584 another German variant 'example' were created that made sense to use 585 with '1996', then 'example' should include two Prefix fields: "de" 586 and "de-1996". 588 2.2.6 Extension Subtags 590 Extensions provide a mechanism for extending language tags for use in 591 various applications. See: Section 3.6. The following rules apply 592 to extensions: 594 1. Extension subtags are separated from the other subtags defined 595 in this document by a single-letter subtag ("singleton"). The 596 singleton MUST be one allocated to a registration authority via 597 the mechanism described in Section 3.6 and cannot be the letter 598 'x', which is reserved for private-use subtag sequences. 600 2. Note: Private-use subtag sequences starting with the singleton 601 subtag 'x' are described below. 603 3. An extension MUST follow at least a primary language subtag. 604 That is, a language tag cannot begin with an extension. 605 Extensions extend language tags, they do not override or replace 606 them. For example, "a-value" is not a well-formed language tag, 607 while "de-a-value" is. 609 4. Each singleton subtag MUST appear at most one time in each tag 610 (other than as a private-use subtag). That is, singleton 611 subtags MUST NOT be repeated. For example, the tag "en-a-bbb-a- 612 ccc" is invalid because the subtag 'a' appears twice. 
Note that 613 the tag "en-a-bbb-x-a-ccc" is valid because the second 614 appearance of the singleton 'a' is in a private use sequence. 616 5. Extension subtags MUST meet all of the requirements for the 617 content and format of subtags defined in this document. 619 6. Extension subtags MUST meet whatever requirements are set by the 620 document that defines their singleton prefix and whatever 621 requirements are provided by the maintaining authority. 623 7. Each extension subtag MUST be from two to eight characters long 624 and consist solely of letters or digits, with each subtag 625 separated by a single '-'. 627 8. Each singleton MUST be followed by at least one extension 628 subtag. For example, the tag "tlh-a-b-foo" is invalid because 629 the first singleton 'a' is followed immediately by another 630 singleton 'b'. 632 9. Extension subtags MUST follow all language, extended language, 633 script, region and variant subtags in a tag. 635 10. All subtags following the singleton and before another singleton 636 are part of the extension. Example: In the tag "fr-a-Latn", the 637 subtag 'Latn' does not represent the script subtag 'Latn' 638 defined in the IANA Language Subtag Registry. Its meaning is 639 defined by the extension 'a'. 641 11. In the event that more than one extension appears in a single 642 tag, the tag SHOULD be canonicalized as described in 643 Section 4.4. 645 For example, if the prefix singleton 'r' and the shown subtags were 646 defined, then the following tag would be a valid example: "en-Latn- 647 GB-boont-r-extended-sequence-x-private" 649 2.2.7 Private Use Subtags 651 Private use subtags are used to indicate distinctions in language 652 important in a given context by private agreement. The following 653 rules apply to private-use subtags: 655 1. Private-use subtags are separated from the other subtags defined 656 in this document by the reserved single-character subtag 'x'. 658 2. 
Private-use subtags MUST follow all language, extended language, 659 script, region, variant, and extension subtags in the tag. 660 Another way of saying this is that all subtags following the 661 singleton 'x' MUST be considered private use. Example: The 662 subtag 'US' in the tag "en-x-US" is a private use subtag. 664 3. A tag MAY consist entirely of private-use subtags. 666 4. No source is defined for private use subtags. Use of private use 667 subtags is by private agreement only. 669 For example: Users who wished to utilize SIL Ethnologue for 670 identification might agree to exchange tags such as "az-Arab-x-AZE- 671 derbend". This example contains two private-use subtags. The first 672 is 'AZE' and the second is 'derbend'. 674 2.2.8 Pre-Existing RFC 3066 Registrations 676 Existing IANA-registered language tags from RFC 1766 and/or RFC 3066 677 maintain their validity. IANA will maintain these tags in the 678 registry under either the "grandfathered" or "redundant" type. For 679 more information see Section 3.7. 681 It is important to note that all language tags formed under the 682 guidelines in this document were either legal, well-formed tags or 683 could have been registered under RFC 3066. 685 2.2.9 Classes of Conformance 687 Implementations sometimes need to describe their capabilities with 688 regard to the rules and practices described in this document. There 689 are two classes of conforming implementations described by this 690 document: "well-formed" processors and "validating" processors. 691 Claims of conformance SHOULD explicitly reference one of these 692 definitions. 694 An implementation that claims to check for well-formed language tags 695 MUST: 697 o Check that the tag and all of its subtags, including extension and 698 private-use subtags, conform to the ABNF or that the tag is on the 699 list of grandfathered tags. 701 o Check that singleton subtags that identify extensions do not 702 repeat. 
For example, the tag "en-a-xx-b-yy-a-zz" is not well- 703 formed. 705 Well-formed processors are strongly encouraged to implement the 706 canonicalization rules contained in Section 4.4. 708 An implementation that claims to be validating MUST: 710 o Check that the tag is well-formed. 712 o Specify the particular registry date for which the implementation 713 performs validation of subtags. 715 o Check that either the tag is a grandfathered tag, or that all 716 language, script, region, and variant subtags consist of valid 717 codes for use in language tags according to the IANA registry as 718 of the particular date specified by the implementation. 720 o Specify which, if any, extension RFCs as defined in Section 3.6 721 are supported, including version, revision, and date. 723 o For any such extensions supported, check that all subtags used in 724 that extension are valid. 726 o For variant and extended language subtags, if the registry 727 contains one or more 'Prefix' fields for that subtag, check that 728 the tag matches at least one prefix. The tag matches if all the 729 subtags in the 'Prefix' also appear in the tag. For example, the 730 prefix "es-CO" matches the tag "es-Latn-CO-x-private" because both 731 the 'es' language subtag and 'CO' region subtag appear in the tag. 733 3. Registry Format and Maintenance 735 This section defines the Language Subtag Registry and the maintenance 736 and update procedures associated with it. 738 The language subtag registry will be maintained so that, except for 739 extension subtags, it is possible to validate all of the subtags that 740 appear in a language tag under the provisions of this document or its 741 revisions or successors. In addition, the meaning of the various 742 subtags will be unambiguous and stable over time. (The meaning of 743 private-use subtags, of course, is not defined by the IANA registry.) 
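Two of the mechanical checks listed in Section 2.2.9 (no repeated extension singleton, and the 'Prefix' match) might be sketched in Python as follows. Both function names are hypothetical. Stopping at the singleton 'x' is an assumption drawn from Section 2.2.7, where all subtags after 'x' are private use; the 'Prefix' rule itself does not spell this out. Comparison is case insensitive. Neither function checks the ABNF or consults the registry.

```python
def singleton_repeats(tag: str) -> bool:
    """Return True if an extension singleton appears more than once."""
    seen = set()
    for subtag in tag.lower().split("-"):
        if subtag == "x":
            # Everything after 'x' is private use and is not examined.
            break
        if len(subtag) == 1:
            if subtag in seen:
                return True
            seen.add(subtag)
    return False


def prefix_matches(prefix: str, tag: str) -> bool:
    """Return True if every subtag of 'prefix' also appears in 'tag'."""
    subtags = [s.lower() for s in tag.split("-")]
    if "x" in subtags:
        # Assumption: private-use subtags never satisfy a 'Prefix'.
        subtags = subtags[:subtags.index("x")]
    return all(p.lower() in subtags for p in prefix.split("-"))
```

For instance, prefix_matches("es-CO", "es-Latn-CO-x-private") holds because both 'es' and 'CO' appear before the private-use sequence, and singleton_repeats("en-a-xx-b-yy-a-zz") reports the repeated 'a'.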
745 The registry defined under this document contains a comprehensive 746 list of all of the subtags valid in language tags. This allows 747 implementers a straightforward and reliable way to validate language 748 tags. 750 3.1 Format of the IANA Language Subtag Registry 752 The IANA Language Subtag Registry ("the registry") will consist of a 753 text file that is machine readable in the format described in this 754 section, plus copies of the registration forms approved by the 755 Language Subtag Reviewer in accordance with the process described in 756 Section 3.4. With the exception of the registration forms for 757 grandfathered and redundant tags, no registration records will be 758 maintained for the initial set of subtags. 760 The registry will be in a modified record-jar format text file 761 [record-jar]. Lines are limited to 72 characters, including all 762 whitespace. 764 Records are separated by lines containing only the sequence "%%" 765 (%x25.25). 767 Each field can be viewed as a single, logical line of ASCII 768 characters, comprising a field-name and a field-body separated by a 769 COLON character (%x3A). For convenience, the field-body portion of 770 this conceptual entity can be split into a multiple-line 771 representation; this is called "folding". The format of the registry 772 is described by the following ABNF (per [RFC2234bis]): 774 registry = record *("%%" CRLF record) 775 record = 1*( field-name *SP ":" *SP field-body CRLF ) 776 field-name = *(ALPHA / DIGIT / "-") 777 field-body = *(ASCCHAR/LWSP) 778 ASCCHAR = %x21-25 / %x27-7E / UNICHAR ; Note: AMPERSAND is %x26 779 UNICHAR = "&#x" 2*6HEXDIG ";" 780 The sequence '..' (%x2E.2E) in a field-body denotes a range of 781 values. Such a range represents all subtags of the same length that 782 are alphabetically within that range, including the values explicitly 783 mentioned. For example 'a..c' denotes the values 'a', 'b', and 'c'. 
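The record format and the '..' range notation described above could be read with a short Python sketch such as the one below. The names 'parse_registry' and 'expand_range' are hypothetical. The sketch assumes that a folded continuation line begins with a space or tab, and that range endpoints consist of lowercase ASCII letters only; it does not enforce the 72-character line limit and does not decode character references.

```python
from itertools import product
from string import ascii_lowercase


def parse_registry(text):
    """Split registry text into records, each a list of (name, body) pairs."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "%%":
            # "%%" on a line by itself separates records.
            if current:
                records.append(current)
            current = []
        elif line[:1] in (" ", "\t") and current:
            # Assumed folding rule: leading whitespace continues the
            # field-body of the previous field.
            name, body = current[-1]
            current[-1] = (name, (body + " " + line.strip()).strip())
        elif line.strip():
            name, _, body = line.partition(":")
            current.append((name.strip(), body.strip()))
    if current:
        records.append(current)
    return records


def expand_range(body):
    """Expand 'low..high' into all same-length values within the range."""
    low, sep, high = body.partition("..")
    if not sep:
        return [body]
    if len(low) != len(high):
        raise ValueError("range endpoints differ in length")
    candidates = ("".join(p) for p in product(ascii_lowercase, repeat=len(low)))
    return [c for c in candidates if low <= c <= high]
```

Records are kept as lists of pairs rather than dictionaries so that a field that occurs more than once in a record is not lost.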
785 Characters from outside the US-ASCII repertoire, as well as the 786 AMPERSAND character ("&", %x26) when it occurs in a field-body are 787 represented by a "Numeric Character Reference" using hexadecimal 788 notation in the style used by [XML10] (see 789 ). This consists of the 790 sequence "&#x" (%x26.23.78) followed by a hexadecimal representation 791 of the character's code point in [ISO10646] followed by a closing 792 semicolon (%x3B). For example, the EURO SIGN, U+20AC, would be 793 represented by the sequence "&#x20AC;". Note that the hexadecimal 794 notation MAY have between two and six digits. 796 All fields whose field-body contains a date value use the "full-date" 797 format specified in [RFC3339]. For example: "2004-06-28" represents 798 June 28, 2004 in the Gregorian calendar. 800 The first record in the file contains the single field whose field- 801 name is "File-Date". The field-body of this record contains the last 802 modification date of this copy of the registry, making it possible to 803 compare different versions of the registry. The registry on the IANA 804 website is the most current. Versions with an older date than that 805 one are not up-to-date. 807 File-Date: 2004-06-28 808 %% 810 Subsequent records represent subtags in the registry. Each of the 811 fields in each record MUST occur no more than once, unless otherwise 812 noted below. Each record MUST contain the following fields: 814 o 'Type' 816 * Type's field-value MUST consist of one of the following 817 strings: "language", "extlang", "script", "region", "variant", 818 "grandfathered", and "redundant" and denotes the type of tag or 819 subtag. 821 o Either 'Subtag' or 'Tag' 823 * Subtag's field-value contains the subtag being defined. This 824 field MUST only appear in records whose Type has one of 825 these values: "language", "extlang", "script", "region", or 826 "variant". 828 * Tag's field-value contains a complete language tag.
This field 829 MUST only appear in records whose Type has one of these values: 830 "grandfathered" or "redundant". 832 o Description 834 * Description's field-value contains a non-normative description 835 of the subtag or tag. 837 o Added 839 * Added's field-value contains the date the record was added to 840 the registry. 842 The 'Subtag' or 'Tag' field MUST use lowercase letters to form the 843 subtag or tag, with two exceptions. Subtags whose 'Type' field is 844 'script' (in other words, subtags defined by ISO 15924) MUST use 845 titlecase. Subtags whose 'Type' field is 'region' (in other words, 846 subtags defined by ISO 3166) MUST use uppercase. These exceptions 847 mirror the use of case in the underlying standards. 849 The field 'Description' MAY appear more than one time. At least one 850 of the 'Description' fields MUST contain a description of the tag 851 being registered written or transcribed into the Latin script; the 852 same or additional fields MAY also include a description in a non- 853 Latin script. The 'Description' field is used for identification 854 purposes and SHOULD NOT be taken to represent the actual native name 855 of the language or variation or to be in any particular language. 856 Most descriptions are taken directly from source standards such as 857 ISO 639 or ISO 3166. 859 Note: Descriptions in registry entries that correspond to ISO 639, 860 ISO 15924, ISO 3166 or UN M.49 codes are intended only to indicate 861 the meaning of that identifier as defined in the source standard at 862 the time it was added to the registry. The description does not 863 replace the content of the source standard itself. The descriptions 864 are not intended to be the English localized names for the subtags. 865 Localization or translation of language tag and subtag descriptions 866 is out of scope of this document. 
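The character reference notation, the required fields, and the case rules of Section 3.1 can be illustrated with the Python sketch below. The names 'decode_ncr' and 'check_record' are hypothetical, and a record is assumed to be a list of (field-name, field-body) pairs. The sketch treats 'Description', 'Comments', and 'Prefix' as the fields that may repeat, and it does not handle field-bodies that contain a '..' range.

```python
import re

SUBTAG_TYPES = {"language", "extlang", "script", "region", "variant"}
TAG_TYPES = {"grandfathered", "redundant"}
REPEATABLE = {"Description", "Comments", "Prefix"}


def decode_ncr(body):
    """Replace each "&#x...;" reference (2 to 6 hex digits) by its character.

    A value above U+10FFFF would raise ValueError in chr().
    """
    return re.sub(r"&#x([0-9A-Fa-f]{2,6});",
                  lambda m: chr(int(m.group(1), 16)), body)


def check_record(fields):
    """Return a list of problems found in one record (empty if none)."""
    values = {}
    for name, body in fields:
        values.setdefault(name, []).append(body)

    rtype = values.get("Type", [None])[0]
    if rtype in SUBTAG_TYPES:
        key, other = "Subtag", "Tag"
    elif rtype in TAG_TYPES:
        key, other = "Tag", "Subtag"
    else:
        return ["missing or unknown Type"]

    problems = []
    for required in (key, "Description", "Added"):
        if required not in values:
            problems.append("missing " + required)
    if other in values:
        problems.append(other + " not allowed for Type " + rtype)
    for name, bodies in values.items():
        if len(bodies) > 1 and name not in REPEATABLE:
            problems.append("repeated " + name)

    if key in values:
        value = values[key][0]
        if rtype == "script":
            expected = value.capitalize()   # titlecase, e.g. 'Latn'
        elif rtype == "region":
            expected = value.upper()        # uppercase, e.g. 'CH'
        else:
            expected = value.lower()
        if value != expected:
            problems.append(key + " has wrong case")
    return problems
```

A record that passes these checks is not thereby valid; the sketch covers only the structural requirements stated in this section.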
868 Each record MAY also contain the following fields: 870 o Preferred-Value 872 * For fields of type 'language', 'extlang', 'script', 'region', 873 and 'variant', 'Preferred-Value' contains a subtag of the same 874 'Type' which is preferred for forming the language tag. 876 * For fields of type 'grandfathered' and 'redundant', a canonical 877 mapping to a complete language tag. 879 o Deprecated 881 * Deprecated's field-value contains the date the record was 882 deprecated. 884 o Prefix 886 * Prefix's field-value contains a language tag with which this 887 subtag MAY be used to form a new language tag, perhaps with 888 other subtags as well. This field MUST only appear in records 889 whose 'Type' field-value is 'variant' or 'extlang'. For 890 example, the 'Prefix' for the variant 'nedis' is 'sl', meaning 891 that the tags "sl-nedis" and "sl-IT-nedis" might be appropriate 892 while the tag "is-nedis" is not. 894 o Comments 896 * Comments contains additional information about the subtag, as 897 deemed appropriate for understanding the registry and 898 implementing language tags using the subtag or tag. 900 o Suppress-Script 902 * Suppress-Script contains a script subtag that SHOULD NOT be 903 used to form language tags with the associated primary language 904 subtag. This field MUST only appear in records whose 'Type' 905 field-value is 'language'. See Section 4.1. 907 The field 'Deprecated' MAY be added to any record via the maintenance 908 process described in Section 3.2 or via the registration process 909 described in Section 3.4. Usually the addition of a 'Deprecated' 910 field is due to the action of one of the standards bodies, such as 911 ISO 3166, withdrawing a code. In some historical cases it might not 912 have been possible to reconstruct the original deprecation date. 913 For these cases, an approximate date appears in the registry. 
914 Although valid in language tags, subtags and tags with a 'Deprecated' 915 field are deprecated and validating processors SHOULD NOT generate 916 these subtags. Note that a record that contains a 'Deprecated' field 917 and no corresponding 'Preferred-Value' field has no replacement 918 mapping. 920 The field 'Preferred-Value' contains a mapping between the record in 921 which it appears and a tag or subtag which SHOULD be preferred when 922 selecting language tags. These values form three groups: 924 ISO 639 language codes which were later withdrawn in favor of 925 other codes. These values are mostly a historical curiosity. 927 ISO 3166 region codes which have been withdrawn in favor of a new 928 code. This sometimes happens when a country changes its name or 929 administration in such a way that warrants a new region code. 931 Tags grandfathered from RFC 3066. In many cases these tags have 932 become obsolete because the values they represent were later 933 encoded by ISO 639. 935 Records that contain a 'Preferred-Value' field MUST also have a 936 'Deprecated' field. This field contains a date of deprecation. Thus 937 a language tag processor can use the registry to construct the valid, 938 non-deprecated set of subtags for a given date. In addition, for any 939 given tag, a processor can construct the set of valid language tags 940 that correspond to that tag for all dates up to the date of the 941 registry. The ability to do these mappings MAY be beneficial to 942 applications that are matching, selecting, or filtering content 943 based on its language tags. 945 Note that 'Preferred-Value' mappings in records of type 'region' might 946 not represent exactly the same meaning as the original value. There 947 are many reasons for a country code to be changed and the effect this 948 has on the formation of language tags will depend on the nature of 949 the change in question.
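The use of the 'Deprecated' and 'Preferred-Value' fields to build the non-deprecated set of subtags for a given date might look like the following Python sketch. The function names and the dictionary layout of a record are hypothetical. The sketch assumes that a subtag counts as deprecated on the date in its 'Deprecated' field and afterwards, and that every record carries an 'Added' date in "full-date" format.

```python
from datetime import date


def active_subtags(records, as_of):
    """Return the (Type, Subtag) pairs added and not deprecated by 'as_of'."""
    active = set()
    for record in records:
        added = date.fromisoformat(record["Added"])
        deprecated = record.get("Deprecated")
        if added > as_of:
            continue
        # Assumption: deprecation takes effect on the 'Deprecated' date.
        if deprecated is not None and date.fromisoformat(deprecated) <= as_of:
            continue
        active.add((record["Type"], record["Subtag"]))
    return active


def preferred_value(records, rtype, subtag):
    """Return the 'Preferred-Value' for a subtag, or the subtag itself."""
    for record in records:
        if record["Type"] == rtype and record["Subtag"] == subtag:
            return record.get("Preferred-Value", subtag)
    return subtag
```

A deprecated subtag remains valid in language tags, so a processor would normally use such a set to decide what to generate, not what to accept.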
951 In particular, the 'Preferred-Value' field does not imply retagging 952 content that uses the affected subtag. 954 The field 'Preferred-Value' MUST NOT be modified once created in the 955 registry. The field MAY be added to records of type "grandfathered" 956 and "region" according to the rules in Section 3.2. Otherwise the 957 field MUST NOT be added to any record already in the registry. 959 The 'Preferred-Value' field in records of type "grandfathered" and 960 "redundant" contains whole language tags that are strongly 961 RECOMMENDED for use in place of the record's value. In many cases 962 the mappings were created by deprecation of the tags during the 963 period before this document was adopted. For example, the tag "no- 964 nyn" was deprecated in favor of the ISO 639-1 defined language code 965 'nn'. 967 Records of type 'variant' MAY have more than one field of type 968 'Prefix'. Additional fields of this type MAY be added to a 'variant' 969 record via the registration process. 971 Records of type 'extlang' MUST have _exactly_ one 'Prefix' field. 973 The field-value of the 'Prefix' field consists of a language tag 974 whose subtags are appropriate to use with this subtag. For example, 975 the variant subtag '1996' has a Prefix field of "de". This means 976 that tags starting with the sequence "de-" are appropriate with this 977 subtag, so "de-Latg-1996" and "de-CH-1996" are both acceptable, while 978 the tag "fr-1996" is an inappropriate choice. 980 The field of type 'Prefix' MUST NOT be removed from any record. The 981 field-value for this type of field MUST NOT be modified. 983 The field 'Comments' MAY appear more than once per record. This 984 field MAY be inserted or changed via the registration process and no 985 guarantee of stability is provided. The content of this field is not 986 restricted, except by the need to register the information, the 987 suitability of the request, and by reasonable practical size 988 limitations. 
Long screeds about a particular subtag are frowned 989 upon. 991 The field 'Suppress-Script' MUST only appear in records whose 'Type' 992 field-value is 'language'. This field MAY appear at most one time in 993 a record. This field indicates a script used to write the 994 overwhelming majority of documents for the given language and which 995 therefore adds no distinguishing information to a language tag. It 996 helps ensure greater compatibility between the language tags 997 generated according to the rules in this document and language tags 998 and tag processors or consumers based on RFC 3066. For example, 999 virtually all Icelandic documents are written in the Latin script, 1000 making the subtag 'Latn' redundant in the tag "is-Latn". 1002 For examples of registry entries and their format, see Appendix C. 1004 3.2 Maintenance of the Registry 1006 Maintenance of the registry requires that as codes are assigned or 1007 withdrawn by ISO 639, ISO 15924, ISO 3166, and UN M.49, the Language 1008 Subtag Reviewer will evaluate each change, determine whether it 1009 conflicts with existing registry entries, and submit the information 1010 to IANA for inclusion in the registry. If a change takes place and 1011 the Language Subtag Reviewer does not do this in a timely manner, 1012 then any interested party MAY use the procedure in Section 3.4 to 1013 register the appropriate update. 1015 Note: The redundant and grandfathered entries together are the 1016 complete list of tags registered under [RFC3066]. The redundant tags 1017 are those that can now be formed using the subtags defined in the 1018 registry together with the rules of Section 2.2. The grandfathered 1019 entries are those that can never be legal under those same 1020 provisions. 1022 The set of redundant and grandfathered tags is permanent and stable: 1023 no new entries will be added and none of the entries will be removed.
1024 Records of type 'grandfathered' MAY have their type converted to 1025 'redundant': see Section 3.7 for more information. 1027 RFC 3066 tags that were deprecated prior to the adoption of this 1028 document are part of the list of grandfathered tags and their 1029 component subtags were not included as registered variants (although 1030 they remain eligible for registration). For example, the tag "art- 1031 lojban" was deprecated in favor of the language subtag 'jbo'. 1033 The Language Subtag Reviewer MUST ensure that new subtags meet the 1034 requirements in Section 4.1 or submit an appropriate alternate subtag 1035 as described in that section. If a change or addition to the 1036 registry is needed, the Language Subtag Reviewer will prepare the 1037 complete record, including all fields, and forward it to IANA for 1038 insertion into the registry. If this represents a new subtag, then 1039 the message will indicate that this represents an INSERTION of a 1040 record. If this represents a change to an existing subtag, then the 1041 message MUST indicate that this represents a MODIFICATION, as shown 1042 in the following example: 1044 LANGUAGE SUBTAG MODIFICATION 1045 File-Date: 2005-01-02 1046 %% 1047 Type: variant 1048 Subtag: nedis 1049 Description: Natisone dialect 1050 Description: Nadiza dialect 1051 Added: 2003-10-09 1052 Prefix: sl 1053 Comments: This is a comment shown 1054 as an example. 1055 %% 1057 Figure 4 1059 Whenever an entry is created or modified in the registry, the 'File- 1060 Date' record at the start of the registry is updated to reflect the 1061 most recent modification date in the [RFC3339] "full-date" format. 1063 Values in the 'Subtag' field MUST be lowercase except as provided for 1064 in Section 3.1. 1066 3.3 Stability of IANA Registry Entries 1068 The stability of entries and their meaning in the registry is 1069 critical to the long term stability of language tags. 
The rules in 1070 this section guarantee that a specific language tag's meaning is 1071 stable over time and will not change. 1073 These rules specifically deal with how changes to codes (including 1074 withdrawal and deprecation of codes) maintained by ISO 639, ISO 1075 15924, ISO 3166, and UN M.49 are reflected in the IANA Language 1076 Subtag Registry. Assignments to the IANA Language Subtag Registry 1077 MUST follow the following stability rules: 1079 o Values in the fields 'Type', 'Subtag', 'Tag', 'Added', 1080 'Deprecated' and 'Preferred-Value' MUST NOT be changed and are 1081 guaranteed to be stable over time. 1083 o Values in the 'Description' field MUST NOT be changed in a way 1084 that would invalidate previously-existing tags. They MAY be 1085 broadened somewhat in scope, changed to add information, or 1086 adapted to the most common modern usage. For example, countries 1087 occasionally change their official names: an historical example of 1088 this would be "Upper Volta" changing to "Burkina Faso". 1090 o Values in the field 'Prefix' MAY be added to records of type 1091 'variant' via the registration process. 1093 o Values in the field 'Prefix' MAY be modified, so long as the 1094 modifications broaden the set of prefixes. That is, a prefix MAY 1095 be replaced by one of its own prefixes. For example, the prefix 1096 "en-US" could be replaced by "en", but not by the prefixes "en- 1097 Latn", "fr", or "en-US-boont". If one of those prefixes were 1098 needed, a new Prefix SHOULD be registered. 1100 o Values in the field 'Prefix' MUST NOT be removed. 1102 o The field 'Comments' MAY be added, changed, modified, or removed 1103 via the registration process or any of the processes or 1104 considerations described in this section. 1106 o The field 'Suppress-Script' MAY be added or removed via the 1107 registration process. 
1109 o Codes assigned by ISO 639, ISO 15924, and ISO 3166 that do not 1110 conflict with existing subtags of the associated type and whose 1111 meaning is not the same as an existing subtag of the same type are 1112 entered into the IANA registry as new records. 1114 o Codes assigned by ISO 639, ISO 15924, or ISO 3166 that are 1115 withdrawn by their respective maintenance or registration 1116 authority remain valid in language tags. A 'Deprecated' field 1117 containing the date of withdrawal is added to the record. If a 1118 new record of the same type is added that represents a replacement 1119 value, then a 'Preferred-Value' field MAY also be added. The 1120 registration process MAY be used to add comments about the 1121 withdrawal of the code by the respective standard. 1123 * The region code 'TL' was assigned to the country 'Timor-Leste', 1124 replacing the code 'TP' (which was assigned to 'East Timor' 1125 when it was under administration by Portugal). The subtag 'TP' 1126 remains valid in language tags, but its record contains a 1127 'Preferred-Value' of 'TL' and its field 'Deprecated' contains 1128 the date the new code was assigned ('2004-07-06'). 1130 o Codes assigned by ISO 639, ISO 15924, or ISO 3166 that conflict 1131 with existing subtags of the associated type, including subtags 1132 that are deprecated, MUST NOT be entered into the registry. The 1133 following additional considerations apply to subtag values that 1134 are reassigned: 1136 * For ISO 639 codes, if the newly assigned code's meaning is not 1137 represented by a subtag in the IANA registry, the Language 1138 Subtag Reviewer, as described in Section 3.4, SHALL prepare a 1139 proposal for entering in the IANA registry as soon as practical 1140 a registered language subtag as an alternate value for the new 1141 code.
The form of the registered language subtag will be at 1142 the discretion of the Language Subtag Reviewer and MUST conform 1143 to other restrictions on language subtags in this document. 1145 * For all subtags whose meaning is derived from an external 1146 standard (i.e. ISO 639, ISO 15924, ISO 3166, or UN M.49), if a 1147 new meaning is assigned to an existing code and the new meaning 1148 broadens the meaning of that code, then the meaning for the 1149 associated subtag MAY be changed to match. The meaning of a 1150 subtag MUST NOT be narrowed, however, as this can result in an 1151 unknown proportion of the existing uses of a subtag becoming 1152 invalid. Note: ISO 639 MA/RA has adopted a similar stability 1153 policy. 1155 * For ISO 15924 codes, if the newly assigned code's meaning is 1156 not represented by a subtag in the IANA registry, the Language 1157 Subtag Reviewer, as described in Section 3.4, SHALL prepare a 1158 proposal for entering in the IANA registry as soon as practical 1159 a registered variant subtag as an alternate value for the new 1160 code. The form of the registered variant subtag will be at the 1161 discretion of the Language Subtag Reviewer and MUST conform to 1162 other restrictions on variant subtags in this document. 1164 * For ISO 3166 codes, if the newly assigned code's meaning is 1165 associated with the same UN M.49 code as another 'region' 1166 subtag, then the existing region subtag remains as the 1167 preferred value for that region and no new entry is created. A 1168 comment MAY be added to the existing region subtag indicating 1169 the relationship to the new ISO 3166 code. 1171 * For ISO 3166 codes, if the newly assigned code's meaning is 1172 associated with a UN M.49 code that is not represented by an 1173 existing region subtag, then the Language Subtag Reviewer, as 1174 described in Section 3.4, SHALL prepare a proposal for entering 1175 the appropriate UN M.49 country code as an entry in the IANA 1176 registry. 
1178 * Codes assigned by UN M.49 to countries or areas (as opposed to 1179 geographical regions and sub-regions) for which there is no 1180 corresponding ISO 3166 code MUST NOT be registered, except 1181 under the previous provision. If it is necessary to identify a 1182 region for which only a UN M.49 code exists in language tags, 1183 then the registration authority for ISO 3166 SHOULD be 1184 petitioned to assign a code, which can then be registered for 1185 use in language tags. At the time this document was written, 1186 there were only four such codes: 830 (Channel Islands), 831 1187 (Guernsey), 832 (Jersey), and 833 (Isle of Man). This rule 1188 exists so that UN M.49 codes remain available as the value of 1189 last resort in cases where ISO 3166 reassigns a deprecated 1190 value in the registry. 1192 * For ISO 3166 codes, if there is no associated UN numeric code, 1193 then the Language Subtag Reviewer SHALL petition the UN to 1194 create one. If there is no response from the UN within ninety 1195 days of the request being sent, the Language Subtag Reviewer 1196 SHALL prepare a proposal for entering in the IANA registry as 1197 soon as practical a registered variant subtag as an alternate 1198 value for the new code. The form of the registered variant 1199 subtag will be at the discretion of the Language Subtag 1200 Reviewer and MUST conform to other restrictions on variant 1201 subtags in this document. This situation is very unlikely to 1202 ever occur. 1204 o Stability provisions apply to grandfathered tags with this 1205 exception: should all of the subtags in a grandfathered tag become 1206 valid subtags in the IANA registry, then the field 'Type' in that 1207 record is changed from 'grandfathered' to 'redundant'. Note that 1208 this will not affect language tags that match the grandfathered 1209 tag, since these tags will now match valid generative subtag 1210 sequences. 
For example, if the subtag 'gan' in the language tag 1211 "zh-gan" were to be registered as an extended language subtag, 1212 then the grandfathered tag "zh-gan" would be deprecated (but 1213 existing content or implementations that use "zh-gan" would remain 1214 valid). 1216 3.4 Registration Procedure for Subtags 1218 The procedure given here MUST be used by anyone who wants to use a 1219 subtag not currently in the IANA Language Subtag Registry. 1221 Only subtags of type 'language' and 'variant' will be considered for 1222 independent registration of new subtags. Handling of subtags needed 1223 for stability and subtags necessary to keep the registry synchronized 1224 with ISO 639, ISO 15924, ISO 3166, and UN M.49 within the limits 1225 defined by this document are described in Section 3.2. Stability 1226 provisions are described in Section 3.3. 1228 This procedure MAY also be used to register or alter the information 1229 for the "Description", "Comments", "Deprecated", or "Prefix" fields 1230 in a subtag's record as described in Figure 9. Changes to all other 1231 fields in the IANA registry are NOT permitted. 1233 Registering a new subtag or requesting modifications to an existing 1234 tag or subtag starts with the requester filling out the registration 1235 form reproduced below. Note that each response is not limited in 1236 size so that the request can adequately describe the registration. 1237 The fields in the "Record Requested" section SHOULD follow the 1238 requirements in Section 3.1. 1240 LANGUAGE SUBTAG REGISTRATION FORM 1241 1. Name of requester: 1242 2. E-mail address of requester: 1243 3. Record Requested: 1245 Type: 1246 Subtag: 1247 Description: 1248 Prefix: 1249 Preferred-Value: 1250 Deprecated: 1251 Suppress-Script: 1252 Comments: 1254 4. Intended meaning of the subtag: 1255 5. Reference to published description 1256 of the language (book or article): 1257 6. 
Any other relevant information: 1259 Figure 5 1261 The subtag registration form MUST be sent to 1262 for a two week review period before it can 1263 be submitted to IANA. (This is an open list and can be joined by 1264 sending a request to .) 1266 Variant and extlang subtags are always registered for use with a 1267 particular range of language tags. For example, the subtag 'rozaj' 1268 is intended for use with language tags that start with the primary 1269 language subtag "sl", since Resian is a dialect of Slovenian. Thus 1270 the subtag 'rozaj' could be included in tags such as "sl-Latn-rozaj" 1271 or "sl-IT-rozaj". This information is stored in the "Prefix" field 1272 in the registry. Variant registration requests are REQUIRED to 1273 include at least one "Prefix" field in the registration form. 1275 The 'Prefix' field for a given registered subtag will be maintained 1276 in the IANA registry as a guide to usage. Additional prefixes MAY be 1277 added by filing an additional registration form. In that form, the 1278 "Any other relevant information:" field MUST indicate that it is the 1279 addition of a prefix. 1281 Requests to add a prefix to a variant subtag that imply a different 1282 semantic meaning will probably be rejected. For example, a request 1283 to add the prefix "de" to the subtag 'nedis' so that the tag "de- 1284 nedis" represented some German dialect would be rejected. The 1285 'nedis' subtag represents a particular Slovenian dialect and the 1286 additional registration would change the semantic meaning assigned to 1287 the subtag. A separate subtag SHOULD be proposed instead. 1289 The 'Description' field MUST contain a description of the tag being 1290 registered written or transcribed into the Latin script; it MAY also 1291 include a description in a non-Latin script. Non-ASCII characters 1292 MUST be escaped using the syntax described in Section 3.1. 
The 1293 'Description' field is used for identification purposes; it does not 1294 necessarily represent the actual native name of the language or 1295 variation, nor does it need to be in any particular language. 1297 While the 'Description' field itself is not guaranteed to be stable 1298 and errata corrections MAY be undertaken from time to time, attempts 1299 to provide translations or transcriptions of entries in the registry 1300 itself will probably be frowned upon by the community or rejected 1301 outright, as changes of this nature have an impact on the provisions 1302 in Section 3.3. 1304 The Language Subtag Reviewer is responsible for responding to 1305 requests for the registration of subtags through the registration 1306 process and is appointed by the IESG. 1308 When the two week period has passed, the Language Subtag Reviewer 1309 either forwards the record to be inserted or modified to 1310 iana@iana.org according to the procedure described in Section 3.2, or 1311 rejects the request because of significant objections raised on the 1312 list or due to problems with constraints in this document (which MUST 1313 be explicitly cited). The reviewer MAY also extend the review period 1314 in two week increments to permit further discussion. The reviewer 1315 MUST indicate on the list whether the registration has been accepted, 1316 rejected, or extended following each two week period. 1318 Note that the reviewer can raise objections on the list if he or she 1319 so desires. The important thing is that the objection MUST be made 1320 publicly. 1322 The applicant is free to modify a rejected application with 1323 additional information and submit it again; this restarts the two 1324 week comment period. 1326 Decisions made by the reviewer MAY be appealed to the IESG [RFC2028] 1327 under the same rules as other IETF decisions [RFC2026]. 1329 All approved registration forms are available online in the directory 1330 http://www.iana.org/numbers.html under "languages".
1332 Updates or changes to existing records, including previous 1333 registrations, follow the same procedure as new registrations. The 1334 Language Subtag Reviewer decides whether there is consensus to update 1335 the registration following the two week review period; normally 1336 objections by the original registrant will carry extra weight in 1337 forming such a consensus. 1339 Registrations are permanent and stable. Once registered, subtags 1340 will not be removed from the registry and will remain a valid way in 1341 which to specify a specific language or variant. 1343 Note: The "Description" in the registration form is 1344 intended as an aid to people trying to verify whether a language is 1345 registered or what language or language variation a particular subtag 1346 refers to. In most cases, reference to an authoritative grammar or 1347 dictionary of that language will be useful; in cases where no such 1348 work exists, other well-known works describing that language or in 1349 that language MAY be appropriate. The subtag reviewer decides what 1350 constitutes "good enough" reference material. This requirement is 1351 not intended to exclude particular languages or dialects due to the 1352 size of the speaker population or lack of a standardized orthography. 1353 Minority languages will be considered equally on their own merits. 1355 3.5 Possibilities for Registration 1357 Possibilities for registration of subtags or information about 1358 subtags include: 1360 o Primary language subtags for languages not listed in ISO 639 that 1361 are not variants of any listed or registered language can be 1362 registered. At the time this document was created there were no 1363 examples of this form of subtag. Before attempting to register a 1364 language subtag, there MUST be an attempt to register the language 1365 with ISO 639.
No language subtags will be registered for codes 1366 that exist in ISO 639-1 or ISO 639-2, which are under 1367 consideration by the ISO 639 maintenance or registration 1368 authorities, or which have never been attempted for registration 1369 with those authorities. If ISO 639 has previously rejected a 1370 language for registration, it is reasonable to assume that there 1371 must be additional very compelling evidence of need before it will 1372 be registered in the IANA registry (to the extent that it is very 1373 unlikely that any subtags will be registered of this type). 1375 o Dialect or other divisions or variations within a language, its 1376 orthography, writing system, regional or historical usage, 1377 transliteration or other transformation, or distinguishing 1378 variation MAY be registered as variant subtags. An example is the 1379 'rozaj' subtag (the Resian dialect of Slovenian). 1381 o The addition or maintenance of fields (generally of an 1382 informational nature) in Tag or Subtag records as described in 1383 Section 3.1 and subject to the stability provisions in 1384 Section 3.3. This includes descriptions; comments; deprecation 1385 and preferred values for obsolete or withdrawn codes; or the 1386 addition of script or extlang information to primary language 1387 subtags. 1389 o The addition of records and related field value changes necessary 1390 to reflect assignments made by ISO 639, ISO 15924, ISO 3166, and 1391 UN M.49 as described in Section 3.3. 1393 This document leaves the decision on what subtags or changes to 1394 subtags are appropriate (or not) to the registration process 1395 described in Section 3.4. 1397 Note: four character primary language subtags are reserved to allow 1398 for the possibility of alpha4 codes in some future addition to the 1399 ISO 639 family of standards. 1401 ISO 639 defines a maintenance agency for additions to and changes in 1402 the list of languages in ISO 639. 
This agency is: 1404 International Information Centre for Terminology (Infoterm) 1405 Aichholzgasse 6/12, AT-1120 1406 Wien, Austria 1407 Phone: +43 1 26 75 35 Ext. 312 Fax: +43 1 216 32 72 1409 ISO 639-2 defines a maintenance agency for additions to and changes 1410 in the list of languages in ISO 639-2. This agency is: 1412 Library of Congress 1413 Network Development and MARC Standards Office 1414 Washington, D.C. 20540 USA 1415 Phone: +1 202 707 6237 Fax: +1 202 707 0115 1416 URL: http://www.loc.gov/standards/iso639 1418 The maintenance agency for ISO 3166 (country codes) is: 1420 ISO 3166 Maintenance Agency 1421 c/o International Organization for Standardization 1422 Case postale 56 1423 CH-1211 Geneva 20 Switzerland 1424 Phone: +41 22 749 72 33 Fax: +41 22 749 73 49 1425 URL: http://www.iso.org/iso/en/prods-services/iso3166ma/index.html 1427 The registration authority for ISO 15924 (script codes) is: 1429 Unicode Consortium Box 391476 1430 Mountain View, CA 94039-1476, USA 1431 URL: http://www.unicode.org/iso15924 1433 The Statistics Division of the United Nations Secretariat maintains 1434 the Standard Country or Area Codes for Statistical Use and can be 1435 reached at: 1437 Statistical Services Branch 1438 Statistics Division 1439 United Nations, Room DC2-1620 1440 New York, NY 10017, USA 1442 Fax: +1-212-963-0623 1443 E-mail: statistics@un.org 1444 URL: http://unstats.un.org/unsd/methods/m49/m49alpha.htm 1446 3.6 Extensions and Extensions Namespace 1448 Extension subtags are those introduced by single-letter subtags other 1449 than 'x'. They are reserved for the generation of identifiers which 1450 contain a language component, and are compatible with applications 1451 that understand language tags. For example, they might be used to 1452 define locale identifiers, which are generally based on language. 
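The following non-normative sketch, in Python, illustrates how an implementation might pick out the singletons that introduce extensions in a tag. The function name extension_singletons is hypothetical and is not defined by this document. The sketch assumes a well-formed tag, treats any one-character subtag other than 'x' as a singleton, and treats everything after an 'x' singleton as private use.

```python
def extension_singletons(tag):
    """Return the singletons (other than 'x') that introduce extensions.

    Assumes `tag` is a well-formed language tag. Case is not significant,
    so the tag is lowercased before inspection.
    """
    subtags = tag.lower().split("-")

    # A tag that starts with 'x' consists entirely of private-use subtags.
    if subtags[0] == "x":
        return []

    found = []
    # The first subtag is the primary language (or the 'i' of a
    # grandfathered tag), so inspection starts at the second subtag.
    for subtag in subtags[1:]:
        if subtag == "x":
            break  # everything after 'x' is private use, not an extension
        if len(subtag) == 1:
            found.append(subtag)
    return found


print(extension_singletons("en-US-a-foo-x-b"))  # ['a']
print(extension_singletons("de-DE"))            # []
```

A processor that does not recognize a given singleton can still treat the remainder of the tag as an ordinary sequence of subtags, which is the compatibility property described above.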
1454 The structure and form of extensions are defined by this document so 1455 that implementations can be created that are forward compatible with 1456 applications that might be created using single-letter subtags in the 1457 future. In addition, defining a mechanism for maintaining single- 1458 letter subtags will lend to the stability of this document by 1459 reducing the likely need for future revisions or updates. 1461 Allocation of a single-letter subtag SHALL take the form of an RFC 1462 defining the name, purpose, processes, and procedures for maintaining 1463 the subtags. The maintaining or registering authority, including 1464 name, contact email, discussion list email, and URL location of the 1465 registry MUST be indicated clearly in the RFC. The RFC MUST specify 1466 or include each of the following: 1468 o The specification MUST reference the specific version or revision 1469 of this document that governs its creation and MUST reference this 1470 section of this document. 1472 o The specification and all subtags defined by the specification 1473 MUST follow the ABNF and other rules for the formation of tags and 1474 subtags as defined in this document. In particular it MUST 1475 specify that case is not significant and that subtags MUST NOT 1476 exceed eight characters in length. 1478 o The specification MUST specify a canonical representation. 1480 o The specification of valid subtags MUST be available over the 1481 Internet and at no cost. 1483 o The specification MUST be in the public domain or available via a 1484 royalty-free license acceptable to the IETF and specified in the 1485 RFC. 1487 o The specification MUST be versioned and each version of the 1488 specification MUST be numbered, dated, and stable. 1490 o The specification MUST be stable. That is, extension subtags, 1491 once defined by a specification, MUST NOT be retracted or change 1492 in meaning in any substantial way. 
1494 o The specification MUST include in a separate section the 1495 registration form reproduced in this section (below) to be used in 1496 registering the extension upon publication as an RFC. 1498 o IANA MUST be informed of changes to the contact information and 1499 URL for the specification. 1501 IANA will maintain a registry of allocated single-letter (singleton) 1502 subtags. This registry will use the record-jar format described by 1503 the ABNF in Section 3.1. Upon publication of an extension as an RFC, 1504 the maintaining authority defined in the RFC MUST forward this 1505 registration form to iesg@ietf.org, who will forward the request to 1506 iana@iana.org. The maintaining authority of the extension MUST 1507 maintain the accuracy of the record by sending an updated full copy 1508 of the record to iana@iana.org with the subject line "LANGUAGE TAG 1509 EXTENSION UPDATE" whenever content changes. Only the 'Comments', 1510 'Contact_Email', 'Mailing_List', and 'URL' fields MAY be modified in 1511 these updates. 1513 Failure to maintain this record, the corresponding registry, or meet 1514 other conditions imposed by this section of this document MAY be 1515 appealed to the IESG [RFC2028] under the same rules as other IETF 1516 decisions (see [RFC2026]) and MAY result in the authority to maintain 1517 the extension being withdrawn or reassigned by the IESG. 1518 %% 1519 Identifier: 1520 Description: 1521 Comments: 1522 Added: 1523 RFC: 1524 Authority: 1525 Contact_Email: 1526 Mailing_List: 1527 URL: 1528 %% 1530 Figure 6: Format of Records in the Language Tag Extensions Registry 1532 'Identifier' contains the single letter subtag (singleton) assigned 1533 to the extension. The Internet-Draft submitted to define the 1534 extension SHOULD specify which letter to use, although the IESG MAY 1535 change the assignment when approving the RFC. 1537 'Description' contains the name and description of the extension. 
1539 'Comments' is an OPTIONAL field and MAY contain a broader description 1540 of the extension. 1542 'Added' contains the date the RFC was published in the "full-date" 1543 format specified in [RFC3339]. For example: 2004-06-28 represents 1544 June 28, 2004, in the Gregorian calendar. 1546 'RFC' contains the RFC number assigned to the extension. 1548 'Authority' contains the name of the maintaining authority for the 1549 extension. 1551 'Contact_Email' contains the email address used to contact the 1552 maintaining authority. 1554 'Mailing_List' contains the URL or subscription email address of the 1555 mailing list used by the maintaining authority. 1557 'URL' contains the URL of the registry for this extension. 1559 The determination of whether an Internet-Draft meets the above 1560 conditions and the decision to grant or withhold such authority rest 1561 solely with the IESG, and are subject to the normal review and appeals 1562 process associated with the RFC process. 1564 Extension authors are strongly cautioned that many (including most 1565 well-formed) processors will be unaware of any special relationships 1566 or meaning inherent in the order of extension subtags. Extension 1567 authors SHOULD avoid subtag relationships or canonicalization 1568 mechanisms that interfere with matching or with length restrictions 1569 that sometimes exist in common protocols where the extension is used. 1570 In particular, applications MAY truncate the subtags in doing 1571 matching or in fitting into limited lengths, so it is RECOMMENDED 1572 that the most significant information be in the most significant 1573 (left-most) subtags, and that the specification gracefully handle 1574 truncated subtags. 1576 When a language tag is to be used in a specific, known protocol, it 1577 is RECOMMENDED that the language tag not contain extensions not 1578 supported by that protocol.
In addition, note that some protocols 1579 MAY impose upper limits on the length of the strings used to store or 1580 transport the language tag. 1582 3.7 Initialization of the Registry 1584 Upon publication of this document as a BCP, the Language Subtag 1585 Registry MUST be created and populated with the initial set of 1586 subtags. This includes converting the entries from the existing IANA 1587 language tag registry defined by RFC 3066 to the new format. This 1588 section defines the process for defining the new registry and 1589 performing the conversion of the old registry. 1591 The impact on the IANA maintainers of the registry of this conversion 1592 will be a small increase in the frequency of new entries. The 1593 initial set of records represents no impact on IANA, since the work 1594 to create it will be performed externally (as defined in this 1595 section). Future work will be limited to inserting or replacing 1596 whole records preformatted for IANA by the Language Subtag Reviewer. 1598 The initial registry will be created by the LTRU working group. 1599 Using the instructions in this document, the working group will 1600 prepare an Informational RFC by creating a series of Internet-Drafts 1601 containing the prototype registry according to the rules in Sections 1602 4.2.2 and 4.2.3 and subject to IESG review as described in Section 1603 6.1.1 of [RFC2026]. 1605 When the Internet-Draft containing the prototype registry has been 1606 approved by the IESG for publication as an RFC, the document will be 1607 forwarded to IANA, which will post the contents of the new registry 1608 on-line. 1610 Tags in the RFC 3066 registry that are not deprecated that consist 1611 entirely of subtags that are defined by this document and which have 1612 the correct form and format for tags defined by this document are 1613 superseded by this document. Such tags MUST be placed in records of 1614 type 'redundant' in the registry. 
For example, "zh-Hant" is now 1615 defined by this document because 'zh' is an ISO 639-1 code and 'Hant' 1616 is an ISO 15924 code and both are defined in the registry. 1618 Tags in the RFC 3066 registry that contain one or more subtags that 1619 do not match the valid registration pattern or which are not 1620 otherwise defined by this document MUST have records of type 1621 'grandfathered' created in the registry. These records cannot become 1622 type 'redundant' except by revision of this document, but MAY have 1623 'Deprecated' and 'Preferred-Value' fields added to them if a subtag 1624 assignment or combination of assignments renders the tag obsolete. 1626 Tags in the RFC 3066 registry that have a notation that they are 1627 deprecated MUST be maintained as grandfathered entries. The record 1628 for the grandfathered entry MUST contain a 'Deprecated' field with 1629 the most appropriate date that can be determined for when the RFC 1630 3066 record was deprecated. The 'Comments' field SHOULD contain the 1631 reason for the deprecation. The 'Preferred-Value' field MAY contain 1632 a tag that replaces the value. For example, the tag "art-lojban" is 1633 deprecated and will be placed in the grandfathered section. Its 1634 'Deprecated' field will contain the deprecation date (in this case 1635 "2003-09-02") and the 'Preferred-Value' field the value "jbo". 1637 The remaining tags in the RFC 3066 registry are not deprecated, have 1638 a format consistent with language tags as defined by this document, 1639 but contain subtags which are not defined by ISO 639, ISO 15924, or 1640 ISO 3166. These subtags are consistent with registration as 1641 variants.
The initial registry SHALL contain appropriate variant 1642 records for the following subtags, and registered RFC 3066 tags 1643 containing these subtags MUST be entered into the initial registry as 1644 type 'redundant': 1646 1901 (use with Prefix: de) 1648 1996 (use with Prefix: de) 1650 nedis (use with Prefix: sl) 1652 rozaj (use with Prefix: sl) 1654 All remaining RFC 3066 registered tags MUST be entered into the 1655 initial registry in records of type 'grandfathered'. Interested 1656 parties MAY use the registration process in Section 3.4 in an attempt 1657 to register the variant subtags not already present in the registry. 1658 If all of the subtags in the original tag become fully defined by the 1659 resulting registrations, then the original tag is superseded by this 1660 document. Such tags MUST have their record changed from type 1661 'grandfathered' to type 'redundant' in the registry. Note that 1662 previous approval of a tag under RFC 3066 is no guarantee of approval 1663 of a variant subtag under this document. The existing RFC 3066 tag 1664 maintains its validity, but the original reason for its registration 1665 might have become obsolete. For example, the subtag 'boont' could be 1666 registered, resulting in the change of the grandfathered tag "en- 1667 boont" to type redundant in the registry. 1669 There MUST be a reasonable period in which the community can comment 1670 on the proposed list entries, which SHALL be no less than four weeks 1671 in length. At the completion of this period, the chair(s) will 1672 notify iana@iana.org and the ltru and ietf-languages mail lists that 1673 the task is complete and forward the necessary materials to IANA for 1674 publication. 1676 Registrations that are in process under the rules defined in RFC 3066 1677 MAY be completed under the former rules, at the discretion of the 1678 language tag reviewer. Any new registrations submitted after the 1679 request for conversion of the registry MUST be rejected. 
New 1680 registrations completed under RFC 3066 SHALL be entered into the 1681 initial registry using the rules defined just above. 1683 All existing RFC 3066 language tag registrations will be maintained 1684 in perpetuity. 1686 Users of tags that are grandfathered SHOULD consider registering 1687 appropriate subtags in the IANA subtag registry (but are not required 1688 to). 1690 UN numeric codes assigned to 'macro-geographical (continental)' MUST 1691 be defined in the IANA registry and made valid for use in language 1692 tags. These codes MUST be added to the initial version of the 1693 registry. The UN numeric codes for 'economic groupings' or 'other 1694 groupings', and the alphanumeric codes in Appendix X of the UN 1695 document MUST NOT be added to the registry. The UN numeric codes for 1696 countries or areas not associated with an assigned ISO 3166 alpha-2 1697 code MUST NOT be added to the initial version of the registry. These 1698 values MAY be registered by individuals using the process defined in 1699 Section 3.4 and according to the rules in Section 3.3. 1701 When creating records for ISO 639, ISO 15924, ISO 3166, and UN M.49 1702 codes, the following criteria SHALL be applied to the inclusion, 1703 preferred value, and deprecation of codes: 1705 For each standard, the date of the standard referenced in RFC 1766 is 1706 selected as the starting date. Codes that were valid on that date in 1707 the selected standard are added to the registry. Codes that were 1708 previously assigned but which were vacated or withdrawn before 1709 that date are not added to the registry. For each successive change 1710 to the standard, any additional assignments are added to the 1711 registry. Values that are withdrawn are marked as deprecated, but 1712 not removed.
Changes in meaning or assignment of a subtag are 1713 permitted during this process (for example, the ISO 3166 code 'CS' 1714 was originally assigned to 'Czechoslovakia' and is now assigned to 1715 'Serbia and Montenegro'). This continues up to the date that this 1716 document was adopted. The resulting set of records is added to the 1717 registry. Future changes or additions to this portion of the 1718 registry are governed by the provisions of this document. 1720 4. Formation and Processing of Language Tags 1722 This section addresses how to use the information in the registry 1723 with the tag syntax to choose, form and process language tags. 1725 4.1 Choice of Language Tag 1727 One is sometimes faced with the choice between several possible tags 1728 for the same body of text. 1730 Interoperability is best served when all users use the same language 1731 tag in order to represent the same language. If an application has 1732 requirements that make the rules here inapplicable, then that 1733 application risks damaging interoperability. It is strongly 1734 RECOMMENDED that users not define their own rules for language tag 1735 choice. 1737 Subtags SHOULD only be used where they add useful distinguishing 1738 information; extraneous subtags interfere with the meaning, 1739 understanding, and processing of language tags. In particular, users 1740 and implementations SHOULD follow the 'Prefix' and 'Suppress-Script' 1741 fields in the registry (defined in Section 3.1): these fields provide 1742 guidance on when specific additional subtags SHOULD (and SHOULD NOT) 1743 be used in a language tag. 1745 Of particular note, many applications can benefit from the use of 1746 script subtags in language tags, as long as the use is consistent for 1747 a given context. 
Script subtags were not formally defined in RFC 1748 3066 and their use can affect matching and subtag identification by 1749 implementations of RFC 3066, as these subtags appear between the 1750 primary language and region subtags. For example, if a user requests 1751 content in an implementation of Section 2.5 of [RFC3066] using the 1752 language range "en-US", content labeled "en-Latn-US" will not match 1753 the request. Therefore it is important to know when script subtags 1754 will customarily be used and when they ought not be used. In the 1755 registry, the Suppress-Script field helps ensure greater 1756 compatibility between the language tags generated according to the 1757 rules in this document and language tags and tag processors or 1758 consumers based on RFC 3066 by defining when users SHOULD NOT include 1759 a script subtag with a particular primary language subtag. 1761 Extended language subtags (type 'extlang' in the registry, see 1762 Section 3.1) also appear between the primary language and region 1763 subtags and are reserved for future standardization. Applications 1764 might benefit from their judicious use in forming language tags in 1765 the future. Similar recommendations are expected to apply to their 1766 use as apply to script subtags. 1768 Standards, protocols and applications that reference this document 1769 normatively but apply different rules to the ones given in this 1770 section MUST specify how the procedure varies from the one given 1771 here. 1773 The choice of subtags used to form a language tag SHOULD be guided by 1774 the following rules: 1776 1. Use as precise a tag as possible, but no more specific than is 1777 justified. Avoid using subtags that are not important for 1778 distinguishing content in an application. 1780 * For example, 'de' might suffice for tagging an email written 1781 in German, while "de-CH-1996" is probably unnecessarily 1782 precise for such a task. 1784 2. 
The script subtag SHOULD NOT be used to form language tags unless 1785 the script adds some distinguishing information to the tag. The 1786 field 'Suppress-Script' in the primary language record in the 1787 registry indicates which script subtags do not add distinguishing 1788 information for most applications. 1790 * For example, the subtag 'Latn' should not be used with the 1791 primary language 'en' because nearly all English documents are 1792 written in the Latin script, so the subtag adds no distinguishing 1793 information. However, if a document were written in English 1794 mixing Latin script with another script such as Braille 1795 ('Brai'), then it might be appropriate to choose to indicate 1796 both scripts to aid in content selection, such as the 1797 application of a stylesheet. 1799 3. If a tag or subtag has a 'Preferred-Value' field in its registry 1800 entry, then the value of that field SHOULD be used to form the 1801 language tag in preference to the tag or subtag in which the 1802 preferred value appears. 1804 * For example, use 'he' for Hebrew in preference to 'iw'. 1806 4. The 'und' (Undetermined) primary language subtag SHOULD NOT be 1807 used to label content, even if the language is unknown. Omitting 1808 the language tag altogether is preferred to using a tag with a 1809 primary language subtag of 'und'. The 'und' subtag MAY be useful 1810 for protocols that require a language tag to be provided. The 1811 'und' subtag MAY also be useful when matching language tags in 1812 certain situations. 1814 5. The 'mul' (Multiple) primary language subtag SHOULD NOT be used 1815 whenever the protocol allows separate tags for multiple 1816 languages, as is the case for the Content-Language header in 1817 HTTP. The 'mul' subtag conveys little useful information: 1818 content in multiple languages SHOULD individually tag the 1819 languages where they appear or otherwise indicate the actual 1820 language in preference to the 'mul' subtag. 1822 6.
The same variant subtag SHOULD NOT be used more than once within 1823 a language tag. 1825 * For example, do not use "de-DE-1901-1901". 1827 To ensure consistent backward compatibility, this document contains 1828 several provisions to account for potential instability in the 1829 standards used to define the subtags that make up language tags. 1830 These provisions mean that no language tag created under the rules in 1831 this document will become obsolete. 1833 4.2 Meaning of the Language Tag 1835 The relationship between the tag and the information it relates to is 1836 defined by the context in which the tag appears. Accordingly, 1837 this section can only give possible examples of its usage. 1839 o For a single information object, the associated language tags 1840 might be interpreted as the set of languages that is necessary for 1841 a complete comprehension of the complete object. Example: Plain 1842 text documents. 1844 o For an aggregation of information objects, the associated language 1845 tags could be taken as the set of languages used inside components 1846 of that aggregation. Examples: Document stores and libraries. 1848 o For information objects whose purpose is to provide alternatives, 1849 the associated language tags could be regarded as a hint that the 1850 content is provided in several languages, and that one has to 1851 inspect each of the alternatives in order to find its language or 1852 languages. In this case, the presence of multiple tags might not 1853 mean that one needs to be multi-lingual to get complete 1854 understanding of the document. Example: MIME multipart/ 1855 alternative. 1857 o In markup languages, such as HTML and XML, language information 1858 can be added to each part of the document identified by the markup 1859 structure (including the whole document itself). For example, one 1860 could write C'est la vie.
inside a 1861 Norwegian document; the Norwegian-speaking user could then access 1862 a French-Norwegian dictionary to find out what the marked section 1863 meant. If the user were listening to that document through a 1864 speech synthesis interface, this formation could be used to signal 1865 the synthesizer to appropriately apply French text-to-speech 1866 pronunciation rules to that span of text, instead of applying the 1867 inappropriate Norwegian rules. 1869 Language tags are related when they contain a similar sequence of 1870 subtags. For example, if a language tag B contains language tag A as 1871 a prefix, then B is typically "narrower" or "more specific" than A. 1872 Thus "zh-Hant-TW" is more specific than "zh-Hant". 1874 This relationship is not guaranteed in all cases: specifically, 1875 languages that begin with the same sequence of subtags are NOT 1876 guaranteed to be mutually intelligible, although they might be. For 1877 example, the tag "az" shares a prefix with both "az-Latn" 1878 (Azerbaijani written using the Latin script) and "az-Cyrl" 1879 (Azerbaijani written using the Cyrillic script). A person fluent in 1880 one script might not be able to read the other, even though the text 1881 might be identical. Content tagged as "az" most probably is written 1882 in just one script and thus might not be intelligible to a reader 1883 familiar with the other script. 1885 4.3 Length Considerations 1887 [RFC3066] did not provide an upper limit on the size of language 1888 tags. While RFC 3066 did define the semantics of particular subtags 1889 in such a way that most language tags consisted of language and 1890 region subtags with a combined total length of up to six characters, 1891 larger registered tags were not only possible but were actually 1892 registered. 

   Neither the language tag syntax nor other requirements in this
   document impose a fixed upper limit on the number of subtags in a
   language tag (and thus an upper bound on the size of a tag).  The
   language tag syntax suggests that, depending on the specific
   language, more subtags (and thus a longer tag) are sometimes
   necessary to completely identify the language for certain
   applications; thus it is possible to envision long or complex
   subtag sequences.

4.3.1 Working with Limited Buffer Sizes

   Some applications and protocols are forced to allocate fixed buffer
   sizes or otherwise limit the length of a language tag.  A
   conformant implementation or specification MAY refuse to support
   the storage of language tags which exceed a specified length.  Any
   such limitation SHOULD be clearly documented, and such
   documentation SHOULD include what happens to longer tags (for
   example, whether an error value is generated or the language tag is
   truncated).  A protocol that allows tags to be truncated at an
   arbitrary limit, without giving any indication of what that limit
   is, has the potential for causing harm by changing the meaning of
   tags in substantial ways.

   In practice, most language tags do not require more than a few
   subtags and will not approach reasonably sized buffer limitations:
   see Section 4.1.

   Some specifications or protocols have limits on tag length but do
   not have a fixed length limitation.  For example, [RFC2231] has no
   explicit length limitation: the length available for the language
   tag is constrained by the length of other header components (such
   as the charset's name) coupled with the 76-character limit in
   [RFC2047].  Thus the "limit" might be 50 or more characters, but it
   could potentially be quite small.
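
   As an informal illustration only (it is not part of this
   specification), the following Python sketch shows one way an
   implementation with a fixed buffer might make both its limit and
   its overflow behavior explicit.  The names fit_tag,
   TagTooLongError, REQUIRED_MINIMUM, and RECOMMENDED_LIMIT are
   hypothetical.  The values 33 and 42 are the limits given later in
   this section, and the optional truncation branch assumes the
   right-to-left removal of whole subtags described in Section 4.3.2.

```python
REQUIRED_MINIMUM = 33    # limited buffers MUST allow tags this long
RECOMMENDED_LIMIT = 42   # limited buffers SHOULD allow at least this


class TagTooLongError(ValueError):
    """Raised when a tag exceeds the documented limit (hypothetical name)."""


def fit_tag(tag, limit=RECOMMENDED_LIMIT, truncate=False):
    """Return a tag that fits in `limit` characters.

    By default an over-long tag is rejected with an error.  When
    `truncate` is true, whole subtags are removed from the right, and
    a trailing single-character subtag is removed as well.
    """
    if limit < REQUIRED_MINIMUM:
        raise ValueError("limit must be at least %d" % REQUIRED_MINIMUM)
    if len(tag) <= limit:
        return tag
    if not truncate:
        raise TagTooLongError(
            "tag is %d characters; limit is %d" % (len(tag), limit))

    subtags = tag.split("-")
    # Remove whole subtags from the right until the tag fits.
    while subtags and len("-".join(subtags)) > limit:
        subtags.pop()
    # A tag must not end with a single-character subtag.
    while subtags and len(subtags[-1]) == 1:
        subtags.pop()
    if not subtags:
        # Only possible for malformed input; refuse rather than guess.
        raise TagTooLongError("nothing left after truncation")
    return "-".join(subtags)
```

   Under these assumptions, a 49-character tag such as
   "zh-Latn-CN-variant1-a-extend1-x-wadegile-private1" is rejected by
   default; with truncate=True and a limit of 33 it is shortened to
   "zh-Latn-CN-variant1-a-extend1".  Whether to reject or truncate is
   a design choice that the limiting protocol is expected to document.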

   The considerations for assigning a buffer limit are:

      Implementations SHOULD NOT truncate language tags unless the
      meaning of the tag is purposefully being changed, or unless the
      tag does not fit into a limited buffer size specified by a
      protocol for storage or transmission.

      Implementations SHOULD warn the user when a tag is truncated
      since truncation changes the semantic meaning of the tag.

      Implementations of protocols or specifications that are space
      constrained but do not have a fixed limit SHOULD use the longest
      possible tag in preference to truncation.

      Protocols or specifications that specify limited buffer sizes
      for language tags MUST allow for language tags of up to 33
      characters.

      Protocols or specifications that specify limited buffer sizes
      for language tags SHOULD allow for language tags of at least 42
      characters.

   The following illustration shows how the 42-character
   recommendation was derived.  The combination of language and
   extended language subtags was chosen for future compatibility.  At
   up to 15 characters, this combination is longer than the longest
   possible primary language subtag (8 characters):

      language  =  3   (ISO 639-2; ISO 639-1 requires 2)
      extlang1  =  4   (each subsequent subtag includes '-')
      extlang2  =  4   (unlikely: needs prefix="language-extlang1")
      extlang3  =  4   (extremely unlikely)
      script    =  5   (if not suppressed: see Section 4.1)
      region    =  4   (UN M.49; ISO 3166 requires 3)
      variant1  =  9   (MUST have language as a prefix)
      variant2  =  9   (MUST have language-variant1 as a prefix)

      total     = 42 characters

          Figure 7: Derivation of the Limit on Tag Length

4.3.2 Truncation of Language Tags

   Truncation of a language tag alters the meaning of the tag, and
   thus SHOULD be avoided.  However, truncation of language tags is
   sometimes necessary due to limited buffer sizes.
   Such truncation MUST NOT permit a subtag to be chopped off in the
   middle or the formation of invalid tags (for example, one ending
   with the "-" character).

   This means that applications or protocols which truncate tags MUST
   do so by progressively removing subtags along with their preceding
   "-" from the right side of the language tag until the tag is short
   enough for the given buffer.  If the resulting tag ends with a
   single-character subtag, that subtag and its preceding "-" MUST
   also be removed.  For example:

      Tag to truncate: zh-Latn-CN-variant1-a-extend1-x-wadegile-private1
      1. zh-Latn-CN-variant1-a-extend1-x-wadegile
      2. zh-Latn-CN-variant1-a-extend1
      3. zh-Latn-CN-variant1
      4. zh-Latn-CN
      5. zh-Latn
      6. zh

                Figure 8: Example of Tag Truncation

4.4 Canonicalization of Language Tags

   Since a particular language tag is sometimes used by many
   processes, language tags SHOULD always be created or generated in a
   canonical form.

   A language tag is in canonical form when:

   1.  The tag is well-formed according to the rules in Section 2.1
       and Section 2.2.

   2.  Subtags of type 'Region' that have a Preferred-Value mapping in
       the IANA registry (see Section 3.1) SHOULD be replaced with
       their mapped value.

   3.  Redundant or grandfathered tags that have a Preferred-Value
       mapping in the IANA registry (see Section 3.1) MUST be replaced
       with their mapped value.  These items are either deprecated
       mappings created before the adoption of this document (such as
       the mapping of "no-nyn" to "nn" or "i-klingon" to "tlh") or are
       the result of later registrations or additions to this document
       (for example, "zh-guoyu" might be mapped to a language-extlang
       combination such as "zh-cmn" by some future update of this
       document).

   4.  Other subtags that have a Preferred-Value mapping in the IANA
       registry (see Section 3.1) MUST be replaced with their mapped
       value.  These items consist entirely of clerical corrections to
       ISO 639-1 in which the deprecated subtags have been maintained
       for compatibility purposes.

   5.  If more than one extension subtag sequence exists, the
       extension sequences are ordered into case-insensitive ASCII
       order by singleton subtag.

   Example: The language tag "en-A-aaa-B-ccc-bbb-x-xyz" is in
   canonical form, while "en-B-ccc-bbb-A-aaa-X-xyz" is well-formed but
   not in canonical form.

   Example: The language tag "en-NH" (English as used in the New
   Hebrides) is not canonical because the 'NH' subtag has a canonical
   mapping to 'VU' (Vanuatu), although the tag "en-NH" maintains its
   validity.

   Canonicalization of language tags does not imply anything about the
   use of upper or lowercase letters when processing or comparing
   subtags (as described in Section 2.1).  All comparisons MUST be
   performed in a case-insensitive manner.

   When performing canonicalization of language tags, processors MAY
   regularize the case of the subtags (that is, this process is
   OPTIONAL), following the case used in the registry.  Note that this
   corresponds to the following casing rules: uppercase all
   non-initial two-letter subtags; titlecase all non-initial
   four-letter subtags; lowercase everything else.

   Note: Case folding of ASCII letters in certain locales, unless
   carefully handled, sometimes produces non-ASCII character values.
   The Unicode Character Database file "SpecialCasing.txt" defines the
   specific cases that are known to cause problems with this.  In
   particular, the letter 'i' (U+0069) in Turkish and Azerbaijani is
   uppercased to U+0130 (LATIN CAPITAL LETTER I WITH DOT ABOVE).
   Implementers SHOULD specify a locale-neutral casing operation to
   ensure that case folding of subtags does not produce this value,
   which is illegal in language tags.  For example, if one were to
   uppercase the region subtag 'in' using Turkish locale rules, the
   sequence U+0130 U+004E would result instead of the expected 'IN'.

   Note: if the field 'Deprecated' appears in a registry record
   without an accompanying 'Preferred-Value' field, then that tag or
   subtag is deprecated without a replacement.  Validating processors
   SHOULD NOT generate tags that include these values, although the
   values are canonical when they appear in a language tag.

   An extension MUST define any relationships that exist between the
   various subtags in the extension and thus MAY define an alternate
   canonicalization scheme for the extension's subtags.  Extensions
   MAY define how the order of the extension's subtags is interpreted.
   For example, an extension could define that its subtags are in
   canonical order when the subtags are placed into ASCII order: that
   is, "en-a-aaa-bbb-ccc" instead of "en-a-ccc-bbb-aaa".  Another
   extension might define that the order of the subtags influences
   their semantic meaning (so that "en-b-ccc-bbb-aaa" has a different
   value from "en-b-aaa-bbb-ccc").  However, extension specifications
   SHOULD be designed so that they are tolerant of the typical
   processes described in Section 3.6.

4.5 Considerations for Private Use Subtags

   Private-use subtags require private agreement between the parties
   that intend to use or exchange language tags that use them, and
   great caution SHOULD be used in employing them in content or
   protocols intended for general use.  Private-use subtags are simply
   useless for information exchange without prior arrangement.
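
   As an informal illustration only (it is not part of this
   specification), the optional case regularization and the
   case-insensitive comparison described in Section 4.4 can be
   sketched in Python as follows.  The function names are
   hypothetical.  The sketch applies the three casing rules exactly as
   stated there and uses explicit ASCII-only mapping tables, which is
   one way to obtain a locale-neutral casing operation; it assumes the
   input is already a well-formed tag.

```python
import string

# ASCII-only translation tables: independent of any locale setting.
_TO_UPPER = str.maketrans(string.ascii_lowercase, string.ascii_uppercase)
_TO_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def ascii_upper(text):
    """Uppercase only the letters a-z."""
    return text.translate(_TO_UPPER)


def ascii_lower(text):
    """Lowercase only the letters A-Z."""
    return text.translate(_TO_LOWER)


def regularize_case(tag):
    """Apply the casing rules of Section 4.4 to a well-formed tag."""
    result = []
    for index, subtag in enumerate(tag.split("-")):
        if index > 0 and len(subtag) == 2:
            # Non-initial two-letter subtag: uppercase.
            result.append(ascii_upper(subtag))
        elif index > 0 and len(subtag) == 4:
            # Non-initial four-letter subtag: titlecase.
            result.append(ascii_upper(subtag[0]) + ascii_lower(subtag[1:]))
        else:
            # Everything else: lowercase.
            result.append(ascii_lower(subtag))
    return "-".join(result)


def tags_equal(first, second):
    """Compare two tags in a case-insensitive manner."""
    return ascii_lower(first) == ascii_lower(second)
```

   For instance, regularize_case("ZH-hANT-tw") gives "zh-Hant-TW", and
   tags_equal("EN-us", "en-US") is true.  Because the mapping tables
   cover only A-Z and a-z, the region subtag 'in' always becomes 'IN'
   regardless of the locale in which the program runs.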

   The value and semantic meaning of private-use tags and of the
   subtags used within such a language tag are not defined by this
   document.

   The use of subtags defined in the IANA registry as having a
   specific private use meaning conveys more information than a purely
   private use tag prefixed by the singleton subtag 'x'.  For
   applications this additional information MAY be useful.

   For example, the region subtags 'AA', 'ZZ', and those in the ranges
   'QM'-'QZ' and 'XA'-'XZ' (derived from ISO 3166 private use codes)
   MAY be used to form a language tag.  A tag such as "zh-Hans-XQ"
   conveys a great deal of public, interchangeable information about
   the language material (that it is Chinese in the simplified Chinese
   script and is suitable for some geographic region 'XQ').  While the
   precise geographic region is not known outside of private
   agreement, the tag conveys far more information than an opaque tag
   such as "x-someLang", which contains no information about the
   language subtag or script subtag outside of the private agreement.

   However, in some cases content tagged with private use subtags MAY
   interact with other systems in a different and possibly unsuitable
   manner compared to tags that use opaque, privately defined subtags,
   so the choice of the best approach sometimes depends on the
   particular domain in question.

5. IANA Considerations

   This section deals with the processes and requirements necessary
   for IANA to undertake to maintain the subtag and extension
   registries as defined by this document and in accordance with the
   requirements of [RFC2434].

   The impact on the IANA maintainers of the two registries defined by
   this document will be a small increase in the frequency of new
   entries or updates.

   Upon adoption of this document, the process described in Section
   3.7 will be used to generate the initial Language Subtag Registry.
   The initial set of records represents no impact on IANA, since the
   work to create it will be performed externally (as defined in that
   section).  The new registry will be listed under "Language Tags" at
   .  The existing directory of registration forms and RFC 3066
   registrations will be relabeled as "Language Tags (Obsolete)" and
   maintained (but not added to or modified).

   Future work on the Language Subtag Registry will be limited to
   inserting or replacing whole records preformatted for IANA by the
   Language Subtag Reviewer as described in Section 3.2 of this
   document.  Each record will be sent to iana@iana.org with a subject
   line indicating whether the enclosed record is an insertion (of a
   new record) or a replacement of an existing record which has a Type
   and Subtag (or Tag) field that exactly matches the record sent.
   Records cannot be deleted from the registry.

   The Language Tag Extensions registry will also be generated and
   sent to IANA as described in Section 3.6.  This registry can
   contain at most 35 records and thus changes to this registry are
   expected to be very infrequent.

   Future work by IANA on the Language Tag Extensions Registry is
   limited to two cases.  First, the IESG MAY request that new records
   be inserted into this registry from time to time.  These requests
   will include the record to insert in the exact format described in
   Section 3.6.  In addition, there MAY be occasional requests from
   the maintaining authority for a specific extension to update the
   contact information or URLs in the record.  These requests MUST
   include the complete, updated record.  IANA is not responsible for
   validating the information provided, only that it is properly
   formatted.  It should reasonably be seen to come from the
   maintaining authority named in the record present in the registry.

6. Security Considerations

   Language tags used in content negotiation, like any other
   information exchanged on the Internet, might be a source of concern
   because they might be used to infer the nationality of the sender,
   and thus identify potential targets for surveillance.

   This is a special case of the general problem that anything sent is
   visible to the receiving party and possibly to third parties as
   well.  It is useful to be aware that such concerns can exist in
   some cases.

   The evaluation of the exact magnitude of the threat, and any
   possible countermeasures, is left to each application protocol (see
   BCP 72 [RFC3552] for best current practice guidance on security
   threats and defenses).

   The language tag associated with a particular information item is
   of no consequence whatsoever in determining whether that content
   might contain possible homographs.  The fact that a text is tagged
   as being in one language or using a particular script subtag
   provides no assurance whatsoever that it does not contain
   characters from scripts other than the one(s) associated with or
   specified by that language tag.

   Since there is no limit to the number of variant, private use, and
   extension subtags, and consequently no limit on the possible length
   of a tag, implementations need to guard against buffer overflow
   attacks.  See Section 4.3 for details on language tag truncation,
   which can occur as a consequence of defenses against buffer
   overflow.

   Although the specification of valid subtags for an extension (see
   Section 3.6) MUST be available over the Internet, implementations
   SHOULD NOT mechanically depend on it being always accessible, to
   prevent denial-of-service attacks.

7. Character Set Considerations

   The syntax in this document requires that language tags use only
   the characters A-Z, a-z, 0-9, and HYPHEN-MINUS, which are present
   in most character sets, so the composition of language tags should
   not have any character set issues.

   Rendering of characters based on the content of a language tag is
   not addressed in this memo.  Historically, some languages have
   relied on the use of specific character sets or other information
   in order to infer how a specific character should be rendered
   (notably this applies to language and culture specific variations
   of Han ideographs as used in Japanese, Chinese, and Korean).  When
   language tags are applied to spans of text, rendering engines can
   use that information in deciding which font to use in the absence
   of other information, particularly where languages with distinct
   writing traditions use the same characters.

8. Changes from RFC 3066

   The main goals for this revision of language tags were the
   following:

   *Compatibility.*  All valid RFC 3066 language tags (including those
   in the IANA registry) remain valid in this specification.  Thus
   there is complete backward compatibility of this specification with
   existing content.  In addition, this document defines language tags
   in such a way as to ensure future compatibility, and processors
   based solely on the RFC 3066 ABNF (such as those described in
   [XMLSchema]) will be able to process tags described by this
   document.

   *Stability.*  Because of the changes in underlying ISO standards, a
   valid RFC 3066 language tag may become invalid (or have its meaning
   change) at a later date.  With so much of the world's computing
   infrastructure dependent on language tags, this is simply
   unacceptable: it invalidates content that may have an extensive
   shelf-life.
   In this specification, once a language tag is valid, it remains
   valid forever.  Previously, there was no way to determine when two
   tags were equivalent.  This specification provides a stable
   mechanism for doing so, through the use of canonical forms.  These
   are also stable, so that implementations can depend on the use of
   canonical forms to assess equivalency.

   *Validity.*  The structure of language tags defined by this
   document makes it possible to determine if a particular tag is
   well-formed without regard for the actual content or "meaning" of
   the tag as a whole.  This is important because the registry and
   underlying standards change over time.  In addition, it must be
   possible to determine if a tag is valid (or not) for a given point
   in time in order to provide reproducible, testable results.  This
   process must not be error-prone; otherwise even intelligent people
   will generate implementations that give different results.  This
   specification provides for that by having a single data file, with
   specific versioning information, so that the validity of language
   tags at any point in time can be precisely determined (instead of
   interpolating values from many separate sources).

   *Extensibility.*  It is important to be able to differentiate
   between written forms of language -- for many implementations this
   is more important than distinguishing between spoken variants of a
   language.  Languages are written in a wide variety of different
   scripts, so this document provides for the generative use of ISO
   15924 script codes.  Like the generative use of ISO language and
   country codes in RFC 3066, this allows combinations to be produced
   without resorting to the registration process.  The addition of UN
   codes provides for the generation of language tags with regional
   scope, which is also required for information technology.

   The recast of the registry from containing whole language tags to
   subtags is a key part of this.  An important feature of RFC 3066
   was that it allowed generative use of subtags.  This allows people
   to meaningfully use generated tags, without the delays in
   registering whole tags, and the burden on the registry of having to
   supply all of the combinations that people may find useful.

   Because of the widespread use of language tags, it is potentially
   disruptive to have periodic revisions of the core specification,
   despite demonstrated need.  The extension mechanism provides for a
   way for independent RFCs to define extensions to language tags.
   These extensions have a very constrained, well-defined structure to
   prevent extensions from interfering with implementations of
   language tags defined in this document.  The document also
   anticipates features of ISO 639-3 with the addition of the extended
   language subtags, as well as the possibility of other ISO 639 parts
   becoming useful for the formation of language tags in the future.
   The use and definition of private use tags has also been modified,
   to allow people to move as much information as possible out of
   private use tags, and into the regular structure.  The goal is to
   dramatically reduce the need to produce a revision of this document
   in the future.

   The specific changes in this document to meet these goals are:

   o  Defines the ABNF and rules for subtags so that the category of
      all subtags can be determined without reference to the registry.

   o  Adds the concept of well-formed vs. validating processors,
      defining the rules by which an implementation can claim to be
      one or the other.

   o  Replaces the IANA language tag registry with a language subtag
      registry that provides a complete list of valid subtags in the
      IANA registry.  This allows for robust implementation and ease
      of maintenance.
      The language subtag registry becomes the canonical source for
      forming language tags.

   o  Provides a process that guarantees stability of language tags,
      by handling reuse of values by ISO 639, ISO 15924, and ISO 3166
      in the event that they register a previously used value for a
      new purpose.

   o  Allows ISO 15924 script code subtags and allows them to be used
      generatively.  Defines a method for indicating in the registry
      when script subtags are necessary for a given language tag.

   o  Adds the concept of a variant subtag and allows variants to be
      used generatively.

   o  Adds the ability to use a class of UN M.49 tags for
      supra-national regions and to resolve conflicts in the
      assignment of ISO 3166 codes.

   o  Defines the private-use tags in ISO 639, ISO 15924, and ISO 3166
      as the mechanism for creating private-use language, script, and
      region subtags respectively.

   o  Adds a well-defined extension mechanism.

   o  Defines an extended language subtag, possibly for use with
      certain anticipated features of ISO 639-3.

   Ed Note: The following items are provided for the convenience of
   reviewers and will be removed from the final document.

   Changes between draft-ietf-ltru-registry-05 and this version are:

   o  Changes to the initial population rules to pre-register four
      subtags.  This included changing all the variant examples to use
      just those four subtags (nedis, rozaj, 1996, and 1901) in
      appropriate ways.  It also includes substantial wordsmithing of
      the rules on handling RFC 3066 grandfathered/redundant
      registrations (A.Phillips)

   o  Rewrote the introduction to use "tag" instead of many (long,
      convoluted) synonyms and to generally simplify the text.
      (thread of #944) (M.Duerst, A.Phillips)

   o  Added an introduction to Section 2 (moved from Section 4.2).
      (M.Duerst)

   o  Reorganized the resulting Section 4.2.

   o  Divided Section 4.3 by adding two subsections, moving paragraphs
      to fit into the proper sub-section.  Made the actual
      requirements into a list so that they would be very visible.
      (I.McDonald)

   o  Added the processing instruction symrefs='yes' (F.Ellermann)

   o  Moved Length Considerations from Section 2.1 to Section 4.3.
      Some text was moved or reorganized as a result and a small
      change was made in Section 4.1 (Choice) to ensure that no
      information was lost.  (A.Phillips)

   o  Added a small description of each subtag type to the sub-section
      on each subtag in Section 2.1.  (F.Charles)

   o  Modified the restriction on using extended language subtags in
      Section 2.2.2 so that it is clearer.  (J.Cowan)

9. References

9.1 Normative References

   [ISO639-1]
              International Organization for Standardization, "ISO
              639-1:2002, Codes for the representation of names of
              languages -- Part 1: Alpha-2 code", ISO Standard 639,
              2002.

   [ISO639-2]
              International Organization for Standardization, "ISO
              639-2:1998 - Codes for the representation of names of
              languages -- Part 2: Alpha-3 code - edition 1",
              August 1988.

   [ISO15924]
              ISO TC46/WG3, "ISO 15924:2003 (E/F) - Codes for the
              representation of names of scripts", January 2004.

   [ISO3166]  International Organization for Standardization, "Codes
              for the representation of names of countries, 3rd
              edition", ISO Standard 3166, August 1988.

   [UN_M.49]  Statistical Division, United Nations, "Standard Country
              or Area Codes for Statistical Use", UN Standard Country
              or Area Codes for Statistical Use, Revision 4 (United
              Nations publication, Sales No. 98.XVII.9), June 1999.

   [ISO10646]
              International Organization for Standardization, "ISO/IEC
              10646-1:2000.
              Information technology -- Universal Multiple-Octet Coded
              Character Set (UCS) -- Part 1: Architecture and Basic
              Multilingual Plane and ISO/IEC 10646-2:2001.
              Information technology -- Universal Multiple-Octet Coded
              Character Set (UCS) -- Part 2: Supplementary Planes, as,
              from time to time, amended, replaced by a new edition or
              expanded by the addition of new parts", 2000.

   [RFC2234bis]
              Crocker, D. and P. Overell, "Augmented BNF for Syntax
              Specifications: ABNF", draft-crocker-abnf-rfc2234bis-00
              (work in progress), March 2005.

   [RFC2026]  Bradner, S., "The Internet Standards Process -- Revision
              3", BCP 9, RFC 2026, October 1996.

   [RFC2028]  Hovey, R. and S. Bradner, "The Organizations Involved in
              the IETF Standards Process", BCP 11, RFC 2028,
              October 1996.

   [RFC2047]  Moore, K., "MIME (Multipurpose Internet Mail Extensions)
              Part Three: Message Header Extensions for Non-ASCII
              Text", RFC 2047, November 1996.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2434]  Narten, T. and H. Alvestrand, "Guidelines for Writing an
              IANA Considerations Section in RFCs", BCP 26, RFC 2434,
              October 1998.

   [RFC2781]  Hoffman, P. and F. Yergeau, "UTF-16, an encoding of ISO
              10646", RFC 2781, February 2000.

   [RFC2860]  Carpenter, B., Baker, F., and M. Roberts, "Memorandum of
              Understanding Concerning the Technical Work of the
              Internet Assigned Numbers Authority", RFC 2860,
              June 2000.

   [RFC3339]  Klyne, G. and C. Newman, "Date and Time on the Internet:
              Timestamps", RFC 3339, July 2002.

   [RFC3552]  Rescorla, E. and B. Korver, "Guidelines for Writing RFC
              Text on Security Considerations", BCP 72, RFC 3552,
              July 2003.

9.2 Informative References

   [iso639.principles]
              ISO 639 Joint Advisory Committee, "ISO 639 Joint
              Advisory Committee: Working principles for ISO 639
              maintenance", March 2000.

   [record-jar]
              Raymond, E., "The Art of Unix Programming", 2003.

   [XML10]    Bray (et al), T., "Extensible Markup Language (XML)
              1.0", February 2004.

   [XMLSchema]
              Biron, P., Ed. and A. Malhotra, Ed., "XML Schema Part 2:
              Datatypes Second Edition", October 2004,
              <http://www.w3.org/TR/xmlschema-2/>.

   [Unicode]  Unicode Consortium, "The Unicode Consortium.  The
              Unicode Standard, Version 4.1.0, defined by: The Unicode
              Standard, Version 4.0 (Boston, MA, Addison-Wesley, 2003.
              ISBN 0-321-18578-1), as amended by Unicode 4.0.1
              (http://www.unicode.org/versions/Unicode4.0.1) and by
              Unicode 4.1.0
              (http://www.unicode.org/versions/Unicode4.1.0).",
              March 2005.

   [RFC1766]  Alvestrand, H., "Tags for the Identification of
              Languages", RFC 1766, March 1995.

   [RFC2231]  Freed, N. and K. Moore, "MIME Parameter Value and
              Encoded Word Extensions: Character Sets, Languages, and
              Continuations", RFC 2231, November 1997.

   [RFC3066]  Alvestrand, H., "Tags for the Identification of
              Languages", BCP 47, RFC 3066, January 2001.

Authors' Addresses

   Addison Phillips (editor)
   Quest Software

   Email: addison.phillips@quest.com

   Mark Davis (editor)
   IBM

   Email: mark.davis@us.ibm.com

Appendix A. Acknowledgements

   Any list of contributors is bound to be incomplete; please regard
   the following as only a selection from the group of people who have
   contributed to make this document what it is today.

   The contributors to RFC 3066 and RFC 1766, the precursors of this
   document, made enormous contributions directly or indirectly to
   this document and are generally responsible for the success of
   language tags.

   The following people (in alphabetical order) contributed to this
   document or to RFCs 1766 and 3066:

   Glenn Adams, Harald Tveit Alvestrand, Tim Berners-Lee, Marc
   Blanchet, Nathaniel Borenstein, Eric Brunner, Sean M. Burke, M.T.
   Carrasco Benitez, Jeremy Carroll, John Clews, Jim Conklin, Peter
   Constable, John Cowan, Mark Crispin, Dave Crocker, Martin Duerst,
   Frank Ellermann, Michael Everson, Doug Ewell, Ned Freed, Tim
   Goodwin, Dirk-Willem van Gulik, Marion Gunn, Joel Halpern, Elliotte
   Rusty Harold, Paul Hoffman, Scott Hollenbeck, Richard Ishida, Olle
   Jarnefors, Kent Karlsson, John Klensin, Alain LaBonte, Eric Mader,
   Ira McDonald, Keith Moore, Chris Newman, Masataka Ohta, Randy
   Presuhn, George Rhoten, Markus Scherer, Keld Jorn Simonsen, Thierry
   Sourbier, Otto Stolz, Tex Texin, Andrea Vine, Rhys Weatherley,
   Misha Wolf, Francois Yergeau and many, many others.

   Very special thanks must go to Harald Tveit Alvestrand, who
   originated RFCs 1766 and 3066, and without whom this document would
   not have been possible.  Special thanks must go to Michael Everson,
   who has served as language tag reviewer for almost the complete
   period since the publication of RFC 1766.  Special thanks to Doug
   Ewell, for his production of the first complete subtag registry,
   and his work in producing a test parser for verifying language
   tags.

Appendix B. Examples of Language Tags (Informative)

   Simple language subtag:

      de (German)

      fr (French)

      ja (Japanese)

      i-enochian (example of a grandfathered tag)

   Language subtag plus Script subtag:

      zh-Hant (Chinese written using the Traditional Chinese script)

      zh-Hans (Chinese written using the Simplified Chinese script)

      sr-Cyrl (Serbian written using the Cyrillic script)

      sr-Latn (Serbian written using the Latin script)

   Language-Script-Region:

      zh-Hans-CN (Chinese written using the Simplified script as used
      in mainland China)

      sr-Latn-CS (Serbian written using the Latin script as used in
      Serbia and Montenegro)

   Language-Variant:

      sl-rozaj (Resian dialect of Slovenian)

      sl-nedis (Nadiza dialect of Slovenian)

   Language-Region-Variant:

      de-CH-1901 (German as used in Switzerland using the 1901 variant
      [orthography])

      sl-IT-nedis (Slovenian as used in Italy, Nadiza dialect)

   Language-Script-Region-Variant:

      sl-Latn-IT-nedis (Nadiza dialect of Slovenian written using the
      Latin script as used in Italy.
      Note that this tag is NOT RECOMMENDED because subtag 'sl' has a
      Suppress-Script value of 'Latn')

   Language-Region:

      de-DE (German for Germany)

      en-US (English as used in the United States)

      es-419 (Spanish for Latin America and Caribbean region using the
      UN region code)

   Private-use subtags:

      de-CH-x-phonebk

      az-Arab-x-AZE-derbend

   Extended language subtags (examples ONLY: extended languages MUST
   be defined by revision or update to this document):

      zh-min

      zh-min-nan-Hant-CN

   Private-use registry values:

      x-whatever (private use using the singleton 'x')

      qaa-Qaaa-QM-x-southern (all private tags)

      de-Qaaa (German, with a private script)

      sr-Latn-QM (Serbian, Latin-script, private region)

      sr-Qaaa-CS (Serbian, private script, for Serbia and Montenegro)

   Tags that use extensions (examples ONLY: extensions MUST be defined
   by revision or update to this document or by RFC):

      en-US-u-islamCal

      zh-CN-a-myExt-x-private

      en-a-myExt-b-another

   Some Invalid Tags:

      de-419-DE (two region tags)

      a-DE (use of a single character subtag in primary position; note
      that there are a few grandfathered tags that start with "i-"
      that are valid)

      ar-a-aaa-b-bbb-a-ccc (two extensions with same single letter
      prefix)

Appendix C. Example Registry

   Example Registry

   File-Date: 2005-04-18
   %%
   Type: language
   Subtag: aa
   Description: Afar
   Added: 2004-07-06
   %%
   Type: language
   Subtag: ab
   Description: Abkhazian
   Added: 2004-07-06
   %%
   Type: language
   Subtag: ae
   Description: Avestan
   Added: 2004-07-06
   %%
   Type: language
   Subtag: ar
   Description: Arabic
   Added: 2004-07-06
   Suppress-Script: Arab
   Comment: Arabic text is usually written in Arabic script
   %%
   Type: language
   Subtag: qaa..qtz
   Description: PRIVATE USE
   Added: 2004-08-01
   Comment: Use private use codes in preference
     to the x- singleton for primary language
   Comment: This is an example of two comments.
   %%
   Type: script
   Subtag: Arab
   Description: Arabic
   Added: 2004-07-06
   %%
   Type: script
   Subtag: Armn
   Description: Armenian
   Added: 2004-07-06
   %%
   Type: script
   Subtag: Bali
   Description: Balinese
   Added: 2004-07-06
   %%
   Type: script
   Subtag: Batk
   Description: Batak
   Added: 2004-07-06
   %%
   Type: region
   Subtag: AA
   Description: PRIVATE USE
   Added: 2004-08-01
   %%
   Type: region
   Subtag: AD
   Description: Andorra
   Added: 2004-07-06
   %%
   Type: region
   Subtag: AE
   Description: United Arab Emirates
   Added: 2004-07-06
   %%
   Type: region
   Subtag: AX
   Description: &#xC5;land Islands
   Added: 2004-07-06
   Comment: The description shows a Unicode escape
     for the letter A-ring.
2706 %% 2707 Type: region 2708 Subtag: 001 2709 Description: World 2710 Added: 2004-07-06 2711 %% 2712 Type: region 2713 Subtag: 002 2714 Description: Africa 2715 Added: 2004-07-06 2716 %% 2717 Type: region 2718 Subtag: 003 2719 Description: North America 2720 Added: 2004-07-06 2721 %% 2722 Type: variant 2723 Subtag: 1901 2724 Description: Traditional German 2725 orthography 2726 Added: 2004-09-09 2727 Prefix: de 2728 Comment: 2729 %% 2730 Type: variant 2731 Subtag: nedis 2732 Description: Nadiza dialect 2733 Description: Natisone dialect 2734 Added: 2003-10-09 2735 Prefix: sl 2736 %% 2737 Type: grandfathered 2738 Tag: art-lojban 2739 Description: Lojban 2740 Added: 2001-11-11 2741 Canonical: jbo 2742 Deprecated: 2003-09-02 2743 %% 2744 Type: grandfathered 2745 Tag: en-GB-oed 2746 Description: English, Oxford English Dictionary spelling 2747 Added: 2003-07-09 2748 %% 2749 Type: grandfathered 2750 Tag: i-ami 2751 Description: 'Amis 2752 Added: 1999-05-25 2753 %% 2754 Type: grandfathered 2755 Tag: i-bnn 2756 Description: Bunun 2757 Added: 1999-05-25 2758 %% 2759 Type: redundant 2760 Tag: az-Arab 2761 Description: Azerbaijani in Arabic script 2762 Added: 2003-05-30 2763 %% 2764 Type: redundant 2765 Tag: az-Cyrl 2766 Description: Azerbaijani in Cyrillic script 2767 Added: 2003-05-30 2768 %% 2770 Figure 9: Example of the Registry Format 2772 Intellectual Property Statement 2774 The IETF takes no position regarding the validity or scope of any 2775 Intellectual Property Rights or other rights that might be claimed to 2776 pertain to the implementation or use of the technology described in 2777 this document or the extent to which any license under such rights 2778 might or might not be available; nor does it represent that it has 2779 made any independent effort to identify any such rights. Information 2780 on the procedures with respect to rights in RFC documents can be 2781 found in BCP 78 and BCP 79. 
2783    Copies of IPR disclosures made to the IETF Secretariat and any
2784    assurances of licenses to be made available, or the result of an
2785    attempt made to obtain a general license or permission for the use of
2786    such proprietary rights by implementers or users of this
2787    specification can be obtained from the IETF on-line IPR repository at
2788    http://www.ietf.org/ipr.

2790    The IETF invites any interested party to bring to its attention any
2791    copyrights, patents or patent applications, or other proprietary
2792    rights that may cover technology that may be required to implement
2793    this standard.  Please address the information to the IETF at
2794    ietf-ipr@ietf.org.

2796 Disclaimer of Validity

2798    This document and the information contained herein are provided on an
2799    "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
2800    OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
2801    ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
2802    INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
2803    INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
2804    WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

2806 Copyright Statement

2808    Copyright (C) The Internet Society (2005).  This document is subject
2809    to the rights, licenses and restrictions contained in BCP 78, and
2810    except as set forth therein, the authors retain all their rights.

2812 Acknowledgment

2814    Funding for the RFC Editor function is currently provided by the
2815    Internet Society.