idnits 2.17.1

draft-ietf-ltru-registry-13.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------
  ** It looks like you're using RFC 3978 boilerplate. You should update this to the boilerplate described in the IETF Trust License Policy document (see https://trustee.ietf.org/license-info), which is required now.
  -- Found old boilerplate from RFC 3978, Section 5.1 on line 16.
  -- Found old boilerplate from RFC 3978, Section 5.5 on line 2647.
  -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 2624.
  -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 2631.
  -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 2637.
  ** This document has an original RFC 3978 Section 5.4 Copyright Line, instead of the newer IETF Trust Copyright according to RFC 4748.
  ** This document has an original RFC 3978 Section 5.5 Disclaimer, instead of the newer disclaimer which includes the IETF Trust according to RFC 4748.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------
  == No 'Intended status' indicated for this document; assuming Proposed Standard

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------
  -- The draft header indicates that this document obsoletes RFC3066, but the abstract doesn't seem to mention this, which it should.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------
  == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not match the current year
  == The document seems to lack the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. (The document does seem to have the reference to RFC 2119 which the ID-Checklist requires).
  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.)
  -- The document date (September 22, 2005) is 6789 days in the past. Is this intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------
  (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs)
  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO10646'
  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO15924'
  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO3166-1'
  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO639-1'
  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO639-2'
  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO646'
  ** Obsolete normative reference: RFC 2028 (Obsoleted by RFC 9281)
  ** Obsolete normative reference: RFC 2434 (Obsoleted by RFC 5226)
  ** Downref: Normative reference to an Informational RFC: RFC 2860
  -- Obsolete informational reference (is this intentional?): RFC 1766 (Obsoleted by RFC 3066, RFC 3282)
  -- Obsolete informational reference (is this intentional?): RFC 3066 (Obsoleted by RFC 4646, RFC 4647)

  Summary: 6 errors (**), 0 flaws (~~), 3 warnings (==), 16 comments (--).

  Run idnits with the --verbose option for more detailed information about the items above.

--------------------------------------------------------------------------------

2 Network Working Group A. Phillips, Ed.
3 Internet-Draft Quest Software 4 Obsoletes: 3066 (if approved) M. Davis, Ed. 5 Expires: March 26, 2006 IBM 6 September 22, 2005 8 Tags for Identifying Languages 9 draft-ietf-ltru-registry-13 11 Status of this Memo 13 By submitting this Internet-Draft, each author represents that any 14 applicable patent or other IPR claims of which he or she is aware 15 have been or will be disclosed, and any of which he or she becomes 16 aware will be disclosed, in accordance with Section 6 of BCP 79. 18 Internet-Drafts are working documents of the Internet Engineering 19 Task Force (IETF), its areas, and its working groups. Note that 20 other groups may also distribute working documents as Internet- 21 Drafts. 23 Internet-Drafts are draft documents valid for a maximum of six months 24 and may be updated, replaced, or obsoleted by other documents at any 25 time. It is inappropriate to use Internet-Drafts as reference 26 material or to cite them other than as "work in progress." 28 The list of current Internet-Drafts can be accessed at 29 http://www.ietf.org/ietf/1id-abstracts.txt. 31 The list of Internet-Draft Shadow Directories can be accessed at 32 http://www.ietf.org/shadow.html. 34 This Internet-Draft will expire on March 26, 2006. 36 Copyright Notice 38 Copyright (C) The Internet Society (2005). 40 Abstract 42 This document describes the structure, content, construction, and 43 semantics of language tags for use in cases where it is desirable to 44 indicate the language used in an information object. It also 45 describes how to register values for use in language tags and the 46 creation of user defined extensions for private interchange.

48 Table of Contents

50 1. Introduction
51 2. The Language Tag
52    2.1 Syntax
53    2.2 Language Subtag Sources and Interpretation
54       2.2.1 Primary Language Subtag
55       2.2.2 Extended Language Subtags
56       2.2.3 Script Subtag
57       2.2.4 Region Subtag
58       2.2.5 Variant Subtags
59       2.2.6 Extension Subtags
60       2.2.7 Private Use Subtags
61       2.2.8 Pre-Existing RFC 3066 Registrations
62       2.2.9 Classes of Conformance
63 3. Registry Format and Maintenance
64    3.1 Format of the IANA Language Subtag Registry
65    3.2 Language Subtag Reviewer
66    3.3 Maintenance of the Registry
67    3.4 Stability of IANA Registry Entries
68    3.5 Registration Procedure for Subtags
69    3.6 Possibilities for Registration
70    3.7 Extensions and Extensions Registry
71    3.8 Initialization of the Registries
72 4. Formation and Processing of Language Tags
73    4.1 Choice of Language Tag
74    4.2 Meaning of the Language Tag
75    4.3 Length Considerations
76       4.3.1 Working with Limited Buffer Sizes
77       4.3.2 Truncation of Language Tags
78    4.4 Canonicalization of Language Tags
79    4.5 Considerations for Private Use Subtags
80 5. IANA Considerations
81    5.1 Language Subtag Registry
82    5.2 Extensions Registry
83 6. Security Considerations
84 7. Character Set Considerations
85 8. Changes from RFC 3066
86 9. References
87    9.1 Normative References
88    9.2 Informative References
89 Authors' Addresses
90 A. Acknowledgements
91 B. Examples of Language Tags (Informative)
92 Intellectual Property and Copyright Statements

94 1. Introduction 96 Human beings on our planet have, past and present, used a number of 97 languages. There are many reasons why one would want to identify the 98 language used when presenting or requesting information. 100 A user's language preferences often need to be identified so that 101 appropriate processing can be applied. For example, the user's 102 language preferences in a Web browser can be used to select Web pages 103 appropriately. Language preferences can also be used to select among 104 tools (such as dictionaries) to assist in the processing or 105 understanding of content in different languages. 107 In addition, knowledge about the particular language used by some 108 piece of information content might be useful or even required by some 109 types of processing; for example spell-checking, computer-synthesized 110 speech, Braille transcription, or high-quality print renderings. 112 One means of indicating the language used is by labeling the 113 information content with an identifier or "tag". These tags can be 114 used to specify user preferences when selecting information content, 115 or for labeling additional attributes of content and associated 116 resources. 118 Tags can also be used to indicate additional language attributes of 119 content.
For example, indicating specific information about the 120 dialect, writing system, or orthography used in a document or 121 resource may enable the user to obtain information in a form that 122 they can understand, or it can be important in processing or rendering the 123 given content into an appropriate form or style. 125 This document specifies a particular identifier mechanism (the 126 language tag) and a registration function for values to be used to 127 form tags. It also defines a mechanism for private use values and 128 future extension. 130 This document replaces [RFC3066], which replaced [RFC1766]. For a 131 list of changes in this document, see Section 8. 133 The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 134 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 135 document are to be interpreted as described in [RFC2119]. 137 2. The Language Tag 139 Language tags are used to help identify languages, whether spoken, 140 written, signed, or otherwise signaled, for the purpose of 141 communication. This includes constructed and artificial languages, 142 but excludes languages not intended primarily for human 143 communication, such as programming languages. 145 2.1 Syntax 147 The language tag is composed of one or more parts or "subtags". Each 148 subtag consists of a sequence of alpha-numeric characters. Subtags 149 are distinguished and separated from one another by a hyphen ("-", 150 ABNF [RFC2234bis] %x2D). A language tag consists of a "primary 151 language" subtag and a (possibly empty) series of subsequent subtags, 152 each of which refines or narrows the range of language identified by 153 the overall tag. 155 Usually, each type of subtag is distinguished by length, position in 156 the tag, and content: subtags can be recognized solely by these 157 features. The only exception to this is a fixed list of 158 grandfathered tags registered under RFC 3066 [RFC3066].
This makes 159 it possible to construct a parser that can extract and assign some 160 semantic information to the subtags, even if the specific subtag 161 values are not recognized. Thus a parser need not have an up-to-date 162 copy (or any copy at all) of the subtag registry to perform most 163 searching and matching operations. 165 The syntax of the language tag in ABNF [RFC2234bis] is:

167   Language-Tag  = langtag
168                 / privateuse             ; private use tag
169                 / grandfathered          ; grandfathered registrations

171   langtag       = (language
172                    ["-" script]
173                    ["-" region]
174                    *("-" variant)
175                    *("-" extension)
176                    ["-" privateuse])

178   language      = (2*3ALPHA [ extlang ]) ; shortest ISO 639 code
179                 / 4ALPHA                 ; reserved for future use
180                 / 5*8ALPHA               ; registered language subtag

182   extlang       = *3("-" 3ALPHA)         ; reserved for future use

184   script        = 4ALPHA                 ; ISO 15924 code

186   region        = 2ALPHA                 ; ISO 3166 code
187                 / 3DIGIT                 ; UN M.49 code

189   variant       = 5*8alphanum            ; registered variants
190                 / (DIGIT 3alphanum)

192   extension     = singleton 1*("-" (2*8alphanum))

194   singleton     = %x41-57 / %x59-5A / %x61-77 / %x79-7A / DIGIT
195                 ; "a"-"w" / "y"-"z" / "A"-"W" / "Y"-"Z" / "0"-"9"
196                 ; Single letters: x/X is reserved for private use

198   privateuse    = ("x"/"X") 1*("-" (1*8alphanum))

200   grandfathered = 1*3ALPHA 1*2("-" (2*8alphanum))
201                   ; grandfathered registration
202                   ; Note: i is the only singleton
203                   ; that starts a grandfathered tag

205   alphanum      = (ALPHA / DIGIT)        ; letters and numbers

207 Figure 1: Language Tag ABNF

209 Note: There is a subtlety in the ABNF for 'variant': variants 210 starting with a digit MAY be four characters long, while those 211 starting with a letter MUST be at least five characters long. 213 All subtags have a maximum length of eight characters and whitespace 214 is not permitted in a language tag. For examples of language tags, 215 see Appendix B.
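As a rough illustration of how the ABNF in Figure 1 lets a parser classify subtags without consulting the registry, the following Python sketch transcribes the productions into regular expressions. The names parse_tag, LANGTAG, PRIVATEUSE, GRANDFATHERED, and EXTENSION are illustrative only and are not defined by this document. The sketch checks syntax and nothing else. In particular, it accepts anything shaped like the 'grandfathered' production, whereas a real implementation would presumably compare such tags against the fixed list of RFC 3066 registrations.

```python
import re
from typing import Optional

# 'langtag' production from Figure 1, transcribed piece by piece.
LANGTAG = re.compile(
    r"""
    (?:
        (?P<language>[A-Za-z]{2,3})            # shortest ISO 639 code
        (?P<extlang>(?:-[A-Za-z]{3}){0,3})     # reserved for future use
      | (?P<long_language>[A-Za-z]{4,8})       # 4ALPHA reserved, 5*8ALPHA registered
    )
    (?:-(?P<script>[A-Za-z]{4}))?              # ISO 15924 code
    (?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?     # ISO 3166 code or UN M.49 code
    (?P<variants>(?:-(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}))*)
    (?P<extensions>(?:-[0-9A-WY-Za-wy-z](?:-[A-Za-z0-9]{2,8})+)*)
    (?:-(?P<privateuse>[xX](?:-[A-Za-z0-9]{1,8})+))?
    """,
    re.VERBOSE,
)
PRIVATEUSE = re.compile(r"[xX](?:-[A-Za-z0-9]{1,8})+")
# Shape check only; this does NOT consult the list of grandfathered tags.
GRANDFATHERED = re.compile(r"[A-Za-z]{1,3}(?:-[A-Za-z0-9]{2,8}){1,2}")
# One extension: a singleton (any letter or digit except x/X) plus its subtags.
EXTENSION = re.compile(r"-([0-9A-WY-Za-wy-z](?:-[A-Za-z0-9]{2,8})+)")


def _split(run: str) -> list:
    """Split a run such as '-boont-1996' into ['boont', '1996']."""
    return [s for s in run.split("-") if s]


def parse_tag(tag: str) -> Optional[dict]:
    """Return the subtags of a well-formed tag, or None if it is malformed."""
    m = LANGTAG.fullmatch(tag)
    if m:
        return {
            "kind": "langtag",
            "language": m["language"] or m["long_language"],
            "extlang": _split(m["extlang"] or ""),
            "script": m["script"],
            "region": m["region"],
            "variants": _split(m["variants"]),
            "extensions": EXTENSION.findall(m["extensions"]),
            "privateuse": m["privateuse"],
        }
    if PRIVATEUSE.fullmatch(tag):
        return {"kind": "privateuse"}
    if GRANDFATHERED.fullmatch(tag):
        return {"kind": "grandfathered"}
    return None
```

For instance, parse_tag("de-CH-1996") reports the region 'CH' and the single variant '1996', while a tag containing whitespace or a nine-character subtag yields None.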
217 Note that although [RFC2234bis] refers to octets, the language tags 218 described in this document are sequences of characters from the US- 219 ASCII [ISO646] repertoire. Language tags MAY be used in documents 220 and applications that use other encodings, so long as these encompass 221 the US-ASCII repertoire. An example of this would be an XML document 222 that uses the UTF-16LE [RFC2781] encoding of [Unicode]. 224 The tags and their subtags, including private use and extensions, are 225 to be treated as case insensitive: there exist conventions for the 226 capitalization of some of the subtags, but these MUST NOT be taken to 227 carry meaning. 229 For example: 231 o [ISO639-1] recommends that language codes be written in lower case 232 ('mn' Mongolian). 234 o [ISO3166-1] recommends that country codes be capitalized ('MN' 235 Mongolia). 237 o [ISO15924] recommends that script codes use lower case with the 238 initial letter capitalized ('Cyrl' Cyrillic). 240 However, in the tags defined by this document, the uppercase US-ASCII 241 letters in the range 'A' through 'Z' are considered equivalent and 242 mapped directly to their US-ASCII lowercase equivalents in the range 243 'a' through 'z'. Thus the tag "mn-Cyrl-MN" is not distinct from "MN- 244 cYRL-mn" or "mN-cYrL-Mn" (or any other combination) and each of these 245 variations conveys the same meaning: Mongolian written in the 246 Cyrillic script as used in Mongolia. 248 Although case distinctions do not carry meaning in language tags, 249 consistent formatting and presentation of the tags will aid users. 250 The format of the tags and subtags in the registry is RECOMMENDED. 251 In this format, all non-initial two-letter subtags are uppercase, all 252 non-initial four-letter subtags are titlecase, and all other subtags 253 are lowercase. 
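A minimal Python sketch of the comparison and formatting rules above, assuming the input is already a syntactically well-formed tag. The function names ascii_lower, same_tag, and format_tag are hypothetical and are not part of this document. format_tag applies the registry convention purely by subtag length and position, so it also recases subtags that appear inside extension and private use sequences.

```python
import string

# Map only US-ASCII 'A'-'Z' to 'a'-'z', as the text describes.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def ascii_lower(tag: str) -> str:
    """Lowercase the US-ASCII letters of a tag; other characters are untouched."""
    return tag.translate(_ASCII_LOWER)


def same_tag(a: str, b: str) -> bool:
    """Case-insensitive comparison of two language tags."""
    return ascii_lower(a) == ascii_lower(b)


def format_tag(tag: str) -> str:
    """Recase a tag using the convention RECOMMENDED for the registry."""
    out = []
    for i, sub in enumerate(ascii_lower(tag).split("-")):
        if i > 0 and len(sub) == 2:
            sub = sub.upper()                # non-initial two-letter subtag
        elif i > 0 and len(sub) == 4:
            sub = sub[0].upper() + sub[1:]   # non-initial four-letter subtag
        out.append(sub)                      # everything else stays lowercase
    return "-".join(out)
```

With these definitions, format_tag("MN-cYRL-mn") returns "mn-Cyrl-MN", and same_tag("mn-Cyrl-MN", "mN-cYrL-Mn") is true.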
255 2.2 Language Subtag Sources and Interpretation 257 The namespace of language tags and their subtags is administered by 258 the Internet Assigned Numbers Authority (IANA) [RFC2860] according to 259 the rules in Section 5 of this document. The Language Subtag 260 Registry maintained by IANA is the source for valid subtags: other 261 standards referenced in this section provide the source material for 262 that registry. 264 Terminology in this section: 266 o Tag or tags refers to a complete language tag, such as 267 "fr-Latn-CA". Examples of tags in this document are enclosed in 268 double-quotes ("en-US"). 270 o Subtag refers to a specific section of a tag, delimited by hyphen, 271 such as the subtag 'Latn' in "fr-Latn-CA". Examples of subtags in 272 this document are enclosed in single quotes ('Latn'). 274 o Code or codes refers to values defined in external standards (and 275 which are used as subtags in this document). For example, 'Latn' 276 is an [ISO15924] script code which was used to define the 'Latn' 277 script subtag for use in a language tag. Examples of codes in 278 this document are enclosed in single quotes ('en', 'Latn'). 280 The definitions in this section apply to the various subtags within 281 the language tags defined by this document, excepting those 282 "grandfathered" tags defined in Section 2.2.8. 284 Language tags are designed so that each subtag type has unique length 285 and content restrictions. These make identification of the subtag's 286 type possible, even if the content of the subtag itself is 287 unrecognized. This allows tags to be parsed and processed without 288 reference to the latest version of the underlying standards or the 289 IANA registry and makes the associated exception handling when 290 parsing tags simpler. 292 Subtags in the IANA registry that do not come from an underlying 293 standard can only appear in specific positions in a tag. 
294 Specifically, they can only occur as primary language subtags or as 295 variant subtags. 297 Note that sequences of private use and extension subtags MUST occur 298 at the end of the sequence of subtags and MUST NOT be interspersed 299 with subtags defined elsewhere in this document. 301 Single letter and digit subtags are reserved for current or future 302 use. These include the following current uses: 304 o The single letter subtag 'x' is reserved to introduce a sequence 305 of private use subtags. The interpretation of any private use 306 subtags is defined solely by private agreement and is not defined 307 by the rules in this section or in any standard or registry 308 defined in this document. 310 o All other single letter subtags are reserved to introduce 311 standardized extension subtag sequences as described in 312 Section 3.7. 314 The single letter subtag 'i' is used by some grandfathered tags, such 315 as "i-enochian", where it always appears in the first position and 316 cannot be confused with an extension. 318 2.2.1 Primary Language Subtag 320 The primary language subtag is the first subtag in a language tag 321 (with the exception of private use and certain grandfathered tags) 322 and cannot be omitted. The following rules apply to the primary 323 language subtag: 325 1. All two character language subtags were defined in the IANA 326 registry according to the assignments found in the standard ISO 327 639 Part 1, "ISO 639-1:2002, Codes for the representation of 328 names of languages -- Part 1: Alpha-2 code" [ISO639-1], or using 329 assignments subsequently made by the ISO 639 Part 1 maintenance 330 agency or governing standardization bodies. 332 2. 
All three character language subtags were defined in the IANA 333 registry according to the assignments found in ISO 639 Part 2, 334 "ISO 639-2:1998 - Codes for the representation of names of 335 languages -- Part 2: Alpha-3 code - edition 1" [ISO639-2], or 336 assignments subsequently made by the ISO 639 Part 2 maintenance 337 agency or governing standardization bodies. 339 3. The subtags in the range 'qaa' through 'qtz' are reserved for 340 private use in language tags. These subtags correspond to codes 341 reserved by ISO 639-2 for private use. These codes MAY be used 342 for non-registered primary-language subtags (instead of using 343 private use subtags following 'x-'). Please refer to Section 4.5 344 for more information on private use subtags. 346 4. All four character language subtags are reserved for possible 347 future standardization. 349 5. All language subtags of 5 to 8 characters in length in the IANA 350 registry were defined via the registration process in Section 3.5 351 and MAY be used to form the primary language subtag. At the time 352 this document was created, there were no examples of this kind of 353 subtag and future registrations of this type will be discouraged: 354 primary languages are strongly RECOMMENDED for registration with 355 ISO 639 and proposals rejected by ISO 639/RA will be closely 356 scrutinized before they are registered with IANA. 358 6. The single character subtag 'x' as the primary subtag indicates 359 that the language tag consists solely of subtags whose meaning is 360 defined by private agreement. For example, in the tag "x-fr-CH", 361 the subtags 'fr' and 'CH' SHOULD NOT be taken to represent the 362 French language or the country of Switzerland (or any other value 363 in the IANA registry) unless there is a private agreement in 364 place to do so. See Section 4.5. 366 7. The single character subtag 'i' is used by some grandfathered 367 tags (see Section 2.2.8) such as "i-klingon" and "i-bnn". 
(Other 368 grandfathered tags have a primary language subtag in their first 369 position) 371 8. Other values MUST NOT be assigned to the primary subtag except by 372 revision or update of this document. 374 Note: For languages that have both an ISO 639-1 two character code 375 and an ISO 639-2 three character code, only the ISO 639-1 two 376 character code is defined in the IANA registry. 378 Note: For languages that have no ISO 639-1 two character code and for 379 which the ISO 639-2/T (Terminology) code and the ISO 639-2/B 380 (Bibliographic) codes differ, only the Terminology code is defined in 381 the IANA registry. At the time this document was created, all 382 languages that had both kinds of three character code were also 383 assigned a two character code; it is not expected that future 384 assignments of this nature will occur. 386 Note: To avoid problems with versioning and subtag choice as 387 experienced during the transition between RFC 1766 and RFC 3066, as 388 well as the canonical nature of subtags defined by this document, the 389 ISO 639 Registration Authority Joint Advisory Committee (ISO 639/ 390 RA-JAC) has included the following statement in [iso639.principles]: 392 "A language code already in ISO 639-2 at the point of freezing ISO 393 639-1 shall not later be added to ISO 639-1. This is to ensure 394 consistency in usage over time, since users are directed in Internet 395 applications to employ the alpha-3 code when an alpha-2 code for that 396 language is not available." 398 In order to avoid instability in the canonical form of tags, if a two 399 character code is added to ISO 639-1 for a language for which a three 400 character code was already included in ISO 639-2, the two character 401 code MUST NOT be registered. See Section 3.4. 
403 For example, if some content were tagged with 'haw' (Hawaiian), which 404 currently has no two character code, the tag would not be invalidated 405 if ISO 639-1 were to assign a two character code to the Hawaiian 406 language at a later date. 408 For example, one of the grandfathered IANA registrations is 409 "i-enochian". The subtag 'enochian' could be registered in the IANA 410 registry as a primary language subtag (assuming that ISO 639 does not 411 register this language first), making tags such as "enochian-AQ" and 412 "enochian-Latn" valid. 414 2.2.2 Extended Language Subtags 416 The following rules apply to the extended language subtags: 418 1. Three letter subtags immediately following the primary subtag are 419 reserved for future standardization, anticipating work that is 420 currently under way on ISO 639. 422 2. Extended language subtags MUST follow the primary subtag and 423 precede any other subtags. 425 3. There MAY be up to three extended language subtags. 427 4. Extended language subtags MUST NOT be registered or used to form 428 language tags. Their syntax is described here so that 429 implementations can be compatible with any future revision of 430 this document which does provide for their registration. 432 Extended language subtag records, once they appear in the registry, 433 MUST include exactly one 'Prefix' field indicating an appropriate 434 language subtag or sequence of subtags that MUST always appear as a 435 prefix to the extended language subtag. 437 Example: In a future revision or update of this document, the tag 438 "zh-gan" (registered under RFC 3066) might become a valid non- 439 grandfathered (that is, redundant) tag in which the subtag 'gan' 440 might represent the Chinese dialect 'Gan'. 442 2.2.3 Script Subtag 444 Script subtags are used to indicate the script or writing system 445 variations that distinguish the written forms of a language or its 446 dialects. The following rules apply to the script subtags: 448 1. 
All four character subtags were defined according to 449 [ISO15924]--"Codes for the representation of the names of 450 scripts": alpha-4 script codes, or subsequently assigned by the 451 ISO 15924 maintenance agency or governing standardization bodies, 452 denoting the script or writing system used in conjunction with 453 this language. 455 2. Script subtags MUST immediately follow the primary language 456 subtag and all extended language subtags and MUST occur before 457 any other type of subtag described below. 459 3. The script subtags 'Qaaa' through 'Qabx' are reserved for private 460 use in language tags. These subtags correspond to codes reserved 461 by ISO 15924 for private use. These codes MAY be used for non- 462 registered script values. Please refer to Section 4.5 for more 463 information on private use subtags. 465 4. Script subtags MUST NOT be registered using the process in 466 Section 3.5 of this document. Variant subtags MAY be considered 467 for registration for that purpose. 469 5. There MUST be at most one script subtag in a language tag and the 470 script subtag SHOULD be omitted when it adds no distinguishing 471 value to the tag or when the primary language subtag's record 472 includes a Suppress-Script field listing the applicable script 473 subtag. 475 Example: "sr-Latn" represents Serbian written using the Latin script. 477 2.2.4 Region Subtag 479 Region subtags are used to indicate linguistic variations associated 480 with or appropriate to a specific country, territory, or region. 481 Typically, a region subtag is used to indicate regional dialects or 482 usage, or region-specific spelling conventions. A region subtag can 483 also be used to indicate that content is expressed in a way that is 484 appropriate for use throughout a region; for instance, Spanish 485 content tailored to be useful throughout Latin America. 487 The following rules apply to the region subtags: 489 1. 
Region subtags MUST follow any language, extended language, or 490 script subtags and MUST precede all other subtags. 492 2. All two character subtags following the primary subtag were 493 defined in the IANA registry according to the assignments found 494 in [ISO3166-1] ("Codes for the representation of names of 495 countries and their subdivisions -- Part 1: Country codes") using 496 the list of alpha-2 country codes, or using assignments 497 subsequently made by the ISO 3166 maintenance agency or governing 498 standardization bodies. 500 3. All three character subtags consisting of digit (numeric) 501 characters following the primary subtag were defined in the IANA 502 registry according to the assignments found in UN Standard 503 Country or Area Codes for Statistical Use [UN_M.49] or 504 assignments subsequently made by the governing standards body. 505 Note that not all of the UN M.49 codes are defined in the IANA 506 registry. The following rules define which codes are entered 507 into the registry as valid subtags: 509 A. UN numeric codes assigned to 'macro-geographical 510 (continental)' or sub-regions MUST be registered in the 511 registry. These codes are not associated with an assigned 512 ISO 3166 alpha-2 code and represent supra-national areas, 513 usually covering more than one nation, state, province, or 514 territory. 516 B. UN numeric codes for 'economic groupings' or 'other 517 groupings' MUST NOT be registered in the IANA registry and 518 MUST NOT be used to form language tags. 520 C. UN numeric codes for countries or areas with ambiguous ISO 521 3166 alpha-2 codes, when entered into the registry, MUST be 522 defined according to the rules in Section 3.4 and MUST be 523 used to form language tags that represent the country or 524 region for which they are defined. 526 D. 
UN numeric codes for countries or areas for which there is an 527 associated ISO 3166 alpha-2 code in the registry MUST NOT be 528 entered into the registry and MUST NOT be used to form 529 language tags. Note that the ISO 3166-based subtag in the 530 registry MUST actually be associated with the UN M.49 code in 531 question. 533 E. UN numeric codes and ISO 3166 alpha-2 codes for countries or 534 areas listed as eligible for registration in [initial- 535 registry] but not presently registered MAY be entered into 536 the IANA registry via the process described in Section 3.5. 537 Once registered, these codes MAY be used to form language 538 tags. 540 F. All other UN numeric codes for countries or areas which do 541 not have an associated ISO 3166 alpha-2 code MUST NOT be 542 entered into the registry and MUST NOT be used to form 543 language tags. For more information about these codes, see 544 Section 3.4. 546 4. Note: The alphanumeric codes in Appendix X of the UN document 547 MUST NOT be entered into the registry and MUST NOT be used to 548 form language tags. (At the time this document was created these 549 values match the ISO 3166 alpha-2 codes.) 551 5. There MUST be at most one region subtag in a language tag and the 552 region subtag MAY be omitted, as when it adds no distinguishing 553 value to the tag. 555 6. The region subtags 'AA', 'QM'-'QZ', 'XA'-'XZ', and 'ZZ' are 556 reserved for private use in language tags. These subtags 557 correspond to codes reserved by ISO 3166 for private use. These 558 codes MAY be used for private use region subtags (instead of 559 using a private use subtag sequence). Please refer to 560 Section 4.5 for more information on private use subtags. 562 "de-CH" represents German ('de') as used in Switzerland ('CH'). 564 "sr-Latn-CS" represents Serbian ('sr') written using Latin script 565 ('Latn') as used in Serbia and Montenegro ('CS'). 
567 "es-419" represents Spanish ('es') appropriate to the UN-defined 568 Latin America and Caribbean region ('419'). 570 2.2.5 Variant Subtags 572 Variant subtags are used to indicate additional, well-recognized 573 variations that define a language or its dialects which are not 574 covered by other available subtags. The following rules apply to the 575 variant subtags: 577 1. Variant subtags are not associated with any external standard. 578 Variant subtags and their meanings are defined by the 579 registration process defined in Section 3.5. 581 2. Variant subtags MUST follow all of the other defined subtags, but 582 precede any extension or private use subtag sequences. 584 3. More than one variant MAY be used to form the language tag. 586 4. Variant subtags MUST be registered with IANA according to the 587 rules in Section 3.5 of this document before being used to form 588 language tags. In order to distinguish variants from other types 589 of subtags, registrations MUST meet the following length and 590 content restrictions: 592 1. Variant subtags that begin with a letter (a-z, A-Z) MUST be 593 at least five characters long. 595 2. Variant subtags that begin with a digit (0-9) MUST be at 596 least four characters long. 598 Variant subtag records in the language subtag registry MAY include 599 one or more 'Prefix' fields, which indicates the language tag or tags 600 that would make a suitable prefix (with other subtags, as 601 appropriate) in forming a language tag with the variant. For 602 example, the subtag 'nedis' has a Prefix of "sl", making it suitable 603 to form language tags such as "sl-nedis" and "sl-IT-nedis", but not 604 suitable for use in a tag such as "zh-nedis" or "it-IT-nedis". 606 "sl-nedis" represents the Natisone or Nadiza dialect of Slovenian. 608 "de-CH-1996" represents German as used in Switzerland and as written 609 using the spelling reform beginning in the year 1996 C.E. 611 Most variants that share a prefix are mutually exclusive. 
For example, the German orthographic variations '1996' and '1901' SHOULD NOT be used in the same tag, as they represent the dates of different spelling reforms. A variant that can meaningfully be used in combination with another variant SHOULD include a 'Prefix' field in its registry record that lists that other variant. For example, if another German variant 'example' were created that made sense to use with '1996', then 'example' should include two Prefix fields: "de" and "de-1996".

2.2.6 Extension Subtags

Extensions provide a mechanism for extending language tags for use in various applications. See Section 3.7. The following rules apply to extensions:

1.  Extension subtags are separated from the other subtags defined in this document by a single character subtag ("singleton"). The singleton MUST be one allocated to a registration authority via the mechanism described in Section 3.7 and MUST NOT be the letter 'x', which is reserved for private use subtag sequences.

2.  Note: Private use subtag sequences starting with the singleton subtag 'x' are described in Section 2.2.7 below.

3.  An extension MUST follow at least a primary language subtag. That is, a language tag cannot begin with an extension. Extensions extend language tags; they do not override or replace them. For example, "a-value" is not a well-formed language tag, while "de-a-value" is.

4.  Each singleton subtag MUST appear at most one time in each tag (other than as a private use subtag). That is, singleton subtags MUST NOT be repeated. For example, the tag "en-a-bbb-a-ccc" is invalid because the subtag 'a' appears twice. Note that the tag "en-a-bbb-x-a-ccc" is valid because the second appearance of the singleton 'a' is in a private use sequence.

5.  Extension subtags MUST meet all of the requirements for the content and format of subtags defined in this document.

6.
Extension subtags MUST meet whatever requirements are set by the document that defines their singleton prefix and whatever requirements are provided by the maintaining authority.

7.  Each extension subtag MUST be from two to eight characters long and consist solely of letters or digits, with each subtag separated by a single '-'.

8.  Each singleton MUST be followed by at least one extension subtag. For example, the tag "tlh-a-b-foo" is invalid because the first singleton 'a' is followed immediately by another singleton 'b'.

9.  Extension subtags MUST follow all language, extended language, script, region and variant subtags in a tag.

10. All subtags following the singleton and before another singleton are part of the extension. Example: In the tag "fr-a-Latn", the subtag 'Latn' does not represent the script subtag 'Latn' defined in the IANA Language Subtag Registry. Its meaning is defined by the extension 'a'.

11. In the event that more than one extension appears in a single tag, the tag SHOULD be canonicalized as described in Section 4.4.

For example, if the prefix singleton 'r' and the shown subtags were defined, then the following tag would be a valid example: "en-Latn-GB-boont-r-extended-sequence-x-private"

2.2.7 Private Use Subtags

Private use subtags are used to indicate distinctions in language important in a given context by private agreement. The following rules apply to private use subtags:

1.  Private use subtags are separated from the other subtags defined in this document by the reserved single-character subtag 'x'.

2.  Private use subtags MUST conform to the format and content constraints defined in the ABNF for all subtags.

3.  Private use subtags MUST follow all language, extended language, script, region, variant, and extension subtags in the tag.
    Another way of saying this is that all subtags following the singleton 'x' MUST be considered private use. Example: The subtag 'US' in the tag "en-x-US" is a private use subtag.

4.  A tag MAY consist entirely of private use subtags.

5.  No source is defined for private use subtags. Use of private use subtags is by private agreement only.

6.  Private use subtags are NOT RECOMMENDED where alternatives exist or for general interchange. See Section 4.5 for more information on private use subtag choice.

For example: Users who wished to utilize codes from the Ethnologue publication of SIL International for language identification might agree to exchange tags such as "az-Arab-x-AZE-derbend". This example contains two private use subtags. The first is 'AZE' and the second is 'derbend'.

2.2.8 Pre-Existing RFC 3066 Registrations

Existing IANA-registered language tags from RFC 1766 and/or RFC 3066 maintain their validity. These tags will be maintained in the registry in records of either the "grandfathered" or "redundant" type. Grandfathered tags contain one or more subtags that are not defined in the Language Subtag Registry (see Section 3). Redundant tags consist entirely of subtags defined above and whose independent registration is superseded by this document. For more information see Section 3.8.

It is important to note that all language tags formed under the guidelines in this document were either legal, well-formed tags or could have been registered under RFC 3066.

2.2.9 Classes of Conformance

Implementations sometimes need to describe their capabilities with regard to the rules and practices described in this document. There are two classes of conforming implementations described by this document: "well-formed" processors and "validating" processors. Claims of conformance SHOULD explicitly reference one of these definitions.
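
The extension and private use rules of Sections 2.2.6 and 2.2.7 can be illustrated with a short, non-normative sketch in Python. The function names split_tag and is_acceptable are hypothetical and are not part of this document. The sketch assumes a tag of the generative form (it does not consult the list of grandfathered tags) and only checks the structure of singletons, not whether a singleton has actually been allocated under Section 3.7.

```python
import re

# A subtag is one to eight letters or digits.
SUBTAG = re.compile(r"^[A-Za-z0-9]{1,8}$")


def split_tag(tag):
    """Split a tag into (main subtags, extensions, private use subtags).

    Raises ValueError when one of the structural rules is violated.
    Grandfathered tags such as "i-klingon" are NOT handled (assumption).
    """
    subtags = tag.lower().split("-")
    if not all(SUBTAG.match(s) for s in subtags):
        raise ValueError(f"malformed subtag in {tag!r}")

    main, extensions, private = [], {}, []
    i = 0
    # Everything before the first single-character subtag is the main part.
    while i < len(subtags) and len(subtags[i]) > 1:
        main.append(subtags[i])
        i += 1

    while i < len(subtags):
        singleton = subtags[i]
        i += 1
        if singleton == "x":
            # All subtags following 'x' are private use.
            private = subtags[i:]
            if not private:
                raise ValueError("'x' must be followed by a subtag")
            break
        if not main:
            raise ValueError("a tag cannot begin with an extension")
        if singleton in extensions:
            raise ValueError(f"singleton {singleton!r} is repeated")
        body = []
        while i < len(subtags) and len(subtags[i]) > 1:
            body.append(subtags[i])
            i += 1
        if not body:
            raise ValueError(f"singleton {singleton!r} has no subtags")
        extensions[singleton] = body
    return main, extensions, private


def is_acceptable(tag):
    """Convenience wrapper: True when split_tag() raises no error."""
    try:
        split_tag(tag)
        return True
    except ValueError:
        return False
```

With this sketch, is_acceptable("en-a-bbb-x-a-ccc") is true, while "a-value", "en-a-bbb-a-ccc", and "tlh-a-b-foo" are rejected, matching the examples given in Section 2.2.6.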

An implementation that claims to check for well-formed language tags MUST:

o  Check that the tag and all of its subtags, including extension and private use subtags, conform to the ABNF or that the tag is on the list of grandfathered tags.

o  Check that singleton subtags that identify extensions do not repeat. For example, the tag "en-a-xx-b-yy-a-zz" is not well-formed.

Well-formed processors are strongly encouraged to implement the canonicalization rules contained in Section 4.4.

An implementation that claims to be validating MUST:

o  Check that the tag is well-formed.

o  Specify the particular registry date for which the implementation performs validation of subtags.

o  Check that either the tag is a grandfathered tag, or that all language, script, region, and variant subtags consist of valid codes for use in language tags according to the IANA registry as of the particular date specified by the implementation.

o  Specify which, if any, extension RFCs as defined in Section 3.7 are supported, including version, revision, and date.

o  For any such extensions supported, check that all subtags used in that extension are valid.

o  For variant and extended language subtags, if the registry contains one or more 'Prefix' fields for that subtag, check that the tag matches at least one prefix. The tag matches if all the subtags in the 'Prefix' also appear in the tag. For example, the prefix "es-CO" matches the tag "es-Latn-CO-x-private" because both the 'es' language subtag and 'CO' region subtag appear in the tag.

3. Registry Format and Maintenance

This section defines the Language Subtag Registry and the maintenance and update procedures associated with it, as well as a registry for extensions to language tags (Section 3.7).

The Language Subtag Registry contains a comprehensive list of all of the subtags valid in language tags.
This allows implementers a straightforward and reliable way to validate language tags. The Language Subtag Registry will be maintained so that, except for extension subtags, it is possible to validate all of the subtags that appear in a language tag under the provisions of this document or its revisions or successors. In addition, the meaning of the various subtags will be unambiguous and stable over time. (The meaning of private use subtags, of course, is not defined by the IANA registry.)

3.1 Format of the IANA Language Subtag Registry

The IANA Language Subtag Registry ("the registry") consists of a text file that is machine readable in the format described in this section, plus copies of the registration forms approved in accordance with the process described in Section 3.5. The existing registration forms for grandfathered and redundant tags taken from RFC 3066 will be maintained as part of the obsolete RFC 3066 registry. The remaining set of initial subtags will not have registration forms created for them.

The registry is in the text format described below. This format was based on the record-jar format described in [record-jar].

Each line of text is limited to 72 characters, including all whitespace. Records are separated by lines containing only the sequence "%%" (%x25.25).

Each field can be viewed as a single, logical line of ASCII characters, comprising a field-name and a field-body separated by a COLON character (%x3A). For convenience, the field-body portion of this conceptual entity can be split into a multiple-line representation; this is called "folding".
The format of the registry is described by the following ABNF (per [RFC2234bis]):

   registry   = record *("%%" CRLF record)
   record     = 1*( field-name *SP ":" *SP field-body CRLF )
   field-name = (ALPHA / DIGIT)[*(ALPHA / DIGIT / "-") (ALPHA / DIGIT)]
   field-body = *(ASCCHAR/LWSP)
   ASCCHAR    = %x21-25 / %x27-7E / UNICHAR ; Note: AMPERSAND is %x26
   UNICHAR    = "&#x" 2*6HEXDIG ";"

                      Figure 2: record-jar ABNF

The sequence '..' (%x2E.2E) in a field-body denotes a range of values. Such a range represents all subtags of the same length that are in alphabetic or numeric order within that range, including the values explicitly mentioned. For example 'a..c' denotes the values 'a', 'b', and 'c' and '11..13' denotes the values '11', '12', and '13'.

Characters from outside the US-ASCII [ISO646] repertoire, as well as the AMPERSAND character ("&", %x26) when it occurs in a field-body, are represented by a "Numeric Character Reference" using hexadecimal notation in the style used by [XML10]. This consists of the sequence "&#x" (%x26.23.78) followed by a hexadecimal representation of the character's code point in [ISO10646] followed by a closing semicolon (%x3B). For example, the EURO SIGN, U+20AC, would be represented by the sequence "&#x20AC;". Note that the hexadecimal notation MAY have between two and six digits.

All fields whose field-body contains a date value use the "full-date" format specified in [RFC3339]. For example: "2004-06-28" represents June 28, 2004 in the Gregorian calendar.

The first record in the file contains the single field whose field-name is "File-Date". The field-body of this record contains the last modification date of this copy of the registry, making it possible to compare different versions of the registry. The registry on the IANA website is the most current. Versions with an older date than that one are not up-to-date.

   File-Date: 2004-06-28
   %%

             Figure 3: Example of the File-Date Record

Subsequent records represent subtags in the registry. Each of the fields in each record MUST occur no more than once, unless otherwise noted below. Each record MUST contain the following fields:

o  'Type'

   *  Type's field-value MUST consist of one of the following strings: "language", "extlang", "script", "region", "variant", "grandfathered", and "redundant" and denotes the type of tag or subtag.

o  Either 'Subtag' or 'Tag'

   *  Subtag's field-value contains the subtag being defined. This field MUST only appear in records whose 'Type' has one of these values: "language", "extlang", "script", "region", or "variant".

   *  Tag's field-value contains a complete language tag. This field MUST only appear in records whose 'Type' has one of these values: "grandfathered" or "redundant". Note that the field-value will always follow the 'grandfathered' production in the ABNF in Section 2.1.

o  Description

   *  Description's field-value contains a non-normative description of the subtag or tag.

o  Added

   *  Added's field-value contains the date the record was added to the registry.

The 'Subtag' or 'Tag' field MUST use lowercase letters to form the subtag or tag, with two exceptions. Subtags whose 'Type' field is 'script' (in other words, subtags defined by ISO 15924) MUST use titlecase. Subtags whose 'Type' field is 'region' (in other words, subtags defined by ISO 3166) MUST use uppercase. These exceptions mirror the use of case in the underlying standards.

The field 'Description' MAY appear more than one time and contains a description of the tag or subtag in the record. At least one of the 'Description' fields MUST be written or transcribed into the Latin script; the same or additional fields MAY also include a description in a non-Latin script.
The 'Description' field is used for identification purposes and SHOULD NOT be taken to represent the actual native name of the language or variation or to be in any particular language. Most descriptions are taken directly from source standards such as ISO 639 or ISO 3166.

Note: Descriptions in registry entries that correspond to ISO 639, ISO 15924, ISO 3166 or UN M.49 codes are intended only to indicate the meaning of that identifier as defined in the source standard at the time it was added to the registry. The description does not replace the content of the source standard itself. The descriptions are not intended to be the English localized names for the subtags. Localization or translation of language tag and subtag descriptions is out of scope of this document.

Each record MAY also contain the following fields:

o  Preferred-Value

   *  For fields of type 'language', 'extlang', 'script', 'region', and 'variant', 'Preferred-Value' contains a subtag of the same 'Type' which is preferred for forming the language tag.

   *  For fields of type 'grandfathered' and 'redundant', a canonical mapping to a complete language tag.

o  Deprecated

   *  Deprecated's field-value contains the date the record was deprecated.

o  Prefix

   *  Prefix's field-value contains a language tag with which this subtag MAY be used to form a new language tag, perhaps with other subtags as well. This field MUST only appear in records whose 'Type' field-value is 'variant' or 'extlang'. For example, the 'Prefix' for the variant 'nedis' is 'sl', meaning that the tags "sl-nedis" and "sl-IT-nedis" might be appropriate while the tag "is-nedis" is not.

o  Comments

   *  Comments contains additional information about the subtag, as deemed appropriate for understanding the registry and implementing language tags using the subtag or tag.

o  Suppress-Script

   *  Suppress-Script contains a script subtag that SHOULD NOT be used to form language tags with the associated primary language subtag. This field MUST only appear in records whose 'Type' field-value is 'language'. See Section 4.1.

The field 'Deprecated' MAY be added to any record via the maintenance process described in Section 3.3 or via the registration process described in Section 3.5. Usually the addition of a 'Deprecated' field is due to the action of one of the standards bodies, such as ISO 3166, withdrawing a code. In some historical cases it might not have been possible to reconstruct the original deprecation date. For these cases, an approximate date appears in the registry. Although valid in language tags, subtags and tags with a 'Deprecated' field are deprecated and validating processors SHOULD NOT generate these subtags. Note that a record that contains a 'Deprecated' field and no corresponding 'Preferred-Value' field has no replacement mapping.

The field 'Preferred-Value' contains a mapping between the record in which it appears and another tag or subtag. The value in this field is STRONGLY RECOMMENDED as the best choice to represent the value of this record when selecting a language tag. These values form three groups:

1.  ISO 639 language codes which were later withdrawn in favor of other codes. These values are mostly a historical curiosity.

2.  ISO 3166 region codes which have been withdrawn in favor of a new code. This sometimes happens when a country changes its name or administration in such a way that warrants a new region code.

3.  Tags grandfathered from RFC 3066. In many cases these tags have become obsolete because the values they represent were later encoded by ISO 639.

Records that contain a 'Preferred-Value' field MUST also have a 'Deprecated' field. This field contains a date of deprecation.
Thus a language tag processor can use the registry to construct the valid, non-deprecated set of subtags for a given date. In addition, for any given tag, a processor can construct the set of valid language tags that correspond to that tag for all dates up to the date of the registry. The ability to do these mappings MAY be beneficial to applications that are matching, selecting, or filtering content based on its language tags.

Note that 'Preferred-Value' mappings in records of type 'region' sometimes do not represent exactly the same meaning as the original value. There are many reasons for a country code to be changed and the effect this has on the formation of language tags will depend on the nature of the change in question.

In particular, the 'Preferred-Value' field does not imply retagging content that uses the affected subtag.

The field 'Preferred-Value' MUST NOT be modified once created in the registry. The field MAY be added to records of type "grandfathered" and "region" according to the rules in Section 3.3. Otherwise the field MUST NOT be added to any record already in the registry.

The 'Preferred-Value' field in records of type "grandfathered" and "redundant" contains whole language tags that are strongly RECOMMENDED for use in place of the record's value. In many cases the mappings were created by deprecation of the tags during the period before this document was adopted. For example, the tag "no-nyn" was deprecated in favor of the ISO 639-1 defined language code 'nn'.

Records of type 'variant' MAY have more than one field of type 'Prefix'. Additional fields of this type MAY be added to a 'variant' record via the registration process.

Records of type 'extlang' MUST have _exactly_ one 'Prefix' field.
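
The record format and several of the field constraints described so far in this section can be illustrated with a short, non-normative sketch in Python. All names here (parse_registry, decode, check_record, SAMPLE) are hypothetical and are not defined by this document. The sketch assumes that folded continuation lines begin with whitespace, and it covers only a subset of the rules; it does not expand '..' ranges or check letter case.

```python
import re

NCR = re.compile(r"&#x([0-9A-Fa-f]{2,6});")
SUBTAG_TYPES = {"language", "extlang", "script", "region", "variant"}
TAG_TYPES = {"grandfathered", "redundant"}
REPEATABLE = {"Description", "Comments", "Prefix"}


def decode(body):
    """Replace hexadecimal Numeric Character References with characters."""
    return NCR.sub(lambda m: chr(int(m.group(1), 16)), body)


def parse_registry(text):
    """Parse record-jar text into a list of {field-name: [field-body, ...]}."""
    records, record, last = [], {}, None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.strip() == "%%":            # record separator
            records.append(record)
            record, last = {}, None
        elif line[0] in " \t" and last is not None:
            # Folded continuation of the previous field-body (assumption).
            record[last][-1] += " " + decode(line.strip())
        else:
            name, _, body = line.partition(":")
            last = name.strip()
            record.setdefault(last, []).append(decode(body.strip()))
    if record:
        records.append(record)
    return records


def check_record(rec):
    """Return a list of problems found in one subtag or tag record."""
    problems = []
    rtype = rec.get("Type", [""])[0]
    for name in ("Type", "Description", "Added"):
        if name not in rec:
            problems.append(f"missing {name}")
    if rtype not in SUBTAG_TYPES | TAG_TYPES:
        problems.append("unknown Type")
    if rtype in SUBTAG_TYPES and ("Subtag" not in rec or "Tag" in rec):
        problems.append("record needs Subtag and no Tag")
    if rtype in TAG_TYPES and ("Tag" not in rec or "Subtag" in rec):
        problems.append("record needs Tag and no Subtag")
    for name, bodies in rec.items():
        if len(bodies) > 1 and name not in REPEATABLE:
            problems.append(f"{name} occurs more than once")
    if "Preferred-Value" in rec and "Deprecated" not in rec:
        problems.append("Preferred-Value requires Deprecated")
    if "Prefix" in rec and rtype not in ("variant", "extlang"):
        problems.append("Prefix not allowed for this Type")
    if rtype == "extlang" and len(rec.get("Prefix", [])) != 1:
        problems.append("extlang needs exactly one Prefix")
    if "Suppress-Script" in rec and rtype != "language":
        problems.append("Suppress-Script not allowed for this Type")
    return problems


SAMPLE = """File-Date: 2005-01-02
%%
Type: variant
Subtag: nedis
Description: Natisone dialect
Description: Nadiza dialect
Added: 2003-10-09
Prefix: sl
Comments: This is a comment shown
  as an example.
%%
"""
```

Calling parse_registry(SAMPLE) yields two records: the File-Date record and the 'nedis' record, whose folded 'Comments' field is joined into a single logical line. The first record would be handled separately by a caller, since check_record applies only to subtag and tag records.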

The field-value of the 'Prefix' field consists of a language tag whose subtags are appropriate to use with this subtag. For example, the variant subtag '1996' has a Prefix field of "de". This means that tags starting with the sequence "de-" are appropriate with this subtag, so "de-Latg-1996" and "de-CH-1996" are both acceptable, while the tag "fr-1996" is an inappropriate choice.

The field of type 'Prefix' MUST NOT be removed from any record. The field-value for this type of field MUST NOT be modified.

The field 'Comments' MAY appear more than once per record. This field MAY be inserted or changed via the registration process and no guarantee of stability is provided. The content of this field is not restricted, except by the need to register the information, the suitability of the request, and by reasonable practical size limitations.

The field 'Suppress-Script' MUST only appear in records whose 'Type' field-value is 'language'. This field MUST NOT appear more than one time in a record. This field indicates a script used to write the overwhelming majority of documents for the given language and which therefore adds no distinguishing information to a language tag. It helps ensure greater compatibility between the language tags generated according to the rules in this document and language tags and tag processors or consumers based on RFC 3066. For example, virtually all Icelandic documents are written in the Latin script, making the subtag 'Latn' redundant in the tag "is-Latn".

3.2 Language Subtag Reviewer

The Language Subtag Reviewer is appointed by the IESG for an indefinite term, subject to removal or replacement at the IESG's discretion.
The Language Subtag Reviewer moderates the ietf-languages mailing list, responds to requests for registration, and performs the other registry maintenance duties described in Section 3.3. Only the Language Subtag Reviewer is permitted to request IANA to change, update or add records to the Language Subtag Registry.

The performance or decisions of the Language Subtag Reviewer MAY be appealed to the IESG under the same rules as other IETF decisions (see [RFC2026]). The IESG can reverse or overturn the decision of the Language Subtag Reviewer, provide guidance, or take other appropriate actions.

3.3 Maintenance of the Registry

Maintenance of the registry requires that as codes are assigned or withdrawn by ISO 639, ISO 15924, ISO 3166, and UN M.49, the Language Subtag Reviewer MUST evaluate each change, determine whether it conflicts with existing registry entries, and submit the information to IANA for inclusion in the registry. If a change takes place and the Language Subtag Reviewer does not do this in a timely manner, then any interested party MAY use the procedure in Section 3.5 to register the appropriate update.

Note: The redundant and grandfathered entries together are the complete list of tags registered under [RFC3066]. The redundant tags are those that can now be formed using the subtags defined in the registry together with the rules of Section 2.2. The grandfathered entries include those that can never be legal under those same provisions.

The set of redundant and grandfathered tags is permanent and stable: new entries in this section MUST NOT be added and existing entries MUST NOT be removed. Records of type 'grandfathered' MAY have their type converted to 'redundant': see item 12 in Section 3.6 for more information.
The decision making process about which tags were initially grandfathered and which were made redundant is described in [initial-registry].

RFC 3066 tags that were deprecated prior to the adoption of this document are part of the list of grandfathered tags and their component subtags were not included as registered variants (although they remain eligible for registration). For example, the tag "art-lojban" was deprecated in favor of the language subtag 'jbo'.

The Language Subtag Reviewer MUST ensure that new subtags meet the requirements in Section 4.1 or submit an appropriate alternate subtag as described in that section. When either a change or addition to the registry is needed, the Language Subtag Reviewer MUST prepare the complete record, including all fields, and forward it to IANA for insertion into the registry. Each record being modified or inserted MUST be forwarded in a separate message.

If a record represents a new subtag that does not currently exist in the registry, then the message's subject line MUST include the word "INSERT". If the record represents a change to an existing subtag, then the subject line of the message MUST include the word "MODIFY". The message MUST contain both the record for the subtag being inserted or modified and the new File-Date record. Here is an example of what the body of the message might contain:

   LANGUAGE SUBTAG MODIFICATION
   File-Date: 2005-01-02
   %%
   Type: variant
   Subtag: nedis
   Description: Natisone dialect
   Description: Nadiza dialect
   Added: 2003-10-09
   Prefix: sl
   Comments: This is a comment shown
     as an example.
   %%

       Figure 4: Example of a Language Subtag Modification Form

Whenever an entry is created or modified in the registry, the 'File-Date' record at the start of the registry is updated to reflect the most recent modification date in the [RFC3339] "full-date" format.

Before forwarding a new registration to IANA, the Language Subtag Reviewer MUST ensure that values in the 'Subtag' field match case according to the description in Section 3.1.

3.4 Stability of IANA Registry Entries

The stability of entries and their meaning in the registry is critical to the long term stability of language tags. The rules in this section guarantee that a specific language tag's meaning is stable over time and will not change.

These rules specifically deal with how changes to codes (including withdrawal and deprecation of codes) maintained by ISO 639, ISO 15924, ISO 3166, and UN M.49 are reflected in the IANA Language Subtag Registry. Assignments to the IANA Language Subtag Registry MUST follow the following stability rules:

1.  Values in the fields 'Type', 'Subtag', 'Tag', 'Added', 'Deprecated' and 'Preferred-Value' MUST NOT be changed and are guaranteed to be stable over time.

2.  Values in the 'Description' field MUST NOT be changed in a way that would invalidate previously-existing tags. They MAY be broadened somewhat in scope, changed to add information, or adapted to the most common modern usage. For example, countries occasionally change their official names: an historical example of this would be "Upper Volta" changing to "Burkina Faso".

3.  Values in the field 'Prefix' MAY be added to records of type 'variant' via the registration process.

4.  Values in the field 'Prefix' MAY be modified, so long as the modifications broaden the set of prefixes. That is, a prefix MAY be replaced by one of its own prefixes.
    For example, the prefix "en-US" could be replaced by "en", but not by the prefixes "en-Latn", "fr", or "en-US-boont". If one of those prefixes were needed, a new Prefix SHOULD be registered.

5.  Values in the field 'Prefix' MUST NOT be removed.

6.  The field 'Comments' MAY be added, changed, modified, or removed via the registration process or any of the processes or considerations described in this section.

7.  The field 'Suppress-Script' MAY be added or removed via the registration process.

8.  Codes assigned by ISO 639, ISO 15924, and ISO 3166 that do not conflict with existing subtags of the associated type and whose meaning is not the same as an existing subtag of the same type are entered into the IANA registry as new records.

9.  Codes assigned by ISO 639, ISO 15924, or ISO 3166 that are withdrawn by their respective maintenance or registration authority remain valid in language tags. A 'Deprecated' field containing the date of withdrawal is added to the record. If a new record of the same type is added that represents a replacement value, then a 'Preferred-Value' field MAY also be added. The registration process MAY be used to add comments about the withdrawal of the code by the respective standard.

    Example: The region code 'TL' was assigned to the country 'Timor-Leste', replacing the code 'TP' (which was assigned to 'East Timor' when it was under administration by Portugal). The subtag 'TP' remains valid in language tags, but its record contains a 'Preferred-Value' of 'TL' and its field 'Deprecated' contains the date the new code was assigned ('2004-07-06').

10. Codes assigned by ISO 639, ISO 15924, or ISO 3166 that conflict with existing subtags of the associated type, including subtags that are deprecated, MUST NOT be entered into the registry.
    The following additional considerations apply to subtag values that are reassigned:

    A.  For ISO 639 codes, if the newly assigned code's meaning is not represented by a subtag in the IANA registry, the Language Subtag Reviewer, as described in Section 3.5, SHALL prepare a proposal for entering in the IANA registry as soon as practical a registered language subtag as an alternate value for the new code. The form of the registered language subtag will be at the discretion of the Language Subtag Reviewer and MUST conform to other restrictions on language subtags in this document.

    B.  For all subtags whose meaning is derived from an external standard (i.e. ISO 639, ISO 15924, ISO 3166, or UN M.49), if a new meaning is assigned to an existing code and the new meaning broadens the meaning of that code, then the meaning for the associated subtag MAY be changed to match. The meaning of a subtag MUST NOT be narrowed, however, as this can result in an unknown proportion of the existing uses of a subtag becoming invalid. Note: ISO 639 MA/RA has adopted a similar stability policy.

    C.  For ISO 15924 codes, if the newly assigned code's meaning is not represented by a subtag in the IANA registry, the Language Subtag Reviewer, as described in Section 3.5, SHALL prepare a proposal for entering in the IANA registry as soon as practical a registered variant subtag as an alternate value for the new code. The form of the registered variant subtag will be at the discretion of the Language Subtag Reviewer and MUST conform to other restrictions on variant subtags in this document.

    D.  For ISO 3166 codes, if the newly assigned code's meaning is associated with the same UN M.49 code as another 'region' subtag, then the existing region subtag remains as the preferred value for that region and no new entry is created.
        A comment MAY be added to the existing region subtag indicating the relationship to the new ISO 3166 code.

    E.  For ISO 3166 codes, if the newly assigned code's meaning is associated with a UN M.49 code that is not represented by an existing region subtag, then the Language Subtag Reviewer, as described in Section 3.5, SHALL prepare a proposal for entering the appropriate UN M.49 country code as an entry in the IANA registry.

    F.  For ISO 3166 codes, if there is no associated UN numeric code, then the Language Subtag Reviewer SHALL petition the UN to create one. If there is no response from the UN within ninety days of the request being sent, the Language Subtag Reviewer SHALL prepare a proposal for entering in the IANA registry as soon as practical a registered variant subtag as an alternate value for the new code. The form of the registered variant subtag will be at the discretion of the Language Subtag Reviewer and MUST conform to other restrictions on variant subtags in this document. This situation is very unlikely to ever occur.

11. UN M.49 has codes for both countries and areas (such as '276' for Germany) and geographical regions and sub-regions (such as '150' for Europe). UN M.49 country or area codes for which there is no corresponding ISO 3166 code SHOULD NOT be registered, except as a surrogate for an ISO 3166 code that is blocked from registration by an existing subtag. If such a code becomes necessary, then the registration authority for ISO 3166 SHOULD first be petitioned to assign a code to the region. If the petition for a code assignment by ISO 3166 is refused or not acted on in a timely manner, the registration process described in Section 3.5 MAY then be used to register the corresponding UN M.49 code.
        At the time this document was written, there were only four
        such codes: 830 (Channel Islands), 831 (Guernsey), 832
        (Jersey), and 833 (Isle of Man). This way UN M.49 codes remain
        available as the value of last resort in cases where ISO 3166
        reassigns a deprecated value in the registry.

   12.  Stability provisions apply to grandfathered tags with this
        exception: should all of the subtags in a grandfathered tag
        become valid subtags in the IANA registry, then the field 'Type'
        in that record is changed from 'grandfathered' to 'redundant'.
        Note that this will not affect language tags that match the
        grandfathered tag, since these tags will now match valid
        generative subtag sequences. For example, if the subtag 'gan'
        in the language tag "zh-gan" were to be registered as an
        extended language subtag, then the grandfathered tag "zh-gan"
        would be deprecated (but existing content or implementations
        that use "zh-gan" would remain valid).

3.5 Registration Procedure for Subtags

   The procedure given here MUST be used by anyone who wants to use a
   subtag not currently in the IANA Language Subtag Registry.

   Only subtags of type 'language' and 'variant' will be considered for
   independent registration of new subtags. Handling of subtags needed
   for stability and subtags necessary to keep the registry synchronized
   with ISO 639, ISO 15924, ISO 3166, and UN M.49 within the limits
   defined by this document is described in Section 3.3. Stability
   provisions are described in Section 3.4.

   This procedure MAY also be used to register or alter the information
   for the "Description", "Comments", "Deprecated", or "Prefix" fields
   in a subtag's record as described in Section 3.4. Changes to all
   other fields in the IANA registry are NOT permitted.

   Registering a new subtag or requesting modifications to an existing
   tag or subtag starts with the requester filling out the registration
   form reproduced below. Note that each response is not limited in
   size so that the request can adequately describe the registration.
   The fields in the "Record Requested" section SHOULD follow the
   requirements in Section 3.1.

   LANGUAGE SUBTAG REGISTRATION FORM
   1. Name of requester:
   2. E-mail address of requester:
   3. Record Requested:

      Type:
      Subtag:
      Description:
      Prefix:
      Preferred-Value:
      Deprecated:
      Suppress-Script:
      Comments:

   4. Intended meaning of the subtag:
   5. Reference to published description
      of the language (book or article):
   6. Any other relevant information:

             Figure 5: The Language Subtag Registration Form

   The subtag registration form MUST be sent to
   for a two week review period before it can
   be submitted to IANA. (This is an open list and can be joined by
   sending a request to .)

   Variant subtags are usually registered for use with a particular
   range of language tags. For example, the subtag 'rozaj' is intended
   for use with language tags that start with the primary language
   subtag "sl", since Resian is a dialect of Slovenian. Thus the subtag
   'rozaj' would be appropriate in tags such as "sl-Latn-rozaj" or
   "sl-IT-rozaj". This information is stored in the "Prefix" field in
   the registry. Variant registration requests SHOULD include at least
   one "Prefix" field in the registration form.

   Extended language subtags are reserved for future standardization.
   These subtags will be REQUIRED to include exactly one "Prefix" field
   once they are allowed for registration.

   The 'Prefix' field for a given registered subtag exists in the IANA
   registry as a guide to usage. Additional prefixes MAY be added by
   filing an additional registration form.
   In that form, the "Any other relevant information:" field MUST
   indicate that it is the addition of a prefix.

   Requests to add a prefix to a variant subtag that imply a different
   semantic meaning will probably be rejected. For example, a request
   to add the prefix "de" to the subtag 'nedis' so that the tag
   "de-nedis" represented some German dialect would be rejected. The
   'nedis' subtag represents a particular Slovenian dialect and the
   additional registration would change the semantic meaning assigned to
   the subtag. A separate subtag SHOULD be proposed instead.

   The 'Description' field MUST contain a description of the tag being
   registered written or transcribed into the Latin script; it MAY also
   include a description in a non-Latin script. Non-ASCII characters
   MUST be escaped using the syntax described in Section 3.1. The
   'Description' field is used for identification purposes; it does not
   necessarily represent the actual native name of the language or
   variation, and it need not be in any particular language.

   While the 'Description' field itself is not guaranteed to be stable
   and errata corrections MAY be undertaken from time to time, attempts
   to provide translations or transcriptions of entries in the registry
   itself will probably be frowned upon by the community or rejected
   outright, as changes of this nature have an impact on the provisions
   in Section 3.4.

   When the two week period has passed the Language Subtag Reviewer
   either forwards the record to be inserted or modified to
   iana@iana.org according to the procedure described in Section 3.3, or
   rejects the request because of significant objections raised on the
   list or due to problems with constraints in this document (which MUST
   be explicitly cited). The Language Subtag Reviewer MAY also extend
   the review period in two week increments to permit further
   discussion.
   The Language Subtag Reviewer MUST indicate on the list whether the
   registration has been accepted, rejected, or extended following each
   two week period.

   Note that the Language Subtag Reviewer MAY raise objections on the
   list if he or she so desires. The important thing is that the
   objection MUST be made publicly.

   The applicant is free to modify a rejected application with
   additional information and submit it again; this restarts the two
   week comment period.

   Decisions made by the Language Subtag Reviewer MAY be appealed to the
   IESG [RFC2028] under the same rules as other IETF decisions
   [RFC2026].

   All approved registration forms are available online in the directory
   http://www.iana.org/numbers.html under "languages".

   Updates or changes to existing records follow the same procedure as
   new registrations. The Language Subtag Reviewer decides whether
   there is consensus to update the registration following the two week
   review period; normally objections by the original registrant will
   carry extra weight in forming such a consensus.

   Registrations are permanent and stable. Once registered, subtags
   will not be removed from the registry and will remain a valid way in
   which to specify a specific language or variant.

   Note: The "Description" in the registration form is intended as an
   aid to people trying to verify whether a language is registered or
   what language or language variation a particular subtag refers to.
   In most cases, reference to an authoritative grammar or dictionary of
   that language will be useful; in cases where no such work exists,
   other well known works describing that language or in that language
   MAY be appropriate. The Language Subtag Reviewer decides what
   constitutes "good enough" reference material.
   This requirement is not intended to exclude particular languages or
   dialects due to the size of the speaker population or lack of a
   standardized orthography. Minority languages will be considered
   equally on their own merits.

3.6 Possibilities for Registration

   Possibilities for registration of subtags or information about
   subtags include:

   o  Primary language subtags for languages not listed in ISO 639 that
      are not variants of any listed or registered language MAY be
      registered. At the time this document was created there were no
      examples of this form of subtag. Before attempting to register a
      language subtag, there MUST be an attempt to register the language
      with ISO 639. Subtags MUST NOT be registered for codes that exist
      in ISO 639-1 or ISO 639-2, that are under consideration by the
      ISO 639 maintenance or registration authorities, or that have
      never been attempted for registration with those authorities. If
      ISO 639 has previously rejected a language for registration, it is
      reasonable to assume that there must be additional very compelling
      evidence of need before it will be registered in the IANA registry
      (to the extent that it is very unlikely that any subtags will be
      registered of this type).

   o  Dialect or other divisions or variations within a language, its
      orthography, writing system, regional or historical usage,
      transliteration or other transformation, or distinguishing
      variation MAY be registered as variant subtags. An example is the
      'rozaj' subtag (the Resian dialect of Slovenian).

   o  The addition or maintenance of fields (generally of an
      informational nature) in Tag or Subtag records as described in
      Section 3.1 and subject to the stability provisions in
      Section 3.4.
      This includes descriptions; comments; deprecation and preferred
      values for obsolete or withdrawn codes; or the addition of script
      or extlang information to primary language subtags.

   o  The addition of records and related field value changes necessary
      to reflect assignments made by ISO 639, ISO 15924, ISO 3166, and
      UN M.49 as described in Section 3.4.

   Subtags proposed for registration that would cause all or part of a
   grandfathered tag to become redundant but whose meaning conflicts
   with or alters the meaning of the grandfathered tag MUST be rejected.

   This document leaves the decision on what subtags or changes to
   subtags are appropriate (or not) to the registration process
   described in Section 3.5.

   Note: four character primary language subtags are reserved to allow
   for the possibility of alpha4 codes in some future addition to the
   ISO 639 family of standards.

   ISO 639 defines a maintenance agency for additions to and changes in
   the list of languages in ISO 639. This agency is:

      International Information Centre for Terminology (Infoterm)
      Aichholzgasse 6/12, AT-1120
      Wien, Austria
      Phone: +43 1 26 75 35 Ext. 312 Fax: +43 1 216 32 72

   ISO 639-2 defines a maintenance agency for additions to and changes
   in the list of languages in ISO 639-2. This agency is:

      Library of Congress
      Network Development and MARC Standards Office
      Washington, D.C. 20540 USA
      Phone: +1 202 707 6237 Fax: +1 202 707 0115
      URL: http://www.loc.gov/standards/iso639

   The maintenance agency for ISO 3166 (country codes) is:

      ISO 3166 Maintenance Agency
      c/o International Organization for Standardization
      Case postale 56
      CH-1211 Geneva 20 Switzerland
      Phone: +41 22 749 72 33 Fax: +41 22 749 73 49
      URL: http://www.iso.org/iso/en/prods-services/iso3166ma/index.html

   The registration authority for ISO 15924 (script codes) is:

      Unicode Consortium Box 391476
      Mountain View, CA 94039-1476, USA
      URL: http://www.unicode.org/iso15924

   The Statistics Division of the United Nations Secretariat maintains
   the Standard Country or Area Codes for Statistical Use and can be
   reached at:

      Statistical Services Branch
      Statistics Division
      United Nations, Room DC2-1620
      New York, NY 10017, USA

      Fax: +1-212-963-0623
      E-mail: statistics@un.org
      URL: http://unstats.un.org/unsd/methods/m49/m49alpha.htm

3.7 Extensions and Extensions Registry

   Extension subtags are those introduced by single character subtags
   ("singletons") other than 'x'. They are reserved for the generation
   of identifiers which contain a language component, and are compatible
   with applications that understand language tags.

   The structure and form of extensions are defined by this document so
   that implementations can be created that are forward compatible with
   applications that might be created using singletons in the future.
   In addition, defining a mechanism for maintaining singletons will
   lend stability to this document by reducing the likely need for
   future revisions or updates.

   Single character subtags are assigned by IANA using the "IETF
   Consensus" policy defined by [RFC2434]. This policy requires the
   development of an RFC, which SHALL define the name, purpose,
   processes, and procedures for maintaining the subtags.
   The maintaining or registering authority, including name, contact
   email, discussion list email, and URL location of the registry MUST
   be indicated clearly in the RFC. The RFC MUST specify or include
   each of the following:

   o  The specification MUST reference the specific version or revision
      of this document that governs its creation and MUST reference this
      section of this document.

   o  The specification and all subtags defined by the specification
      MUST follow the ABNF and other rules for the formation of tags and
      subtags as defined in this document. In particular it MUST
      specify that case is not significant and that subtags MUST NOT
      exceed eight characters in length.

   o  The specification MUST specify a canonical representation.

   o  The specification of valid subtags MUST be available over the
      Internet and at no cost.

   o  The specification MUST be in the public domain or available via a
      royalty-free license acceptable to the IETF and specified in the
      RFC.

   o  The specification MUST be versioned and each version of the
      specification MUST be numbered, dated, and stable.

   o  The specification MUST be stable. That is, extension subtags,
      once defined by a specification, MUST NOT be retracted or change
      in meaning in any substantial way.

   o  The specification MUST include in a separate section the
      registration form reproduced in this section (below) to be used in
      registering the extension upon publication as an RFC.

   o  IANA MUST be informed of changes to the contact information and
      URL for the specification.

   IANA will maintain a registry of allocated single character
   (singleton) subtags. This registry MUST use the record-jar format
   described by the ABNF in Section 3.1.
   Upon publication of an extension as an RFC, the maintaining authority
   defined in the RFC MUST forward this registration form to
   iesg@ietf.org, who MUST forward the request to iana@iana.org. The
   maintaining authority of the extension MUST maintain the accuracy of
   the record by sending an updated full copy of the record to
   iana@iana.org with the subject line "LANGUAGE TAG EXTENSION UPDATE"
   whenever content changes. Only the 'Comments', 'Contact_Email',
   'Mailing_List', and 'URL' fields MAY be modified in these updates.

   Failure to maintain this record or the corresponding registry, or to
   meet other conditions imposed by this section of this document, MAY
   be appealed to the IESG [RFC2028] under the same rules as other IETF
   decisions (see [RFC2026]) and MAY result in the authority to maintain
   the extension being withdrawn or reassigned by the IESG.

   %%
   Identifier:
   Description:
   Comments:
   Added:
   RFC:
   Authority:
   Contact_Email:
   Mailing_List:
   URL:
   %%

   Figure 6: Format of Records in the Language Tag Extensions Registry

   'Identifier' contains the single character subtag (singleton)
   assigned to the extension. The Internet-Draft submitted to define
   the extension SHOULD specify which letter or digit to use, although
   the IESG MAY change the assignment when approving the RFC.

   'Description' contains the name and description of the extension.

   'Comments' is an OPTIONAL field and MAY contain a broader description
   of the extension.

   'Added' contains the date the RFC was published in the "full-date"
   format specified in [RFC3339]. For example: 2004-06-28 represents
   June 28, 2004, in the Gregorian calendar.

   'RFC' contains the RFC number assigned to the extension.

   'Authority' contains the name of the maintaining authority for the
   extension.

   'Contact_Email' contains the email address used to contact the
   maintaining authority.

   'Mailing_List' contains the URL or subscription email address of the
   mailing list used by the maintaining authority.

   'URL' contains the URL of the registry for this extension.

   The determination of whether an Internet-Draft meets the above
   conditions and the decision to grant or withhold such authority rests
   solely with the IESG, and is subject to the normal review and appeals
   process associated with the RFC process.

   Extension authors are strongly cautioned that many (including most
   well-formed) processors will be unaware of any special relationships
   or meaning inherent in the order of extension subtags. Extension
   authors SHOULD avoid subtag relationships or canonicalization
   mechanisms that interfere with matching or with length restrictions
   that sometimes exist in common protocols where the extension is used.
   In particular, applications MAY truncate the subtags in doing
   matching or in fitting into limited lengths, so it is RECOMMENDED
   that the most significant information be in the most significant
   (left-most) subtags, and that the specification gracefully handle
   truncated subtags.

   When a language tag is to be used in a specific, known protocol, it
   is RECOMMENDED that the language tag not contain extensions not
   supported by that protocol. In addition, note that some protocols
   MAY impose upper limits on the length of the strings used to store or
   transport the language tag.

3.8 Initialization of the Registries

   Upon adoption of this document an initial version of the Language
   Subtag Registry containing the various subtags initially valid in a
   language tag is necessary. This collection of subtags, along with a
   description of the process used to create it, is described by
   [initial-registry].
   IANA SHALL publish the initial version of the registry described by
   this document from the content of [initial-registry]. Once published
   by IANA, the maintenance procedures, rules and registration processes
   described in this document will be available for new registrations or
   updates.

   Registrations that are in process under the rules defined in
   [RFC3066] when this document is adopted MAY be completed under the
   former rules, at the discretion of the Language Tag Reviewer (as
   described in [RFC3066]). Until the IESG officially appoints a
   Language Subtag Reviewer, the existing Language Tag Reviewer SHALL
   serve as the Language Subtag Reviewer.

   Any new registrations submitted using the RFC 3066 forms or format
   after the adoption of this document and publication of the registry
   by IANA MUST be rejected.

   An initial version of the Language Extension Registry described in
   Section 3.7 is also needed. The Language Extension Registry SHALL be
   initialized with a single record containing a single field of type
   "File-Date" as a placeholder for future assignments.

4. Formation and Processing of Language Tags

   This section addresses how to use the information in the registry
   with the tag syntax to choose, form and process language tags.

4.1 Choice of Language Tag

   One is sometimes faced with the choice between several possible tags
   for the same body of text.

   Interoperability is best served when all users use the same language
   tag in order to represent the same language. If an application has
   requirements that make the rules here inapplicable, then that
   application risks damaging interoperability. It is strongly
   RECOMMENDED that users not define their own rules for language tag
   choice.

   Subtags SHOULD only be used where they add useful distinguishing
   information; extraneous subtags interfere with the meaning,
   understanding, and processing of language tags. In particular, users
   and implementations SHOULD follow the 'Prefix' and 'Suppress-Script'
   fields in the registry (defined in Section 3.1): these fields provide
   guidance on when specific additional subtags SHOULD (and SHOULD NOT)
   be used in a language tag.

   Of particular note, many applications can benefit from the use of
   script subtags in language tags, as long as the use is consistent for
   a given context. Script subtags were not formally defined in RFC
   3066 and their use can affect matching and subtag identification by
   implementations of RFC 3066, as these subtags appear between the
   primary language and region subtags. For example, if a user requests
   content in an implementation of Section 2.5 of [RFC3066] using the
   language range "en-US", content labeled "en-Latn-US" will not match
   the request. Therefore it is important to know when script subtags
   will customarily be used and when they ought not be used. In the
   registry, the Suppress-Script field helps ensure greater
   compatibility between the language tags generated according to the
   rules in this document and language tags and tag processors or
   consumers based on RFC 3066 by defining when users SHOULD NOT include
   a script subtag with a particular primary language subtag.

   Extended language subtags (type 'extlang' in the registry, see
   Section 3.1) also appear between the primary language and region
   subtags and are reserved for future standardization. Applications
   might benefit from their judicious use in forming language tags in
   the future. Similar recommendations are expected to apply to their
   use as apply to script subtags.
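
   The "en-US" versus "en-Latn-US" mismatch described above can be
   reproduced with a few lines of Python. This is only a rough sketch of
   prefix matching in the style of Section 2.5 of [RFC3066]; the
   function name rfc3066_matches is hypothetical and is not defined by
   either document.

```python
def rfc3066_matches(language_range: str, tag: str) -> bool:
    """Sketch of RFC 3066-style matching: a range matches a tag if the
    two are equal, or if the range is a prefix of the tag and the next
    character in the tag is '-'. Comparison is case-insensitive."""
    if language_range == "*":
        return True
    wanted = language_range.lower()
    actual = tag.lower()
    return actual == wanted or actual.startswith(wanted + "-")


print(rfc3066_matches("en-US", "en-US"))        # True
print(rfc3066_matches("en-US", "en-US-boont"))  # True
print(rfc3066_matches("en-US", "en-Latn-US"))   # False: 'Latn' intervenes
print(rfc3066_matches("en", "en-Latn-US"))      # True
```

   Because the script subtag sits between the language and region
   subtags, the range "en-US" is no longer a prefix of the tag, which is
   why following 'Suppress-Script' matters for older consumers.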

   Standards, protocols and applications that reference this document
   normatively but apply different rules to the ones given in this
   section MUST specify how the procedure varies from the one given
   here.

   The choice of subtags used to form a language tag SHOULD be guided by
   the following rules:

   1.  Use as precise a tag as possible, but no more specific than is
       justified. Avoid using subtags that are not important for
       distinguishing content in an application.

       *  For example, 'de' might suffice for tagging an email written
          in German, while "de-CH-1996" is probably unnecessarily
          precise for such a task.

   2.  The script subtag SHOULD NOT be used to form language tags unless
       the script adds some distinguishing information to the tag. The
       field 'Suppress-Script' in the primary language record in the
       registry indicates which script subtags do not add distinguishing
       information for most applications.

       *  For example, the subtag 'Latn' should not be used with the
          primary language 'en' because nearly all English documents are
          written in the Latin script and it adds no distinguishing
          information. However, if a document were written in English
          mixing Latin script with another script such as Braille
          ('Brai'), then it might be appropriate to choose to indicate
          both scripts to aid in content selection, such as the
          application of a style sheet.

   3.  If a tag or subtag has a 'Preferred-Value' field in its registry
       entry, then the value of that field SHOULD be used to form the
       language tag in preference to the tag or subtag in which the
       preferred value appears.

       *  For example, use 'he' for Hebrew in preference to 'iw'.

   4.  The 'und' (Undetermined) primary language subtag SHOULD NOT be
       used to label content, even if the language is unknown.
       Omitting the language tag altogether is preferred to using a tag
       with a primary language subtag of 'und'. The 'und' subtag MAY be
       useful for protocols that require a language tag to be provided.
       The 'und' subtag MAY also be useful when matching language tags
       in certain situations.

   5.  The 'mul' (Multiple) primary language subtag SHOULD NOT be used
       whenever the protocol allows separate tags for multiple
       languages, as is the case for the Content-Language header in
       HTTP. The 'mul' subtag conveys little useful information:
       content in multiple languages SHOULD individually tag the
       languages where they appear or otherwise indicate the actual
       language in preference to the 'mul' subtag.

   6.  The same variant subtag SHOULD NOT be used more than once within
       a language tag.

       *  For example, do not use "de-DE-1901-1901".

   To ensure consistent backward compatibility, this document contains
   several provisions to account for potential instability in the
   standards used to define the subtags that make up language tags.
   These provisions mean that no language tag created under the rules in
   this document will become obsolete.

4.2 Meaning of the Language Tag

   The relationship between the tag and the information it relates to is
   defined by the context in which the tag appears. Accordingly, this
   section gives only possible examples of its usage.

   o  For a single information object, the associated language tags
      might be interpreted as the set of languages that is necessary for
      a complete comprehension of the complete object. Example: Plain
      text documents.

   o  For an aggregation of information objects, the associated language
      tags could be taken as the set of languages used inside components
      of that aggregation. Examples: Document stores and libraries.

   o  For information objects whose purpose is to provide alternatives,
      the associated language tags could be regarded as a hint that the
      content is provided in several languages, and that one has to
      inspect each of the alternatives in order to find its language or
      languages. In this case, the presence of multiple tags might not
      mean that one needs to be multi-lingual to get complete
      understanding of the document. Example: MIME
      multipart/alternative.

   o  In markup languages, such as HTML and XML, language information
      can be added to each part of the document identified by the markup
      structure (including the whole document itself). For example, one
      could write C'est la vie. inside a
      Norwegian document; the Norwegian-speaking user could then access
      a French-Norwegian dictionary to find out what the marked section
      meant. If the user were listening to that document through a
      speech synthesis interface, this information could be used to
      signal the synthesizer to appropriately apply French
      text-to-speech pronunciation rules to that span of text, instead
      of applying the inappropriate Norwegian rules.

   Language tags are related when they contain a similar sequence of
   subtags. For example, if a language tag B contains language tag A as
   a prefix, then B is typically "narrower" or "more specific" than A.
   Thus "zh-Hant-TW" is more specific than "zh-Hant".

   This relationship is not guaranteed in all cases: specifically,
   languages that begin with the same sequence of subtags are NOT
   guaranteed to be mutually intelligible, although they might be. For
   example, the tag "az" shares a prefix with both "az-Latn"
   (Azerbaijani written using the Latin script) and "az-Cyrl"
   (Azerbaijani written using the Cyrillic script). A person fluent in
   one script might not be able to read the other, even though the text
   might be identical.
   Content tagged as "az" most probably is written in just one script
   and thus might not be intelligible to a reader familiar with the
   other script.

4.3 Length Considerations

   [RFC3066] did not provide an upper limit on the size of language
   tags. While RFC 3066 did define the semantics of particular subtags
   in such a way that most language tags consisted of language and
   region subtags with a combined total length of up to six characters,
   larger registered tags were not only possible but were actually
   registered.

   Neither the language tag syntax nor other requirements in this
   document impose a fixed upper limit on the number of subtags in a
   language tag (and thus an upper bound on the size of a tag). The
   language tag syntax suggests that, depending on the specific
   language, more subtags (and thus a longer tag) are sometimes
   necessary to completely identify the language for certain
   applications; thus it is possible to envision long or complex subtag
   sequences.

4.3.1 Working with Limited Buffer Sizes

   Some applications and protocols are forced to allocate fixed buffer
   sizes or otherwise limit the length of a language tag. A conformant
   implementation or specification MAY refuse to support the storage of
   language tags which exceed a specified length. Any such limitation
   SHOULD be clearly documented, and such documentation SHOULD include
   what happens to longer tags (for example, whether an error value is
   generated or the language tag is truncated). A protocol that allows
   tags to be truncated at an arbitrary limit, without giving any
   indication of what that limit is, has the potential for causing harm
   by changing the meaning of tags in substantial ways.

   In practice, most language tags do not require more than a few
   subtags and will not approach reasonably sized buffer limitations:
   see Section 4.1.
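
   As a rough illustration of how an implementation with a documented,
   fixed limit might shorten a tag without cutting a subtag in half (the
   rule itself is given in Section 4.3.2), consider the following Python
   sketch. The function name truncate_tag and the choice to raise an
   error when even the primary subtag does not fit are assumptions made
   for this example, not requirements of this document.

```python
def truncate_tag(tag: str, max_len: int) -> str:
    """Shorten `tag` to at most `max_len` characters by removing whole
    subtags from the right. A single-character subtag left at the end
    is removed as well, since it would introduce nothing."""
    subtags = tag.split("-")
    while len(subtags) > 1 and len("-".join(subtags)) > max_len:
        subtags.pop()
        # Drop a dangling singleton such as 'a' or 'x'.
        while len(subtags) > 1 and len(subtags[-1]) == 1:
            subtags.pop()
    result = "-".join(subtags)
    if len(result) > max_len:
        # Assumption: report an error rather than cut the primary subtag.
        raise ValueError("tag does not fit in %d characters" % max_len)
    return result


tag = "zh-Latn-CN-variant1-a-extend1-x-wadegile-private1"
print(truncate_tag(tag, 42))  # zh-Latn-CN-variant1-a-extend1-x-wadegile
print(truncate_tag(tag, 35))  # zh-Latn-CN-variant1-a-extend1
print(truncate_tag(tag, 8))   # zh-Latn
```

   The sketch does not validate the tag or treat tags that begin with
   'x' specially; a caller would still be expected to warn the user
   that truncation changed the meaning of the tag.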

   Some specifications or protocols have limits on tag length but do not
   have a fixed length limitation. For example, [RFC2231] has no
   explicit length limitation: the length available for the language tag
   is constrained by the length of other header components (such as the
   charset's name) coupled with the 76 character limit in [RFC2047].
   Thus the "limit" might be 50 or more characters, but it could
   potentially be quite small.

   The considerations for assigning a buffer limit are:

      Implementations SHOULD NOT truncate language tags unless the
      meaning of the tag is purposefully being changed, or unless the
      tag does not fit into a limited buffer size specified by a
      protocol for storage or transmission.

      Implementations SHOULD warn the user when a tag is truncated since
      truncation changes the semantic meaning of the tag.

      Implementations of protocols or specifications that are space
      constrained but do not have a fixed limit SHOULD use the longest
      possible tag in preference to truncation.

      Protocols or specifications that specify limited buffer sizes for
      language tags MUST allow for language tags of up to 33 characters.

      Protocols or specifications that specify limited buffer sizes for
      language tags SHOULD allow for language tags of at least 42
      characters.

   The following illustration shows how the 42-character recommendation
   was derived. The combination of language and extended language
   subtags was chosen for future compatibility.
At up to 15 characters, 1932 this combination is longer than the longest possible primary language 1933 subtag (8 characters): 1935 language = 3 (ISO 639-2; ISO 639-1 requires 2) 1936 extlang1 = 4 (each subsequent subtag includes '-') 1937 extlang2 = 4 (unlikely: needs prefix="language-extlang1") 1938 extlang3 = 4 (extremely unlikely) 1939 script = 5 (if not suppressed: see Section 4.1) 1940 region = 4 (UN M.49; ISO 3166 requires 3) 1941 variant1 = 9 (MUST have language as a prefix) 1942 variant2 = 9 (MUST have language-variant1 as a prefix) 1944 total = 42 characters 1946 Figure 7: Derivation of the Limit on Tag Length 1948 4.3.2 Truncation of Language Tags 1950 Truncation of a language tag alters the meaning of the tag, and thus 1951 SHOULD be avoided. However, truncation of language tags is sometimes 1952 necessary due to limited buffer sizes. Such truncation MUST NOT 1953 permit a subtag to be chopped off in the middle or the formation of 1954 invalid tags (for example, one ending with the "-" character). 1956 This means that applications or protocols which truncate tags MUST do 1957 so by progressively removing subtags along with their preceding "-" 1958 from the right side of the language tag until the tag is short enough 1959 for the given buffer. If the resulting tag ends with a single- 1960 character subtag, that subtag and its preceding "-" MUST also be 1961 removed. For example: 1963 Tag to truncate: zh-Hant-CN-variant1-a-extend1-x-wadegile-private1 1964 1. zh-Latn-CN-variant1-a-extend1-x-wadegile 1965 2. zh-Latn-CN-variant1-a-extend1 1966 3. zh-Latn-CN-variant1 1967 4. zh-Latn-CN 1968 5. zh-Latn 1969 6. zh 1971 Figure 8: Example of Tag Truncation 1973 4.4 Canonicalization of Language Tags 1975 Since a particular language tag is sometimes used by many processes, 1976 language tags SHOULD always be created or generated in a canonical 1977 form. 1979 A language tag is in canonical form when: 1981 1. 
The tag is well-formed according to the rules in Section 2.1 and 1982 Section 2.2.
1984 2. Subtags of type 'Region' that have a Preferred-Value mapping in 1985 the IANA registry (see Section 3.1) SHOULD be replaced with their 1986 mapped value. Note: In rare cases the mapped value will also 1987 have a Preferred-Value.
1989 3. Redundant or grandfathered tags that have a Preferred-Value 1990 mapping in the IANA registry (see Section 3.1) MUST be replaced 1991 with their mapped value. These items are either deprecated 1992 mappings created before the adoption of this document (such as 1993 the mapping of "no-nyn" to "nn" or "i-klingon" to "tlh") or are 1994 the result of later registrations or additions to this document 1995 (for example, "zh-guoyu" might be mapped to a language-extlang 1996 combination such as "zh-cmn" by some future update of this 1997 document).
1999 4. Other subtags that have a Preferred-Value mapping in the IANA 2000 registry (see Section 3.1) MUST be replaced with their mapped 2001 value. These items consist entirely of clerical corrections to 2002 ISO 639-1 in which the deprecated subtags have been maintained 2003 for compatibility purposes.
2005 5. If more than one extension subtag sequence exists, the extension 2006 sequences are ordered into case-insensitive ASCII order by 2007 singleton subtag.
2009 Example: The language tag "en-A-aaa-B-ccc-bbb-x-xyz" is in canonical 2010 form, while "en-B-ccc-bbb-A-aaa-X-xyz" is well-formed but not in 2011 canonical form.
2013 Example: The language tag "en-BU" (English as used in Burma) is not 2014 canonical because the 'BU' subtag has a canonical mapping to 'MM' 2015 (Myanmar), although the tag "en-BU" maintains its validity.
2017 Canonicalization of language tags does not imply anything about the 2018 use of upper or lowercase letters when processing or comparing 2019 subtags (as described in Section 2.1). All comparisons MUST be 2020 performed in a case-insensitive manner.
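As an informal illustration only (not part of this specification), the ordering rule in item 5 and the case-insensitive comparison rule can be sketched in Python as follows. The function names order_extensions and same_tag are hypothetical. The sketch assumes its input is already well-formed and performs no registry lookups, so it does not apply the Preferred-Value replacements in items 2 through 4.

```python
def order_extensions(tag: str) -> str:
    """Reorder extension sequences by singleton, ignoring case.

    Hypothetical helper: assumes `tag` is well-formed and leaves the
    case of every subtag untouched.
    """
    subtags = tag.split("-")
    head = [subtags[0]]   # primary subtag, then script/region/variants
    sequences = []        # one list per extension: [singleton, subtag, ...]
    private_use = []      # 'x' and everything that follows it
    current = head
    for index, subtag in enumerate(subtags[1:], start=1):
        if len(subtag) == 1:
            if subtag.lower() == "x":
                # Private use sequence: keep it last, in its original order.
                private_use = subtags[index:]
                break
            # Any other single-character subtag starts a new extension.
            current = [subtag]
            sequences.append(current)
        else:
            current.append(subtag)
    # Case-insensitive ASCII order by singleton (item 5).
    sequences.sort(key=lambda seq: seq[0].lower())
    ordered = head + [s for seq in sequences for s in seq] + private_use
    return "-".join(ordered)


def same_tag(a: str, b: str) -> bool:
    """Compare two well-formed (ASCII-only) tags case-insensitively."""
    return a.lower() == b.lower()
```

Applied to the example above, order_extensions("en-B-ccc-bbb-A-aaa-X-xyz") yields "en-A-aaa-B-ccc-bbb-X-xyz", which same_tag treats as equal to the canonical "en-A-aaa-B-ccc-bbb-x-xyz". The order of subtags inside each extension is deliberately left alone, since that order is defined by the extension itself.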
2022 When performing canonicalization of language tags, processors MAY 2023 regularize the case of the subtags (that is, this process is 2024 OPTIONAL), following the case used in the registry. Note that this 2025 corresponds to the following casing rules: uppercase all non-initial 2026 two-letter subtags; titlecase all non-initial four-letter subtags; 2027 lowercase everything else. 2029 Note: Case folding of ASCII letters in certain locales, unless 2030 carefully handled, sometimes produces non-ASCII character values. 2031 The Unicode Character Database file "SpecialCasing.txt" defines the 2032 specific cases that are known to cause problems with this. In 2033 particular, the letter 'i' (U+0069) in Turkish and Azerbaijani is 2034 uppercased to U+0130 (LATIN CAPITAL LETTER I WITH DOT ABOVE). 2035 Implementers SHOULD specify a locale-neutral casing operation to 2036 ensure that case folding of subtags does not produce this value, 2037 which is illegal in language tags. For example, if one were to 2038 uppercase the region subtag 'in' using Turkish locale rules, the 2039 sequence U+0130 U+004E would result instead of the expected 'IN'. 2041 Note: if the field 'Deprecated' appears in a registry record without 2042 an accompanying 'Preferred-Value' field, then that tag or subtag is 2043 deprecated without a replacement. Validating processors SHOULD NOT 2044 generate tags that include these values, although the values are 2045 canonical when they appear in a language tag. 2047 An extension MUST define any relationships that exist between the 2048 various subtags in the extension and thus MAY define an alternate 2049 canonicalization scheme for the extension's subtags. Extensions MAY 2050 define how the order of the extension's subtags is interpreted. For 2051 example, an extension could define that its subtags are in canonical 2052 order when the subtags are placed into ASCII order: that is, 2053 "en-a-aaa-bbb-ccc" instead of "en-a-ccc-bbb-aaa".
Another extension might 2054 define that the order of the subtags influences their semantic 2055 meaning (so that "en-b-ccc-bbb-aaa" has a different value from 2056 "en-b-aaa-bbb-ccc"). However, extension specifications SHOULD be designed 2057 so that they are tolerant of the typical processes described in 2058 Section 3.7. 2060 4.5 Considerations for Private Use Subtags 2062 Private use subtags, like all other subtags, MUST conform to the 2063 format and content constraints in the ABNF. Private use subtags have 2064 no meaning outside the private agreement between the parties that 2065 intend to use or exchange language tags that employ them. The same 2066 subtags MAY be used with a different meaning under a separate private 2067 agreement. They SHOULD NOT be used where alternatives exist and 2068 SHOULD NOT be used in content or protocols intended for general use. 2070 Private use subtags are simply useless for information exchange 2071 without prior arrangement. The value and semantic meaning of private 2072 use tags and of the subtags used within such a language tag are not 2073 defined by this document. 2075 Subtags defined in the IANA registry as having a specific private use 2076 meaning convey more information than a purely private use tag 2077 prefixed by the singleton subtag 'x'. For applications this 2078 additional information MAY be useful. 2080 For example, the region subtags 'AA' and 'ZZ' and those in the ranges 2081 'QM'-'QZ' and 'XA'-'XZ' (derived from ISO 3166 private use codes) MAY 2082 be used to form a language tag. A tag such as "zh-Hans-XQ" conveys a 2083 great deal of public, interchangeable information about the language 2084 material (that it is Chinese in the simplified Chinese script and is 2085 suitable for some geographic region 'XQ').
While the precise 2086 geographic region is not known outside of private agreement, the tag 2087 conveys far more information than an opaque tag such as "x-someLang", 2088 which contains no information about the language subtag or script 2089 subtag outside of the private agreement. 2091 However, in some cases content tagged with private use subtags MAY 2092 interact with other systems in a different and possibly unsuitable 2093 manner compared to tags that use opaque, privately defined subtags, 2094 so the choice of the best approach sometimes depends on the 2095 particular domain in question. 2097 5. IANA Considerations 2099 This section describes the processes and requirements that 2100 IANA needs to follow to maintain the subtag and extension registries as 2101 defined by this document and in accordance with the requirements of 2102 [RFC2434]. 2104 The impact on the IANA maintainers of the two registries defined by 2105 this document will be a small increase in the frequency of new 2106 entries or updates. 2108 5.1 Language Subtag Registry 2110 Upon adoption of this document, the registry will be initialized by a 2111 companion document: [initial-registry]. The criteria and process for 2112 selecting the initial set of records are described in that document. 2113 The initial set of records has no impact on IANA, since the 2114 work to create it will be performed externally. 2116 The new registry MUST be listed under "Language Tags" at 2117 , replacing the existing 2118 registrations defined by [RFC3066]. The existing set of registration 2119 forms and RFC 3066 registrations MUST be relabeled as "Language Tags 2120 (Obsolete)" and maintained (but not added to or modified). 2122 Future work on the Language Subtag Registry SHALL be limited to 2123 inserting or replacing whole records preformatted for IANA by the 2124 Language Subtag Reviewer as described in Section 3.3 of this document 2125 and archiving the forwarded registration form.
2127 Each record MUST be sent to iana@iana.org with a subject line 2128 indicating whether the enclosed record is an insertion of a new 2129 record (indicated by the word "INSERT" in the subject line) or a 2130 replacement of an existing record (indicated by the word "MODIFY" in 2131 the subject line). Records MUST NOT be deleted from the registry. 2132 IANA MUST place any inserted or modified records into the appropriate 2133 section of the language subtag registry, grouping the records by 2134 their 'Type' field. Inserted records MAY be placed anywhere in the 2135 appropriate section; there is no guarantee of the order of the 2136 records beyond grouping them together by 'Type'. Modified records 2137 MUST overwrite the record they replace. 2139 Any request to insert or modify records MUST include a new 2140 File-Date record. This record MUST be placed first in the registry. 2141 In the event that the File-Date record present in the registry has a 2142 later date than the record being inserted or modified, the existing 2143 record MUST be preserved. 2145 5.2 Extensions Registry 2147 The Language Tag Extensions registry will also be generated and sent 2148 to IANA as described in Section 3.7. This registry can contain at 2149 most 35 records and thus changes to this registry are expected to be 2150 very infrequent. 2152 Future work by IANA on the Language Tag Extensions Registry is 2153 limited to two cases. First, the IESG MAY request that new records 2154 be inserted into this registry from time to time. These requests 2155 MUST include the record to insert in the exact format described in 2156 Section 3.7. In addition, there MAY be occasional requests from the 2157 maintaining authority for a specific extension to update the contact 2158 information or URLs in the record. These requests MUST include the 2159 complete, updated record. IANA is not responsible for validating the 2160 information provided, only for verifying that it is properly formatted.
It should 2161 reasonably be seen to come from the maintaining authority named in 2162 the record present in the registry. 2164 6. Security Considerations 2166 Language tags used in content negotiation, like any other information 2167 exchanged on the Internet, might be a source of concern because they 2168 might be used to infer the nationality of the sender, and thus 2169 identify potential targets for surveillance. 2171 This is a special case of the general problem that anything sent is 2172 visible to the receiving party and possibly to third parties as well. 2173 It is useful to be aware that such concerns can exist in some cases. 2175 The evaluation of the exact magnitude of the threat, and any possible 2176 countermeasures, is left to each application protocol (see BCP 72 2177 [RFC3552] for best current practice guidance on security threats and 2178 defenses). 2180 The language tag associated with a particular information item is of 2181 no consequence whatsoever in determining whether that content might 2182 contain possible homographs. The fact that a text is tagged as being 2183 in one language or using a particular script subtag provides no 2184 assurance whatsoever that it does not contain characters from scripts 2185 other than the one(s) associated with or specified by that language 2186 tag. 2188 Since there is no limit to the number of variant, private use, and 2189 extension subtags, and consequently no limit on the possible length 2190 of a tag, implementations need to guard against buffer overflow 2191 attacks. See Section 4.3 for details on language tag truncation, 2192 which can occur as a consequence of defenses against buffer overflow. 2194 Although the specification of valid subtags for an extension (see: 2195 Section 3.7) MUST be available over the Internet, implementations 2196 SHOULD NOT mechanically depend on it being always accessible, to 2197 prevent denial-of-service attacks. 2199 7. 
Character Set Considerations 2201 The syntax in this document requires that language tags use only the 2202 characters A-Z, a-z, 0-9, and HYPHEN-MINUS, which are present in most 2203 character sets, so the composition of language tags should not have 2204 any character set issues. 2206 Rendering of characters based on the content of a language tag is not 2207 addressed in this memo. Historically, some languages have relied on 2208 the use of specific character sets or other information in order to 2209 infer how a specific character should be rendered (notably this 2210 applies to language and culture specific variations of Han ideographs 2211 as used in Japanese, Chinese, and Korean). When language tags are 2212 applied to spans of text, rendering engines sometimes use that 2213 information in deciding which font to use in the absence of other 2214 information, particularly where languages with distinct writing 2215 traditions use the same characters. 2217 8. Changes from RFC 3066 2219 The main goals for this revision of language tags were the following: 2221 *Compatibility.* All RFC 3066 language tags (including those in the 2222 IANA registry) remain valid in this specification. The changes in 2223 this document represent additional constraints on language tags. 2224 That is, in no case is the syntax more permissive, and processors 2225 based on the ABNF and other provisions of RFC 3066 (such as those 2226 described in [XMLSchema]) will be able to process the tags described 2227 by this document. In addition, this document defines language tags 2228 in such a way as to ensure future compatibility. 2230 *Stability.* Because of changes in the past in the underlying ISO 2231 standards, a valid RFC 3066 language tag could become invalid or have 2232 its meaning change. This has the potential of invalidating content 2233 that may have an extensive shelf-life. In this specification, once a 2234 language tag is valid, it remains valid forever.
2236 *Validity.* The structure of language tags defined by this document 2237 makes it possible to determine if a particular tag is well-formed 2238 without regard for the actual content or "meaning" of the tag as a 2239 whole. This is important because the registry grows and underlying 2240 standards change over time. In addition, it must be possible to 2241 determine if a tag is valid (or not) for a given point in time in 2242 order to provide reproducible, testable results. This process must 2243 not be error-prone; otherwise implementations might give different 2244 results. By having an authoritative registry with specific 2245 versioning information, the validity of language tags at any point in 2246 time can be precisely determined (instead of interpolating values 2247 from many separate sources). 2249 *Utility.* It is sometimes important to be able to differentiate 2250 between written forms of a language -- for many implementations this 2251 is more important than distinguishing between the spoken variants of 2252 a language. Languages are written in a wide variety of different 2253 scripts, so this document provides for the generative use of ISO 2254 15924 script codes. Like the generative use of ISO language and 2255 country codes in RFC 3066, this allows combinations to be produced 2256 without resorting to the registration process. The addition of UN 2257 M.49 codes provides for the generation of language tags with regional 2258 scope, which is also required by some applications. 2260 The recast of the registry from containing whole language tags to 2261 subtags is a key part of this. An important feature of RFC 3066 was 2262 that it allowed generative use of subtags. This allows people to 2263 meaningfully use generated tags, without the delays in registering 2264 whole tags or the need to register all of the combinations that might 2265 be useful. 
2267 The choice of placing the extended language and script subtags 2268 between the primary language and region subtags was widely debated. 2269 This design was chosen because the prevalent matching and content 2270 negotiation schemes rely on the subtags being arranged in order of 2271 increasing specificity. That is, the subtags that mark a greater 2272 barrier to mutual intelligibility appear left-most in a tag. For 2273 example, when selecting content written in Azerbaijani, the script 2274 (Arabic, Cyrillic, or Latin) represents a greater barrier to 2275 understanding than any regional variations (those associated with 2276 Azerbaijan or Iran, for example). Individuals who prefer documents 2277 in a particular script, but can deal with the minor regional 2278 differences, can therefore select appropriate content. Applications 2279 that do not deal with written content will continue to omit these 2280 subtags. 2282 *Extensibility.* Because of the widespread use of language tags, it 2283 is disruptive to have periodic revisions of the core specification, 2284 even in the face of demonstrated need. The extension mechanism 2285 provides a way for independent RFCs to define extensions to 2286 language tags. These extensions have a very constrained, well-defined 2287 structure that prevents extensions from interfering with 2288 implementations of language tags defined in this document. 2290 The document also anticipates features of ISO 639-3 with the addition 2291 of the extended language subtags, as well as the possibility of other 2292 ISO 639 parts becoming useful for the formation of language tags in 2293 the future. 2295 The use and definition of private use tags have also been modified, to 2296 allow people to use private use subtags to extend or modify defined 2297 tags and to move as much information as possible out of private use 2298 and into the regular structure.
2300 The goal for each of these modifications is to reduce or eliminate 2301 the need for future revisions of this document. 2303 The specific changes in this document to meet these goals are: 2305 o Defines the ABNF and rules for subtags so that the category of all 2306 subtags can be determined without reference to the registry. 2308 o Adds the concept of well-formed vs. validating processors, 2309 defining the rules by which an implementation can claim to be one 2310 or the other. 2312 o Replaces the IANA language tag registry with a language subtag 2313 registry that provides a complete list of valid subtags in the 2314 IANA registry. This allows for robust implementation and ease of 2315 maintenance. The language subtag registry becomes the canonical 2316 source for forming language tags. 2318 o Provides a process that guarantees stability of language tags, by 2319 handling reuse of values by ISO 639, ISO 15924, and ISO 3166 in 2320 the event that they register a previously used value for a new 2321 purpose. 2323 o Allows ISO 15924 script code subtags and allows them to be used 2324 generatively. Defines a method for indicating in the registry 2325 when script subtags are necessary for a given language tag. 2327 o Adds the concept of a variant subtag and allows variants to be 2328 used generatively. 2330 o Adds the ability to use a class of UN M.49 tags for supra-national 2331 regions and to resolve conflicts in the assignment of ISO 3166 2332 codes. 2334 o Defines the private use tags in ISO 639, ISO 15924, and ISO 3166 2335 as the mechanism for creating private use language, script, and 2336 region subtags respectively. 2338 o Adds a well-defined extension mechanism. 2340 o Defines an extended language subtag, possibly for use with certain 2341 anticipated features of ISO 639-3. 2343 9. References 2345 9.1 Normative References 2347 [ISO10646] 2348 International Organization for Standardization, "ISO/IEC 2349 10646:2003. 
Information technology -- Universal Multiple-Octet 2350 Coded Character Set (UCS)", 2003. 2352 [ISO15924] 2353 International Organization for Standardization, "ISO 2354 15924:2004. Information and documentation -- Codes for the 2355 representation of names of scripts", January 2004. 2357 [ISO3166-1] 2358 International Organization for Standardization, "ISO 3166-1:1997. 2359 Codes for the representation of names of countries 2360 and their subdivisions -- Part 1: Country codes", 1997. 2362 [ISO639-1] 2363 International Organization for Standardization, "ISO 639-1:2002. 2364 Codes for the representation of names of languages 2365 -- Part 1: Alpha-2 code", 2002. 2367 [ISO639-2] 2368 International Organization for Standardization, "ISO 639-2:1998. 2369 Codes for the representation of names of languages 2370 -- Part 2: Alpha-3 code, first edition", 1998. 2372 [ISO646] International Organization for Standardization, "ISO/IEC 2373 646:1991, Information technology -- ISO 7-bit coded 2374 character set for information interchange.", 1991. 2376 [RFC2026] Bradner, S., "The Internet Standards Process -- Revision 2377 3", BCP 9, RFC 2026, October 1996. 2379 [RFC2028] Hovey, R. and S. Bradner, "The Organizations Involved in 2380 the IETF Standards Process", BCP 11, RFC 2028, 2381 October 1996. 2383 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 2384 Requirement Levels", BCP 14, RFC 2119, March 1997. 2386 [RFC2234bis] 2387 Crocker, D. and P. Overell, "Augmented BNF for Syntax 2388 Specifications: ABNF", draft-crocker-abnf-rfc2234bis-00 2389 (work in progress), March 2005. 2391 [RFC2434] Narten, T. and H. Alvestrand, "Guidelines for Writing an 2392 IANA Considerations Section in RFCs", BCP 26, RFC 2434, 2393 October 1998. 2395 [RFC2860] Carpenter, B., Baker, F., and M. Roberts, "Memorandum of 2396 Understanding Concerning the Technical Work of the 2397 Internet Assigned Numbers Authority", RFC 2860, June 2000. 2399 [RFC3339] Klyne, G. and C.
Newman, "Date and Time on the Internet: 2400 Timestamps", RFC 3339, July 2002. 2402 [UN_M.49] Statistics Division, United Nations, "Standard Country or 2403 Area Codes for Statistical Use", UN Standard Country or 2404 Area Codes for Statistical Use, Revision 4 (United Nations 2405 publication, Sales No. 98.XVII.9), June 1999. 2407 9.2 Informative References 2409 [RFC1766] Alvestrand, H., "Tags for the Identification of 2410 Languages", RFC 1766, March 1995. 2412 [RFC2047] Moore, K., "MIME (Multipurpose Internet Mail Extensions) 2413 Part Three: Message Header Extensions for Non-ASCII Text", 2414 RFC 2047, November 1996. 2416 [RFC2231] Freed, N. and K. Moore, "MIME Parameter Value and Encoded 2417 Word Extensions: Character Sets, Languages, and 2418 Continuations", RFC 2231, November 1997. 2420 [RFC2781] Hoffman, P. and F. Yergeau, "UTF-16, an encoding of ISO 2421 10646", RFC 2781, February 2000. 2423 [RFC3066] Alvestrand, H., "Tags for the Identification of 2424 Languages", BCP 47, RFC 3066, January 2001. 2426 [RFC3552] Rescorla, E. and B. Korver, "Guidelines for Writing RFC 2427 Text on Security Considerations", BCP 72, RFC 3552, 2428 July 2003. 2430 [Unicode] Unicode Consortium, "The Unicode Consortium. The Unicode 2431 Standard, Version 4.1.0, defined by: The Unicode Standard, 2432 Version 4.0 (Boston, MA, Addison-Wesley, 2003. ISBN 0-321-18578-1), 2433 as amended by Unicode 4.0.1 2434 (http://www.unicode.org/versions/Unicode4.0.1) and by 2435 Unicode 4.1.0 2436 (http://www.unicode.org/versions/Unicode4.1.0).", 2437 March 2005. 2439 [XML10] Bray, T., et al., "Extensible Markup Language (XML) 1.0", 2440 February 2004. 2442 [XMLSchema] 2443 Biron, P., Ed. and A. Malhotra, Ed., "XML Schema Part 2: 2444 Datatypes Second Edition", October 2004, 2445 <http://www.w3.org/TR/xmlschema-2/>. 2447 [initial-registry] 2448 Ewell, D., Ed., "Initial Language Subtag Registry", 2449 June 2005, .
2452 [iso639.principles] 2453 ISO 639 Joint Advisory Committee, "ISO 639 Joint Advisory 2454 Committee: Working principles for ISO 639 maintenance", 2455 March 2000, 2456 . 2459 [record-jar] 2460 Raymond, E., "The Art of Unix Programming", 2003, 2461 . 2463 Authors' Addresses 2465 Addison Phillips (editor) 2466 Quest Software 2468 Email: addison.phillips@quest.com 2469 URI: http://www.inter-locale.com 2471 Mark Davis (editor) 2472 IBM 2474 Email: mark.davis@us.ibm.com 2476 Appendix A. Acknowledgements 2478 Any list of contributors is bound to be incomplete; please regard the 2479 following as only a selection from the group of people who have 2480 contributed to make this document what it is today. 2482 The contributors to RFC 3066 and RFC 1766, the precursors of this 2483 document, made enormous contributions directly or indirectly to this 2484 document and are generally responsible for the success of language 2485 tags. 2487 The following people (in alphabetical order) contributed to this 2488 document or to RFCs 1766 and 3066: 2490 Glenn Adams, Harald Tveit Alvestrand, Tim Berners-Lee, Marc Blanchet, 2491 Nathaniel Borenstein, Karen Broome, Eric Brunner, Sean M. Burke, M.T. 2492 Carrasco Benitez, Jeremy Carroll, John Clews, Jim Conklin, Peter 2493 Constable, John Cowan, Mark Crispin, Dave Crocker, Elwyn Davies, 2494 Martin Duerst, Frank Ellerman, Michael Everson, Doug Ewell, Ned 2495 Freed, Tim Goodwin, Dirk-Willem van Gulik, Marion Gunn, Joel Halpern, 2496 Elliotte Rusty Harold, Paul Hoffman, Scott Hollenbeck, Richard 2497 Ishida, Olle Jarnefors, Kent Karlsson, John Klensin, Erkki 2498 Kolehmainen, Alain LaBonte, Eric Mader, Ira McDonald, Keith Moore, 2499 Chris Newman, Masataka Ohta, Dylan Pierce, Randy Presuhn, George 2500 Rhoten, Felix Sasaki, Markus Scherer, Keld Jorn Simonsen, Thierry 2501 Sourbier, Otto Stolz, Tex Texin, Andrea Vine, Rhys Weatherley, Misha 2502 Wolf, Francois Yergeau and many, many others.
2504 Very special thanks must go to Harald Tveit Alvestrand, who 2505 originated RFCs 1766 and 3066, and without whom this document would 2506 not have been possible. Special thanks must go to Michael Everson, 2507 who has served as Language Tag Reviewer for almost the complete 2508 period since the publication of RFC 1766. Special thanks to Doug 2509 Ewell, for his production of the first complete subtag registry, and 2510 his work in producing a test parser for verifying language tags.
2512 Appendix B. Examples of Language Tags (Informative)
2514 Simple language subtag:
2516 de (German)
2518 fr (French)
2520 ja (Japanese)
2522 i-enochian (example of a grandfathered tag)
2524 Language subtag plus Script subtag:
2526 zh-Hant (Chinese written using the Traditional Chinese script)
2528 zh-Hans (Chinese written using the Simplified Chinese script)
2530 sr-Cyrl (Serbian written using the Cyrillic script)
2532 sr-Latn (Serbian written using the Latin script)
2534 Language-Script-Region:
2536 zh-Hans-CN (Chinese written using the Simplified script as used in 2537 mainland China)
2539 sr-Latn-CS (Serbian written using the Latin script as used in 2540 Serbia and Montenegro)
2542 Language-Variant:
2544 sl-rozaj (Resian dialect of Slovenian)
2546 sl-nedis (Nadiza dialect of Slovenian)
2548 Language-Region-Variant:
2550 de-CH-1901 (German as used in Switzerland using the 1901 variant 2551 [orthography])
2553 sl-IT-nedis (Slovenian as used in Italy, Nadiza dialect)
2555 Language-Script-Region-Variant:
2557 sl-Latn-IT-nedis (Nadiza dialect of Slovenian written using the 2558 Latin script as used in Italy.
Note that this tag is NOT 2559 RECOMMENDED because subtag 'sl' has a Suppress-Script value of 2560 'Latn')
2562 Language-Region:
2564 de-DE (German for Germany)
2566 en-US (English as used in the United States)
2568 es-419 (Spanish appropriate for the Latin America and Caribbean 2569 region using the UN region code)
2571 Private use subtags:
2573 de-CH-x-phonebk
2575 az-Arab-x-AZE-derbend
2577 Extended language subtags (examples ONLY: extended languages MUST be 2578 defined by revision or update to this document):
2580 zh-min
2582 zh-min-nan-Hant-CN
2584 Private use registry values:
2586 x-whatever (private use using the singleton 'x')
2588 qaa-Qaaa-QM-x-southern (all private tags)
2590 de-Qaaa (German, with a private script)
2592 sr-Latn-QM (Serbian, Latin-script, private region)
2594 sr-Qaaa-CS (Serbian, private script, for Serbia and Montenegro)
2596 Tags that use extensions (examples ONLY: extensions MUST be defined 2597 by revision or update to this document or by RFC):
2599 en-US-u-islamCal
2601 zh-CN-a-myExt-x-private
2602 en-a-myExt-b-another
2604 Some Invalid Tags:
2606 de-419-DE (two region tags)
2608 a-DE (use of a single character subtag in primary position; note 2609 that there are a few grandfathered tags that start with "i-" that 2610 are valid)
2612 ar-a-aaa-b-bbb-a-ccc (two extensions with same single letter 2613 prefix)
2615 Intellectual Property Statement
2617 The IETF takes no position regarding the validity or scope of any 2618 Intellectual Property Rights or other rights that might be claimed to 2619 pertain to the implementation or use of the technology described in 2620 this document or the extent to which any license under such rights 2621 might or might not be available; nor does it represent that it has 2622 made any independent effort to identify any such rights. Information 2623 on the procedures with respect to rights in RFC documents can be 2624 found in BCP 78 and BCP 79.
2626 Copies of IPR disclosures made to the IETF Secretariat and any 2627 assurances of licenses to be made available, or the result of an 2628 attempt made to obtain a general license or permission for the use of 2629 such proprietary rights by implementers or users of this 2630 specification can be obtained from the IETF on-line IPR repository at 2631 http://www.ietf.org/ipr. 2633 The IETF invites any interested party to bring to its attention any 2634 copyrights, patents or patent applications, or other proprietary 2635 rights that may cover technology that may be required to implement 2636 this standard. Please address the information to the IETF at 2637 ietf-ipr@ietf.org. 2639 Disclaimer of Validity 2641 This document and the information contained herein are provided on an 2642 "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS 2643 OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET 2644 ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, 2645 INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE 2646 INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED 2647 WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. 2649 Copyright Statement 2651 Copyright (C) The Internet Society (2005). This document is subject 2652 to the rights, licenses and restrictions contained in BCP 78, and 2653 except as set forth therein, the authors retain all their rights. 2655 Acknowledgment 2657 Funding for the RFC Editor function is currently provided by the 2658 Internet Society.