idnits 2.17.1 

draft-ietf-ltru-4646bis-15.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** It looks like you're using RFC 3978 boilerplate.  You should update this
     to the boilerplate described in the IETF Trust License Policy document
     (see https://trustee.ietf.org/license-info), which is required now.

  -- Found old boilerplate from RFC 3978, Section 5.1 on line 16.

  -- Found old boilerplate from RFC 3978, Section 5.5, updated by RFC 4748 on
     line 3528.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 3539.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 3546.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 3552.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  -- The draft header indicates that this document obsoletes RFC4646, but the
     abstract doesn't seem to mention this, which it should.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust Copyright Line does not match the
     current year

  == The document seems to use 'NOT RECOMMENDED' as an RFC 2119 keyword, but
     does not include the phrase in its RFC 2119 key words list.

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (June 9, 2008) is 5799 days in the past.  Is this
     intentional?


  Checking references for intended status: Best Current Practice
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO15924'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO3166-1'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO639-1'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO639-2'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO639-3'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO639-5'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO646'

  ** Obsolete normative reference: RFC 2028 (Obsoleted by RFC 9281)

  ** Obsolete normative reference: RFC 2434 (Obsoleted by RFC 5226)

  ** Downref: Normative reference to an Informational RFC: RFC 2860

  ** Downref: Normative reference to an Informational RFC: RFC 4645

  -- Possible downref: Non-RFC (?) normative reference: ref. 'UAX14'

  -- Obsolete informational reference (is this intentional?): RFC 1766
     (Obsoleted by RFC 3066, RFC 3282)

  -- Obsolete informational reference (is this intentional?): RFC 3066
     (Obsoleted by RFC 4646, RFC 4647)

  -- Obsolete informational reference (is this intentional?): RFC 4646
     (Obsoleted by RFC 5646)


     Summary: 5 errors (**), 0 flaws (~~), 2 warnings (==), 19 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                                   A. Phillips, Ed.
3	Internet-Draft                                                    Lab126
4	Obsoletes: 4646 (if approved)                              M. Davis, Ed.
5	Intended status: BCP                                              Google
6	Expires: December 11, 2008                                  June 9, 2008

8	                     Tags for Identifying Languages
9	                       draft-ietf-ltru-4646bis-15

11	Status of this Memo

13	   By submitting this Internet-Draft, each author represents that any
14	   applicable patent or other IPR claims of which he or she is aware
15	   have been or will be disclosed, and any of which he or she becomes
16	   aware will be disclosed, in accordance with Section 6 of BCP 79.

18	   Internet-Drafts are working documents of the Internet Engineering
19	   Task Force (IETF), its areas, and its working groups.  Note that
20	   other groups may also distribute working documents as Internet-
21	   Drafts.

23	   Internet-Drafts are draft documents valid for a maximum of six months
24	   and may be updated, replaced, or obsoleted by other documents at any
25	   time.  It is inappropriate to use Internet-Drafts as reference
26	   material or to cite them other than as "work in progress."

28	   The list of current Internet-Drafts can be accessed at
29	   http://www.ietf.org/ietf/1id-abstracts.txt.

31	   The list of Internet-Draft Shadow Directories can be accessed at
32	   http://www.ietf.org/shadow.html.

34	   This Internet-Draft will expire on December 11, 2008.

36	Abstract

38	   This document describes the structure, content, construction, and
39	   semantics of language tags for use in cases where it is desirable to
40	   indicate the language used in an information object.  It also
41	   describes how to register values for use in language tags and the
42	   creation of user-defined extensions for private interchange.  This
	   document obsoletes RFC 4646.

44	Table of Contents

46	   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  4
47	   2.  The Language Tag . . . . . . . . . . . . . . . . . . . . . . .  5
48	     2.1.  Syntax . . . . . . . . . . . . . . . . . . . . . . . . . .  5
49	     2.2.  Language Subtag Sources and Interpretation . . . . . . . .  8
50	       2.2.1.  Primary Language Subtag  . . . . . . . . . . . . . . .  9
51	       2.2.2.  Extended Language Subtags  . . . . . . . . . . . . . . 11
52	       2.2.3.  Script Subtag  . . . . . . . . . . . . . . . . . . . . 12
53	       2.2.4.  Region Subtag  . . . . . . . . . . . . . . . . . . . . 13
54	       2.2.5.  Variant Subtags  . . . . . . . . . . . . . . . . . . . 15
55	       2.2.6.  Extension Subtags  . . . . . . . . . . . . . . . . . . 16
56	       2.2.7.  Private Use Subtags  . . . . . . . . . . . . . . . . . 18
57	       2.2.8.  Grandfathered Registrations  . . . . . . . . . . . . . 18
58	       2.2.9.  Classes of Conformance . . . . . . . . . . . . . . . . 19
59	   3.  Registry Format and Maintenance  . . . . . . . . . . . . . . . 21
60	     3.1.  Format of the IANA Language Subtag Registry  . . . . . . . 21
61	       3.1.1.  File Format  . . . . . . . . . . . . . . . . . . . . . 21
62	       3.1.2.  Record Definitions . . . . . . . . . . . . . . . . . . 22
63	       3.1.3.  Subtag and Tag Fields  . . . . . . . . . . . . . . . . 25
64	       3.1.4.  Description Field  . . . . . . . . . . . . . . . . . . 25
65	       3.1.5.  Deprecated Field . . . . . . . . . . . . . . . . . . . 27
66	       3.1.6.  Preferred-Value Field  . . . . . . . . . . . . . . . . 27
67	       3.1.7.  Prefix Field . . . . . . . . . . . . . . . . . . . . . 29
68	       3.1.8.  Suppress-Script Field  . . . . . . . . . . . . . . . . 29
69	       3.1.9.  Macrolanguage Field  . . . . . . . . . . . . . . . . . 30
70	       3.1.10. Scope Field  . . . . . . . . . . . . . . . . . . . . . 30
71	       3.1.11. Comments Field . . . . . . . . . . . . . . . . . . . . 31
72	     3.2.  Language Subtag Reviewer . . . . . . . . . . . . . . . . . 32
73	     3.3.  Maintenance of the Registry  . . . . . . . . . . . . . . . 32
74	     3.4.  Stability of IANA Registry Entries . . . . . . . . . . . . 33
75	     3.5.  Registration Procedure for Subtags . . . . . . . . . . . . 38
76	     3.6.  Possibilities for Registration . . . . . . . . . . . . . . 42
77	     3.7.  Extensions and the Extensions Registry . . . . . . . . . . 45
78	     3.8.  Update of the Language Subtag Registry . . . . . . . . . . 48
79	   4.  Formation and Processing of Language Tags  . . . . . . . . . . 49
80	     4.1.  Choice of Language Tag . . . . . . . . . . . . . . . . . . 49
81	       4.1.1.  Tagging Encompassed Languages  . . . . . . . . . . . . 53
82	     4.1.2.  Using Extended Language Subtags  . . . . . . . . . . . 53
84	     4.2.  Meaning of the Language Tag  . . . . . . . . . . . . . . . 55
85	     4.3.  Lists of Languages . . . . . . . . . . . . . . . . . . . . 57
86	     4.4.  Length Considerations  . . . . . . . . . . . . . . . . . . 58
87	       4.4.1.  Working with Limited Buffer Sizes  . . . . . . . . . . 58
88	       4.4.2.  Truncation of Language Tags  . . . . . . . . . . . . . 59
89	     4.5.  Canonicalization of Language Tags  . . . . . . . . . . . . 60
90	     4.6.  Considerations for Private Use Subtags . . . . . . . . . . 62
91	   5.  IANA Considerations  . . . . . . . . . . . . . . . . . . . . . 63
92	     5.1.  Language Subtag Registry . . . . . . . . . . . . . . . . . 63
93	     5.2.  Extensions Registry  . . . . . . . . . . . . . . . . . . . 64
94	   6.  Security Considerations  . . . . . . . . . . . . . . . . . . . 66
95	   7.  Character Set Considerations . . . . . . . . . . . . . . . . . 67
96	   8.  Changes from RFC 4646  . . . . . . . . . . . . . . . . . . . . 68
97	   9.  References . . . . . . . . . . . . . . . . . . . . . . . . . . 72
98	     9.1.  Normative References . . . . . . . . . . . . . . . . . . . 72
99	     9.2.  Informative References . . . . . . . . . . . . . . . . . . 73
100	   Appendix A.  Acknowledgements  . . . . . . . . . . . . . . . . . . 75
101	   Appendix B.  Examples of Language Tags (Informative) . . . . . . . 76
102	   Appendix C.  Examples of Registration Forms  . . . . . . . . . . . 79
103	   Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 81
104	   Intellectual Property and Copyright Statements . . . . . . . . . . 82

106	1.  Introduction

108	   Human beings on our planet have, past and present, used a number of
109	   languages.  There are many reasons why one would want to identify the
110	   language used when presenting or requesting information.

112	   A user's language preferences often need to be identified so that
113	   appropriate processing can be applied.  For example, the user's
114	   language preferences in a Web browser can be used to select Web pages
115	   appropriately.  Language preferences can also be used to select among
116	   tools (such as dictionaries) to assist in the processing or
117	   understanding of content in different languages.

119	   In addition, knowledge about the particular language used by some
120	   piece of information content might be useful or even required by some
121	   types of processing; for example, spell-checking, computer-
122	   synthesized speech, Braille transcription, or high-quality print
123	   renderings.

125	   One means of indicating the language used is by labeling the
126	   information content with an identifier or "tag".  These tags can be
127	   used to specify user preferences when selecting information content,
128	   or for labeling additional attributes of content and associated
129	   resources.

131	   Tags can also be used to indicate additional language attributes of
132	   content.  For example, indicating specific information about the
133	   dialect, writing system, or orthography used in a document or
134	   resource may enable the user to obtain information in a form that
135	   they can understand, or it can be important in processing or
136	   rendering the given content into an appropriate form or style.

138	   This document specifies a particular identifier mechanism (the
139	   language tag) and a registration function for values to be used to
140	   form tags.  It also defines a mechanism for private use values and
141	   future extension.

143	   This document replaces [RFC4646], which replaced [RFC3066] and its
144	   predecessor [RFC1766].  For a list of changes in this document, see
145	   Section 8.

147	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
148	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
149	   "OPTIONAL" in this document are to be interpreted as described in
	   [RFC2119].

151	2.  The Language Tag

153	   Language tags are used to help identify languages, whether spoken,
154	   written, signed, or otherwise signaled, for the purpose of
155	   communication.  This includes constructed and artificial languages,
156	   but excludes languages not intended primarily for human
157	   communication, such as programming languages.

159	2.1.  Syntax

161	   The language tag is composed of one or more parts, known as
162	   "subtags".  Each subtag consists of a sequence of alphanumeric
163	   characters.  Subtags are distinguished and separated from one another
164	   by a hyphen ("-", ABNF [RFC5234] %x2D).  Usually a language tag
165	   contains a "primary language" subtag, followed by a (possibly empty)
166	   series of subsequent subtags, each of which refines or narrows the
167	   range of languages identified by the overall tag.

169	   Most subtags are distinguished by length, position in the tag, and
170	   content: subtags can be recognized solely by these features.  This
171	   makes it possible to construct a parser that can extract and assign
172	   some semantic information to the subtags, even if the specific subtag
173	   values are not recognized.  Thus, a parser need not have a list of
174	   valid tags or subtags (that is, a copy of some version of the IANA
175	   Language Subtag Registry) in order to perform common searching and
176	   matching operations.  The grandfathered tags registered under RFC
177	   3066 [RFC3066], a fixed list that can never change, are the only
178	   exception to this ability to infer meaning from subtag structure.

180	   The syntax of the language tag in ABNF [RFC5234] is:

182	   Language-Tag  = langtag
183	                 / privateuse             ; private use tag
184	                 / irregular              ; tags grandfathered by rule

186	   langtag       = (language
187	                    ["-" script]
188	                    ["-" region]
189	                    *("-" variant)
190	                    *("-" extension)
191	                    ["-" privateuse])

193	   language      = 2*3ALPHA               ; shortest ISO 639 code
194	                   [extlang]              ; sometimes followed by
195	                                          ;   extended language subtags
196	                 / 4ALPHA                 ; reserved for future use
197	                 / 5*8ALPHA               ; registered language subtag

199	   extlang       = "-" 3ALPHA             ; selected ISO 639 codes
200	                   *2[ "-" 3ALPHA]        ; permanently reserved

202	   script        = 4ALPHA                 ; ISO 15924 code

204	   region        = 2ALPHA                 ; ISO 3166-1 code
205	                 / 3DIGIT                 ; UN M.49 code

207	   variant       = 5*8alphanum            ; registered variants
208	                 / (DIGIT 3alphanum)

210	   extension     = singleton 1*("-" (2*8alphanum))

212	   singleton     = %x41-57 / %x59-5A / %x61-77 / %x79-7A / DIGIT
213	                 ; "A"-"W" / "Y"-"Z" / "a"-"w" / "y"-"z" / "0"-"9"
214	                 ; Single alphanumerics
215	                 ; "x" is reserved for private use

217	   privateuse    = "x" 1*("-" (1*8alphanum))

219	   irregular     = "en-GB-oed" / "i-ami" / "i-bnn" / "i-default"
220	                 / "i-enochian" / "i-hak" / "i-klingon" / "i-lux"
221	                 / "i-mingo" / "i-navajo" / "i-pwn" / "i-tao"
222	                 / "i-tay" / "i-tsu" / "sgn-BE-FR" / "sgn-BE-NL"
223	                 / "sgn-CH-DE"

225	   alphanum      = (ALPHA / DIGIT)       ; letters and numbers

227	                        Figure 1: Language Tag ABNF

229	   All subtags have a maximum length of eight characters and whitespace
230	   is not permitted in a language tag.  There is a subtlety in the ABNF
231	   production 'variant': variants starting with a digit MAY be four
232	   characters long, while those starting with a letter MUST be at least
233	   five characters long.  For examples of language tags, see Appendix B.
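   As a non-normative illustration, the grammar in Figure 1 can be
   transcribed into a regular expression.  The Python sketch below uses
   the hypothetical function name is_well_formed.  It checks syntax
   only: it does not consult the IANA registry, so it accepts unassigned
   subtags and does not detect repeated variants or singletons.  The
   'irregular' tags are matched as a fixed, case-insensitive list.

```python
import re

# Productions transcribed from the Figure 1 ABNF.  Letters are written in
# lowercase; the compiled pattern ignores case for US-ASCII letters only.
ALPHANUM = "[a-z0-9]"
LANGUAGE = r"(?:[a-z]{2,3}(?:-[a-z]{3}){0,3}|[a-z]{4}|[a-z]{5,8})"
SCRIPT = r"[a-z]{4}"
REGION = r"(?:[a-z]{2}|[0-9]{3})"
VARIANT = rf"(?:{ALPHANUM}{{5,8}}|[0-9]{ALPHANUM}{{3}})"
SINGLETON = "[0-9a-wyz]"  # any single alphanumeric except "x"
EXTENSION = rf"{SINGLETON}(?:-{ALPHANUM}{{2,8}})+"
PRIVATEUSE = rf"x(?:-{ALPHANUM}{{1,8}})+"
LANGTAG = (rf"{LANGUAGE}(?:-{SCRIPT})?(?:-{REGION})?"
           rf"(?:-{VARIANT})*(?:-{EXTENSION})*(?:-{PRIVATEUSE})?")

_TAG_RE = re.compile(rf"(?:{LANGTAG})|(?:{PRIVATEUSE})",
                     re.IGNORECASE | re.ASCII)

# The 'irregular' production, stored in lowercase for comparison.
IRREGULAR = frozenset({
    "en-gb-oed", "i-ami", "i-bnn", "i-default", "i-enochian", "i-hak",
    "i-klingon", "i-lux", "i-mingo", "i-navajo", "i-pwn", "i-tao",
    "i-tay", "i-tsu", "sgn-be-fr", "sgn-be-nl", "sgn-ch-de",
})


def is_well_formed(tag: str) -> bool:
    """Return True if tag matches the Language-Tag production (syntax only)."""
    if not tag.isascii():
        return False  # language tags use only US-ASCII characters
    if tag.lower() in IRREGULAR:
        return True
    return _TAG_RE.fullmatch(tag) is not None
```

   For example, is_well_formed("sr-Latn-RS") and is_well_formed("I-AMI")
   are both true, while is_well_formed("en-a") is false because an
   extension singleton must be followed by at least one subtag.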

235	   Note Well: the ABNF syntax does not distinguish between upper and
236	   lowercase.  The appearance of upper and lowercase letters in the
237	   various ABNF productions above does not affect how implementations
238	   interpret tags.  That is, the tag "I-AMI" matches the item "i-ami" in
239	   the 'irregular' production.  At all times, the tags and their
240	   subtags, including private use and extensions, are to be treated as
241	   case insensitive: there exist conventions for the capitalization of
242	   some of the subtags, but these MUST NOT be taken to carry meaning.

244	   For example:

246	   o  [ISO639-1] recommends that language codes be written in lowercase
247	      ('mn' Mongolian).

249	   o  [ISO3166-1] recommends that country codes be capitalized ('MN'
250	      Mongolia).

252	   o  [ISO15924] recommends that script codes use lowercase with the
253	      initial letter capitalized ('Cyrl' Cyrillic).

255	   However, in the tags defined by this document, the uppercase US-ASCII
256	   letters in the range 'A' through 'Z' are considered equivalent and
257	   mapped directly to their US-ASCII lowercase equivalents in the range
258	   'a' through 'z'.  Thus, the tag "mn-Cyrl-MN" is not distinct from
259	   "MN-cYRL-mn" or "mN-cYrL-Mn" (or any other combination), and each of
260	   these variations conveys the same meaning: Mongolian written in the
261	   Cyrillic script as used in Mongolia.
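   Because only the US-ASCII letters 'A' through 'Z' are folded, a
   comparison of tags needs ASCII-only case mapping rather than general
   Unicode case conversion.  A minimal Python sketch follows; the names
   ascii_lower and same_tag are hypothetical.  Python's own str.lower()
   applies Unicode rules (for example, it maps U+212A KELVIN SIGN to
   'k'), so the sketch uses an explicit translation table instead.

```python
import string

# Translation table that maps only 'A'-'Z' to 'a'-'z'; all other
# characters, including non-ASCII ones, are left unchanged.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def ascii_lower(tag: str) -> str:
    """Lowercase the US-ASCII letters of a tag and nothing else."""
    return tag.translate(_ASCII_LOWER)


def same_tag(a: str, b: str) -> bool:
    """Compare two language tags case-insensitively."""
    return ascii_lower(a) == ascii_lower(b)


# Each of these spellings conveys the same meaning.
print(same_tag("mn-Cyrl-MN", "MN-cYRL-mn"))  # True
print(same_tag("mn-Cyrl-MN", "mN-cYrL-Mn"))  # True
```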

263	   Although case distinctions do not carry meaning in language tags,
264	   consistent formatting and presentation of the tags will aid users.
265	   The format of the tags and subtags in the registry is RECOMMENDED as
266	   the form to use in language tags.  In this format, all subtags are in
267	   lowercase, with two exceptions that apply only to subtags that neither
268	   begin the tag nor follow a singleton (that is, subtags outside any
269	   extension or private-use sequence): two-letter subtags are uppercase
270	   and four-letter subtags are titlecase.
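   Since the recommended format depends only on the length and position
   of each subtag, it can be produced mechanically.  The Python sketch
   below is illustrative only: the function name format_tag is
   hypothetical, and it assumes the input is already a well-formed
   US-ASCII tag (it performs no validation).

```python
def format_tag(tag: str) -> str:
    """Return tag in the recommended letter case (input assumed well-formed)."""
    out = []
    in_singleton_seq = False  # becomes True once any singleton is seen
    for index, subtag in enumerate(tag.split("-")):
        lower = subtag.lower()
        if len(subtag) == 1:
            in_singleton_seq = True
        if index == 0 or in_singleton_seq:
            out.append(lower)  # initial subtag, singletons, and what follows
        elif len(subtag) == 2 and subtag.isalpha():
            out.append(lower.upper())  # e.g. region 'MN'
        elif len(subtag) == 4 and subtag.isalpha():
            out.append(lower[0].upper() + lower[1:])  # e.g. script 'Cyrl'
        else:
            out.append(lower)  # extlang, numeric region, variants
    return "-".join(out)


print(format_tag("MN-cYRL-mn"))      # mn-Cyrl-MN
print(format_tag("AZ-LATN-X-LATN"))  # az-Latn-x-latn
```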

272	   Note that although [RFC5234] refers to octets, the language tags
273	   described in this document are sequences of characters from the US-
274	   ASCII [ISO646] repertoire.  Language tags MAY be used in documents
275	   and applications that use other encodings, so long as these encompass
276	   the relevant part of the US-ASCII repertoire.  An example of this
277	   would be an XML document that uses the UTF-16LE [RFC2781] encoding of
279	   [Unicode].

281	2.2.  Language Subtag Sources and Interpretation

283	   The namespace of language tags and their subtags is administered by
284	   the Internet Assigned Numbers Authority (IANA) [RFC2860] according to
285	   the rules in Section 5 of this document.  The Language Subtag
286	   Registry maintained by IANA is the source for valid subtags: other
287	   standards referenced in this section provide the source material for
288	   that registry.

290	   Terminology used in this document:

292	   o  "Tag" refers to a complete language tag, such as "sr-Latn-RS" or
293	      "az-Arab-IR".  Examples of tags in this document are enclosed in
294	      double-quotes ("en-US").

296	   o  "Subtag" refers to a specific section of a tag, delimited by
297	      hyphens, such as the subtag 'Hant' in "zh-Hant-CN".  Examples of
298	      subtags in this document are enclosed in single quotes ('Hant').

300	   o  "Code" refers to values defined in external standards (and which
301	      are used as subtags in this document).  For example, 'Hant' is an
302	      [ISO15924] script code that was used to define the 'Hant' script
303	      subtag for use in a language tag.  Examples of codes in this
304	      document are enclosed in single quotes ('en', 'Hant').

306	   The definitions in this section apply to the various subtags within
307	   the language tags defined by this document, excepting those
308	   "grandfathered" tags defined in Section 2.2.8.

310	   Language tags are designed so that each subtag type has unique length
311	   and content restrictions.  These make identification of the subtag's
312	   type possible, even if the content of the subtag itself is
313	   unrecognized.  This allows tags to be parsed and processed without
314	   reference to the latest version of the underlying standards or the
315	   IANA registry and makes the associated exception handling when
316	   parsing tags simpler.

318	   Subtags in the IANA registry that do not come from an underlying
319	   standard can only appear in specific positions in a tag.
320	   Specifically, they can only occur as primary language subtags or as
321	   variant subtags.

323	   Note that sequences of private use and extension subtags MUST occur
324	   at the end of the sequence of subtags and MUST NOT be interspersed
325	   with subtags defined elsewhere in this document.

327	   Single-letter and single-digit subtags are reserved for current or
328	   future use.  These include the following current uses:

330	   o  The single-letter subtag 'x' is reserved to introduce a sequence
331	      of private use subtags.  The interpretation of any private use
332	      subtags is defined solely by private agreement and is not defined
333	      by the rules in this section or in any standard or registry
334	      defined in this document.

336	   o  All other single-letter subtags are reserved to introduce
337	      standardized extension subtag sequences as described in
338	      Section 3.7.

340	   o  The single-letter subtag 'i' is used by some grandfathered tags,
341	      such as "i-default", where it always appears in the first position
342	      and cannot be confused with an extension.
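   The length and position restrictions described above let a parser
   label each subtag without consulting the registry.  The following
   non-normative Python sketch shows one way to do this.  The function
   name classify and its labels are hypothetical; it assumes a
   well-formed tag that is not one of the grandfathered tags of
   Section 2.2.8, and labels any subtag it cannot place as "unexpected".

```python
def classify(tag: str) -> list[tuple[str, str]]:
    """Label each subtag of a well-formed, non-grandfathered language tag."""
    subtags = tag.split("-")
    if subtags[0].lower() == "x":
        # The whole tag is a private use sequence.
        return [("privateuse", s) for s in subtags]

    labels = [("language", subtags[0])]
    i = 1
    # Up to three three-letter subtags after a two- or three-letter
    # primary language subtag are extended language subtags.
    if len(subtags[0]) in (2, 3):
        while (i <= 3 and i < len(subtags) and len(subtags[i]) == 3
               and subtags[i].isalpha()):
            labels.append(("extlang", subtags[i]))
            i += 1
    # At most one script subtag: four letters.
    if i < len(subtags) and len(subtags[i]) == 4 and subtags[i].isalpha():
        labels.append(("script", subtags[i]))
        i += 1
    # At most one region subtag: two letters or three digits.
    if i < len(subtags) and (
            (len(subtags[i]) == 2 and subtags[i].isalpha())
            or (len(subtags[i]) == 3 and subtags[i].isdigit())):
        labels.append(("region", subtags[i]))
        i += 1
    # Variants: 5-8 characters, or 4 characters starting with a digit.
    while i < len(subtags) and (
            len(subtags[i]) >= 5
            or (len(subtags[i]) == 4 and subtags[i][0].isdigit())):
        labels.append(("variant", subtags[i]))
        i += 1
    # The rest must be extension or private use sequences, each introduced
    # by a singleton; once 'x' is seen, everything after it is private use.
    kind = "unexpected"
    for s in subtags[i:]:
        if len(s) == 1 and kind != "privateuse":
            kind = "privateuse" if s.lower() == "x" else "extension"
        labels.append((kind, s))
    return labels
```

   For instance, classify("zh-yue-Hant-HK") labels the subtags as
   language, extlang, script, and region, even if none of those values
   is known to the processor.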

344	2.2.1.  Primary Language Subtag

346	   The primary language subtag is the first subtag in a language tag
347	   (with the exception of private use and certain grandfathered tags)
348	   and cannot be omitted.  The following rules apply to the primary
349	   language subtag:

351	   1.  All two-character primary language subtags were defined in the
352	       IANA registry according to the assignments found in the standard
353	       "ISO 639-1:2002, Codes for the representation of names of
354	       languages -- Part 1: Alpha-2 code" [ISO639-1], or using
355	       assignments subsequently made by the ISO 639-1 registration
356	       authority (RA) or governing standardization bodies.

358	   2.  All three-character primary language subtags in the IANA registry
359	       were defined according to the assignments found in one of these
360	       additional ISO 639 parts or assignments subsequently made by the
361	       relevant ISO 639 registration authorities or governing
362	       standardization bodies:

364	       A.  "ISO 639-2:1998 - Codes for the representation of names of
365	           languages -- Part 2: Alpha-3 code - edition 1" [ISO639-2]

367	       B.  "ISO 639-3:2007 - Codes for the representation of names of
368	           languages -- Part 3: Alpha-3 code for comprehensive coverage
369	           of languages" [ISO639-3]

371	       C.  "ISO 639-5:2008 - Codes for the representation of names of
372	           languages -- Part 5: Alpha-3 code for language families and
373	           groups" [ISO639-5]

375	   3.  The subtags in the range 'qaa' through 'qtz' are reserved for
376	       private use in language tags.  These subtags correspond to codes
377	       reserved by ISO 639-2 for private use.  These codes MAY be used
378	       for non-registered primary language subtags (instead of using
379	       private use subtags following 'x-').  Please refer to Section 4.6
380	       for more information on private use subtags.

382	   4.  All four-character language subtags are reserved for possible
383	       future standardization.

385	   5.  All language subtags of 5 to 8 characters in length in the IANA
386	       registry were defined via the registration process in Section 3.5
387	       and MAY be used to form the primary language subtag.  At the time
388	       this document was created, there were no examples of this kind of
389	       subtag and future registrations of this type will be discouraged:
390	       primary languages are strongly RECOMMENDED for registration with
391	       ISO 639, and proposals rejected by ISO 639/RA-JAC will be closely
392	       scrutinized before they are registered with IANA.

394	   6.  The single-character subtag 'x' as the primary subtag indicates
395	       that the language tag consists solely of subtags whose meaning is
396	       defined by private agreement.  For example, in the tag "x-fr-CH",
397	       the subtags 'fr' and 'CH' SHOULD NOT be taken to represent the
398	       French language or the country of Switzerland (or any other value
399	       in the IANA registry) unless there is a private agreement in
400	       place to do so.  See Section 4.6.

402	   7.  The single-character subtag 'i' is used by some grandfathered
403	       tags (see Section 2.2.8) such as "i-klingon" and "i-bnn".  (Other
404	       grandfathered tags have a primary language subtag in their first
405	       position.)

407	   8.  Other values MUST NOT be assigned to the primary subtag except by
408	       revision or update of this document.
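   The rules above depend only on the shape of the first subtag, so a
   processor can tell which rule governs a primary language subtag
   before looking it up.  A non-normative Python sketch follows; the
   function name and returned labels are hypothetical, and a result such
   as "iso639-1" says only which rule applies, not that the code is
   actually assigned in the registry.

```python
def primary_subtag_source(subtag: str) -> str:
    """Classify a primary language subtag by shape, per rules 1-7 above."""
    if not (subtag.isascii() and subtag.isalpha()):
        return "invalid"
    s = subtag.lower()
    if s == "x":
        return "private-use-tag"      # rule 6
    if s == "i":
        return "grandfathered"        # rule 7
    if len(s) < 2:
        return "invalid"              # other single letters are not languages
    if len(s) == 2:
        return "iso639-1"             # rule 1
    if len(s) == 3:
        if "qaa" <= s <= "qtz":
            return "private-use"      # rule 3
        return "iso639-2/3/5"         # rule 2
    if len(s) == 4:
        return "reserved"             # rule 4
    if len(s) <= 8:
        return "registered"           # rule 5
    return "invalid"                  # longer than eight characters
```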

410	   Note: For languages that have both an ISO 639-1 two-character code
411	   and a three-character code (assigned by ISO 639-2, ISO 639-3, or ISO
412	   639-5), only the ISO 639-1 two-character code is defined in the IANA
413	   registry.

415	   Note: For languages that have no ISO 639-1 two-character code and for
416	   which the ISO 639-2/T (Terminology) code and the ISO 639-2/B
417	   (Bibliographic) codes differ, only the Terminology code is defined in
418	   the IANA registry.  At the time this document was created, all
419	   languages that had both kinds of three-character code were also
420	   assigned a two-character code; it is expected that future assignments
421	   of this nature will not occur.

423	   Note: To avoid problems with versioning and subtag choice, as
424	   experienced during the transition between RFC 1766 and RFC 3066, and
425	   to preserve the canonical nature of subtags defined by this document,
426	   the ISO 639 Registration Authority Joint Advisory Committee (ISO 639/
427	   RA-JAC) has included the following statement in [iso639.prin]:

429	      "A language code already in ISO 639-2 at the point of freezing ISO
430	      639-1 shall not later be added to ISO 639-1.  This is to ensure
431	      consistency in usage over time, since users are directed in
432	      Internet applications to employ the alpha-3 code when an alpha-2
433	      code for that language is not available."

435	   In order to avoid instability in the canonical form of tags, if a
436	   two-character code is added to ISO 639-1 for a language for which a
437	   three-character code was already included in either ISO 639-2 or ISO
438	   639-3, the two-character code MUST NOT be registered.  See
439	   Section 3.4.

441	   For example, if some content were tagged with 'haw' (Hawaiian), which
442	   currently has no two-character code, the tag would not be invalidated
443	   if ISO 639-1 were to assign a two-character code to the Hawaiian
444	   language at a later date.

446	   Note: As an example of independent primary language subtag
447	   registration, consider the grandfathered IANA registration
448	   "i-enochian".  The subtag 'enochian' could be registered in the IANA
449	   registry as a primary language subtag (assuming that ISO 639 does not
450	   register this language first), making tags such as "enochian-AQ" and
451	   "enochian-Latn" valid.

453	2.2.2.  Extended Language Subtags

455	   Extended language subtags are used to identify certain specially-
456	   selected languages that, for various historical reasons, are closely
457	   identified with an existing primary language subtag.  Extended
458	   language subtags are always used with their enclosing primary
459	   language subtag (indicated with a 'Prefix' field in the registry)
460	   when used to form the language tag.  All languages that have an
461	   extended language subtag in the registry also have an identical
462	   primary language subtag record in the registry.  This primary
463	   language subtag is RECOMMENDED for forming the language tag.  The
464	   following rules apply to the extended language subtags:

466	   1.  Extended language subtags consist solely of three-letter subtags.
467	       All extended language subtag records in the registry were
468	       defined according to the assignments found in [ISO639-3].
469	       Language collections and groupings, such as those defined in
470	       [ISO639-5], are specifically excluded from being extended
471	       language subtags.

473	   2.  Extended language subtag records MUST include exactly one
474	       'Prefix' field indicating an appropriate subtag or sequence of
475	       subtags for that extended language subtag.

477	   3.  Extended language subtag records MUST include a 'Preferred-Value'
478	       and 'Deprecated' field.  The 'Preferred-Value' and 'Subtag'
479	       fields MUST be identical.

481	   4.  Although the ABNF production 'extlang' permits up to three
482	       extended language subtags in the language tag, extended language
483	       subtags MUST NOT include another extended language subtag in
484	       their Prefix.  That is, the second and third extended language
485	       subtag positions in a language tag are permanently reserved and
486	       tags that include subtags in those positions are invalid.

488	   For example, the macrolanguage Chinese ('zh') encompasses a number of
489	   languages.  For compatibility reasons, each of these languages has
490	   both a primary and extended language subtag in the registry.  Some
491	   examples of these include Gan Chinese ('gan'), Cantonese Chinese
492	   ('yue') and Mandarin Chinese ('cmn').  Each is encompassed by the
493	   macrolanguage 'zh' (Chinese).  Therefore, they each have the prefix
494	   "zh" in their registry records.  Thus Gan Chinese is represented with
495	   tags beginning "zh-gan" or "gan"; Cantonese with tags beginning
496	   either "yue" or "zh-yue"; and Mandarin Chinese with "zh-cmn" or
497	   "cmn".  The language subtag 'zh' can still be used without an
498	   extended language subtag to label a resource as some unspecified
499	   variety of Chinese, while the primary language subtag ('gan', 'yue',
500	   'cmn') is preferred to using the extended language form ("zh-gan",
501	   "zh-yue", "zh-cmn").
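   Because an extended language subtag's 'Preferred-Value' is identical
   to its 'Subtag' field, converting the extended language form to the
   preferred primary language form amounts to dropping the prefix.  The
   Python sketch below is non-normative and assumes a small hand-written
   excerpt of registry data (the EXTLANG_PREFIX table and the function
   name prefer_primary are hypothetical; a real implementation would
   read the IANA registry).  The rest of the tag, including its letter
   case, is left unchanged.

```python
# Hypothetical excerpt of registry data: extended language subtag -> Prefix.
EXTLANG_PREFIX = {"gan": "zh", "yue": "zh", "cmn": "zh"}


def prefer_primary(tag: str) -> str:
    """Replace a leading 'prefix-extlang' pair, e.g. 'zh-yue', with 'yue'."""
    subtags = tag.split("-")
    if len(subtags) >= 2:
        prefix = EXTLANG_PREFIX.get(subtags[1].lower())
        if prefix is not None and subtags[0].lower() == prefix:
            return "-".join(subtags[1:])
    return tag


print(prefer_primary("zh-yue-HK"))   # yue-HK
print(prefer_primary("zh-Hant-TW"))  # zh-Hant-TW (no extended language subtag)
```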

503	2.2.3.  Script Subtag

505	   Script subtags are used to indicate the script or writing system
506	   variations that distinguish the written forms of a language or its
507	   dialects.  The following rules apply to the script subtags:

509	   1.  Script subtags MUST follow the primary language subtag and MUST
510	       precede any other type of subtag.

512	   2.  All four-character subtags were defined according to
513	       [ISO15924]--"Codes for the representation of the names of
514	       scripts": alpha-4 script codes, or subsequently assigned by the
515	       ISO 15924 registration authority or governing standardization
516	       bodies, denoting the script or writing system used in conjunction
517	       with this language.

519	   3.  The script subtags 'Qaaa' through 'Qabx' are reserved for private
520	       use in language tags.  These subtags correspond to codes reserved
521	       by ISO 15924 for private use.  These codes MAY be used for non-
522	       registered script values.  Please refer to Section 4.6 for more
523	       information on private use subtags.

525	   4.  Script subtags MUST NOT be registered using the process in
526	       Section 3.5 of this document.  Variant subtags MAY be considered
527	       for registration for that purpose.

529	   5.  There MUST be at most one script subtag in a language tag, and
530	       the script subtag SHOULD be omitted when it adds no
531	       distinguishing value to the tag or when the primary language
532	       subtag's record includes a Suppress-Script field listing the
533	       applicable script subtag.

535	   Example: "sr-Latn" represents Serbian written using the Latin script.
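   A minimal Python sketch of rule 5, assuming a hypothetical
   SUPPRESS_SCRIPT table copied from the 'Suppress-Script' fields of
   language records.  It only examines the subtag directly after the
   primary language, so tags that contain an extended language subtag
   are not handled.

```python
# Hypothetical excerpt: primary language subtag -> its Suppress-Script value.
SUPPRESS_SCRIPT = {"en": "Latn", "de": "Latn", "ru": "Cyrl"}


def drop_suppressed_script(tag: str) -> str:
    """Remove a script subtag that the language's record suppresses."""
    subtags = tag.split("-")
    # A script subtag is four letters and directly follows the language.
    if len(subtags) >= 2 and len(subtags[1]) == 4 and subtags[1].isalpha():
        suppressed = SUPPRESS_SCRIPT.get(subtags[0].lower())
        if suppressed and subtags[1].lower() == suppressed.lower():
            del subtags[1]
    return "-".join(subtags)


print(drop_suppressed_script("en-Latn-US"))  # en-US
```

   Note that "sr-Latn" is left unchanged, since Serbian is not given a
   suppressed script in this table.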

537	2.2.4.  Region Subtag

539	   Region subtags are used to indicate linguistic variations associated
540	   with or appropriate to a specific country, territory, or region.
541	   Typically, a region subtag is used to indicate regional dialects or
542	   usage, or region-specific spelling conventions.  A region subtag can
543	   also be used to indicate that content is expressed in a way that is
544	   appropriate for use throughout a region, for instance, Spanish
545	   content tailored to be useful throughout Latin America.

547	   The following rules apply to the region subtags:

549	   1.  Region subtags MUST follow any language or script subtags and
550	       MUST precede any other type of subtag.

552	   2.  All two-character subtags following the primary subtag were
553	       defined in the IANA registry according to the assignments found
554	       in [ISO3166-1] ("Codes for the representation of names of
555	       countries and their subdivisions -- Part 1: Country codes") using
556	       the list of alpha-2 country codes, or using assignments
557	       subsequently made by the ISO 3166-1 maintenance agency or
558	       governing standardization bodies.  In addition, the codes that
559	       are "exceptionally reserved" (as opposed to "assigned") in ISO
560	       3166-1 were also defined in the registry, with the exception of
561	       'UK', which is an exact synonym for the assigned code 'GB'.

563	   3.  All three-character subtags consisting of digit (numeric)
564	       characters following the primary subtag were defined in the IANA
565	       registry according to the assignments found in UN Standard
566	       Country or Area Codes for Statistical Use [UN_M.49] or
567	       assignments subsequently made by the governing standards body.
568	       Note that not all of the UN M.49 codes are defined in the IANA
569	       registry.  The following rules define which codes are entered
570	       into the registry as valid subtags:

572	       A.  UN numeric codes assigned to 'macro-geographical
573	           (continental)' or sub-regions MUST be registered in the
574	           registry.  These codes are not associated with an assigned
575	           ISO 3166-1 alpha-2 code and represent supra-national areas,
576	           usually covering more than one nation, state, province, or
577	           territory.

579	       B.  UN numeric codes for 'economic groupings' or 'other
580	           groupings' MUST NOT be registered in the IANA registry and
581	           MUST NOT be used to form language tags.

583	       C.  When ISO 3166-1 reassigns a code formerly used for one
584	           country or area to another country or area and that code
585	           already is present in the registry, the UN numeric code for
586	           that country or area MUST be registered in the registry as
587	           described in Section 3.4 and MUST be used to form language
588	           tags that represent the country or region for which it is
589	           defined (rather than the recycled ISO 3166-1 code).

591	       D.  UN numeric codes for countries or areas for which there is an
592	           associated ISO 3166-1 alpha-2 code in the registry MUST NOT
593	           be entered into the registry and MUST NOT be used to form
594	           language tags.  Note that the ISO 3166-based subtag in the
595	           registry MUST actually be associated with the UN M.49 code in
596	           question.

598	       E.  UN numeric codes and ISO 3166-1 alpha-2 codes for countries
599	           or areas listed as eligible for registration in [RFC4645] but
600	           not presently registered MAY be entered into the IANA
601	           registry via the process described in Section 3.5.  Once
602	           registered, these codes MAY be used to form language tags.

604	       F.  All other UN numeric codes for countries or areas that do not
605	           have an associated ISO 3166-1 alpha-2 code MUST NOT be
606	           entered into the registry and MUST NOT be used to form
607	           language tags.  For more information about these codes, see
608	           Section 3.4.

610	   4.  Note: The alphanumeric codes in Appendix X of the UN document
611	       MUST NOT be entered into the registry and MUST NOT be used to
612	       form language tags.  (At the time this document was created,
613	       these values matched the ISO 3166-1 alpha-2 codes.)

615	   5.  There MUST be at most one region subtag in a language tag and the
616	       region subtag MAY be omitted, as when it adds no distinguishing
617	       value to the tag.

619	   6.  The region subtags 'AA', 'QM'-'QZ', 'XA'-'XZ', and 'ZZ' are
620	       reserved for private use in language tags.  These subtags
621	       correspond to codes reserved by ISO 3166 for private use.  These
622	       codes MAY be used for private use region subtags (instead of
623	       using a private use subtag sequence).  Please refer to
624	       Section 4.6 for more information on private use subtags.

626	   "de-AT" represents German ('de') as used in Austria ('AT').

628	   "sr-Latn-RS" represents Serbian ('sr') written using Latin script
629	   ('Latn') as used in Serbia ('RS').

631	   "es-419" represents Spanish ('es') appropriate to the UN-defined
632	   Latin America and Caribbean region ('419').
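   Rules 2, 3, and 6 can be approximated by a shape check.  The Python
   function region_kind() below is a hypothetical sketch: it classifies
   a subtag by its form only and does not confirm that the code is
   actually present in the registry, which requires a separate lookup.

```python
def region_kind(subtag: str) -> str:
    """Classify a region subtag by its form (not by registry lookup)."""
    s = subtag.upper()
    if len(s) == 2 and s.isascii() and s.isalpha():
        # Private use ranges: 'AA', 'QM'-'QZ', 'XA'-'XZ', and 'ZZ'.
        if s in ("AA", "ZZ") or "QM" <= s <= "QZ" or s[0] == "X":
            return "private use"
        return "ISO 3166-1 alpha-2"
    if len(s) == 3 and s.isascii() and s.isdigit():
        return "UN M.49"
    return "not a region subtag"


print(region_kind("419"))  # UN M.49
```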

634	2.2.5.  Variant Subtags

636	   Variant subtags are used to indicate additional, well-recognized
637	   variations that define a language or its dialects that are not
638	   covered by other available subtags.  The following rules apply to the
639	   variant subtags:

641	   1.  Variant subtags MUST follow any language, script, or region
642	       subtags, but MUST precede any extension or private use subtag
643	       sequences.

645	   2.  Variant subtags, as a collection, are not associated with any
646	       particular external standard.  The meaning of variant subtags in
647	       the registry is defined in the course of the registration process
648	       defined in Section 3.5.  Note that any particular variant subtag
649	       might be associated with some external standard.  However,
650	       association with a standard is not required for registration.

652	   3.  More than one variant MAY be used to form the language tag.

654	   4.  Variant subtags MUST be registered with IANA according to the
655	       rules in Section 3.5 of this document before being used to form
656	       language tags.  In order to distinguish variants from other types
657	       of subtags, registrations MUST meet the following length and
658	       content restrictions:

660	       1.  Variant subtags that begin with a letter (a-z, A-Z) MUST be
661	           at least five characters long.

663	       2.  Variant subtags that begin with a digit (0-9) MUST be at
664	           least four characters long.

666	   5.  The same variant subtag MUST NOT be used more than once within a
667	       language tag.

669	       *  For example, the tag "de-DE-1901-1901" is not valid.
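   The length and content restrictions in rule 4, together with rule 5,
   can be checked as follows.  This Python sketch makes two assumptions
   beyond the text of this section: the eight-character maximum that
   applies to all subtags, and a well-formed input tag, so that every
   variant-shaped subtag after the primary language and before the
   first singleton really is a variant.  Both function names are
   hypothetical.

```python
from __future__ import annotations


def is_variant_shape(subtag: str) -> bool:
    """Rule 4: letter-initial variants have 5-8 characters, digit-initial
    ones 4-8 (the 8-character maximum is assumed from the ABNF)."""
    if not (subtag.isascii() and subtag.isalnum()):
        return False
    minimum = 4 if subtag[0].isdigit() else 5
    return minimum <= len(subtag) <= 8


def duplicate_variants(tag: str) -> set[str]:
    """Rule 5: return every variant that occurs more than once."""
    seen: set[str] = set()
    duplicates: set[str] = set()
    for subtag in tag.lower().split("-")[1:]:
        if len(subtag) == 1:
            break  # a singleton starts extensions or private use
        if is_variant_shape(subtag):
            if subtag in seen:
                duplicates.add(subtag)
            seen.add(subtag)
    return duplicates


print(duplicate_variants("de-DE-1901-1901"))  # {'1901'}
```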

671	   Variant subtag records in the language subtag registry MAY include
672	   one or more 'Prefix' fields.  The 'Prefix' indicates a sequence of
673	   subtags that would make a suitable prefix (with other subtags, as
674	   appropriate) in forming a language tag with the variant.  That is,
675	   each of the subtags in the prefix SHOULD appear, in order, before the
676	   variant.  For example, the subtag 'nedis' has a Prefix of "sl",
677	   making it suitable for forming language tags such as "sl-nedis" and
678	   "sl-IT-nedis", but not suitable for use in a tag such as "zh-nedis"
679	   or "it-IT-nedis".

681	   "sl-nedis" represents the Natisone or Nadiza dialect of Slovenian.

683	   "de-CH-1996" represents German as used in Switzerland and as written
684	   using the spelling reform beginning in the year 1996 C.E.

686	   Most variants that share a prefix are mutually exclusive.  For
687	   example, the German orthographic variations '1996' and '1901' SHOULD
688	   NOT be used in the same tag, as they represent the dates of different
689	   spelling reforms.  A variant that can meaningfully be used in
690	   combination with another variant SHOULD include a 'Prefix' field in
691	   its registry record that lists that other variant.  For example, if
692	   another German variant 'example' were created that made sense to use
693	   with '1996', then 'example' should include two Prefix fields: "de"
694	   and "de-1996".
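   The 'Prefix' test described above amounts to a subsequence check:
   each prefix subtag has to occur, in order, somewhere before the
   variant.  The following Python function is a hypothetical sketch
   that compares subtags case-insensitively.

```python
def prefix_allows(tag: str, variant: str, prefix: str) -> bool:
    """True if each subtag of `prefix` appears, in order, before `variant`."""
    subtags = [s.lower() for s in tag.split("-")]
    variant = variant.lower()
    if variant not in subtags:
        return False
    remaining = iter(subtags[: subtags.index(variant)])
    # `p in remaining` consumes the iterator up to the match, so later
    # prefix subtags can only match later tag subtags (subsequence test).
    return all(p in remaining for p in prefix.lower().split("-"))


for tag in ("sl-nedis", "sl-IT-nedis", "zh-nedis", "it-IT-nedis"):
    print(tag, prefix_allows(tag, "nedis", "sl"))
```

   With the hypothetical 'example' variant above, the tag
   "de-CH-1996-example" satisfies the Prefix "de-1996" because 'de' and
   '1996' both occur, in that order, before 'example'.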

696	2.2.6.  Extension Subtags

698	   Extensions provide a mechanism for extending language tags for use in
699	   various applications.  They are intended to identify information
700	   which is commonly used in association with languages or language
701	   tags, but which is not part of language identification.  See
702	   Section 3.7.  The following rules apply to extensions:

704	   1.   An extension MUST follow at least a primary language subtag.
705	        That is, a language tag cannot begin with an extension.
706	        Extensions extend language tags; they do not override or replace
707	        them.  For example, "a-value" is not a well-formed language tag,
708	        while "de-a-value" is.

710	   2.   Extension subtags are separated from the other subtags defined
711	        in this document by a single-character subtag ("singleton").
712	        The singleton MUST be one allocated to a registration authority
713	        via the mechanism described in Section 3.7 and MUST NOT be the
714	        letter 'x', which is reserved for private use subtag sequences.

716	   3.   Note: Private use subtag sequences starting with the singleton
717	        subtag 'x' are described in Section 2.2.7 below.

719	   4.   Each singleton subtag MUST appear at most one time in each tag
720	        (other than as a private use subtag).  That is, singleton
721	        subtags MUST NOT be repeated.  For example, the tag "en-a-bbb-a-
722	        ccc" is invalid because the subtag 'a' appears twice.  Note that
723	        the tag "en-a-bbb-x-a-ccc" is valid because the second
724	        appearance of the singleton 'a' is in a private use sequence.

726	   5.   Extension subtags MUST meet all of the requirements for the
727	        content and format of subtags defined in this document.

729	   6.   Extension subtags MUST meet whatever requirements are set by the
730	        document that defines their singleton prefix and whatever
731	        requirements are provided by the maintaining authority.

733	   7.   Each extension subtag MUST be from two to eight characters long
734	        and consist solely of letters or digits, with each subtag
735	        separated by a single '-'.

737	   8.   Each singleton MUST be followed by at least one extension
738	        subtag.  For example, the tag "tlh-a-b-foo" is invalid because
739	        the first singleton 'a' is followed immediately by another
740	        singleton 'b'.

742	   9.   Extension subtags MUST follow all language, script, region, and
743	        variant subtags in a tag.

745	   10.  All subtags following the singleton and before another singleton
746	        are part of the extension.  Example: In the tag "fr-a-Latn", the
747	        subtag 'Latn' does not represent the script subtag 'Latn'
748	        defined in the IANA Language Subtag Registry.  Its meaning is
749	        defined by the extension 'a'.

751	   11.  In the event that more than one extension appears in a single
752	        tag, the tag SHOULD be canonicalized as described in
753	        Section 4.5.

755	   For example, if the prefix singleton 'r' and the shown subtags were
756	   defined, then the following tag would be a valid example: "en-Latn-
757	   GB-boont-r-extended-sequence-x-private"
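   Several of the rules above can be applied with a small parser.  This
   is a hypothetical Python sketch, not a registry-defined API: it
   collects each extension sequence under its lowercased singleton,
   stops at the private use singleton 'x', and raises ValueError for a
   repeated singleton, an empty extension, or an extension subtag
   outside the two-to-eight character range.  It does not reorder
   extensions as described in Section 4.5.

```python
from __future__ import annotations

import re

EXTENSION_SUBTAG = re.compile(r"[A-Za-z0-9]{2,8}")  # rule 7


def parse_extensions(tag: str) -> dict[str, list[str]]:
    """Map each extension singleton to its (lowercased) subtags."""
    extensions: dict[str, list[str]] = {}
    current = None
    for s in tag.split("-")[1:]:  # rule 1: a tag cannot begin with one
        lower = s.lower()
        if len(s) == 1:
            if current is not None and not extensions[current]:
                raise ValueError(f"singleton {current!r} has no subtags")
            if lower == "x":
                current = None
                break  # rule 3: the rest is private use
            if lower in extensions:
                raise ValueError(f"singleton {lower!r} repeated")  # rule 4
            extensions[lower] = []
            current = lower
        elif current is not None:
            if not EXTENSION_SUBTAG.fullmatch(s):
                raise ValueError(f"bad extension subtag {s!r}")
            extensions[current].append(lower)  # rule 10
    if current is not None and not extensions[current]:
        raise ValueError(f"singleton {current!r} has no subtags")  # rule 8
    return extensions


print(parse_extensions("en-a-bbb-x-a-ccc"))  # {'a': ['bbb']}
```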

759	2.2.7.  Private Use Subtags

761	   Private use subtags are used, by private agreement, to indicate
762	   distinctions in language that are important in a given context.  The
763	   following rules apply to private use subtags:

765	   1.  Private use subtags are separated from the other subtags defined
766	       in this document by the reserved single-character subtag 'x'.

768	   2.  Private use subtags MUST conform to the format and content
769	       constraints defined in the ABNF for all subtags.

771	   3.  Private use subtags MUST follow all language, script, region,
772	       variant, and extension subtags in the tag.  Another way of saying
773	       this is that all subtags following the singleton 'x' MUST be
774	       considered private use.  Example: The subtag 'US' in the tag "en-
775	       x-US" is a private use subtag.

777	   4.  A tag MAY consist entirely of private use subtags.

779	   5.  No source is defined for private use subtags.  Use of private use
780	       subtags is by private agreement only.

782	   6.  Private use subtags are NOT RECOMMENDED where alternatives exist
783	       or for general interchange.  See Section 4.6 for more information
784	       on private use subtag choice.

786	   For example: The Unicode Consortium defines a set of private use
787	   extensions in LDML ([UTS35], Locale Data Markup Language, the Unicode
788	   standard for defining locale data) such as in the tag "es-419-x-ldml-
789	   collatio-traditio", which indicates Latin American Spanish with
790	   traditional order for sorted lists.

792	2.2.8.  Grandfathered Registrations

794	   Prior to RFC 4646, whole language tags were registered according to
795	   the rules in RFC 1766 and/or RFC 3066.  These registered tags
796	   maintain their validity.  Of those tags, those that were made
797	   obsolete or redundant by the advent of RFC 4646, by this document, or
798	   by subsequent registration of subtags are maintained in the registry
799	   in records as "redundant" records.  Those tags that do not match the
800	   'langtag' production in the ABNF in this document or that contain
801	   subtags that do not individually appear in the registry are
802	   maintained in the registry in records of the "grandfathered" type.

804	   Grandfathered tags contain one or more subtags that are not defined
805	   in the Language Subtag Registry (see Section 3).  Redundant tags
806	   consist entirely of subtags defined above and whose independent
807	   registration was superseded by [RFC4646].  For more information see
808	   Section 3.8.

810	   Some grandfathered tags are "regular" in that they match the
811	   'langtag' production in Figure 1.  In some cases, these tags could
812	   become redundant if their (currently unregistered) subtags were to be
813	   registered (as variants, for example).  In other cases, although the
814	   subtags match the language tag pattern, the meaning assigned to the
815	   various subtags is prohibited by rules elsewhere in this document.
816	   Those tags can never become redundant.

818	   The remaining grandfathered tags are "irregular" and do not match the
819	   'langtag' production.  These are listed in the 'irregular' production
820	   in Figure 1.  These grandfathered tags can never become redundant.
821	   Many of these tags have been superseded by other registrations: their
822	   record contains a Preferred-Value field that ought to be used
823	   to form language tags representing that value.

825	2.2.9.  Classes of Conformance

827	   Implementations sometimes need to describe their capabilities with
828	   regard to the rules and practices described in this document.  Tags
829	   can be checked or verified in a number of ways, but two particular
830	   classes of tag conformance are formally defined here.

832	   A tag is considered "well-formed" if it conforms to the ABNF
833	   (Section 2.1).  Language tags may be well-formed in terms of syntax
834	   but not valid in terms of content.  Irregular grandfathered tags are
835	   well-formed because they are listed in the 'irregular' production.
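   Well-formedness can be tested with a regular expression.  The Python
   expression below is an assumed transcription of the 'langtag' and
   'privateuse' productions of Section 2.1 (which are not reproduced
   here).  It deliberately omits the 'grandfathered' production, so
   irregular tags such as "i-klingon" would need a separate lookup.

```python
import re

# Assumed transcription of the 'langtag' production; 'grandfathered'
# alternatives are omitted and must be handled by a separate lookup.
LANGTAG = re.compile(
    r"""
    (?:[a-z]{2,3}(?:-[a-z]{3}){0,3}|[a-z]{4}|[a-z]{5,8})  # language, extlang
    (?:-[a-z]{4})?                                       # script
    (?:-(?:[a-z]{2}|[0-9]{3}))?                          # region
    (?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*             # variants
    (?:-[0-9a-wy-z](?:-[a-z0-9]{2,8})+)*                 # extensions
    (?:-x(?:-[a-z0-9]{1,8})+)?                           # private use
    """,
    re.VERBOSE | re.IGNORECASE | re.ASCII,
)
# A tag may also consist entirely of private use subtags.
PRIVATEUSE = re.compile(r"x(?:-[a-z0-9]{1,8})+", re.IGNORECASE | re.ASCII)


def is_well_formed(tag: str) -> bool:
    return bool(LANGTAG.fullmatch(tag) or PRIVATEUSE.fullmatch(tag))
```

   re.ASCII keeps case-insensitive matching from accepting non-ASCII
   letters.  As noted above, "de-DE-1901-1901" passes this check even
   though it is not valid.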

837	   A tag is considered "valid" if it satisfies these conditions:

839	   o  The tag is well-formed.

841	   o  The tag is either a grandfathered tag, or all of its language,
842	      extlang, script, region, and variant subtags appear in the IANA
843	      language subtag registry as of the particular registry date.

845	   o  There are no duplicate singleton (extension) subtags and no
846	      duplicate variant subtags.

848	   Note that a tag's validity depends on the date of the registry used
849	   to validate the tag.  A more recent copy of the registry might
850	   contain a subtag that an older version does not.
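   The three conditions for validity can be sketched in Python as well.
   The REGISTRY dictionary is a hypothetical, hand-made excerpt of a
   registry snapshot.  The function classify() assumes that the tag has
   already been checked for well-formedness and is not grandfathered; it
   labels each subtag by its position and shape.

```python
from __future__ import annotations

# Hypothetical excerpt of a registry snapshot, lowercased for lookup.
REGISTRY: dict[str, set[str]] = {
    "language": {"de", "en", "es", "sl", "sr", "zh"},
    "extlang": {"cmn", "gan", "yue"},
    "script": {"cyrl", "hant", "latn"},
    "region": {"419", "at", "ch", "de", "rs", "us"},
    "variant": {"1901", "1996", "nedis"},
}


def classify(tag: str) -> list[tuple[str, str]]:
    """Label each subtag of a well-formed, non-grandfathered tag."""
    labels: list[tuple[str, str]] = []
    mode = None  # "extension" or "privateuse" once a singleton is seen
    for i, s in enumerate(tag.lower().split("-")):
        if len(s) == 1 and mode != "privateuse":
            labels.append(("singleton", s))
            mode = "privateuse" if s == "x" else "extension"
        elif mode is not None:
            labels.append((mode, s))
        elif i == 0:
            labels.append(("language", s))
        elif len(s) in (3, 4) and s.isalpha() and labels[-1][0] in (
            "language",
            "extlang",
        ):
            labels.append(("extlang" if len(s) == 3 else "script", s))
        elif (len(s) == 2 and s.isalpha()) or (len(s) == 3 and s.isdigit()):
            labels.append(("region", s))
        else:
            labels.append(("variant", s))
    return labels


def is_valid(tag: str, registry: dict[str, set[str]] = REGISTRY) -> bool:
    labels = classify(tag)
    singletons = [s for kind, s in labels if kind == "singleton"]
    variants = [s for kind, s in labels if kind == "variant"]
    if len(set(singletons)) < len(singletons):
        return False  # duplicate extension singleton
    if len(set(variants)) < len(variants):
        return False  # duplicate variant
    # Every language, extlang, script, region, and variant subtag must be
    # registered; extension and private use subtags are not looked up.
    return all(s in registry[kind] for kind, s in labels if kind in registry)
```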

852	   A tag is considered "valid" for a given extension (Section 3.7) (as
853	   of a particular version, revision, and date) if it meets the criteria
854	   for "valid" above and also satisfies this condition:

856	      Each subtag used in the extension part of the tag is valid
857	      according to the extension.

859	   Older language tag implementations sometimes reference [RFC3066].
860	   Again, all valid tags under that version also match this document's
861	   language tag ABNF.  However, a wider array of tags could be
862	   considered "well-formed" under that document.  The 'Language-Tag'
863	   production used in that document matches the following:

865	       obs-language-tag = primary-subtag *( "-" subtag )
866	       primary-subtag = 1*8ALPHA
867	       subtag = 1*8(ALPHA / DIGIT)

869	                  Figure 2: RFC 3066 Language Tag Syntax
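   For comparison, Figure 2 translates directly into a regular
   expression.  The Python names below are illustrative.  The tag
   "en-a", for example, matches this older syntax but is not
   well-formed under this document, because the singleton 'a' has to be
   followed by at least one extension subtag.

```python
import re

# Figure 2: primary-subtag = 1*8ALPHA, subtag = 1*8(ALPHA / DIGIT).
OBS_LANGUAGE_TAG = re.compile(
    r"[a-z]{1,8}(?:-[a-z0-9]{1,8})*", re.IGNORECASE | re.ASCII
)


def matches_rfc3066(tag: str) -> bool:
    return OBS_LANGUAGE_TAG.fullmatch(tag) is not None


print(matches_rfc3066("en-a"))  # True
```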

871	   Subtags designated for private use or private-use sequences
872	   introduced by the 'x' subtag are available for cases in which no
873	   assigned subtags are available and registration is not a suitable
874	   option.  For example, one might use a tag such as "no-QQ", where 'QQ'
875	   is one of a range of private-use ISO 3166-1 codes to indicate an
876	   otherwise-undefined region.  Users MUST NOT assign and use subtags
877	   that do not appear in the registry other than in private-use
878	   sequences (such as the subtag 'personal' in "en-x-personal").
879	   Not only is such assignment nonconformant, it also risks collision
880	   with a future possible assignment or registration.

882	   Note well: although the 'Language-Tag' production appearing in this
883	   document is functionally equivalent to the one in [RFC4646], it has
884	   been changed to prevent certain errors in well-formedness arising
885	   from the old 'grandfathered' production.  This version of the ABNF is
886	   RECOMMENDED as a replacement for the older version.

888	3.  Registry Format and Maintenance

890	   This section defines the Language Subtag Registry and the maintenance
891	   and update procedures associated with it, as well as a registry for
892	   extensions to language tags (Section 3.7).

894	   The Language Subtag Registry contains a comprehensive list of all of
895	   the subtags valid in language tags.  This allows implementers a
896	   straightforward and reliable way to validate language tags.  The
897	   Language Subtag Registry will be maintained so that, except for
898	   extension subtags, it is possible to validate all of the subtags that
899	   appear in a language tag under the provisions of this document or its
900	   revisions or successors.  In addition, the meaning of the various
901	   subtags will be unambiguous and stable over time.  (The meaning of
902	   private use subtags, of course, is not defined by the IANA registry.)

904	3.1.  Format of the IANA Language Subtag Registry

906	   The IANA Language Subtag Registry ("the registry") is a machine-
907	   readable file in the format described in this section, plus copies of
908	   the registration forms approved in accordance with the process
909	   described in Section 3.5.

911	   Note: The existing registration forms for grandfathered and redundant
912	   tags taken from RFC 3066 have been maintained as part of the obsolete
913	   RFC 3066 registry.  The subtags added to the registry by either
914	   [RFC4645] or [registry-update] do not have separate registration
915	   forms (so no forms are archived for these additions).

917	3.1.1.  File Format

919	   The registry consists of a series of records stored in the record-jar
920	   format (described in [record-jar]).  Each record, in turn, consists
921	   of a series of fields that describe the various subtags and tags.
922	   The registry is a Unicode [Unicode] text file, using the UTF-8
923	   [RFC3629] character encoding.

925	   Each field can be considered a single, logical line of Unicode
926	   [Unicode] characters, comprising a field-name and a field-body
927	   separated by a COLON character (%x3A).  Each field is terminated by
928	   the newline sequence CRLF.  The text in each field MUST be in Unicode
929	   Normalization Form C (NFC).

931	   A collection of fields forms a 'record'.  Records are separated by
932	   lines containing only the sequence "%%" (%x25.25).

934	   Although fields are logically a single line of text, each line of
935	   text in the file format is limited to 72 bytes in length.  To
936	   accommodate this, the field-body can be split into a multiple-line
937	   representation; this is called "folding".  Folding is done according
938	   to customary conventions for line-wrapping.  This is typically on
939	   whitespace boundaries, but can occur between other characters when
940	   the value does not include spaces, such as when a language does not
941	   use whitespace between words.  In any event, there MUST NOT be breaks
942	   inside a multibyte UTF-8 sequence nor in the middle of a combining
943	   character sequence.  For more information, see [UAX14].
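   Folding can be sketched as follows.  The function fold_field() is
   hypothetical and breaks only at spaces, so it does not handle text
   without whitespace or single words longer than the limit.  It also
   rejects field text that is not in NFC.  Continuation lines begin with
   two spaces here, although the registry ABNF requires only one.

```python
from __future__ import annotations

import unicodedata


def fold_field(name: str, body: str, limit: int = 72) -> list[str]:
    """Fold one field into physical lines of at most `limit` UTF-8 bytes."""
    if not unicodedata.is_normalized("NFC", body):
        raise ValueError("field text must be in Normalization Form C")
    words = body.split(" ")
    lines: list[str] = []
    current = f"{name}: {words[0]}"
    for word in words[1:]:
        candidate = f"{current} {word}"
        if len(candidate.encode("utf-8")) <= limit:
            current = candidate
        else:
            lines.append(current)  # break at the space before `word`
            current = f"  {word}"  # continuation lines start with spaces
    lines.append(current)
    return lines
```

   Because breaks happen only at spaces, a line can never end inside a
   multibyte UTF-8 sequence or a combining character sequence.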

945	   Although the file format uses the UTF-8 encoding, fields are
946	   restricted to the printable characters from the US-ASCII [ISO646]
947	   repertoire unless otherwise indicated in the specific field
948	   description below.

950	   The format of the registry is described by the following ABNF (per
951	   [RFC5234]):

953	   registry   = record *("%%" CRLF record)
954	   record     = 1*( field-name *SP ":" *SP field-body CRLF )
955	   field-name = (ALPHA / DIGIT) [*(ALPHA / DIGIT / "-") (ALPHA / DIGIT)]
956	   field-body = *([[*SP CRLF] 1*SP] 1*CHARS)
957	   CHARS      = (%x21-10FFFF)      ; Unicode code points

959	                      Figure 3: Registry Format ABNF
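   A reader for this format can be sketched in Python.  The function
   parse_registry() is hypothetical.  It returns each record as a
   mapping from field-name to a list of field-bodies (because fields
   such as 'Description' can repeat), and it unfolds continuation lines
   by joining them with a single space, which is an assumption that
   suits whitespace-separated text.

```python
from __future__ import annotations


def parse_registry(text: str) -> list[dict[str, list[str]]]:
    """Split registry text into records of field-name -> field-bodies."""
    records: list[dict[str, list[str]]] = []
    fields: list[tuple[str, str]] = []

    def close_record() -> None:
        record: dict[str, list[str]] = {}
        for name, body in fields:
            record.setdefault(name, []).append(body)  # keep repeated fields
        records.append(record)
        fields.clear()

    for line in text.splitlines():  # also strips the CRLF terminators
        if line == "%%":
            close_record()
        elif line.startswith(" "):
            # Folded continuation of the previous field-body.
            name, body = fields[-1]
            fields[-1] = (name, f"{body} {line.strip()}")
        elif line.strip():
            name, _, body = line.partition(":")
            fields.append((name.strip(), body.strip()))
    if fields:
        close_record()
    return records
```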

961	   The sequence '..' (%x2E.2E) in a field-body denotes a range of
962	   values.  Such a range represents all subtags of the same length that
963	   are in alphabetic or numeric order within that range, including the
964	   values explicitly mentioned.  For example, 'a..c' denotes the values
965	   'a', 'b', and 'c' and '11..13' denotes the values '11', '12', and
966	   '13'.
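   Range expansion can be sketched by treating each endpoint as a number
   in base 26 (letters) or base 10 (digits).  The Python function below
   is a hypothetical illustration.  It assumes that both endpoints use
   only letters or only digits and copies the case convention of the
   first endpoint, so "Qaaa..Qabx" (the private use script range)
   expands to 50 titlecase subtags.

```python
from __future__ import annotations

import string


def expand_range(value: str) -> list[str]:
    """Expand a registry range such as 'a..c' or 'QM..QZ'."""
    if ".." not in value:
        return [value]
    start, end = value.split("..")
    if len(start) != len(end):
        raise ValueError("range endpoints must have the same length")
    alphabet = string.digits if start.isdigit() else string.ascii_lowercase
    base = len(alphabet)

    def to_int(s: str) -> int:
        n = 0
        for ch in s.lower():
            n = n * base + alphabet.index(ch)  # ValueError on mixed input
        return n

    def from_int(n: int) -> str:
        chars = []
        for _ in range(len(start)):
            n, digit = divmod(n, base)
            chars.append(alphabet[digit])
        return "".join(reversed(chars))

    # Reuse the case convention of the first endpoint ('Qaaa', 'QM', 'a').
    if start.istitle():
        recase = str.title
    elif start.isupper():
        recase = str.upper
    else:
        recase = str.lower
    return [recase(from_int(n)) for n in range(to_int(start), to_int(end) + 1)]


print(expand_range("QM..QZ")[:3])  # ['QM', 'QN', 'QO']
```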

968	   All fields whose field-body contains a date value use the "full-date"
969	   format specified in [RFC3339].  For example: "2004-06-28" represents
970	   June 28, 2004, in the Gregorian calendar.

972	3.1.2.  Record Definitions

974	   There are three types of records in the registry: "File-Date",
975	   "Subtag", and "Tag" records.

977	   The first record in the registry is a "File-Date" record.  This
978	   record contains the single field whose field-name is "File-Date" (see
979	   Figure 3).  The field-body of this record contains the last
980	   modification date of this copy of the registry, making it possible to
981	   compare different versions of the registry.  The registry on the IANA
982	   website is the most current.  Versions with an older date than that
983	   one are not up-to-date.

985	   File-Date: 2004-06-28
986	   %%

988	                 Figure 4: Example of the File-Date Record

990	   Subsequent records represent either subtags or tags in the registry.
991	   "Subtag" records contain a field with a field-name of "Subtag",
992	   while "Tag" records contain a field with a field-name of "Tag".
993	   Each of the fields in each record MUST occur no more
994	   than once, unless otherwise noted below.  Each record MUST contain
995	   the following fields:

997	   o  'Type'

999	      *  Type's field-body MUST consist of one of the following strings:
1000	         "language", "extlang", "script", "region", "variant",
1001	         "grandfathered", or "redundant", and denotes the type of tag or
1002	         subtag.

1004	   o  Either 'Subtag' or 'Tag'

1006	      *  Subtag's field-body contains the subtag being defined.  This
1007	         field MUST only appear in records whose 'Type' has one of
1008	         these values: "language", "extlang", "script", "region", or
1009	         "variant".

1011	      *  Tag's field-body contains a complete language tag.  This field
1012	         MUST only appear in records whose 'Type' has one of these
1013	         values: "grandfathered" or "redundant".  Note that the field-
1014	         body will always follow the 'grandfathered' production in the
1015	         ABNF in Section 2.1.

1017	   o  Description

1019	      *  Description's field-body contains a non-normative description
1020	         of the subtag or tag.

1022	   o  Added

1024	      *  Added's field-body contains the date the record was registered
1025	         or, in the case of grandfathered or redundant tags, the date
1026	         the corresponding tag was registered under the rules of
1027	         [RFC1766] or [RFC3066].

1029	   Each record MAY also contain the following fields:

1031	   o  Preferred-Value

1032	      *  Preferred-Value's field-body contains a canonical mapping from
1033	         this record's value to a modern equivalent that is preferred in
1034	         its place.  Depending on the value of the 'Type' field, this
1035	         value can take different forms:

1037	         +  For records of type 'language', 'Preferred-Value' contains
1038	            the primary language subtag that is preferred when forming
1039	            the language tag.

1041	         +  For records of type 'script', 'region', or 'variant',
1042	            'Preferred-Value' contains the subtag of the same 'Type'
1043	            that is preferred for forming the language tag.

1045	         +  For records of type 'extlang', 'grandfathered', or
1046	            'redundant', 'Preferred-Value' contains an extended language
1047	            range that is preferred for forming the language tag.  That
1048	            is, each of the subtags that appears in the value MUST
1049	            appear in the replacement tag; additional subtags can be
1050	            included in a language tag as described elsewhere in this
1051	            document.  For example, the replacement for the
1052	            grandfathered tag "zh-min-nan" (Min Nan Chinese) is "zh-
1053	            nan", which can be used as the basis for tags such as "zh-
1054	            nan-Hant" or "zh-nan-TW".

1056	   o  Deprecated

1058	      *  Deprecated's field-body contains the date the record was
1059	         deprecated.  In some cases this value is before that of the
1060	         associated 'Added' field in the registry.

1062	   o  Prefix

1064	      *  Prefix's field-body contains a language tag with which this
1065	         subtag MAY be used to form a new language tag, perhaps with
1066	         other subtags as well.  The Prefix's subtags appear before the
1067	         subtag.  This field MUST only appear in records whose 'Type'
1068	         field-body is either 'extlang' or 'variant'.  For example, the
1069	         'Prefix' for the variant 'nedis' is 'sl', meaning that the tags
1070	         "sl-nedis" and "sl-IT-nedis" are appropriate while the tag "is-
1071	         nedis" is not.

1073	   o  Comments

1075	      *  Comments's field-body contains additional information about the
1076	         subtag, as deemed appropriate for understanding the registry
1077	         and implementing language tags using the subtag or tag.

1079	   o  Suppress-Script

1081	      *  Suppress-Script's field-body contains a script subtag that
1082	         SHOULD NOT be used to form language tags with the associated
1083	         primary language subtag.  This field MUST only appear in
1084	         records whose 'Type' field-body is 'language'.  See
1085	         Section 4.1.

1087	   o  Macrolanguage

1089	      *  Macrolanguage's field-body contains a primary language subtag
1090	         defined by ISO 639 as a "macrolanguage" that encompasses this
1091	         language subtag.  This field MUST only appear in records whose
1092	         'Type' field-body is either 'language' or 'extlang'.

1094	   o  Scope

1096	      *  Scope's field-body contains information about a primary or
1097	         extended language subtag indicating its type according to ISO
1098	         639.  The values permitted in this field are 'individual',
1099	         'macrolanguage', 'collection', or 'special'.  This field MUST
1100	         only appear in records whose 'Type' field-body is either
1101	         'language' or 'extlang'.
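   As an illustration of the 'Preferred-Value' mapping for language and
   region records, the Python sketch below rewrites deprecated subtags
   in a tag.  The two tables are hypothetical excerpts of registry data.
   Extlang, grandfathered, and redundant mappings, which replace a range
   of subtags rather than a single one, are not handled.

```python
# Hypothetical excerpts: deprecated subtag -> its Preferred-Value.
LANGUAGE_PREFERRED = {"iw": "he", "in": "id"}
REGION_PREFERRED = {"BU": "MM", "DD": "DE"}


def apply_preferred_values(tag: str) -> str:
    """Replace deprecated language and region subtags in `tag`."""
    subtags = tag.split("-")
    out = [LANGUAGE_PREFERRED.get(subtags[0].lower(), subtags[0])]
    for i, subtag in enumerate(subtags[1:], start=1):
        if len(subtag) == 1:
            out.extend(subtags[i:])  # leave extensions and private use alone
            break
        if len(subtag) == 2 and subtag.isalpha():  # two letters: a region
            subtag = REGION_PREFERRED.get(subtag.upper(), subtag)
        out.append(subtag)
    return "-".join(out)


print(apply_preferred_values("iw-BU"))  # he-MM
```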

1103	   Future versions of this document might add additional fields to the
1104	   registry, so implementations SHOULD ignore fields found in the
1105	   registry that are not defined in this document.
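   The field constraints in this section lend themselves to a mechanical
   check.  The Python sketch below is hypothetical.  It takes a record
   as a mapping from field-name to a list of field-bodies, treats
   'Description' and 'Prefix' as the repeatable fields (an assumption
   based on Sections 2.2.5 and 3.1.4), and ignores unknown fields as
   recommended above.

```python
from __future__ import annotations

SUBTAG_TYPES = {"language", "extlang", "script", "region", "variant"}
TAG_TYPES = {"grandfathered", "redundant"}
KNOWN_FIELDS = {
    "Type", "Subtag", "Tag", "Description", "Added", "Preferred-Value",
    "Deprecated", "Prefix", "Comments", "Suppress-Script", "Macrolanguage",
    "Scope",
}
REPEATABLE = {"Description", "Prefix"}
# Field -> record types in which it may appear.
ALLOWED_IN = {
    "Subtag": SUBTAG_TYPES,
    "Tag": TAG_TYPES,
    "Prefix": {"extlang", "variant"},
    "Suppress-Script": {"language"},
    "Macrolanguage": {"language", "extlang"},
    "Scope": {"language", "extlang"},
}
SCOPES = {"individual", "macrolanguage", "collection", "special"}


def record_problems(record: dict[str, list[str]]) -> list[str]:
    """List rule violations in one Subtag or Tag record (empty if none)."""
    problems = []
    for name, bodies in record.items():
        # Unknown fields are ignored entirely.
        if name in KNOWN_FIELDS and name not in REPEATABLE and len(bodies) > 1:
            problems.append(f"{name} occurs {len(bodies)} times")
    rtype = record.get("Type", [""])[0]
    if rtype not in SUBTAG_TYPES | TAG_TYPES:
        return problems + [f"bad or missing Type {rtype!r}"]
    for name in ("Description", "Added"):
        if name not in record:
            problems.append(f"missing {name}")
    if ("Subtag" in record) == ("Tag" in record):
        problems.append("needs exactly one of Subtag or Tag")
    for name, types in ALLOWED_IN.items():
        if name in record and rtype not in types:
            problems.append(f"{name} not allowed in {rtype} record")
    for scope in record.get("Scope", []):
        if scope not in SCOPES:
            problems.append(f"bad Scope {scope!r}")
    return problems
```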

1107	3.1.3.  Subtag and Tag Fields

1109	   The 'Subtag' field MUST NOT use uppercase letters to form the subtag,
1110	   with two exceptions.  Subtags whose 'Type' field is 'script' (in
1111	   other words, subtags defined by ISO 15924) MUST use titlecase.
1112	   Subtags whose 'Type' field is 'region' (in other words, the non-
1113	   numeric region subtags defined by ISO 3166-1) MUST use all uppercase.
1114	   These exceptions mirror the use of case in the underlying standards.

1116	   Each subtag in the tags contained in a 'Tag' field MUST be formatted
1117	   using the rules in the preceding paragraph.  That is, all subtags are
1118	   lowercase except for subtags that represent script or region codes.
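   A case-formatting routine following these conventions can be
   sketched as below.  The function format_case() is hypothetical and
   assumes a well-formed tag, in which any two-letter subtag after the
   primary language is a region and any four-letter alphabetic subtag is
   a script, as long as neither follows a singleton.

```python
def format_case(tag: str) -> str:
    """Apply the registry's case conventions to a language tag."""
    out = []
    after_singleton = False
    for i, s in enumerate(tag.lower().split("-")):
        if len(s) == 1:
            after_singleton = True  # extensions or private use follow
        elif i > 0 and not after_singleton and s.isalpha():
            if len(s) == 4:
                s = s.title()  # script, e.g. 'Latn'
            elif len(s) == 2:
                s = s.upper()  # region, e.g. 'RS'
        out.append(s)
    return "-".join(out)


print(format_case("SR-latn-rs"))  # sr-Latn-RS
```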

1120	3.1.4.  Description Field

1122	   The field 'Description' contains a description of the tag or subtag
1123	   in the record.  The 'Description' field MAY appear more than once per
1124	   record; that is, there can be multiple descriptions for a given
1125	   record.  The 'Description' field MAY include the full range of
1126	   Unicode characters.  At least one of the 'Description' fields MUST be
1127	   written or transcribed into the Latin script; additional
1128	   'Description' fields MAY also include a description in a non-Latin
1129	   script.  Each 'Description' field MUST be unique, both within the
1130	   record in which it appears and for the collection of records of the
1131	   same type.  Moreover, formatting variations of the same description
1132	   MUST NOT occur in that specific record or in any other record of the
1133	   same type.  For example, while the ISO 639-1 code 'fy' contains both
1134	   the descriptions "Western Frisian" and "Frisian, Western", only one
1135	   of these descriptions appears in the registry.
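   Detecting formatting variants of the same description is partly a
   matter of judgment.  The following Python sketch uses one
   hypothetical heuristic: it case-folds, collapses whitespace, and
   rewrites a single inverted form such as "Frisian, Western" as
   "Western Frisian" before comparing descriptions within one record
   type.  It is not a complete test of the uniqueness rule.

```python
from __future__ import annotations

import re
import unicodedata


def description_key(text: str) -> str:
    """Heuristic comparison key for a 'Description' field-body."""
    text = unicodedata.normalize("NFC", text).casefold()
    text = re.sub(r"\s+", " ", text).strip()
    head, sep, tail = text.partition(", ")
    if sep and "," not in tail:
        text = f"{tail} {head}"  # "frisian, western" -> "western frisian"
    return text


def conflicting_descriptions(
    entries: list[tuple[str, str]],
) -> list[tuple[tuple[str, str], tuple[str, str]]]:
    """Pairs of (subtag, description) entries whose keys collide."""
    first_seen: dict[str, tuple[str, str]] = {}
    conflicts = []
    for subtag, description in entries:
        key = description_key(description)
        if key in first_seen:
            conflicts.append((first_seen[key], (subtag, description)))
        else:
            first_seen[key] = (subtag, description)
    return conflicts
```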

1137	   The 'Description' field is used for identification purposes.  It
1138	   doesn't necessarily represent the actual native name of the item in
1139	   the record, nor are any of the descriptions guaranteed to be in any
1140	   particular language (such as English or French, for example).

1142	   For subtags taken from a source standard (such as ISO 639 or ISO
1143	   15924), the 'Description' value(s) SHOULD also be taken from the
1144	   source standard.  Multiple descriptions in the source standard MUST
1145	   be split into separate 'Description' fields.  The source standard's
1146	   descriptions MAY be edited, either prior to insertion or via the
1147	   registration process.  For records of type 'language', the first
1148	   'Description' field appearing in the Registry corresponds whenever
1149	   possible to the Reference Name assigned by ISO 639-3.  This helps
1150	   facilitate cross-referencing between ISO 639 and the registry.
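   As an illustration, the Python sketch below splits one name string
   taken from a source standard into separate 'Description' fields.  It
   assumes that alternate names are separated by semicolons, as in ISO
   639-2 entries such as "Catalan; Valencian", and it optionally places
   an ISO 639-3 Reference Name first.  Both the function and its input
   convention are assumptions made for this example.

```python
from typing import List, Optional


def split_descriptions(source_name: str,
                       reference_name: Optional[str] = None) -> List[str]:
    """Turn one source-standard name string into 'Description' field lines.

    Assumes alternate names are separated by ';'.  If a Reference Name is
    given, it is placed first, mirroring the ordering preference for
    records of type 'language'.
    """
    names = [n.strip() for n in source_name.split(";")]
    if reference_name is not None:
        names.insert(0, reference_name)
    unique = []
    for name in names:
        if name and name not in unique:  # drop empty and exactly repeated names
            unique.append(name)
    return [f"Description: {name}" for name in unique]


for line in split_descriptions("Catalan; Valencian"):
    print(line)
# Description: Catalan
# Description: Valencian
```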

1152	   When creating or updating a record due to the action of one of the
1153	   source standards, the Language Subtag Reviewer SHOULD remove
1154	   duplicate or redundant descriptions and MAY edit descriptions to
1155	   correct irregularities in formatting (such as misspellings,
1156	   inappropriate apostrophes or other punctuation, or excessive or
1157	   missing spaces) prior to submitting the proposed record to the ietf-
1158	   languages list.

1160	   Note: Descriptions in registry entries that correspond to ISO 639,
1161	   ISO 15924, ISO 3166-1, or UN M.49 codes are intended only to indicate
1162	   the meaning of that identifier as defined in the source standard at
1163	   the time it was added to the registry.  The description does not
1164	   replace the content of the source standard itself.  The descriptions
1165	   are not intended to be the localized English names for the subtags.
1166	   Localization or translation of language tag and subtag descriptions
1167	   is out of scope of this document.

1169	   Descriptions SHOULD contain all and only that information necessary
1170	   to distinguish one subtag from others that it might be confused with.
1171	   They are not intended to provide general background information, nor
1172	   to provide all possible alternate names or designations.

1174	3.1.5.  Deprecated Field

1176	   The field 'Deprecated' MAY be added to, changed in, or removed from
1177	   any record via the maintenance process described in Section 3.3 or
1178	   via the registration process described in Section 3.5.  Usually, the
1179	   addition of a 'Deprecated' field is due to the action of one of the
1180	   standards bodies, such as ISO 3166, withdrawing a code.  Although
1181	   they remain valid in language tags, subtags and tags with a
1182	   'Deprecated' field are deprecated, and validating processors SHOULD
1183	   NOT generate tags containing them.  A record that has a 'Deprecated'
1184	   field but no 'Preferred-Value' field has no replacement mapping.
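   A generating processor might act on 'Deprecated' and
   'Preferred-Value' as in the Python sketch below.  Records are modeled
   as dictionaries keyed by field name (an assumption).  The function
   returns the subtag to emit, plus a note when a deprecated subtag has
   no replacement.  The usage line uses the 'TP'/'TL' case described in
   Section 3.4.

```python
from typing import Dict, Optional, Tuple


def choose_subtag(record: Dict[str, str]) -> Tuple[str, Optional[str]]:
    """Pick the subtag a generating processor should emit for `record`."""
    subtag = record["Subtag"]
    if "Deprecated" not in record:
        return subtag, None
    preferred = record.get("Preferred-Value")
    if preferred is not None:
        # Deprecated with a mapping: emit the preferred value instead.
        return preferred, None
    # Deprecated without a mapping: still valid in tags, but there is no
    # replacement, so the caller has to decide what to do.
    return subtag, "deprecated with no replacement mapping"


tp = {"Type": "region", "Subtag": "TP", "Deprecated": "2004-07-06",
      "Preferred-Value": "TL"}
print(choose_subtag(tp))  # ('TL', None)
```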

1186	   In some historical cases, it might not have been possible to
1187	   reconstruct the original deprecation date.  For these cases, an
1188	   approximate date appears in the registry.  Some subtags and some
1189	   grandfathered or redundant tags were deprecated before the initial
1190	   creation of the registry.  The exact rules for this appear in Section
1191	   2 of [RFC4645].  Note that these records have a 'Deprecated' field
1192	   with an earlier date than the corresponding 'Added' field.

1194	3.1.6.  Preferred-Value Field

1196	   The field 'Preferred-Value' contains a mapping between the record in
1197	   which it appears and another tag or subtag.  The value in this field
1198	   is strongly RECOMMENDED as the best choice to represent the value of
1199	   this record when selecting a language tag.  These values form three
1200	   groups:

1202	   1.  ISO 639 language codes that were later withdrawn in favor of
1203	       other codes.  These values are mostly a historical curiosity.

1205	   2.  Subtags (other than language codes) taken from codes or values
1206	       that have been withdrawn in favor of a new code.  In particular,
1207	       this applies to region subtags taken from ISO 3166-1, because
1208	       sometimes a country will change its name or administration in
1209	       such a way that warrants a new region code.  In some cases,
1210	       countries have reverted to an older name, which might already be
1211	       encoded.

1213	   3.  Tags or subtags that have become obsolete because the values they
1214	       represent were later encoded.  Many of the grandfathered or
1215	       redundant tags were later encoded by ISO 639, for example, and
1216	       fit this pattern.

1218	   Records that contain a 'Preferred-Value' field MUST also have a
1219	   'Deprecated' field.  This field contains the date on which the tag or
1220	   subtag was deprecated in favor of the preferred value.
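   The constraints on deprecation data can be checked mechanically.  The
   Python sketch below, which assumes records are dictionaries keyed by
   field name, reports a 'Preferred-Value' without a 'Deprecated' field
   and malformed dates.  It deliberately does not require 'Deprecated'
   to be later than 'Added', for the historical reasons given in
   Section 3.1.5.

```python
from datetime import date
from typing import Dict, List


def check_deprecation_fields(record: Dict[str, str]) -> List[str]:
    """Check the Deprecated/Preferred-Value pairing of one record."""
    problems = []
    if "Preferred-Value" in record and "Deprecated" not in record:
        problems.append("Preferred-Value without Deprecated")
    for field in ("Added", "Deprecated"):
        if field in record:
            try:
                date.fromisoformat(record[field])
            except ValueError:
                problems.append(f"{field} is not an ISO 8601 date")
    # Deliberately no check that Deprecated >= Added: some records were
    # deprecated before the registry was created.
    return problems


# Hypothetical record, deprecated "before" it was added.
print(check_deprecation_fields(
    {"Subtag": "XX", "Added": "2005-10-16", "Deprecated": "1989-01-01",
     "Preferred-Value": "YY"}))  # []
```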

1222	   Note that 'Preferred-Value' mappings in records of type 'region'
1223	   sometimes do not represent exactly the same meaning as the original
1224	   value.  There are many reasons for a country code to be changed, and
1225	   the effect this has on the formation of language tags will depend on
1226	   the nature of the change in question.

1228	   A 'Preferred-Value' MAY be added to, changed, or removed from records
1229	   according to the rules in Section 3.3.  Addition, modification, or
1230	   removal of a 'Preferred-Value' field in a record does not imply that
1231	   content using the affected subtag needs to be retagged.

1233	   The 'Preferred-Value' field in records of type "grandfathered" and
1234	   "redundant" contains language ranges that are strongly RECOMMENDED
1235	   for use in place of the record's value.  In many cases, these
1236	   mappings were created via deprecation of the tags during the period
1237	   before [RFC4646] was adopted.  For example, the tag "no-nyn" was
1238	   deprecated in favor of the ISO 639-1-defined language code 'nn'.

1240	   The 'Preferred-Value' field in subtag records of type "extlang" uses
1241	   the same format as grandfathered or redundant tags.  This allows the
1242	   subtag to be deprecated in favor of either a single primary language
1243	   subtag or a new language-extlang sequence.
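   The following Python sketch shows how a consumer might apply these
   mappings.  Its lookup tables are assumed inputs: whole-tag mappings
   for grandfathered and redundant records (such as "no-nyn" to 'nn'),
   and, for extlang records, the subtag's single Prefix paired with its
   Preferred-Value.  The 'yue' entry in the usage line is illustrative
   data, not a quotation from the registry.

```python
from typing import Dict, Tuple


def apply_preferred(tag: str,
                    tag_map: Dict[str, str],
                    extlang_map: Dict[str, Tuple[str, str]]) -> str:
    """Replace a tag using grandfathered/redundant or extlang mappings.

    tag_map:     lowercase tag -> Preferred-Value (whole-tag records)
    extlang_map: extlang subtag -> (Prefix, Preferred-Value)
    """
    whole = tag_map.get(tag.lower())
    if whole is not None:
        return whole
    parts = tag.split("-")
    if len(parts) >= 2:
        entry = extlang_map.get(parts[1].lower())
        # Only replace when the extlang follows its registered Prefix.
        if entry is not None and entry[0] == parts[0].lower():
            return "-".join([entry[1]] + parts[2:])
    return tag


print(apply_preferred("no-nyn", {"no-nyn": "nn"}, {}))           # nn
print(apply_preferred("zh-yue-HK", {}, {"yue": ("zh", "yue")}))  # yue-HK
```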

1245	   Usually the addition, removal, or change of a Preferred-Value field
1246	   for a subtag is done to reflect changes in one of the source
1247	   standards.  For example, if an ISO 3166-1 region code is deprecated
1248	   in favor of another code, that SHOULD result in the addition of a
1249	   Preferred-Value field.

1251	   Changes to one subtag MAY affect other subtags as well: when
1252	   proposing changes to the registry, the Language Subtag Reviewer will
1253	   review the registry for such effects and propose the necessary
1254	   changes using the process in Section 3.5, although anyone MAY request
1255	   such changes.  For example:

1257	      Suppose that subtag 'XX' has a Preferred-Value of 'YY'.  If 'YY'
1258	      later changes to have a Preferred-Value of 'ZZ', then the
1259	      Preferred-Value for 'XX' MUST also change to be 'ZZ'.

1261	      Suppose that a registered language subtag 'dialect' represents a
1262	      language not yet available in any part of ISO 639.  The later
1263	      addition of a corresponding language code in ISO 639 SHOULD result
1264	      in the addition of a Preferred-Value for 'dialect'.
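   The first example amounts to keeping every Preferred-Value pointed at
   the end of its chain.  A minimal Python sketch follows, assuming the
   mappings for a single record type are held in a dictionary from
   subtag to Preferred-Value, with a guard against accidental cycles:

```python
from typing import Dict


def flatten_preferred_values(preferred: Dict[str, str]) -> Dict[str, str]:
    """Point every subtag directly at the end of its Preferred-Value chain."""
    flattened = {}
    for subtag, target in preferred.items():
        visited = {subtag}
        # Keep following mappings while the target is itself deprecated.
        while target in preferred:
            if target in visited:
                raise ValueError(f"Preferred-Value cycle involving {subtag!r}")
            visited.add(target)
            target = preferred[target]
        flattened[subtag] = target
    return flattened


print(flatten_preferred_values({"XX": "YY", "YY": "ZZ"}))
# {'XX': 'ZZ', 'YY': 'ZZ'}
```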

1266	3.1.7.  Prefix Field

1268	   The 'Prefix' field contains an extended language range whose subtags
1269	   are appropriate to use with this subtag: each of the subtags in one
1270	   of the subtag's Prefix fields SHOULD appear before the variant in a
1271	   valid tag.  For example, the variant subtag '1996' has a 'Prefix'
1272	   field of "de".  This means that tags starting with the sequence "de-"
1273	   are appropriate with this subtag, so "de-Latg-1996" and "de-CH-1996"
1274	   are both acceptable, while the tag "fr-1996" is an inappropriate
1275	   choice.
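   A Python sketch of this check follows.  It takes the simplest reading
   of the rule, that all subtags of at least one Prefix occur somewhere
   before the variant, and does not check their order.  The function
   name is hypothetical.

```python
from typing import List


def variant_fits(tag: str, variant: str, prefixes: List[str]) -> bool:
    """True if every subtag of some Prefix appears before `variant` in `tag`.

    Comparison is case-insensitive; the order of the prefix subtags is not
    checked in this simplified sketch.
    """
    subtags = [s.lower() for s in tag.split("-")]
    variant = variant.lower()
    if variant not in subtags:
        return False
    preceding = set(subtags[: subtags.index(variant)])
    return any(set(p.lower().split("-")) <= preceding for p in prefixes)


for tag in ("de-Latg-1996", "de-CH-1996", "fr-1996"):
    print(tag, variant_fits(tag, "1996", ["de"]))
# de-Latg-1996 True
# de-CH-1996 True
# fr-1996 False
```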

1277	   The field of type 'Prefix' MUST NOT be removed from any record.  The
1278	   field-body for this type of field MAY be modified, but only if the
1279	   modification broadens the meaning of the subtag.  That is, the field-
1280	   body can be replaced only by a prefix of itself.  For example, the
1281	   Prefix "be-Latn" (Belarusian, Latin script) could be replaced by the
1282	   Prefix "be" (Belarusian) but not by the Prefix "ru-Latn" (Russian,
1283	   Latin script).
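   Because the comparison is made subtag by subtag rather than character
   by character, "e" is not a prefix of "en".  A minimal Python sketch of
   the test, with a hypothetical function name:

```python
def is_broadening(old_prefix: str, new_prefix: str) -> bool:
    """True if new_prefix is a subtag-wise prefix of old_prefix."""
    old = old_prefix.lower().split("-")
    new = new_prefix.lower().split("-")
    return len(new) <= len(old) and old[: len(new)] == new


print(is_broadening("be-Latn", "be"))       # True
print(is_broadening("be-Latn", "ru-Latn"))  # False
```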

1285	   Records of type 'variant' MAY have more than one field of type
1286	   'Prefix'.  Additional fields of this type MAY be added to a 'variant'
1287	   record via the registration process.  Records of type 'extlang' MUST
1288	   have exactly one Prefix field.

1290	   The field-body of the 'Prefix' field MUST NOT conflict with any
1291	   'Prefix' already registered for a given record.  Such a conflict
1292	   would occur when no valid tag could be constructed that would contain
1293	   the prefix, such as when two subtags each have a 'Prefix' that
1294	   contains the other subtag.  For example, suppose that the subtag
1295	   'avariant' has the prefix "es-bvariant".  Then the subtag 'bvariant'
1296	   cannot be given the prefix "avariant", for that would require a tag
1297	   of the form "es-avariant-bvariant-avariant", which would not be valid.
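   The Python sketch below detects only the mutual-containment conflict
   in this example; other conflicts would need a fuller analysis.  The
   registry is assumed to be a dictionary from each variant subtag to
   its existing Prefix values, and the function name is hypothetical.

```python
from typing import Dict, List


def prefix_conflicts(variant: str, proposed: str,
                     registered: Dict[str, List[str]]) -> bool:
    """Detect the mutual-containment conflict described above.

    A conflict is reported when the proposed Prefix for `variant` names
    another variant whose every registered Prefix already contains
    `variant`, so no tag could satisfy both records.
    """
    variant = variant.lower()
    for sub in proposed.lower().split("-"):
        others = registered.get(sub)
        if others and all(variant in p.lower().split("-") for p in others):
            return True
    return False


registry = {"avariant": ["es-bvariant"]}
print(prefix_conflicts("bvariant", "avariant", registry))  # True
print(prefix_conflicts("bvariant", "es", registry))        # False
```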

1299	3.1.8.  Suppress-Script Field

1301	   The field 'Suppress-Script' contains a script subtag (whose record
1302	   appears in the registry).  The field 'Suppress-Script' MUST only
1303	   appear in records whose 'Type' field-body is 'language'.  This field
1304	   MUST NOT appear more than one time in a record.  This field indicates
1305	   a script used to write the overwhelming majority of documents for the
1306	   given language.  This script code therefore adds no distinguishing
1307	   information to a language tag.  By indicating that the script subtag
1308	   SHOULD NOT be used for most documents in that language, this field
1309	   helps ensure greater compatibility between language tags generated
1310	   according to the rules in this document and the tags, tag
1311	   processors, and consumers based on RFC 3066.  For example,
1312	   virtually all Icelandic documents are written in the Latin script,
1313	   making the subtag 'Latn' redundant in the tag "is-Latn".
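   A tag generator could drop a redundant script subtag as in the Python
   sketch below.  It assumes that a script subtag, if present,
   immediately follows the primary language subtag (that is, no extlang
   is present) and that Suppress-Script values are supplied as a
   dictionary.  The 'is' to 'Latn' entry follows the Icelandic example
   above.

```python
from typing import Dict


def drop_suppressed_script(tag: str, suppress: Dict[str, str]) -> str:
    """Remove a script subtag that equals the language's Suppress-Script.

    Assumes the script, if any, immediately follows the primary language
    subtag and that `suppress` maps language subtags to their
    Suppress-Script values.
    """
    parts = tag.split("-")
    if len(parts) > 1 and len(parts[1]) == 4 and parts[1].isalpha():
        language = parts[0].lower()
        if suppress.get(language, "").lower() == parts[1].lower():
            del parts[1]
    return "-".join(parts)


print(drop_suppressed_script("is-Latn", {"is": "Latn"}))  # is
```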

1315	   Many language subtag records do not have a Suppress-Script field.
1316	   The lack of a Suppress-Script might indicate that the language is
1317	   customarily written in more than one script or that the language is
1318	   not customarily written at all.  It might also mean that sufficient
1319	   information was not available when the record was created, so the
1320	   record remains a candidate for a future Suppress-Script registration.

1322	3.1.9.  Macrolanguage Field

1324	   The field 'Macrolanguage' contains a primary language subtag (whose
1325	   record appears in the registry).  This field indicates a language
1326	   that encompasses this subtag's language according to assignments made
1327	   by ISO 639-3.

1329	   ISO 639-3 labels some languages in the registry as "macrolanguages".
1330	   ISO 639-3 defines the term "Macrolanguage" to mean "clusters of
1331	   closely-related language varieties that [...] can be considered
1332	   distinct individual languages, yet in certain usage contexts a single
1333	   language identity for all is needed".  These correspond to codes
1334	   registered in ISO 639-2 as individual languages that were found to
1335	   correspond to more than one language in ISO 639-3.

1337	   A language contained within a macrolanguage is called an "encompassed
1338	   language".  The record for each encompassed language contains a
1339	   'Macrolanguage' field in the registry; the macrolanguages themselves
1340	   are not specially marked.  Note that some encompassed languages have
1341	   ISO 639-1 or ISO 639-2 codes.

1343	   The Macrolanguage field can only occur in records of type 'language'
1344	   or 'extlang'.  Only values assigned by ISO 639-3 will be considered
1345	   for inclusion.  Macrolanguage fields MAY be added or removed via the
1346	   normal registration process whenever ISO 639-3 defines new values or
1347	   withdraws old values.  Macrolanguage fields are informational and MAY
1348	   be removed or changed if ISO 639-3 changes the values.  For more
1349	   information on the use of this field and choosing between
1350	   macrolanguage and encompassed language subtags, see Section 4.1.1.

1352	   For example, the language subtags 'nb' (Norwegian Bokmal) and 'nn'
1353	   (Norwegian Nynorsk) each have a Macrolanguage entry of 'no'
1354	   (Norwegian).  For more information see Section 4.1.
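   Since only encompassed languages carry the field, an application that
   needs the reverse mapping has to build it.  A Python sketch follows,
   assuming the 'Macrolanguage' fields have already been collected into
   a dictionary from language subtag to macrolanguage subtag:

```python
from collections import defaultdict
from typing import Dict, List


def encompassed_by(macro_fields: Dict[str, str]) -> Dict[str, List[str]]:
    """Invert Macrolanguage fields: macrolanguage -> encompassed languages.

    Only encompassed languages carry the field, so the macrolanguages
    themselves are recovered as the keys of the result.
    """
    index = defaultdict(list)
    for lang, macro in macro_fields.items():
        index[macro].append(lang)
    return {macro: sorted(langs) for macro, langs in index.items()}


print(encompassed_by({"nb": "no", "nn": "no"}))  # {'no': ['nb', 'nn']}
```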

1356	3.1.10.  Scope Field

1358	   The field 'Scope' contains classification information about a primary
1359	   or extended language subtag derived from ISO 639.  This information
1360	   can sometimes be helpful in selecting language tags, since it
1361	   indicates the purpose of the code assignment within ISO 639.  The
1362	   available values are:

1364	   o  'individual' - A language that is not a macrolanguage, special
1365	      value, or collection.  That is, it is what one would normally
1366	      consider to be a language.

1368	   o  'macrolanguage' - Indicates a macrolanguage as defined by ISO
1369	      639-3.  For more information on macrolanguages, see
1370	      Section 3.1.9.

1372	   o  'collection' - Indicates a subtag that represents a collection of
1373	      languages.  Unlike a macrolanguage, a collection can contain
1374	      languages that are only loosely related.

1376	   o  'special' - Indicates a special language code.  These are used for
1377	      special language identification purposes.

1379	   The Scope field MUST appear in all records of type 'language' or
1380	   'extlang'.  Records of type 'extlang' MUST NOT include a value other
1381	   than 'individual'.  Note that most of the prefixes for extended
1382	   language subtags will have a Scope of 'macrolanguage' (although some
1383	   will not).
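   These constraints translate into a small validation step, sketched
   below in Python for dictionary records (an assumed representation).
   The sketch checks only the rules stated in this section.

```python
from typing import Dict, List

SCOPES = {"individual", "macrolanguage", "collection", "special"}


def check_scope(record: Dict[str, str]) -> List[str]:
    """Return Scope-related problems for a registry record."""
    rtype = record.get("Type")
    if rtype not in ("language", "extlang"):
        return []  # the rules above only constrain these two types
    scope = record.get("Scope")
    if scope is None:
        return ["missing Scope"]
    if scope not in SCOPES:
        return [f"unknown Scope {scope!r}"]
    if rtype == "extlang" and scope != "individual":
        return ["extlang Scope must be 'individual'"]
    return []


print(check_scope({"Type": "language", "Subtag": "zh",
                   "Scope": "macrolanguage"}))  # []
```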

1385	   The Scope field MAY be modified via the registration process, should
1386	   ISO 639 change the assignment's classification.  Such a change is
1387	   expected to be rare.

1389	   For example, the primary language subtag 'zh' (Chinese) has a Scope
1390	   of 'macrolanguage', while its encompassed language 'nan' (Min Nan
1391	   Chinese) has a Scope of 'individual'.  The special value 'und'
1392	   (Undetermined) has a Scope of 'special'.  The ISO 639-5 collection
1393	   'gem' (Germanic languages) has a Scope of 'collection'.

1395	3.1.11.  Comments Field

1397	   The field 'Comments' conveys additional information about the record
1398	   and MAY appear more than once per record.  The field-body MAY include
1399	   the full range of Unicode characters and is not restricted to any
1400	   particular script.  This field MAY be inserted or changed via the
1401	   registration process and no guarantee of stability is provided.

1403	   The content of this field is not restricted, except by the need to
1404	   register the information, the suitability of the request, and by
1405	   reasonable practical size limitations.  The primary reason for the
1406	   Comments field is subtag identification: to help distinguish the
1407	   subtag from others with which it might be confused.  In particular,
1408	   large amounts of information about the use, history, or general
1409	   background of a subtag are frowned upon: such material is welcome
1410	   in the registration request forms themselves, but does not belong
1411	   in the registry record proper.

1413	3.2.  Language Subtag Reviewer

1415	   The Language Subtag Reviewer moderates the ietf-languages mailing
1416	   list, responds to requests for registration, and performs the other
1417	   registry maintenance duties described in Section 3.3.  Only the
1418	   Language Subtag Reviewer is permitted to request IANA to change,
1419	   update, or add records to the Language Subtag Registry.  The Language
1420	   Subtag Reviewer MAY delegate list moderation and other clerical
1421	   duties as needed.

1423	   The Language Subtag Reviewer is appointed by the IESG for an
1424	   indefinite term, subject to removal or replacement at the IESG's
1425	   discretion.  The IESG will solicit nominees for the position (upon
1426	   adoption of this document or upon a vacancy) and then solicit
1427	   feedback on the nominees' qualifications.  Qualified candidates
1428	   should be familiar with BCP 47 and its requirements; be willing to
1429	   fairly, responsively, and judiciously administer the registration
1430	   process; and be suitably informed about the issues of language
1431	   identification so that the reviewer can assess the claims and draw
1432	   upon the contributions of language experts and subtag requesters.

1434	   The subsequent performance or decisions of the Language Subtag
1435	   Reviewer MAY be appealed to the IESG under the same rules as other
1436	   IETF decisions (see [RFC2026]).  The IESG can reverse or overturn the
1437	   decisions of the Language Subtag Reviewer, provide guidance, or take
1438	   other appropriate actions.

1440	3.3.  Maintenance of the Registry

1442	   Maintenance of the registry requires that as codes are assigned or
1443	   withdrawn by ISO 639, ISO 15924, ISO 3166, and UN M.49, the Language
1444	   Subtag Reviewer MUST evaluate each change and determine the
1445	   appropriate course of action according to the rules in this document.
1446	   Such updates follow the registration process described in
1447	   Section 3.5.  Usually the Language Subtag Reviewer will start the
1448	   process for the new or updated record by filling in the registration
1449	   form and submitting it.  If a change to one of these standards takes
1450	   place and the Language Subtag Reviewer does not do this in a timely
1451	   manner, then any interested party MAY submit the form.  Thereafter
1452	   the registration process continues normally.

1454	   Note that some registrations affect other subtags--perhaps more than
1455	   one--as when a region subtag is being deprecated in favor of a new
1456	   value.  The Language Subtag Reviewer is responsible for ensuring that
1457	   any such changes are properly registered, with each change requiring
1458	   its own registration form.

1460	   The Language Subtag Reviewer MUST ensure that new subtags meet the
1461	   requirements elsewhere in this document (and most especially in
1462	   Section 3.4) or submit an appropriate registration form for an
1463	   alternate subtag as described in that section.  Each individual
1464	   subtag affected by a change MUST be sent to the ietf-languages list
1465	   with its own registration form and in a separate message.

1467	3.4.  Stability of IANA Registry Entries

1469	   The stability of entries and their meaning in the registry is
1470	   critical to the long-term stability of language tags.  The rules in
1471	   this section guarantee that a specific language tag's meaning is
1472	   stable over time and will not change.

1474	   These rules specifically deal with how changes to codes (including
1475	   withdrawal and deprecation of codes) maintained by ISO 639, ISO
1476	   15924, ISO 3166, and UN M.49 are reflected in the IANA Language
1477	   Subtag Registry.  Assignments to the IANA Language Subtag Registry
1478	   MUST follow these stability rules:

1480	   1.   Values in the fields 'Type', 'Subtag', 'Tag', and 'Added' MUST
1481	        NOT be changed and are guaranteed to be stable over time.

1483	   2.   Values in the fields 'Preferred-Value' and 'Deprecated' MAY be
1484	        added, altered, or removed via the registration process.  These
1485	        changes SHOULD be limited to changes necessary to mirror changes
1486	        in one of the underlying standards (ISO 639, ISO 15924, ISO
1487	        3166-1, or UN M.49) and typically alteration or removal of a
1488	        Preferred-Value is limited specifically to region codes.

1490	   3.   Values in the 'Description' field MUST NOT be changed in a way
1491	        that would invalidate previously-existing tags.  They MAY be
1492	        broadened somewhat in scope, changed to add information, or
1493	        adapted to the most common modern usage.  For example, countries
1494	        occasionally change their names; a historical example of this
1495	        would be "Upper Volta" changing to "Burkina Faso".

1497	   4.   The field 'Prefix' MUST NOT be removed from any record in which
1498	        it appears.  This field SHOULD be included in the initial
1499	        registration of any records of type 'variant' and MUST be
1500	        included in any records of type 'extlang'.

1502	   5.   Values in the field 'Prefix' MAY be added to existing records of
1503	        type 'variant' via the registration process.  If a prefix is
1504	        added to a variant record, 'Comments' fields SHOULD be used to
1505	        explain different usages with the various prefixes.

1507	   6.   Values in the field 'Prefix' in records of type 'variant' MAY
1508	        also be modified, so long as the modifications broaden the set
1509	        of prefixes.  That is, a prefix MAY be replaced by one of its
1510	        own prefixes.  For example, the prefix "en-US" could be replaced
1511	        by "en", but not by the prefixes "en-Latn", "fr", or "en-US-
1512	        boont".  If one of those prefixes were needed, a new Prefix
1513	        SHOULD be registered.

1515	   7.   Values in the field 'Prefix' in records of type 'extlang' MUST
1516	        NOT be added, modified, or removed.

1518	   8.   The field 'Comments' MAY be added, changed, or removed
1519	        via the registration process or any of the processes or
1520	        considerations described in this section.

1522	   9.   The field 'Suppress-Script' MAY be added or removed via the
1523	        registration process.

1525	   10.  The field 'Macrolanguage' MAY be added or removed via the
1526	        registration process, but only in response to changes made by
1527	        ISO 639.  The Macrolanguage field appears whenever a language
1528	        has a corresponding Macrolanguage in ISO 639.  That is, the
1529	        Macrolanguage fields in the registry exactly match those of ISO
1530	        639.  No other macrolanguage mappings will be considered for
1531	        registration.

1533	   11.  The field 'Scope' MUST NOT be added to or removed from a primary
1534	        or extended language subtag record after initial registration;
1535	        however, it MAY be modified to match changes made by ISO 639.

1537	   12.  Primary and extended language subtags (other than independently
1538	        registered values created using the registration process) are
1539	        created according to the assignments of the various parts of ISO
1540	        639, as follows:

1542	        1.  Codes assigned by ISO 639-1 that do not conflict with
1543	            existing two-letter primary language subtags and which have
1544	            no corresponding three-letter primary language subtag in the
1545	            registry are entered into the IANA registry as new records
1546	            of type 'language'.  Note that languages given an ISO 639-1
1547	            code cannot be extended language subtags, even if
1548	            encompassed by a macrolanguage.

1550	        2.  Codes assigned by ISO 639-3 or ISO 639-5 that do not
1551	            conflict with existing three-letter primary language subtags
1552	            and which do not have ISO 639-1 codes assigned (or expected
1553	            to be assigned) are entered into the IANA registry as new
1554	            records of type 'language'.  Records for codes that have a
1555	            defined "macrolanguage" mapping at the time of their
1556	            registration MUST contain a "Macrolanguage" field.

1558	        3.  Codes assigned by ISO 639-3 MAY also be considered for an
1559	            extended language subtag registration.  Note that they MUST
1560	            be assigned a primary language subtag record of type
1561	            'language' even when an 'extlang' record is proposed.  When
1562	            considering extended language subtag assignment, these
1563	            criteria apply:

1565	            1.  A language whose macrolanguage, at the time of its
1566	                registration, already serves as the 'Prefix' of existing
1567	                'extlang' records SHOULD have an 'extlang' record.  For
1568	                example, this applies to any language with a
1569	                macrolanguage of 'zh' or 'ar'.

1571	            2.  'Extlang' records SHOULD NOT be created for languages if
1572	                other languages encompassed by the same macrolanguage do
1573	                not also have 'extlang' records.  For example, if a new
1574	                language encompassed by Serbo-Croatian ('sh') were
1575	                registered, it would not get an 'extlang' record because
1576	                other encompassed languages, such as Serbian ('sr'), do
1577	                not have one in the registry.

1579	            3.  Sign languages SHOULD have an 'extlang' record with a
1580	                'Prefix' of 'sgn'.

1582	            4.  'Extlang' records MUST NOT be created for items already
1583	                in the registry.  Extended language subtags will only be
1584	                considered at the time of initial registration.

1586	            5.  Extended language subtag records MUST include the fields
1587	                'Prefix', 'Deprecated', and 'Preferred-Value' with
1588	                field-values assigned as described in Section 2.2.2.

1590	        4.  Any other codes assigned by ISO 639-2 that do not conflict
1591	            with existing three-letter primary or extended language
1592	            subtags and which do not have ISO 639-1 two-letter codes
1593	            assigned are entered into the IANA registry as new records
1594	            of type 'language'.  This type of registration is not
1595	            supposed to occur in the future.

1597	   13.  Codes assigned by ISO 15924 and ISO 3166-1 that do not conflict
1598	        with existing subtags of the associated type and whose meaning
1599	        is not the same as an existing subtag of the same type are
1600	        entered into the IANA registry as new records.

1602	   14.  Codes assigned by ISO 639, ISO 15924, or ISO 3166-1 that are
1603	        withdrawn by their respective maintenance or registration
1604	        authority remain valid in language tags.  A 'Deprecated' field
1605	        containing the date of withdrawal MUST be added to the record.

1607	        If a new record of the same type is added that represents a
1608	        replacement value, then a 'Preferred-Value' field MAY also be
1609	        added.  The registration process MAY be used to add comments
1610	        about the withdrawal of the code by the respective standard.

1612	        Example  The region code 'TL' was assigned to the country
1613	           'Timor-Leste', replacing the code 'TP' (which was assigned to
1614	           'East Timor' when it was under administration by Portugal).
1615	           The subtag 'TP' remains valid in language tags, but its
1616	           record contains a 'Preferred-Value' of 'TL' and its field
1617	           'Deprecated' contains the date the new code was assigned
1618	           ('2004-07-06').

1620	   15.  Codes assigned by ISO 639, ISO 15924, or ISO 3166-1 that
1621	        conflict with existing subtags of the associated type, including
1622	        subtags that are deprecated, MUST NOT be entered into the
1623	        registry.  The following additional considerations apply to
1624	        subtag values that are reassigned:

1626	        A.  For ISO 639 codes, if the newly assigned code's meaning is
1627	            not represented by a subtag in the IANA registry, the
1628	            Language Subtag Reviewer, as described in Section 3.5, SHALL
1629	            prepare a proposal for entering in the IANA registry as soon
1630	            as practical a registered language subtag as an alternate
1631	            value for the new code.  The form of the registered language
1632	            subtag will be at the discretion of the Language Subtag
1633	            Reviewer and MUST conform to other restrictions on language
1634	            subtags in this document.

1636	        B.  For all subtags whose meaning is derived from an external
1637	            standard (that is, by ISO 639, ISO 15924, ISO 3166-1, or UN
1638	            M.49), if a new meaning is assigned to an existing code and
1639	            the new meaning broadens the meaning of that code, then the
1640	            meaning for the associated subtag MAY be changed to match.
1641	            The meaning of a subtag MUST NOT be narrowed, however, as
1642	            this can result in an unknown proportion of the existing
1643	            uses of a subtag becoming invalid.  Note: the ISO 639
1644	            registration authority (RA) has adopted a similar stability
1645	            policy.

1647	        C.  For ISO 15924 codes, if the newly assigned code's meaning is
1648	            not represented by a subtag in the IANA registry, the
1649	            Language Subtag Reviewer, as described in Section 3.5, SHALL
1650	            prepare a proposal for entering in the IANA registry as soon
1651	            as practical a registered variant subtag as an alternate
1652	            value for the new code.  The form of the registered variant
1653	            subtag will be at the discretion of the Language Subtag
1654	            Reviewer and MUST conform to other restrictions on variant
1655	            subtags in this document.

1657	        D.  For ISO 3166-1 codes, if the newly assigned code's meaning
1658	            is associated with the same UN M.49 code as another 'region'
1659	            subtag, then the existing region subtag remains as the
1660	            preferred value for that region and no new entry is created.
1661	            A comment MAY be added to the existing region subtag
1662	            indicating the relationship to the new ISO 3166-1 code.

1664	        E.  For ISO 3166-1 codes, if the newly assigned code's meaning
1665	            is associated with a UN M.49 code that is not represented by
1666	            an existing region subtag, then the Language Subtag
1667	            Reviewer, as described in Section 3.5, SHALL prepare a
1668	            proposal for entering the appropriate UN M.49 country code
1669	            as an entry in the IANA registry.

1671	        F.  For ISO 3166-1 codes, if there is no associated UN numeric
1672	            code, then the Language Subtag Reviewer SHALL petition the
1673	            UN to create one.  If there is no response from the UN
1674	            within ninety days of the request being sent, the Language
1675	            Subtag Reviewer SHALL prepare a proposal for entering in the
1676	            IANA registry as soon as practical a registered variant
1677	            subtag as an alternate value for the new code.  The form of
1678	            the registered variant subtag will be at the discretion of
1679	            the Language Subtag Reviewer and MUST conform to other
1680	            restrictions on variant subtags in this document.  This
1681	            situation is very unlikely to ever occur.

1683	   16.  UN M.49 has codes for both countries and areas (such as '276'
1684	        for Germany) and geographical regions and sub-regions (such as
1685	        '150' for Europe).  UN M.49 country or area codes for which
1686	        there is no corresponding ISO 3166-1 code SHOULD NOT be
1687	        registered, except as a surrogate for an ISO 3166-1 code that is
1688	        blocked from registration by an existing subtag.  If such a code
1689	        becomes necessary, then the registration authority for ISO
1690	        3166-1 SHOULD first be petitioned to assign a code to the
1691	        region.  If the petition for a code assignment by ISO 3166-1 is
1692	        refused or not acted on in a timely manner, the registration
1693	        process described in Section 3.5 MAY then be used to register
1694	        the corresponding UN M.49 code.  This way, UN M.49 codes remain
1695	        available as the value of last resort in cases where ISO 3166-1
1696	        reassigns a deprecated value in the registry.

1698	   17.  Stability provisions apply to grandfathered tags with this
1699	        exception: should it become possible to compose one of the
1700	        grandfathered tags from registered subtags, then the field
1701	        'Type' in that record is changed from 'grandfathered' to
1702	        'redundant'.  Note that this will not affect language tags that
1703	        match the grandfathered tag, since these tags will now match
1704	        valid generative subtag sequences.  For example, the variant
1705	        subtag '1901' is registered, making the formerly-grandfathered
1706	        tags such as "de-1901" and "de-AT-1901" redundant as a result.
1707	        Of course, these tags, where applied to existing content or in
1708	        existing implementations, remain valid (all of their subtags are
1709	        in the registry, after all), while new tags or applications
1710	        using these subtags become possible.
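
   The check behind item 17 can be illustrated with a minimal Python
   sketch.  It assumes a toy dictionary in place of the real registry
   and uses hypothetical helper names (all_subtags_registered,
   record_type).  It only tests whether every subtag of a grandfathered
   tag is now registered; it does not apply the positional rules of
   Section 2.2, which a real implementation would also need.

```python
# Toy stand-in for the registry: lowercase subtag -> subtag type.
# Hypothetical data, not the actual IANA registry contents.
REGISTERED = {
    "de": "language",
    "at": "region",
    "1901": "variant",
}


def all_subtags_registered(tag: str, registered: dict) -> bool:
    """True if every subtag of `tag` is in `registered` (case-insensitive).

    Subtag order and position (Section 2.2) are not checked here.
    """
    return all(part.lower() in registered for part in tag.split("-"))


def record_type(tag: str, registered: dict) -> str:
    """Classify a legacy tag as 'redundant' or still 'grandfathered'."""
    if all_subtags_registered(tag, registered):
        return "redundant"
    return "grandfathered"


for legacy in ("de-1901", "de-AT-1901", "i-klingon"):
    print(legacy, record_type(legacy, REGISTERED))
```

   With this toy data, "de-1901" and "de-AT-1901" come out as
   redundant, while "i-klingon" stays grandfathered because 'i' is not
   a registered subtag.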

1712	   Note: The redundant and grandfathered entries together are the
1713	   complete list of tags registered under [RFC3066].  The redundant tags
1714	   are those that can now be formed using the subtags defined in the
1715	   registry together with the rules of Section 2.2.  The grandfathered
1716	   entries include those that can never be legal under those same
1717	   provisions plus those tags that contain subtags not yet registered
1718	   or, perhaps, inappropriate for registration.

1720	   The set of redundant and grandfathered tags is permanent and stable:
1721	   new entries MUST NOT be added to this set and existing entries
1722	   MUST NOT be removed.  Records of type 'grandfathered' MAY have their
1723	   type converted to 'redundant'; see item 17 above for more
1724	   information.  The decision-making process about which tags were
1725	   initially grandfathered and which were made redundant is described in
1726	   [RFC4645].

1728	   RFC 3066 tags that were deprecated prior to the adoption of [RFC4646]
1729	   are part of the list of grandfathered tags, and their component
1730	   subtags were not included as registered variants (although they
1731	   remain eligible for registration).  For example, the tag "art-lojban"
1732	   was deprecated in favor of the language subtag 'jbo'.

1734	3.5.  Registration Procedure for Subtags

1736	   The procedure given here MUST be used by anyone who wants to use a
1737	   subtag not currently in the IANA Language Subtag Registry.

1739	   Only subtags of type 'language' and 'variant' will be considered for
1740	   independent registration of new subtags.  Subtags needed for
1741	   stability and subtags necessary to keep the registry synchronized
1742	   with ISO 639, ISO 15924, ISO 3166, and UN M.49 within the limits
1743	   defined by this document also use this process, as described in
1744	   Section 3.3.  Stability provisions are described in Section 3.4.

1746	   This procedure MAY also be used to register or alter the information
1747	   for the 'Comments', 'Deprecated', 'Description', 'Prefix',
1748	   'Preferred-Value', or 'Suppress-Script' fields in a subtag's record
1749	   as described in Section 3.4.  Changes to all other fields in the IANA
1750	   registry are NOT permitted.
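
   The field restriction above can be checked mechanically.  The
   following Python sketch, with hypothetical names, compares an
   existing record with a requested one and reports any changed field
   outside the permitted set.  For simplicity it treats each field as
   single-valued, although fields such as 'Description' and 'Prefix'
   can repeat in real records.

```python
# Fields this procedure may register or alter, per the paragraph above.
MODIFIABLE_FIELDS = frozenset({
    "Comments", "Deprecated", "Description",
    "Prefix", "Preferred-Value", "Suppress-Script",
})


def disallowed_changes(old: dict, new: dict) -> set:
    """Return the names of fields that differ between `old` and `new`
    but are not in MODIFIABLE_FIELDS (an empty set means acceptable)."""
    changed = {name for name in old.keys() | new.keys()
               if old.get(name) != new.get(name)}
    return changed - MODIFIABLE_FIELDS


# Hypothetical example: a variant record, simplified to one value per field.
rozaj = {"Type": "variant", "Subtag": "rozaj",
         "Description": "Resian", "Prefix": "sl"}
print(disallowed_changes(rozaj, {**rozaj, "Comments": "example comment"}))
print(disallowed_changes(rozaj, {**rozaj, "Subtag": "rozai"}))
```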

1752	   Registering a new subtag or requesting modifications to an existing
1753	   tag or subtag starts with the requester filling out the registration
1754	   form reproduced below.  Responses are not limited in size, so the
1755	   request can describe the registration as fully as needed.
1756	   The fields in the "Record Requested" section SHOULD follow the
1757	   requirements in Section 3.1.

1759	   LANGUAGE SUBTAG REGISTRATION FORM
1760	   1. Name of requester:
1761	   2. E-mail address of requester:
1762	   3. Record Requested:

1764	      Type:
1765	      Subtag:
1766	      Description:
1767	      Prefix:
1768	      Preferred-Value:
1769	      Deprecated:
1770	      Suppress-Script:
1771	      Macrolanguage:
1772	      Comments:

1774	   4. Intended meaning of the subtag:
1775	   5. Reference to published description
1776	      of the language (book or article):
1777	   6. Any other relevant information:

1779	              Figure 5: The Language Subtag Registration Form

1781	   Examples of completed registration forms can be found in Appendix C
1782	   or online at http://www.iana.org/assignments/lang-subtags-templates/.

1784	   The subtag registration form MUST be sent to
1785	   <ietf-languages@iana.org> for a two-week review period before it can
1786	   be submitted to IANA.  If modifications are made to the request
1787	   during the course of the registration process (such as corrections to
1788	   meet the requirements in Section 3.1), the modified form MUST also be
1789	   sent to <ietf-languages@iana.org> at least one week prior to
1790	   submission to IANA.
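
   The two timing rules above can be combined into a single earliest
   submission date.  This is a minimal Python sketch under stated
   assumptions: "two-week" and "one week" are read as 14 and 7 calendar
   days, the function name is hypothetical, and any review extensions
   granted by the Language Subtag Reviewer are ignored.

```python
from datetime import date, timedelta
from typing import Optional

REVIEW_PERIOD = timedelta(weeks=2)    # initial review on ietf-languages
MODIFIED_NOTICE = timedelta(weeks=1)  # minimum list exposure of a modified form


def earliest_submission(first_posted: date,
                        modified_posted: Optional[date] = None) -> date:
    """Earliest date a request may be submitted to IANA.

    `first_posted` is when the original form reached the list;
    `modified_posted` is when the latest modified form did, if any.
    """
    earliest = first_posted + REVIEW_PERIOD
    if modified_posted is not None:
        earliest = max(earliest, modified_posted + MODIFIED_NOTICE)
    return earliest


print(earliest_submission(date(2009, 3, 2)))                     # 2009-03-16
print(earliest_submission(date(2009, 3, 2), date(2009, 3, 12)))  # 2009-03-19
```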

1792	   The ietf-languages list is an open list and can be joined by sending
1793	   a request to <ietf-languages-request@iana.org>.  The list can be
1794	   hosted by IANA or by any third party at the request of the IESG.

1796	   Before forwarding a new registration to IANA, the Language Subtag
1797	   Reviewer MUST ensure that all requirements in this document are met
1798	   and that values in the 'Subtag' field match case according to the
1799	   description in Section 3.1.  The Reviewer MUST also ensure that an
1800	   appropriate File-Date record is included in the request, to assist
1801	   IANA when updating the registry (see Section 5.1).
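
   Section 3.1 is not reproduced here, so the following Python sketch
   assumes the letter-case conventions visible in the published
   registry: lowercase for language, extlang, and variant subtags,
   titlecase for script subtags, and uppercase for region subtags.  The
   function names are hypothetical.

```python
def registry_case(subtag: str, subtag_type: str) -> str:
    """Return `subtag` in the letter case assumed for the 'Subtag' field."""
    if subtag_type == "script":
        return subtag.title()   # 'LATN' -> 'Latn'
    if subtag_type == "region":
        return subtag.upper()   # 'us' -> 'US'; digits such as '419' unchanged
    if subtag_type in ("language", "extlang", "variant"):
        return subtag.lower()   # 'Rozaj' -> 'rozaj'
    raise ValueError(f"unknown subtag type: {subtag_type!r}")


def case_matches(subtag: str, subtag_type: str) -> bool:
    """True if the submitted value already has the expected case."""
    return subtag == registry_case(subtag, subtag_type)


print(case_matches("Latn", "script"), case_matches("latn", "script"))
```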

1803	   Some fields in both the registration form and the registry
1804	   record itself permit the use of non-ASCII characters.  Registration
1805	   requests SHOULD use the UTF-8 encoding for consistency and clarity.
1806	   However, since some mail clients do not support this encoding, other
1807	   encodings MAY be used for the registration request.  The Language
1808	   Subtag Reviewer is responsible for ensuring that the proper Unicode
1809	   characters appear in both the archived request form and the registry
1810	   record.  In the case of a transcription or encoding error by IANA,
1811	   the Language Subtag Reviewer will request that the registry be
1812	   repaired, providing any necessary information to assist IANA.
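
   A request received in another encoding can be converted to UTF-8
   before archiving.  This minimal Python sketch assumes the request's
   charset is already known (for example, from the message's MIME
   Content-Type header).  The function name and the sample fragment are
   hypothetical, and no Unicode normalization is applied.

```python
def to_utf8(raw: bytes, charset: str) -> bytes:
    """Re-encode a registration request received in `charset` as UTF-8.

    Decoding is strict: invalid bytes raise UnicodeDecodeError instead of
    being silently replaced, so transcription problems surface early.
    """
    return raw.decode(charset).encode("utf-8")


# A request fragment sent as ISO-8859-1 (hypothetical example).
received = "Description: Volapük".encode("iso-8859-1")
archived = to_utf8(received, "iso-8859-1")
print(archived.decode("utf-8"))
```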

1814	   Extended language subtags (type 'extlang'), by definition, are always
1815	   enclosed by another language.  All records of type 'extlang' MUST,
1816	   therefore, contain a 'Prefix' field at the time of registration.
1817	   This prefix value can never be altered or removed.

1819	   Variant subtags are usually registered for use with a particular
1820	   range of language tags.  For example, the subtag 'rozaj' is intended
1821	   for use with language tags that start with the primary language
1822	   subtag "sl", since Resian is a dialect of Slovenian.  Thus, the
1823	   subtag 'rozaj' would be appropriate in tags such as "sl-Latn-rozaj"
1824	   or "sl-IT-rozaj".  This information is stored in the 'Prefix' field
1825	   in the registry.  Variant registration requests SHOULD include at
1826	   least one 'Prefix' field in the registration form.
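
   The use of the 'Prefix' field in the 'rozaj' example can be sketched
   in Python as a leading-subtag comparison.  This is a simplified
   reading with hypothetical function names.  Prefixes that themselves
   contain a variant, where other subtags may appear in between, would
   need more careful handling.

```python
def has_prefix(tag: str, prefix: str) -> bool:
    """True if the leading subtags of `tag` equal those of `prefix`.

    Comparison is case-insensitive and subtag by subtag, so "sl" matches
    "sl-IT-rozaj" but not a tag whose first subtag merely starts with "sl".
    """
    tag_parts = tag.lower().split("-")
    prefix_parts = prefix.lower().split("-")
    return tag_parts[:len(prefix_parts)] == prefix_parts


def variant_fits(tag: str, prefixes: list) -> bool:
    """True if `tag` starts with at least one of the variant's prefixes."""
    return any(has_prefix(tag, p) for p in prefixes)


rozaj_prefixes = ["sl"]  # the 'Prefix' value described in the text above
for tag in ("sl-Latn-rozaj", "sl-IT-rozaj", "de-rozaj"):
    print(tag, variant_fits(tag, rozaj_prefixes))
```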

1828	   The 'Prefix' field for a given registered variant subtag exists in
1829	   the IANA registry as a guide to usage.  Additional prefixes MAY be
1830	   added by filing an additional registration form.  In that form, the
1831	   "Any other relevant information:" field MUST indicate that it is the
1832	   addition of a prefix.

1834	   Requests to add a 'Prefix' field to a variant subtag that imply a
1835	   different semantic meaning SHOULD be rejected.  For example, a
1836	   request to add the prefix "de" to the subtag '1994' so that the tag
1837	   "de-1994" represented some German dialect or orthographic form would
1838	   be rejected.  The '1994' subtag represents a particular Slovenian
1839	   orthography and the additional registration would change or blur the
1840	   semantic meaning assigned to the subtag.  A separate subtag SHOULD be
1841	   proposed instead.

1843	   The 'Description' field MUST contain a description of the tag being
1844	   registered, written or transcribed into the Latin script; it MAY also
1845	   include a description in a non-Latin script.  The 'Description' field
1846	   is used for identification purposes; it need not be the actual native
1847	   name of the language or variation, nor need it be in any particular
1848	   language.

1850	   While the 'Description' field itself is not guaranteed to be stable
1851	   and errata corrections MAY be undertaken from time to time, attempts
1852	   to provide translations or transcriptions of entries in the registry
1853	   itself will probably be frowned upon by the community or rejected
1854	   outright, as changes of this nature have an impact on the provisions
1855	   in Section 3.4.

1857	   When the two-week period has passed, the Language Subtag Reviewer
1858	   MUST take one of the following actions:

1860	   o  Explicitly accept the request and forward the form containing the
1861	      record to be inserted or modified to iana@iana.org according to
1862	      the procedure described in Section 3.3.

1864	   o  Explicitly reject the request because of significant objections
1865	      raised on the list or due to problems with constraints in this
1866	      document (which MUST be explicitly cited).

1868	   o  Extend the review period by granting an additional two-week
1869	      increment to permit further discussion.  After each two-week
1870	      increment, the Language Subtag Reviewer MUST indicate on the list
1871	      whether the registration has been accepted, rejected, or extended.

1873	   Note that the Language Subtag Reviewer MAY raise objections on the
1874	   list if he or she so desires.  The important thing is that the
1875	   objection MUST be made publicly.

1877	   Sometimes the request needs to be modified as a result of discussion
1878	   during the review period or due to requirements in this document.
1879	   The applicant, Language Subtag Reviewer, or others are free to submit
1880	   a modified version of the completed registration form, which will be
1881	   considered in lieu of the original request with the explicit approval
1882	   of the applicant.  Such changes do not restart the two-week
1883	   discussion period, although an application containing the final
1884	   record submitted to IANA MUST appear on the list at least one week
1885	   prior to the Language Subtag Reviewer forwarding the record to IANA.
1886	   The applicant is also free to modify a rejected application with
1887	   additional information and submit it again; this starts a new two-
1888	   week comment period.

1890	   Registrations initiated due to the provisions of Section 3.3 or
1891	   Section 3.4 SHALL NOT be rejected altogether (since they have to
1892	   ultimately appear in the registry) and SHOULD be completed as quickly
1893	   as possible.  The review process allows list members to comment on
1894	   the specific information in the form and the record it contains and
1895	   thus help ensure that it is correct and consistent.  The Language
1896	   Subtag Reviewer MAY reject a specific version of the form, but MUST
1897	   include in the rejection a suitable replacement, extending the review
1898	   period as described above, until the form is in a format worthy of
1899	   the reviewer's approval.

1901	   Decisions made by the Language Subtag Reviewer MAY be appealed to the
1902	   IESG [RFC2028] under the same rules as other IETF decisions
1903	   [RFC2026].  This includes a decision to extend the review period or
1904	   the failure to announce a decision in a clear and timely manner.

1906	   The approved records appear in the Language Subtag Registry.  The
1907	   approved registration forms are available online under
1908	   http://www.iana.org/assignments/lang-subtags-templates/.

1910	   Updates or changes to existing records follow the same procedure as
1911	   new registrations.  The Language Subtag Reviewer decides whether
1912	   there is consensus to update the registration following the two-week
1913	   review period; normally, objections by the original registrant will
1914	   carry extra weight in forming such a consensus.

1916	   Registrations are permanent and stable.  Once registered, subtags
1917	   will not be removed from the registry and will remain a valid way in
1918	   which to specify a particular language or variant.

1920	   Note: The purpose of the "Reference to published description" section
1921	   in the registration form is to aid in verifying whether a language is
1922	   registered or what language or language variation a particular subtag
1923	   refers to.  In most cases, reference to an authoritative grammar or
1924	   dictionary of that language will be useful; in cases where no such
1925	   work exists, other well-known works describing that language or in
1926	   that language MAY be appropriate.  The Language Subtag Reviewer
1927	   decides what constitutes "good enough" reference material.  This
1928	   requirement is not intended to exclude particular languages or
1929	   dialects due to the size of the speaker population or lack of a
1930	   standardized orthography.  Minority languages will be considered
1931	   equally on their own merits.

1933	3.6.  Possibilities for Registration

1935	   Possibilities for registration of subtags or information about
1936	   subtags include:

1938	   o  Primary language subtags for languages not listed in ISO 639 that
1939	      are not variants of any listed or registered language MAY be
1940	      registered.  At the time this document was created, there were no
1941	      examples of this form of subtag.  Before attempting to register a
1942	      language subtag, there MUST be an attempt to register the language
1943	      with ISO 639.  Subtags MUST NOT be registered for languages
1944	      defined by codes that exist in ISO 639-1, ISO 639-2, or ISO 639-3,
1945	      or that are under consideration by the ISO 639 registration
1946	      authorities, or that have never been attempted for registration
1947	      with those authorities.  If ISO 639 has previously rejected a
1948	      language for registration, it is reasonable to assume that there
1949	      must be additional, very compelling evidence of need before it
1950	      will be registered as a primary language subtag in the IANA
1951	      registry (so much so that it is very unlikely that any subtags of
1952	      this type will ever be registered).

1954	   o  Dialect or other divisions or variations within a language, its
1955	      orthography, writing system, regional or historical usage,
1956	      transliteration or other transformation, or distinguishing
1957	      variation MAY be registered as variant subtags.  An example is the
1958	      'rozaj' subtag (the Resian dialect of Slovenian).

1960	   o  The addition or maintenance of fields (generally of an
1961	      informational nature) in Tag or Subtag records as described in
1962	      Section 3.1 and subject to the stability provisions in
1963	      Section 3.4.  This includes descriptions, comments, deprecation
1964	      and preferred values for obsolete or withdrawn codes, or the
1965	      addition of script or macrolanguage information to primary
1966	      language subtags.

1968	   o  The addition of records and related field value changes necessary
1969	      to reflect assignments made by ISO 639, ISO 15924, ISO 3166-1, and
1970	      UN M.49 as described in Section 3.4.

1972	   Subtags proposed for registration that would cause all or part of a
1973	   grandfathered tag to become redundant but whose meaning conflicts
1974	   with or alters the meaning of the grandfathered tag MUST be rejected.

1976	   This document leaves the decision on what subtags or changes to
1977	   subtags are appropriate (or not) to the registration process
1978	   described in Section 3.5.

1980	   Note: four-character primary language subtags are reserved to allow
1981	   for the possibility of alpha4 codes in some future addition to the
1982	   ISO 639 family of standards.
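
   The length classes of primary language subtags can be sketched in
   Python.  This assumes the ranges of the language production in the
   ABNF of Section 2.1: 2 to 3 letters for ISO 639 codes, 4 letters
   reserved, and 5 to 8 letters for registered subtags.  The function
   name and its returned labels are hypothetical.

```python
def classify_primary_language(subtag: str) -> str:
    """Classify a primary language subtag by length (hypothetical labels)."""
    if not (subtag.isascii() and subtag.isalpha()):
        raise ValueError("primary language subtags contain only ASCII letters")
    length = len(subtag)
    if length in (2, 3):
        return "ISO 639"      # shortest ISO 639 code
    if length == 4:
        return "reserved"     # held for possible future alpha4 ISO 639 codes
    if 5 <= length <= 8:
        return "registered"   # registered through the Section 3.5 process
    raise ValueError("primary language subtags are 2 to 8 letters long")


for candidate in ("en", "gsw", "abcd", "abcdefgh"):
    print(candidate, classify_primary_language(candidate))
```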

1984	   ISO 639 defines a registration authority for additions to and changes
1985	   in the list of languages in ISO 639.  This agency is:

1987	   International Information Centre for Terminology (Infoterm)
1988	   Aichholzgasse 6/12, AT-1120
1989	   Wien, Austria
1990	   Phone: +43 1 26 75 35 Ext. 312 Fax: +43 1 216 32 72

1991	   ISO 639-2 defines a registration authority for additions to and
1992	   changes in the list of languages in ISO 639-2.  This agency is:

1994	   Library of Congress
1995	   Network Development and MARC Standards Office
1996	   Washington, D.C. 20540 USA
1997	   Phone: +1 202 707 6237 Fax: +1 202 707 0115
1998	   URL: http://www.loc.gov/standards/iso639-2

2000	   ISO 639-3 defines a registration authority for additions to and
2001	   changes in the list of languages in ISO 639-3.  This agency is:

2003	   SIL International
2004	   ISO 639-3 Registrar
2005	   7500 W. Camp Wisdom Rd.
2006	   Dallas, TX 75236 USA
2007	   Phone: +1 972 708 7400, ext. 2293 Fax: +1 972 708 7546
2008	   Email: iso639-3@sil.org
2009	   URL: http://www.sil.org/iso639-3

2011	   ISO 639-5 defines a registration authority for additions to and
2012	   changes in the list of languages in ISO 639-5.  This agency is the
2013	   same as for ISO 639-2 and is:

2015	   Library of Congress
2016	   Network Development and MARC Standards Office
2017	   Washington, D.C. 20540 USA
2018	   Phone: +1 202 707 6237 Fax: +1 202 707 0115
2019	   URL: http://www.loc.gov/standards/iso639-5

2021	   The maintenance agency for ISO 3166-1 (country codes) is:

2023	   ISO 3166 Maintenance Agency
2024	   c/o International Organization for Standardization
2025	   Case postale 56
2026	   CH-1211 Geneva 20 Switzerland
2027	   Phone: +41 22 749 72 33 Fax: +41 22 749 73 49
2028	   URL: http://www.iso.org/iso/en/prods-services/iso3166ma/index.html

2030	   The registration authority for ISO 15924 (script codes) is:

2032	   Unicode Consortium, Box 391476
2033	   Mountain View, CA 94039-1476, USA
2034	   URL: http://www.unicode.org/iso15924

2036	   The Statistics Division of the United Nations Secretariat maintains
2037	   the Standard Country or Area Codes for Statistical Use and can be
2038	   reached at:

2040	   Statistical Services Branch
2041	   Statistics Division
2042	   United Nations, Room DC2-1620
2043	   New York, NY 10017, USA

2045	   Fax: +1-212-963-0623
2046	   E-mail: statistics@un.org
2047	   URL: http://unstats.un.org/unsd/methods/m49/m49alpha.htm

2049	3.7.  Extensions and the Extensions Registry

2051	   Extension subtags are those introduced by single-character subtags
2052	   ("singletons") other than 'x'.  They are reserved for the generation
2053	   of identifiers that contain a language component and are compatible
2054	   with applications that understand language tags.

2056	   The structure and form of extensions are defined by this document so
2057	   that implementations can be created that are forward compatible with
2058	   applications that might be created using singletons in the future.
2059	   In addition, defining a mechanism for maintaining singletons will
2060	   lend stability to this document by reducing the likely need for
2061	   future revisions or updates.

2063	   Single-character subtags are assigned by IANA using the "IETF
2064	   Consensus" policy defined by [RFC2434].  This policy requires the
2065	   development of an RFC, which SHALL define the name, purpose,
2066	   processes, and procedures for maintaining the subtags.  The
2067	   maintaining or registering authority, including name, contact email,
2068	   discussion list email, and URL location of the registry, MUST be
2069	   indicated clearly in the RFC.  The RFC MUST specify or include each
2070	   of the following:

2072	   o  The specification MUST reference the specific version or revision
2073	      of this document that governs its creation and MUST reference this
2074	      section of this document.

2076	   o  The specification and all subtags defined by the specification
2077	      MUST follow the ABNF and other rules for the formation of tags and
2078	      subtags as defined in this document.  In particular, it MUST
2079	      specify that case is not significant and that subtags MUST NOT
2080	      exceed eight characters in length.

2082	   o  The specification MUST specify a canonical representation.

2084	   o  The specification of valid subtags MUST be available over the
2085	      Internet and at no cost.

2087	   o  The specification MUST be in the public domain or available via a
2088	      royalty-free license acceptable to the IETF and specified in the
2089	      RFC.

2091	   o  The specification MUST be versioned, and each version of the
2092	      specification MUST be numbered, dated, and stable.

2094	   o  The specification MUST be stable.  That is, extension subtags,
2095	      once defined by a specification, MUST NOT be retracted or change
2096	      in meaning in any substantial way.

2098	   o  The specification MUST include in a separate section the
2099	      registration form reproduced in this section (below) to be used in
2100	      registering the extension upon publication as an RFC.

2102	   o  IANA MUST be informed of changes to the contact information and
2103	      URL for the specification.
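
   Several of the requirements above (ABNF conformance, case
   insensitivity, subtags of at most eight characters, a canonical
   representation) can be illustrated with a minimal Python sketch.  It
   checks a single extension sequence in isolation and assumes the
   extension production from the ABNF in Section 2.1.  The singleton
   'r' is hypothetical, and lowercasing is only an assumed canonical
   form; each extension's RFC defines its own.

```python
import re

# Sketch of the extension production, assumed from the ABNF in Section 2.1:
#   extension = singleton 1*("-" (2*8alphanum))
#   singleton = any ASCII letter or digit except "x"
# re.ASCII keeps [a-z] from matching non-ASCII look-alikes under IGNORECASE.
EXTENSION_RE = re.compile(r"[0-9a-wyz](?:-[0-9a-z]{2,8})+",
                          re.IGNORECASE | re.ASCII)


def is_well_formed_extension(sequence: str) -> bool:
    """True if `sequence` (e.g. 'r-abc') is a well-formed extension."""
    return EXTENSION_RE.fullmatch(sequence) is not None


def canonical_extension(sequence: str) -> str:
    """Lowercase the sequence; lowercase is an assumed canonical form."""
    if not is_well_formed_extension(sequence):
        raise ValueError(f"not a well-formed extension: {sequence!r}")
    return sequence.lower()


print(canonical_extension("R-Abc-12345678"))     # r-abc-12345678
print(is_well_formed_extension("x-private"))     # False: 'x' is private use
print(is_well_formed_extension("r-abcdefghi"))   # False: subtag has 9 characters
```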

2105	   IANA will maintain a registry of allocated single-character
2106	   (singleton) subtags.  This registry MUST use the record-jar format
2107	   described by the ABNF in Section 3.1.  Upon publication of an
2108	   extension as an RFC, the maintaining authority defined in the RFC
2109	   MUST forward this registration form to the IESG (iesg@ietf.org), which MUST
2110	   forward the request to iana@iana.org.  The maintaining authority of
2111	   the extension MUST maintain the accuracy of the record by sending an
2112	   updated full copy of the record to iana@iana.org with the subject
2113	   line "LANGUAGE TAG EXTENSION UPDATE" whenever content changes.  Only
2114	   the 'Comments', 'Contact_Email', 'Mailing_List', and 'URL' fields MAY
2115	   be modified in these updates.
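
   An update message of the kind described above can be sketched with
   Python's standard email package.  The sketch rejects changes to
   fields other than the four permitted ones and places the full record
   in the body.  All record values are placeholders for an imaginary
   extension, the function name is hypothetical, and actually sending
   the message (e.g., over SMTP) is not shown.

```python
from email.message import EmailMessage

UPDATABLE_FIELDS = frozenset({"Comments", "Contact_Email", "Mailing_List", "URL"})
FIELD_ORDER = ["Identifier", "Description", "Comments", "Added", "RFC",
               "Authority", "Contact_Email", "Mailing_List", "URL"]


def build_update(current: dict, updated: dict, sender: str) -> EmailMessage:
    """Build the full-record update message, refusing disallowed changes."""
    changed = {name for name in current.keys() | updated.keys()
               if current.get(name) != updated.get(name)}
    illegal = changed - UPDATABLE_FIELDS
    if illegal:
        raise ValueError(f"fields that may not be modified: {sorted(illegal)}")
    lines = ["%%"]
    lines += [f"{name}: {updated[name]}" for name in FIELD_ORDER if name in updated]
    lines.append("%%")
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = "iana@iana.org"
    msg["Subject"] = "LANGUAGE TAG EXTENSION UPDATE"
    msg.set_content("\n".join(lines) + "\n")
    return msg


# Hypothetical record for an imaginary extension; all values are placeholders.
current_record = {
    "Identifier": "r", "Description": "Example extension",
    "Added": "2004-06-28", "RFC": "9999", "Authority": "Example Authority",
    "Contact_Email": "old@example.org", "Mailing_List": "list@example.org",
    "URL": "http://www.example.org/r-registry",
}
new_record = {**current_record, "Contact_Email": "new@example.org"}
print(build_update(current_record, new_record, "maintainer@example.org"))
```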

2117	   Failure to maintain this record, maintain the corresponding registry,
2118	   or meet other conditions imposed by this section of this document MAY
2119	   be appealed to the IESG [RFC2028] under the same rules as other IETF
2120	   decisions (see [RFC2026]) and MAY result in the authority to maintain
2121	   the extension being withdrawn or reassigned by the IESG.

2123	   %%
2124	   Identifier:
2125	   Description:
2126	   Comments:
2127	   Added:
2128	   RFC:
2129	   Authority:
2130	   Contact_Email:
2131	   Mailing_List:
2132	   URL:
2133	   %%

2135	    Figure 6: Format of Records in the Language Tag Extensions Registry

2137	   'Identifier' contains the single-character subtag (singleton)
2138	   assigned to the extension.  The Internet-Draft submitted to define
2139	   the extension SHOULD specify which letter or digit to use, although
2140	   the IESG MAY change the assignment when approving the RFC.

2142	   'Description' contains the name and description of the extension.

2144	   'Comments' is an OPTIONAL field and MAY contain a broader description
2145	   of the extension.

2147	   'Added' contains the date the extension's RFC was published in the
2148	   "full-date" format specified in [RFC3339].  For example: 2004-06-28
2149	   represents June 28, 2004, in the Gregorian calendar.

2151	   'RFC' contains the RFC number assigned to the extension.

2153	   'Authority' contains the name of the maintaining authority for the
2154	   extension.

2156	   'Contact_Email' contains the email address used to contact the
2157	   maintaining authority.

2159	   'Mailing_List' contains the URL or subscription email address of the
2160	   mailing list used by the maintaining authority.

2162	   'URL' contains the URL of the registry for this extension.
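
   Records in the format of Figure 6 can be read with a small
   record-jar parser, and the 'Added' field can be checked against the
   "full-date" form.  This is a minimal Python sketch, not the
   normative ABNF of Section 3.1.  It assumes that records are separated
   by lines holding only "%%", that fields have the form "Name: value",
   that a line starting with whitespace continues the previous field,
   and that field names do not repeat.  The sample record and the
   function names are hypothetical.

```python
import re
from datetime import date

FULL_DATE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def parse_record_jar(text: str) -> list:
    """Split record-jar text into records (dicts of field name -> value)."""
    records, current, last = [], {}, None
    for line in text.splitlines():
        if line.strip() == "%%":
            if current:
                records.append(current)
            current, last = {}, None
        elif line[:1] in (" ", "\t") and last is not None:
            current[last] += " " + line.strip()   # continuation line
        elif ":" in line:
            name, _, value = line.partition(":")   # split at the first colon
            last = name.strip()
            current[last] = value.strip()
    if current:
        records.append(current)
    return records


def parse_added(value: str) -> date:
    """Parse an 'Added' value in RFC 3339 full-date form (YYYY-MM-DD)."""
    if not FULL_DATE.fullmatch(value):
        raise ValueError(f"not a full-date: {value!r}")
    return date.fromisoformat(value)  # also rejects dates such as 2004-02-30


# Placeholder record for an imaginary extension 'r'.
SAMPLE = """%%
Identifier: r
Description: Example extension
Comments: A comment that continues
  on a second line
Added: 2004-06-28
RFC: 9999
Authority: Example Authority
Contact_Email: contact@example.org
Mailing_List: list@example.org
URL: http://www.example.org/r-registry
%%
"""

record = parse_record_jar(SAMPLE)[0]
print(record["Comments"])            # A comment that continues on a second line
print(parse_added(record["Added"]))  # 2004-06-28
```

   The regular expression guards the date parser, since newer Python
   versions of date.fromisoformat also accept forms such as "20040628"
   that are not RFC 3339 full-dates.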

2164	   The determination of whether an Internet-Draft meets the above
2165	   conditions and the decision to grant or withhold such authority rests
2166	   solely with the IESG and is subject to the normal review and appeals
2167	   process associated with the RFC process.

2169	   Extension authors are strongly cautioned that many (including most
2170	   well-formed) processors will be unaware of any special relationships
2171	   or meaning inherent in the order of extension subtags.  Extension
2172	   authors SHOULD avoid subtag relationships or canonicalization
2173	   mechanisms that interfere with matching or with length restrictions
2174	   that sometimes exist in common protocols where the extension is used.
2175	   In particular, applications MAY truncate the subtags in doing
2176	   matching or in fitting into limited lengths, so it is RECOMMENDED
2177	   that the most significant information be in the most significant
2178	   (left-most) subtags and that the specification gracefully handle
2179	   truncated subtags.
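
   The effect of truncation on extension subtags can be illustrated
   with a minimal Python sketch.  It follows the spirit of the
   progressive truncation used by the Lookup scheme of [RFC4647]: it
   drops subtags from the right to fit a length limit and also removes
   a singleton left dangling at the end.  The function name and the
   singleton 'r' are hypothetical.

```python
def truncate_tag(tag: str, max_len: int) -> str:
    """Drop subtags from the right until `tag` fits in `max_len` characters.

    A singleton left at the end (such as the 'r' in 'en-US-r') is removed
    as well, because a singleton with no following subtags is not
    well-formed.  Returns '' if not even the first subtag fits.
    """
    parts = tag.split("-")
    while parts and (len("-".join(parts)) > max_len or len(parts[-1]) == 1):
        parts.pop()
    return "-".join(parts)


print(truncate_tag("en-Latn-US-r-extended", 12))  # en-Latn-US
print(truncate_tag("en-Latn-US", 7))              # en-Latn
```

   Information placed in the left-most extension subtags survives this
   kind of truncation longest, which is why the text above recommends
   putting the most significant information there.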

2181	   When a language tag is to be used in a specific, known protocol, it
2182	   is RECOMMENDED that the language tag not contain extensions not
2183	   supported by that protocol.  In addition, note that some protocols
2184	   MAY impose upper limits on the length of the strings used to store or
2185	   transport the language tag.
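
   Removing extensions that a given protocol does not support can be
   sketched in Python as follows.  The sketch assumes the input tag is
   well-formed, so extension sequences appear only after any variants.
   The function name and the singletons 'r' and 's' are hypothetical.

```python
def strip_unsupported_extensions(tag: str, supported: set) -> str:
    """Remove extension sequences whose singleton is not in `supported`.

    `supported` holds lowercase singletons.  A private-use sequence
    ('x' and everything after it) is kept as-is.  The first subtag is
    never treated as a singleton, so tags such as "x-..." pass through.
    """
    parts = tag.split("-")
    kept = parts[:1]
    keep = True
    for i in range(1, len(parts)):
        part = parts[i]
        if len(part) == 1:
            if part.lower() == "x":
                kept.extend(parts[i:])        # private use: copy the remainder
                break
            keep = part.lower() in supported  # start of a new extension
        if keep:
            kept.append(part)
    return "-".join(kept)


print(strip_unsupported_extensions("en-US-r-abc-s-def", {"s"}))   # en-US-s-def
print(strip_unsupported_extensions("en-US-r-abc-x-priv", set()))  # en-US-x-priv
```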

2187	3.8.  Update of the Language Subtag Registry

2189	   Upon adoption of this document the IANA Language Subtag Registry will
2190	   need an update so that it contains the complete set of subtags valid
2191	   in a language tag.  This collection of subtags, along with a
2192	   description of the process used to create it, is described by
2193	   [registry-update].  IANA will publish the updated version of the
2194	   registry described by this document using the instructions and
2195	   content of [registry-update].  Once published by IANA, the
2196	   maintenance procedures, rules, and registration processes described
2197	   in this document will be available for new registrations or updates.

2199	   Registrations that are in process under the rules defined in
2200	   [RFC4646] when this document is adopted MUST be completed under the
2201	   rules contained in this document.

2203	4.  Formation and Processing of Language Tags

2205	   This section addresses how to use the information in the registry
2206	   with the tag syntax to choose, form, and process language tags.

2208	4.1.  Choice of Language Tag

2210	   The guiding principle in forming language tags is to "tag content
2211	   wisely."  Sometimes there is a choice between several possible tags
2212	   for the same content.  The choice of which tag to use depends on the
2213	   content and application in question and some amount of judgment might
2214	   be necessary when selecting a tag.

2216	   Interoperability is best served when the same language tag is used
2217	   consistently to represent the same language.  If an application has
2218	   requirements that make the rules here inapplicable, then that
2219	   application risks damaging interoperability.  It is strongly
2220	   RECOMMENDED that users not define their own rules for language tag
2221	   choice.

2223	   Standards, protocols, and applications that reference this document
2224	   normatively but apply rules different from the ones given in this
2225	   section MUST specify how language tag selection varies from the
2226	   guidelines given here.

2228	   To ensure consistent backward compatibility, this document contains
2229	   several provisions to account for potential instability in the
2230	   standards used to define the subtags that make up language tags.
2231	   These provisions mean that no valid language tag can become invalid,
2232	   nor will a language tag have a narrower scope in the future (it may
2233	   have a broader scope).  The most appropriate language tag for a given
2234	   application or content item might evolve over time, but once tagged
2235	   the content cannot become invalid.

2237	   A subtag SHOULD only be used when it adds useful distinguishing
2238	   information to the tag.  Extraneous subtags interfere with the
2239	   meaning, understanding, and processing of language tags.  In
2240	   particular, users and implementations SHOULD follow the 'Prefix' and
2241	   'Suppress-Script' fields in the registry (defined in Section 3.1):
2242	   these fields provide guidance on when specific additional subtags
2243	   SHOULD be used or avoided in a language tag.
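
   Applying the 'Suppress-Script' field can be sketched in Python.  The
   dictionary below is a small illustrative subset with assumed values
   (the registry itself is authoritative), and the function name is
   hypothetical.  Only the subtag immediately after the primary
   language subtag is examined, so tags containing extlang subtags are
   left unchanged by this sketch.

```python
# Illustrative subset of 'Suppress-Script' values, keyed by lowercase
# primary language subtag (assumed values; consult the registry).
SUPPRESS_SCRIPT = {"en": "Latn", "de": "Latn", "ru": "Cyrl"}


def drop_suppressed_script(tag: str, suppress: dict) -> str:
    """Remove the script subtag if it equals the primary language's
    Suppress-Script value (case-insensitive comparison)."""
    parts = tag.split("-")
    if len(parts) > 1 and len(parts[1]) == 4 and parts[1].isalpha():
        if suppress.get(parts[0].lower(), "").lower() == parts[1].lower():
            del parts[1]
    return "-".join(parts)


for tag in ("en-Latn-US", "en-Brai", "uz-Arab"):
    print(tag, "->", drop_suppressed_script(tag, SUPPRESS_SCRIPT))
```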

2245	   Some applications can benefit from the use of script subtags in
2246	   language tags, as long as the use is consistent for a given context.
2247	   Script subtags are never appropriate for unwritten content (such as
2248	   audio recordings).

2250	   Script subtags were first formally defined in BCP 47 by [RFC4646].
2252	   Their use can affect matching and subtag identification for
2253	   implementations of previous versions of BCP 47 (i.e., [RFC1766] or
2254	   [RFC3066]), as these subtags appear between the primary language and
2255	   region subtags.  For example, if an implementation selects content
2256	   using Basic Filtering [RFC4647] (originally described in Section 2.5
2257	   of [RFC3066]) and the user requested the language range "en-US",
2258	   content labeled "en-Latn-US" will not match the request and thus not
2259	   be selected.  Therefore, it is important to know when script subtags
2260	   will customarily be used and when they ought not be used.  In the
2261	   registry, the Suppress-Script field helps ensure greater
2262	   compatibility between the language tags by defining when users SHOULD
2263	   NOT include a script subtag with a particular primary language
2264	   subtag.
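
   The Basic Filtering example above can be reproduced with a short
   Python sketch of the matching rule in Section 3.3.1 of [RFC4647].
   The function name is hypothetical.

```python
def basic_filter_matches(language_range: str, tag: str) -> bool:
    """Basic Filtering as described in RFC 4647, Section 3.3.1.

    A range matches a tag if, ignoring case, it equals the tag or equals
    a prefix of the tag that is immediately followed by '-'.  The range
    '*' matches every tag.
    """
    rng, tg = language_range.lower(), tag.lower()
    if rng == "*":
        return True
    return tg == rng or tg.startswith(rng + "-")


for tag in ("en-US", "en-Latn-US", "en-us-x-example"):
    print(tag, basic_filter_matches("en-US", tag))
```

   Because 'Latn' sits between 'en' and 'US', the range "en-US" is not
   a prefix of "en-Latn-US"; the broader range "en" would match all
   three tags.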

2266	   The choice of subtags used to form a language tag SHOULD follow these
2267	   guidelines:

2269	   1.  Use as precise a tag as possible, but no more specific than is
2270	       justified.  Avoid using subtags that are not important for
2271	       distinguishing content in an application.

2273	       *  For example, "de" might suffice for tagging an email written
2274	          in German, while "de-CH-1996" is probably unnecessarily
2275	          precise for such a task.

2277	       *  Note that some subtag sequences might not represent the
2278	          language a casual user might expect, especially when
2279	          relying on the subtag's description in the registry.  For
2280	          example, the Swiss German (Schweizerdeutsch) language is
2281	          represented by "gsw-CH" and not by "de-CH".  This latter tag
2282	          represents German ('de') as used in Switzerland ('CH'), also
2283	          known as Swiss High German (Schweizer Hochdeutsch).  Both are
2284	          real languages and distinguishing between them could be
2285	          important to an application.

2287	   2.  The script subtag SHOULD NOT be used to form language tags unless
2288	       the script adds some distinguishing information to the tag.  The
2289	       field 'Suppress-Script' in the primary language record in the
2290	       registry indicates script subtags that do not add distinguishing
2291	       information for most applications.  For example:

2293	       *  The subtag 'Latn' should not be used with the primary language
2294	          'en' because nearly all English documents are written in the
2295	          Latin script and it adds no distinguishing information.
2296	          However, if a document were written in English mixing Latin
2297	          script with another script such as Braille ('Brai'), then it
2298	          might be appropriate to indicate both scripts to aid
2299	          in content selection, such as the application of a style
2300	          sheet.

2302	       *  When labeling content that is unwritten (such as a recording
2303	          of human speech), the script subtag should not be used, even
2304	          if the language is customarily written in several scripts.
2305	          Thus the subtitles to a movie might use the tag "uz-Arab"
2306	          (Uzbek, Arabic script), but the audio track for the same
2307	          language would be tagged simply "uz".  (The tag "uz-Zxxx"
2308	          could also be used where content is not written, as the subtag
2309	          'Zxxx' represents the "Code for unwritten documents".)

2311	   3.  If a tag or subtag has a 'Preferred-Value' field in its registry
2312	       entry, then the value of that field SHOULD be used to form the
2313	       language tag in preference to the tag or subtag in which the
2314	       preferred value appears.

2316	       *  For example, use 'he' for Hebrew in preference to 'iw'.

2318	   4.  [ISO639-2] has defined several codes included in the subtag
2319	       registry that require additional care when choosing language
2320	       tags.  In most of these cases, where omitting the language tag is
2321	       permitted, such omission is preferable to using these codes.
2322	       Language tags SHOULD NOT incorporate these subtags as a prefix,
2323	       unless the additional information conveys some value to the
2324	       application.

2326	       1.  Use specific language subtags or subtag sequences in
2327	           preference to subtags for language collections.  A "language
2328	           collection" is a subtag derived from one of the [ISO639-5]
2329	           codes that represents multiple related languages.  These
2330	           codes are included as primary language subtags in the
2331	           registry.  For example, the code 'cmc' represents "Chamic
2332	           languages".  The registry contains values for each of the
2333	           approximately ten individual languages represented by this
2334	           collective code.  Other examples include the subtags for
2335	           Germanic languages ('gem') and Algonquian languages ('alg').
2336	           Since these codes are interpreted inclusively, content tagged
2337	           with "en" (English), "de" (German), or "gsw" (Swiss German,
2338	           Alemannic) could also (but SHOULD NOT) be tagged with "gem"
2339	           (Germanic languages).  Subtags derived from collection codes
2340	           SHOULD NOT be used unless more specific language
2341	           information is not available.  Note that matching
2342	           implementations generally do not understand the relationship
2343	           between the collection and its encompassed languages, and so
2344	           users ought not assume a subtag based on a language
2345	           collection is a useful means for selecting content in its
2346	           encompassed languages.

2348	       2.  The 'mul' (Multiple) primary language subtag identifies
2349	           content in multiple languages.  This subtag SHOULD NOT be
2350	           used when a list of languages (such as Content-Language) or
2351	           individual tags for each content element can be used instead.

2353	       3.  The 'und' (Undetermined) primary language subtag identifies
2354	           linguistic content whose language is not determined.  This
2355	           subtag SHOULD NOT be used unless a language tag is required
2356	           and language information is not available or cannot be
2357	           determined.  Omitting the language tag (where permitted) is
2358	           preferred.  The 'und' subtag MAY be useful for protocols that
2359	           require a language tag to be provided or where a primary
2360	           language subtag is required (such as in "und-Latn").  The
2361	           'und' subtag MAY also be useful when matching language tags
2362	           in certain situations.

2364	       4.  The 'zxx' (Non-Linguistic, Not Applicable) primary language
2365	           subtag identifies content for which a language classification
2366	           is inappropriate or does not apply.  Some examples might
2367	           include instrumental or electronic music; sound recordings
2368	           consisting of nonverbal sounds; audiovisual materials with no
2369	           narration, dialog, printed titles, or subtitles; machine-
2370	           readable data files consisting of machine languages or
2371	           character codes; or programming source code.

2373	       5.  The 'mis' (Uncoded) primary language subtag identifies
2374	           content whose language is known but which does not currently
2375	           have a corresponding subtag.  This subtag SHOULD NOT be used.
2376	           Because the addition of other codes in the future can render
2377	           its application invalid, it is inherently unstable and hence
2378	           incompatible with the stability goals of BCP 47.  It is
2379	           always preferable to use other subtags: either 'und' or (with
2380	           prior agreement) private use subtags.

2382	       6.  The grandfathered tag "i-default" (Default Language) was
2383	           originally registered according to [RFC1766] to meet the
2384	           needs of [RFC2277].  It does not indicate a specific
2385	           language; rather, it identifies the condition or content
2386	           used where the language preferences of the user cannot be
2387	           established.  It SHOULD NOT be used except as a means of
2388	           labeling the default content for applications or protocols
2389	           that require default language content to be labeled with that
2390	           specific tag.  It MAY also be used by an application or
2391	           protocol to identify when the default language content is
2392	           being returned.
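
   The Suppress-Script guidance in item 2 above can be applied
   mechanically.  The following Python sketch assumes a hypothetical
   in-memory table, SUPPRESS_SCRIPT, standing in for the registry's
   Suppress-Script fields (only the 'en'/'Latn' pairing discussed above
   is included); the function name is likewise illustrative and not
   part of any standard API.

```python
# Hypothetical excerpt of Suppress-Script values keyed by primary language.
# A real implementation would load these from the IANA Language Subtag Registry.
SUPPRESS_SCRIPT = {"en": "Latn"}


def drop_suppressed_script(tag: str) -> str:
    """Remove the script subtag if it equals the language's Suppress-Script."""
    subtags = tag.split("-")
    i = 1
    # Skip any extended language subtags (up to three, three letters each).
    while i < len(subtags) and i <= 3 and len(subtags[i]) == 3 and subtags[i].isalpha():
        i += 1
    # A four-letter subtag in this position is the script subtag.
    if i < len(subtags) and len(subtags[i]) == 4 and subtags[i].isalpha():
        suppressed = SUPPRESS_SCRIPT.get(subtags[0].lower())
        if suppressed is not None and suppressed.lower() == subtags[i].lower():
            del subtags[i]
    return "-".join(subtags)


print(drop_suppressed_script("en-Latn-US"))  # en-US
print(drop_suppressed_script("en-Brai"))     # en-Brai (script kept)
print(drop_suppressed_script("uz-Arab"))     # uz-Arab (no table entry)
```

   The sketch only removes a script subtag that adds no information; it
   never decides whether content mixing scripts needs a script subtag.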

2394	4.1.1.  Tagging Encompassed Languages

2396	   Some primary language records in the registry have a "Macrolanguage"
2397	   field (Section 3.1.9) that contains a mapping from each "encompassed
2398	   language" to its macrolanguage.  The Macrolanguage mapping doesn't
2399	   define what the relationship between the encompassed language and its
2400	   macrolanguage is, nor does it define how languages encompassed by the
2401	   same macrolanguage are related to each other.  Two different
2402	   languages encompassed by the same macrolanguage may differ from one
2403	   another more than, say, French and Spanish do.

2405	   The macrolanguages Chinese ('zh') and Arabic ('ar') are handled
2406	   differently.  See Section 4.1.2.

2408	   The more specific encompassed language subtag SHOULD be used to form
2409	   the language tag, although either the macrolanguage's primary
2410	   language subtag or the encompassed language's subtag MAY be used.
2411	   This means, for example, tagging Plains Cree with 'crk' rather than
2412	   'cre' (Cree).

2414	   Each macrolanguage's subtag, by definition, includes all of its
2415	   encompassed languages.  Since the relationship between encompassed
2416	   languages varies, users cannot assume that the macrolanguage subtag
2417	   means any particular encompassed language nor that any given pair of
2418	   encompassed languages are mutually intelligible or otherwise
2419	   interchangeable.

2421	   Applications MAY use macrolanguage information to improve matching or
2422	   language negotiation.  For example, the information that 'sr'
2423	   (Serbian) and 'hr' (Croatian) share a macrolanguage expresses a
2424	   closer relation between those languages than between, say, 'sr'
2425	   (Serbian) and 'mk' (Macedonian).  However, this relationship is not
2426	   guaranteed nor is it exclusive.  For example, Romanian ('ro') and
2427	   Moldavian ('mo') do not share a macrolanguage, but are far more
2428	   closely related to each other than Cantonese ('yue') and Wu ('wuu'),
2429	   which do share a macrolanguage.
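
   As a sketch of how an application might use Macrolanguage fields as
   a weak matching hint, the Python below assumes a hypothetical table,
   MACROLANGUAGE, holding only encompassed-language pairs named in this
   section.  As the preceding paragraph cautions, a True result says
   nothing about mutual intelligibility.

```python
# Hypothetical excerpt of registry Macrolanguage fields:
# encompassed language subtag -> macrolanguage subtag.
MACROLANGUAGE = {"crk": "cre", "cmn": "zh", "yue": "zh", "wuu": "zh", "arb": "ar"}


def macrolanguage_of(primary: str) -> str:
    """Return the macrolanguage of a primary subtag, or the subtag itself."""
    p = primary.lower()
    return MACROLANGUAGE.get(p, p)


def share_macrolanguage(a: str, b: str) -> bool:
    """True if both primary subtags fall under the same macrolanguage.

    This is only a hint for matching; it does not imply intelligibility.
    """
    return macrolanguage_of(a) == macrolanguage_of(b)


print(share_macrolanguage("yue", "wuu"))  # True, yet not closely related
print(share_macrolanguage("ro", "mo"))    # False, yet closely related
```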

2431	4.1.2.  Using Extended Language Subtags

2433	   To accommodate language tag forms used prior to the adoption of this
2434	   document, language tags provide a special compatibility mechanism:
2435	   the extended language subtag.  Selected languages have been provided
2436	   with both primary and extended language subtags.  These include
2437	   macrolanguages, such as Malay ('ms') or Uzbek ('uz'), that have a
2438	   specific dominant variety that is generally synonymous with the
2439	   macrolanguage.  Other languages, such as the Chinese ('zh') and
2440	   Arabic ('ar') macrolanguages and the various sign languages ('sgn'),
2441	   have traditionally used their primary language subtag, possibly
2442	   coupled with various region subtags or as part of a registered
2443	   grandfathered tag, to indicate the language.

2445	   With the adoption of this document, specific ISO 639-3 subtags became
2446	   available to identify the languages contained within these diverse
2447	   language families or groupings.  This presents a choice of language
2448	   tags where previously none existed:

2450	   o  Each encompassed language's subtag SHOULD be used as the primary
2451	      language subtag.  For example, a document in Mandarin Chinese
2452	      would be tagged "cmn" (the subtag for Mandarin Chinese) in
2453	      preference to "zh" (Chinese).

2455	   o  If compatibility is desired or needed, the encompassed subtag MAY
2456	      be used as an extended language subtag.  For example, a document
2457	      in Mandarin Chinese could be tagged "zh-cmn" instead of either
2458	      "cmn" or "zh".

2460	   o  The macrolanguage or prefixing subtag MAY still be used to form
2461	      the tag instead of the more specific encompassed language subtag.
2462	      That is, tags such as "zh-HK" or "sgn-RU" are still valid.

2464	   Chinese ('zh') provides a useful illustration of this.  In the past,
2465	   various content has used tags beginning with the 'zh' subtag, with
2466	   application-specific meaning being associated with region codes,
2467	   private-use sequences, or grandfathered registered values.  This is
2468	   because historically only the macrolanguage subtag 'zh' was available
2469	   for forming language tags.  However, the languages encompassed by the
2470	   Chinese subtag 'zh' are, in the main, not mutually intelligible when
2471	   spoken and the written forms of these languages also show wide
2472	   variation in form and usage.

2474	   To provide compatibility, Chinese languages encompassed by the 'zh'
2475	   subtag are in the registry as both primary language subtags and as
2476	   extended language subtags.  For example, the ISO 639-3 code for
2477	   Cantonese is 'yue'.  Content in Cantonese might historically have
2478	   used a tag such as "zh-HK" (since Cantonese is spoken commonly in
2479	   Hong Kong), although that tag actually means any type of Chinese as
2480	   used in Hong Kong.  With the availability of ISO 639-3 codes in the
2481	   registry, content in Cantonese can be directly tagged using the 'yue'
2482	   subtag.  The content can use it as a primary language subtag, as in
2483	   the tag "yue-HK" (Cantonese, Hong Kong).  Or it can use an extended
2484	   language subtag with 'zh', as in the tag "zh-yue-Hant" (Chinese,
2485	   Cantonese, Traditional script).

2487	   As noted above, applications can choose to use the macrolanguage
2488	   subtag to form the tag instead of using the more specific encompassed
2489	   language subtag.  For example, an application with large quantities
2490	   of data already using tags with the 'zh' (Chinese) subtag might
2491	   continue to use this more general subtag even for new data,
2492	   although the content could be more precisely tagged with 'cmn'
2493	   (Mandarin), 'yue' (Cantonese), 'wuu' (Wu), and so on.  Similarly, an
2494	   application already using tags that start with the 'ar' (Arabic)
2495	   subtag might continue to use this more general subtag even for new
2496	   data, which could more precisely be tagged with 'arb' (Standard
2497	   Arabic).

2499	   In some cases, the encompassed languages had tags registered for them
2500	   during the RFC 3066 era.  Those grandfathered tags not already
2501	   deprecated or rendered redundant were deprecated in the registry upon
2502	   adoption of this document.  As grandfathered values, they remain
2503	   valid for use and some content or applications might use them.  As
2504	   with other grandfathered tags, since implementations might not be
2505	   able to associate the grandfathered tags with the encompassed
2506	   language subtag equivalents that are recommended by this document,
2507	   implementations are encouraged to canonicalize tags for comparison
2508	   purposes.  Some examples of this include the tags "zh-hakka" (Hakka)
2509	   and "zh-guoyu" (Mandarin or Standard Chinese).

2511	   Sign languages share a mode of communication rather than a linguistic
2512	   heritage.  There are many sign languages that have developed
2513	   independently, and the subtag 'sgn' indicates only the presence of a
2514	   sign language.  A number of sign languages also had grandfathered
2515	   tags registered for them during the RFC 3066 era.

2517	   Sign languages also use the extended language mechanism.  Thus, a
2518	   document in American Sign Language can be labeled either "ase" or
2519	   "sgn-ase" ('ase' is the subtag for American Sign Language).
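
   The two equivalent forms described in this section (for example,
   "zh-yue-Hant" and "yue-Hant", or "sgn-ase" and "ase") can be
   converted into one another.  The Python sketch below assumes a
   hypothetical table, EXTLANG_PREFIX, mapping a few extended language
   subtags to the prefix they require; real values would come from the
   registry's extlang records, and the function names are illustrative.

```python
# Hypothetical excerpt: extended language subtag -> required prefix subtag.
EXTLANG_PREFIX = {"cmn": "zh", "yue": "zh", "ase": "sgn"}


def extlang_to_primary(tag: str) -> str:
    """Rewrite 'zh-yue-Hant' as 'yue-Hant' by dropping the prefix subtag."""
    subtags = tag.split("-")
    if len(subtags) >= 2:
        prefix = EXTLANG_PREFIX.get(subtags[1].lower())
        if prefix is not None and prefix == subtags[0].lower():
            return "-".join(subtags[1:])
    return tag


def primary_to_extlang(tag: str) -> str:
    """Rewrite 'yue-HK' as 'zh-yue-HK' for compatibility with older data."""
    subtags = tag.split("-")
    prefix = EXTLANG_PREFIX.get(subtags[0].lower())
    if prefix is not None:
        return "-".join([prefix] + subtags)
    return tag
```

   Tags that use only the macrolanguage or prefixing subtag, such as
   "zh-HK" or "sgn-RU", pass through both functions unchanged: neither
   function can infer which encompassed language such content is in.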

2521	4.2.  Meaning of the Language Tag

2523	   The meaning of a language tag is related to the meaning of the
2524	   subtags that it contains.  Each subtag, in turn, implies a certain
2525	   range of expectations one might have for related content, although it
2526	   is not a guarantee.  For example, the use of a script subtag such as
2527	   'Arab' (Arabic script) does not mean that the content contains only
2528	   Arabic characters.  It does mean that the language involved is
2529	   predominantly in the Arabic script.  Thus a language tag and its
2530	   subtags can encompass a very wide range of variation and yet remain
2531	   appropriate in each particular instance.

2533	   Validity of a tag is not the only factor determining its usefulness.
2534	   While every valid tag has a meaning, it might not represent any real-
2535	   world language usage.  This is unavoidable in a system in which
2536	   subtags can be combined freely.  For example, tags such as
2537	   "ar-Cyrl-CO" (Arabic, Cyrillic script, as used in Colombia) or
2538	   "tlh-Kore-AQ-fonipa" (Klingon, Korean script, as used in Antarctica, IPA
2539	   phonetic transcription) are both valid and unlikely to represent a
2540	   useful combination of language attributes.

2542	   The meaning of a given tag doesn't depend on the context in which it
2543	   appears.  The relationship between a tag's meaning and the
2544	   information objects to which that tag is applied, however, can vary.

2546	   o  For a single information object, the associated language tags
2547	      might be interpreted as the set of languages that is necessary for
2548	      a complete comprehension of the object.  Example: Plain
2549	      text documents.

2551	   o  For an aggregation of information objects, the associated language
2552	      tags could be taken as the set of languages used inside components
2553	      of that aggregation.  Examples: Document stores and libraries.

2555	   o  For information objects whose purpose is to provide alternatives,
2556	      the associated language tags could be regarded as a hint that the
2557	      content is provided in several languages and that one has to
2558	      inspect each of the alternatives in order to find its language or
2559	      languages.  In this case, the presence of multiple tags might not
2560	      mean that one needs to be multilingual to gain a complete
2561	      understanding of the document.  Example: MIME multipart/
2562	      alternative.

2564	   o  In markup languages, such as HTML and XML, language information
2565	      can be added to each part of the document identified by the markup
2566	      structure (including the whole document itself).  For example, one
2567	      could write <span lang="fr">C'est la vie.</span> inside a
2568	      Norwegian document; the Norwegian-speaking user could then access
2569	      a French-Norwegian dictionary to find out what the marked section
2570	      meant.  If the user were listening to that document through a
2571	      speech synthesis interface, this information could be used to signal
2572	      the synthesizer to appropriately apply French text-to-speech
2573	      pronunciation rules to that span of text, instead of applying the
2574	      inappropriate Norwegian rules.

2576	   o  Language tags form the basis for most implementations of locale
2577	      identifiers.  For example, see Unicode's CLDR (Common Locale Data
2578	      Repository) project.

2580	   Language tags are related when they contain a similar sequence of
2581	   subtags.  For example, if a language tag B contains language tag A as
2582	   a prefix, then B is typically "narrower" or "more specific" than A.
2583	   Thus, "zh-Hant-TW" is more specific than "zh-Hant".

2585	   This relationship is not guaranteed in all cases: specifically,
2586	   languages whose tags begin with the same sequence of subtags are NOT
2587	   guaranteed to be mutually intelligible, although they might be.  For
2588	   example, the tag "az" shares a prefix with both "az-Latn"
2589	   (Azerbaijani written using the Latin script) and "az-Cyrl"
2590	   (Azerbaijani written using the Cyrillic script).  A person fluent in
2591	   one script might not be able to read the other, even though the text
2592	   might be identical.  Content tagged as "az" most probably is written
2593	   in just one script and thus might not be intelligible to a reader
2594	   familiar with the other script.

2596	   Similarly, not all subtags specify an actual distinction in language.
2597	   For example, the tags "en-US" and "en-CA" mean, roughly, English with
2598	   features generally thought to be characteristic of the United States
2599	   and Canada, respectively.  They do not imply that a significant
2600	   dialectal boundary exists between any arbitrarily selected point in
2601	   the United States and any arbitrarily selected point in Canada.
2602	   Neither does a particular region subtag imply that linguistic
2603	   distinctions do not exist within that region.
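
   The prefix relationship described above is defined on whole subtags,
   not on characters: "sgn-US" begins with the characters "sg" (Sango)
   but is not narrower than "sg".  A minimal Python sketch, with a
   hypothetical function name and case-insensitive comparison:

```python
def is_narrower(b: str, a: str) -> bool:
    """True if tag b contains tag a as a prefix made of whole subtags."""
    a_subtags = a.lower().split("-")
    b_subtags = b.lower().split("-")
    return len(b_subtags) > len(a_subtags) and b_subtags[: len(a_subtags)] == a_subtags


print(is_narrower("zh-Hant-TW", "zh-Hant"))  # True
print(is_narrower("sgn-US", "sg"))           # False
```

   A True result describes the tags only; as explained above, it does
   not mean the content is intelligible to readers of the broader tag.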

2605	4.3.  Lists of Languages

2607	   In some applications, a single content item might best be associated
2608	   with more than one language tag.  Examples of such a usage include:

2610	   o  A language priority list [RFC4647] describing a user's language
2611	      preferences.  This is a (possibly weighted) list of potentially
2612	      unrelated varieties, expressing a preference rather than a
2613	      declaration about actual content.

2615	   o  Content items that contain multiple, distinct varieties.  Often
2616	      this is used to indicate an appropriate audience for a given
2617	      content item when multiple choices might be appropriate.  Examples
2618	      of this could include:

2620	      *  Metadata about the appropriate audience for a movie title.  For
2621	         example, a DVD might label its individual audio tracks 'de'
2622	         (German), 'fr' (French), and 'es' (Spanish), but the title as a
2623	         whole would list "de, fr, es" as its audience.

2625	      *  A French/English, English/French dictionary tagged as both "en"
2626	         and "fr" to specify that it applies equally to French and
2627	         English.

2629	      *  A side-by-side or interlinear translation of a document, as is
2630	         commonly done with classical works in Latin or Greek.

2632	   o  Content items that contain a single language but which require
2633	      multiple levels of specificity.  For example, a library might wish
2634	      to classify a particular work as both Norwegian ('no') and as
2635	      Nynorsk ('nn') for audiences capable of appreciating the
2636	      distinction or needing to select content more narrowly.

2638	4.4.  Length Considerations

2640	   There is no defined upper limit on the size of language tags.  While
2641	   historically most language tags have consisted of language and region
2642	   subtags with a combined total length of up to six characters, larger
2643	   tags have always been possible and have actually appeared in use.

2645	   Neither the language tag syntax nor other requirements in this
2646	   document impose a fixed upper limit on the number of subtags in a
2647	   language tag (and thus an upper bound on the size of a tag).  The
2648	   language tag syntax suggests that, depending on the specific
2649	   language, more subtags (and thus a longer tag) are sometimes
2650	   necessary to completely identify the language for certain
2651	   applications; thus, it is possible to envision long or complex subtag
2652	   sequences.

2654	4.4.1.  Working with Limited Buffer Sizes

2656	   Some applications and protocols are forced to allocate fixed buffer
2657	   sizes or otherwise limit the length of a language tag.  A conformant
2658	   implementation or specification MAY refuse to support the storage of
2659	   language tags that exceed a specified length.  Any such limitation
2660	   SHOULD be clearly documented, and such documentation SHOULD include
2661	   what happens to longer tags (for example, whether an error value is
2662	   generated or the language tag is truncated).  A protocol that allows
2663	   tags to be truncated at an arbitrary limit, without giving any
2664	   indication of what that limit is, has the potential for causing harm
2665	   by changing the meaning of tags in substantial ways.

2667	   In practice, most language tags do not require more than a few
2668	   subtags and will not approach reasonably sized buffer limitations;
2669	   see Section 4.1.

2671	   Some specifications or protocols have limits on tag length but do not
2672	   have a fixed length limitation.  For example, [RFC2231] has no
2673	   explicit length limitation: the length available for the language tag
2674	   is constrained by the length of other header components (such as the
2675	   charset's name) coupled with the 76-character limit in [RFC2047].
2676	   Thus, the "limit" might be 50 or more characters, but it could
2677	   potentially be quite small.

2679	   The considerations for assigning a buffer limit are:

2681	      Implementations SHOULD NOT truncate language tags unless the
2682	      meaning of the tag is purposefully being changed, or unless the
2683	      tag does not fit into a limited buffer size specified by a
2684	      protocol for storage or transmission.

2686	      Implementations SHOULD warn the user when a tag is truncated since
2687	      truncation changes the semantic meaning of the tag.

2689	      Implementations of protocols or specifications that are space
2690	      constrained but do not have a fixed limit SHOULD use the longest
2691	      possible tag in preference to truncation.

2693	      Protocols or specifications that specify limited buffer sizes for
2694	      language tags MUST allow for language tags of up to 33 characters.

2696	      Protocols or specifications that specify limited buffer sizes for
2697	      language tags SHOULD allow for language tags of at least 30
2698	      characters.  Note that RFC 4646 [RFC4646] recommended a field size
2699	      of 42 characters because it included the permanently reserved (and
2700	      unused) 'extlang' production.  The current size recommendation
2701	      does not include the use of the 'extlang' field.  Protocols or
2702	      specifications that commonly use extensions or private use subtags
2703	      might wish to reserve or recommend a longer "minimum buffer" size.

2705	   The following illustration shows how the 30-character recommendation
2706	   was derived:

2708	   language      =  3 (ISO 639-2; ISO 639-1 requires 2)
2709	   script        =  5 (if not suppressed: see Section 4.1)
2710	   region        =  4 (UN M.49; ISO 3166-1 requires 3)
2711	   variant1      =  9 (needs 'language' as a prefix)
2712	   variant2      =  9 (needs 'language-variant1' as a prefix)

2714	   total         = 30 characters

2716	              Figure 7: Derivation of the Limit on Tag Length
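
   The arithmetic in Figure 7 can be checked directly.  In the Python
   sketch below, each non-initial field's size already includes its
   leading "-"; the placeholder tag is built from dummy strings of those
   lengths and is not a registered tag.

```python
# Field sizes from Figure 7; non-initial sizes include the leading "-".
FIELD_SIZES = {"language": 3, "script": 5, "region": 4, "variant1": 9, "variant2": 9}

total = sum(FIELD_SIZES.values())

# Dummy subtags with exactly those lengths (illustrative, not registered values).
placeholder = "abc" + "-Abcd" + "-123" + "-abcdefgh" + "-ijklmnop"

print(total, len(placeholder))  # 30 30
```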

2718	4.4.2.  Truncation of Language Tags

2720	   Truncation of a language tag alters the meaning of the tag, and thus
2721	   SHOULD be avoided.  However, truncation of language tags is sometimes
2722	   necessary due to limited buffer sizes.  Such truncation MUST NOT
2723	   permit a subtag to be chopped off in the middle or the formation of
2724	   invalid tags (for example, one ending with the "-" character).

2726	   This means that applications or protocols that truncate tags MUST do
2727	   so by progressively removing subtags along with their preceding "-"
2728	   from the right side of the language tag until the tag is short enough
2729	   for the given buffer.  If the resulting tag ends with a single-
2730	   character subtag, that subtag and its preceding "-" MUST also be
2731	   removed.  For example:

2733	   Tag to truncate: zh-Latn-CN-variant1-a-extend1-x-wadegile-private1
2734	   1. zh-Latn-CN-variant1-a-extend1-x-wadegile
2735	   2. zh-Latn-CN-variant1-a-extend1
2736	   3. zh-Latn-CN-variant1
2737	   4. zh-Latn-CN
2738	   5. zh-Latn
2739	   6. zh

2741	                    Figure 8: Example of Tag Truncation
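
   The truncation rules above can be expressed as a short loop.  This
   Python sketch reproduces the steps of Figure 8 for suitably chosen
   limits.  The function name, and the choice to raise ValueError when
   not even the primary subtag fits, are assumptions rather than
   requirements of this document.

```python
def truncate_tag(tag: str, limit: int) -> str:
    """Shorten a language tag to at most `limit` characters, whole subtags only."""
    subtags = tag.split("-")
    # Remove subtags (with their preceding "-") from the right until the tag fits.
    while len(subtags) > 1 and len("-".join(subtags)) > limit:
        subtags.pop()
    # Never leave a dangling single-character subtag such as "-x" or "-a".
    while len(subtags) > 1 and len(subtags[-1]) == 1:
        subtags.pop()
    result = "-".join(subtags)
    if len(result) > limit or len(result) == 1:
        raise ValueError(f"cannot truncate {tag!r} to {limit} characters")
    return result


TAG = "zh-Latn-CN-variant1-a-extend1-x-wadegile-private1"
for limit in (40, 39, 28, 10, 9, 2):
    print(truncate_tag(TAG, limit))
```

   With those limits the loop prints steps 1 through 6 of Figure 8 in
   order; a limit of 39 shows the singleton rule, since removing
   "wadegile" leaves a trailing "-x" that is then dropped as well.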

2743	4.5.  Canonicalization of Language Tags

2745	   Since a particular language tag is sometimes used by many processes,
2746	   language tags SHOULD always be created or generated in a canonical
2747	   form.

2749	   A language tag is in canonical form when:

2751	   1.  The tag is well-formed according to the rules in Section 2.1 and
2752	       Section 2.2.

2754	   2.  Subtags of type 'Region' that have a Preferred-Value mapping in
2755	       the IANA registry (see Section 3.1) MUST be replaced with their
2756	       mapped value.

2758	   3.  Redundant or grandfathered tags that have a Preferred-Value
2759	       mapping in the IANA registry (see Section 3.1) MUST be replaced
2760	       with their mapped value.  These items either are deprecated
2761	       mappings created before the adoption of this document (such as
2762	       the mapping of "no-nyn" to "nn" or "i-klingon" to "tlh") or are
2763	       the result of later registrations or additions to this document
2764	       (for example, "zh-hakka" was deprecated in favor of the ISO 639-3
2765	       code 'hak' when this document was adopted).

2767	   4.  Other subtags that have a Preferred-Value mapping in the IANA
2768	       registry (see Section 3.1) MUST be replaced with their mapped
2769	       value.  These items consist entirely of clerical corrections to
2770	       ISO 639-1 in which the deprecated subtags have been maintained
2771	       for compatibility purposes.

2773	   5.  If more than one extension subtag sequence exists, the extension
2774	       sequences are ordered into case-insensitive ASCII order by
2775	       singleton subtag.

2777	   Example: The language tag "en-a-aaa-b-ccc-bbb-x-xyz" is in canonical
2778	   form, while "en-b-ccc-bbb-a-aaa-X-xyz" is well-formed and potentially
2779	   valid (extensions 'a' and 'b' are not defined as of the publication
2780	   of this document) but not in canonical form (the extensions are not
2781	   in alphabetical order).

2783	   Example: The language tag "en-BU" (English as used in Burma) is not
2784	   canonical because the 'BU' subtag has a canonical mapping to 'MM'
2785	   (Myanmar), although the tag "en-BU" maintains its validity.
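
   A minimal Python sketch of rules 2 through 5 follows.  The three
   mapping tables are hypothetical excerpts limited to values mentioned
   in this document; a real processor would load the Preferred-Value
   fields from the IANA registry.  Grandfathered and redundant tags are
   matched only as whole tags, and subtag case is left unchanged, since
   canonical form does not depend on case.

```python
# Hypothetical registry excerpts (Preferred-Value mappings), keyed in lowercase.
TAG_PREFERRED = {"i-klingon": "tlh", "no-nyn": "nn", "zh-hakka": "hak"}
REGION_PREFERRED = {"bu": "MM"}
LANGUAGE_PREFERRED = {"iw": "he"}


def canonicalize(tag: str) -> str:
    # Rule 3: grandfathered or redundant tags with a Preferred-Value.
    whole = TAG_PREFERRED.get(tag.lower())
    if whole is not None:
        return whole

    subtags = tag.split("-")
    out = []
    i = 0
    # Subtags before the first singleton: language, extlang, script, region, variants.
    while i < len(subtags) and len(subtags[i]) != 1:
        s = subtags[i]
        if i == 0:  # Rule 4: deprecated language subtags.
            s = LANGUAGE_PREFERRED.get(s.lower(), s)
        elif (len(s) == 2 and s.isalpha()) or (len(s) == 3 and s.isdigit()):
            s = REGION_PREFERRED.get(s.lower(), s)  # Rule 2: region subtags.
        out.append(s)
        i += 1

    # Rule 5: collect extension sequences; private use ("x") always stays last.
    extensions = []
    private_use = []
    while i < len(subtags):
        if subtags[i].lower() == "x":
            private_use = subtags[i:]
            break
        j = i + 1
        while j < len(subtags) and len(subtags[j]) != 1:
            j += 1
        extensions.append(subtags[i:j])
        i = j

    extensions.sort(key=lambda seq: seq[0].lower())
    for seq in extensions:
        out.extend(seq)
    out.extend(private_use)
    return "-".join(out)


print(canonicalize("en-BU"))                     # en-MM
print(canonicalize("en-b-ccc-bbb-a-aaa-X-xyz"))  # en-a-aaa-b-ccc-bbb-X-xyz
```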

2787	   Canonicalization of language tags does not imply anything about the
2788	   use of upper or lowercase letters when processing or comparing
2789	   subtags (as described in Section 2.1).  All comparisons MUST be
2790	   performed in a case-insensitive manner.

2792	   When performing canonicalization of language tags, processors MAY
2793	   regularize the case of the subtags (that is, this process is
2794	   OPTIONAL), following the case used in the registry.  Note that this
2795	   corresponds to the following casing rules: uppercase all non-initial
2796	   two-letter subtags; titlecase all non-initial four-letter subtags;
2797	   lowercase everything else.

2799	   Note: Case folding of ASCII letters in certain locales, unless
2800	   carefully handled, sometimes produces non-ASCII character values.
2801	   The Unicode Character Database file "SpecialCasing.txt" defines the
2802	   specific cases that are known to cause problems with this.  In
2803	   particular, the letter 'i' (U+0069) in Turkish and Azerbaijani is
2804	   uppercased to U+0130 (LATIN CAPITAL LETTER I WITH DOT ABOVE).
2805	   Implementers SHOULD specify a locale-neutral casing operation to
2806	   ensure that case folding of subtags does not produce this value,
2807	   which is illegal in language tags.  For example, if one were to
2808	   uppercase the region subtag 'in' using Turkish locale rules, the
2809	   sequence U+0130 U+004E would result instead of the expected 'IN'.
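
   The OPTIONAL case regularization and the locale-neutral casing
   recommended in the note above can be combined.  The Python sketch
   below applies the casing rules exactly as stated in this section
   (including to subtags after a singleton) and maps only the ASCII
   letters A-Z and a-z, so no locale or Unicode special-casing rule can
   introduce U+0130.  Python's str.upper() is not locale-sensitive, but
   the explicit mapping keeps the intent clear and carries over to
   environments whose casing functions are.  Function names are
   illustrative.

```python
def ascii_upper(s: str) -> str:
    # Map only a-z; never consult the locale or Unicode special casing.
    return "".join(chr(ord(c) - 32) if "a" <= c <= "z" else c for c in s)


def ascii_lower(s: str) -> str:
    # Map only A-Z back to a-z.
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)


def regularize_case(tag: str) -> str:
    """Uppercase non-initial 2-letter subtags, titlecase non-initial
    4-letter subtags, and lowercase everything else."""
    out = []
    for i, s in enumerate(tag.split("-")):
        if i > 0 and len(s) == 2:
            out.append(ascii_upper(s))
        elif i > 0 and len(s) == 4:
            out.append(ascii_upper(s[:1]) + ascii_lower(s[1:]))
        else:
            out.append(ascii_lower(s))
    return "-".join(out)


print(regularize_case("ZH-hant-tw"))  # zh-Hant-TW
print(regularize_case("az-latn-in"))  # az-Latn-IN, all ASCII
```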

2811	   Note: if the field 'Deprecated' appears in a registry record without
2812	   an accompanying 'Preferred-Value' field, then that tag or subtag is
2813	   deprecated without a replacement.  Validating processors SHOULD NOT
2814	   generate tags that include these values, although the values are
2815	   canonical when they appear in a language tag.

2817	   An extension MUST define any relationships that exist between the
2818	   various subtags in the extension and thus MAY define an alternate
2819	   canonicalization scheme for the extension's subtags.  Extensions MAY
2820	   define how the order of the extension's subtags is interpreted.  For
2821	   example, an extension could define that its subtags are in canonical
2822	   order when the subtags are placed into ASCII order: that is, "en-a-
2823	   aaa-bbb-ccc" instead of "en-a-ccc-bbb-aaa".  Another extension might
2824	   define that the order of the subtags influences their semantic
2825	   meaning (so that "en-b-ccc-bbb-aaa" has a different value from "en-b-
2826	   aaa-bbb-ccc").  However, extension specifications SHOULD be designed
2827	   so that they are tolerant of the typical processes described in
2828	   Section 3.7.

2830	4.6.  Considerations for Private Use Subtags

2832	   Private use subtags, like all other subtags, MUST conform to the
2833	   format and content constraints in the ABNF.  Private use subtags have
2834	   no meaning outside the private agreement between the parties that
2835	   intend to use or exchange language tags that employ them.  The same
2836	   subtags MAY be used with a different meaning under a separate private
2837	   agreement.  They SHOULD NOT be used where alternatives exist and
2838	   SHOULD NOT be used in content or protocols intended for general use.

2840	   Private use subtags are simply useless for information exchange
2841	   without prior arrangement.  The value and semantic meaning of private
2842	   use tags and of the subtags used within such a language tag are not
2843	   defined by this document.

2845	   Subtags defined in the IANA registry as having a specific private use
2846	   meaning convey more information than a purely private use tag
2847	   prefixed by the singleton subtag 'x'.  For applications, this
2848	   additional information MAY be useful.

2850	   For example, the region subtags 'AA', 'ZZ', and those in the ranges
2851	   'QM'-'QZ' and 'XA'-'XZ' (derived from ISO 3166-1 private use codes)
2852	   MAY be used to form a language tag.  A tag such as "zh-Hans-XQ"
2853	   conveys a great deal of public, interchangeable information about the
2854	   language material (that it is Chinese in the simplified Chinese
2855	   script and is suitable for some geographic region 'XQ').  While the
2856	   precise geographic region is not known outside of private agreement,
2857	   the tag conveys far more information than an opaque tag such as
2858	   "x-someLang", which contains no information about the language subtag
2859	   or script subtag outside of the private agreement.
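   The private use region subtags listed above have a simple shape, so
   an application can recognize them mechanically.  A minimal Python
   sketch (the name is_private_use_region is hypothetical) that compares
   case-insensitively, since region subtags are conventionally uppercase
   but tags are not case sensitive:

```python
def is_private_use_region(subtag: str) -> bool:
    """True for 'AA', 'ZZ', 'QM'-'QZ', and 'XA'-'XZ' (case-insensitive)."""
    s = subtag.upper()
    if len(s) != 2 or not (s.isascii() and s.isalpha()):
        return False
    return s in ("AA", "ZZ") or (s[0] == "Q" and s[1] >= "M") or s[0] == "X"


# The region of "zh-Hans-XQ" is private use; that of "zh-Hans-CN" is not.
print(is_private_use_region("XQ"), is_private_use_region("CN"))  # True False
```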

2861	   However, in some cases content tagged with private use subtags can
2862	   interact with other systems in a different and possibly unsuitable
2863	   manner compared to tags that use opaque, privately defined subtags,
2864	   so the choice of the best approach sometimes depends on the
2865	   particular domain in question.

2867	5.  IANA Considerations

2869	   This section deals with the processes and requirements necessary for
2870	   IANA to maintain the subtag and extension registries as
2871	   defined by this document and in accordance with the requirements of
2872	   [RFC2434].

2874	   The impact on the IANA maintainers of the two registries defined by
2875	   this document will be a small increase in the frequency of new
2876	   entries or updates.  IANA also is required to create a new mailing
2877	   list (described below in Section 5.1) to announce registry changes
2878	   and updates.

2880	5.1.  Language Subtag Registry

2882	   Upon adoption of this document, IANA will update the registry using
2883	   instructions and content provided in a companion document:
2884	   [registry-update].  The criteria and process for selecting the
2885	   updated set of records are described in that document.  The updated
2886	   set of records represents no impact on IANA, since the work to create
2887	   it will be performed externally.

2889	   Future work on the Language Subtag Registry includes the following
2890	   activities:

2892	   o  Inserting or replacing whole records.  These records are
2893	      preformatted for IANA by the Language Subtag Reviewer, as
2894	      described in Section 3.3.

2896	   o  Archiving and making publicly available the registration forms.

2898	   o  Announcing each updated version of the registry on the
2899	      "ietf-languages-announcements@iana.org" mailing list.

2901	   Each registration form sent to IANA contains a single record for
2902	   incorporation into the registry.  The form will be sent to
2903	   "iana@iana.org" by the Language Subtag Reviewer.  It will have a
2904	   subject line indicating whether the enclosed form represents an
2905	   insertion of a new record (indicated by the word "INSERT" in the
2906	   subject line) or a replacement of an existing record (indicated by
2907	   the word "MODIFY" in the subject line).  At no time can a record be
2908	   deleted from the registry.

2910	   IANA will extract the record from the form and place the inserted or
2911	   modified record into the appropriate section of the language subtag
2912	   registry, grouping the records by their 'Type' field.  Inserted
2913	   records can be placed anywhere in the appropriate section; there is
2914	   no guarantee of the order of the records beyond grouping them
2915	   together by 'Type'.  Modified records overwrite the record they
2916	   replace.
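   The insert-or-replace rules above can be modeled in a few lines.
   This Python sketch is hypothetical: it keeps the registry in memory
   as a dictionary grouped by 'Type', keys records by their 'Subtag'
   field (or 'Tag' field), and rejects an INSERT of an existing key or a
   MODIFY of a missing one; those checks are assumptions of the sketch,
   not IANA procedure.  There is no delete operation, matching the rule
   that records are never removed.

```python
def apply_form(registry: dict, action: str, record: dict) -> None:
    """Apply one INSERT or MODIFY form to an in-memory registry.

    registry maps each 'Type' value to {key: record}; the key is the
    record's 'Subtag' field, or its 'Tag' field if it has no 'Subtag'.
    """
    section = registry.setdefault(record["Type"], {})
    key = record["Subtag"] if "Subtag" in record else record["Tag"]
    if action == "INSERT":
        if key in section:
            raise ValueError(f"{key!r} already present; use MODIFY")
        section[key] = record  # only grouping by 'Type' matters, not position
    elif action == "MODIFY":
        if key not in section:
            raise KeyError(key)
        section[key] = record  # the new record overwrites the old one
    else:
        # Only INSERT and MODIFY exist: records are never deleted.
        raise ValueError(f"unsupported action {action!r}")


registry = {}
apply_form(registry, "INSERT", {"Type": "variant", "Subtag": "biske"})
```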

2918	   Whenever an entry is created or modified in the registry, the 'File-
2919	   Date' record at the start of the registry is updated to reflect the
2920	   most recent modification date in the [RFC3339] "full-date" format.
2921	   Each request to insert or modify records will include a new
2922	   File-Date record indicating the acceptance date of the record.  This
2923	   record is to be placed first in the registry, replacing the existing
2924	   File-Date record.  In the event that the File-Date record present in
2925	   the registry has a later date than the File-Date record in the
2926	   request, the latest (most recent) record will be preserved.
2927	   IANA should attempt to process multiple registration requests in
2928	   order according to the File-Date in the form, since one registration
2929	   could otherwise cause a more recent change to be overwritten.
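   The File-Date rule reduces to keeping the later of two dates.  A
   minimal Python sketch, assuming each pending request is represented
   as a dictionary with a 'File-Date' entry (a hypothetical
   representation); RFC 3339 full-date values have the form YYYY-MM-DD,
   which date.fromisoformat accepts.

```python
from datetime import date


def merge_file_date(registry_date: str, request_date: str) -> str:
    """Keep the later of two RFC 3339 full-date (YYYY-MM-DD) values."""
    return max(date.fromisoformat(registry_date),
               date.fromisoformat(request_date)).isoformat()


def in_file_date_order(requests: list) -> list:
    """Order pending requests oldest first, so that an older request is not
    applied after, and does not overwrite, a newer one."""
    return sorted(requests, key=lambda r: date.fromisoformat(r["File-Date"]))


print(merge_file_date("2008-10-01", "2008-09-15"))  # 2008-10-01
```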

2931	   The updated registry file MUST use the UTF-8 character encoding and
2932	   IANA MUST check the registry file for proper encoding.  Non-ASCII
2933	   characters can be sent to IANA by attaching the registration form to
2934	   the email message or by using various encodings in the mail message
2935	   body (UTF-8 is recommended).  IANA will verify any unclear or
2936	   corrupted characters with the Language Subtag Reviewer prior to
2937	   posting the updated registry.
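   Checking for proper encoding amounts to a strict UTF-8 decode.  In
   this Python sketch (check_utf8 is a hypothetical name), the returned
   byte offset could be used to point the Language Subtag Reviewer at
   the first corrupted character; for example, a Latin-1 'å' (byte
   0xE5) embedded in otherwise ASCII text is rejected.

```python
def check_utf8(data: bytes):
    """Return None if data is well-formed UTF-8, otherwise the byte offset
    of the first invalid sequence."""
    try:
        data.decode("utf-8")  # strict error handling is the default
    except UnicodeDecodeError as err:
        return err.start
    return None


print(check_utf8("Description: Bokmål".encode("utf-8")))  # None
print(check_utf8(b"abc\xe5def"))  # 3
```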

2939	   IANA will also archive and make publicly available from
2940	   "http://www.iana.org/assignments/lang-subtags-templates/" each
2941	   registration form.  Note that multiple registrations can pertain to
2942	   the same record in the registry.

2944	   Developers who are dependent upon the language subtag registry
2945	   sometimes would like to be informed of changes in the registry so
2946	   that they can update their implementations.  When any change is made
2947	   to the language subtag registry, IANA will send an announcement
2948	   message to "ietf-languages-announcements@iana.org" (a self-
2949	   subscribing list that only IANA can post to).

2951	5.2.  Extensions Registry

2953	   The Language Tag Extensions Registry can contain at most 35 records
2954	   and thus changes to this registry are expected to be very infrequent.
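   The figure of 35 follows from the shape of extension singletons,
   assuming (as the ABNF defines them) that a singleton is any single
   ASCII letter or digit, compared case-insensitively, other than 'x',
   which is reserved for private use.  The arithmetic as a Python
   sketch:

```python
import string

# Single ASCII letters and digits (case-insensitive), minus the private use 'x'.
singletons = [c for c in string.digits + string.ascii_lowercase if c != "x"]

print(len(singletons))  # 10 + 26 - 1 = 35
```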

2956	   Future work by IANA on the Language Tag Extensions Registry is
2957	   limited to two cases.  First, the IESG MAY request that new records
2958	   be inserted into this registry from time to time.  These requests
2959	   MUST include the record to insert in the exact format described in
2960	   Section 3.7.  In addition, there MAY be occasional requests from the
2961	   maintaining authority for a specific extension to update the contact
2962	   information or URLs in the record.  These requests MUST include the
2963	   complete, updated record.  IANA is not responsible for validating the
2964	   information provided, only for checking that it is properly
2965	   formatted.  The request should reasonably be seen to come from the
2966	   maintaining authority named in the record present in the registry.

2968	6.  Security Considerations

2970	   Language tags used in content negotiation, like any other information
2971	   exchanged on the Internet, might be a source of concern because they
2972	   might be used to infer the nationality of the sender, and thus
2973	   identify potential targets for surveillance.

2975	   This is a special case of the general problem that anything sent is
2976	   visible to the receiving party and possibly to third parties as well.
2977	   It is useful to be aware that such concerns can exist in some cases.

2979	   The evaluation of the exact magnitude of the threat, and any possible
2980	   countermeasures, is left to each application protocol (see BCP 72
2981	   [RFC3552] for best current practice guidance on security threats and
2982	   defenses).

2984	   The language tag associated with a particular information item is of
2985	   no consequence whatsoever in determining whether that content might
2986	   contain homographs.  The fact that a text is tagged as being
2987	   in one language or using a particular script subtag provides no
2988	   assurance whatsoever that it does not contain characters from scripts
2989	   other than the one(s) associated with or specified by that language
2990	   tag.

2992	   Since there is no limit to the number of variant, private use, and
2993	   extension subtags, and consequently no limit on the possible length
2994	   of a tag, implementations need to guard against buffer overflow
2995	   attacks.  See Section 4.4 for details on language tag truncation,
2996	   which can occur as a consequence of defenses against buffer overflow.
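   One way to avoid fixed-size buffer overruns is to bound tag length
   explicitly and shorten only on subtag boundaries.  The following
   Python sketch (truncate_tag and its limit parameter are hypothetical)
   drops whole subtags from the end and then any singleton left
   dangling, so the result never ends in a cut-off subtag, a bare
   singleton, or a hyphen.  It is a simplified illustration, not the
   procedure of Section 4.4, which should be consulted for the actual
   rules.

```python
def truncate_tag(tag: str, limit: int) -> str:
    """Shorten tag to at most limit characters by dropping whole subtags
    from the end; never cuts a subtag and never leaves a trailing singleton."""
    parts = tag.split("-")
    while parts and len("-".join(parts)) > limit:
        parts.pop()  # remove the final subtag entirely
        # A singleton ('a', 'x', ...) with nothing after it is meaningless.
        while parts and len(parts[-1]) == 1:
            parts.pop()
    return "-".join(parts)


print(truncate_tag("en-US-a-bbb-x-private", 10))  # en-US
```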

2998	   Although the specification of valid subtags for an extension (see
2999	   Section 3.7) MUST be available over the Internet, implementations
3000	   SHOULD NOT mechanically depend on it being always accessible, to
3001	   prevent denial-of-service attacks.

3003	7.  Character Set Considerations

3005	   The syntax in this document requires that language tags use only the
3006	   characters A-Z, a-z, 0-9, and HYPHEN-MINUS, which are present in most
3007	   character sets, so the composition of language tags should not have
3008	   any character set issues.
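   Because the repertoire is so small, a character-level check is
   trivial.  This Python sketch (uses_tag_charset is a hypothetical
   name) checks only the repertoire, not well-formedness; any string
   that passes can be encoded as ASCII without loss.

```python
import re

# Only ASCII letters, ASCII digits, and HYPHEN-MINUS may appear in a tag.
TAG_CHARS = re.compile(r"[A-Za-z0-9-]+")


def uses_tag_charset(tag: str) -> bool:
    """Repertoire check only; this is not a well-formedness check."""
    return TAG_CHARS.fullmatch(tag) is not None


print(uses_tag_charset("sr-Latn-RS"), uses_tag_charset("de_DE"))  # True False
```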

3010	   Rendering of characters based on the content of a language tag is not
3011	   addressed in this memo.  Historically, some languages have relied on
3012	   the use of specific character sets or other information in order to
3013	   infer how a specific character should be rendered (notably this
3014	   applies to language- and culture-specific variations of Han
3015	   ideographs as used in Japanese, Chinese, and Korean).  When language
3016	   tags are applied to spans of text, rendering engines sometimes use
3017	   that information in deciding which font to use in the absence of
3018	   other information, particularly where languages with distinct writing
3019	   traditions use the same characters.

3021	8.  Changes from RFC 4646

3023	   The main goal for this revision of this document was to incorporate
3024	   two new parts of ISO 639 (ISO 639-3 and ISO 639-5) and their
3025	   attendant sets of language codes into the IANA Language Subtag
3026	   Registry.  This permits the identification of many more languages and
3027	   dialects than previously supported.

3029	   The specific changes in this document to meet these goals are:

3031	   o  Defines the incorporation of ISO 639-3 and ISO 639-5 codes for use
3032	      as primary and extended language subtags.  It also permanently
3033	      reserves and disallows the use of additional 'extlang' subtags.
3034	      The changes necessary to achieve this were:

3036	      *  Modified the ABNF comments.

3038	      *  Updated various registration and stability requirements
3039	         sections to reference ISO 639-3 and ISO 639-5 in addition to
3040	         ISO 639-1 and ISO 639-2.

3042	      *  Edited the text to eliminate references to extended language
3043	         subtags where they are no longer used.

3045	      *  Explained the change in the section on extended language
3046	         subtags.

3048	   o  Changed the ABNF related to grandfathered tags.  The irregular
3049	      tags are now listed.  Well-formed grandfathered tags are now
3050	      described by the 'langtag' production and the 'grandfathered'
3051	      production was removed as a result.  Also added a description of
3052	      both types of grandfathered tags to Section 2.2.8.

3054	   o  Added the paragraph on "collections" to Section 4.1.

3056	   o  Changed the capitalization rules for 'Tag' fields in Section 3.1.

3058	   o  Split Section 3.1 into subsections.

3060	   o  Modified Section 3.5 to allow Suppress-Script fields to be added,
3061	      modified, or removed via the registration process.  This was an
3062	      erratum from RFC 4646.

3064	   o  Modified examples that used region code 'CS' (formerly Serbia and
3065	      Montenegro) to use 'RS' (Serbia) instead.

3067	   o  Modified the rules for creating and maintaining record
3068	      'Description' fields to prevent duplicates, including inverted
3069	      duplicates.

3071	   o  Removed the lengthy description of why RFC 4646 was created from
3072	      this section, which also caused the removal of the reference to
3073	      XML Schema.

3075	   o  Modified the text in Section 2.1 to place more emphasis on the
3076	      fact that language tags are not case sensitive.

3078	   o  Replaced the example "fr-Latn-CA" in Section 2.1 with "sr-Latn-RS"
3079	      and "az-Arab-IR" because "fr-Latn-CA" does not respect the
3080	      Suppress-Script on 'Latn' with 'fr'.

3082	   o  Changed the requirements for well-formedness to make singleton
3083	      repetition checking optional (it is required for validity
3084	      checking) in Section 2.2.9.

3086	   o  Changed the text in Section 2.2.9 referring to grandfathered
3087	      checking to note that the list is now included in the ABNF.

3089	   o  Modified and added text to Section 3.2.  The job description was
3090	      placed first.  A note was added making clear that the Language
3091	      Subtag Reviewer may delegate various non-critical duties,
3092	      including list moderation.  Finally, additional text was added to
3093	      make the appointment process clear and to clarify that decisions
3094	      and performance of the reviewer are appealable.

3096	   o  Added text to Section 3.5 clarifying that the ietf-languages list
3097	      is operated by whomever the IESG appoints.

3099	   o  Added text to Section 3.1.4 clarifying that the first Description
3100	      in a 'language' record matches the corresponding Reference Name
3101	      for the language in ISO 639-3.

3103	   o  Modified Section 2.2.9 to define classes of conformance related to
3104	      specific tags (formerly 'well-formed' and 'valid' referred to
3105	      implementations).  Notes were added about the removal of 'extlang'
3106	      from the ABNF provided in RFC 4646, allowing for well-formedness
3107	      using this older definition.  Reference to RFC 3066 well-
3108	      formedness was also added.

3110	   o  Added text to the end of Section 3.1.2 noting that future versions
3111	      of this document might add new field types to the Registry format
3112	      and recommending that implementations ignore any unrecognized
3113	      fields.

3115	   o  Added text to Section 3.1.8 about what the lack of a
3116	      Suppress-Script field in a record means.

3118	   o  Added text to Section 3.1.4 allowing the correction of
3119	      misspellings and typographic errors.

3121	   o  Added text to Section 3.1.7 disallowing Prefix field conflicts
3122	      (such as circular prefix references).

3124	   o  Modified text in Section 3.5 to require the subtag reviewer to
3125	      announce his/her decision (or extension) following the two-week
3126	      period.  Also clarified that any decision or failure to decide can
3127	      be appealed.

3129	   o  Modified text in Section 4.1 to include the (heretofore anecdotal)
3130	      guiding principle of tag choice, and to clarify the non-use of
3131	      script subtags in non-written applications.  Also updated examples
3132	      in this section to use Chamic languages as an example of language
3133	      collections.

3135	   o  Prohibited multiple use of the same variant in a tag (e.g., "de-
3136	      1901-1901").  Previously this was only a recommendation
3137	      ("SHOULD").

3139	   o  Removed inappropriate [RFC2119] language from the illustration in
3140	      Section 4.4.1.

3142	   o  Replaced the example of deprecating "zh-gouyu" with "zh-
3143	      hakka"->"hak" in Section 4.5, noting that it was this document
3144	      that caused the change.

3146	   o  Replaced the text in Section 4.1 dealing with "mul"/"und" to
3147	      include the subtags 'zxx' and 'mis', as well as the tag
3148	      "i-default".  A normative reference to RFC 2277 was added, along
3149	      with an informative reference to MARC21.

3151	   o  Added text to Section 3.5 clarifying that any modifications of a
3152	      registration request must be sent to the ietf-languages list
3153	      before submission to IANA.

3155	   o  Changed the ABNF for the record-jar format from using the LWSP
3156	      production to using a folding whitespace production similar to obs-
3157	      FWS in [RFC5234].  This effectively prevents unintentional blank
3158	      lines inside a field.

3160	   o  Revised text in Section 3.3, Section 3.5, and Section 5.1 to
3161	      clarify that the Language Subtag Reviewer sends the
3162	      complete registration forms to IANA, that IANA extracts the record
3163	      from the form, and that the forms must also be archived separately
3164	      from the registry.

3166	   o  Added text to Section 5.1 requiring IANA to send an announcement
3167	      to the ietf-languages-announcements list on each registry update.

3169	   o  Modified the registry to use UTF-8 as its character
3170	      encoding.  This also entails additional instructions to IANA and
3171	      the Language Subtag Reviewer in the registration process.

3173	   o  Modified the rules in Section 2.2.4 so that "exceptionally
3174	      reserved" ISO 3166-1 codes other than 'UK' are included in the
3175	      registry.  In particular, this allows the code 'EU' (European
3176	      Union) to be used to form language tags or (more commonly) for
3177	      applications that use the registry for region codes to reference
3178	      this subtag.

3180	   o  Modified the IANA considerations section (Section 5) to remove
3181	      unnecessary normative [RFC2119] language.
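   The prohibition on repeated variants (e.g., "de-1901-1901") can be
   checked by shape alone, assuming a well-formed tag: after the primary
   language subtag and before the first singleton, only variant subtags
   consist of 5 to 8 alphanumerics or of a digit followed by 3
   alphanumerics.  A minimal Python sketch, with the hypothetical name
   repeated_variants:

```python
import re

# Variant shape: 5-8 alphanumerics, or a digit followed by 3 alphanumerics.
VARIANT = re.compile(r"[0-9a-z]{5,8}|[0-9][0-9a-z]{3}")


def repeated_variants(tag: str) -> set:
    """Return the variant subtags that occur more than once in a tag."""
    seen, repeated = set(), set()
    # Skip the primary language subtag, which may also be 5-8 letters.
    for sub in tag.lower().split("-")[1:]:
        if len(sub) == 1:
            break  # extensions and private use begin at the first singleton
        if VARIANT.fullmatch(sub):
            if sub in seen:
                repeated.add(sub)
            seen.add(sub)
    return repeated


print(repeated_variants("de-1901-1901"))  # {'1901'}
```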

3183	9.  References

3185	9.1.  Normative References

3187	   [ISO15924]
3188	              International Organization for Standardization, "ISO
3189	              15924:2004. Information and documentation -- Codes for the
3190	              representation of names of scripts", January 2004.

3192	   [ISO3166-1]
3193	              International Organization for Standardization, "ISO 3166-
3194	              1:2006. Codes for the representation of names of countries
3195	              and their subdivisions -- Part 1: Country codes",
3196	              November 2006.

3198	   [ISO639-1]
3199	              International Organization for Standardization, "ISO 639-
3200	              1:2002. Codes for the representation of names of languages
3201	              -- Part 1: Alpha-2 code", 2002.

3203	   [ISO639-2]
3204	              International Organization for Standardization, "ISO 639-
3205	              2:1998. Codes for the representation of names of languages
3206	              -- Part 2: Alpha-3 code, first edition", 1998.

3208	   [ISO639-3]
3209	              International Organization for Standardization, "ISO 639-
3210	              3:2007. Codes for the representation of names of languages
3211	              -- Part 3: Alpha-3 code for comprehensive coverage of
3212	              languages", 2007.

3214	   [ISO639-5]
3215	              International Organization for Standardization, "ISO 639-
3216	              5:2008. Codes for the representation of names of languages
3217	              -- Part 5: Alpha-3 code for language families and groups",
3218	              May 2008.

3220	   [ISO646]   International Organization for Standardization, "ISO/IEC
3221	              646:1991, Information technology -- ISO 7-bit coded
3222	              character set for information interchange", 1991.

3224	   [RFC2026]  Bradner, S., "The Internet Standards Process -- Revision
3225	              3", BCP 9, RFC 2026, October 1996.

3227	   [RFC2028]  Hovey, R. and S. Bradner, "The Organizations Involved in
3228	              the IETF Standards Process", BCP 11, RFC 2028,
3229	              October 1996.

3231	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
3232	              Requirement Levels", BCP 14, RFC 2119, March 1997.

3234	   [RFC2277]  Alvestrand, H., "IETF Policy on Character Sets and
3235	              Languages", BCP 18, RFC 2277, January 1998.

3237	   [RFC2434]  Narten, T. and H. Alvestrand, "Guidelines for Writing an
3238	              IANA Considerations Section in RFCs", BCP 26, RFC 2434,
3239	              October 1998.

3241	   [RFC2860]  Carpenter, B., Baker, F., and M. Roberts, "Memorandum of
3242	              Understanding Concerning the Technical Work of the
3243	              Internet Assigned Numbers Authority", RFC 2860, June 2000.

3245	   [RFC3339]  Klyne, G., Ed. and C. Newman, "Date and Time on the
3246	              Internet: Timestamps", RFC 3339, July 2002.

3248	   [RFC4645]  Ewell, D., "Initial Language Subtag Registry", RFC 4645,
3249	              September 2006.

3251	   [RFC4647]  Phillips, A. and M. Davis, "Matching of Language Tags",
3252	              BCP 47, RFC 4647, September 2006.

3254	   [RFC5234]  Crocker, D. and P. Overell, "Augmented BNF for Syntax
3255	              Specifications: ABNF", STD 68, RFC 5234, January 2008.

3257	   [UAX14]    Freitag, A., "Unicode Standard Annex #14: Line Breaking
3258	              Properties", August 2006,
3259	              <http://www.unicode.org/reports/tr14/>.

3261	   [UN_M.49]  Statistics Division, United Nations, "Standard Country or
3262	              Area Codes for Statistical Use", UN Standard Country or
3263	              Area Codes for Statistical Use, Revision 4 (United Nations
3264	              publication, Sales No. 98.XVII.9), June 1999.

3266	9.2.  Informative References

3268	   [RFC1766]  Alvestrand, H., "Tags for the Identification of
3269	              Languages", RFC 1766, March 1995.

3271	   [RFC2047]  Moore, K., "MIME (Multipurpose Internet Mail Extensions)
3272	              Part Three: Message Header Extensions for Non-ASCII Text",
3273	              RFC 2047, November 1996.

3275	   [RFC2231]  Freed, N. and K. Moore, "MIME Parameter Value and Encoded
3276	              Word Extensions:
3277	              Character Sets, Languages, and Continuations", RFC 2231,
3278	              November 1997.

3280	   [RFC2781]  Hoffman, P. and F. Yergeau, "UTF-16, an encoding of ISO
3281	              10646", RFC 2781, February 2000.

3283	   [RFC3066]  Alvestrand, H., "Tags for the Identification of
3284	              Languages", RFC 3066, January 2001.

3286	   [RFC3552]  Rescorla, E. and B. Korver, "Guidelines for Writing RFC
3287	              Text on Security Considerations", BCP 72, RFC 3552,
3288	              July 2003.

3290	   [RFC3629]  Yergeau, F., "UTF-8, a transformation format of ISO
3291	              10646", STD 63, RFC 3629, November 2003.

3293	   [RFC4646]  Phillips, A. and M. Davis, "Tags for Identifying
3294	              Languages", BCP 47, RFC 4646, September 2006.

3296	   [UTS35]    Davis, M., "Unicode Technical Standard #35: Locale Data
3297	              Markup Language (LDML)", December 2007,
3298	              <http://www.unicode.org/reports/tr35/>.

3300	   [Unicode]  Unicode Consortium, "The Unicode Consortium. The Unicode
3301	              Standard, Version 5.0, (Boston, MA, Addison-Wesley, 2003.
3302	              ISBN 0-321-49081-0)", January 2007.

3304	   [iso639.prin]
3305	              ISO 639 Joint Advisory Committee, "ISO 639 Joint Advisory
3306	              Committee:  Working principles for ISO 639 maintenance",
3307	              March 2000,
3308	              <http://www.loc.gov/standards/iso639-2/
3309	              iso639jac_n3r.html>.

3311	   [record-jar]
3312	              Raymond, E., "The Art of Unix Programming", 2003,
3313	              <urn:isbn:0-13-142901-9>.

3315	   [registry-update]
3316	              Ewell, D., Ed., "Update to the Language Subtag Registry",
3317	              September 2006, <http://www.ietf.org/internet-drafts/
3318	              draft-ietf-ltru-initial-registry-00.txt>.

3320	Appendix A.  Acknowledgements

3322	   Any list of contributors is bound to be incomplete; please regard the
3323	   following as only a selection from the group of people who have
3324	   contributed to make this document what it is today.

3326	   The contributors to RFC 4646, RFC 4647, RFC 3066, and RFC 1766, the
3327	   precursors of this document, made enormous contributions directly or
3328	   indirectly to this document and are generally responsible for the
3329	   success of language tags.

3331	   The following people contributed to this document:

3333	   Stephane Bortzmeyer, Karen Broome, Peter Constable, John Cowan,
3334	   Martin Duerst, Frank Ellermann, Doug Ewell, Deborah Garside, Marion
3335	   Gunn, Kent Karlsson, Chris Newman, Randy Presuhn, Stephen Silver, and
3336	   many, many others.

3338	   Very special thanks must go to Harald Tveit Alvestrand, who
3339	   originated RFCs 1766 and 3066, and without whom this document would
3340	   not have been possible.

3342	   Special thanks go to Michael Everson, who served as the Language Tag
3343	   Reviewer for almost the entire RFC 1766/RFC 3066 period, as well as
3344	   the Language Subtag Reviewer since the adoption of RFC 4646.

3346	   Special thanks also to Doug Ewell, for his production of the first
3347	   complete subtag registry, his work to support and maintain new
3348	   registrations, and his careful editorship of both RFC 4645 and
3349	   [registry-update].

3351	Appendix B.  Examples of Language Tags (Informative)

3353	   Simple language subtag:

3355	      de (German)

3357	      fr (French)

3359	      ja (Japanese)

3361	      i-enochian (example of a grandfathered tag)

3363	   Language subtag plus Script subtag:

3365	      zh-Hant (Chinese written using the Traditional Chinese script)

3367	      zh-Hans (Chinese written using the Simplified Chinese script)

3369	      sr-Cyrl (Serbian written using the Cyrillic script)

3371	      sr-Latn (Serbian written using the Latin script)

3373	   Language-Script-Region:

3375	      zh-Hans-CN (Chinese written using the Simplified script as used in
3376	      mainland China)

3378	      sr-Latn-RS (Serbian written using the Latin script as used in
3379	      Serbia)

3381	   Language-Variant:

3383	      sl-rozaj (Resian dialect of Slovenian)

3385	      sl-nedis (Nadiza dialect of Slovenian)

3387	   Language-Region-Variant:

3389	      de-CH-1901 (German as used in Switzerland using the 1901 variant
3390	      [orthography])

3392	      sl-IT-nedis (Slovenian as used in Italy, Nadiza dialect)

3394	   Language-Script-Region-Variant:

3396	      hy-Latn-IT-arevela (Eastern Armenian written in Latin script, as
3397	      used in Italy)

3399	   Language-Region:

3401	      de-DE (German for Germany)

3403	      en-US (English as used in the United States)

3405	      es-419 (Spanish appropriate for the Latin America and Caribbean
3406	      region using the UN region code)

3408	   Private use subtags:

3410	      de-CH-x-phonebk

3412	      az-Arab-x-AZE-derbend

3414	   Private use registry values:

3416	      x-whatever (private use using the singleton 'x')

3418	      qaa-Qaaa-QM-x-southern (all private tags)

3420	      de-Qaaa (German, with a private script)

3422	      sr-Latn-QM (Serbian, Latin-script, private region)

3424	      sr-Qaaa-RS (Serbian, private script, for Serbia)

3426	   Tags that use extensions (examples ONLY: extensions MUST be defined
3427	   by revision or update to this document or by RFC):

3429	      en-US-u-islamCal

3431	      zh-CN-a-myExt-x-private

3433	      en-a-myExt-b-another

3435	   Some Invalid Tags:

3437	      de-419-DE (two region tags)

3439	      a-DE (use of a single-character subtag in primary position; note
3440	      that there are a few grandfathered tags that start with "i-" that
3441	      are valid)
3442	      ar-a-aaa-b-bbb-a-ccc (two extensions with same single-letter
3443	      prefix)
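   The last invalid example can be detected with a single pass over the
   subtags.  In this Python sketch (has_repeated_singleton is a
   hypothetical name), scanning stops at 'x', since the private use
   subtags that follow it may be single characters and may repeat.

```python
def has_repeated_singleton(tag: str) -> bool:
    """True if an extension singleton appears twice, as in "ar-a-aaa-b-bbb-a-ccc"."""
    seen = set()
    for sub in tag.lower().split("-")[1:]:
        if sub == "x":
            return False  # subtags after 'x' are private use and may repeat
        if len(sub) == 1:
            if sub in seen:
                return True
            seen.add(sub)
    return False


print(has_repeated_singleton("ar-a-aaa-b-bbb-a-ccc"))  # True
```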

3445	Appendix C.  Examples of Registration Forms
3446	   LANGUAGE SUBTAG REGISTRATION FORM
3447	   1. Name of requester: Han Steenwijk
3448	   2. E-mail address of requester: han.steenwijk @ unipd.it
3449	   3. Record Requested:

3451	   Type:        variant
3452	   Subtag:      biske
3453	   Description: The San Giorgio dialect of Resian
3454	   Description: The Bila dialect of Resian
3455	   Prefix:      sl-rozaj
3456	   Comments:    The dialect of San Giorgio/Bila is one of the
3457	      four major local dialects of Resian

3459	   4. Intended meaning of the subtag: The local variety of Resian as
3460	   spoken in San Giorgio/Bila

3462	   5. Reference to published description of the language (book or
3463	   article):
3464	    -- Jan I.N. Baudouin de Courtenay - Opyt fonetiki rez'janskich
3465	   govorov, Varsava - Peterburg: Vende - Kozancikov, 1875.

3467	LANGUAGE SUBTAG REGISTRATION FORM
3468	1. Name of requester: Jaska Zedlik
3469	2. E-mail address of requester: jz53 @ zedlik.com
3470	3. Record Requested:

3472	Type:   variant
3473	Subtag: tarask
3474	Description: Belarusian in Taraskievica orthography
3475	Prefix: be
3476	Comments: The subtag represents Branislau Taraskievic's Belarusian
3477	  orthography as published in "Bielaruski klasycny pravapis" by Juras
3478	  Buslakou, Vincuk Viacorka, Zmicier Sanko, and Zmicier Sauka
3479	  (Vilnia-Miensk 2005).

3481	4. Intended meaning of the subtag:

3483	The subtag is intended to represent the Belarusian orthography as
3484	published in "Bielaruski klasycny pravapis" by Juras Buslakou, Vincuk
3485	Viacorka, Zmicier Sanko, and Zmicier Sauka (Vilnia-Miensk 2005).

3487	5. Reference to published description of the language (book or article):

3489	Taraskievic, Branislau. Bielaruskaja gramatyka dla skol. Vilnia: Vyd.
3490	"Bielaruskaha kamitetu", 1929, 5th edition.

3492	Buslakou, Juras; Viacorka, Vincuk; Sanko, Zmicier; Sauka, Zmicier.
3493	Bielaruski klasycny pravapis. Vilnia-Miensk, 2005.

3495	6. Any other relevant information:

3497	Belarusian in Taraskievica orthography became widely used, especially in
3498	Belarusian-speaking Internet segment, but besides this some books and
3499	newspapers are also printed using this orthography of Belarusian.
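   The record portion of each form uses the [record-jar] style of
   "Field-Name: body" lines, where a line beginning with whitespace
   continues the previous field.  A minimal Python sketch (parse_record
   is hypothetical) that reads one such record into a dictionary of
   lists, so that repeated fields such as Description keep every value
   in order; it assumes the form's own leading indentation has already
   been removed and handles a single record only.

```python
def parse_record(text: str) -> dict:
    """Parse one record of 'Field-Name: body' lines into {name: [bodies]}."""
    fields = {}
    last = None
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        if line[0] in " \t" and last is not None:
            fields[last][-1] += " " + line.strip()  # unfold a continuation
            continue
        name, _, body = line.partition(":")
        last = name.strip()
        fields.setdefault(last, []).append(body.strip())
    return fields


record = parse_record(
    "Type: variant\n"
    "Subtag: biske\n"
    "Description: The San Giorgio dialect of Resian\n"
    "Description: The Bila dialect of Resian\n"
    "Prefix: sl-rozaj\n"
    "Comments: The dialect of San Giorgio/Bila is one of the\n"
    "   four major local dialects of Resian\n"
)
print(record["Description"])
```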

3501	Authors' Addresses

3503	   Addison Phillips (editor)
3504	   Lab126

3506	   Email: addison@inter-locale.com
3507	   URI:   http://www.inter-locale.com

3509	   Mark Davis (editor)
3510	   Google

3512	   Email: mark.davis@google.com

3514	Full Copyright Statement

3516	   Copyright (C) The IETF Trust (2008).

3518	   This document is subject to the rights, licenses and restrictions
3519	   contained in BCP 78, and except as set forth therein, the authors
3520	   retain all their rights.

3522	   This document and the information contained herein are provided on an
3523	   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
3524	   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
3525	   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
3526	   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
3527	   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
3528	   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

3530	Intellectual Property

3532	   The IETF takes no position regarding the validity or scope of any
3533	   Intellectual Property Rights or other rights that might be claimed to
3534	   pertain to the implementation or use of the technology described in
3535	   this document or the extent to which any license under such rights
3536	   might or might not be available; nor does it represent that it has
3537	   made any independent effort to identify any such rights.  Information
3538	   on the procedures with respect to rights in RFC documents can be
3539	   found in BCP 78 and BCP 79.

3541	   Copies of IPR disclosures made to the IETF Secretariat and any
3542	   assurances of licenses to be made available, or the result of an
3543	   attempt made to obtain a general license or permission for the use of
3544	   such proprietary rights by implementers or users of this
3545	   specification can be obtained from the IETF on-line IPR repository at
3546	   http://www.ietf.org/ipr.

3548	   The IETF invites any interested party to bring to its attention any
3549	   copyrights, patents or patent applications, or other proprietary
3550	   rights that may cover technology that may be required to implement
3551	   this standard.  Please address the information to the IETF at
3552	   ietf-ipr@ietf.org.