idnits 2.17.1 

draft-ietf-ltru-4646bis-06.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** It looks like you're using RFC 3978 boilerplate.  You should update this
     to the boilerplate described in the IETF Trust License Policy document
     (see https://trustee.ietf.org/license-info), which is required now.

  -- Found old boilerplate from RFC 3978, Section 5.1 on line 17.

  -- Found old boilerplate from RFC 3978, Section 5.5, updated by RFC 4748 on
     line 2973.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 2984.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 2991.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 2997.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  -- The draft header indicates that this document obsoletes RFC4646, but the
     abstract doesn't seem to mention this, which it should.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust Copyright Line does not match the
     current year

  == The document seems to use 'NOT RECOMMENDED' as an RFC 2119 keyword, but
     does not include the phrase in its RFC 2119 key words list.

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (May 10, 2007) is 6196 days in the past.  Is this
     intentional?


  Checking references for intended status: Best Current Practice
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO10646'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO15924'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO3166-1'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO639-1'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO639-2'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO639-3'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO646'

  ** Obsolete normative reference: RFC 2028 (Obsoleted by RFC 9281)

  ** Obsolete normative reference: RFC 2434 (Obsoleted by RFC 5226)

  ** Downref: Normative reference to an Informational RFC: RFC 2860

  ** Obsolete normative reference: RFC 4234 (Obsoleted by RFC 5234)

  ** Downref: Normative reference to an Informational RFC: RFC 4645

  -- Obsolete informational reference (is this intentional?): RFC 1766
     (Obsoleted by RFC 3066, RFC 3282)

  -- Obsolete informational reference (is this intentional?): RFC 3066
     (Obsoleted by RFC 4646, RFC 4647)

  -- Obsolete informational reference (is this intentional?): RFC 4646
     (Obsoleted by RFC 5646)


     Summary: 6 errors (**), 0 flaws (~~), 2 warnings (==), 18 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                                   A. Phillips, Ed.
3	Internet-Draft                                               Yahoo! Inc.
4	Obsoletes: 4646 (if approved)                              M. Davis, Ed.
5	Intended status: Best Current                                     Google
6	Practice                                                    May 10, 2007
7	Expires: November 11, 2007

9	                     Tags for Identifying Languages
10	                       draft-ietf-ltru-4646bis-06

12	Status of this Memo

14	   By submitting this Internet-Draft, each author represents that any
15	   applicable patent or other IPR claims of which he or she is aware
16	   have been or will be disclosed, and any of which he or she becomes
17	   aware will be disclosed, in accordance with Section 6 of BCP 79.

19	   Internet-Drafts are working documents of the Internet Engineering
20	   Task Force (IETF), its areas, and its working groups.  Note that
21	   other groups may also distribute working documents as Internet-
22	   Drafts.

24	   Internet-Drafts are draft documents valid for a maximum of six months
25	   and may be updated, replaced, or obsoleted by other documents at any
26	   time.  It is inappropriate to use Internet-Drafts as reference
27	   material or to cite them other than as "work in progress."

29	   The list of current Internet-Drafts can be accessed at
30	   http://www.ietf.org/ietf/1id-abstracts.txt.

32	   The list of Internet-Draft Shadow Directories can be accessed at
33	   http://www.ietf.org/shadow.html.

35	   This Internet-Draft will expire on November 11, 2007.

37	Copyright Notice

39	   Copyright (C) The IETF Trust (2007).

41	Abstract

43	   This document describes the structure, content, construction, and
44	   semantics of language tags for use in cases where it is desirable to
45	   indicate the language used in an information object.  It also
46	   describes how to register values for use in language tags and the
47	   creation of user-defined extensions for private interchange.

49	Table of Contents

51	   1.  Introduction
52	   2.  The Language Tag
53	     2.1.  Syntax
54	     2.2.  Language Subtag Sources and Interpretation
55	       2.2.1.  Primary Language Subtag
56	       2.2.2.  Extended Language Subtags
57	       2.2.3.  Script Subtag
58	       2.2.4.  Region Subtag
59	       2.2.5.  Variant Subtags
60	       2.2.6.  Extension Subtags
61	       2.2.7.  Private Use Subtags
62	       2.2.8.  Grandfathered Registrations
63	       2.2.9.  Classes of Conformance
64	   3.  Registry Format and Maintenance
65	     3.1.  Format of the IANA Language Subtag Registry
66	       3.1.1.  File Format
67	       3.1.2.  Record Definitions
68	       3.1.3.  Subtag and Tag Fields
69	       3.1.4.  Description Field
70	       3.1.5.  Deprecated Field
71	       3.1.6.  Preferred-Value Field
72	       3.1.7.  Prefix Field
73	       3.1.8.  Comments Field
74	       3.1.9.  Suppress-Script Field
75	     3.2.  Language Subtag Reviewer
76	     3.3.  Maintenance of the Registry
77	     3.4.  Stability of IANA Registry Entries
78	     3.5.  Registration Procedure for Subtags
79	     3.6.  Possibilities for Registration
80	     3.7.  Extensions and Extensions Registry
81	     3.8.  Update of the Language Subtag Registry
82	   4.  Formation and Processing of Language Tags
83	     4.1.  Choice of Language Tag
84	     4.2.  Meaning of the Language Tag
85	     4.3.  Length Considerations
86	       4.3.1.  Working with Limited Buffer Sizes
87	       4.3.2.  Truncation of Language Tags

89	     4.4.  Canonicalization of Language Tags
90	     4.5.  Considerations for Private Use Subtags
91	   5.  IANA Considerations
92	     5.1.  Language Subtag Registry
93	     5.2.  Extensions Registry
94	   6.  Security Considerations
95	   7.  Character Set Considerations
96	   8.  Changes from RFC 4646
97	   9.  References
98	     9.1.  Normative References
99	     9.2.  Informative References
100	   Appendix A.  Acknowledgements
101	   Appendix B.  Examples of Language Tags (Informative)
102	   Authors' Addresses
103	   Intellectual Property and Copyright Statements

105	1.  Introduction

107	   Human beings on our planet have, past and present, used a number of
108	   languages.  There are many reasons why one would want to identify the
109	   language used when presenting or requesting information.

111	   A user's language preferences often need to be identified so that
112	   appropriate processing can be applied.  For example, the user's
113	   language preferences in a Web browser can be used to select Web pages
114	   appropriately.  Language preferences can also be used to select among
115	   tools (such as dictionaries) to assist in the processing or
116	   understanding of content in different languages.

118	   In addition, knowledge about the particular language used by some
119	   piece of information content might be useful or even required by some
120	   types of processing; for example, spell-checking, computer-
121	   synthesized speech, Braille transcription, or high-quality print
122	   renderings.

124	   One means of indicating the language used is by labeling the
125	   information content with an identifier or "tag".  These tags can be
126	   used to specify user preferences when selecting information content,
127	   or for labeling additional attributes of content and associated
128	   resources.

130	   Tags can also be used to indicate additional language attributes of
131	   content.  For example, indicating specific information about the
132	   dialect, writing system, or orthography used in a document or
133	   resource may enable the user to obtain information in a form that
134	   they can understand, or it can be important in processing or
135	   rendering the given content into an appropriate form or style.

137	   This document specifies a particular identifier mechanism (the
138	   language tag) and a registration function for values to be used to
139	   form tags.  It also defines a mechanism for private use values and
140	   future extension.

142	   This document replaces [RFC4646], which replaced [RFC3066] and its
143	   predecessor [RFC1766].  For a list of changes in this document, see
144	   Section 8.

146	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
147	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
148	   document are to be interpreted as described in [RFC2119].

150	2.  The Language Tag

152	   Language tags are used to help identify languages, whether spoken,
153	   written, signed, or otherwise signaled, for the purpose of
154	   communication.  This includes constructed and artificial languages,
155	   but excludes languages not intended primarily for human
156	   communication, such as programming languages.

158	2.1.  Syntax

160	   The language tag is composed of one or more parts, known as
161	   "subtags".  Each subtag consists of a sequence of alphanumeric
162	   characters.  Subtags are distinguished and separated from one another
163	   by a hyphen ("-", ABNF [RFC4234] %x2D).  A language tag consists of a
164	   "primary language" subtag and a (possibly empty) series of subsequent
165	   subtags, each of which refines or narrows the range of languages
166	   identified by the overall tag.

168	   Usually, each type of subtag is distinguished by length, position in
169	   the tag, and content: subtags can be recognized solely by these
170	   features.  The only exception to this is a fixed list of
171	   grandfathered tags registered under RFC 3066 [RFC3066].  This makes
172	   it possible to construct a parser that can extract and assign some
173	   semantic information to the subtags, even if the specific subtag
174	   values are not recognized.  Thus, a parser need not have an up-to-
175	   date copy (or any copy at all) of the subtag registry to perform most
176	   searching and matching operations.

178	   The syntax of the language tag in ABNF [RFC4234] is:

180	   Language-Tag  = langtag
181	                 / privateuse             ; private use tag
182	                 / irregular              ; tags grandfathered by rule

184	   langtag       = (language
185	                    ["-" script]
186	                    ["-" region]
187	                    *("-" variant)
188	                    *("-" extension)
189	                    ["-" privateuse])

191	   language      = (2*3ALPHA [ extlang ]) ; shortest ISO 639 code
192	                 / 4ALPHA                 ; reserved for future use
193	                 / 5*8ALPHA               ; registered language subtag

195	   extlang       = *3("-" 3ALPHA)         ; specific ISO 639-3 codes

197	   script        = 4ALPHA                 ; ISO 15924 code

199	   region        = 2ALPHA                 ; ISO 3166 code
200	                 / 3DIGIT                 ; UN M.49 code

202	   variant       = 5*8alphanum            ; registered variants
203	                 / (DIGIT 3alphanum)

205	   extension     = singleton 1*("-" (2*8alphanum))

207	   singleton     = %x41-57 / %x59-5A / %x61-77 / %x79-7A / DIGIT
208	                 ; "a"-"w" / "y"-"z" / "A"-"W" / "Y"-"Z" / "0"-"9"
209	                 ; Single alphanumerics
210	                 ; "x" is reserved for private use

212	   privateuse    = "x" 1*("-" (1*8alphanum))

214	   irregular     = "en-GB-oed" / "i-ami" / "i-bnn" / "i-default"
215	                 / "i-enochian" / "i-hak" / "i-klingon" / "i-lux"
216	                 / "i-mingo" / "i-navajo" / "i-pwn" / "i-tao"
217	                 / "i-tay" / "i-tsu" / "sgn-BE-fr" / "sgn-BE-nl"
218	                 / "sgn-CH-de"

220	   alphanum      = (ALPHA / DIGIT)       ; letters and numbers

222	                        Figure 1: Language Tag ABNF

224	   All subtags have a maximum length of eight characters and whitespace
225	   is not permitted in a language tag.  There is a subtlety in the ABNF
226	   production 'variant': variants starting with a digit MAY be four
227	   characters long, while those starting with a letter MUST be at least
228	   five characters long.  For examples of language tags, see Appendix B.
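
As an informal illustration (not part of the specification), the ABNF in Figure 1 can be transcribed into a regular expression. The sketch below is one possible transcription in Python; the names LANGTAG, PRIVATEUSE, IRREGULAR, and is_well_formed are hypothetical and chosen only for this example. It checks syntax only and does not consult the IANA registry.

```python
import re

# ASCII-only, case-insensitive matching; VERBOSE lets the pattern mirror the ABNF layout.
_FLAGS = re.VERBOSE | re.IGNORECASE | re.ASCII

# Transcription of the 'langtag' production from Figure 1.
LANGTAG = re.compile(r"""
    (?P<language>
        [a-z]{2,3} (?P<extlang> (?:-[a-z]{3}){0,3} )
      | [a-z]{4}
      | [a-z]{5,8}
    )
    (?: - (?P<script> [a-z]{4} ) )?
    (?: - (?P<region> [a-z]{2} | [0-9]{3} ) )?
    (?P<variants>   (?: - (?: [a-z0-9]{5,8} | [0-9][a-z0-9]{3} ) )* )
    (?P<extensions> (?: - [0-9a-wy-z] (?: - [a-z0-9]{2,8} )+ )* )
    (?: - (?P<privateuse> x (?: - [a-z0-9]{1,8} )+ ) )?
""", _FLAGS)

# Transcription of the 'privateuse' production used as a whole tag.
PRIVATEUSE = re.compile(r"x(?:-[a-z0-9]{1,8})+", re.IGNORECASE | re.ASCII)

# The fixed 'irregular' list from Figure 1, stored in lowercase.
IRREGULAR = {
    "en-gb-oed", "i-ami", "i-bnn", "i-default", "i-enochian", "i-hak",
    "i-klingon", "i-lux", "i-mingo", "i-navajo", "i-pwn", "i-tao",
    "i-tay", "i-tsu", "sgn-be-fr", "sgn-be-nl", "sgn-ch-de",
}


def is_well_formed(tag: str) -> bool:
    """Return True if 'tag' matches the Language-Tag production (syntax only)."""
    if LANGTAG.fullmatch(tag) or PRIVATEUSE.fullmatch(tag):
        return True
    return tag.isascii() and tag.lower() in IRREGULAR
```

For instance, is_well_formed("mn-Cyrl-MN") and is_well_formed("I-AMI") both hold, while a tag such as "en-x" is rejected because the 'privateuse' production requires at least one subtag after "x".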

230	   Note Well: the ABNF syntax does not distinguish between upper and
231	   lowercase.  The appearance of upper and lowercase letters in the
232	   various ABNF productions above does not affect how implementations
233	   interpret tags.  That is, the tag "I-AMI" matches the item "i-ami" in
234	   the 'irregular' production.  At all times, the tags and their
235	   subtags, including private use and extensions, are to be treated as
236	   case insensitive: there exist conventions for the capitalization of
237	   some of the subtags, but these MUST NOT be taken to carry meaning.

239	   For example:

241	   o  [ISO639-1] recommends that language codes be written in lowercase
242	      ('mn' Mongolian).

244	   o  [ISO3166-1] recommends that country codes be capitalized ('MN'
245	      Mongolia).

247	   o  [ISO15924] recommends that script codes use lowercase with the
248	      initial letter capitalized ('Cyrl' Cyrillic).

250	   However, in the tags defined by this document, the uppercase US-ASCII
251	   letters in the range 'A' through 'Z' are considered equivalent and
252	   mapped directly to their US-ASCII lowercase equivalents in the range
253	   'a' through 'z'.  Thus, the tag "mn-Cyrl-MN" is not distinct from
254	   "MN-cYRL-mn" or "mN-cYrL-Mn" (or any other combination), and each of
255	   these variations conveys the same meaning: Mongolian written in the
256	   Cyrillic script as used in Mongolia.
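
The comparison rule above can be sketched as follows. This is an illustrative sketch only; the function names ascii_lower and same_tag are hypothetical. It maps only 'A' through 'Z' to 'a' through 'z' and leaves every other character untouched, so that no broader case-folding rules come into play.

```python
def ascii_lower(text: str) -> str:
    """Map US-ASCII 'A'-'Z' to 'a'-'z'; leave all other characters unchanged."""
    return "".join(
        chr(ord(ch) + 32) if "A" <= ch <= "Z" else ch
        for ch in text
    )


def same_tag(first: str, second: str) -> bool:
    """Compare two language tags without regard to US-ASCII case."""
    return ascii_lower(first) == ascii_lower(second)
```

Under this sketch, same_tag("mn-Cyrl-MN", "MN-cYRL-mn") and same_tag("mn-Cyrl-MN", "mN-cYrL-Mn") both hold.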

258	   Although case distinctions do not carry meaning in language tags,
259	   consistent formatting and presentation of the tags will aid users.
260	   The format of the tags and subtags in the registry is RECOMMENDED.
261	   In this format, all non-initial two-letter subtags are uppercase, all
262	   non-initial four-letter subtags are titlecase, and all other subtags
263	   are lowercase.
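
A literal reading of this formatting convention can be sketched as below. The function name format_tag is hypothetical, and the sketch assumes its input is already a well-formed tag made of US-ASCII characters. It applies the three rules exactly as stated, without any knowledge of subtag types.

```python
def format_tag(tag: str) -> str:
    """Apply the recommended capitalization to a well-formed language tag."""
    subtags = tag.split("-")
    formatted = [subtags[0].lower()]          # the initial subtag is always lowercase
    for subtag in subtags[1:]:
        if len(subtag) == 2 and subtag.isalpha():
            formatted.append(subtag.upper())  # non-initial two-letter subtag
        elif len(subtag) == 4 and subtag.isalpha():
            # non-initial four-letter subtag: titlecase
            formatted.append(subtag[0].upper() + subtag[1:].lower())
        else:
            formatted.append(subtag.lower())  # everything else
    return "-".join(formatted)
```

For example, format_tag("MN-cYRL-mn") yields "mn-Cyrl-MN".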

265	   Note that although [RFC4234] refers to octets, the language tags
266	   described in this document are sequences of characters from the US-
267	   ASCII [ISO646] repertoire.  Language tags MAY be used in documents
268	   and applications that use other encodings, so long as these encompass
269	   the US-ASCII repertoire.  An example of this would be an XML document
270	   that uses the UTF-16LE [RFC2781] encoding of [Unicode].

272	2.2.  Language Subtag Sources and Interpretation

274	   The namespace of language tags and their subtags is administered by
275	   the Internet Assigned Numbers Authority (IANA) [RFC2860] according to
276	   the rules in Section 5 of this document.  The Language Subtag
277	   Registry maintained by IANA is the source for valid subtags: other
278	   standards referenced in this section provide the source material for
279	   that registry.

281	   Terminology used in this document:

283	   o  Tag or tags refers to a complete language tag, such as
284	      "sr-Latn-RS" or "az-Arab-IR".  Examples of tags in this document
285	      are enclosed in double-quotes ("en-US").

287	   o  Subtag refers to a specific section of a tag, delimited by hyphen,
288	      such as the subtag 'Hant' in "zh-Hant-CN".  Examples of subtags in
289	      this document are enclosed in single quotes ('Hant').

291	   o  Code or codes refers to values defined in external standards (and
292	      which are used as subtags in this document).  For example, 'Hant'
293	      is an [ISO15924] script code that was used to define the 'Hant'
294	      script subtag for use in a language tag.  Examples of codes in
295	      this document are enclosed in single quotes ('en', 'Hant').

297	   The definitions in this section apply to the various subtags within
298	   the language tags defined by this document, excepting those
299	   "grandfathered" tags defined in Section 2.2.8.

301	   Language tags are designed so that each subtag type has unique length
302	   and content restrictions.  These make identification of the subtag's
303	   type possible, even if the content of the subtag itself is
304	   unrecognized.  This allows tags to be parsed and processed without
305	   reference to the latest version of the underlying standards or the
306	   IANA registry and makes the associated exception handling when
307	   parsing tags simpler.
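
To illustrate how a subtag's type can be inferred from its length, content, and position alone, the following sketch labels each subtag of a tag without consulting any registry. It is an assumption-laden illustration: the function name classify and the label strings are hypothetical, grandfathered tags are not handled, and the input is assumed to consist of US-ASCII letters, digits, and hyphens.

```python
def classify(tag: str) -> list:
    """Label each subtag by type, using only length, content, and position."""
    subtags = tag.lower().split("-")
    labeled = [(subtags[0], "language")]
    stage = 1        # earliest type still allowed: 1 extlang, 2 script, 3 region, 4 variant
    extlangs = 0
    tail = None      # "extension" or "privateuse" once a singleton has been seen
    for s in subtags[1:]:
        if tail == "privateuse":
            kind = tail                      # private use runs to the end of the tag
        elif len(s) == 1:
            tail = kind = "privateuse" if s == "x" else "extension"
        elif tail == "extension":
            kind = tail
        elif (stage <= 1 and len(s) == 3 and s.isalpha()
              and extlangs < 3 and len(subtags[0]) <= 3):
            kind, extlangs = "extlang", extlangs + 1
        elif stage <= 2 and len(s) == 4 and s.isalpha():
            kind, stage = "script", 3
        elif stage <= 3 and ((len(s) == 2 and s.isalpha())
                             or (len(s) == 3 and s.isdigit())):
            kind, stage = "region", 4
        elif s.isalnum() and (5 <= len(s) <= 8
                              or (len(s) == 4 and s[0].isdigit())):
            kind, stage = "variant", 4
        else:
            kind = "unknown"                 # does not fit any position-valid type
        labeled.append((s, kind))
    return labeled
```

Note that classify("xx-Yyyy-ZZ") still reports a language, a script, and a region, even though none of those values needs to be recognized.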

309	   Subtags in the IANA registry that do not come from an underlying
310	   standard can only appear in specific positions in a tag.
311	   Specifically, they can only occur as primary language subtags or as
312	   variant subtags.

314	   Note that sequences of private use and extension subtags MUST occur
315	   at the end of the sequence of subtags and MUST NOT be interspersed
316	   with subtags defined elsewhere in this document.

318	   Single-letter and single-digit subtags are reserved for current or
319	   future use.  These include the following current uses:

321	   o  The single-letter subtag 'x' is reserved to introduce a sequence
322	      of private use subtags.  The interpretation of any private use
323	      subtags is defined solely by private agreement and is not defined
324	      by the rules in this section or in any standard or registry
325	      defined in this document.

327	   o  All other single-letter subtags are reserved to introduce
328	      standardized extension subtag sequences as described in
329	      Section 3.7.

331	   The single-letter subtag 'i' is used by some grandfathered tags, such
332	   as "i-default", where it always appears in the first position and
333	   cannot be confused with an extension.

335	2.2.1.  Primary Language Subtag

337	   The primary language subtag is the first subtag in a language tag
338	   (with the exception of private use and certain grandfathered tags)
339	   and cannot be omitted.  The following rules apply to the primary
340	   language subtag:

342	   1.  All two-character primary language subtags were defined in the
343	       IANA registry according to the assignments found in the standard
344	       ISO 639 Part 1, "ISO 639-1:2002, Codes for the representation of
345	       names of languages -- Part 1: Alpha-2 code" [ISO639-1], or using
346	       assignments subsequently made by the ISO 639-1 registration
347	       authority (RA) or governing standardization bodies.

349	   2.  All three-character primary language subtags were defined in the
350	       IANA registry according to the assignments found in either ISO
351	       639 Part 2, "ISO 639-2:1998 - Codes for the representation of
352	       names of languages -- Part 2: Alpha-3 code - edition 1"
353	       [ISO639-2], ISO 639 Part 3, "Codes for the representation of
354	       names of languages -- Part 3: Alpha-3 code for comprehensive
355	       coverage of languages" [ISO639-3], or assignments subsequently
356	       made by the relevant ISO 639 registration authorities or
357	       governing standardization bodies.

359	   3.  The subtags in the range 'qaa' through 'qtz' are reserved for
360	       private use in language tags.  These subtags correspond to codes
361	       reserved by ISO 639-2 for private use.  These codes MAY be used
362	       for non-registered primary language subtags (instead of using
363	       private use subtags following 'x-').  Please refer to Section 4.5
364	       for more information on private use subtags.

366	   4.  All four-character language subtags are reserved for possible
367	       future standardization.

369	   5.  All language subtags of 5 to 8 characters in length in the IANA
370	       registry were defined via the registration process in Section 3.5
371	       and MAY be used to form the primary language subtag.  At the time
372	       this document was created, there were no examples of this kind of
373	       subtag and future registrations of this type will be discouraged:
374	       primary languages are strongly RECOMMENDED for registration with
375	       ISO 639, and proposals rejected by ISO 639/RA-JAC will be closely
376	       scrutinized before they are registered with IANA.

378	   6.  The single-character subtag 'x' as the primary subtag indicates
379	       that the language tag consists solely of subtags whose meaning is
380	       defined by private agreement.  For example, in the tag "x-fr-CH",
381	       the subtags 'fr' and 'CH' SHOULD NOT be taken to represent the
382	       French language or the country of Switzerland (or any other value
383	       in the IANA registry) unless there is a private agreement in
384	       place to do so.  See Section 4.5.

386	   7.  The single-character subtag 'i' is used by some grandfathered
387	       tags (see Section 2.2.8) such as "i-klingon" and "i-bnn".  (Other
388	       grandfathered tags have a primary language subtag in their first
389	       position.)

391	   8.  Other values MUST NOT be assigned to the primary subtag except by
392	       revision or update of this document.
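
The rules above can be summarized informally in code. The sketch below reports which rule governs a candidate primary subtag, based on its form alone; it does not check whether the value is actually present in the IANA registry. The function name primary_subtag_kind and the returned label strings are hypothetical.

```python
def primary_subtag_kind(subtag: str) -> str:
    """Say which of the rules above applies to a candidate primary subtag."""
    if not (subtag.isascii() and subtag.isalpha()) or len(subtag) > 8:
        return "invalid"
    s = subtag.lower()
    if s == "x":
        return "private use tag"                 # rule 6
    if s == "i":
        return "grandfathered prefix"            # rule 7
    if len(s) == 1:
        return "invalid"                         # rule 8: no other values
    if len(s) == 2:
        return "ISO 639-1"                       # rule 1
    if len(s) == 3:
        if "qaa" <= s <= "qtz":
            return "private use language code"   # rule 3
        return "ISO 639-2 or ISO 639-3"          # rule 2
    if len(s) == 4:
        return "reserved"                        # rule 4
    return "registered"                          # rule 5 (5 to 8 characters)
```

For example, 'haw' falls under rule 2, while 'qaa' and 'QTZ' fall in the private use range of rule 3.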

394	   Note: For languages that have both an ISO 639-1 two-character code
395	   and a three-character code assigned by either ISO 639-2 or ISO 639-3,
396	   only the ISO 639-1 two-character code is defined in the IANA
397	   registry.

399	   Note: For languages that have no ISO 639-1 two-character code and for
400	   which the ISO 639-2/T (Terminology) code and the ISO 639-2/B
401	   (Bibliographic) codes differ, only the Terminology code is defined in
402	   the IANA registry.  At the time this document was created, all
403	   languages that had both kinds of three-character code were also
404	   assigned a two-character code; it is expected that future assignments
405	   of this nature will not occur.

407	   Note: To avoid problems with versioning and subtag choice as
408	   experienced during the transition between RFC 1766 and RFC 3066, as
409	   well as the canonical nature of subtags defined by this document, the
410	   ISO 639 Registration Authority Joint Advisory Committee (ISO 639/
411	   RA-JAC) has included the following statement in [iso639.prin]:

413	      "A language code already in ISO 639-2 at the point of freezing ISO
414	      639-1 shall not later be added to ISO 639-1.  This is to ensure
415	      consistency in usage over time, since users are directed in
416	      Internet applications to employ the alpha-3 code when an alpha-2
417	      code for that language is not available."

419	   In order to avoid instability in the canonical form of tags, if a
420	   two-character code is added to ISO 639-1 for a language for which a
421	   three-character code was already included in either ISO 639-2 or ISO
422	   639-3, the two-character code MUST NOT be registered.  See
423	   Section 3.4.

425	   For example, if some content were tagged with 'haw' (Hawaiian), which
426	   currently has no two-character code, the tag would not be invalidated
427	   if ISO 639-1 were to assign a two-character code to the Hawaiian
428	   language at a later date.

430	   Note: As an example of independent primary language subtag
431	   registration, consider the grandfathered IANA registration
432	   "i-enochian".  The subtag 'enochian' could be registered in the IANA
433	   registry as a primary language subtag (assuming that ISO 639 does not
434	   register this language first), making tags such as "enochian-AQ" and
435	   "enochian-Latn" valid.

437	2.2.2.  Extended Language Subtags

439	   Extended language subtags are used to identify languages that are
440	   encompassed by a "macrolanguage".  ISO 639-3 defines certain
441	   languages to be "macrolanguages"; that is, they are groups of very
442	   closely related languages which are treated as a single language in
443	   certain contexts.  In order to improve matching behavior and tagging
444	   consistency, each language encompassed by an ISO 639-3 macrolanguage
445	   is represented in the IANA registry using an extended language
446	   subtag, provided that it is not already represented using a language
447	   subtag.  The following rules apply to the extended language subtags:

449	   1.  These subtags were defined in the IANA registry according to
450	       assignments found in ISO 639 Part 3.

452	   2.  A sequence of up to three extended language subtags MAY appear in
453	       a language tag.  This sequence MUST follow the primary language
454	       subtag and precede any other subtags.

456	   3.  Each extended language subtag MUST only be used with the exact
457	       sequence of subtags that appears in the 'Prefix' field in its
458	       registry record.

460	   4.  Other values MUST NOT be assigned to the extended language subtag
461	       except by revision or update of this document.

463	   Extended language subtag records MUST include exactly one 'Prefix'
464	   field indicating an appropriate subtag or sequence of subtags for
465	   that extended language subtag.

467	   For example, the 'gan' and 'cmn' subtags represent the languages Gan
468	   Chinese and Mandarin Chinese.  Each is encompassed by the
469	   macrolanguage 'zh' (Chinese).  Therefore, they both have the prefix
470	   "zh" in their registry records.  Consequently, Gan Chinese is
471	   represented as "zh-gan" and Mandarin Chinese as "zh-cmn".  The
472	   language subtag 'zh' can still be used without an extended language
473	   subtag to label a resource as some unspecified variety of Chinese
474	   (which in practice will usually be Mandarin, the dominant variety of
475	   Chinese, but might also be some other variety).
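
The 'Prefix' rule for extended language subtags (rule 3 above) can be sketched as a simple check. The table EXTLANG_PREFIX below is a hypothetical two-entry excerpt that mirrors only the 'gan' and 'cmn' example in the preceding paragraph; it is not a copy of the IANA registry, and the function name extlang_prefix_ok is likewise an assumption made for this illustration.

```python
# Hypothetical registry excerpt: extended language subtag -> its 'Prefix' field.
EXTLANG_PREFIX = {
    "gan": "zh",
    "cmn": "zh",
}


def extlang_prefix_ok(tag: str) -> bool:
    """Check that each extended language subtag follows exactly its Prefix."""
    subtags = tag.lower().split("-")
    seen = [subtags[0]]
    for s in subtags[1:4]:                   # at most three extlang subtags
        if not (len(s) == 3 and s.isalpha()):
            break                            # first non-extlang subtag ends the run
        if EXTLANG_PREFIX.get(s) != "-".join(seen):
            return False                     # unknown subtag or wrong prefix
        seen.append(s)
    return True
```

With this excerpt, "zh-gan" and "zh-cmn" pass, while "en-gan" fails because the prefix recorded for 'gan' is "zh".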

477	   Now suppose that, in the future, the ISO 639-3 Registration Authority
478	   were to decide that Gan Chinese is actually two different closely
479	   related languages: it might reclassify 'gan' as a macrolanguage and
480	   introduce two new code elements.  In that case, these code elements
481	   would be added to the IANA registry as extended language subtags with
482	   prefixes of "zh-gan".  No change would be made to the registry record
483	   for 'gan'.

485	2.2.3.  Script Subtag

487	   Script subtags are used to indicate the script or writing system
488	   variations that distinguish the written forms of a language or its
489	   dialects.  The following rules apply to the script subtags:

491	   1.  All four-character subtags were defined according to
492	       [ISO15924]--"Codes for the representation of the names of
493	       scripts": alpha-4 script codes, or subsequently assigned by the
494	       ISO 15924 maintenance agency or governing standardization bodies,
495	       denoting the script or writing system used in conjunction with
496	       this language.

498	   2.  Script subtags MUST immediately follow the primary language
499	       subtag and all extended language subtags and MUST occur before
500	       any other type of subtag described below.

502	   3.  The script subtags 'Qaaa' through 'Qabx' are reserved for private
503	       use in language tags.  These subtags correspond to codes reserved
504	       by ISO 15924 for private use.  These codes MAY be used for non-
505	       registered script values.  Please refer to Section 4.5 for more
506	       information on private use subtags.

508	   4.  Script subtags MUST NOT be registered using the process in
509	       Section 3.5 of this document.  Variant subtags MAY be considered
510	       for registration for that purpose.

512	   5.  There MUST be at most one script subtag in a language tag, and
513	       the script subtag SHOULD be omitted when it adds no
514	       distinguishing value to the tag or when the primary language
515	       subtag's record includes a Suppress-Script field listing the
516	       applicable script subtag.

518	   Example: "sr-Latn" represents Serbian written using the Latin script.
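
Two of the script rules lend themselves to a short illustration: the private use range in rule 3 and the Suppress-Script behavior in rule 5. In the sketch below, the function names and the SUPPRESS_SCRIPT table are hypothetical; the table holds a single illustrative entry rather than real registry data, and the second function handles only tags whose script subtag directly follows the primary language subtag (no extended language subtags).

```python
# Hypothetical excerpt: primary language subtag -> its Suppress-Script value.
SUPPRESS_SCRIPT = {"en": "latn"}


def is_private_use_script(subtag: str) -> bool:
    """True for script subtags in the range 'Qaaa' through 'Qabx'."""
    s = subtag.lower()
    return len(s) == 4 and s.isalpha() and "qaaa" <= s <= "qabx"


def drop_suppressed_script(tag: str) -> str:
    """Omit the script subtag when the language's Suppress-Script lists it."""
    subtags = tag.split("-")
    if len(subtags) > 1 and len(subtags[1]) == 4 and subtags[1].isalpha():
        if SUPPRESS_SCRIPT.get(subtags[0].lower()) == subtags[1].lower():
            del subtags[1]                   # the script adds no distinguishing value
    return "-".join(subtags)
```

With the illustrative table, drop_suppressed_script("en-Latn-US") gives "en-US", while "sr-Latn" is returned unchanged because no suppression is recorded for 'sr' in this excerpt.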

520	2.2.4.  Region Subtag

522	   Region subtags are used to indicate linguistic variations associated
523	   with or appropriate to a specific country, territory, or region.
524	   Typically, a region subtag is used to indicate regional dialects or
525	   usage, or region-specific spelling conventions.  A region subtag can
526	   also be used to indicate that content is expressed in a way that is
527	   appropriate for use throughout a region, for instance, Spanish
528	   content tailored to be useful throughout Latin America.

530	   The following rules apply to the region subtags:

532	   1.  Region subtags MUST follow any language, extended language, or
533	       script subtags and MUST precede all other subtags.

535	   2.  All two-character subtags following the primary subtag were
536	       defined in the IANA registry according to the assignments found
537	       in [ISO3166-1] ("Codes for the representation of names of
538	       countries and their subdivisions -- Part 1: Country codes") using
539	       the list of alpha-2 country codes, or using assignments
540	       subsequently made by the ISO 3166 maintenance agency or governing
541	       standardization bodies.

543	   3.  All three-character subtags consisting of digit (numeric)
544	       characters following the primary subtag were defined in the IANA
545	       registry according to the assignments found in UN Standard
546	       Country or Area Codes for Statistical Use [UN_M.49] or
547	       assignments subsequently made by the governing standards body.
548	       Note that not all of the UN M.49 codes are defined in the IANA
549	       registry.  The following rules define which codes are entered
550	       into the registry as valid subtags:

552	       A.  UN numeric codes assigned to 'macro-geographical
553	           (continental)' or sub-regions MUST be registered in the
554	           registry.  These codes are not associated with an assigned
555	           ISO 3166 alpha-2 code and represent supra-national areas,
556	           usually covering more than one nation, state, province, or
557	           territory.

559	       B.  UN numeric codes for 'economic groupings' or 'other
560	           groupings' MUST NOT be registered in the IANA registry and
561	           MUST NOT be used to form language tags.

563	       C.  UN numeric codes for countries or areas with ambiguous ISO
564	           3166 alpha-2 codes, when entered into the registry, MUST be
565	           defined according to the rules in Section 3.4 and MUST be
566	           used to form language tags that represent the country or
567	           region for which they are defined.

569	       D.  UN numeric codes for countries or areas for which there is an
570	           associated ISO 3166 alpha-2 code in the registry MUST NOT be
571	           entered into the registry and MUST NOT be used to form
572	           language tags.  Note that the ISO 3166-based subtag in the
573	           registry MUST actually be associated with the UN M.49 code in
574	           question.

576	       E.  UN numeric codes and ISO 3166 alpha-2 codes for countries or
577	           areas listed as eligible for registration in [RFC4645] but
578	           not presently registered MAY be entered into the IANA
579	           registry via the process described in Section 3.5.  Once
580	           registered, these codes MAY be used to form language tags.

582	       F.  All other UN numeric codes for countries or areas that do not
583	           have an associated ISO 3166 alpha-2 code MUST NOT be entered
584	           into the registry and MUST NOT be used to form language tags.
585	           For more information about these codes, see Section 3.4.

587	   4.  Note: The alphanumeric codes in Appendix X of the UN document
588	       MUST NOT be entered into the registry and MUST NOT be used to
589	       form language tags.  (At the time this document was created,
590	       these values matched the ISO 3166 alpha-2 codes.)

592	   5.  There MUST be at most one region subtag in a language tag and the
593	       region subtag MAY be omitted, as when it adds no distinguishing
594	       value to the tag.

596	   6.  The region subtags 'AA', 'QM'-'QZ', 'XA'-'XZ', and 'ZZ' are
597	       reserved for private use in language tags.  These subtags
598	       correspond to codes reserved by ISO 3166 for private use.  These
599	       codes MAY be used for private use region subtags (instead of
600	       using a private use subtag sequence).  Please refer to
601	       Section 4.5 for more information on private use subtags.

603	   "de-CH" represents German ('de') as used in Switzerland ('CH').

605	   "sr-Latn-RS" represents Serbian ('sr') written using Latin script
606	   ('Latn') as used in Serbia ('RS').

608	   "es-419" represents Spanish ('es') appropriate to the UN-defined
609	   Latin America and Caribbean region ('419').
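
	   The shape rules above (two letters, or three digits) and the private
	   use ranges in rule 6 can be sketched as follows.  This is only an
	   illustration: the function name and the returned labels are
	   assumptions, and the sketch does not check whether a code is
	   actually present in the registry.

```python
import re


def classify_region(subtag: str) -> str:
    """Classify a candidate region subtag by its shape alone."""
    if re.fullmatch(r"[A-Za-z]{2}", subtag):
        code = subtag.upper()
        # 'AA', 'QM'-'QZ', 'XA'-'XZ', and 'ZZ' are reserved for private use.
        if code in ("AA", "ZZ") or "QM" <= code <= "QZ" or "XA" <= code <= "XZ":
            return "private-use"
        return "iso3166"
    if re.fullmatch(r"[0-9]{3}", subtag):
        return "un-m49"
    return "not-a-region"
```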

611	2.2.5.  Variant Subtags

613	   Variant subtags are used to indicate additional, well-recognized
614	   variations that define a language or its dialects that are not
615	   covered by other available subtags.  The following rules apply to the
616	   variant subtags:

618	   1.  Variant subtags are not associated with any external standard.
619	       Variant subtags and their meanings are defined by the
620	       registration process defined in Section 3.5.

622	   2.  Variant subtags MUST follow all of the other defined subtags, but
623	       precede any extension or private use subtag sequences.

625	   3.  More than one variant MAY be used to form the language tag.

627	   4.  Variant subtags MUST be registered with IANA according to the
628	       rules in Section 3.5 of this document before being used to form
629	       language tags.  In order to distinguish variants from other types
630	       of subtags, registrations MUST meet the following length and
631	       content restrictions:

633	       1.  Variant subtags that begin with a letter (a-z, A-Z) MUST be
634	           at least five characters long.

636	       2.  Variant subtags that begin with a digit (0-9) MUST be at
637	           least four characters long.
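
	   The length restrictions in rule 4 can be sketched as a small check.
	   The function name is hypothetical.  The upper bound of eight
	   characters is not stated in the rule itself; it is taken from the
	   ABNF in Section 2.1.

```python
import re

_ALNUM = re.compile(r"[A-Za-z0-9]+")


def variant_shape_ok(subtag: str) -> bool:
    """Check the length and content restrictions for a variant subtag."""
    if not _ALNUM.fullmatch(subtag) or len(subtag) > 8:
        return False
    # Digit-initial variants need at least four characters,
    # letter-initial variants at least five.
    minimum = 4 if subtag[0].isdigit() else 5
    return len(subtag) >= minimum
```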

639	   Variant subtag records in the language subtag registry MAY include
640	   one or more 'Prefix' fields, which indicate the language tag or tags
641	   that would make a suitable prefix (with other subtags, as
642	   appropriate) in forming a language tag with the variant.  For
643	   example, the subtag 'nedis' has a Prefix of "sl", making it suitable
644	   to form language tags such as "sl-nedis" and "sl-IT-nedis", but not
645	   suitable for use in a tag such as "zh-nedis" or "it-IT-nedis".

647	   "sl-nedis" represents the Natisone or Nadiza dialect of Slovenian.

649	   "de-CH-1996" represents German as used in Switzerland and as written
650	   using the spelling reform beginning in the year 1996 C.E.

652	   Most variants that share a prefix are mutually exclusive.  For
653	   example, the German orthographic variations '1996' and '1901' SHOULD
654	   NOT be used in the same tag, as they represent the dates of different
655	   spelling reforms.  A variant that can meaningfully be used in
656	   combination with another variant SHOULD include a 'Prefix' field in
657	   its registry record that lists that other variant.  For example, if
658	   another German variant 'example' were created that made sense to use
659	   with '1996', then 'example' should include two Prefix fields: "de"
660	   and "de-1996".

662	2.2.6.  Extension Subtags

664	   Extensions provide a mechanism for extending language tags for use in
665	   various applications.  See Section 3.7.  The following rules apply to
666	   extensions:

668	   1.   Extension subtags are separated from the other subtags defined
669	        in this document by a single-character subtag ("singleton").
670	        The singleton MUST be one allocated to a registration authority
671	        via the mechanism described in Section 3.7 and MUST NOT be the
672	        letter 'x', which is reserved for private use subtag sequences.

674	   2.   Note: Private use subtag sequences starting with the singleton
675	        subtag 'x' are described in Section 2.2.7 below.

677	   3.   An extension MUST follow at least a primary language subtag.
678	        That is, a language tag cannot begin with an extension.
679	        Extensions extend language tags; they do not override or replace
680	        them.  For example, "a-value" is not a well-formed language tag,
681	        while "de-a-value" is.

683	   4.   Each singleton subtag MUST appear at most one time in each tag
684	        (other than as a private use subtag).  That is, singleton
685	        subtags MUST NOT be repeated.  For example, the tag "en-a-bbb-a-
686	        ccc" is invalid because the subtag 'a' appears twice.  Note that
687	        the tag "en-a-bbb-x-a-ccc" is valid because the second
688	        appearance of the singleton 'a' is in a private use sequence.

690	   5.   Extension subtags MUST meet all of the requirements for the
691	        content and format of subtags defined in this document.

693	   6.   Extension subtags MUST meet whatever requirements are set by the
694	        document that defines their singleton prefix and whatever
695	        requirements are provided by the maintaining authority.

697	   7.   Each extension subtag MUST be from two to eight characters long
698	        and consist solely of letters or digits, with each subtag
699	        separated by a single '-'.

701	   8.   Each singleton MUST be followed by at least one extension
702	        subtag.  For example, the tag "tlh-a-b-foo" is invalid because
703	        the first singleton 'a' is followed immediately by another
704	        singleton 'b'.

706	   9.   Extension subtags MUST follow all language, extended language,
707	        script, region, and variant subtags in a tag.

709	   10.  All subtags following the singleton and before another singleton
710	        are part of the extension.  Example: In the tag "fr-a-Latn", the
711	        subtag 'Latn' does not represent the script subtag 'Latn'
712	        defined in the IANA Language Subtag Registry.  Its meaning is
713	        defined by the extension 'a'.

715	   11.  In the event that more than one extension appears in a single
716	        tag, the tag SHOULD be canonicalized as described in
717	        Section 4.4.

719	   For example, if the prefix singleton 'r' and the shown subtags were
720	   defined, then the following tag would be a valid example: "en-Latn-
721	   GB-boont-r-extended-sequence-x-private"
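
	   Several of the structural rules above (3, 4, 7, and 8) can be
	   checked without any registry data.  The following Python sketch is
	   one way to do so; the function name and the wording of the messages
	   are assumptions.  It ignores grandfathered tags and does not check
	   that a singleton has actually been allocated.

```python
def check_extensions(tag: str) -> list[str]:
    """Return a list of rule violations found in the extension part of tag."""
    subtags = tag.lower().split("-")
    problems = []
    # Rule 3: a tag cannot begin with an extension.
    if len(subtags[0]) == 1 and subtags[0] != "x":
        problems.append("tag begins with an extension singleton")
    seen = set()
    for index, subtag in enumerate(subtags):
        if subtag == "x":
            break  # everything after 'x' is private use
        if len(subtag) != 1:
            continue
        # Rule 4: a singleton appears at most once.
        if subtag in seen:
            problems.append(f"singleton '{subtag}' is repeated")
        seen.add(subtag)
        # Rules 7 and 8: at least one subtag of two to eight
        # letters or digits follows the singleton.
        following = subtags[index + 1] if index + 1 < len(subtags) else ""
        if not (2 <= len(following) <= 8
                and following.isascii() and following.isalnum()):
            problems.append(f"singleton '{subtag}' has no extension subtag")
    return problems
```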

723	2.2.7.  Private Use Subtags

725	   Private use subtags are used to indicate distinctions in language
726	   important in a given context by private agreement.  The following
727	   rules apply to private use subtags:

729	   1.  Private use subtags are separated from the other subtags defined
730	       in this document by the reserved single-character subtag 'x'.

732	   2.  Private use subtags MUST conform to the format and content
733	       constraints defined in the ABNF for all subtags.

735	   3.  Private use subtags MUST follow all language, extended language,
736	       script, region, variant, and extension subtags in the tag.
737	       Another way of saying this is that all subtags following the
738	       singleton 'x' MUST be considered private use.  Example: The
739	       subtag 'US' in the tag "en-x-US" is a private use subtag.

741	   4.  A tag MAY consist entirely of private use subtags.

743	   5.  No source is defined for private use subtags.  Use of private use
744	       subtags is by private agreement only.

746	   6.  Private use subtags are NOT RECOMMENDED where alternatives exist
747	       or for general interchange.  See Section 4.5 for more information
748	       on private use subtag choice.

750	   For example: Users who wished to utilize codes from the Ethnologue
751	   publication of SIL International for language identification might
752	   agree to exchange tags such as "az-Arab-x-AZE-derbend".  This example
753	   contains two private use subtags.  The first is 'AZE' and the second
754	   is 'derbend'.
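
	   Because every subtag after the singleton 'x' is private use, a
	   processor can separate the two parts of a tag with a single scan.
	   A minimal sketch, with a hypothetical function name:

```python
def split_private_use(tag: str) -> tuple[list[str], list[str]]:
    """Split a tag into (ordinary subtags, private use subtags)."""
    subtags = tag.split("-")
    for index, subtag in enumerate(subtags):
        if subtag.lower() == "x":
            # The singleton itself is dropped; what follows is private use.
            return subtags[:index], subtags[index + 1:]
    return subtags, []
```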

756	2.2.8.  Grandfathered Registrations

758	   Prior to RFC 4646, whole language tags were registered according to
759	   the rules in RFC 1766 and/or RFC 3066.  These registered tags
760	   maintain their validity.  Of those tags, those that were made
761	   obsolete or redundant by the advent of RFC 4646, by this document, or
762	   by subsequent registration of subtags are maintained in the registry
763	   in records of the "redundant" type.  Those tags that do not match the
764	   'langtag' production in the ABNF in this document or that contain
765	   subtags that do not individually appear in the registry are
766	   maintained in the registry in records of the "grandfathered" type.

768	   Grandfathered tags contain one or more subtags that are not defined
769	   in the Language Subtag Registry (see Section 3).  Redundant tags
770	   consist entirely of subtags defined above and whose independent
771	   registration was superseded by [RFC4646].  For more information see
772	   Section 3.8.

774	   Some grandfathered tags are "regular" in that they match the
775	   'langtag' production in Figure 1.  In some cases, these tags could
776	   become redundant if their (currently unregistered) subtags were to
777	   be registered (as variants, for example).  In other cases, although
778	   the subtags match the language tag pattern, the meaning assigned to
779	   the various subtags is prohibited by rules elsewhere in this
780	   document.  Those tags can never become redundant.

782	   The remaining grandfathered tags are "irregular" and do not match the
783	   'langtag' production.  These are listed in the 'irregular' production
784	   in Figure 1.  These grandfathered tags can never become redundant.
785	   Many of these tags have been superseded by other registrations: their
786	   records contain a 'Preferred-Value' field, and that value ought to be
787	   used to form language tags in place of the grandfathered tag.

789	2.2.9.  Classes of Conformance

791	   Implementations sometimes need to describe their capabilities with
792	   regard to the rules and practices described in this document.  Tags
793	   can be checked or verified in a number of ways, but two particular
794	   classes of tag conformance are formally defined here.

796	   A tag is considered "well-formed" if it conforms to the ABNF
797	   (Section 2.1).  Note that irregular grandfathered tags are now listed
798	   in the 'irregular' production.

800	   A tag is considered "valid" if it is well-formed and also satisfies
801	   these conditions:

803	   o  The tag is either a grandfathered tag, or all of its language,
804	      extended language, script, region, and variant subtags appear in
805	      the IANA language subtag registry as of the particular registry
806	      date.

808	   o  There are no duplicate singleton (extension) subtags and no
809	      duplicate variant subtags.

811	   o  For each subtag that has a 'Prefix' field in the registry, the
812	      Prefix matches the language tag using Extended Filtering
813	      [RFC4647].  That is, each subtag in the Prefix is present in the
814	      tag and in the same order.  For example, the Prefix "zh-TW"
815	      matches the tag "zh-Hant-TW".

817	   Note that a tag's validity depends on the date of the registry used
818	   to validate the tag.  A more-recent copy of the registry might
819	   contain a subtag that an older version does not.
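
	   The 'Prefix' condition above can be sketched as an ordered search
	   through the subtags of the tag.  The function name is hypothetical.
	   In addition to the sentence above, the sketch applies two details of
	   Extended Filtering in [RFC4647]: the first subtags have to agree,
	   and the search does not skip past a singleton.

```python
def prefix_matches(prefix: str, tag: str) -> bool:
    """Return True if every subtag of prefix occurs in tag, in order."""
    wanted = prefix.lower().split("-")
    present = tag.lower().split("-")
    # The primary language subtags have to agree.
    if wanted[0] != present[0]:
        return False
    position = 1
    for subtag in wanted[1:]:
        # Skip over subtags in the tag until the wanted one is found.
        while position < len(present) and present[position] != subtag:
            if len(present[position]) == 1:
                return False  # do not search past a singleton
            position += 1
        if position == len(present):
            return False  # ran out of subtags without a match
        position += 1
    return True
```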

821	   A tag is considered "valid" for a given extension (Section 3.7) (as
822	   of a particular version, revision, and date) if it meets the criteria
823	   for "valid" above and also satisfies this condition:

825	      Each subtag used in the extension part of the tag is valid
826	      according to the extension.

828	3.  Registry Format and Maintenance

830	   This section defines the Language Subtag Registry and the maintenance
831	   and update procedures associated with it, as well as a registry for
832	   extensions to language tags (Section 3.7).

834	   The Language Subtag Registry contains a comprehensive list of all of
835	   the subtags valid in language tags.  This allows implementers a
836	   straightforward and reliable way to validate language tags.  The
837	   Language Subtag Registry will be maintained so that, except for
838	   extension subtags, it is possible to validate all of the subtags that
839	   appear in a language tag under the provisions of this document or its
840	   revisions or successors.  In addition, the meaning of the various
841	   subtags will be unambiguous and stable over time.  (The meaning of
842	   private use subtags, of course, is not defined by the IANA registry.)

844	3.1.  Format of the IANA Language Subtag Registry

846	   The IANA Language Subtag Registry ("the registry") consists of a text
847	   file that is machine readable in the format described in this
848	   section, plus copies of the registration forms approved in accordance
849	   with the process described in Section 3.5.  The existing registration
850	   forms for grandfathered and redundant tags taken from RFC 3066 will
851	   be maintained as part of the obsolete RFC 3066 registry.  The
852	   remaining set of initial subtags will not have registration forms
853	   created for them.

855	3.1.1.  File Format

857	   The registry is in the text format described below.  This format was
858	   based on the record-jar format described in [record-jar].

860	   Each line of text is limited to 72 characters, including all
861	   whitespace.  Records are separated by lines containing only the
862	   sequence "%%" (%x25.25).

864	   Each field can be viewed as a single, logical line of ASCII
865	   characters, comprising a field-name and a field-body separated by a
866	   COLON character (%x3A).  For convenience, the field-body portion of
867	   this conceptual entity can be split into a multiple-line
868	   representation; this is called "folding".  The format of the registry
869	   is described by the following ABNF (per [RFC4234]):

871	   registry   = record *("%%" CRLF record)
872	   record     = 1*( field-name *SP ":" *SP field-body CRLF )
873	   field-name = (ALPHA / DIGIT) [*(ALPHA / DIGIT / "-") (ALPHA / DIGIT)]
874	   field-body = *(([*WSP CRLF] 1*WSP) 1*ASCCHAR)
875	   ASCCHAR    = %x21-25 / %x27-7E / UNICHAR ; Note: AMPERSAND is %x26
876	   UNICHAR    = "&#x" 2*6HEXDIG ";"

878	                      Figure 2: Registry Format ABNF
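
	   A reader for this format needs to handle only three things: the
	   "%%" separator, the colon between field-name and field-body, and
	   folded continuation lines.  The following Python sketch shows one
	   possible approach.  The function name is hypothetical, and the
	   sketch does not enforce the ABNF strictly (for example, it does not
	   reject malformed field-names).

```python
def parse_registry(text: str) -> list[list[tuple[str, str]]]:
    """Parse registry text into records of (field-name, field-body) pairs."""
    records = [[]]
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore empty lines
        if line.strip() == "%%":
            records.append([])  # record separator
        elif line[0] in " \t":
            # Folded continuation: append to the previous field-body.
            if records[-1]:
                name, body = records[-1][-1]
                records[-1][-1] = (name, body + " " + line.strip())
        elif ":" in line:
            name, _, body = line.partition(":")
            records[-1].append((name.strip(), body.strip()))
    return [record for record in records if record]
```

	   Fields are kept as ordered pairs rather than in a dictionary because
	   some fields, such as 'Description', can occur more than once in a
	   record.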

880	   The sequence '..' (%x2E.2E) in a field-body denotes a range of
881	   values.  Such a range represents all subtags of the same length that
882	   are in alphabetic or numeric order within that range, including the
883	   values explicitly mentioned.  For example 'a..c' denotes the values
884	   'a', 'b', and 'c' and '11..13' denotes the values '11', '12', and
885	   '13'.
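
	   Expanding such a range can be sketched as follows.  The function
	   names are hypothetical.  The sketch assumes that both endpoints are
	   either all digits or all ASCII letters, and that the letter case of
	   the first endpoint applies to every generated value; neither
	   assumption is stated in this document.

```python
def _letters_to_int(value: str) -> int:
    # Treat the letters as digits of a base-26 number.
    number = 0
    for ch in value.lower():
        number = number * 26 + (ord(ch) - ord("a"))
    return number


def _int_to_letters(number: int, template: str) -> str:
    # Rebuild the letters, copying the case pattern of the template.
    chars = []
    for model in reversed(template):
        number, digit = divmod(number, 26)
        ch = chr(ord("a") + digit)
        chars.append(ch.upper() if model.isupper() else ch)
    return "".join(reversed(chars))


def expand_range(field_body: str) -> list[str]:
    """Expand 'first..last' into every value of the same length."""
    first, separator, last = field_body.partition("..")
    if not separator:
        return [field_body]  # not a range
    if len(first) != len(last):
        raise ValueError("range endpoints differ in length")
    if first.isdigit() and last.isdigit():
        width = len(first)
        return [str(n).zfill(width) for n in range(int(first), int(last) + 1)]
    if not (first.isascii() and first.isalpha()
            and last.isascii() and last.isalpha()):
        raise ValueError("range endpoints mix letters and digits")
    low, high = _letters_to_int(first), _letters_to_int(last)
    return [_int_to_letters(n, first) for n in range(low, high + 1)]
```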

887	   Characters from outside the US-ASCII [ISO646] repertoire, as well as
888	   the AMPERSAND character ("&", %x26) when it occurs in a field-body,
889	   are represented by a "Numeric Character Reference" using hexadecimal
890	   notation in the style used by [XML10] (see
891	   <http://www.w3.org/TR/REC-xml/#dt-charref>).  This consists of the
892	   sequence "&#x" (%x26.23.78) followed by a hexadecimal representation
893	   of the character's code point in [ISO10646] followed by a closing
894	   semicolon (%x3B).  For example, the EURO SIGN, U+20AC, would be
895	   represented by the sequence "&#x20AC;".  Note that the hexadecimal
896	   notation MAY have between two and six digits.
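
	   Converting between this escaped form and ordinary text can be
	   sketched in a few lines of Python.  The function names are
	   hypothetical.  Note that the decoder raises ValueError for a value
	   beyond U+10FFFF, which a six-digit reference can express but which
	   is not a valid code point.

```python
import re

# "&#x" followed by two to six hexadecimal digits and a semicolon.
_NCR = re.compile(r"&#x([0-9A-Fa-f]{2,6});")


def decode_field_body(body: str) -> str:
    """Replace each numeric character reference with its character."""
    return _NCR.sub(lambda match: chr(int(match.group(1), 16)), body)


def encode_field_body(text: str) -> str:
    """Escape AMPERSAND and every character outside US-ASCII."""
    return "".join(
        ch if ch != "&" and ord(ch) < 0x80 else f"&#x{ord(ch):X};"
        for ch in text
    )
```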

898	   All fields whose field-body contains a date value use the "full-date"
899	   format specified in [RFC3339].  For example: "2004-06-28" represents
900	   June 28, 2004, in the Gregorian calendar.

902	3.1.2.  Record Definitions

904	   There are three types of records in the registry: "File-Date",
905	   "Subtag", and "Tag" records.

907	   The first record in the registry is a "File-Date" record.  This
908	   record contains the single field whose field-name is "File-Date" (see
909	   Figure 2).  The field-body of this record contains the last
910	   modification date of this copy of the registry, making it possible to
911	   compare different versions of the registry.  The registry on the IANA
912	   website is the most current.  Versions with an older date than that
913	   one are not up-to-date.

915	   File-Date: 2004-06-28
916	   %%

918	                 Figure 3: Example of the File-Date Record

920	   Subsequent records represent either subtags or tags in the registry.
921	   "Subtag" records contain a field with a field-name of "Subtag",
922	   while, unsurprisingly, "Tag" records contain a field with a field-
923	   name of "Tag".  Each of the fields in each record MUST occur no more
924	   than once, unless otherwise noted below.  Each record MUST contain
925	   the following fields:

927	   o  'Type'

929	      *  Type's field-body MUST consist of one of the following strings:
930	         "language", "extlang", "script", "region", "variant",
931	         "grandfathered", or "redundant".  It denotes the type of tag or
932	         subtag.

934	   o  Either 'Subtag' or 'Tag'

936	      *  Subtag's field-body contains the subtag being defined.  This
937	         MUST only appear in records whose 'Type' has one of
938	         these values: "language", "extlang", "script", "region", or
939	         "variant".

941	      *  Tag's field-body contains a complete language tag.  This field
942	         MUST only appear in records whose 'Type' has one of these
943	         values: "grandfathered" or "redundant".  Note that the field-
944	         body will always follow the 'grandfathered' production in the
945	         ABNF in Section 2.1.

947	   o  Description

949	      *  Description's field-body contains a non-normative description
950	         of the subtag or tag.

952	   o  Added

954	      *  Added's field-body contains the date the record was added to
955	         the registry.

957	   Each record MAY also contain the following fields:

959	   o  Preferred-Value

961	      *  For records of type 'script', 'region', or 'variant',
962	         'Preferred-Value' contains the subtag of the same 'Type' that
963	         is preferred for forming the language tag.

965	      *  For records of type 'language' or 'extlang', 'Preferred-Value'
966	         contains the language production (see Figure 1) that is
967	         preferred when forming the language tag.  This can be simply a
968	         'language' subtag, or it can be a 'language' subtag followed by
969	         an extended language sequence.

971	      *  For records of type 'grandfathered' or 'redundant', it contains
972	         a canonical mapping to a complete language tag.

974	   o  Deprecated

976	      *  Deprecated's field-body contains the date the record was
977	         deprecated.

979	   o  Prefix

981	      *  Prefix's field-body contains a language tag with which this
982	         subtag MAY be used to form a new language tag, perhaps with
983	         other subtags as well.  This field MUST only appear in records
984	         whose 'Type' field-body is 'variant' or 'extlang'.  For
985	         example, the 'Prefix' for the variant 'nedis' is 'sl', meaning
986	         that the tags "sl-nedis" and "sl-IT-nedis" might be appropriate
987	         while the tag "is-nedis" is not.

989	   o  Comments

991	      *  Comments contains additional information about the subtag, as
992	         deemed appropriate for understanding the registry and
993	         implementing language tags using the subtag or tag.

995	   o  Suppress-Script

997	      *  Suppress-Script contains a script subtag that SHOULD NOT be
998	         used to form language tags with the associated primary language
999	         subtag.  This field MUST only appear in records whose 'Type'
1000	         field-body is 'language'.  See Section 4.1.

1002	   Future versions of this document might add additional fields to the
1003	   registry, so implementations SHOULD ignore fields found in the
1004	   registry that are not defined in this document.

1006	3.1.3.  Subtag and Tag Fields

1008	   The 'Subtag' field MUST use lowercase letters to form the subtag,
1009	   with two exceptions.  Subtags whose 'Type' field is 'script' (in
1010	   other words, subtags defined by ISO 15924) MUST use titlecase.
1011	   Subtags whose 'Type' field is 'region' (in other words, subtags
1012	   defined by ISO 3166) MUST use uppercase.  These exceptions mirror the
1013	   use of case in the underlying standards.

1015	   Each subtag in the tags contained in a 'Tag' field MUST be formatted
1016	   using the rules in the preceding paragraph.  That is, all subtags
1017	   are lowercase except for subtags that represent script or region
1018	   codes.
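
	   The case conventions above can be sketched as a small helper.  The
	   function name is hypothetical; the 'record_type' argument is assumed
	   to hold the field-body of the record's 'Type' field.

```python
def registry_case(subtag: str, record_type: str) -> str:
    """Return subtag in the case used by the registry for its type."""
    if record_type == "script":
        return subtag.capitalize()  # titlecase, for example 'Latn'
    if record_type == "region":
        return subtag.upper()  # uppercase, for example 'CH'
    return subtag.lower()  # every other type is lowercase
```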

1020	3.1.4.  Description Field

1022	   The field 'Description' contains a description of the tag or subtag
1023	   in the record.  The 'Description' field MAY appear more than once per
1024	   record, that is, there can be multiple descriptions for a given
1025	   record.  At least one of the 'Description' fields MUST be written or
1026	   transcribed into the Latin script; additional 'Description' fields
1027	   MAY also include a description in a non-Latin script.  Each
1028	   'Description' field MUST be unique, both within the record in which
1029	   it appears and for the collection of records of the same type.
1030	   Moreover, formatting variations of the same description MUST NOT
1031	   occur in that specific record or in any other record of the same
1032	   type.  For example, while the ISO 639-1 code 'fy' contains both the
1033	   descriptions "Western Frisian" and "Frisian, Western", only one of
1034	   these descriptions appears in the registry.

1036	   The 'Description' field is used for identification purposes and
1037	   SHOULD NOT be taken to represent the actual native name of the
1038	   language or variation or to be in any particular language.

1040	   For records taken from a source standard (such as ISO 639 or ISO
1041	   3166), the 'Description' value(s) SHOULD also be taken from the
1042	   source standard.  Multiple descriptions in the source standard MUST
1043	   be split into separate 'Description' fields.  The source standard's
1044	   descriptions MAY be edited, either prior to insertion or via the
1045	   registration process.  For records of type 'language' or 'extlang',
1046	   the first 'Description' field appearing in the Registry corresponds
1047	   to the Reference Name assigned by ISO 639-3.  This helps facilitate
1048	   cross-referencing between ISO 639 and the registry.

1050	   When creating or updating a record due to the action of one of the
1051	   source standards, the Language Subtag Reviewer SHOULD remove
1052	   duplicate or redundant descriptions and MAY edit descriptions to
1053	   correct irregularities in formatting (such as misspellings,
1054	   inappropriate apostrophes or other punctuation, or excessive or
1055	   missing spaces) prior to submitting the proposed record to the ietf-
1056	   languages list.

1058	   Note: Descriptions in registry entries that correspond to ISO 639,
1059	   ISO 15924, ISO 3166, or UN M.49 codes are intended only to indicate
1060	   the meaning of that identifier as defined in the source standard at
1061	   the time it was added to the registry.  The description does not
1062	   replace the content of the source standard itself.  The descriptions
1063	   are not intended to be the English localized names for the subtags.

1065	   Localization or translation of language tag and subtag descriptions
1066	   is out of scope of this document.

1068	3.1.5.  Deprecated Field

1070	   The field 'Deprecated' MAY be added to any record via the maintenance
1071	   process described in Section 3.3 or via the registration process
1072	   described in Section 3.5.  Usually, the addition of a 'Deprecated'
1073	   field is due to the action of one of the standards bodies, such as
1074	   ISO 3166, withdrawing a code.  In some historical cases, it might not
1075	   have been possible to reconstruct the original deprecation date.  For
1076	   these cases, an approximate date appears in the registry.  Although
1077	   valid in language tags, subtags and tags with a 'Deprecated' field
1078	   are deprecated and validating processors SHOULD NOT generate these
1079	   subtags.  Note that a record that contains a 'Deprecated' field and
1080	   no corresponding 'Preferred-Value' field has no replacement mapping.

1082	3.1.6.  Preferred-Value Field

1084	   The field 'Preferred-Value' contains a mapping between the record in
1085	   which it appears and another tag or subtag.  The value in this field
1086	   is strongly RECOMMENDED as the best choice to represent the value of
1087	   this record when selecting a language tag.  These values form three
1088	   groups:

1090	   1.  ISO 639 language codes that were later withdrawn in favor of
1091	       other codes.  These values are mostly a historical curiosity.

1093	   2.  ISO 3166 region codes that have been withdrawn in favor of a new
1094	       code.  This sometimes happens when a country changes its name or
1095	       administration in such a way that warrants a new region code.

1097	   3.  Grandfathered or redundant tags from RFC 3066.  In many cases,
1098	       these tags have become obsolete because the values they represent
1099	       were later encoded by ISO 639.

1101	   Records that contain a 'Preferred-Value' field MUST also have a
1102	   'Deprecated' field.  This field contains a date of deprecation.
1103	   Thus, a language tag processor can use the registry to construct the
1104	   valid, non-deprecated set of subtags for a given date.  In addition,
1105	   for any given tag, a processor can construct the set of valid
1106	   language tags that correspond to that tag for all dates up to the
1107	   date of the registry.  The ability to do these mappings MAY be
1108	   beneficial to applications that are matching, selecting, or
1109	   filtering content based on its language tags.
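
	   Constructing the non-deprecated set of subtags for a given date can
	   be sketched as follows.  The function name is hypothetical, and each
	   record is assumed to have been loaded into a dictionary keyed by
	   field-name, which is a simplification because some fields can
	   repeat.

```python
from datetime import date


def active_subtags(
    records: list[dict], as_of: date
) -> set[tuple[str, str]]:
    """Return (Type, Subtag) pairs registered and not deprecated on as_of."""
    active = set()
    for record in records:
        if "Subtag" not in record:
            continue  # skip the File-Date record and whole-tag records
        if date.fromisoformat(record["Added"]) > as_of:
            continue  # not yet registered on that date
        deprecated = record.get("Deprecated")
        if deprecated and date.fromisoformat(deprecated) <= as_of:
            continue  # already deprecated on that date
        active.add((record["Type"], record["Subtag"]))
    return active
```

	   Dates are parsed with the "full-date" format that the registry
	   already uses, so no further conversion is needed.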

1111	   Note that 'Preferred-Value' mappings in records of type 'region'
1112	   sometimes do not represent exactly the same meaning as the original
1113	   value.  There are many reasons for a country code to be changed, and
1114	   the effect this has on the formation of language tags will depend on
1115	   the nature of the change in question.

1117	   In particular, the 'Preferred-Value' field does not imply retagging
1118	   content that uses the affected subtag.

1120	   The field 'Preferred-Value' MUST NOT be modified once created in the
1121	   registry.  The field MAY be added to records according to the rules
1122	   in Section 3.3.

1124	   The 'Preferred-Value' field in records of type "grandfathered" and
1125	   "redundant" contains whole language tags that are strongly
1126	   RECOMMENDED for use in place of the record's value.  In many cases,
1127	   the mappings were created by deprecation of the tags during the
1128	   period before this document was adopted.  For example, the tag "no-
1129	   nyn" was deprecated in favor of the ISO 639-1-defined language code
1130	   'nn'.

1132	3.1.7.  Prefix Field

1134	   The field of type 'Prefix' MUST NOT be removed from any record.  The
1135	   field-body for this type of field MAY be modified, but only if the
1136	   modification broadens the meaning of the subtag.  That is, the field-
1137	   body can be replaced only by a prefix of itself.  For
1138	   example, the Prefix "be-Latn" (Belarusian, Latin script) could be
1139	   replaced by the Prefix "be" (Belarusian) but not by the Prefix "ru-
1140	   Latn" (Russian, Latin script).
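
	   The broadening rule can be sketched as a comparison of leading
	   subtags.  The function name is hypothetical, and the sketch assumes
	   that an unchanged value does not count as a modification.

```python
def is_broadening(old_prefix: str, new_prefix: str) -> bool:
    """Return True if new_prefix is a shorter leading part of old_prefix."""
    old = old_prefix.lower().split("-")
    new = new_prefix.lower().split("-")
    # Compare whole subtags so that "b" is not treated as a prefix of "be".
    return len(new) < len(old) and old[:len(new)] == new
```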

1142	   The field-body of the 'Prefix' field consists of a language tag whose
1143	   subtags are appropriate to use with this subtag.  For example, the
1144	   variant subtag '1996' has a 'Prefix' field of "de".  This means that
1145	   tags starting with the sequence "de-" are appropriate with this
1146	   subtag, so "de-Latg-1996" and "de-CH-1996" are both acceptable, while
1147	   the tag "fr-1996" is an inappropriate choice.

1149	   Records of type 'variant' MAY have more than one field of type
1150	   'Prefix'.  Additional fields of this type MAY be added to a 'variant'
1151	   record via the registration process.

1153	   The field-body of the 'Prefix' field MUST NOT conflict with any
1154	   'Prefix' already registered for a given record.  Such a conflict
1155	   would occur when no valid tag could be constructed that would
1156	   contain the prefix, such as when two subtags each have a
1157	   'Prefix' that contains the other subtag.  For example, suppose that
1158	   the subtag 'avariant' has the prefix "es-bvariant".  Then the subtag
1159	   'bvariant' cannot be given the prefix 'avariant', for that would
1160	   require a tag of the form "es-avariant-bvariant-avariant", which
1161	   would not be valid.
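
   The two-subtag case described above can be sketched as follows.
   This is a simplified, non-normative illustration: the dictionary
   layout and function name are assumptions, and the check only detects
   the direct case in which a subtag named in the proposed prefix can
   itself appear only after the subtag being registered.

```python
def prefix_conflicts(prefixes_by_subtag: dict[str, list[str]],
                     subtag: str, proposed_prefix: str) -> bool:
    """Return True if proposed_prefix for subtag creates a mutual cycle.

    prefixes_by_subtag maps a variant subtag to its registered prefixes.
    A conflict is reported when the proposed prefix mentions another
    variant all of whose own prefixes already contain `subtag`.
    """
    subtag = subtag.lower()
    for other in proposed_prefix.lower().split("-"):
        other_prefixes = prefixes_by_subtag.get(other, [])
        if other_prefixes and all(
            subtag in p.lower().split("-") for p in other_prefixes
        ):
            return True
    return False


# 'avariant' already has the prefix "es-bvariant" (example from the text).
registered = {"avariant": ["es-bvariant"]}
print(prefix_conflicts(registered, "bvariant", "avariant"))  # conflict
print(prefix_conflicts(registered, "bvariant", "es"))        # no conflict
```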

1163	   Records of type 'extlang' MUST have _exactly_ one 'Prefix' field.

1165	3.1.8.  Comments Field

1167	   The field 'Comments' MAY appear more than once per record.  This
1168	   field MAY be inserted or changed via the registration process and no
1169	   guarantee of stability is provided.  The content of this field is not
1170	   restricted, except by the need to register the information, the
1171	   suitability of the request, and by reasonable practical size
1172	   limitations.

1174	3.1.9.  Suppress-Script Field

1176	   The field 'Suppress-Script' MUST only appear in records whose 'Type'
1177	   field-body is 'language'.  This field MUST NOT appear more than one
1178	   time in a record.  This field indicates a script used to write the
1179	   overwhelming majority of documents for the given language and that
1180	   therefore adds no distinguishing information to a language tag.  It
1181	   helps ensure greater compatibility between the language tags
1182	   generated according to the rules in this document and language tags
1183	   and tag processors or consumers based on RFC 3066.  For example,
1184	   virtually all Icelandic documents are written in the Latin script,
1185	   making the subtag 'Latn' redundant in the tag "is-Latn".
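
   As a non-normative sketch, a tag generator might use this field to
   omit a redundant script subtag.  The table below contains only the
   Icelandic example from the text; the function name is hypothetical,
   and the sketch assumes the script subtag, if present, immediately
   follows the primary language subtag (that is, no extended language
   subtag is present).

```python
# Hypothetical extract of 'Suppress-Script' values keyed by language subtag.
SUPPRESS_SCRIPT = {"is": "Latn"}


def drop_suppressed_script(tag: str) -> str:
    """Remove the script subtag when it equals the language's
    Suppress-Script value; otherwise return the tag unchanged."""
    parts = tag.split("-")
    if len(parts) >= 2:
        suppressed = SUPPRESS_SCRIPT.get(parts[0].lower())
        if suppressed is not None and parts[1].lower() == suppressed.lower():
            del parts[1]
    return "-".join(parts)


print(drop_suppressed_script("is-Latn"))     # script adds no information
print(drop_suppressed_script("is-Latn-IS"))  # region subtag is preserved
```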

1187	   Many language subtag records do not have a Suppress-Script field.
1188	   The lack of a Suppress-Script might indicate that the language is
1189	   customarily written in more than one script or that the language is
1190	   not customarily written at all.  It might also mean that sufficient
1191	   information was not available when the record was created and thus
1192	   remains a candidate for future registration.

1194	3.2.  Language Subtag Reviewer

1196	   The Language Subtag Reviewer moderates the ietf-languages mailing
1197	   list, responds to requests for registration, and performs the other
1198	   registry maintenance duties described in Section 3.3.  Only the
1199	   Language Subtag Reviewer is permitted to request IANA to change,
1200	   update, or add records to the Language Subtag Registry.  The Language
1201	   Subtag Reviewer MAY delegate list moderation and other clerical
1202	   duties as needed.

1204	   The Language Subtag Reviewer is appointed by the IESG for an
1205	   indefinite term, subject to removal or replacement at the IESG's
1206	   discretion.  The IESG will solicit nominees for the position
1207	   (initially or upon a vacancy) and seek to ascertain the candidates'
1208	   qualifications.

1210	   The subsequent performance or decisions of the Language Subtag
1211	   Reviewer MAY be appealed to the IESG under the same rules as other
1212	   IETF decisions (see [RFC2026]).  The IESG can reverse or overturn the
1213	   decision of the Language Subtag Reviewer, provide guidance, or take
1214	   other appropriate actions.

1216	3.3.  Maintenance of the Registry

1218	   Maintenance of the registry requires that as codes are assigned or
1219	   withdrawn by ISO 639, ISO 15924, ISO 3166, and UN M.49, the Language
1220	   Subtag Reviewer MUST evaluate each change and determine the
1221	   appropriate course of action according to the rules in this document.
1222	   Usually this requires that the Language Subtag Reviewer fill in and
1223	   submit the registration form found in Section 3.5 for the new or
1224	   updated record.  If a change to one of these standards takes place
1225	   and the Language Subtag Reviewer does not do this in a timely manner,
1226	   then any interested party MAY submit the form to begin the
1227	   registration process.  Thereafter the registration process continues
1228	   normally.

1230	   Note: The redundant and grandfathered entries together are the
1231	   complete list of tags registered under [RFC3066].  The redundant tags
1232	   are those that can now be formed using the subtags defined in the
1233	   registry together with the rules of Section 2.2.  The grandfathered
1234	   entries include those that can never be legal under those same
1235	   provisions plus those tags that contain subtags not yet registered
1236	   or, perhaps, inappropriate for registration.

1238	   The set of redundant and grandfathered tags is permanent and stable:
1239	   new entries in this section MUST NOT be added and existing entries
1240	   MUST NOT be removed.  Records of type 'grandfathered' MAY have their
1241	   type converted to 'redundant'; see item 17 in Section 3.4 for more
1242	   information.  The decision-making process about which tags were
1243	   initially grandfathered and which were made redundant is described in
1244	   [RFC4645].

1246	   RFC 3066 tags that were deprecated prior to the adoption of [RFC4646]
1247	   are part of the list of grandfathered tags, and their component
1248	   subtags were not included as registered variants (although they
1249	   remain eligible for registration).  For example, the tag "art-lojban"
1250	   was deprecated in favor of the language subtag 'jbo'.

1252	   The Language Subtag Reviewer MUST ensure that new subtags meet the
1253	   requirements in Section 4.1 or submit an appropriate registration
1254	   form for an alternate subtag as described in that section.  When
1255	   either a change or addition to the registry is needed, the Language
1256	   Subtag Reviewer MUST prepare the registration form and each record
1257	   being modified or inserted MUST be sent to the ietf-languages list in
1258	   a separate message.

1260	   Upon approval of the registration, the Language Subtag Reviewer MUST
1261	   forward the form containing the final record to IANA.  If a record
1262	   represents a new subtag that does not currently exist in the
1263	   registry, then the message's subject line MUST include the word
1264	   "INSERT".  If the record represents a change to an existing subtag,
1265	   then the subject line of the message MUST include the word "MODIFY".
1266	   The message MUST contain both the form for the subtag being inserted
1267	   or modified and the new File-Date record.  Here is an example of what
1268	   the body of the message might contain:

1270	   LANGUAGE SUBTAG REGISTRATION FORM

1272	   File-Date: 2005-01-02

1274	   1. Name of requester: Michael Everson
1275	   2. E-mail address of requester: someone@example.org
1276	   3. Record Requested:
1277	   %%
1278	   Type: variant
1279	   Subtag: nedis
1280	   Description: Natisone dialect
1281	   Description: Nadiza dialect
1282	   Added: 2003-10-09
1283	   Prefix: sl
1284	   Comments: This is a comment shown
1285	     as an example.
1286	   %%
1287	   4. Intended meaning of the subtag: Nadiza dialect of Slovenian
1288	   5. Reference to published description
1289	      of the language (book or article): N/A
1290	   6. Any other relevant information: (none)

1292	         Figure 4: Example of a Language Subtag Modification Form
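
   The record portion of such a message can be read mechanically.  The
   following non-normative Python sketch assumes only what the example
   shows: records are delimited by "%%", each field is "name: body",
   and a line that begins with whitespace continues the previous field.
   The function names and the set of already-registered subtags are
   hypothetical.

```python
RECORD = """\
%%
Type: variant
Subtag: nedis
Description: Natisone dialect
Description: Nadiza dialect
Added: 2003-10-09
Prefix: sl
Comments: This is a comment shown
  as an example.
%%
"""


def parse_record(text: str) -> list[tuple[str, str]]:
    """Return (field-name, field-body) pairs in order of appearance.

    A list is used instead of a dict because a field such as
    'Description' can appear more than once.
    """
    fields: list[tuple[str, str]] = []
    for line in text.splitlines():
        if not line.strip() or line.strip() == "%%":
            continue  # skip blank lines and record delimiters
        if line[0] in " \t" and fields:
            # Folded line: append to the body of the previous field.
            name, body = fields[-1]
            fields[-1] = (name, body + " " + line.strip())
        else:
            name, _, body = line.partition(":")
            fields.append((name.strip(), body.strip()))
    return fields


def subject_keyword(fields: list[tuple[str, str]],
                    registered_subtags: set[str]) -> str:
    """Choose the word required in the subject line of the message."""
    subtag = dict(fields)["Subtag"]
    return "MODIFY" if subtag in registered_subtags else "INSERT"


fields = parse_record(RECORD)
print(fields)
print(subject_keyword(fields, registered_subtags={"nedis"}))
```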

1294	   Whenever an entry is created or modified in the registry, the 'File-
1295	   Date' record at the start of the registry is updated to reflect the
1296	   most recent modification date in the [RFC3339] "full-date" format.
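
   A short, non-normative sketch of reading and writing this record
   follows.  It assumes the "full-date" form is four digits, a hyphen,
   two digits, a hyphen, and two digits (as in "2005-01-02"); the
   function names are illustrative.

```python
import re
from datetime import date

# Four-digit year, two-digit month, two-digit day, separated by hyphens.
FULL_DATE = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")


def parse_file_date(line: str) -> date:
    """Parse a 'File-Date: YYYY-MM-DD' line into a date object."""
    name, _, body = line.partition(":")
    if name.strip() != "File-Date":
        raise ValueError("not a File-Date record")
    match = FULL_DATE.fullmatch(body.strip())
    if match is None:
        raise ValueError("date is not in full-date format")
    year, month, day = (int(group) for group in match.groups())
    return date(year, month, day)  # raises ValueError for impossible dates


def format_file_date(day: date) -> str:
    """Produce the File-Date record for the given modification date."""
    return "File-Date: " + day.isoformat()


print(parse_file_date("File-Date: 2005-01-02"))
print(format_file_date(date(2005, 1, 2)))
```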

1298	   Before forwarding a new registration to IANA, the Language Subtag
1299	   Reviewer MUST ensure that values in the 'Subtag' field match case
1300	   according to the description in Section 3.1.

1302	3.4.  Stability of IANA Registry Entries

1304	   The stability of entries and their meaning in the registry is
1305	   critical to the long-term stability of language tags.  The rules in
1306	   this section guarantee that a specific language tag's meaning is
1307	   stable over time and will not change.

1309	   These rules specifically deal with how changes to codes (including
1310	   withdrawal and deprecation of codes) maintained by ISO 639, ISO
1311	   15924, ISO 3166, and UN M.49 are reflected in the IANA Language
1312	   Subtag Registry.  Assignments to the IANA Language Subtag Registry
1313	   MUST follow these stability rules:

1315	   1.   Values in the fields 'Type', 'Subtag', 'Tag', 'Added',
1316	        'Deprecated' and 'Preferred-Value' MUST NOT be changed and are
1317	        guaranteed to be stable over time.

1319	   2.   Values in the 'Description' field MUST NOT be changed in a way
1320	        that would invalidate previously-existing tags.  They MAY be
1321	        broadened somewhat in scope, changed to add information, or
1322	        adapted to the most common modern usage.  For example, countries
1323	        occasionally change their official names; a historical example
1324	        of this would be "Upper Volta" changing to "Burkina Faso".

1326	   3.   Values in the field 'Prefix' MAY be added to records of type
1327	        'variant' via the registration process.  If a prefix is added to
1328	        a variant record, 'Comments' fields SHOULD be used to explain
1329	        different usages with the various prefixes.

1331	   4.   Values in the field 'Prefix' in records of type 'variant' MAY be
1332	        modified, so long as the modifications broaden the set of
1333	        prefixes.  That is, a prefix MAY be replaced by one of its own
1334	        prefixes.  For example, the prefix "en-US" could be replaced by
1335	        "en", but not by the prefixes "en-Latn", "fr", or "en-US-boont".
1336	        If one of those prefixes were needed, a new Prefix SHOULD be
1337	        registered.

1339	   5.   Values in the field 'Prefix' in records of type 'extlang' MUST
1340	        NOT be modified.

1342	   6.   Values in the field 'Prefix' MUST NOT be removed.

1344	   7.   The field 'Comments' MAY be added, changed, modified, or removed
1345	        via the registration process or any of the processes or
1346	        considerations described in this section.

1348	   8.   The field 'Suppress-Script' MAY be added or removed via the
1349	        registration process.

1351	   9.   Codes assigned by ISO 639-1 that do not conflict with existing
1352	        two-letter primary language subtags and which have no
1353	        corresponding three-letter primary or extended language subtags
1354	        defined in the registry are entered into the IANA registry as
1355	        new records of type 'language'.

1357	   10.  Codes assigned by ISO 639-2 that do not conflict with existing
1358	        three-letter primary or extended language subtags are entered
1359	        into the IANA registry as new records of type 'language'.

1361	   11.  Codes assigned by ISO 639-3 that do not conflict with existing
1362	        three-letter primary or extended language subtags are entered
1363	        into the IANA registry as new records.

1365	        1.  Codes that have a defined "macro-language" mapping at the
1366	            time of their registration MUST be entered into the registry
1367	            as records of type 'extlang' with a 'Prefix' field
1368	            containing the appropriate prefix tag.

1370	        2.  Codes that represent sign languages MUST be entered into the
1371	            registry as records of type 'extlang' with a 'Prefix' field
1372	            that matches the Basic Language Range "sgn" (see Section
1373	            3.3.1 "Basic Filtering" in [RFC4647]).

1375	        3.  All other codes MUST be entered into the registry as records
1376	            of type 'language'.

1378	   12.  A record of type 'language' or 'extlang' MUST NOT be registered
1379	        if there exists a record of either type with the same subtag
1380	        value.  For example, if an 'extlang' subtag 'foo' exists in the
1381	        registry, all attempts to register a 'language' subtag 'foo'
1382	        will be rejected.

1384	   13.  Codes assigned by ISO 15924 and ISO 3166 that do not conflict
1385	        with existing subtags of the associated type and whose meaning
1386	        is not the same as an existing subtag of the same type are
1387	        entered into the IANA registry as new records.

1389	   14.  Codes assigned by ISO 639, ISO 15924, or ISO 3166 that are
1390	        withdrawn by their respective maintenance or registration
1391	        authority remain valid in language tags.  A 'Deprecated' field
1392	        containing the date of withdrawal MUST be added to the record.
1393	        If a new record of the same type is added that represents a
1394	        replacement value, then a 'Preferred-Value' field MAY also be
1395	        added.  The registration process MAY be used to add comments
1396	        about the withdrawal of the code by the respective standard.

1398	        Example  The region code 'TL' was assigned to the country
1399	           'Timor-Leste', replacing the code 'TP' (which was assigned to
1400	           'East Timor' when it was under administration by Portugal).
1401	           The subtag 'TP' remains valid in language tags, but its
1402	           record contains a 'Preferred-Value' of 'TL' and its field
1403	           'Deprecated' contains the date the new code was assigned
1404	           ('2004-07-06').

1406	   15.  Codes assigned by ISO 639, ISO 15924, or ISO 3166 that conflict
1407	        with existing subtags of the associated type, including subtags
1408	        that are deprecated, MUST NOT be entered into the registry.  The
1409	        following additional considerations apply to subtag values that
1410	        are reassigned:

1412	        A.  For ISO 639 codes, if the newly assigned code's meaning is
1413	            not represented by a subtag in the IANA registry, the
1414	            Language Subtag Reviewer, as described in Section 3.5, SHALL
1415	            prepare a proposal for entering in the IANA registry as soon
1416	            as practical a registered language subtag as an alternate
1417	            value for the new code.  The form of the registered language
1418	            subtag will be at the discretion of the Language Subtag
1419	            Reviewer and MUST conform to other restrictions on language
1420	            subtags in this document.

1422	        B.  For all subtags whose meaning is derived from an external
1423	            standard (that is, by ISO 639, ISO 15924, ISO 3166, or UN
1424	            M.49), if a new meaning is assigned to an existing code and
1425	            the new meaning broadens the meaning of that code, then the
1426	            meaning for the associated subtag MAY be changed to match.
1427	            The meaning of a subtag MUST NOT be narrowed, however, as
1428	            this can result in an unknown proportion of the existing
1429	            uses of a subtag becoming invalid.  Note: ISO 639
1430	            maintenance agency/registration authority (MA/RA) has
1431	            adopted a similar stability policy.

1433	        C.  For ISO 15924 codes, if the newly assigned code's meaning is
1434	            not represented by a subtag in the IANA registry, the
1435	            Language Subtag Reviewer, as described in Section 3.5, SHALL
1436	            prepare a proposal for entering in the IANA registry as soon
1437	            as practical a registered variant subtag as an alternate
1438	            value for the new code.  The form of the registered variant
1439	            subtag will be at the discretion of the Language Subtag
1440	            Reviewer and MUST conform to other restrictions on variant
1441	            subtags in this document.

1443	        D.  For ISO 3166 codes, if the newly assigned code's meaning is
1444	            associated with the same UN M.49 code as another 'region'
1445	            subtag, then the existing region subtag remains as the
1446	            preferred value for that region and no new entry is created.
1447	            A comment MAY be added to the existing region subtag
1448	            indicating the relationship to the new ISO 3166 code.

1450	        E.  For ISO 3166 codes, if the newly assigned code's meaning is
1451	            associated with a UN M.49 code that is not represented by an
1452	            existing region subtag, then the Language Subtag Reviewer,
1453	            as described in Section 3.5, SHALL prepare a proposal for
1454	            entering the appropriate UN M.49 country code as an entry in
1455	            the IANA registry.

1457	        F.  For ISO 3166 codes, if there is no associated UN numeric
1458	            code, then the Language Subtag Reviewer SHALL petition the
1459	            UN to create one.  If there is no response from the UN
1460	            within ninety days of the request being sent, the Language
1461	            Subtag Reviewer SHALL prepare a proposal for entering in the
1462	            IANA registry as soon as practical a registered variant
1463	            subtag as an alternate value for the new code.  The form of
1464	            the registered variant subtag will be at the discretion of
1465	            the Language Subtag Reviewer and MUST conform to other
1466	            restrictions on variant subtags in this document.  This
1467	            situation is very unlikely to ever occur.

1469	   16.  UN M.49 has codes for both countries and areas (such as '276'
1470	        for Germany) and geographical regions and sub-regions (such as
1471	        '150' for Europe).  UN M.49 country or area codes for which
1472	        there is no corresponding ISO 3166 code SHOULD NOT be
1473	        registered, except as a surrogate for an ISO 3166 code that is
1474	        blocked from registration by an existing subtag.  If such a code
1475	        becomes necessary, then the registration authority for ISO 3166
1476	        SHOULD first be petitioned to assign a code to the region.  If
1477	        the petition for a code assignment by ISO 3166 is refused or not
1478	        acted on in a timely manner, the registration process described
1479	        in Section 3.5 MAY then be used to register the corresponding UN
1480	        M.49 code.  This way, UN M.49 codes remain available as the
1481	        value of last resort in cases where ISO 3166 reassigns a
1482	        deprecated value in the registry.

1484	   17.  Stability provisions apply to grandfathered tags with this
1485	        exception: should it be possible to compose one of the
1486	        grandfathered tags from registered subtags, then the field
1487	        'Type' in that record is changed from 'grandfathered' to
1488	        'redundant'.  Note that this will not affect language tags that
1489	        match the grandfathered tag, since these tags will now match
1490	        valid generative subtag sequences.  For example, this document
1491	        caused the ISO 639-3 code 'gan', used in the redundant tag "zh-
1492	        gan", to be registered as an extended language subtag.  The
1493	        formerly-grandfathered tag "zh-gan" became a redundant tag as a
1494	        result (but existing content or implementations that use "zh-
1495	        gan" remain valid).
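
   The field-level rules in items 1, 5, and 6 lend themselves to a
   mechanical check.  The sketch below is non-normative and covers only
   those three items.  It assumes a record is held as a dictionary with
   'Prefix' stored as a list, and it treats adding a previously absent
   field (for example 'Deprecated', per item 14) as permitted; the
   function name and record layout are hypothetical.

```python
IMMUTABLE_FIELDS = ("Type", "Subtag", "Tag", "Added",
                    "Deprecated", "Preferred-Value")


def stability_violations(old: dict, new: dict) -> list[str]:
    """Compare an existing record with a proposed replacement."""
    problems = []
    # Item 1: once present, these values must not change.
    for name in IMMUTABLE_FIELDS:
        if name in old and new.get(name) != old[name]:
            problems.append(name + " changed (item 1)")
    old_prefixes = old.get("Prefix", [])
    new_prefixes = new.get("Prefix", [])
    # Item 6: a 'Prefix' field must not be removed.
    if len(new_prefixes) < len(old_prefixes):
        problems.append("Prefix removed (item 6)")
    # Item 5: 'Prefix' in an 'extlang' record must not be modified.
    if old.get("Type") == "extlang" and new_prefixes != old_prefixes:
        problems.append("extlang Prefix modified (item 5)")
    return problems


existing = {"Type": "variant", "Subtag": "nedis",
            "Added": "2003-10-09", "Prefix": ["sl"]}
proposed = dict(existing, Added="2005-01-02")
print(stability_violations(existing, proposed))
```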

1497	3.5.  Registration Procedure for Subtags

1499	   The procedure given here MUST be used by anyone who wants to use a
1500	   subtag not currently in the IANA Language Subtag Registry.

1502	   Only subtags of type 'language' and 'variant' will be considered for
1503	   independent registration of new subtags.  Subtags needed for
1504	   stability and subtags necessary to keep the registry synchronized
1505	   with ISO 639, ISO 15924, ISO 3166, and UN M.49 within the limits
1506	   defined by this document also use this process, as described in
1507	   Section 3.3.  Stability provisions are described in Section 3.4.

1509	   This procedure MAY also be used to register or alter the information
1510	   for the 'Description', 'Comments', 'Deprecated', 'Prefix', or
1511	   'Suppress-Script' fields in a subtag's record as described in
1512	   Section 3.4.  Changes to all other fields in the IANA registry are
1513	   NOT permitted.

1515	   Registering a new subtag or requesting modifications to an existing
1516	   tag or subtag starts with the requester filling out the registration
1517	   form reproduced below.  Note that each response is not limited in
1518	   size so that the request can adequately describe the registration.
1519	   The fields in the "Record Requested" section SHOULD follow the
1520	   requirements in Section 3.1.

1522	   LANGUAGE SUBTAG REGISTRATION FORM
1523	   1. Name of requester:
1524	   2. E-mail address of requester:
1525	   3. Record Requested:

1527	      Type:
1528	      Subtag:
1529	      Description:
1530	      Prefix:
1531	      Preferred-Value:
1532	      Deprecated:
1533	      Suppress-Script:
1534	      Comments:

1536	   4. Intended meaning of the subtag:
1537	   5. Reference to published description
1538	      of the language (book or article):
1539	   6. Any other relevant information:

1541	              Figure 5: The Language Subtag Registration Form

1543	   The subtag registration form MUST be sent to
1544	   <ietf-languages@iana.org> for a two-week review period before it can
1545	   be submitted to IANA.  If modifications are made to the request
1546	   during the course of the registration process (such as corrections to
1547	   meet the requirements in Section 3.1) the corrected form MUST also be
1548	   sent to <ietf-languages@iana.org> prior to submission to IANA.

1550	   The ietf-languages list is an open list and can be joined by sending
1551	   a request to <ietf-languages-request@iana.org>.  The list can be
1552	   hosted by IANA or by any third party at the request of the IESG.

1554	   Variant subtags are usually registered for use with a particular
1555	   range of language tags.  For example, the subtag 'rozaj' is intended
1556	   for use with language tags that start with the primary language
1557	   subtag "sl", since Resian is a dialect of Slovenian.  Thus, the
1558	   subtag 'rozaj' would be appropriate in tags such as "sl-Latn-rozaj"
1559	   or "sl-IT-rozaj".  This information is stored in the 'Prefix' field
1560	   in the registry.  Variant registration requests SHOULD include at
1561	   least one 'Prefix' field in the registration form.

1563	   Extended language subtags MUST include exactly one 'Prefix' field.
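
   The two 'Prefix' requirements just stated differ in strength, which
   a form checker could reflect.  The following is a non-normative
   sketch; the function name and the "error"/"warning" labels are
   assumptions chosen to mirror MUST and SHOULD respectively.

```python
def prefix_count_issues(record_type: str,
                        prefixes: list[str]) -> list[tuple[str, str]]:
    """Return (severity, message) pairs for the 'Prefix' fields of a form."""
    issues = []
    if record_type == "extlang" and len(prefixes) != 1:
        # MUST: exactly one 'Prefix' field.
        issues.append(("error", "extlang needs exactly one Prefix"))
    elif record_type == "variant" and not prefixes:
        # SHOULD: at least one 'Prefix' field.
        issues.append(("warning", "variant should have a Prefix"))
    return issues


print(prefix_count_issues("variant", ["sl"]))
print(prefix_count_issues("variant", []))
print(prefix_count_issues("extlang", []))
```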

1565	   The 'Prefix' field for a given registered subtag exists in the IANA
1566	   registry as a guide to usage.  Additional prefixes MAY be added by
1567	   filing an additional registration form.  In that form, the "Any other
1568	   relevant information:" field MUST indicate that it is the addition of
1569	   a prefix.

1571	   Requests to add a prefix to a variant subtag that imply a different
1572	   semantic meaning will probably be rejected.  For example, a request
1573	   to add the prefix "de" to the subtag 'nedis' so that the tag "de-
1574	   nedis" represented some German dialect would be rejected.  The
1575	   'nedis' subtag represents a particular Slovenian dialect and the
1576	   additional registration would change the semantic meaning assigned to
1577	   the subtag.  A separate subtag SHOULD be proposed instead.

1579	   The 'Description' field MUST contain a description of the tag being
1580	   registered written or transcribed into the Latin script; it MAY also
1581	   include a description in a non-Latin script.  Non-ASCII characters
1582	   MUST be escaped using the syntax described in Section 3.1.  The
1583	   'Description' field is used for identification purposes; it doesn't
1584	   necessarily represent the actual native name of the language or
1585	   variation, nor is it required to be in any particular language.

1587	   While the 'Description' field itself is not guaranteed to be stable
1588	   and errata corrections MAY be undertaken from time to time, attempts
1589	   to provide translations or transcriptions of entries in the registry
1590	   itself will probably be frowned upon by the community or rejected
1591	   outright, as changes of this nature have an impact on the provisions
1592	   in Section 3.4.

1594	   When the two-week period has passed, the Language Subtag Reviewer
1595	   MUST take one of the following actions:

1597	   o  Explicitly accept the request and forward the form containing the
1598	      record to be inserted or modified to iana@iana.org according to
1599	      the procedure described in Section 3.3.

1601	   o  Explicitly reject the request because of significant objections
1602	      raised on the list or due to problems with constraints in this
1603	      document (which MUST be explicitly cited).

1605	   o  Extend the review period by granting an additional two-week
1606	      increment to permit further discussion.  After each two-week
1607	      increment, the Language Subtag Reviewer MUST indicate on the list
1608	      whether the registration has been accepted, rejected, or extended.

1610	   Note that the Language Subtag Reviewer MAY raise objections on the
1611	   list if he or she so desires.  The important thing is that the
1612	   objection MUST be made publicly.

1614	   Sometimes the request needs to be modified as a result of discussion
1615	   during the review period or due to requirements in this document.
1616	   The applicant, Language Subtag Reviewer, or others are free to submit
1617	   a modified version of the completed registration form, which will be
1618	   considered in lieu of the original request with the explicit approval
1619	   of the applicant.  Such changes do not restart the two-week
1620	   discussion period, although an application containing the final
1621	   record submitted to IANA MUST appear on the list at least one week
1622	   prior to the Language Subtag Reviewer forwarding the record to IANA.
1623	   The applicant is also free to modify a rejected application with
1624	   additional information and submit it again; this starts a new two-
1625	   week comment period.
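
   The timing constraints above (a two-week review period, optional
   two-week extensions, and at least one week on the list for the final
   record) can be combined into a single date calculation.  This is a
   non-normative sketch: the function and parameter names are
   hypothetical, and dates are treated as whole calendar days.

```python
from datetime import date, timedelta

REVIEW_INCREMENT = timedelta(weeks=2)     # initial period and each extension
FINAL_RECORD_NOTICE = timedelta(weeks=1)  # final record must be on the list


def earliest_forward_date(first_posted: date,
                          final_record_posted: date,
                          extensions: int = 0) -> date:
    """Earliest date on which the record could be forwarded to IANA.

    first_posted: date the form was first sent to the list.
    final_record_posted: date the final version appeared on the list.
    extensions: number of two-week extensions granted so far.
    """
    review_end = first_posted + REVIEW_INCREMENT * (1 + extensions)
    return max(review_end, final_record_posted + FINAL_RECORD_NOTICE)


# A form posted on 2005-01-02 and revised on 2005-01-12, no extensions.
print(earliest_forward_date(date(2005, 1, 2), date(2005, 1, 12)))
```

   Note that a modified form does not restart the review period in this
   calculation; it only moves the date if the one-week notice for the
   final record would otherwise be cut short.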

1627	   Registrations initiated due to the provisions of Section 3.3 or
1628	   Section 3.4 SHALL NOT be rejected altogether (since they have to
1629	   ultimately appear in the registry) and SHOULD be completed as quickly
1630	   as possible.  The review process allows list members to comment on
1631	   the specific information in the form and the record it contains and
1632	   thus help ensure that it is correct and consistent.  The Language
1633	   Subtag Reviewer MAY reject a specific version of the form, but MUST
1634	   include in the rejection a suitable replacement, extending the review
1635	   period as described above, until the form is in a format worthy of
1636	   the reviewer's approval.

1638	   Decisions made by the Language Subtag Reviewer MAY be appealed to the
1639	   IESG [RFC2028] under the same rules as other IETF decisions
1640	   [RFC2026].  This includes a decision to extend the review period or
1641	   the failure to announce a decision in a clear and timely manner.

1643	   The approved records appear in the Language Subtag Registry.  The
1644	   approved registration forms are available online under
1645	   http://www.iana.org/assignments/lang-subtags-templates/.

1647	   Updates or changes to existing records follow the same procedure as
1648	   new registrations.  The Language Subtag Reviewer decides whether
1649	   there is consensus to update the registration following the two-week
1650	   review period; normally, objections by the original registrant will
1651	   carry extra weight in forming such a consensus.

1653	   Registrations are permanent and stable.  Once registered, subtags
1654	   will not be removed from the registry and will remain a valid way in
1655	   which to specify a specific language or variant.

1657	   Note: The purpose of the "Reference to published description" section
1658	   in the registration form is to aid in verifying whether a language is
1659	   registered or what language or language variation a particular subtag
1660	   refers to.  In most cases, reference to an authoritative grammar or
1661	   dictionary of that language will be useful; in cases where no such
1662	   work exists, other well-known works describing that language or in
1663	   that language MAY be appropriate.  The Language Subtag Reviewer
1664	   decides what constitutes "good enough" reference material.  This
1665	   requirement is not intended to exclude particular languages or
1666	   dialects due to the size of the speaker population or lack of a
1667	   standardized orthography.  Minority languages will be considered
1668	   equally on their own merits.

1670	3.6.  Possibilities for Registration

1672	   Possibilities for registration of subtags or information about
1673	   subtags include:

1675	   o  Primary language subtags for languages not listed in ISO 639 that
1676	      are not variants of any listed or registered language MAY be
1677	      registered.  At the time this document was created, there were no
1678	      examples of this form of subtag.  Before attempting to register a
1679	      language subtag, there MUST be an attempt to register the language
1680	      with ISO 639.  Subtags MUST NOT be registered for languages
1681	      defined by codes that exist in ISO 639-1, ISO 639-2, or ISO 639-3,
1682	      or that are under consideration by the ISO 639 registration
1683	      authorities, or that have never been attempted for registration
1684	      with those authorities.  If ISO 639 has previously rejected a
1685	      language for registration, it is reasonable to assume that there
1686	      must be additional, very compelling evidence of need before it
1687	      will be registered as a primary language subtag in the IANA
1688	      registry (to the extent that it is very unlikely that any subtags
1689	      of this type will be registered).

1691	   o  Dialect or other divisions or variations within a language, its
1692	      orthography, writing system, regional or historical usage,
1693	      transliteration or other transformation, or distinguishing
1694	      variation MAY be registered as variant subtags.  An example is the
1695	      'rozaj' subtag (the Resian dialect of Slovenian).

1697	   o  The addition or maintenance of fields (generally of an
1698	      informational nature) in Tag or Subtag records as described in
1699	      Section 3.1 and subject to the stability provisions in
1700	      Section 3.4.  This includes descriptions, comments, deprecation
1701	      and preferred values for obsolete or withdrawn codes, or the
1702	      addition of script or extlang information to primary language
1703	      subtags.

1705	   o  The addition of records and related field value changes necessary
1706	      to reflect assignments made by ISO 639, ISO 15924, ISO 3166, and
1707	      UN M.49 as described in Section 3.4.

1709	   Subtags proposed for registration that would cause all or part of a
1710	   grandfathered tag to become redundant but whose meaning conflicts
1711	   with or alters the meaning of the grandfathered tag MUST be rejected.

1713	   This document leaves the decision on what subtags or changes to
1714	   subtags are appropriate (or not) to the registration process
1715	   described in Section 3.5.

1717	   Note: four-character primary language subtags are reserved to allow
1718	   for the possibility of alpha4 codes in some future addition to the
1719	   ISO 639 family of standards.

1721	   ISO 639 defines a maintenance agency for additions to and changes in
1722	   the list of languages in ISO 639.  This agency is:

1724	   International Information Centre for Terminology (Infoterm)
1725	   Aichholzgasse 6/12, AT-1120
1726	   Wien, Austria
1727	   Phone: +43 1 26 75 35 Ext. 312 Fax: +43 1 216 32 72

1729	   ISO 639-2 defines a maintenance agency for additions to and changes
1730	   in the list of languages in ISO 639-2.  This agency is:

1732	   Library of Congress
1733	   Network Development and MARC Standards Office
1734	   Washington, D.C. 20540 USA
1735	   Phone: +1 202 707 6237 Fax: +1 202 707 0115
1736	   URL: http://www.loc.gov/standards/iso639-2

1738	   ISO 639-3 defines a maintenance agency for additions to and changes
1739	   in the list of languages in ISO 639-3.  This agency is:

1741	   SIL International
1742	   ISO 639-3 Registrar
1743	   7500 W. Camp Wisdom Rd.
1744	   Dallas, TX 75236 USA
1745	   Phone: +1 972 708 7400, ext. 2293 Fax: +1 972 708 7546
1746	   Email: iso639-3@sil.org
1747	   URL: http://www.sil.org/iso639-3

1749	   The maintenance agency for ISO 3166 (country codes) is:

1751	   ISO 3166 Maintenance Agency
1752	   c/o International Organization for Standardization
1753	   Case postale 56
1754	   CH-1211 Geneva 20 Switzerland
1755	   Phone: +41 22 749 72 33 Fax: +41 22 749 73 49
1756	   URL: http://www.iso.org/iso/en/prods-services/iso3166ma/index.html

1758	   The registration authority for ISO 15924 (script codes) is:

1760	   Unicode Consortium Box 391476
1761	   Mountain View, CA 94039-1476, USA
1762	   URL: http://www.unicode.org/iso15924

1764	   The Statistics Division of the United Nations Secretariat maintains
1765	   the Standard Country or Area Codes for Statistical Use and can be
1766	   reached at:

1768	   Statistical Services Branch
1769	   Statistics Division
1770	   United Nations, Room DC2-1620
1771	   New York, NY 10017, USA

1773	   Fax: +1-212-963-0623
1774	   E-mail: statistics@un.org
1775	   URL: http://unstats.un.org/unsd/methods/m49/m49alpha.htm

1777	3.7.  Extensions and Extensions Registry

1779	   Extension subtags are those introduced by single-character subtags
1780	   ("singletons") other than 'x'.  They are reserved for the generation
1781	   of identifiers that contain a language component and are compatible
1782	   with applications that understand language tags.

1784	   The structure and form of extensions are defined by this document so
1785	   that implementations can be created that are forward compatible with
1786	   applications that might be created using singletons in the future.

1788	   In addition, defining a mechanism for maintaining singletons will
1789	   lend stability to this document by reducing the likely need for
1790	   future revisions or updates.

1792	   Single-character subtags are assigned by IANA using the "IETF
1793	   Consensus" policy defined by [RFC2434].  This policy requires the
1794	   development of an RFC, which SHALL define the name, purpose,
1795	   processes, and procedures for maintaining the subtags.  The
1796	   maintaining or registering authority, including name, contact email,
1797	   discussion list email, and URL location of the registry, MUST be
1798	   indicated clearly in the RFC.  The RFC MUST specify or include each
1799	   of the following:

1801	   o  The specification MUST reference the specific version or revision
1802	      of this document that governs its creation and MUST reference this
1803	      section of this document.

1805	   o  The specification and all subtags defined by the specification
1806	      MUST follow the ABNF and other rules for the formation of tags and
1807	      subtags as defined in this document.  In particular, it MUST
1808	      specify that case is not significant and that subtags MUST NOT
1809	      exceed eight characters in length.

1811	   o  The specification MUST specify a canonical representation.

1813	   o  The specification of valid subtags MUST be available over the
1814	      Internet and at no cost.

1816	   o  The specification MUST be in the public domain or available via a
1817	      royalty-free license acceptable to the IETF and specified in the
1818	      RFC.

1820	   o  The specification MUST be versioned, and each version of the
1821	      specification MUST be numbered, dated, and stable.

1823	   o  The specification MUST be stable.  That is, extension subtags,
1824	      once defined by a specification, MUST NOT be retracted or change
1825	      in meaning in any substantial way.

1827	   o  The specification MUST include in a separate section the
1828	      registration form reproduced in this section (below) to be used in
1829	      registering the extension upon publication as an RFC.

1831	   o  IANA MUST be informed of changes to the contact information and
1832	      URL for the specification.

1834	   IANA will maintain a registry of allocated single-character
1835	   (singleton) subtags.  This registry MUST use the record-jar format
1836	   described by the ABNF in Section 3.1.  Upon publication of an
1837	   extension as an RFC, the maintaining authority defined in the RFC
1838	   MUST forward this registration form to iesg@ietf.org, who MUST
1839	   forward the request to iana@iana.org.  The maintaining authority of
1840	   the extension MUST maintain the accuracy of the record by sending an
1841	   updated full copy of the record to iana@iana.org with the subject
1842	   line "LANGUAGE TAG EXTENSION UPDATE" whenever content changes.  Only
1843	   the 'Comments', 'Contact_Email', 'Mailing_List', and 'URL' fields MAY
1844	   be modified in these updates.

1846	   Failure to maintain this record, maintain the corresponding registry,
1847	   or meet other conditions imposed by this section of this document MAY
1848	   be appealed to the IESG [RFC2028] under the same rules as other IETF
1849	   decisions (see [RFC2026]) and MAY result in the authority to maintain
1850	   the extension being withdrawn or reassigned by the IESG.
1851	   %%
1852	   Identifier:
1853	   Description:
1854	   Comments:
1855	   Added:
1856	   RFC:
1857	   Authority:
1858	   Contact_Email:
1859	   Mailing_List:
1860	   URL:
1861	   %%

1863	    Figure 6: Format of Records in the Language Tag Extensions Registry

1865	   'Identifier' contains the single-character subtag (singleton)
1866	   assigned to the extension.  The Internet-Draft submitted to define
1867	   the extension SHOULD specify which letter or digit to use, although
1868	   the IESG MAY change the assignment when approving the RFC.

1870	   'Description' contains the name and description of the extension.

1872	   'Comments' is an OPTIONAL field and MAY contain a broader description
1873	   of the extension.

1875	   'Added' contains the date the RFC was published in the "full-date"
1876	   format specified in [RFC3339].  For example: 2004-06-28 represents
1877	   June 28, 2004, in the Gregorian calendar.

1879	   'RFC' contains the RFC number assigned to the extension.

1881	   'Authority' contains the name of the maintaining authority for the
1882	   extension.

1884	   'Contact_Email' contains the email address used to contact the
1885	   maintaining authority.

1887	   'Mailing_List' contains the URL or subscription email address of the
1888	   mailing list used by the maintaining authority.

1890	   'URL' contains the URL of the registry for this extension.
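
   The record layout in Figure 6 can be read with a few lines of code.
   The following Python sketch is illustrative only.  It assumes one
   'Name: value' pair per line and does not handle folded (continued)
   lines.  The function names and every value in the sample record are
   hypothetical and do not describe any registered extension.

```python
# Fields that the text above allows to change in an update.
MODIFIABLE = {"Comments", "Contact_Email", "Mailing_List", "URL"}


def parse_extension_record(text):
    """Parse one '%%'-delimited record into a dict of field name -> value."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line == "%%":
            continue  # skip blank lines and record separators
        # Split on the first ':' only, so URLs in the value stay intact.
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a 'Name: value' line: {line!r}")
        fields[name.strip()] = value.strip()
    return fields


def illegal_changes(old, new):
    """Return the sorted names of changed fields that may not be modified."""
    names = set(old) | set(new)
    return sorted(
        n for n in names if old.get(n) != new.get(n) and n not in MODIFIABLE
    )


# Hypothetical record; all values are invented for illustration.
SAMPLE = """\
%%
Identifier: r
Description: Example extension
Comments: Invented for illustration only
Added: 2004-06-28
RFC: 0000
Authority: Example Authority
Contact_Email: contact@example.org
Mailing_List: list@example.org
URL: http://example.org/registry
%%
"""

record = parse_extension_record(SAMPLE)
```

   A maintaining authority could run a check such as illegal_changes()
   on the old and new copies of a record before sending an update.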

1892	   The determination of whether an Internet-Draft meets the above
1893	   conditions and the decision to grant or withhold such authority rests
1894	   solely with the IESG and is subject to the normal review and appeals
1895	   process associated with the RFC process.

1897	   Extension authors are strongly cautioned that many (including most
1898	   well-formed) processors will be unaware of any special relationships
1899	   or meaning inherent in the order of extension subtags.  Extension
1900	   authors SHOULD avoid subtag relationships or canonicalization
1901	   mechanisms that interfere with matching or with length restrictions
1902	   that sometimes exist in common protocols where the extension is used.
1903	   In particular, applications MAY truncate the subtags in doing
1904	   matching or in fitting into limited lengths, so it is RECOMMENDED
1905	   that the most significant information be in the most significant
1906	   (left-most) subtags and that the specification gracefully handle
1907	   truncated subtags.

1909	   When a language tag is to be used in a specific, known, protocol, it
1910	   is RECOMMENDED that the language tag not contain extensions not
1911	   supported by that protocol.  In addition, note that some protocols
1912	   MAY impose upper limits on the length of the strings used to store or
1913	   transport the language tag.

1915	3.8.  Update of the Language Subtag Registry

1917	   Upon adoption of this document the IANA Language Subtag Registry will
1918	   need an update so that it contains the complete set of subtags valid
1919	   in a language tag.  This collection of subtags, along with a
1920	   description of the process used to create it, is described by
1921	   [registry-update].  IANA will publish the updated version of the
1922	   registry described by this document using the instructions and
1923	   content of [registry-update].  Once published by IANA, the
1924	   maintenance procedures, rules, and registration processes described
1925	   in this document will be available for new registrations or updates.

1927	   Registrations that are in process under the rules defined in
1928	   [RFC4646] when this document is adopted MUST be completed under the
1929	   rules contained in this document.

1931	4.  Formation and Processing of Language Tags

1933	   This section addresses how to use the information in the registry
1934	   with the tag syntax to choose, form, and process language tags.

1936	4.1.  Choice of Language Tag

1938	   The guiding principle in forming language tags is to "tag content
1939	   wisely."  This means that sometimes there is a choice between several
1940	   possible tags for the same content and that the choice of which tag
1941	   to use depends on the content and application in question.

1943	   Interoperability is best served when the same language tag is used
1944	   consistently to represent the same language.  If an application has
1945	   requirements that make the rules here inapplicable, then that
1946	   application risks damaging interoperability.  It is strongly
1947	   RECOMMENDED that users not define their own rules for language tag
1948	   choice.

1950	   A subtag SHOULD only be used when it adds useful distinguishing
1951	   information to the tag.  Extraneous subtags interfere with the
1952	   meaning, understanding, and processing of language tags.  In
1953	   particular, users and implementations SHOULD follow the 'Prefix' and
1954	   'Suppress-Script' fields in the registry (defined in Section 3.1):
1955	   these fields provide guidance on when specific additional subtags
1956	   SHOULD be used or avoided in a language tag.

1958	   In particular, some applications can benefit from the use of script
1959	   subtags in language tags, as long as the use is consistent for a
1960	   given context.  Script subtags are never appropriate for unwritten
1961	   content (such as audio recordings).

1963	   Script subtags were not formally defined in [RFC3066] and their use
1964	   can affect matching and subtag identification for implementations of
1965	   RFC 3066, as these subtags appear between the primary language and
1966	   region subtags.  For example, if an implementation selects content
1967	   using Basic Filtering [RFC4647] (originally described in Section 2.5
1968	   of [RFC3066]) and the user requested the language range "en-US",
1969	   content labeled "en-Latn-US" will not match the request and thus not
1970	   be selected.  Therefore, it is important to know when script subtags
1971	   will customarily be used and when they ought not be used.  In the
1972	   registry, the Suppress-Script field helps ensure greater
1973	   compatibility between the language tags by defining when users SHOULD
1974	   NOT include a script subtag with a particular primary language
1975	   subtag.
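
   The matching failure described above can be reproduced with a short
   Python sketch of prefix-based filtering.  This is a simplified
   reading of Basic Filtering, not a complete implementation of
   [RFC4647]; the function name is hypothetical.

```python
def basic_filter_match(language_range, tag):
    """Return True if 'tag' matches 'language_range' by prefix comparison.

    A range matches when it equals the tag, or when it is a prefix of
    the tag and the next character in the tag is '-'.  Comparison is
    case-insensitive.  The wildcard range '*' matches every tag.
    """
    rng = language_range.lower()
    candidate = tag.lower()
    if rng == "*":
        return True
    return candidate == rng or candidate.startswith(rng + "-")


# The script subtag sits between 'en' and 'US', so the range "en-US"
# is not a prefix of "en-Latn-US".
matches_with_script = basic_filter_match("en-US", "en-Latn-US")
matches_without_script = basic_filter_match("en-US", "en-US")
```

   Here matches_with_script is False and matches_without_script is
   True, which is why consistent use or suppression of the script
   subtag matters for this kind of matching.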

1977	   Extended language subtags (type 'extlang' in the registry; see
1978	   Section 3.1) also appear between the primary language and subsequent
1979	   (script, region, or variant) subtags.  Applications sometimes benefit
1980	   from their judicious use in forming language tags.

1982	   Standards, protocols, and applications that reference this document
1983	   normatively but apply different rules to the ones given in this
1984	   section MUST specify how language tag selection varies from the
1985	   guidelines given here.

1987	   The choice of subtags used to form a language tag SHOULD be guided by
1988	   the following rules:

1990	   1.  Use as precise a tag as possible, but no more specific than is
1991	       justified.  Avoid using subtags that are not important for
1992	       distinguishing content in an application.

1994	       *  For example, 'de' might suffice for tagging an email written
1995	          in German, while "de-CH-1996" is probably unnecessarily
1996	          precise for such a task.

1998	   2.  The script subtag SHOULD NOT be used to form language tags unless
1999	       the script adds some distinguishing information to the tag.  The
2000	       field 'Suppress-Script' in the primary language record in the
2001	       registry indicates script subtags that do not add distinguishing
2002	       information for most applications.  For example:

2004	       *  The subtag 'Latn' should not be used with the primary language
2005	          'en' because nearly all English documents are written in the
2006	          Latin script and it adds no distinguishing information.
2007	          However, if a document were written in English mixing Latin
2008	          script with another script such as Braille ('Brai'), then it
2009	          might be appropriate to choose to indicate both scripts to aid
2010	          in content selection, such as the application of a style
2011	          sheet.

2013	       *  When labeling content that is unwritten (such as a recording
2014	          of human speech), the script subtag should not be used, even
2015	          if the language is customarily written in several scripts.
2016	          Thus the subtitles to a movie might use the tag "zh-cmn-Hant"
2017	          (Chinese, Mandarin, Traditional script), but the audio track
2018	          for the same language would be tagged "zh-cmn".

2020	   3.  If a tag or subtag has a 'Preferred-Value' field in its registry
2021	       entry, then the value of that field SHOULD be used to form the
2022	       language tag in preference to the tag or subtag in which the
2023	       preferred value appears.

2025	       *  For example, use 'he' for Hebrew in preference to 'iw'.

2027	   4.  [ISO639-2] has defined several codes included in the subtag
2028	       registry that require additional care when choosing language
2029	       tags.  In most of these cases, where omitting the language tag is
2030	       permitted, such omission is preferable to using these codes.
2031	       Language tags SHOULD NOT incorporate these subtags as a prefix,
2032	       unless the additional information conveys some value to the
2033	       application.

2035	       1.  Use specific language subtags or subtag sequences in
2036	           preference to subtags for language collections.  A "language
2037	           collection" is a subtag derived from one of the [ISO639-2]
2038	           codes that represents multiple related languages.  These
2039	           codes are included as primary language subtags in the
2040	           registry.  For example, the code 'cmc' represents "Chamic
2041	           languages".  The registry contains values for each of the
2042	           approximately ten individual languages represented by this
2043	           collective code.  Some other examples include the subtags for
2044	           Germanic ('gem') or Algonquian languages ('alg').  Since
2045	           these codes are interpreted inclusively, content tagged with
2046	           "en" (English), "de" (German), or "gsw" (Swiss German,
2047	           Alemannic) could also (but SHOULD NOT) be tagged with "gem"
2048	           (Germanic languages).  Subtags derived from collection codes
2049	           SHOULD NOT be used unless more specific language
2050	           information is not available.  Note that matching
2051	           implementations generally do not understand the relationship
2052	           between the collection and its encompassed languages, and so
2053	           users ought not assume a subtag based on a language
2054	           collection is a useful means for selecting content in its
2055	           encompassed languages.

2057	       2.  The 'mul' (Multiple) primary language subtag is intended to
2058	           identify content in multiple languages.  It SHOULD NOT be
2059	           used when a list of languages (such as Content-Language) or
2060	           individual tags for each content element can be used instead.

2062	       3.  The 'und' (Undetermined) primary language subtag is intended
2063	           to identify linguistic content whose language is not known.
2064	           It SHOULD NOT be used unless a language tag is required and
2065	           language information is not available or cannot be
2066	           determined.  Omitting the language tag (where permitted) is
2067	           preferred.  The 'und' subtag MAY be useful for protocols that
2068	           require a language tag to be provided or where a primary
2069	           language subtag is required (such as in "und-Latn").  The
2070	           'und' subtag MAY also be useful when matching language tags
2071	           in certain situations.

2073	       4.  The 'zxx' (Non-Linguistic) primary language subtag is
2074	           intended to identify content that has no language.  Some
2075	           examples might include instrumental or electronic music;
2076	           sound recordings consisting of nonverbal sounds; audiovisual
2077	           materials with no narration, printed titles, or subtitles;
2078	           machine-readable data files consisting of machine languages
2079	           or character codes; or programming source code.  Note: where
2080	           there are fragments of linguistic content, such as
2081	           programming source code containing comments written in
2082	           English, the subtag 'zxx' might still be used to indicate the
2083	           primary status of the content, just as 'en' can be applied to
2084	           a predominantly English text that contains a few French
2085	           phrases.

2087	       5.  The 'mis' (Miscellaneous) primary language subtag is derived
2088	           from a collective code and is used to identify linguistic
2089	           content whose language is known but cannot otherwise be
2090	           identified.  It is commonly used when the range of language
2091	           tags is constrained or for languages not otherwise
2092	           categorized.  For example, a library application might be
2093	           limited to the set of subtags defined for use by the [MARC21]
2094	           standard.  The 'mis' subtag might be used by this application
2095	           for languages not included in that set.  It SHOULD NOT be
2096	           used unless a language tag is required and no other means of
2097	           identifying the language is available.

2099	       6.  The grandfathered tag "i-default" (Default Language) was
2100	           originally registered according to [RFC1766] to meet the
2101	           needs of [RFC2277].  It does not indicate a specific
2102	           language; rather, it identifies the condition or content
2103	           used where the language preferences of the user cannot be
2104	           established.  It SHOULD NOT be used except as a means of
2105	           labeling the default content for applications or protocols
2106	           that require default language content to be labeled with that
2107	           specific tag.  It MAY also be used by an application or
2108	           protocol to identify when the default language content is
2109	           being returned.

2111	   5.  The same variant subtag MUST NOT be used more than once within a
2112	       language tag.

2114	       *  For example, the tag "de-DE-1901-1901" is not valid.
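
   Rule 5 lends itself to a mechanical check.  The Python sketch below
   is one possible approach, assuming the input is a well-formed,
   non-grandfathered tag.  It recognizes variants purely by their shape
   (five to eight alphanumerics, or four characters beginning with a
   digit) and stops at the first singleton.  The function name is
   hypothetical.

```python
import re

# Shape of a variant subtag: 5 to 8 alphanumerics, or a digit followed
# by 3 alphanumerics.  Input is lowercased before matching.
_VARIANT = re.compile(r"[0-9a-z]{5,8}|[0-9][0-9a-z]{3}")


def repeated_variants(tag):
    """Return variant subtags that occur more than once in 'tag'."""
    seen = set()
    repeated = []
    for subtag in tag.lower().split("-")[1:]:  # skip the primary language
        if len(subtag) == 1:
            break  # extensions and private use start here; not variants
        if _VARIANT.fullmatch(subtag):
            if subtag in seen and subtag not in repeated:
                repeated.append(subtag)
            seen.add(subtag)
    return repeated


duplicates = repeated_variants("de-DE-1901-1901")
```

   For the invalid tag in the example above, duplicates is ['1901'].
   An empty list means that no variant is repeated.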

2116	   To ensure consistent backward compatibility, this document contains
2117	   several provisions to account for potential instability in the
2118	   standards used to define the subtags that make up language tags.
2119	   These provisions mean that no language tag created under the rules in
2120	   this document will become invalid.

2122	4.2.  Meaning of the Language Tag

2124	   The relationship between the tag and the information it relates to is
2125	   defined by the context in which the tag appears.  Accordingly, this
2126	   section gives only possible examples of its usage.

2128	   o  For a single information object, the associated language tags
2129	      might be interpreted as the set of languages that is necessary for
2130	      a complete comprehension of the complete object.  Example: Plain
2131	      text documents.

2133	   o  For an aggregation of information objects, the associated language
2134	      tags could be taken as the set of languages used inside components
2135	      of that aggregation.  Examples: Document stores and libraries.

2137	   o  For information objects whose purpose is to provide alternatives,
2138	      the associated language tags could be regarded as a hint that the
2139	      content is provided in several languages and that one has to
2140	      inspect each of the alternatives in order to find its language or
2141	      languages.  In this case, the presence of multiple tags might not
2142	      mean that one needs to be multi-lingual to get complete
2143	      understanding of the document.  Example: MIME multipart/
2144	      alternative.

2146	   o  In markup languages, such as HTML and XML, language information
2147	      can be added to each part of the document identified by the markup
2148	      structure (including the whole document itself).  For example, one
2149	      could write <span lang="fr">C'est la vie.</span> inside a
2150	      Norwegian document; the Norwegian-speaking user could then access
2151	      a French-Norwegian dictionary to find out what the marked section
2152	      meant.  If the user were listening to that document through a
2153	      speech synthesis interface, this formation could be used to signal
2154	      the synthesizer to appropriately apply French text-to-speech
2155	      pronunciation rules to that span of text, instead of applying the
2156	      inappropriate Norwegian rules.

2158	   Language tags are related when they contain a similar sequence of
2159	   subtags.  For example, if a language tag B contains language tag A as
2160	   a prefix, then B is typically "narrower" or "more specific" than A.
2161	   Thus, "zh-Hant-TW" is more specific than "zh-Hant".

2163	   This relationship is not guaranteed in all cases: specifically,
2164	   languages that begin with the same sequence of subtags are NOT
2165	   guaranteed to be mutually intelligible, although they might be.  For
2166	   example, the tag "az" shares a prefix with both "az-Latn"
2167	   (Azerbaijani written using the Latin script) and "az-Cyrl"
2168	   (Azerbaijani written using the Cyrillic script).  A person fluent in
2169	   one script might not be able to read the other, even though the text
2170	   might be identical.  Content tagged as "az" most probably is written
2171	   in just one script and thus might not be intelligible to a reader
2172	   familiar with the other script.

2174	4.3.  Length Considerations

2176	   There is no defined upper limit on the size of language tags.  While
2177	   historically most language tags have consisted of language and region
2178	   subtags with a combined total length of up to six characters, larger
2179	   tags have always been possible and have actually appeared in use.

2181	   Neither the language tag syntax nor other requirements in this
2182	   document impose a fixed upper limit on the number of subtags in a
2183	   language tag (and thus an upper bound on the size of a tag).  The
2184	   language tag syntax suggests that, depending on the specific
2185	   language, more subtags (and thus a longer tag) are sometimes
2186	   necessary to completely identify the language for certain
2187	   applications; thus, it is possible to envision long or complex subtag
2188	   sequences.

2190	4.3.1.  Working with Limited Buffer Sizes

2192	   Some applications and protocols are forced to allocate fixed buffer
2193	   sizes or otherwise limit the length of a language tag.  A conformant
2194	   implementation or specification MAY refuse to support the storage of
2195	   language tags that exceed a specified length.  Any such limitation
2196	   SHOULD be clearly documented, and such documentation SHOULD include
2197	   what happens to longer tags (for example, whether an error value is
2198	   generated or the language tag is truncated).  A protocol that allows
2199	   tags to be truncated at an arbitrary limit, without giving any
2200	   indication of what that limit is, has the potential for causing harm
2201	   by changing the meaning of tags in substantial ways.

2203	   In practice, most language tags do not require more than a few
2204	   subtags and will not approach reasonably sized buffer limitations;
2205	   see Section 4.1.

2207	   Some specifications or protocols have limits on tag length but do not
2208	   have a fixed length limitation.  For example, [RFC2231] has no
2209	   explicit length limitation: the length available for the language tag
2210	   is constrained by the length of other header components (such as the
2211	   charset's name) coupled with the 76-character limit in [RFC2047].
2212	   Thus, the "limit" might be 50 or more characters, but it could
2213	   potentially be quite small.

2215	   The considerations for assigning a buffer limit are:

2217	      Implementations SHOULD NOT truncate language tags unless the
2218	      meaning of the tag is purposefully being changed, or unless the
2219	      tag does not fit into a limited buffer size specified by a
2220	      protocol for storage or transmission.

2222	      Implementations SHOULD warn the user when a tag is truncated since
2223	      truncation changes the semantic meaning of the tag.

2225	      Implementations of protocols or specifications that are space
2226	      constrained but do not have a fixed limit SHOULD use the longest
2227	      possible tag in preference to truncation.

2229	      Protocols or specifications that specify limited buffer sizes for
2230	      language tags MUST allow for language tags of up to 33 characters.

2232	      Protocols or specifications that specify limited buffer sizes for
2233	      language tags SHOULD allow for language tags of at least 42
2234	      characters.

2236	   The following illustration shows how the 42-character recommendation
2237	   was derived.  The combination of language and extended language
2238	   subtags was chosen for future compatibility.  At up to 15 characters,
2239	   this combination is longer than the longest possible primary language
2240	   subtag (8 characters):

2242	   language      =  3 (ISO 639-2; ISO 639-1 requires 2)
2243	   extlang1      =  4 (each subsequent subtag includes '-')
2244	   extlang2      =  4 (unlikely: needs prefix="language-extlang1")
2245	   extlang3      =  4 (extremely unlikely)
2246	   script        =  5 (if not suppressed: see Section 4.1)
2247	   region        =  4 (UN M.49; ISO 3166 requires 3)
2248	   variant1      =  9 (needs 'language' as a prefix)
2249	   variant2      =  9 (needs 'language-variant1' as a prefix)

2251	   total         = 42 characters

2253	              Figure 7: Derivation of the Limit on Tag Length
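
   The arithmetic behind Figure 7 can be checked directly.  The Python
   sketch below copies the per-subtag lengths from the figure; the
   variable names are chosen for illustration only.

```python
# Lengths from Figure 7.  Every subtag after the first includes its
# leading '-' in the count.
COMPONENTS = [
    ("language", 3),
    ("extlang1", 4),
    ("extlang2", 4),
    ("extlang3", 4),
    ("script", 5),
    ("region", 4),
    ("variant1", 9),
    ("variant2", 9),
]

# Sum of all components: the recommended minimum buffer size.
total = sum(length for _, length in COMPONENTS)

# Language plus the three extended language subtags.
language_and_extlang = sum(
    length
    for name, length in COMPONENTS
    if name == "language" or name.startswith("extlang")
)
```

   This gives total = 42 and language_and_extlang = 15, matching the
   figures quoted in the text.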

2255	4.3.2.  Truncation of Language Tags

2257	   Truncation of a language tag alters the meaning of the tag, and thus
2258	   SHOULD be avoided.  However, truncation of language tags is sometimes
2259	   necessary due to limited buffer sizes.  Such truncation MUST NOT
2260	   permit a subtag to be chopped off in the middle or the formation of
2261	   invalid tags (for example, one ending with the "-" character).

2263	   This means that applications or protocols that truncate tags MUST do
2264	   so by progressively removing subtags along with their preceding "-"
2265	   from the right side of the language tag until the tag is short enough
2266	   for the given buffer.  If the resulting tag ends with a single-
2267	   character subtag, that subtag and its preceding "-" MUST also be
2268	   removed.  For example:

2270	   Tag to truncate: zh-Latn-CN-variant1-a-extend1-x-wadegile-private1
2271	   1. zh-Latn-CN-variant1-a-extend1-x-wadegile
2272	   2. zh-Latn-CN-variant1-a-extend1
2273	   3. zh-Latn-CN-variant1
2274	   4. zh-Latn-CN
2275	   5. zh-Latn
2276	   6. zh

2278	                    Figure 8: Example of Tag Truncation
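
   The truncation procedure can be sketched in a few lines of Python.
   This is an illustration under stated assumptions, not a normative
   algorithm: the function name is hypothetical, the limit is counted
   in characters, and an empty string is returned when not even the
   first subtag fits.

```python
def truncate_tag(tag, max_length):
    """Shorten 'tag' to at most 'max_length' characters.

    Whole subtags are removed from the right together with their
    preceding '-'.  A trailing single-character subtag is removed as
    well, so the result never ends in a singleton or in '-'.
    """
    subtags = tag.split("-")
    while subtags and len("-".join(subtags)) > max_length:
        subtags.pop()
        # A singleton left at the end would have to be removed anyway.
        while subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return "-".join(subtags)


TAG = "zh-Latn-CN-variant1-a-extend1-x-wadegile-private1"
shortened = truncate_tag(TAG, 30)
```

   With a limit of 30 characters, shortened is
   "zh-Latn-CN-variant1-a-extend1", which is step 2 in Figure 8: the
   private use subtags and the dangling 'x' singleton are removed
   together.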

2280	4.4.  Canonicalization of Language Tags

2282	   Since a particular language tag is sometimes used by many processes,
2283	   language tags SHOULD always be created or generated in a canonical
2284	   form.

2286	   A language tag is in canonical form when:

2288	   1.  The tag is well-formed according to the rules in Section 2.1 and
2289	       Section 2.2.

2291	   2.  Subtags of type 'Region' that have a Preferred-Value mapping in
2292	       the IANA registry (see Section 3.1) SHOULD be replaced with their
2293	       mapped value.  Note: In rare cases, the mapped value will also
2294	       have a Preferred-Value.

2296	   3.  Redundant or grandfathered tags that have a Preferred-Value
2297	       mapping in the IANA registry (see Section 3.1) MUST be replaced
2298	       with their mapped value.  These items either are deprecated
2299	       mappings created before the adoption of this document (such as
2300	       the mapping of "no-nyn" to "nn" or "i-klingon" to "tlh") or are
2301	       the result of later registrations or additions to this document
2302	       (for example, "zh-hakka" was deprecated in favor of the language-
2303	       extlang combination "zh-hak" when this document was adopted).

2305	   4.  Other subtags that have a Preferred-Value mapping in the IANA
2306	       registry (see Section 3.1) MUST be replaced with their mapped
2307	       value.  These items consist entirely of clerical corrections to
2308	       ISO 639-1 in which the deprecated subtags have been maintained
2309	       for compatibility purposes.

2311	   5.  If more than one extension subtag sequence exists, the extension
2312	       sequences are ordered into case-insensitive ASCII order by
2313	       singleton subtag.

2315	   Example: The language tag "en-A-aaa-B-ccc-bbb-x-xyz" is in canonical
2316	   form, while "en-B-ccc-bbb-A-aaa-X-xyz" is well-formed but not in
2317	   canonical form.
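
   The ordering of extension sequences can be sketched as follows in
   Python.  The sketch assumes a well-formed tag, treats 'x' and
   everything after it as private use, and preserves the case of the
   input.  The function name is hypothetical.

```python
def order_extensions(tag):
    """Reorder extension sequences by singleton, case-insensitively."""
    head = []         # language, script, region, and variant subtags
    extensions = []   # one list of subtags per extension sequence
    private_use = []  # 'x' and all subtags that follow it
    for index, subtag in enumerate(tag.split("-")):
        if private_use:
            private_use.append(subtag)
        elif subtag.lower() == "x":
            private_use.append(subtag)
        elif index > 0 and len(subtag) == 1:
            extensions.append([subtag])  # a new extension sequence
        elif extensions:
            extensions[-1].append(subtag)
        else:
            head.append(subtag)
    # Sort whole sequences; subtags inside a sequence keep their order.
    extensions.sort(key=lambda sequence: sequence[0].lower())
    ordered = head + [s for seq in extensions for s in seq] + private_use
    return "-".join(ordered)


reordered = order_extensions("en-B-ccc-bbb-A-aaa-X-xyz")
```

   Here reordered is "en-A-aaa-B-ccc-bbb-X-xyz".  The private use
   sequence keeps its position and its case, since case is not
   significant when comparing tags.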

2319	   Example: The language tag "en-BU" (English as used in Burma) is not
2320	   canonical because the 'BU' subtag has a canonical mapping to 'MM'
2321	   (Myanmar), although the tag "en-BU" maintains its validity.
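
   Replacement by Preferred-Value can be sketched in Python as shown
   below.  The mapping tables are a tiny hard-coded excerpt containing
   only mappings mentioned in this document; a real implementation
   would load them from the IANA registry.  The function and table
   names are hypothetical, the input is assumed to be well-formed, and
   the sketch applies each mapping only once.

```python
# Excerpts only; keys for whole tags are lowercase.
TAG_PREFERRED = {"no-nyn": "nn", "i-klingon": "tlh", "zh-hakka": "zh-hak"}
LANGUAGE_PREFERRED = {"iw": "he"}
REGION_PREFERRED = {"BU": "MM"}


def apply_preferred_values(tag):
    """Replace deprecated tags and subtags with their preferred values."""
    whole = TAG_PREFERRED.get(tag.lower())
    if whole is not None:
        return whole  # redundant or grandfathered tag mapped as a unit
    result = []
    after_singleton = False
    for index, subtag in enumerate(tag.split("-")):
        if len(subtag) == 1:
            after_singleton = True  # extension or private use begins
        if after_singleton:
            result.append(subtag)  # never rewritten by this sketch
        elif index == 0:
            result.append(LANGUAGE_PREFERRED.get(subtag.lower(), subtag))
        elif len(subtag) == 2 and subtag.isalpha():
            result.append(REGION_PREFERRED.get(subtag.upper(), subtag))
        else:
            result.append(subtag)
    return "-".join(result)


canonical = apply_preferred_values("en-BU")
```

   Here canonical is "en-MM".  The original tag "en-BU" remains valid;
   it is simply not in canonical form.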

2323	   Canonicalization of language tags does not imply anything about the
2324	   use of upper or lowercase letters when processing or comparing
2325	   subtags (as described in Section 2.1).  All comparisons MUST be
2326	   performed in a case-insensitive manner.

2328	   When performing canonicalization of language tags, processors MAY
2329	   regularize the case of the subtags (that is, this process is
2330	   OPTIONAL), following the case used in the registry.  Note that this
2331	   corresponds to the following casing rules: uppercase all non-initial
2332	   two-letter subtags; titlecase all non-initial four-letter subtags;
2333	   lowercase everything else.
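
   The three casing rules above can be sketched as follows.  This is a
   minimal illustration in Python with a hypothetical function name.  It
   applies the rules exactly as stated, purely by position and length,
   so it also recases two-letter and four-letter subtags that appear
   inside extension or private use sequences.

```python
def regularize_case(tag: str) -> str:
    """Apply the optional casing rules by subtag position and length."""
    subtags = tag.split("-")
    result = [subtags[0].lower()]  # the initial subtag is always lowercased
    for sub in subtags[1:]:
        if len(sub) == 2:
            result.append(sub.upper())  # e.g. region subtags
        elif len(sub) == 4:
            # Titlecase: first character upper, the rest lower.
            result.append(sub[:1].upper() + sub[1:].lower())
        else:
            result.append(sub.lower())
    return "-".join(result)
```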

2335	   Note: Case folding of ASCII letters in certain locales, unless
2336	   carefully handled, sometimes produces non-ASCII character values.
2337	   The Unicode Character Database file "SpecialCasing.txt" defines the
2338	   specific cases that are known to cause problems with this.  In
2339	   particular, the letter 'i' (U+0069) in Turkish and Azerbaijani is
2340	   uppercased to U+0130 (LATIN CAPITAL LETTER I WITH DOT ABOVE).
2341	   Implementers SHOULD specify a locale-neutral casing operation to
2342	   ensure that case folding of subtags does not produce this value,
2343	   which is illegal in language tags.  For example, if one were to
2344	   uppercase the region subtag 'in' using Turkish locale rules, the
2345	   sequence U+0130 U+004E would result instead of the expected 'IN'.
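
   One way to obtain a locale-neutral operation is to fold only the 26
   ASCII letters and leave every other character alone.  The sketch
   below does this in Python with an explicit translation table; the
   function names are hypothetical.  Python's built-in string casing
   does not depend on the process locale, so the explicit table is used
   here mainly to make the ASCII-only intent visible.

```python
import string

# Maps 'A'-'Z' to 'a'-'z' and nothing else.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def ascii_fold(tag: str) -> str:
    """Lowercase ASCII letters only; other characters pass through."""
    return tag.translate(_ASCII_LOWER)


def tags_equal(a: str, b: str) -> bool:
    """Compare two tags case-insensitively using ASCII-only folding."""
    return ascii_fold(a) == ascii_fold(b)
```

   Because U+0130 is never produced or consumed by the table, a string
   containing it cannot compare equal to the region subtag 'IN'.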

2347	   Note: if the field 'Deprecated' appears in a registry record without
2348	   an accompanying 'Preferred-Value' field, then that tag or subtag is
2349	   deprecated without a replacement.  Validating processors SHOULD NOT
2350	   generate tags that include these values, although the values are
2351	   canonical when they appear in a language tag.

2353	   An extension MUST define any relationships that exist between the
2354	   various subtags in the extension and thus MAY define an alternate
2355	   canonicalization scheme for the extension's subtags.  Extensions MAY
2356	   define how the order of the extension's subtags is interpreted.  For
2357	   example, an extension could define that its subtags are in canonical
2358	   order when the subtags are placed into ASCII order: that is, "en-a-
2359	   aaa-bbb-ccc" instead of "en-a-ccc-bbb-aaa".  Another extension might
2360	   define that the order of the subtags influences their semantic
2361	   meaning (so that "en-b-ccc-bbb-aaa" has a different value from "en-b-
2362	   aaa-bbb-ccc").  However, extension specifications SHOULD be designed
2363	   so that they are tolerant of the typical processes described in
2364	   Section 3.7.

2366	4.5.  Considerations for Private Use Subtags

2368	   Private use subtags, like all other subtags, MUST conform to the
2369	   format and content constraints in the ABNF.  Private use subtags have
2370	   no meaning outside the private agreement between the parties that
2371	   intend to use or exchange language tags that employ them.  The same
2372	   subtags MAY be used with a different meaning under a separate private
2373	   agreement.  They SHOULD NOT be used where alternatives exist and
2374	   SHOULD NOT be used in content or protocols intended for general use.

2376	   Private use subtags are simply useless for information exchange
2377	   without prior arrangement.  The value and semantic meaning of private
2378	   use tags and of the subtags used within such a language tag are not
2379	   defined by this document.

2381	   Subtags defined in the IANA registry as having a specific private use
2382	   meaning convey more information than a purely private use tag
2383	   prefixed by the singleton subtag 'x'.  For applications, this
2384	   additional information MAY be useful.

2386	   For example, the region subtags 'AA', 'ZZ', and in the ranges
2387	   'QM'-'QZ' and 'XA'-'XZ' (derived from ISO 3166 private use codes) MAY
2388	   be used to form a language tag.  A tag such as "zh-Hans-XQ" conveys a
2389	   great deal of public, interchangeable information about the language
2390	   material (that it is Chinese in the simplified Chinese script and is
2391	   suitable for some geographic region 'XQ').  While the precise
2392	   geographic region is not known outside of private agreement, the tag
2393	   conveys far more information than an opaque tag such as "x-someLang",
2394	   which contains no information about the language subtag or script
2395	   subtag outside of the private agreement.
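
   The private use region codes named above can be recognized with a
   short range check.  This is a sketch in Python with a hypothetical
   function name; it covers only the codes listed in this section.

```python
def is_private_use_region(subtag: str) -> bool:
    """True for 'AA', 'ZZ', 'QM'-'QZ', and 'XA'-'XZ', ignoring case."""
    # Reject non-ASCII input before changing case.
    if len(subtag) != 2 or not subtag.isascii() or not subtag.isalpha():
        return False
    code = subtag.upper()
    if code in ("AA", "ZZ"):
        return True
    if code[0] == "Q":
        return "M" <= code[1] <= "Z"
    # With two ASCII letters, any code starting with 'X' is in 'XA'-'XZ'.
    return code[0] == "X"
```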

2397	   However, in some cases content tagged with private use subtags MAY
2398	   interact with other systems in a different and possibly unsuitable
2399	   manner compared to tags that use opaque, privately defined subtags,
2400	   so the choice of the best approach sometimes depends on the
2401	   particular domain in question.

2403	5.  IANA Considerations

2405	   This section deals with the processes and requirements necessary for
2406	   IANA to undertake to maintain the subtag and extension registries as
2407	   defined by this document and in accordance with the requirements of
2408	   [RFC2434].

2410	   The impact on the IANA maintainers of the two registries defined by
2411	   this document will be a small increase in the frequency of new
2412	   entries or updates.

2414	5.1.  Language Subtag Registry

2416	   Upon adoption of this document, IANA will update the registry using
2417	   instructions and content provided in a companion document:
2418	   [registry-update].  The criteria and process for selecting the
2419	   updated set of records are described in that document.  The updated
2420	   set of records places no burden on IANA, since the work to create
2421	   it will be performed externally.

2423	   Future work on the Language Subtag Registry is limited to inserting
2424	   or replacing whole records preformatted for IANA by the Language
2425	   Subtag Reviewer, as described in Section 3.3 of this document, and
2426	   to archiving and making publicly available the forwarded
2427	   registration form.

2429	   Each registration form sent to IANA contains a single record for
2430	   incorporation into the registry.  The form MUST be sent to
2431	   iana@iana.org by the Language Subtag Reviewer.  It will have a
2432	   subject line indicating whether the enclosed form represents an
2433	   insertion of a new record (indicated by the word "INSERT" in the
2434	   subject line) or a replacement of an existing record (indicated by
2435	   the word "MODIFY" in the subject line).  Records MUST NOT be deleted
2436	   from the registry.

2438	   IANA MUST extract the record from the form and place the inserted or
2439	   modified record into the appropriate section of the language subtag
2440	   registry, grouping the records by their 'Type' field.  Inserted
2441	   records MAY be placed anywhere in the appropriate section; there is
2442	   no guarantee of the order of the records beyond grouping them
2443	   together by 'Type'.  Modified records MUST overwrite the record they
2444	   replace.

2446	   IANA MUST update the File-Date record to contain the most recent
2447	   modification date whenever it inserts or modifies a record.  Each
2448	   request to insert or modify a record will include a new File-Date
2449	   record indicating the acceptance date of that record.  This
2450	   record MUST be placed first in the registry, replacing the existing
2451	   File-Date record.  In the event that the File-Date record present in
2452	   the registry has a later date than the record being inserted or
2453	   modified, then the latest (most recent) record MUST be preserved.
2454	   IANA SHOULD process multiple registration requests in order according
2455	   to the File-Date in the form, since one registration could otherwise
2456	   cause a more recent change to be overwritten.
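
   The processing rules in this section can be sketched as follows.
   This is an illustration in Python, not a description of IANA's actual
   tooling.  It assumes that dates are written as YYYY-MM-DD, that each
   request has been parsed into a dictionary with the hypothetical keys
   "action", "file_date", and "record", and that a record is identified
   by its 'Type' field together with its 'Subtag' or 'Tag' field.

```python
from datetime import date


def apply_requests(registry: dict, file_date: date, requests: list) -> date:
    """Apply INSERT/MODIFY requests in File-Date order.

    Updates `registry` in place and returns the new registry File-Date.
    """
    ordered = sorted(requests, key=lambda r: date.fromisoformat(r["file_date"]))
    for req in ordered:
        record = req["record"]
        key = (record["Type"], record.get("Subtag") or record.get("Tag"))
        exists = key in registry
        if req["action"] not in ("INSERT", "MODIFY"):
            raise ValueError("unknown action: %r" % req["action"])
        if req["action"] == "INSERT" and exists:
            raise ValueError("record already present: %r" % (key,))
        if req["action"] == "MODIFY" and not exists:
            raise ValueError("no record to modify: %r" % (key,))
        registry[key] = record  # a modified record overwrites the old one
        # Keep whichever File-Date is the most recent.
        file_date = max(file_date, date.fromisoformat(req["file_date"]))
    return file_date
```

   Sorting by File-Date before applying the requests is what keeps an
   older form from overwriting a more recent change.  No delete action
   is offered, since records are never removed.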

2458	   The registration form sent to IANA MUST be archived and made publicly
2459	   available from
2460	   "http://www.iana.org/assignments/lang-subtags-templates/".  Note that
2461	   multiple registrations can pertain to the same record in the
2462	   registry.

2464	5.2.  Extensions Registry

2466	   The Language Tag Extensions Registry can contain at most 35 records
2467	   and thus changes to this registry are expected to be very infrequent.
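
   The limit of 35 follows from the set of possible singletons.  The
   short Python sketch below counts them, assuming that a singleton is
   any single ASCII letter or digit other than 'x' (which is reserved
   for private use) and that singletons are compared without regard to
   case.

```python
import string

# 26 letters + 10 digits, minus the private use singleton 'x'.
SINGLETONS = [c for c in string.ascii_lowercase + string.digits if c != "x"]
MAX_EXTENSION_RECORDS = len(SINGLETONS)
```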

2469	   Future work by IANA on the Language Tag Extensions Registry is
2470	   limited to two cases.  First, the IESG MAY request that new records
2471	   be inserted into this registry from time to time.  These requests
2472	   MUST include the record to insert in the exact format described in
2473	   Section 3.7.  In addition, there MAY be occasional requests from the
2474	   maintaining authority for a specific extension to update the contact
2475	   information or URLs in the record.  These requests MUST include the
2476	   complete, updated record.  IANA is not responsible for validating the
2477	   information provided, only for checking that it is properly
2478	   formatted.  The request should reasonably be seen to come from the
2479	   maintaining authority named in the record present in the registry.

2481	6.  Security Considerations

2483	   Language tags used in content negotiation, like any other information
2484	   exchanged on the Internet, might be a source of concern because they
2485	   might be used to infer the nationality of the sender, and thus
2486	   identify potential targets for surveillance.

2488	   This is a special case of the general problem that anything sent is
2489	   visible to the receiving party and possibly to third parties as well.
2490	   It is useful to be aware that such concerns can exist in some cases.

2492	   The evaluation of the exact magnitude of the threat, and any possible
2493	   countermeasures, is left to each application protocol (see BCP 72
2494	   [RFC3552] for best current practice guidance on security threats and
2495	   defenses).

2497	   The language tag associated with a particular information item is of
2498	   no consequence whatsoever in determining whether that content might
2499	   contain possible homographs.  The fact that a text is tagged as being
2500	   in one language or using a particular script subtag provides no
2501	   assurance whatsoever that it does not contain characters from scripts
2502	   other than the one(s) associated with or specified by that language
2503	   tag.

2505	   Since there is no limit to the number of variant, private use, and
2506	   extension subtags, and consequently no limit on the possible length
2507	   of a tag, implementations need to guard against buffer overflow
2508	   attacks.  See Section 4.3 for details on language tag truncation,
2509	   which can occur as a consequence of defenses against buffer overflow.
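
   A bounded buffer usually implies some form of truncation.  The Python
   sketch below is one possible approach and is not a restatement of
   Section 4.3: it assumes that truncation removes whole subtags from
   the end of the tag and never leaves a dangling singleton.  The
   function name is hypothetical.

```python
def truncate_tag(tag: str, max_length: int) -> str:
    """Drop trailing subtags until the tag fits in max_length characters."""
    subtags = tag.split("-")
    while len(subtags) > 1 and len("-".join(subtags)) > max_length:
        subtags.pop()
        # Do not leave a singleton (such as 'x' or 'a') at the end.
        if len(subtags) > 1 and len(subtags[-1]) == 1:
            subtags.pop()
    # The primary subtag is never removed, so the result can still be
    # longer than max_length if that subtag alone does not fit.
    return "-".join(subtags)
```

   A caller still has to check the length of the result before copying
   it into a fixed-size buffer.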

2511	   Although the specification of valid subtags for an extension (see
2512	   Section 3.7) MUST be available over the Internet, implementations
2513	   SHOULD NOT mechanically depend on it being always accessible, to
2514	   prevent denial-of-service attacks.

2516	7.  Character Set Considerations

2518	   The syntax in this document requires that language tags use only the
2519	   characters A-Z, a-z, 0-9, and HYPHEN-MINUS, which are present in most
2520	   character sets, so the composition of language tags should not have
2521	   any character set issues.
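
   A check of the character repertoire alone is straightforward.  The
   Python sketch below, with a hypothetical function name, tests only
   that a string is non-empty and drawn from the permitted characters.
   It does not test whether the string is a well-formed or valid tag.

```python
import re

_ALLOWED = re.compile(r"[A-Za-z0-9-]+")


def uses_allowed_characters(tag: str) -> bool:
    """True if tag is non-empty and contains only A-Z, a-z, 0-9, and '-'."""
    return _ALLOWED.fullmatch(tag) is not None
```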

2523	   Rendering of characters based on the content of a language tag is not
2524	   addressed in this memo.  Historically, some languages have relied on
2525	   the use of specific character sets or other information in order to
2526	   infer how a specific character should be rendered (notably this
2527	   applies to language- and culture-specific variations of Han
2528	   ideographs as used in Japanese, Chinese, and Korean).  When language
2529	   tags are applied to spans of text, rendering engines sometimes use
2530	   that information in deciding which font to use in the absence of
2531	   other information, particularly where languages with distinct writing
2532	   traditions use the same characters.

2534	8.  Changes from RFC 4646

2536	   The main goal for this revision of this document was to incorporate
2537	   ISO 639-3 and its attendant set of language codes into the IANA
2538	   Language Subtag Registry, permitting the identification of many more
2539	   languages and dialects than previously supported.

2541	   The specific changes in this document to meet these goals are:

2543	   o  Defines the incorporation of ISO 639-3 codes as language and
2544	      extlang subtags.  Extlangs are now permitted in language tags.
2545	      The changes necessary to achieve this were:

2547	      *  something

2549	   o  Changed the ABNF related to grandfathered tags.  The irregular
2550	      tags are now listed.  Well-formed grandfathered tags are now
2551	      described by the 'langtag' production and the 'grandfathered'
2552	      production was removed as a result.  Also: added description of
2553	      both types of grandfathered tags to Section 2.2.8.

2555	   o  Added the paragraph on "collections" to Section 4.1.

2557	   o  Changed the capitalization rules for 'Tag' fields in Section 3.1.

2559	   o  Split section 3.1 up into subsections.

2561	   o  Modified section 3.5 to allow Suppress-Script fields to be added,
2562	      modified, or removed via the registration process.  This was an
2563	      erratum from RFC 4646.

2565	   o  Modified examples that used region code 'CS' (formerly Serbia and
2566	      Montenegro) to use 'RS' (Serbia) instead.

2568	   o  Modified the rules for creating and maintaining record
2569	      'Description' fields to prevent duplicates, including inverted
2570	      duplicates.

2572	   o  Removed the lengthy description of why RFC 4646 was created from
2573	      this section, which also caused the removal of the reference to
2574	      XML Schema.

2576	   o  Modified the text in section 2.1 to place more emphasis on the
2577	      fact that language tags are not case sensitive.

2579	   o  Replaced the example "fr-Latn-CA" in Section 2.1 with "sr-Latn-RS"
2580	      and "az-Arab-IR" because "fr-Latn-CA" doesn't respect the
2581	      Suppress-Script on 'Latn' with 'fr'.

2583	   o  Changed the requirements for well-formedness to make singleton
2584	      repetition checking optional (it is required for validity
2585	      checking) in Section 2.2.9.

2587	   o  Changed the text in Section 2.2.9 referring to grandfathered
2588	      checking to note that the list is now included in the ABNF.

2590	   o  Modified and added text to Section 3.2.  The job description was
2591	      placed first.  A note was added making clear that the Language
2592	      Subtag Reviewer may delegate various non-critical duties,
2593	      including list moderation.  Finally, additional text was added to
2594	      make the appointment process clear and to clarify that decisions
2595	      and performance of the reviewer are appealable.

2597	   o  Added text to Section 3.5 clarifying that the ietf-languages list
2598	      is operated by whomever the IESG appoints.

2600	   o  Added text to Section 3.1.4 clarifying that the first Description
2601	      in a 'language' or 'extlang' record matches the corresponding
2602	      Reference Name for the language in ISO 639-3.

2604	   o  Modified Section 2.2.9 to define classes of conformance related to
2605	      specific tags (formerly 'well-formed' and 'valid' referred to
2606	      implementations).

2608	   o  Added text to the end of Section 3.1.2 noting that future versions
2609	      of this document might add new field types and recommending that
2610	      implementations ignore any unrecognized fields.

2612	   o  Modified the 'extlang' examples in Appendix B to use valid subtags
2613	      and removed the note saying that they were only examples.

2615	   o  Added text about what the lack of a Suppress-Script field means in
2616	      a record to Section 3.1.9.

2618	   o  Added text allowing the correction of misspellings and typographic
2619	      errors to Section 3.1.4.

2621	   o  Added text to Section 3.1.7 disallowing Prefix field conflicts
2622	      (such as circular prefix references).

2624	   o  Modified text in Section 3.5 to require the subtag reviewer to
2625	      announce his/her decision (or extension) following the two-week
2626	      period.  Also clarified that any decision or failure to decide can
2627	      be appealed.

2629	   o  Modified text in Section 4.1 to include the (heretofore anecdotal)
2630	      guiding principle of tag choice, and to clarify the non-use of
2631	      script subtags in non-written applications.  Also updated examples
2632	      in this section to use Chamic languages as an example of language
2633	      collections.

2635	   o  Prohibited multiple use of the same variant in a tag (e.g.,
2636	      "de-1901-1901").  Previously this was only a recommendation
2637	      ("SHOULD").

2639	   o  Removed inappropriate [RFC2119] language from the illustration in
2640	      Section 4.3.1.

2642	   o  Replaced the example of "zh-guoyu" with "zh-hakka"->"zh-hak" in
2643	      Section 4.4, noting that it was this document that caused the
2644	      change.

2646	   o  Replaced the text in Section 4.1 dealing with "mul"/"und" to
2647	      include the subtags 'zxx' and 'mis', as well as the tag
2648	      "i-default".  A normative reference to RFC 2277 was added, along
2649	      with an informative reference to MARC21.

2651	   o  Added text to Section 3.5 clarifying that any modifications of a
2652	      registration request must be sent to the ietf-languages list
2653	      before submission to IANA.

2655	   o  Changed the ABNF for the record-jar format from using the LWSP
2656	      production to use the FWS production instead.  This effectively
2657	      prevents blank lines in the file.

2659	   o  Revised text in Section 3.3, Section 3.5, and
2660	      Section 5.1 to clarify that the Language Subtag Reviewer sends the
2661	      complete registration forms to IANA, that IANA extracts the record
2662	      from the form, and that the forms must also be archived separately
2663	      from the registry.

2665	   [[Ed.Note: Open issues in this version:

2667	      Whether encompassed language rules for the creation of extlang
2668	      records in the registry should be retained or modified.

2670	      Modification of the registry to use UTF-8 as its character
2671	      encoding. (removed and apparently rejected)

2673	      Details of the appointment, term duration, performance review of
2674	      the subtag reviewer by the IESG. (addressed?)

2676	      Inclusion of additional information related to Suppress-Script in
2677	      the registry (e.g. that it wasn't assigned on purpose)

2679	   ]]

2681	9.  References

2683	9.1.  Normative References

2685	   [ISO10646]
2686	              International Organization for Standardization, "ISO/IEC
2687	              10646:2003. Information technology -- Universal Multiple-
2688	              Octet Coded Character Set (UCS)", 2003.

2690	   [ISO15924]
2691	              International Organization for Standardization, "ISO
2692	              15924:2004. Information and documentation -- Codes for the
2693	              representation of names of scripts", January 2004.

2695	   [ISO3166-1]
2696	              International Organization for Standardization, "ISO 3166-
2697	              1:1997. Codes for the representation of names of countries
2698	              and their subdivisions -- Part 1: Country codes", 1997.

2700	   [ISO639-1]
2701	              International Organization for Standardization, "ISO 639-
2702	              1:2002. Codes for the representation of names of languages
2703	              -- Part 1: Alpha-2 code", 2002.

2705	   [ISO639-2]
2706	              International Organization for Standardization, "ISO 639-
2707	              2:1998. Codes for the representation of names of languages
2708	              -- Part 2: Alpha-3 code, first edition", 1998.

2710	   [ISO639-3]
2711	              International Organization for Standardization, "ISO 639-
2712	              3:2007. Codes for the representation of names of languages
2713	              -- Part 3: Alpha-3 code for comprehensive coverage of
2714	              languages", 2007.

2716	   [ISO646]   International Organization for Standardization, "ISO/IEC
2717	              646:1991, Information technology -- ISO 7-bit coded
2718	              character set for information interchange.", 1991.

2720	   [RFC2026]  Bradner, S., "The Internet Standards Process -- Revision
2721	              3", BCP 9, RFC 2026, October 1996.

2723	   [RFC2028]  Hovey, R. and S. Bradner, "The Organizations Involved in
2724	              the IETF Standards Process", BCP 11, RFC 2028,
2725	              October 1996.

2727	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
2728	              Requirement Levels", BCP 14, RFC 2119, March 1997.

2730	   [RFC2277]  Alvestrand, H., "IETF Policy on Character Sets and
2731	              Languages", BCP 18, RFC 2277, January 1998.

2733	   [RFC2434]  Narten, T. and H. Alvestrand, "Guidelines for Writing an
2734	              IANA Considerations Section in RFCs", BCP 26, RFC 2434,
2735	              October 1998.

2737	   [RFC2860]  Carpenter, B., Baker, F., and M. Roberts, "Memorandum of
2738	              Understanding Concerning the Technical Work of the
2739	              Internet Assigned Numbers Authority", RFC 2860, June 2000.

2741	   [RFC3339]  Klyne, G. and C. Newman, "Date and Time on the Internet:
2742	              Timestamps", RFC 3339, July 2002.

2744	   [RFC4234]  Crocker, D. and P. Overell, "Augmented BNF for Syntax
2745	              Specifications: ABNF", RFC 4234, October 2005.

2747	   [RFC4645]  Ewell, D., Ed., "Initial Language Subtag Registry",
2748	              September 2006, <http://www.ietf.org/rfc/rfc4645.txt>.

2750	   [RFC4647]  Phillips, A., Ed. and M. Davis, Ed., "Matching of Language
2751	              Tags", September 2006,
2752	              <http://www.ietf.org/rfc/rfc4647.txt>.

2754	   [UN_M.49]  Statistics Division, United Nations, "Standard Country or
2755	              Area Codes for Statistical Use", UN Standard Country or
2756	              Area Codes for Statistical Use, Revision 4 (United Nations
2757	              publication, Sales No. 98.XVII.9), June 1999.

2759	9.2.  Informative References

2761	   [MARC21]   Library of Congress, National Development and MARC
2762	              Standards Office, "MARC 21 Specifications for Record
2763	              Structure, Character Sets, and Exchange Media",
2764	              January 2000, <http://www.loc.gov/marc/specifications/>.

2766	   [RFC1766]  Alvestrand, H., "Tags for the Identification of
2767	              Languages", RFC 1766, March 1995.

2769	   [RFC2047]  Moore, K., "MIME (Multipurpose Internet Mail Extensions)
2770	              Part Three: Message Header Extensions for Non-ASCII Text",
2771	              RFC 2047, November 1996.

2773	   [RFC2231]  Freed, N. and K. Moore, "MIME Parameter Value and Encoded
2774	              Word Extensions: Character Sets, Languages, and
2775	              Continuations", RFC 2231, November 1997.

2777	   [RFC2781]  Hoffman, P. and F. Yergeau, "UTF-16, an encoding of ISO
2778	              10646", RFC 2781, February 2000.

2780	   [RFC3066]  Alvestrand, H., "Tags for the Identification of
2781	              Languages", BCP 47, RFC 3066, January 2001.

2783	   [RFC3552]  Rescorla, E. and B. Korver, "Guidelines for Writing RFC
2784	              Text on Security Considerations", BCP 72, RFC 3552,
2785	              July 2003.

2787	   [RFC4646]  Phillips, A., Ed. and M. Davis, Ed., "Tags for the
2788	              Identification of Languages", September 2006,
2789	              <http://www.ietf.org/rfc/rfc4646.txt>.

2791	   [Unicode]  Unicode Consortium, "The Unicode Consortium. The Unicode
2792	              Standard, Version 5.0, (Boston, MA, Addison-Wesley, 2003.
2793	              ISBN 0-321-49081-0)", January 2007.

2795	   [XML10]    Bray (et al), T., "Extensible Markup Language (XML) 1.0",
2796	              February 2004.

2798	   [iso639.prin]
2799	              ISO 639 Joint Advisory Committee, "ISO 639 Joint Advisory
2800	              Committee:  Working principles for ISO 639 maintenance",
2801	              March 2000,
2802	              <http://www.loc.gov/standards/iso639-2/
2803	              iso639jac_n3r.html>.

2805	   [record-jar]
2806	              Raymond, E., "The Art of Unix Programming", 2003,
2807	              <urn:isbn:0-13-142901-9>.

2809	   [registry-update]
2810	              Ewell, D., Ed., "Update to the Language Subtag Registry",
2811	              September 2006, <http://www.ietf.org/internet-drafts/
2812	              draft-ietf-ltru-initial-registry-00.txt>.

2814	Appendix A.  Acknowledgements

2816	   Any list of contributors is bound to be incomplete; please regard the
2817	   following as only a selection from the group of people who have
2818	   contributed to make this document what it is today.

2820	   The contributors to RFC 4646, RFC 4647, RFC 3066, and RFC 1766, the
2821	   precursors of this document, made enormous contributions directly or
2822	   indirectly to this document and are generally responsible for the
2823	   success of language tags.

2825	   The following people contributed to this document:

2827	   Stephane Bortzmeyer, Karen Broome, Peter Constable, John Cowan,
2828	   Martin Duerst, Frank Ellerman, Doug Ewell, Deborah Garside, Marion
2829	   Gunn, Kent Karlsson, Chris Newman, Randy Presuhn, Stephen Silver, and
2830	   many, many others.

2832	   Very special thanks must go to Harald Tveit Alvestrand, who
2833	   originated RFCs 1766 and 3066, and without whom this document would
2834	   not have been possible.

2836	   Special thanks go to Michael Everson, who served as the Language Tag
2837	   Reviewer for almost the entire RFC 1766/RFC 3066 period, as well as
2838	   the Language Subtag Reviewer since the adoption of RFC 4646.

2840	   Special thanks also to Doug Ewell, for his production of the first
2841	   complete subtag registry, his work to support and maintain new
2842	   registrations, and his careful editorship of both RFC 4645 and
2843	   [registry-update].

2845	Appendix B.  Examples of Language Tags (Informative)

2847	   Simple language subtag:

2849	      de (German)

2851	      fr (French)

2853	      ja (Japanese)

2855	      i-enochian (example of a grandfathered tag)

2857	   Language subtag plus Script subtag:

2859	      zh-Hant (Chinese written using the Traditional Chinese script)

2861	      zh-Hans (Chinese written using the Simplified Chinese script)

2863	      sr-Cyrl (Serbian written using the Cyrillic script)

2865	      sr-Latn (Serbian written using the Latin script)

2867	   Language-Script-Region:

2869	      zh-Hans-CN (Chinese written using the Simplified script as used in
2870	      mainland China)

2872	      sr-Latn-RS (Serbian written using the Latin script as used in
2873	      Serbia)

2875	   Language-Variant:

2877	      sl-rozaj (Resian dialect of Slovenian)

2879	      sl-nedis (Nadiza dialect of Slovenian)

2881	   Language-Region-Variant:

2883	      de-CH-1901 (German as used in Switzerland using the 1901 variant
2884	      [orthography])

2886	      sl-IT-nedis (Slovenian as used in Italy, Nadiza dialect)

2888	   Language-Script-Region-Variant:

2890	      hy-Latn-IT-arevela (Eastern Armenian written in Latin script, as
2891	      used in Italy)

2893	   Language-Region:

2895	      de-DE (German for Germany)

2897	      en-US (English as used in the United States)

2899	      es-419 (Spanish appropriate for the Latin America and Caribbean
2900	      region using the UN region code)

2902	   Private use subtags:

2904	      de-CH-x-phonebk

2906	      az-Arab-x-AZE-derbend

2908	   Extended language subtags:

2910	      zh-cmn

2912	      zh-cmn-Hant-CN

2914	   Private use registry values:

2916	      x-whatever (private use using the singleton 'x')

2918	      qaa-Qaaa-QM-x-southern (all private tags)

2920	      de-Qaaa (German, with a private script)

2922	      sr-Latn-QM (Serbian, Latin-script, private region)

2924	      sr-Qaaa-RS (Serbian, private script, for Serbia)

2926	   Tags that use extensions (examples ONLY: extensions MUST be defined
2927	   by revision or update to this document or by RFC):

2929	      en-US-u-islamCal

2931	      zh-CN-a-myExt-x-private

2933	      en-a-myExt-b-another

2935	   Some Invalid Tags:

2937	      de-419-DE (two region subtags)

2939	      a-DE (use of a single-character subtag in primary position; note
2940	      that there are a few grandfathered tags that start with "i-" that
2941	      are valid)

2943	      ar-a-aaa-b-bbb-a-ccc (two extensions with same single-letter
2944	      prefix)
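
   Two of the problems shown above can be detected without consulting
   the registry.  The Python sketch below, with a hypothetical function
   name, reports a single-character primary subtag and a repeated
   extension singleton.  It does not detect the first case ("de-419-DE"),
   which requires a parser that assigns each subtag to its position in
   the ABNF.

```python
def structural_problems(tag: str) -> list:
    """Return a list of problems found by two simple structural checks."""
    problems = []
    subtags = tag.lower().split("-")
    if subtags[0] == "x":
        return problems  # wholly private use: nothing further to check
    # 'i' is tolerated here because of the grandfathered "i-" tags.
    if len(subtags[0]) == 1 and subtags[0] != "i":
        problems.append("single-character primary subtag")
    seen = set()
    for sub in subtags[1:]:
        if sub == "x":
            break  # private use sequence: singletons inside it are free-form
        if len(sub) == 1:
            if sub in seen:
                problems.append("repeated singleton '%s'" % sub)
            seen.add(sub)
    return problems
```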

2946	Authors' Addresses

2948	   Addison Phillips (editor)
2949	   Yahoo! Inc.

2951	   Email: addison@inter-locale.com
2952	   URI:   http://www.inter-locale.com

2954	   Mark Davis (editor)
2955	   Google

2957	   Email: mark.davis@macchiato.com or mark.davis@google.com

2959	Full Copyright Statement

2961	   Copyright (C) The IETF Trust (2007).

2963	   This document is subject to the rights, licenses and restrictions
2964	   contained in BCP 78, and except as set forth therein, the authors
2965	   retain all their rights.

2967	   This document and the information contained herein are provided on an
2968	   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
2969	   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
2970	   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
2971	   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
2972	   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
2973	   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

2975	Intellectual Property

2977	   The IETF takes no position regarding the validity or scope of any
2978	   Intellectual Property Rights or other rights that might be claimed to
2979	   pertain to the implementation or use of the technology described in
2980	   this document or the extent to which any license under such rights
2981	   might or might not be available; nor does it represent that it has
2982	   made any independent effort to identify any such rights.  Information
2983	   on the procedures with respect to rights in RFC documents can be
2984	   found in BCP 78 and BCP 79.

2986	   Copies of IPR disclosures made to the IETF Secretariat and any
2987	   assurances of licenses to be made available, or the result of an
2988	   attempt made to obtain a general license or permission for the use of
2989	   such proprietary rights by implementers or users of this
2990	   specification can be obtained from the IETF on-line IPR repository at
2991	   http://www.ietf.org/ipr.

2993	   The IETF invites any interested party to bring to its attention any
2994	   copyrights, patents or patent applications, or other proprietary
2995	   rights that may cover technology that may be required to implement
2996	   this standard.  Please address the information to the IETF at
2997	   ietf-ipr@ietf.org.

2999	Acknowledgment

3001	   Funding for the RFC Editor function is provided by the IETF
3002	   Administrative Support Activity (IASA).