idnits 2.17.1 

draft-ietf-ltru-registry-02.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** It looks like you're using RFC 3978 boilerplate.  You should update this
     to the boilerplate described in the IETF Trust License Policy document
     (see https://trustee.ietf.org/license-info), which is required now.

  -- Found old boilerplate from RFC 3978, Section 5.1 on line 16.

  -- Found old boilerplate from RFC 3978, Section 5.5 on line 2550.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 2527.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 2534.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 2540.

  ** This document has an original RFC 3978 Section 5.4 Copyright Line,
     instead of the newer IETF Trust Copyright according to RFC 4748.

  ** This document has an original RFC 3978 Section 5.5 Disclaimer, instead
     of the newer disclaimer which includes the IETF Trust according to RFC
     4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  -- The abstract seems to indicate that this document obsoletes RFC3066, but
     the header doesn't have an 'Obsoletes:' line to match this.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not
     match the current year

  == Line 719 has weird spacing: '...logical  line ...'

  == Line 720 has weird spacing: '...prising  a fie...'

  == Line 721 has weird spacing: '...ld-body  porti...'

  == Line 722 has weird spacing: '...   this  conce...'

  == Line 880 has weird spacing: '...ve been  possi...'

  == (7 more instances...)

  == The document seems to lack the recommended RFC 2119 boilerplate, even if
     it appears to use RFC 2119 keywords. 

     (The document does seem to have the reference to RFC 2119 which the
     ID-Checklist requires).
  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (May 19, 2005) is 6911 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Looks like a reference, but probably isn't: 'ISO 639' on line 206

  -- Looks like a reference, but probably isn't: 'ISO 3166' on line 209

  -- Looks like a reference, but probably isn't: 'ISO 15924' on line 272

  -- Looks like a reference, but probably isn't: 'RFC 2231' on line 246

  -- Looks like a reference, but probably isn't: 'ISO 639-1' on line 324

  -- Looks like a reference, but probably isn't: 'ISO 639-2' on line 331

  -- Looks like a reference, but probably isn't: 'RFC 2028' on line 1439

  -- Looks like a reference, but probably isn't: 'RFC 2026' on line 1247

  -- Possible downref: Non-RFC (?) normative reference: ref. '1'

  -- Possible downref: Non-RFC (?) normative reference: ref. '2'

  -- Possible downref: Non-RFC (?) normative reference: ref. '3'

  -- Possible downref: Non-RFC (?) normative reference: ref. '4'

  -- Possible downref: Non-RFC (?) normative reference: ref. '5'

  -- Possible downref: Non-RFC (?) normative reference: ref. '6'

  ** Obsolete normative reference: RFC 2028 (ref. '9') (Obsoleted by RFC 9281)

  ** Obsolete normative reference: RFC 2434 (ref. '11') (Obsoleted by RFC
     5226)

  ** Downref: Normative reference to an Informational RFC: RFC 2781 (ref.
     '12')

  ** Downref: Normative reference to an Informational RFC: RFC 2860 (ref.
     '13')

  -- Obsolete informational reference (is this intentional?): RFC 1766 (ref.
     '21') (Obsoleted by RFC 3066, RFC 3282)

  -- Obsolete informational reference (is this intentional?): RFC 3066 (ref.
     '23') (Obsoleted by RFC 4646, RFC 4647)


     Summary: 7 errors (**), 0 flaws (~~), 9 warnings (==), 24 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                                   A. Phillips, Ed.
3	Internet-Draft                                            Quest Software
4	Expires: November 20, 2005                                 M. Davis, Ed.
5	                                                                     IBM
6	                                                            May 19, 2005

8	                     Tags for Identifying Languages
9	                      draft-ietf-ltru-registry-02

11	Status of this Memo

13	   By submitting this Internet-Draft, each author represents that any
14	   applicable patent or other IPR claims of which he or she is aware
15	   have been or will be disclosed, and any of which he or she becomes
16	   aware will be disclosed, in accordance with Section 6 of BCP 79.

18	   Internet-Drafts are working documents of the Internet Engineering
19	   Task Force (IETF), its areas, and its working groups.  Note that
20	   other groups may also distribute working documents as Internet-
21	   Drafts.

23	   Internet-Drafts are draft documents valid for a maximum of six months
24	   and may be updated, replaced, or obsoleted by other documents at any
25	   time.  It is inappropriate to use Internet-Drafts as reference
26	   material or to cite them other than as "work in progress."

28	   The list of current Internet-Drafts can be accessed at
29	   http://www.ietf.org/ietf/1id-abstracts.txt.

31	   The list of Internet-Draft Shadow Directories can be accessed at
32	   http://www.ietf.org/shadow.html.

34	   This Internet-Draft will expire on November 20, 2005.

36	Copyright Notice

38	   Copyright (C) The Internet Society (2005).

40	Abstract

42	   This document describes the structure, content, construction, and
43	   semantics of language tags for use in cases where it is desirable to
44	   indicate the language used in an information object.  It also
45	   describes how to register values for use in language tags and the
46	   creation of user defined extensions for private interchange.  This
47	   document obsoletes RFC 3066 (which replaced RFC 1766).

49	Table of Contents

51	   1.  Introduction
52	   2.  The Language Tag
53	     2.1   Syntax
54	       2.1.1   Length Considerations
55	     2.2   Language Subtag Sources and Interpretation
56	       2.2.1   Primary Language Subtag
57	       2.2.2   Extended Language Subtags
58	       2.2.3   Script Subtag
59	       2.2.4   Region Subtag
60	       2.2.5   Variant Subtags
61	       2.2.6   Extension Subtags
62	       2.2.7   Private Use Subtags
63	       2.2.8   Pre-Existing RFC 3066 Registrations
64	       2.2.9   Classes of Conformance
65	   3.  Registry Format and Maintenance
66	     3.1   Format of the IANA Language Subtag Registry
67	     3.2   Maintenance of the Registry
68	     3.3   Stability of IANA Registry Entries
69	     3.4   Registration Procedure for Subtags
70	     3.5   Possibilities for Registration
71	     3.6   Extensions and Extensions Namespace
72	     3.7   Conversion of the RFC 3066 Language Tag Registry
73	   4.  Formation and Processing of Language Tags
74	     4.1   Choice of Language Tag
75	     4.2   Meaning of the Language Tag
76	     4.3   Canonicalization of Language Tags
77	     4.4   Considerations for Private Use Subtags
78	   5.  IANA Considerations
79	   6.  Security Considerations
80	   7.  Character Set Considerations
81	   8.  Changes from RFC 3066
82	   9.  References
83	     9.1   Normative References
84	     9.2   Informative References
85	       Authors' Addresses
86	   A.  Acknowledgements
87	   B.  Examples of Language Tags (Informative)
88	   C.  Example Registry
89	       Intellectual Property and Copyright Statements

91	1.  Introduction

93	   Human beings on our planet have, past and present, used a number of
94	   languages.  There are many reasons why one would want to identify the
95	   language used when presenting or requesting information.

97	   Information about a user's language preferences commonly needs to be
98	   identified so that appropriate processing can be applied.  For
99	   example, the user's language preferences in a browser can be used to
100	   select web pages appropriately.  A choice of language preference can
101	   also be used to select among tools (such as dictionaries) to assist
102	   in the processing or understanding of content in different languages.

104	   In addition, knowledge about the particular language used by some
105	   piece of information content may be useful or even required by some
106	   types of information processing; for example, spell-checking,
107	   computer-synthesized speech, Braille transcription, or high-quality
108	   print renderings.

110	   One means of indicating the language used is by labeling the
111	   information content with a language identifier.  These identifiers
112	   can also be used to specify user preferences when selecting
113	   information content, or for labeling additional attributes of content
114	   and associated resources.

116	   These identifiers can also be used to indicate additional attributes
117	   of content that are closely related to the language.  In particular,
118	   it is often necessary to indicate specific information about the
119	   dialect, writing system, or orthography used in a document or
120	   resource, as these attributes may be important for the user to obtain
121	   information in a form that they can understand, or important in
122	   selecting appropriate processing resources for the given content.

124	   This document specifies an identifier mechanism and a registration
125	   function for values to be used with that identifier mechanism.  It
126	   also defines a mechanism for private use values and future extension.

128	   This document replaces RFC 3066, which replaced RFC 1766.  For a list
129	   of changes in this document, see Section 8.

131	   The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
132	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
133	   document are to be interpreted as described in RFC 2119 [10].

135	2.  The Language Tag

137	2.1  Syntax

139	   The language tag is composed of one or more parts: a primary language
140	   subtag and a (possibly empty) series of subsequent subtags.  Subtags
141	   are distinguished by their length, position in the subtag sequence,
142	   and content, so that each type of subtag can be recognized solely by
143	   these features.  This makes it possible to construct a parser that
144	   can extract and assign some semantic information to the subtags, even
145	   if specific subtag values are not recognized.  Thus a parser need not
146	   have an up-to-date copy of the registered subtag values to perform
147	   most searching and matching operations.

149	   The syntax of this tag in ABNF [7] is:

151	   Language-Tag = (lang
152	                   *("-" extlang)
153	                   ["-" script]
154	                   ["-" region]
155	                   *("-" variant)
156	                   *("-" extension)
157	                   ["-" privateuse])
158	                   / privateuse         ; private-use tag
159	                   / grandfathered      ; grandfathered registrations

161	   lang            = 2*3ALPHA           ; shortest ISO 639 code
162	                   / registered-lang
163	   extlang         = 3ALPHA             ; reserved for future use
164	   script          = 4ALPHA             ; ISO 15924 code
165	   region          = 2ALPHA             ; ISO 3166 code
166	                   / 3DIGIT             ; UN country number
167	   variant         =  5*8alphanum       ; registered variants
168	                   / ( DIGIT 3alphanum )
169	   extension       = singleton 1*("-" (2*8alphanum))
170	   privateuse      = ("x"/"X") 1*("-" (1*8alphanum))
171	   singleton       = %x41-57 / %x59-5A / %x61-77 / %x79-7A / DIGIT
172	                   ; "A"-"W" / "Y"-"Z" / "a"-"w" / "y"-"z" / "0"-"9"
173	                   ; Single letters: x/X is reserved for private use
174	   registered-lang = 4*8ALPHA          ; registered language subtag
175	   grandfathered   = 1*3ALPHA 1*2("-" (2*8alphanum))
176	                                       ; grandfathered registration
177	                                       ; Note: i is the only singleton
178	                                       ; that starts a grandfathered tag
179	   alphanum        = (ALPHA / DIGIT)   ; letters and numbers

181	                        Figure 1: Language Tag ABNF

183	   The character "-" is HYPHEN-MINUS (ABNF: %x2D).  All subtags have a
184	   maximum length of eight characters.  Note that there is a subtlety in
185	   the ABNF for 'variant': variants starting with a digit may be only
186	   four characters long, while those starting with a letter must be at
187	   least five characters long.

189	   Whitespace is not permitted in a language tag.  For examples of
190	   language tags, see Appendix B.
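As an informal illustration that is not part of this specification, the ABNF in Figure 1 can be transcribed into a regular expression.  The Python sketch below does this; the names TAG_RE and is_well_formed are hypothetical.  It checks only the shape of a tag, not whether the subtags are registered, and the 'grandfathered' production is deliberately loose, just as it is in the ABNF.

```python
import re

ALNUM = r"[A-Za-z0-9]"

# Each piece mirrors the ABNF rule of the same name in Figure 1.
LANG = r"(?:[A-Za-z]{2,3}|[A-Za-z]{4,8})"        # lang / registered-lang
EXTLANG = r"[A-Za-z]{3}"
SCRIPT = r"[A-Za-z]{4}"
REGION = r"(?:[A-Za-z]{2}|[0-9]{3})"
VARIANT = rf"(?:{ALNUM}{{5,8}}|[0-9]{ALNUM}{{3}})"
SINGLETON = r"[A-WY-Za-wy-z0-9]"                 # any letter or digit except x/X
EXTENSION = rf"{SINGLETON}(?:-{ALNUM}{{2,8}})+"
PRIVATEUSE = rf"[xX](?:-{ALNUM}{{1,8}})+"
GRANDFATHERED = rf"[A-Za-z]{{1,3}}(?:-{ALNUM}{{2,8}}){{1,2}}"

LANGTAG = (
    rf"{LANG}(?:-{EXTLANG})*(?:-{SCRIPT})?(?:-{REGION})?"
    rf"(?:-{VARIANT})*(?:-{EXTENSION})*(?:-{PRIVATEUSE})?"
)

TAG_RE = re.compile(rf"(?:{LANGTAG}|{PRIVATEUSE}|{GRANDFATHERED})")


def is_well_formed(tag: str) -> bool:
    """Return True if the whole string matches the Language-Tag ABNF."""
    return TAG_RE.fullmatch(tag) is not None
```

Because the whole string must match, whitespace or an empty subtag (as in "en--US") causes the check to fail.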

192	   Note that although [7] refers to octets, the language tags described
193	   in this document are sequences of characters from the US-ASCII
194	   repertoire.  Language tags may be used in documents and applications
195	   that use other encodings, so long as these encompass the US-ASCII
196	   repertoire.  An example of this would be an XML document that uses
197	   the UTF-16LE [12] encoding of Unicode [20].

199	   The tags and their subtags, including private-use and extensions, are
200	   to be treated as case insensitive: there exist conventions for the
201	   capitalization of some of the subtags, but these should not be taken
202	   to carry meaning.

204	   For example:

206	   o  [ISO 639] [1] recommends that language codes be written in lower
207	      case ('mn' Mongolian).

209	   o  [ISO 3166] [4] recommends that country codes be capitalized ('MN'
210	      Mongolia).

212	   o  [ISO 15924] [3] recommends that script codes use lower case with
213	      the initial letter capitalized ('Cyrl' Cyrillic).

215	   However, in the tags defined by this document, the uppercase US-ASCII
216	   letters in the range 'A' through 'Z' are considered equivalent to
217	   their US-ASCII lowercase counterparts in the range 'a' through 'z'
218	   and are mapped directly to them.  Thus the tag "mn-Cyrl-MN" is not
219	   distinct from "MN-cYRL-mn" or "mN-cYrL-Mn" (or any other combination)
220	   and each of these variations conveys the same meaning: Mongolian
221	   written in the Cyrillic script as used in Mongolia.
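The case rules above can be sketched in Python as follows.  This is an illustration only: the function names same_tag and conventional_case are hypothetical, and the recasing applies the ISO conventions listed above purely by subtag length, stopping at the first single-character subtag so that extension and private-use sequences are left in lower case.

```python
def same_tag(a: str, b: str) -> bool:
    """Compare two tags case-insensitively (tags are US-ASCII only)."""
    return a.lower() == b.lower()


def conventional_case(tag: str) -> str:
    """Recase a tag: language lower, script Titlecase, region UPPER."""
    subtags = tag.lower().split("-")
    result = []
    in_tail = False  # True once a singleton (extension / private use) is seen
    for index, subtag in enumerate(subtags):
        if len(subtag) == 1:
            in_tail = True
        if index > 0 and not in_tail and subtag.isalpha():
            if len(subtag) == 4:
                subtag = subtag.capitalize()   # script convention
            elif len(subtag) == 2:
                subtag = subtag.upper()        # region convention
        result.append(subtag)
    return "-".join(result)
```

The recased form is cosmetic: it carries no meaning beyond what same_tag already treats as equal.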

223	2.1.1  Length Considerations

225	   Although neither the ABNF nor other guidelines in this document
226	   provide a fixed upper limit on the number of subtags in a Language
227	   Tag (and thus the upper bound on the size of a tag) and it is
228	   possible to envision quite long and complex subtag sequences, in
229	   practice these are rare because additional granularity in tags seldom
230	   adds useful distinguishing information and because longer, more
231	   granular tags interfere with the meaning, understanding, and
232	   processing of language tags.

234	   In particular, variant subtags SHOULD be used only with their
235	   recommended prefix.  In practice, this limits most tags to a sequence
236	   of four subtags, and thus a maximum length of 26 characters
237	   (excluding any extensions or private use sequences).  This is because
238	   subtags are limited to a length of eight characters and the extlang,
239	   script, and region subtags are limited to even fewer characters.  See
240	   Section 4.1 for more information on selecting the most appropriate
241	   Language Tag.
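The figure of 26 characters can be checked with a short calculation.  The sketch below assumes that the "four subtags" are one language, one script, one region, and one variant subtag, each at the longest length that Figure 1 allows; that reading is an assumption and is not stated explicitly above.

```python
# Longest subtag of each type permitted by the ABNF in Figure 1
# (assumed reading of "a sequence of four subtags").
longest = {"language": 8, "script": 4, "region": 3, "variant": 8}

hyphens = len(longest) - 1                   # one "-" between adjacent subtags
max_length = sum(longest.values()) + hyphens

print(max_length)  # 8 + 4 + 3 + 8 + 3 = 26
```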

243	   A conformant implementation MAY refuse to support the storage of
244	   language tags which exceed a specified length.  For an example, see
246	   [RFC 2231] [22].  Any such limitation MUST be clearly documented, and
247	   such documentation SHOULD include the disposition of any longer tags
248	   (for example, whether an error value is generated or the language tag
249	   is truncated).  If truncation is permitted it MUST NOT permit a
250	   subtag to be divided.
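One possible truncation policy that never divides a subtag is sketched below in Python.  The function name truncate_tag and the choice to also drop a trailing singleton are assumptions made for illustration; an implementation could equally choose to report an error instead.

```python
def truncate_tag(tag: str, limit: int) -> str:
    """Shorten tag to at most `limit` characters without dividing a subtag."""
    if len(tag) <= limit:
        return tag
    kept = []
    length = 0
    for subtag in tag.split("-"):
        needed = len(subtag) + (1 if kept else 0)  # +1 for the hyphen
        if length + needed > limit:
            break
        kept.append(subtag)
        length += needed
    # A trailing singleton would introduce an empty extension or
    # private-use sequence, so drop it as well.
    while len(kept) > 1 and len(kept[-1]) == 1:
        kept.pop()
    return "-".join(kept)
```

For example, truncating "de-CH-1996" to seven characters yields "de-CH" rather than "de-CH-1".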

252	2.2  Language Subtag Sources and Interpretation

254	   The namespace of language tags and their subtags is administered by
255	   the Internet Assigned Numbers Authority (IANA) [13] according to the
256	   rules in Section 5 of this document.  The registry maintained by IANA
257	   is the source for valid subtags: other standards referenced in this
258	   section provide the source material for that registry.

260	   Terminology in this section:

262	   o  Tag or tags refers to a complete language tag, such as
263	      "fr-Latn-CA".  Examples of tags in this document are enclosed in
264	      double-quotes ("en-US").

266	   o  Subtag refers to a specific section of a tag, delimited by hyphens,
267	      such as the subtag 'Latn' in "fr-Latn-CA".  Examples of subtags in
268	      this document are enclosed in single quotes ('Latn').

270	   o  Code or codes refers to values defined in external standards (and
271	      which are used as subtags in this document).  For example, 'Latn'
272	      is an [ISO 15924] [3] script code which was used to define the
273	      'Latn' script subtag for use in a language tag.  Examples of codes
274	      in this document are enclosed in single quotes ('en', 'Latn').

276	   The definitions in this section apply to the various subtags within
277	   the language tags defined by this document, excepting those
278	   "grandfathered" tags defined in Section 2.2.8.

280	   Language tags are designed so that each subtag type has unique length
281	   and content restrictions.  These make identification of the subtag's
282	   type possible, even if the content of the subtag itself is
283	   unrecognized.  This allows tags to be parsed and processed without
284	   reference to the latest version of the underlying standards or the
285	   IANA registry and makes the associated exception handling when
286	   parsing tags simpler.
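To illustrate how length, content, and position identify a subtag's type without any registry lookup, here is a minimal Python sketch.  The names ORDER, shape, and classify are hypothetical, and the sketch covers only tags that begin with a primary language subtag (not grandfathered or purely private-use tags).

```python
from typing import List, Optional, Tuple

ORDER = ["extlang", "script", "region", "variant"]


def shape(subtag: str) -> Optional[str]:
    """Type implied by length and content alone (before any singleton)."""
    if not subtag.isascii():
        return None
    n = len(subtag)
    if n == 3 and subtag.isalpha():
        return "extlang"
    if n == 4 and subtag.isalpha():
        return "script"
    if (n == 2 and subtag.isalpha()) or (n == 3 and subtag.isdigit()):
        return "region"
    if 5 <= n <= 8 and subtag.isalnum():
        return "variant"
    if n == 4 and subtag[0].isdigit() and subtag.isalnum():
        return "variant"
    return None


def classify(tag: str) -> List[Tuple[str, str]]:
    """Label each subtag; raise ValueError if one is out of place."""
    subtags = tag.split("-")
    labeled = [("language", subtags[0])]
    stage = 0      # index into ORDER of the earliest type still allowed
    tail = None    # "extension" or "privateuse" once a singleton is seen
    for subtag in subtags[1:]:
        if tail == "privateuse":
            labeled.append(("privateuse", subtag))
        elif len(subtag) == 1:
            tail = "privateuse" if subtag.lower() == "x" else "extension"
            labeled.append(("singleton", subtag))
        elif tail == "extension":
            labeled.append(("extension", subtag))
        else:
            kind = shape(subtag)
            if kind is None or ORDER.index(kind) < stage:
                raise ValueError(f"unexpected subtag {subtag!r}")
            # extlang and variant may repeat; script and region may not.
            repeatable = kind in ("extlang", "variant")
            stage = ORDER.index(kind) + (0 if repeatable else 1)
            labeled.append((kind, subtag))
    return labeled
```

Note that an unregistered value such as 'Qqqq' is still recognized as a script subtag purely from its shape and position.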

288	   Subtags in the IANA registry that do not come from an underlying
289	   standard can only appear in specific positions in a tag.
290	   Specifically, they can only occur as primary language subtags or as
291	   variant subtags.

293	   Note that sequences of private-use and extension subtags MUST occur
294	   at the end of the sequence of subtags and MUST NOT be interspersed
295	   with subtags defined elsewhere in this document.

297	   Single letter and digit subtags are reserved for current or future
298	   use.  These include the following current uses:

300	   o  The single letter subtag 'x' is reserved to introduce a sequence
301	      of private-use subtags.  The interpretation of any private-use
302	      subtags is defined solely by private agreement and is not defined
303	      by the rules in this section or in any standard or registry
304	      defined in this document.

306	   o  All other single letter subtags are reserved to introduce
307	      standardized extension subtag sequences as described in
308	      Section 3.6.

310	   The single letter subtag 'i' is used by some grandfathered tags, such
311	   as "i-enochian", where it always appears in the first position and
312	   cannot be confused with an extension.

314	2.2.1  Primary Language Subtag

316	   The primary language subtag is the first subtag in a language tag
317	   (with the exception of private-use and certain grandfathered tags)
318	   and cannot be omitted.  The following rules apply to the primary
319	   language subtag:

321	   1.  All two character language subtags were defined in the IANA
322	       registry according to the assignments found in the standard ISO
323	       639 Part 1, "ISO 639-1:2002, Codes for the representation of
324	       names of languages -- Part 1: Alpha-2 code" [ISO 639-1] [1], or
325	       using assignments subsequently made by the ISO 639 Part 1
326	       maintenance agency or governing standardization bodies.

328	   2.  All three character language subtags were defined in the IANA
329	       registry according to the assignments found in ISO 639 Part 2,
330	       "ISO 639-2:1998 - Codes for the representation of names of
331	       languages -- Part 2: Alpha-3 code - edition 1" [ISO 639-2] [2],
332	       or assignments subsequently made by the ISO 639 Part 2
333	       maintenance agency or governing standardization bodies.

335	   3.  The subtags in the range 'qaa' through 'qtz' are reserved for
336	       private use in language tags.  These subtags correspond to codes
337	       reserved by ISO 639-2 for private use.  These codes MAY be used
338	       for non-registered primary-language subtags (instead of using
339	       private-use subtags following 'x-').  Please refer to Section 4.4
340	       for more information on private use subtags.

342	   4.  All four character language subtags are reserved for possible
343	       future standardization.

345	   5.  All language subtags of 5 to 8 characters in length in the IANA
346	       registry were defined via the registration process in Section 3.4
347	       and MAY be used to form the primary language subtag.  At the time
348	       this document was created, there were no examples of this kind of
349	       subtag and future registrations of this type will be discouraged:
350	       primary languages are STRONGLY RECOMMENDED for registration with
351	       ISO 639 and proposals rejected by ISO 639/RA will be closely
352	       scrutinized before they are registered with IANA.

354	   6.  The single character subtag 'x' as the primary subtag indicates
355	       that the language tag consists solely of subtags whose meaning is
356	       defined by private agreement.  For example, in the tag "x-fr-CH",
357	       the subtags 'fr' and 'CH' should not be taken to represent the
358	       French language or the country of Switzerland (or any other value
359	       in the IANA registry) unless there is a private agreement in
360	       place to do so.  See Section 4.4.

362	   7.  The single character subtag 'i' is used by some grandfathered
363	       tags (see Section 2.2.8) such as "i-klingon" and "i-bnn".  (Other
364	       grandfathered tags have a primary language subtag in their first
365	       position.)

367	   8.  Other values MUST NOT be assigned to the primary subtag except by
368	       revision or update of this document.
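The rules above can be summarized as a lookup on the length and content of the first subtag.  The Python sketch below is illustrative only; the function name primary_kind and the returned labels are hypothetical, and no registry is consulted.

```python
def primary_kind(subtag: str) -> str:
    """Describe which rule in Section 2.2.1 governs a primary subtag."""
    s = subtag.lower()
    if s == "x":
        return "private-use tag"            # rule 6
    if s == "i":
        return "grandfathered prefix"       # rule 7
    if not (s.isascii() and s.isalpha()):
        raise ValueError(f"not a primary language subtag: {subtag!r}")
    n = len(s)
    if n == 2:
        return "ISO 639-1"                  # rule 1
    if n == 3:
        # rule 3: 'qaa' through 'qtz' are reserved for private use
        return "private use" if "qaa" <= s <= "qtz" else "ISO 639-2"
    if n == 4:
        return "reserved"                   # rule 4
    if 5 <= n <= 8:
        return "registered"                 # rule 5
    raise ValueError(f"not a primary language subtag: {subtag!r}")  # rule 8
```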

370	   Note: For languages that have both an ISO 639-1 two character code
371	   and an ISO 639-2 three character code, only the ISO 639-1 two
372	   character code is defined in the IANA registry.

374	   Note: For languages that have no ISO 639-1 two character code and for
375	   which the ISO 639-2/T (Terminology) code and the ISO 639-2/B
376	   (Bibliographic) codes differ, only the Terminology code is defined in
377	   the IANA registry.  At the time this document was created, all
378	   languages that had both kinds of three character code were also
379	   assigned a two character code; it is not expected that future
380	   assignments of this nature will occur.

382	   Note: To avoid problems with versioning and subtag choice as
383	   experienced during the transition between RFC 1766 and RFC 3066, and
384	   to preserve the canonical nature of subtags defined by this document,
385	   the ISO 639 Registration Authority Joint Advisory Committee
386	   (ISO 639/RA-JAC) has included the following statement in [16]:

388	   "A language code already in ISO 639-2 at the point of freezing ISO
389	   639-1 shall not later be added to ISO 639-1.  This is to ensure
390	   consistency in usage over time, since users are directed in Internet
391	   applications to employ the alpha-3 code when an alpha-2 code for that
392	   language is not available."

394	   In order to avoid instability of the canonical form of tags, if a two
395	   character code is added to ISO 639-1 for a language for which a three
396	   character code was already included in ISO 639-2, the two character
397	   code will not be added as a subtag in the registry.  See Section 3.3.

399	   For example, if some content were tagged with 'haw' (Hawaiian), which
400	   currently has no two character code, the tag would not be invalidated
401	   if ISO 639-1 were to assign a two character code to the Hawaiian
402	   language at a later date.

404	   For example, one of the grandfathered IANA registrations is
405	   "i-enochian".  The subtag 'enochian' could be registered in the IANA
406	   registry as a primary language subtag (assuming that ISO 639 does not
407	   register this language first), making tags such as "enochian-AQ" and
408	   "enochian-Latn" valid.

410	2.2.2  Extended Language Subtags

412	   The following rules apply to the extended language subtags:

414	   1.  Three letter subtags immediately following the primary subtag are
415	       reserved for future standardization, anticipating work that is
416	       currently under way on ISO 639.

418	   2.  Extended language subtags MUST follow the primary subtag and
419	       precede any other subtags.

421	   3.  There MAY be any number of additional extended language subtags.

423	   4.  Extended language subtags will not be registered except by
424	       revision of this document.

426	   5.  Extended language subtags MUST NOT be used to form language tags
427	       except by revision of this document.

429	   Example: In a future revision or update of this document, the tag
430	   "zh-gan" (registered under RFC 3066) might become a valid non-
431	   grandfathered (that is, redundant) tag in which the subtag 'gan'
432	   might represent the Chinese dialect 'Gan'.

434	2.2.3  Script Subtag

436	   The following rules apply to the script subtags:

438	   1.  All four character subtags were defined according to ISO 15924
439	       [3]--"Codes for the representation of the names of scripts":
440	       alpha-4 script codes, or subsequently assigned by the ISO 15924
441	       maintenance agency or governing standardization bodies, denoting
442	       the script or writing system used in conjunction with this
443	       language.

445	   2.  Script subtags MUST immediately follow the primary language
446	       subtag and all extended language subtags and MUST occur before
447	       any other type of subtag described below.

449	   3.  The script subtags 'Qaaa' through 'Qabx' are reserved for private
450	       use in language tags.  These subtags correspond to codes reserved
451	       by ISO 15924 for private use.  These codes MAY be used for non-
452	       registered script values.  Please refer to Section 4.4 for more
453	       information on private-use subtags.

455	   4.  Script subtags cannot be registered using the process in
456	       Section 3.4 of this document.  Variant subtags may be considered
457	       for registration for that purpose.

459	   Example: "de-Latn" represents German written using the Latin script.
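As a small illustration of rule 3, the private-use script range can be tested with a case-insensitive string comparison.  The function name is_private_use_script is hypothetical.

```python
def is_private_use_script(subtag: str) -> bool:
    """True for script subtags in the range 'Qaaa' through 'Qabx'."""
    s = subtag.lower()
    if len(s) != 4 or not (s.isascii() and s.isalpha()):
        return False
    # For equal-length lowercase ASCII strings, lexicographic order
    # matches the alphabetical order of the codes.
    return "qaaa" <= s <= "qabx"
```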

461	2.2.4  Region Subtag

463	   The following rules apply to the region subtags:

465	   1.  The region subtag defines language variations used in a specific
466	       region or a geographic or political area.  Region subtags MUST
467	       follow any language, extended language, or script subtags and
468	       MUST precede all other subtags.

470	   2.  All two character subtags following the primary subtag were
471	       defined in the IANA registry according to the assignments found
472	       in ISO 3166 [4]--"Codes for the representation of names of
473	       countries and their subdivisions - Part 1: Country
474	       codes"--alpha-2 country codes or assignments subsequently made by
475	       the ISO 3166 maintenance agency or governing standardization
476	       bodies.

478	   3.  All three character codes consisting of digit (numeric)
479	       characters were defined in the IANA registry according to the
480	       assignments found in UN Standard Country or Area Codes for
481	       Statistical Use [5] or assignments subsequently made by the
482	       governing standards body.  Note that not all of the UN M.49 codes
483	       are defined in the IANA registry:

485	       A.  UN numeric codes assigned to 'macro-geographical
486	           (continental)' or sub-regions not associated with an assigned
487	           ISO 3166 alpha-2 code _are_ defined.

489	       B.  UN numeric codes for 'economic groupings' or 'other
490	           groupings' are _not_ defined in the IANA registry and MUST
491	           NOT be used to form language tags.

493	       C.  UN numeric codes for countries with ambiguous ISO 3166
494	           alpha-2 codes as defined in Section 3.3 are defined in the
495	           registry and are canonical for the given country or region
496	           defined.

498	       D.  The alphanumeric codes in Appendix X of the UN document are
499	           _not_ defined and MUST NOT be used to form language tags.
500	           (At the time this document was created these values match the
501	           ISO 3166 alpha-2 codes.)

503	   4.  There may be at most one region subtag in a language tag.

505	   5.  The region subtags 'AA', 'QM'-'QZ', 'XA'-'XZ', and 'ZZ' are
506	       reserved for private use in language tags.  These subtags
507	       correspond to codes reserved by ISO 3166 for private use.  These
508	       codes MAY be used for private use region subtags (instead of
509	       using a private-use subtag sequence).  Please refer to
510	       Section 4.4 for more information on private use subtags.

512	   "de-CH" represents German ('de') as used in Switzerland ('CH').

514	   "sr-Latn-CS" represents Serbian ('sr') written using Latin script
515	   ('Latn') as used in Serbia and Montenegro ('CS').

517	   "es-419" represents Spanish ('es') as used in the UN-defined Latin
518	   America and Caribbean region ('419').

520	2.2.5  Variant Subtags

522	   The following rules apply to the variant subtags:

524	   1.  Variant subtags are not associated with any external standard.
525	       Variant subtags and their meanings are defined by the
526	       registration process defined in Section 3.4.

528	   2.  Variant subtags MUST follow all of the other defined subtags, but
529	       precede any extension or private-use subtag sequences.

531	   3.  More than one variant MAY be used to form the language tag.

533	   4.  Variant subtags MUST be registered with IANA according to the
534	       rules in Section 3.4 of this document before being used to form
535	       language tags.  In order to distinguish variants from other types
536	       of subtags, registrations must meet the following length and
537	       content restrictions:

539	       1.  Variant subtags that begin with a letter (a-z, A-Z) MUST be
540	           at least five characters long.

542	       2.  Variant subtags that begin with a digit (0-9) MUST be at
543	           least four characters long.

545	   "en-scouse" represents the Scouse dialect of English.

547	   "de-CH-1996" represents German as used in Switzerland and as written
548	   using the spelling reform beginning in the year 1996 C.E.
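
   The length restrictions above can be checked mechanically.  The
   following Python sketch is illustrative only: the function name is
   hypothetical, and the eight-character upper bound is an assumption
   taken from the general limit on subtag length rather than from the
   rules in this section.  It tests shape only; whether a subtag is
   actually registered is a separate question.

```python
def has_variant_shape(subtag: str) -> bool:
    """Check the length and content rules for a variant subtag.

    Shape only: this does not consult the IANA registry.
    """
    if not (subtag.isascii() and subtag.isalnum()):
        return False
    if len(subtag) > 8:  # assumed general limit on subtag length
        return False
    if subtag[0].isdigit():
        return len(subtag) >= 4  # digit first: at least four characters
    return len(subtag) >= 5      # letter first: at least five characters
```

   Under these rules 'scouse' and '1996' have the shape of a variant,
   while 'Latn' (four letters) does not.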

550	2.2.6  Extension Subtags

552	   The following rules apply to extensions:

554	   1.   Extension subtags are separated from the other subtags defined
555	        in this document by a single-letter subtag ("singleton").  The
556	        singleton MUST be one allocated to a registration authority via
557	        the mechanism described in Section 3.6 and cannot be the letter
558	        'x', which is reserved for private-use subtag sequences.

560	   2.   Note: Private-use subtag sequences starting with the singleton
561	        subtag 'x' are described below.

563	   3.   An extension MUST follow at least a primary language subtag.
564	        That is, a language tag cannot begin with an extension.
565	        Extensions extend language tags; they do not override or replace
566	        them.  For example, "a-value" is not a well-formed language tag,
567	        while "de-a-value" is.

569	   4.   Each singleton subtag MUST appear at most one time in each tag
570	        (other than as a private-use subtag).  That is, singleton
571	        subtags MUST NOT be repeated.  For example, the tag "en-a-bbb-a-
572	        ccc" is invalid because the subtag 'a' appears twice.  Note that
573	        the tag "en-a-bbb-x-a-ccc" is valid because the second
574	        appearance of the singleton 'a' is in a private use sequence.

576	   5.   Extension subtags MUST meet all of the requirements for the
577	        content and format of subtags defined in this document.

579	   6.   Extension subtags MUST meet whatever requirements are set by the
580	        document that defines their singleton prefix and whatever
581	        requirements are provided by the maintaining authority.

583	   7.   Each extension subtag MUST be from two to eight characters long
584	        and consist solely of letters or digits, with each subtag
585	        separated by a single '-'.

587	   8.   Each singleton MUST be followed by at least one extension
588	        subtag.  For example, the tag "tlh-a-b-foo" is invalid because
589	        the first singleton 'a' is followed immediately by another
590	        singleton 'b'.

592	   9.   Extension subtags MUST follow all language, extended language,
593	        script, region and variant subtags in a tag.

595	   10.  All subtags following the singleton and before another singleton
596	        are part of the extension.  Example: In the tag "fr-a-Latn", the
597	        subtag 'Latn' does not represent the script subtag 'Latn'
598	        defined in the IANA Language Subtag Registry.  Its meaning is
599	        defined by the extension 'a'.

601	   11.  In the event that more than one extension appears in a single
602	        tag, the tag SHOULD be canonicalized as described in
603	        Section 4.3.

605	   For example, if the prefix singleton 'r' and the shown subtags were
606	   defined, then the following tag would be a valid example: "en-Latn-
607	   GB-boont-r-extended-sequence-x-private"
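
   Several of the rules above (a tag cannot begin with an extension,
   singletons do not repeat, and each singleton is followed by at least
   one extension subtag) lend themselves to a simple check.  The Python
   sketch below is one possible reading of those rules, not a complete
   well-formedness test: the function name is hypothetical, it does not
   handle grandfathered tags, and it does not verify that a singleton
   has actually been allocated.

```python
def extension_errors(tag: str) -> list:
    """Return a list of problems with the extension singletons in `tag`."""
    subtags = tag.lower().split("-")
    errors = []
    seen = set()
    for i, sub in enumerate(subtags):
        if len(sub) != 1:
            continue  # not a singleton
        if sub == "x":
            break  # everything after 'x' is private use
        if i == 0:
            errors.append("tag begins with an extension singleton")
        if sub in seen:
            errors.append(f"singleton '{sub}' is repeated")
        seen.add(sub)
        # The subtag after a singleton must be a 2-8 character
        # alphanumeric extension subtag, not another singleton.
        nxt = subtags[i + 1] if i + 1 < len(subtags) else ""
        if not (2 <= len(nxt) <= 8 and nxt.isascii() and nxt.isalnum()):
            errors.append(
                f"singleton '{sub}' is not followed by an extension subtag"
            )
    return errors
```

   An empty list means no problem was found; for instance, the tag
   "en-a-bbb-x-a-ccc" yields no errors because the second 'a' falls
   inside the private-use sequence.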

609	2.2.7  Private Use Subtags

611	   The following rules apply to private-use subtags:

613	   1.  Private-use subtags are separated from the other subtags defined
614	       in this document by the reserved single-character subtag 'x'.

616	   2.  Private-use subtags MUST follow all language, extended language,
617	       script, region, variant, and extension subtags in the tag.
618	       Another way of saying this is that all subtags following the
619	       singleton 'x' MUST be considered private use.  Example: The
620	       subtag 'US' in the tag "en-x-US" is a private use subtag.

622	   3.  A tag MAY consist entirely of private-use subtags.

624	   4.  No source is defined for private use subtags.  Use of private use
625	       subtags is by private agreement only.

627	   For example: Users who wished to utilize SIL Ethnologue for
628	   identification might agree to exchange tags such as "az-Arab-x-AZE-
629	   derbend".  This example contains two private-use subtags.  The first
630	   is 'AZE' and the second is 'derbend'.
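
   Because every subtag after the singleton 'x' is private use, a
   processor can separate the two parts of a tag before interpreting
   anything.  A minimal Python sketch follows; the function name is
   hypothetical and no well-formedness checking is attempted.

```python
def split_private_use(tag: str):
    """Split a tag into (standard part, list of private-use subtags)."""
    subtags = tag.split("-")
    for i, sub in enumerate(subtags):
        if sub.lower() == "x":
            # Everything after the first 'x' singleton is private use.
            return "-".join(subtags[:i]), subtags[i + 1:]
    return tag, []
```

   Applied to "en-x-US", this returns the standard part "en" and the
   single private-use subtag 'US', which therefore is not treated as a
   region.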

632	2.2.8  Pre-Existing RFC 3066 Registrations

634	   Existing IANA-registered language tags from RFC 1766 and/or RFC 3066
635	   maintain their validity.  IANA will maintain these tags in the
636	   registry under either the "grandfathered" or "redundant" type.  For
637	   more information see Section 3.7.

639	   It is important to note that all language tags formed under the
640	   guidelines in this document were either legal, well-formed tags or
641	   could have been registered under RFC 3066.

643	2.2.9  Classes of Conformance

645	   Implementations may wish to express their level of conformance with
646	   the rules and practices described in this document.  There are
647	   generally two classes of conforming implementations: "well-formed"
648	   processors and "validating" processors.  Claims of conformance SHOULD
649	   explicitly reference one of these definitions.

651	   An implementation that claims to check for well-formed language tags
652	   MUST:

654	   o  Check that the tag and all of its subtags, including extension and
655	      private-use subtags, conform to the ABNF or that the tag is on the
656	      list of grandfathered tags.

658	   o  Check that singleton subtags that identify extensions do not
659	      repeat.  For example, the tag "en-a-xx-b-yy-a-zz" is not well-
660	      formed.

662	   Well-formed processors are strongly encouraged to implement the
663	   canonicalization rules contained in Section 4.3.

665	   An implementation that claims to be validating MUST:

667	   o  Check that the tag is well-formed.

669	   o  Specify the particular registry date for which the implementation
670	      performs validation of subtags.

672	   o  Check that either the tag is a grandfathered tag, or that all
673	      language, script, region, and variant subtags consist of valid
674	      codes for use in language tags according to the IANA registry as
675	      of the particular date specified by the implementation.

677	   o  Specify which, if any, extension RFCs as defined in Section 3.6
678	      are supported, including version, revision, and date.

680	   o  For any such extensions supported, check that all subtags used in
681	      that extension are valid.

683	   o  If the processor generates tags, it MUST do so in canonical form,
684	      including any supported extensions, as defined in Section 4.3.

686	3.  Registry Format and Maintenance

688	   This section defines the Language Subtag Registry and the maintenance
689	   and update procedures associated with it.

691	   The language subtag registry will be maintained so that, except for
692	   extension subtags, it is possible to validate all of the subtags that
693	   appear in a language tag under the provisions of this document or its
694	   revisions or successors.  In addition, the meaning of the various
695	   subtags will be unambiguous and stable over time.  (The meaning of
696	   private-use subtags, of course, is not defined by the IANA registry.)

698	   The registry defined under this document contains a comprehensive
699	   list of all of the subtags valid in language tags.  This allows
700	   implementers a straightforward and reliable way to validate language
701	   tags.

703	3.1  Format of the IANA Language Subtag Registry

705	   The IANA Language Subtag Registry ("the registry") will consist of a
706	   text file that is machine readable in the format described in this
707	   section, plus copies of the registration forms approved by the
708	   Language Subtag Reviewer in accordance with the process described in
709	   Section 3.4.  With the exception of the registration forms for
710	   grandfathered and redundant tags, no registration records will be
711	   maintained for the initial set of subtags.

713	   The registry will be in a modified record-jar format text file [17].
714	   Lines are limited to 72 characters, including all whitespace.

716	   Records are separated by lines containing only the sequence "%%"
717	   (%x25.25).

719	   Each field can be viewed as a single, logical line of ASCII
720	   characters, comprising a field-name and a field-body separated by a
721	   COLON character (%x3A).  For convenience, the field-body portion of
722	   this conceptual entity can be split into a multiple-line
723	   representation; this is called "folding".  The format of the registry
724	   is described by the following ABNF (per [7]):

726	   registry   = record *("%%" CRLF record)
727	   record     = 1*( field-name *SP ":" *SP field-body CRLF )
728	   field-name = *(ALPHA / DIGIT / "-")
729	   field-body = *(ASCCHAR/LWSP)
730	   ASCCHAR    = %x21-25 / %x27-7E / UNICHAR ; Note: AMPERSAND is %x26
731	   UNICHAR    = "&#x" 2*6HEXDIG ";"

733	   The sequence '..' (%x2E.2E) in a field-body denotes a range of
734	   values.  Such a range represents all subtags of the same length that
735	   are alphabetically within that range, including the values explicitly
736	   mentioned.  For example 'a..c' denotes the values 'a', 'b', and 'c'.
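
   The range notation can be interpreted as in the following Python
   sketch.  It is illustrative only: the function names are
   hypothetical, comparison is done on lowercased values, and the
   expansion helper assumes that ranges contain only ASCII letters.

```python
from itertools import product
from string import ascii_lowercase


def in_range(subtag: str, field_body: str) -> bool:
    """True if `subtag` falls within a 'lo..hi' range of equal length."""
    lo, hi = field_body.lower().split("..")
    s = subtag.lower()
    return len(s) == len(lo) == len(hi) and lo <= s <= hi


def expand_range(field_body: str) -> list:
    """List every letter-only value covered by a 'lo..hi' range."""
    lo, hi = field_body.lower().split("..")
    if len(lo) != len(hi):
        raise ValueError("range endpoints must have the same length")
    return [
        "".join(chars)
        for chars in product(ascii_lowercase, repeat=len(lo))
        if lo <= "".join(chars) <= hi
    ]
```

   The membership test avoids materializing the range, which is usually
   preferable when validating a single subtag.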

738	   Characters from outside the US-ASCII repertoire, as well as the
739	   AMPERSAND character ("&", %x26) when it occurs in a field-body, are
740	   represented by a "Numeric Character Reference" using hexadecimal
741	   notation in the style used by XML 1.0 [18] (see
742	   <http://www.w3.org/TR/REC-xml/#dt-charref>).  This consists of the
743	   sequence "&#x" (%x26.23.78) followed by a hexadecimal representation
744	   of the character's code point in ISO/IEC 10646 [6] followed by a
745	   closing semicolon (%x3B).  For example, the EURO SIGN, U+20AC, would
746	   be represented by the sequence "&#x20AC;".  Note that the hexadecimal
747	   notation may have between two and six digits.
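
   The escaping convention can be sketched in Python as follows.  The
   function names are hypothetical; the sketch assumes the input to the
   decoder contains only references with two to six hexadecimal digits
   that name valid code points.

```python
import re

# "&#x" followed by two to six hexadecimal digits and a semicolon.
_NCR = re.compile(r"&#x([0-9A-Fa-f]{2,6});")


def decode_field_body(body: str) -> str:
    """Replace each hexadecimal character reference by its character."""
    return _NCR.sub(lambda m: chr(int(m.group(1), 16)), body)


def encode_field_body(text: str) -> str:
    """Escape '&' and every non-ASCII character as '&#xHEX;'."""
    return "".join(
        ch if ord(ch) < 0x80 and ch != "&" else f"&#x{ord(ch):X};"
        for ch in text
    )
```

   Escaping the ampersand itself is what keeps decoding unambiguous: a
   literal "&" never appears unescaped in an encoded field-body.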

749	   All fields whose field-body contains a date value use the "full-date"
750	   format specified in RFC 3339 [14].  For example: "2004-06-28"
751	   represents June 28, 2004 in the Gregorian calendar.

753	   The first record in the file contains the single field whose field-
754	   name is "File-Date" and whose field-body contains the last
755	   modification date of the registry:

757	   File-Date: 2004-06-28
758	   %%

760	   Subsequent records represent subtags in the registry.  Each of the
761	   fields in each record MUST occur no more than once, unless otherwise
762	   noted below.  Each record MUST contain the following fields:

764	   o  'Type'

766	      *  Type's field-value MUST consist of one of the following
767	         strings: "language", "extlang", "script", "region", "variant",
768	         "grandfathered", or "redundant"; it denotes the type of tag or
769	         subtag.

771	   o  Either 'Subtag' or 'Tag'

773	      *  Subtag's field-value contains the subtag being defined.  This
774	         field MUST only appear in records whose Type has one of
775	         these values: "language", "extlang", "script", "region", or
776	         "variant".

778	      *  Tag's field-value contains a complete language tag.  This field
779	         MUST only appear in records whose Type has one of these values:
780	         "grandfathered" or "redundant".

782	   o  Description

784	      *  Description's field-value contains a non-normative description
785	         of the subtag or tag.

787	   o  Added

789	      *  Added's field-value contains the date the record was added to
790	         the registry.

792	   The 'Subtag' or 'Tag' field MUST use lowercase letters to form the
793	   subtag or tag, with two exceptions.  Subtags whose 'Type' field is
794	   'script' (in other words, subtags defined by ISO 15924) MUST use
795	   titlecase.  Subtags whose 'Type' field is 'region' (in other words,
796	   subtags defined by ISO 3166) MUST use uppercase.  These exceptions
797	   mirror the use of case in the underlying standards.
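
   The case convention for the 'Subtag' and 'Tag' fields can be applied
   with a small helper.  The Python sketch below is illustrative; the
   function name is hypothetical.

```python
def registry_case(value: str, record_type: str) -> str:
    """Apply the registry's case convention for a given record 'Type'."""
    if record_type == "script":
        return value.capitalize()  # titlecase, e.g. 'Latn'
    if record_type == "region":
        return value.upper()       # uppercase, e.g. 'CH'
    return value.lower()           # all other types are lowercase
```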

799	   The field 'Description' MAY appear more than one time.  At least one
800	   of the 'Description' fields must contain a description of the tag
801	   being registered written or transcribed into the Latin script; the
802	   same or additional fields may also include a description in a non-
803	   Latin script.  The 'Description' field is used for identification
804	   purposes and should not be taken to represent the actual native name
805	   of the language or variation or to be in any particular language.
806	   Most descriptions are taken directly from source standards such as
807	   ISO 639 or ISO 3166.

809	   Note: Descriptions in registry entries that correspond to ISO 639,
810	   ISO 15924, ISO 3166 or UN M.49 codes are intended only to indicate
811	   the meaning of that identifier as defined in the source standard at
812	   the time it was added to the registry.  The description does not
813	   replace the content of the source standard itself.  The descriptions
814	   are not intended to be the English localized names for the subtags.
815	   Localization or translation of language tag and subtag descriptions
816	   is out of scope of this document.

818	   Each record MAY also contain the following fields:

820	   o  Canonical

822	      *  For records of type 'language', 'extlang', 'script', 'region',
823	         and 'variant', a canonical mapping of this record to a subtag
824	         record of the same 'Type'.

826	      *  For records of type 'grandfathered' and 'redundant', a
827	         canonical mapping to a complete language tag.

829	   o  Deprecated

831	      *  Deprecated's field-value contains the date the record was
832	         deprecated.

834	   o  Recommended-Prefix

836	      *  Recommended-Prefix's field-value contains a language tag with
837	         which this subtag may be used to form a new language tag,
838	         perhaps with other subtags as well.  This field MUST only
839	         appear in records whose 'Type' field-value is 'variant' or
840	         'extlang'.  For example, the 'Recommended-Prefix' for the
841	         variant 'scouse' is 'en', meaning that the tags "en-scouse" and
842	         "en-GB-scouse" might be appropriate while the tag "is-scouse"
843	         is not.

845	   o  Comments

847	      *  Comments contains additional information about the subtag, as
848	         deemed appropriate for understanding the registry and
849	         implementing language tags using the subtag or tag.

851	   o  Suppress-Script

853	      *  Suppress-Script contains a script subtag that SHOULD NOT be
854	         used to form language tags with the associated primary language
855	         subtag.  This field MUST only appear in records whose 'Type'
856	         field-value is 'language'.  See Section 4.1.

858	   The field 'Canonical' SHALL NOT be added to any record already in the
859	   registry.  The field 'Canonical' SHALL NOT be modified except for
860	   records of type "grandfathered": therefore a subtag whose record
861	   contains no canonical mapping when the record is created is a
862	   canonical form and will remain so.

864	   The 'Canonical' field in records of type "grandfathered" and
865	   "redundant" contains whole language tags that are STRONGLY
866	   RECOMMENDED for use in place of the record's value.  In many cases
867	   the mappings were created by deprecation of the tags during the
868	   period before this document was adopted.  For example, the tag "no-
869	   nyn" was deprecated in favor of the ISO 639-1 defined language code
870	   'nn'.

872	   Note that a record that has a 'Canonical' field MUST have a
873	   'Deprecated' field also (although the converse is not true).

875	   The field 'Deprecated' MAY be added to any record via the maintenance
876	   process described in Section 3.2 or via the registration process
877	   described in Section 3.4.  Usually the addition of a 'Deprecated'
878	   field is due to the action of one of the standards bodies, such as
879	   ISO 3166, withdrawing a code.  In some historical cases it may not
880	   have been possible to reconstruct the original deprecation date.
881	   For these cases, an approximate date appears in the registry.
882	   Although valid in language tags, subtags and tags with a 'Deprecated'
883	   field are deprecated and validating processors SHOULD NOT generate
884	   these subtags.  Note that a record that contains a 'Deprecated' field
885	   and no corresponding 'Canonical' field has no replacement mapping.

887	   The field 'Recommended-Prefix' MAY appear more than once per record.
888	   Additional fields of this type MAY be added to a record via the
889	   registration process.  The field-value of this field consists of a
890	   language tag that is RECOMMENDED for use as a prefix for this subtag.
891	   For example, the variant subtag 'scouse' has a recommended prefix of
892	   "en".  This means that tags starting with the prefix "en-" are most
893	   appropriate with this subtag, so "en-Latn-scouse" and "en-GB-scouse"
894	   are both acceptable, while the tag "fr-scouse" is probably an
895	   inappropriate choice.
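
   A processor that wishes to warn about unexpected combinations could
   compare a tag against the recommended prefixes of a variant.  The
   Python sketch below is one possible approach; the function name is
   hypothetical, and because the prefix is only RECOMMENDED, a negative
   result signals a probable mistake rather than an invalid tag.

```python
def has_recommended_prefix(tag: str, prefixes: list) -> bool:
    """True if `tag` starts with one of `prefixes` on a subtag boundary."""
    t = tag.lower()
    return any(
        t == p.lower() or t.startswith(p.lower() + "-")
        for p in prefixes
    )
```

   Matching on a subtag boundary matters: with the prefix "en", the tag
   "en-GB-scouse" matches, but a tag beginning with "eng-" does not.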

897	   The 'Recommended-Prefix' field MUST NOT be removed from any record.
898	   Its field-value MUST NOT be modified except as Section 3.3 permits.

900	   The field 'Comments' MAY appear more than once per record.  This
901	   field MAY be inserted or changed via the registration process and no
902	   guarantee of stability is provided.  The content of this field is not
903	   restricted, except by the need to register the information, the
904	   suitability of the request, and by reasonable practical size
905	   limitations.  Long screeds about a particular subtag are frowned
906	   upon.

908	   The field 'Suppress-Script' MUST only appear in records whose 'Type'
909	   field-value is 'language'.  This field may appear at most one time in
910	   a record.  This field indicates a script used to write the
911	   overwhelming majority of documents for the given language and which
912	   therefore adds no distinguishing information to a language tag.  It
913	   helps ensure greater compatibility between the language tags
914	   generated according to the rules in this document and language tags
915	   and tag processors or consumers based on RFC 3066.  For example,
916	   virtually all Icelandic documents are written in the Latin script,
917	   making the subtag 'Latn' redundant in the tag "is-Latn".

919	   For examples of registry entries and their format, see Appendix C.
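
   The record format described in this section can be read with a short
   parser.  The Python sketch below is a simplified illustration, not a
   conforming implementation of the ABNF: the function name is
   hypothetical, folded lines are assumed to begin with whitespace, and
   character references are left undecoded.  Because some fields may
   occur more than once, each field-name maps to a list of field-bodies.

```python
SAMPLE = """\
File-Date: 2005-01-02
%%
Type: variant
Subtag: nedis
Description: Natisone dialect
Description: Nadiza dialect
Added: 2003-10-09
Recommended-Prefix: sl
Comments: This is a comment shown
  as an example.
%%
"""


def parse_registry(text: str) -> list:
    """Parse registry text into a list of {field-name: [field-body]}."""
    records, current, last = [], {}, None
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        if line.strip() == "%%":
            # Record separator: close the current record, if any.
            if current:
                records.append(current)
            current, last = {}, None
        elif line[:1] in (" ", "\t") and last is not None:
            # Folded line: continue the previous field-body.
            current[last][-1] += " " + line.strip()
        elif ":" in line:
            name, body = line.split(":", 1)
            last = name.strip()
            current.setdefault(last, []).append(body.strip())
    if current:
        records.append(current)
    return records


records = parse_registry(SAMPLE)
```

   With the sample above, the first record holds only the 'File-Date'
   field and the second record carries two 'Description' values.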

921	3.2  Maintenance of the Registry

923	   Maintenance of the registry requires that as new codes are assigned
924	   by ISO 639, ISO 15924, and ISO 3166, the Language Subtag Reviewer
925	   will evaluate each assignment, determine whether it conflicts with
926	   existing registry entries, and submit the information to IANA for
927	   inclusion in the registry.  If an assignment takes place and the
928	   Language Subtag Reviewer does not do this in a timely manner, then
929	   any interested party may use the procedure in Section 3.4 to register
930	   the appropriate update.

932	   Note: The redundant and grandfathered entries together are the
933	   complete list of tags registered under RFC 3066 [23].  The redundant
934	   tags are those that can now be formed using the subtags defined in
935	   the registry together with the rules of Section 2.2.  The
936	   grandfathered entries are those that can never be legal under those
937	   same provisions.  The items in both lists are permanent and stable,
938	   although grandfathered items may be deprecated over time.  Refer to
939	   Section 3.7 for more information.

941	   RFC 3066 tags that were deprecated prior to the adoption of this
942	   document are part of the list of grandfathered tags and their
943	   component subtags were not included as registered variants (although
944	   they remain eligible for registration).  For example, the tag "art-
945	   lojban" was deprecated in favor of the language subtag 'jbo'.

947	   The Language Subtag Reviewer MUST ensure that new subtags meet the
948	   requirements in Section 4.1 or submit an appropriate alternate subtag
949	   as described in that section.  If a change or addition to the
950	   registry is required, the Language Subtag Reviewer will prepare the
951	   complete record, including all fields, and forward it to IANA for
952	   insertion into the registry.  If this represents a new subtag, then
953	   the message will indicate that this represents an INSERTION of a
954	   record.  If this represents a change to an existing subtag, then the
955	   message must indicate that this represents a MODIFICATION, as shown
956	   in the following example:

958	   LANGUAGE SUBTAG MODIFICATION
959	   File-Date: 2005-01-02
960	   %%
961	   Type: variant
962	   Subtag: nedis
963	   Description: Natisone dialect
964	   Description: Nadiza dialect
965	   Added: 2003-10-09
966	   Recommended-Prefix: sl
967	   Comments: This is a comment shown
968	     as an example.
969	   %%

971	                                 Figure 4

973	   Whenever an entry is created or modified in the registry, the 'File-
974	   Date' record at the start of the registry is updated to reflect the
975	   most recent modification date in the RFC 3339 [14] "full-date"
976	   format.

978	   Values in the 'Subtag' field must be lowercase except as provided for
979	   in Section 3.1.

981	3.3  Stability of IANA Registry Entries

983	   The stability of entries and their meaning in the registry is
984	   critical to the long term stability of language tags.  The rules in
985	   this section guarantee that a specific language tag's meaning is
986	   stable over time and will not change and that the choice of language
987	   tag for specific content is also stable over time.

989	   These rules specifically deal with how changes to codes (including
990	   withdrawal and deprecation of codes) maintained by ISO 639, ISO
991	   15924, ISO 3166, and UN M.49 are reflected in the IANA Language
992	   Subtag Registry.  Assignments to the IANA Language Subtag Registry
993	   MUST follow the following stability rules:

995	   o  Values in the fields 'Type', 'Subtag', 'Tag', 'Added' and
996	      'Canonical' MUST NOT be changed and are guaranteed to be stable
997	      over time.

999	   o  Values in the 'Description' field MUST NOT be changed in a way
1000	      that would invalidate previously-existing tags.  They may be
1001	      broadened somewhat in scope, changed to add information, or
1002	      adapted to the most common modern usage.  For example, countries
1003	      occasionally change their official names: an historical example of
1004	      this would be "Upper Volta" changing to "Burkina Faso".

1006	   o  Values in the field 'Recommended-Prefix' MAY be added via the
1007	      registration process.

1009	   o  Values in the field 'Recommended-Prefix' MAY be modified, so long
1010	      as the modifications broaden the set of recommended prefixes.
1011	      That is, a recommended prefix MAY be replaced by one of its own
1012	      prefixes.  For example, the prefix "en-US" could be replaced by
1013	      "en", but not by the prefixes "en-Latn", "fr", or "en-US-boont".

1015	   o  Values in the field 'Recommended-Prefix' MUST NOT be removed.

1017	   o  The field 'Comments' MAY be added, changed, modified, or removed
1018	      via the registration process or any of the processes or
1019	      considerations described in this section.

1021	   o  The field 'Suppress-Script' MAY be added or removed via the
1022	      registration process.

1024	   o  Codes assigned by ISO 639, ISO 15924, and ISO 3166 that do not
1025	      conflict with existing subtags of the associated type and whose
1026	      meaning is not the same as an existing subtag of the same type are
1027	      entered into the IANA registry as new records and their value is
1028	      canonical for the meaning assigned to them.

1030	   o  Codes assigned by ISO 639, ISO 15924, or ISO 3166 that are
1031	      withdrawn by their respective maintenance or registration
1032	      authority remain valid in language tags.  The registration process
1033	      MAY be used to add a note indicating the withdrawal of the code by
1034	      the respective standard.

1036	   o  Codes assigned by ISO 639, ISO 15924, or ISO 3166 that do not
1037	      conflict with existing subtags of the associated type but which
1038	      represent the same meaning as an existing subtag of that type are
1039	      entered into the IANA registry as new records.  The 'Canonical'
1040	      field for that record MUST contain the existing subtag of the
1041	      same meaning.

1043	      Example: If ISO 3166 were to assign the code 'IM' to represent the
1044	         value "Isle of Man" (represented in the IANA registry by the UN
1045	         M.49 code '833'), '833' remains the canonical subtag and 'IM'
1046	         would be assigned '833' as a canonical value.  This prevents
1047	         tags that are in canonical form from becoming non-canonical.

1049	      Example: If the tag 'enochian' were registered as a primary
1050	         language subtag and ISO 639 subsequently assigned an alpha-3
1051	         code to the same language, the new ISO 639 code would be
1052	         entered into the IANA registry as a subtag with a canonical
1053	         mapping to 'enochian'.  The new ISO code can be used, but it is
1054	         not canonical.

1056	   o  Codes assigned by ISO 639, ISO 15924, or ISO 3166 that conflict
1057	      with existing subtags of the associated type MUST NOT be entered
1058	      into the registry.  The following additional considerations apply:

1060	      *  For ISO 639 codes, if the newly assigned code's meaning is not
1061	         represented by a subtag in the IANA registry, the Language
1062	         Subtag Reviewer, as described in Section 3.4, shall prepare a
1063	         proposal for entering in the IANA registry as soon as practical
1064	         a registered language subtag as an alternate value for the new
1065	         code.  The form of the registered language subtag will be at
1066	         the discretion of the Language Subtag Reviewer and must conform
1067	         to other restrictions on language subtags in this document.

1069	      *  For all subtags whose meaning is derived from an external
1070	         standard (i.e., ISO 639, ISO 15924, ISO 3166, or UN M.49), if a
1071	         new meaning is assigned to an existing code and the new meaning
1072	         broadens the meaning of that code, then the meaning for the
1073	         associated subtag MAY be changed to match.  The meaning of a
1074	         subtag MUST NOT be narrowed, however, as this can result in an
1075	         unknown proportion of the existing uses of a subtag becoming
1076	         invalid.  Note: ISO 639 MA/RA has adopted a similar stability
1077	         policy.

1079	      *  For ISO 15924 codes, if the newly assigned code's meaning is
1080	         not represented by a subtag in the IANA registry, the Language
1081	         Subtag Reviewer, as described in Section 3.4, shall prepare a
1082	         proposal for entering in the IANA registry as soon as practical
1083	         a registered variant subtag as an alternate value for the new
1084	         code.  The form of the registered variant subtag will be at the
1085	         discretion of the Language Subtag Reviewer and must conform to
1086	         other restrictions on variant subtags in this document.

1088	      *  For ISO 3166 codes, if the newly assigned code's meaning is
1089	         associated with the same UN M.49 code as another 'region'
1090	         subtag, then the existing region subtag remains as the
1091	         canonical entry for that region and no new entry is created.  A
1092	         comment MAY be added to the existing region subtag indicating
1093	         the relationship to the new ISO 3166 code.

1095	      *  For ISO 3166 codes, if the newly assigned code's meaning is
1096	         associated with a UN M.49 code that is not represented by an
1097	         existing region subtag, then the Language Subtag Reviewer,
1098	         as described in Section 3.4, shall prepare a proposal for
1099	         entering the appropriate numeric UN country code as an entry in
1100	         the IANA registry.

1102	      *  For ISO 3166 codes, if there is no associated UN numeric code,
1103	         then the Language Subtag Reviewer SHALL petition the UN to
1104	         create one.  If there is no response from the UN within ninety
1105	         days of the request being sent, the Language Subtag Reviewer
1106	         shall prepare a proposal for entering in the IANA registry as
1107	         soon as practical a registered variant subtag as an alternate
1108	         value for the new code.  The form of the registered variant
1109	         subtag will be at the discretion of the Language Subtag
1110	         Reviewer and must conform to other restrictions on variant
1111	         subtags in this document.  This situation is very unlikely to
1112	         ever occur.

1114	   o  Stability provisions apply to grandfathered tags with this
1115	      exception: should all of the subtags in a grandfathered tag become
1116	      valid subtags in the IANA registry, then the grandfathered tag
1117	      MUST be marked as redundant.  Note that this will not affect
1118	      language tags that match the grandfathered tag, since these tags
1119	      will now match valid generative subtag sequences.  For example, if
1120	      the subtag 'gan' in the language tag "zh-gan" were to be
1121	      registered as an extended language subtag, then the grandfathered
1122	      tag "zh-gan" would be deprecated (but existing content or
1123	      implementations that use "zh-gan" would remain valid).
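
   The rule above that a 'Recommended-Prefix' value may only be replaced
   by one of its own prefixes can be expressed as a comparison of subtag
   sequences.  The Python sketch below is illustrative; the function
   name is hypothetical, and it treats an unchanged value as not being a
   broadening.

```python
def is_broadening(old_prefix: str, new_prefix: str) -> bool:
    """True if `new_prefix` is a proper leading part of `old_prefix`."""
    old = old_prefix.lower().split("-")
    new = new_prefix.lower().split("-")
    # Compare whole subtags so that "e" is not taken as a prefix of "en".
    return len(new) < len(old) and old[:len(new)] == new
```

   For the value "en-US", only "en" qualifies as a replacement;
   "en-Latn", "fr", and "en-US-boont" are all rejected.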

1125	3.4  Registration Procedure for Subtags

1127	   The procedure given here MUST be used by anyone who wants to use a
1128	   subtag not currently in the IANA Language Subtag Registry.

1130	   Only subtags of type 'language' and 'variant' will be considered for
1131	   independent registration of new subtags.  Handling of subtags
1132	   required for stability and subtags required to keep the registry
1133	   synchronized with ISO 639, ISO 15924, ISO 3166, and UN M.49 within
1134	   the limits defined by this document is described in Section 3.2.
1135	   Stability provisions are described in Section 3.3.

1137	   This procedure MAY also be used to register or alter the information
1138	   for the "Description", "Comments", "Deprecated", or "Recommended-
1139	   Prefix" fields in a subtag's record as described in Figure 7.
1140	   Changes to all other fields in the IANA registry are NOT permitted.

1142	   Registering a new subtag or requesting modifications to an existing
1143	   tag or subtag starts with the requester filling out the registration
1144	   form reproduced below.  Note that each response is not limited in
1145	   size and should take the room necessary to adequately describe the
1146	   registration.  The fields in the "Record Requested" section SHOULD
1147	   follow the requirements in Section 3.1.

1149	   LANGUAGE SUBTAG REGISTRATION FORM
1150	   1. Name of requester:
1151	   2. E-mail address of requester:
1152	   3. Record Requested:

1154	   Type:
1155	   Subtag:
1156	   Description:
1157	   Recommended-Prefix:
1158	   Canonical:
1159	   Deprecated:
1160	   Suppress-Script:
1161	   Comments:

1163	   4. Intended meaning of the subtag:
1164	   5. Reference to published description
1165	   of the language (book or article):
1166	   6. Any other relevant information:

1168	                                 Figure 5

1170	   The subtag registration form MUST be sent to
1171	   <ietf-languages@iana.org> for a two week review period before it can
1172	   be submitted to IANA.  (This is an open list.  Requests to be added
1173	   should be sent to <ietf-languages-request@iana.org>.)

1175	   Variant subtags are generally registered for use with a particular
1176	   range of language tags.  For example, the subtag 'scouse' is intended
1177	   for use with language tags that start with the primary language
1178	   subtag "en", since Scouse is a dialect of English.  Thus the subtag
1179	   'scouse' could be included in tags such as "en-Latn-scouse" or
1180	   "en-GB-scouse".  This information is stored in the
1181	   "Recommended-Prefix" field in the registry.  Variant registration
1182	   requests are REQUIRED to include at least one "Recommended-Prefix"
1183	   field in the registration form.

1185	   Any subtag MAY be incorporated into a variety of language tags,
1186	   according to the rules of Section 2.1, including tags that do not
1187	   match any of the recommended prefixes of the registered subtag.
1188	   (Note that this is probably a poor choice.)  This makes validation
1189	   simpler and thus more uniform across implementations, and does not
1190	   require the registration of a separate subtag for the same purpose
1191	   and meaning but a different recommended prefix.
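
   As an illustration of how the "Recommended-Prefix" information might
   be consulted, the following Python sketch reports whether a tag
   begins with one of a variant's recommended prefixes.  The function
   name and the prefix table are hypothetical (the "sl" prefix for
   'nedis' is an assumption based on the description of that subtag as
   a Slovenian dialect).  As stated above, a tag that does not match is
   still valid, so the result is advisory only.

```python
# Hypothetical excerpt of registry data: variant subtag -> recommended prefixes.
RECOMMENDED_PREFIXES = {
    "scouse": ["en"],
    "nedis": ["sl"],  # assumed prefix, not taken from this document
}


def has_recommended_prefix(tag, variant, table=RECOMMENDED_PREFIXES):
    """Return True if `tag` starts with a recommended prefix of `variant`.

    Comparison is case-insensitive and respects subtag boundaries, so
    the prefix "en" matches "en-GB-scouse" but not "eng-scouse".
    """
    subtags = tag.lower().split("-")
    for prefix in table.get(variant.lower(), []):
        prefix_subtags = prefix.lower().split("-")
        if subtags[: len(prefix_subtags)] == prefix_subtags:
            return True
    return False
```

   For example, has_recommended_prefix("en-GB-scouse", "scouse") is
   true, while has_recommended_prefix("de-nedis", "nedis") is false.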

1193	   The recommended prefixes for a given registered subtag will be
1194	   maintained in the IANA registry as a guide to usage.  If it is
1195	   necessary to add an additional prefix to that list for an existing
1196	   language tag, that can be done by filing an additional registration
1197	   form.  In that form, the "Any other relevant information:" field
1198	   should indicate that it is the addition of an additional recommended
1199	   prefix.

1201	   Requests to add a recommended prefix to a subtag that imply a
1202	   different semantic meaning will probably be rejected.  For example, a
1203	   request to add the prefix "de" to the subtag 'nedis' so that the tag
1204	   "de-nedis" represented some German dialect would be rejected.  The
1205	   'nedis' subtag represents a particular Slovenian dialect and the
1206	   additional registration would change the semantic meaning assigned to
1207	   the subtag.  A separate subtag should be proposed instead.

1209	   The 'Description' field must contain a description of the tag being
1210	   registered written or transcribed into the Latin script; it may also
1211	   include a description in a non-Latin script.  Non-ASCII characters
1212	   must be escaped using the syntax described in Section 3.1.  The
1213	   'Description' field is used for identification purposes and should
1214	   not be taken to represent the actual native name of the language or
1215	   variation or to be in any particular language.

1217	   While the 'Description' field itself is not guaranteed to be stable
1218	   and errata corrections may be undertaken from time to time, attempts
1219	   to provide translations or transcriptions of entries in the registry
1220	   itself will probably be frowned upon by the community or rejected
1221	   outright, as changes of this nature may impact the provisions in
1222	   Section 3.3.

1224	   The Language Subtag Reviewer is responsible for responding to
1225	   requests for the registration of subtags through the registration
1226	   process and is appointed by the IESG.

1228	   When the two week period has passed the Language Subtag Reviewer
1229	   either forwards the record to be inserted or modified to
1230	   iana@iana.org according to the procedure described in Section 3.2, or
1231	   rejects the request because of significant objections raised on the
1232	   list or due to problems with constraints in this document (which
1233	   should be explicitly cited).  The reviewer may also extend the review
1234	   period in two week increments to permit further discussion.  The
1235	   reviewer must indicate on the list whether the registration has been
1236	   accepted, rejected, or extended following each two week period.
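
   The timing described in the preceding paragraph can be illustrated
   with a short Python sketch.  The function name and the use of
   calendar dates are assumptions made for illustration only; this
   document does not define how the two week period is measured.

```python
from datetime import date, timedelta

# Length of the initial review period and of each extension.
REVIEW_PERIOD = timedelta(weeks=2)


def review_deadline(submitted, extensions=0):
    """Return the date by which the reviewer announces a decision.

    `submitted` is the date the form was sent to the list and
    `extensions` is the number of two week extensions granted so far.
    """
    if extensions < 0:
        raise ValueError("extensions must be non-negative")
    return submitted + REVIEW_PERIOD * (1 + extensions)
```

   Under these assumptions a form sent on 2005-01-03 would be decided
   (or extended) on 2005-01-17, and after one extension on 2005-01-31.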

1238	   Note that the reviewer can raise objections on the list if he or she
1239	   so desires.  The important thing is that the objection must be made
1240	   publicly.

1242	   The applicant is free to modify a rejected application with
1243	   additional information and submit it again; this restarts the two
1244	   week comment period.

1246	   Decisions made by the reviewer may be appealed to the IESG [RFC 2028]
1247	   [9] under the same rules as other IETF decisions [RFC 2026] [8].

1249	   All approved registration forms are available online in the directory
1250	   http://www.iana.org/numbers.html under "languages".

1252	   Updates or changes to existing records, including previous
1253	   registrations, follow the same procedure as new registrations.  The
1254	   Language Subtag Reviewer decides whether there is consensus to update
1255	   the registration following the two week review period; normally
1256	   objections by the original registrant will carry extra weight in
1257	   forming such a consensus.

1259	   Registrations are permanent and stable.  Once registered, subtags
1260	   will not be removed from the registry and will remain the canonical
1261	   method of referring to a specific language or variant.  This
1262	   provision does not apply to grandfathered tags, which may become
1263	   deprecated due to registration of subtags.  For example, the tag
1264	   "i-navajo" is deprecated in favor of the tag "nv", which consists of
1265	   the single primary language subtag 'nv'.

1267	   Note: The "published description" requested in the registration
1268	   form is intended as an aid to people trying to verify whether a
1269	   language is registered or what language or language variation a
1270	   particular subtag refers to.  In most cases, reference to an
1271	   authoritative grammar or dictionary of that language will be useful;
1272	   in cases where no such work exists, other well known works describing
1273	   that language or in that language may be appropriate.  The subtag
1274	   reviewer decides what constitutes "good enough" reference material.
1275	   This requirement is not intended to exclude particular languages or
1276	   dialects due to the size of the speaker population or lack of a
1277	   standardized orthography.  Minority languages will be considered
1278	   equally on their own merits.

1280	3.5  Possibilities for Registration

1282	   Possibilities for registration of subtags or information about
1283	   subtags include:

1285	   o  Primary language subtags for languages not listed in ISO 639 that
1286	      are not variants of any listed or registered language can be
1287	      registered.  At the time this document was created there were no
1288	      examples of this form of subtag.  Before attempting to register a
1289	      language subtag, there MUST be an attempt to register the language
1290	      with ISO 639.  No language subtags will be registered for codes
1291	      that exist in ISO 639-1 or ISO 639-2, which are under
1292	      consideration by the ISO 639 maintenance or registration
1293	      authorities, or which have never been attempted for registration
1294	      with those authorities.  If ISO 639 has previously rejected a
1295	      language for registration, it is reasonable to assume that there
1296	      MUST be additional very compelling evidence of need before it will
1297	      be registered in the IANA registry (to the extent that it is very
1298	      unlikely that any subtags will be registered of this type).

1300	   o  Dialect or other divisions or variations within a language, its
1301	      orthography, writing system, regional or historical usage,
1302	      transliteration or other transformation, or distinguishing
1303	      variation may be registered as variant subtags.  An example is the
1304	      'scouse' subtag (the Scouse dialect of English).

1306	   o  The addition or maintenance of fields (generally of an
1307	      informational nature) in Tag or Subtag records as described in
1308	      Section 3.1 and subject to the stability provisions in
1309	      Section 3.3.  This includes descriptions, recommended prefixes,
1310	      comments, deprecation of obsolete items, or the addition of script
1311	      or extlang information to primary language subtags.

1313	   This document leaves the decision on what subtags or changes to
1314	   subtags are appropriate (or not) to the registration process
1315	   described in Section 3.4.

1317	   Note: four character primary language subtags are reserved to allow
1318	   for the possibility of alpha4 codes in some future addition to the
1319	   ISO 639 family of standards.

1321	   ISO 639 defines a maintenance agency for additions to and changes in
1322	   the list of languages in ISO 639.  This agency is:

1324	   International Information Centre for Terminology (Infoterm)
1325	   Aichholzgasse 6/12, AT-1120
1326	   Wien, Austria
1327	   Phone: +43 1 26 75 35 Ext. 312 Fax: +43 1 216 32 72

1329	   ISO 639-2 defines a maintenance agency for additions to and changes
1330	   in the list of languages in ISO 639-2.  This agency is:

1332	   Library of Congress
1333	   Network Development and MARC Standards Office
1334	   Washington, D.C. 20540 USA
1335	   Phone: +1 202 707 6237  Fax: +1 202 707 0115
1336	   URL: http://www.loc.gov/standards/iso639

1338	   The maintenance agency for ISO 3166 (country codes) is:

1340	   ISO 3166 Maintenance Agency
1341	   c/o International Organization for Standardization
1342	   Case postale 56
1343	   CH-1211 Geneva 20 Switzerland
1344	   Phone: +41 22 749 72 33  Fax: +41 22 749 73 49
1345	   URL: http://www.iso.org/iso/en/prods-services/iso3166ma/index.html

1347	   The registration authority for ISO 15924 (script codes) is:

1349	   Unicode Consortium Box 391476
1350	   Mountain View, CA 94039-1476, USA
1351	   URL: http://www.unicode.org/iso15924

1353	   The Statistics Division of the United Nations Secretariat maintains
1354	   the Standard Country or Area Codes for Statistical Use and can be
1355	   reached at:

1357	   Statistical Services Branch
1358	   Statistics Division
1359	   United Nations, Room DC2-1620
1360	   New York, NY 10017, USA

1362	   Fax: +1-212-963-0623
1363	   E-mail: statistics@un.org
1364	   URL: http://unstats.un.org/unsd/methods/m49/m49alpha.htm

1366	3.6  Extensions and Extensions Namespace

1368	   Extension subtags are those introduced by single-letter subtags other
1369	   than 'x'.  They are reserved for the generation of identifiers which
1370	   contain a language component, and are compatible with applications
1371	   that understand language tags.  For example, they might be used to
1372	   define locale identifiers, which are generally based on language.

1374	   The structure and form of extensions are defined by this document so
1375	   that implementations can be created that are forward compatible with
1376	   applications that may be created using single-letter subtags in the
1377	   future.  In addition, defining a mechanism for maintaining single-
1378	   letter subtags will lend to the stability of this document by
1379	   reducing the likely need for future revisions or updates.

1381	   Allocation of a single-letter subtag shall take the form of an RFC
1382	   defining the name, purpose, processes, and procedures for maintaining
1383	   the subtags.  The maintaining or registering authority, including
1384	   name, contact email, discussion list email, and URL location of the
1385	   registry must be indicated clearly in the RFC.  The RFC MUST specify
1386	   or include each of the following:

1388	   o  The specification MUST reference the specific version or revision
1389	      of this document that governs its creation and MUST reference this
1390	      section of this document.

1392	   o  The specification and all subtags defined by the specification
1393	      MUST follow the ABNF and other rules for the formation of tags and
1394	      subtags as defined in this document.  In particular it MUST
1395	      specify that case is not significant and that subtags MUST NOT
1396	      exceed eight characters in length.

1398	   o  The specification MUST specify a canonical representation.

1400	   o  The specification of valid subtags MUST be available over the
1401	      Internet and at no cost.

1403	   o  The specification MUST be in the public domain or available via a
1404	      royalty-free license acceptable to the IETF and specified in the
1405	      RFC.

1407	   o  The specification MUST be versioned and each version of the
1408	      specification MUST be numbered, dated, and stable.

1410	   o  The specification MUST be stable.  That is, extension subtags,
1411	      once defined by a specification, MUST NOT be retracted or change
1412	      in meaning in any substantial way.

1414	   o  The specification MUST include in a separate section the
1415	      registration form reproduced in this section (below) to be used in
1416	      registering the extension upon publication as an RFC.

1418	   o  IANA MUST be informed of changes to the contact information and
1419	      URL for the specification.

1421	   o  Modified the latin-script requirement on the 'Description' field
1422	      so that "at least one Description field" must contain a Latin
1423	      transcription.  (A.Phillips)

1425	   IANA will maintain a registry of allocated single-letter (singleton)
1426	   subtags.  This registry will use the record-jar format described by
1427	   the ABNF in Section 3.1.  Upon publication of an extension as an RFC,
1428	   the maintaining authority defined in the RFC must forward this
1429	   registration form to iesg@ietf.org, who will forward the request to
1430	   iana@iana.org.  The maintaining authority of the extension MUST
1431	   maintain the accuracy of the record by sending an updated full copy
1432	   of the record to iana@iana.org with the subject line "LANGUAGE TAG
1433	   EXTENSION UPDATE" whenever content changes.  Only the 'Comments',
1434	   'Contact_Email', 'Mailing_List', and 'URL' fields may be modified in
1435	   these updates.

1437	   Failure to maintain this record, the corresponding registry, or meet
1438	   other conditions imposed by this section of this document may be
1439	   appealed to the IESG [RFC 2028] [9] under the same rules as other
1440	   IETF decisions (see [8]) and may result in the authority to maintain
1441	   the extension being withdrawn or reassigned by the IESG.
1442	   %%
1443	   Identifier:
1444	   Description:
1445	   Comments:
1446	   Added:
1447	   RFC:
1448	   Authority:
1449	   Contact_Email:
1450	   Mailing_List:
1451	   URL:
1452	   %%

1454	    Figure 6: Format of Records in the Language Tag Extensions Registry

1456	   'Identifier' contains the single letter subtag (singleton) assigned
1457	   to the extension.  The Internet-Draft submitted to define the
1458	   extension should specify which letter to use, although the IESG may
1459	   change the assignment when approving the RFC.

1461	   'Description' contains the name and description of the extension.

1463	   'Comments' is an optional field and may contain a broader description
1464	   of the extension.

1466	   'Added' contains the date the RFC was published in the "full-date"
1467	   format specified in RFC 3339 [14].  For example: 2004-06-28
1468	   represents June 28, 2004, in the Gregorian calendar.

1470	   'RFC' contains the RFC number assigned to the extension.

1472	   'Authority' contains the name of the maintaining authority for the
1473	   extension.

1475	   'Contact_Email' contains the email address used to contact the
1476	   maintaining authority.

1478	   'Mailing_List' contains the URL or subscription email address of the
1479	   mailing list used by the maintaining authority.

1481	   'URL' contains the URL of the registry for this extension.
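
   The record format in Figure 6 and the field rules above can be
   sketched in Python as follows.  This is a simplified illustration,
   not a complete record-jar parser: it assumes one field per line (no
   folded continuation lines), and all function names are hypothetical.

```python
import re
from datetime import date

# Fields that the maintaining authority may change in an update.
MODIFIABLE = {"Comments", "Contact_Email", "Mailing_List", "URL"}


def parse_record(text):
    """Parse 'Field: value' lines into a dict, skipping '%%' separators."""
    record = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line == "%%":
            continue
        # Split on the first colon only, so URLs keep their own colons.
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError("not a field line: %r" % line)
        record[name.strip()] = value.strip()
    return record


def is_full_date(value):
    """Return True if `value` is a valid YYYY-MM-DD calendar date."""
    match = re.fullmatch(r"([0-9]{4})-([0-9]{2})-([0-9]{2})", value)
    if not match:
        return False
    try:
        date(*map(int, match.groups()))
    except ValueError:
        return False
    return True


def illegal_changes(old, new):
    """List the fields that differ but are not permitted to change."""
    names = set(old) | set(new)
    return sorted(
        n for n in names if old.get(n) != new.get(n) and n not in MODIFIABLE
    )
```

   An update that alters only 'URL' yields an empty list from
   illegal_changes(), while one that also alters 'Added' is reported.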

1483	   The determination of whether an Internet-Draft meets the above
1484	   conditions and the decision to grant or withhold such authority rests
1485	   solely with the IESG, and is subject to the normal review and appeals
1486	   process associated with the RFC process.

1488	   Extension authors are strongly cautioned that many (including most
1489	   well-formed) processors will be unaware of any special relationships
1490	   or meaning inherent in the order of extension subtags.  Extension
1491	   authors SHOULD avoid subtag relationships or canonicalization
1492	   mechanisms that interfere with matching or with length restrictions
1493	   that may exist in common protocols where the extension is used.  In
1494	   particular, applications may truncate the subtags in doing matching
1495	   or in fitting into limited lengths, so it is RECOMMENDED that the
1496	   most significant information be in the most significant (left-most)
1497	   subtags, and that the specification gracefully handle truncated
1498	   subtags.
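
   The kind of truncation that extension authors should anticipate
   might look like the following Python sketch.  The function name and
   the rule of dropping a dangling single-letter subtag are assumptions
   for illustration; actual processors may truncate differently.

```python
def truncate_tag(tag, max_length):
    """Drop right-most subtags until `tag` fits in `max_length` characters.

    Returns the empty string if even the first subtag does not fit.
    """
    subtags = tag.split("-")
    while subtags and len("-".join(subtags)) > max_length:
        subtags.pop()
    # Assumed refinement: a trailing single-letter subtag only introduces
    # a following sequence, so it is dropped when nothing follows it.
    while len(subtags) > 1 and len(subtags[-1]) == 1:
        subtags.pop()
    return "-".join(subtags)
```

   With a hypothetical extension singleton 'r', truncating
   "en-US-r-extended-sequence" to twelve characters leaves "en-US",
   which is why the most significant information belongs in the
   left-most subtags.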

1500	   When a language tag is to be used in a specific, known, protocol, it
1501	   is RECOMMENDED that the language tag not contain extensions not
1502	   supported by that protocol.  In addition, it should be noted that
1503	   some protocols may impose upper limits on the length of the strings
1504	   used to store or transport the language tag.

1506	3.7  Conversion of the RFC 3066 Language Tag Registry

1508	   Upon publication of this document as a BCP, the existing IANA
1509	   language tag registry must be converted into the new subtag registry.
1510	   This section defines the process for performing this conversion.

1512	   The impact on the IANA maintainers of the registry of this conversion
1513	   will be a small increase in the frequency of new entries.  The
1514	   initial set of records represents no impact on IANA, since the work
1515	   to create it will be performed externally (as defined in this
1516	   section).  Future work will be limited to inserting or replacing
1517	   whole records preformatted for IANA by the Language Subtag Reviewer.

1519	   When this document is published, an email will be sent by the
1520	   chair(s) of the LTRU working group to the LTRU and ietf-languages
1521	   mail lists advising of the impending conversion of the registry.  In
1522	   that notice, the chair(s) will provide a URL whose referred content
1523	   is the proposed IANA Language Subtag Registry following conversion.
1524	   There will be a Last Call period of not less than four weeks for
1525	   comments and corrections to be discussed on the
1526	   ietf-languages@iana.org mail list.  Changes as a result of comments
1527	   will not restart the Last Call period.  At the end of the period, the
1528	   chair(s) will forward the URL to IANA, which will post the new
1529	   registry on-line.

1531	   Tags that are currently deprecated will be maintained as
1532	   grandfathered entries.  The record for the grandfathered entry will
1533	   contain a 'Deprecated' field with the most appropriate date that can
1534	   be determined for when the record was deprecated.  The 'Comments'
1535	   field will contain the reason for the deprecation.  The 'Canonical'
1536	   field will contain the tag that replaces the value.  For example, the
1537	   tag "art-lojban" is deprecated and will be placed in the
1538	   grandfathered section.  Its 'Deprecated' field will contain the
1539	   deprecation date and 'Canonical' field the value "jbo".

1541	   Tags that are not deprecated that consist entirely of subtags that
1542	   are valid under this document and which have the correct form and
1543	   format for tags defined by this document are superseded by this
1544	   document.  Such tags are placed in records of type 'redundant' in the
1545	   registry.  For example, "zh-Hant" is now defined by this document.

1547	   Tags that are not deprecated and which contain subtags which are
1548	   consistent with registration under the guidelines in this document
1549	   will have a new subtag registration created for each eligible subtag.
1550	   If all of the subtags in the original tag are fully defined by the
1551	   resulting registrations or by this document, then the original tag is
1552	   superseded by this document.  Such tags are placed in the 'redundant'
1553	   section of the registry.  For example, "en-boont" will result in a
1554	   new subtag 'boont' and the RFC 3066 registered tag "en-boont" placed
1555	   in the redundant section of the registry.

1557	   Tags that contain one or more subtags that do not match the valid
1558	   registration pattern and which are not otherwise defined by this
1559	   document will have records of type 'grandfathered' created in the
1560	   registry.
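
   The four cases above can be summarized in a rough Python sketch.
   The function name and its three input sets are hypothetical, subtag
   position is ignored, and the sketch assumes that no new subtags are
   created when the tag as a whole ends up grandfathered; the actual
   conversion is performed by hand as described in this section.

```python
def classify_rfc3066_tag(tag, deprecated, valid_subtags, registrable):
    """Return (record_type, new_subtags) for one RFC 3066 registered tag.

    deprecated    -- set of tags already deprecated in the old registry
    valid_subtags -- lowercase subtags already defined by this document
    registrable   -- lowercase subtags eligible for a new registration
    """
    if tag in deprecated:
        # Deprecated tags are kept as grandfathered entries.
        return "grandfathered", []
    subtags = tag.lower().split("-")
    new = [s for s in subtags if s not in valid_subtags and s in registrable]
    if all(s in valid_subtags or s in registrable for s in subtags):
        # Every subtag is (or becomes) defined, so the tag is superseded.
        return "redundant", new
    # At least one subtag does not fit the registration pattern.
    return "grandfathered", []
```

   Given suitable input sets, "zh-Hant" is classified as redundant with
   no new subtags, "en-boont" as redundant with the new subtag 'boont',
   and "art-lojban" as grandfathered.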

1562	   There will be a reasonable period in which the community may comment
1563	   on the proposed list entries, which SHALL be no less than four weeks
1564	   in length.  At the completion of this period, the chair(s) will
1565	   notify iana@iana.org and the ltru and ietf-languages mail lists that
1566	   the task is complete and forward the necessary materials to IANA for
1567	   publication.

1569	   Registrations that are in process under the rules defined in RFC 3066
1570	   MAY be completed under the former rules, at the discretion of the
1571	   language tag reviewer.  Any new registrations submitted after the
1572	   request for conversion of the registry MUST be rejected.

1574	   All existing RFC 3066 language tag registrations will be maintained
1575	   in perpetuity.

1577	   Users of tags that are grandfathered should consider registering
1578	   appropriate subtags in the IANA subtag registry (but are not required
1579	   to).

1581	   Where two subtags have the same meaning, the priority of which to
1582	   make canonical SHALL be the following:

1584	   o  As of the date of acceptance of this document as a BCP, if a code
1585	      exists in the associated ISO standard and it is not deprecated or
1586	      withdrawn as of that date, then it has priority.

1588	   o  Otherwise, the earlier-registered tag in the associated ISO
1589	      standard has priority.

1591	   UN numeric codes assigned to 'macro-geographical (continental)' or
1592	   sub-regions not associated with an assigned ISO 3166 alpha-2 code are
1593	   defined in the IANA registry and are valid for use in language tags.
1594	   These codes MUST be added to the initial version of the registry.
1595	   The UN numeric codes for 'economic groupings' or 'other groupings',
1596	   and the alphanumeric codes in Appendix X of the UN document MUST NOT
1597	   be added to the registry.

1599	   When creating records for ISO 639, ISO 15924, ISO 3166, and UN M.49
1600	   codes, the following criteria SHALL be applied to the inclusion,
1601	   canonical mapping, and deprecation of codes:

1603	   For each standard, the date of the standard referenced in RFC 1766 is
1604	   selected as the starting date.  Codes that were valid on that date in
1605	   the selected standard are added to the registry.  Codes that were
1606	   previously assigned but were vacated or withdrawn before that date
1607	   are not added to the registry.  For each successive change to the
1608	   standard, any additional assignments are added to the registry.
1609	   Values that are withdrawn are marked as deprecated, but not removed.
1610	   Changes in meaning or assignment of a subtag are permitted during
1611	   this process (cf. 'CS').  This continues up to the date that this
1612	   document was adopted.  The resulting set of records is added to the
1613	   registry.  Future changes or additions to this portion of the
1614	   registry are governed by the provisions of this document.
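
   The procedure in the preceding paragraph amounts to replaying the
   history of each standard.  A minimal Python sketch follows, assuming
   the history is available as dated snapshots of the set of assigned
   codes; the function name, the snapshot representation, and the
   treatment of a reassigned code (its deprecation is cleared) are
   assumptions, not part of this document.

```python
def build_records(snapshots):
    """Replay dated snapshots of a code list, oldest first.

    `snapshots` is a list of (date_string, set_of_codes) pairs.  The
    first snapshot is the starting date; codes withdrawn before it never
    appear and so are never added.  Returns a dict mapping each code to
    {"Added": date_string, "Deprecated": date_string or None}.
    """
    records = {}
    previous = set()
    for when, codes in snapshots:
        for code in sorted(codes - previous):
            if code in records:
                # Code assigned again after withdrawal (assumed handling).
                records[code]["Deprecated"] = None
            else:
                records[code] = {"Added": when, "Deprecated": None}
        for code in sorted(previous - codes):
            # Withdrawn values are marked as deprecated, never removed.
            records[code]["Deprecated"] = when
        previous = set(codes)
    return records
```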

1616	4.  Formation and Processing of Language Tags

1618	   This section addresses how to use the registry with the language tag
1619	   format to choose, form and process language tags.

1621	4.1  Choice of Language Tag

1623	   One may occasionally be faced with several possible tags for the same
1624	   body of text.

1626	   Interoperability is best served when all users use the same language
1627	   tag in order to represent the same language.  If an application has
1628	   requirements that make the rules here inapplicable, then that
1629	   application risks damaging interoperability.  It is strongly
1630	   RECOMMENDED that users not define their own rules for language tag
1631	   choice.

1633	   Of particular note, many applications can benefit from the use of
1634	   script subtags in language tags, as long as the use is consistent for
1635	   a given context.  Script subtags were not formally defined in RFC
1636	   3066 and their use may affect matching and subtag identification by
1637	   implementations of RFC 3066, as these subtags appear between the
1638	   primary language and region subtags.  For example, if a user requests
1639	   content in an implementation of Section 2.5 of RFC 3066 [23] using
1640	   the language range "en-US", content labeled "en-Latn-US" will not
1641	   match the request.  Therefore it is important to know when script
1642	   subtags will customarily be used and when they should not be used.
1643	   In the registry, the Suppress-Script field helps ensure greater
1644	   compatibility between the language tags generated according to the
1645	   rules in this document and language tags and tag processors or
1646	   consumers based on RFC 3066 by defining when users should generally
1647	   not include a script subtag with a particular primary language
1648	   subtag.
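
   The matching behavior mentioned above can be reproduced with a small
   Python sketch of RFC 3066 style prefix matching.  The function name
   is hypothetical, and the sketch covers only the basic rule that a
   range matches a tag that equals it or that begins with it followed
   by "-".

```python
def rfc3066_matches(language_range, tag):
    """Return True if `language_range` matches `tag` by prefix matching.

    Comparison is case-insensitive; the special range "*" matches any tag.
    """
    language_range = language_range.lower()
    tag = tag.lower()
    if language_range == "*":
        return True
    return tag == language_range or tag.startswith(language_range + "-")
```

   Here rfc3066_matches("en-US", "en-Latn-US") is false, because the
   script subtag sits between 'en' and 'US', whereas
   rfc3066_matches("en", "en-Latn-US") is true.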

1650	   Extended language subtags (type 'extlang' in the registry, see
1651	   Section 3.1) also appear between the primary language and region
1652	   subtags and are reserved for future standardization.  Applications
1653	   may benefit from their judicious use in forming language tags in the
1654	   future and similar recommendations are expected to apply to their use
1655	   as apply to script subtags.

1657	   Standards, protocols and applications that reference this document
1658	   normatively but apply different rules to the ones given in this
1659	   section MUST specify how the procedure varies from the one given
1660	   here.

1662	   The choice of subtags used to form a language tag should be guided by
1663	   the following rules:

1665	   1.  Use as precise a tag as possible, but no more specific than is
1666	       justified.  Avoid using subtags that are not important for
1667	       distinguishing content in an application.

1669	       *  For example, 'de' might suffice for tagging an email written
1670	          in German, while "de-CH-1996" is probably unnecessarily
1671	          precise for such a task.

1673	   2.  The script subtag SHOULD NOT be used to form language tags unless
1674	       the script adds some distinguishing information to the tag.  The
1675	       field 'Suppress-Script' in the primary language record in the
1676	       registry indicates which script subtags do not add distinguishing
1677	       information for most applications.

1679	       *  For example, the subtag 'Latn' should not be used with the
1680	          primary language 'en' because nearly all English documents are
1681	          written in the Latin script and it adds no distinguishing
1682	          information.  However, if a document were written in English
1683	          mixing Latin script with another script such as Braille
1684	          ('Brai'), then it may be appropriate to choose to indicate
1685	          both scripts to aid in content selection, such as the
1686	          application of a stylesheet.

1688	   3.  If a subtag has a 'Canonical' field in its registry entry, the
1689	       canonical subtag SHOULD be used to form the language tag in
1690	       preference to any of its aliases.

1692	       *  For example, use 'he' for Hebrew in preference to 'iw'.

1694	   4.  The 'und' (Undetermined) primary language subtag SHOULD NOT be
1695	       used to label content, even if the language is unknown.  Omitting
1696	       the language tag altogether is preferred to using a tag with a
1697	       primary language subtag of 'und'.  The 'und' subtag may be useful
1698	       for protocols that require a language tag to be provided.  The
1699	       'und' subtag may also be useful when matching language tags in
1700	       certain situations.

1702	   5.  The 'mul' (Multiple) primary language subtag SHOULD NOT be used
1703	       whenever the protocol allows separate tags for multiple
1704	       languages, as is the case for the Content-Language header in
1705	       HTTP.  The 'mul' subtag conveys little useful information:
1706	       content in multiple languages should individually tag the
1707	       languages where they appear or otherwise indicate the actual
1708	       language in preference to the 'mul' subtag.

1710	   6.  The same variant subtag SHOULD NOT be used more than once within
1711	       a language tag.

1713	       *  For example, do not use "en-GB-scouse-scouse".
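
   Rules 2 and 3 lend themselves to a mechanical check.  The Python
   sketch below applies them to a tag using two hypothetical registry
   excerpts built from the examples above.  It assumes that the script
   subtag, if present, immediately follows the primary language subtag
   (that is, no extended language subtags are present) and it does not
   handle grandfathered or private-use tags.

```python
# Hypothetical excerpts of registry data, keyed by lowercase subtag.
SUPPRESS_SCRIPT = {"en": "Latn"}
CANONICAL = {"iw": "he"}


def apply_choice_rules(tag):
    """Apply the 'Canonical' and 'Suppress-Script' rules to `tag`."""
    subtags = tag.split("-")
    primary = subtags[0].lower()
    # Rule 3: prefer the canonical primary language subtag.
    primary = CANONICAL.get(primary, primary)
    rest = subtags[1:]
    # Rule 2: drop a script subtag that adds no distinguishing information.
    suppressed = SUPPRESS_SCRIPT.get(primary)
    if rest and suppressed and rest[0].lower() == suppressed.lower():
        rest = rest[1:]
    return "-".join([primary] + rest)
```

   With these tables, "en-Latn-US" becomes "en-US" and "iw" becomes
   "he", while "en-Brai" is left unchanged.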

1715	   To ensure consistent backward compatibility, this document contains
1716	   several provisions to account for potential instability in the
1717	   standards used to define the subtags that make up language tags.
1718	   These provisions mean that no language tag created under the rules in
1719	   this document will become obsolete.  In addition, tags that are in
1720	   canonical form will always be in canonical form.

1722	4.2  Meaning of the Language Tag

1724	   The language tag always defines a language as spoken (or written,
1725	   signed or otherwise signaled) by human beings for communication of
1726	   information to other human beings.  Computer languages such as
1727	   programming languages are explicitly excluded.

1729	   If a language tag B contains language tag A as a prefix, then B is
1730	   typically "narrower" or "more specific" than A.  For example,
1731	   "zh-Hant-TW" is more specific than "zh-Hant".

1733	   This relationship is not guaranteed in all cases: specifically,
1734	   languages that begin with the same sequence of subtags are NOT
1735	   guaranteed to be mutually intelligible, although they may be.  For
1736	   example, the tag "az" shares a prefix with both "az-Latn"
1737	   (Azerbaijani written using the Latin script) and "az-Cyrl"
1738	   (Azerbaijani written using the Cyrillic script).  A person fluent in
1739	   one script may not be able to read the other, even though the text
1740	   might be identical.  Content tagged as "az" most probably is written
1741	   in just one script and thus might not be intelligible to a reader
1742	   familiar with the other script.
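
The prefix relationship described above can be sketched as follows. This is a minimal illustration that assumes the prefix is compared subtag by subtag and case-insensitively; the function name is hypothetical. As the text notes, a positive result says nothing about mutual intelligibility.

```python
def has_tag_prefix(tag, prefix):
    """Return True if 'prefix' matches the leading subtags of 'tag'.

    Comparison is by whole subtags, so "zh-Hant" is a prefix of
    "zh-Hant-TW" but not of "zh-Hantx".
    """
    tag_parts = tag.lower().split("-")
    prefix_parts = prefix.lower().split("-")
    if len(prefix_parts) > len(tag_parts):
        return False
    return tag_parts[:len(prefix_parts)] == prefix_parts
```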

1744	   The relationship between the tag and the information it relates to is
1745	   defined by the standard describing the context in which it appears.
1746	   Accordingly, this section can only give possible examples of its
1747	   usage.

1749	   o  For a single information object, the associated language tags
1750	      might be interpreted as the set of languages that is required for
1751	      a complete comprehension of the complete object.  Example: Plain
1752	      text documents.

1754	   o  For an aggregation of information objects, the associated language
1755	      tags could be taken as the set of languages used inside components
1756	      of that aggregation.  Examples: Document stores and libraries.

1758	   o  For information objects whose purpose is to provide alternatives,
1759	      the associated language tags could be regarded as a hint that the
1760	      content is provided in several languages, and that one has to
1761	      inspect each of the alternatives in order to find its language or
1762	      languages.  In this case, the presence of multiple tags might not
1763	      mean that one needs to be multi-lingual to get complete
1764	      understanding of the document.  Example: MIME multipart/
1765	      alternative.

1767	   o  In markup languages, such as HTML and XML, language information
1768	      can be added to each part of the document identified by the markup
1769	      structure (including the whole document itself).  For example, one
1770	      could write <span lang="fr">C'est la vie.</span> inside a
1771	      Norwegian document; the Norwegian-speaking user could then access
1772	      a French-Norwegian dictionary to find out what the marked section
1773	      meant.  If the user were listening to that document through a
1774	      speech synthesis interface, this information could be used to signal
1775	      the synthesizer to appropriately apply French text-to-speech
1776	      pronunciation rules to that span of text, instead of applying the
1777	      inappropriate Norwegian rules.

1779	4.3  Canonicalization of Language Tags

1781	   Since a particular language tag may be used in many processes,
1782	   language tags SHOULD always be created or generated in a canonical
1783	   form.

1785	   A language tag is in canonical form when:

1787	   1.  The tag is well-formed according to the rules in Section 2.1 and
1788	       Section 2.2.

1790	   2.  None of the subtags in the language tag has a Canonical-Value
1791	       mapping in the IANA registry (see Section 3.1).  Subtags with a
1792	       Canonical-Value mapping MUST be replaced with their mapping in
1793	       order to canonicalize the tag.

1795	   3.  If more than one extension subtag sequence exists, the extension
1796	       sequences are ordered into case-insensitive ASCII order by
1797	       singleton subtag.

1799	   Example: The language tag "en-A-aaa-B-ccc-bbb-x-xyz" is in canonical
1800	   form, while "en-B-ccc-bbb-A-aaa-X-xyz" is well-formed but not in
1801	   canonical form.
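
As a non-normative sketch of rule 3, the following reorders extension sequences by singleton. It assumes that a single-character subtag after the first position starts an extension sequence, and that the 'x' singleton starts a private-use sequence that stays last and is left untouched. The function name is hypothetical.

```python
def order_extensions(tag):
    """Reorder extension sequences into case-insensitive singleton order."""
    subtags = tag.split("-")
    head = []       # subtags before the first singleton
    sequences = []  # extension sequences, each led by its singleton
    private = []    # the 'x' private-use sequence, if present
    for i, subtag in enumerate(subtags):
        if len(subtag) == 1 and i > 0:
            if subtag.lower() == "x":
                private = subtags[i:]
                break
            sequences.append([subtag])
        elif sequences:
            sequences[-1].append(subtag)
        else:
            head.append(subtag)
    # Sort by singleton only; subtags inside a sequence keep their order.
    sequences.sort(key=lambda seq: seq[0].lower())
    ordered = head + [s for seq in sequences for s in seq] + private
    return "-".join(ordered)
```

The order of subtags within each extension is preserved, because that order is defined by the extension itself rather than by this rule.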

1803	   Example: The language tag "en-NH" (English as used in the New
1804	   Hebrides) is not canonical because the 'NH' subtag has a canonical
1805	   mapping to 'VU' (Vanuatu).

1807	   Canonicalization of language tags does not imply anything about the
1808	   use of upper or lowercase letters when processing or comparing
1809	   subtags (as described in Section 2.1).  All comparisons MUST be
1810	   performed in a case-insensitive manner.

1812	   When performing canonicalization of language tags, processors MAY
1813	   regularize the case of the subtags, following the case
1814	   used in the registry.  Note that this corresponds to the following
1815	   casing rules: uppercase all non-initial two-letter subtags; titlecase
1816	   all non-initial four-letter subtags; lowercase everything else.
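
The casing rules above can be sketched as follows. This is illustrative only and applies the three rules literally to every subtag by position and length; the function name is hypothetical.

```python
def regularize_case(tag):
    """Apply the registry casing conventions to a language tag."""
    result = []
    for index, subtag in enumerate(tag.split("-")):
        if index > 0 and len(subtag) == 2:
            result.append(subtag.upper())       # e.g. region 'tw' -> 'TW'
        elif index > 0 and len(subtag) == 4:
            result.append(subtag.capitalize())  # e.g. script 'hant' -> 'Hant'
        else:
            result.append(subtag.lower())
    return "-".join(result)
```

Because comparisons are case-insensitive, the output is equivalent to the input; only its presentation changes.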

1818	   Note: Case folding of ASCII letters in certain locales, unless
1819	   carefully handled, may produce non-ASCII character values.  The
1820	   Unicode Character Database file "SpecialCasing.txt" defines the
1821	   specific cases that are known to cause problems with this.  In
1822	   particular, the letter 'i' (U+0069) in Turkish and Azerbaijani is
1823	   uppercased to U+0130 (LATIN CAPITAL LETTER I WITH DOT ABOVE).
1824	   Implementers should specify a locale-neutral casing operation to
1825	   ensure that case folding of subtags does not produce this value,
1826	   which is illegal in language tags.  For example, if one were to
1827	   uppercase the region subtag 'in' using Turkish locale rules, the
1828	   sequence U+0130 U+004E would result instead of the expected 'IN'.
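
One way to make the locale-neutral intent explicit is to map only the letters a-z and to reject anything outside ASCII. The following is a sketch under that assumption; the function name is hypothetical.

```python
import string

# Translation table that touches only the 26 ASCII lowercase letters.
_ASCII_UPPER = str.maketrans(string.ascii_lowercase, string.ascii_uppercase)


def ascii_upper(subtag):
    """Uppercase a subtag using ASCII rules only."""
    if not subtag.isascii():
        raise ValueError("subtags may contain only ASCII characters")
    return subtag.translate(_ASCII_UPPER)
```

With this approach the region subtag 'in' always becomes 'IN', regardless of the locale in which the processor runs.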

1830	   Note: if the field 'Deprecated' appears in a registry record without
1831	   an accompanying 'Canonical' field, then that tag or subtag is
1832	   deprecated without a replacement.  Validating processors SHOULD NOT
1833	   generate tags that include these values, although the values are
1834	   canonical when they appear in a language tag.
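
The distinction drawn in this note can be sketched as follows. The sketch assumes a registry record is held as a dictionary keyed by field name, which is an illustrative representation only; the function name and the returned labels are hypothetical.

```python
def deprecation_status(record):
    """Classify a registry record by its 'Deprecated' and 'Canonical' fields.

    Returns a (status, replacement) pair, where replacement is None
    unless the record names a canonical value.
    """
    if "Deprecated" not in record:
        return ("current", None)
    if "Canonical" in record:
        return ("replaced", record["Canonical"])
    # Deprecated with no replacement: should not be generated, although
    # it is still canonical when it appears in a tag.
    return ("deprecated-without-replacement", None)
```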

1836	   An extension MUST define any relationships that may exist between the
1837	   various subtags in the extension and thus MAY define an alternate
1838	   canonicalization scheme for the extension's subtags.  Extensions MAY
1839	   define how the order of the extension's subtags is interpreted.  For
1840	   example, an extension could define that its subtags are in canonical
1841	   order when the subtags are placed into ASCII order: that is, "en-a-
1842	   aaa-bbb-ccc" instead of "en-a-ccc-bbb-aaa".  Another extension might
1843	   define that the order of the subtags influences their semantic
1844	   meaning (so that "en-b-ccc-bbb-aaa" has a different value from "en-b-
1845	   aaa-bbb-ccc").  However, extension specifications SHOULD be designed
1846	   so that they are tolerant of the typical processes described in
1847	   Section 3.6.

1849	4.4  Considerations for Private Use Subtags

1851	   Private-use subtags require private agreement between the parties
1852	   that intend to use or exchange language tags that use them, and great
1853	   caution should be used in employing them in content or protocols
1854	   intended for general use.  Private-use subtags are simply useless for
1855	   information exchange without prior arrangement.

1857	   The value and semantic meaning of private-use tags and of the subtags
1858	   used within such a language tag are not defined by this document.

1860	   The use of subtags defined in the IANA registry as having a specific
1861	   private use meaning conveys more information than a purely private use
1862	   tag prefixed by the singleton subtag 'x'.  For applications this
1863	   additional information may be useful.

1865	   For example, the region subtags 'AA', 'ZZ', and those in the ranges
1866	   'QM'-'QZ' and 'XA'-'XZ' (derived from ISO 3166 private use codes) may
1867	   be used to form a language tag.  A tag such as "zh-Hans-XQ" conveys a
1868	   great deal of public, interchangeable information about the language
1869	   material (that it is Chinese in the simplified Chinese script and is
1870	   suitable for some geographic region 'XQ').  While the precise
1871	   geographic region is not known outside of private agreement, the tag
1872	   conveys far more information than an opaque tag such as "x-someLang",
1873	   which contains no information about the language subtag or script
1874	   subtag outside of the private agreement.
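
A test for the private-use region subtags listed above might look like the following sketch. The function name is hypothetical, and only the values named in this section are covered.

```python
def is_private_use_region(subtag):
    """Return True for 'AA', 'ZZ', 'QM'-'QZ', and 'XA'-'XZ'."""
    if len(subtag) != 2 or not subtag.isascii() or not subtag.isalpha():
        return False
    code = subtag.upper()
    return (
        code in ("AA", "ZZ")
        or "QM" <= code <= "QZ"
        or "XA" <= code <= "XZ"
    )
```

A processor could use such a check to recognize that the 'XQ' in "zh-Hans-XQ" carries a privately agreed meaning while still interpreting 'zh' and 'Hans' normally.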

1876	   However, in some cases content tagged with private use subtags may
1877	   interact with other systems in a different and possibly unsuitable
1878	   manner compared to tags that use opaque, privately defined subtags,
1879	   so the choice of the best approach may depend on the particular
1880	   domain in question.

1882	5.  IANA Considerations

1884	   This section deals with the processes and requirements necessary for
1885	   IANA to undertake to maintain the subtag and extension registries as
1886	   defined by this document and in accordance with the requirements of
1887	   RFC 2434 [11].

1889	   The impact on the IANA maintainers of the two registries defined by
1890	   this document will be a small increase in the frequency of new
1891	   entries or updates.

1893	   Upon adoption of this document, the process described in Section 3.7
1894	   will be used to generate the initial Language Subtag Registry.  The
1895	   initial set of records represents no impact on IANA, since the work
1896	   to create it will be performed externally (as defined in that
1897	   section).  The new registry will be listed under "Language Tags" at
1898	   <http://www.iana.org/numbers.html>.  The existing directory of
1899	   registration forms and RFC 3066 registrations will be relabeled as
1900	   "Language Tags (Obsolete)" and maintained (but not added to or
1901	   modified).

1903	   Future work on the Language Subtag Registry will be limited to
1904	   inserting or replacing whole records preformatted for IANA by the
1905	   Language Subtag Reviewer as described in Section 3.2 of this
1906	   document.  Each record will be sent to iana@iana.org with a subject
1907	   line indicating whether the enclosed record is an insertion (of a new
1908	   record) or a replacement of an existing record which has a Type and
1909	   Subtag (or Tag) field that exactly matches the record sent.  Records
1910	   cannot be deleted from the registry.
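
The maintenance model described above (insert or replace whole records, never delete) can be sketched as follows. The in-memory dictionary, the function name, and the 'action' values are assumptions made for illustration; they do not describe IANA's actual tooling.

```python
def apply_record(registry, record, action):
    """Insert or replace one whole record in an in-memory registry.

    'registry' maps (Type, Subtag-or-Tag) to a record dictionary.
    """
    key = (record["Type"], record.get("Subtag") or record.get("Tag"))
    if action == "insert":
        if key in registry:
            raise ValueError("record already exists: %r" % (key,))
    elif action == "replace":
        if key not in registry:
            raise ValueError("no matching record to replace: %r" % (key,))
    else:
        # Deletion is deliberately unsupported: records cannot be
        # removed from the registry.
        raise ValueError("unsupported action: %r" % (action,))
    registry[key] = record
```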

1912	   The Language Tag Extensions registry will also be generated and sent
1913	   to IANA as described in Section 3.6.  This registry may contain at
1914	   most 35 records and thus changes to this registry are expected to be
1915	   very infrequent.
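
The limit of 35 can be reproduced with a short calculation. The sketch below assumes that an extension is identified by a single case-insensitive letter or digit, and that 'x' is excluded because it is reserved for private use; both assumptions come from other sections of this document. The function name is hypothetical.

```python
import string


def available_singletons():
    """Return the singletons that could identify an extension."""
    alphabet = string.ascii_lowercase + string.digits  # 26 letters + 10 digits
    return [c for c in alphabet if c != "x"]           # 'x' is private use
```

The length of the returned list is 26 + 10 - 1 = 35, one possible record per singleton.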

1917	   Future work by IANA on the Language Tag Extensions Registry is
1918	   limited to two cases.  First, the IESG may request that new records
1919	   be inserted into this registry from time to time.  These requests
1920	   will include the record to insert in the exact format described in
1921	   Section 3.6.  In addition, there may be occasional requests from the
1922	   maintaining authority for a specific extension to update the contact
1923	   information or URLs in the record.  These requests MUST include the
1924	   complete, updated record.  IANA is not responsible for validating the
1925	   information provided, only for checking that it is properly formatted.
1926	   The request should reasonably be seen to come from the maintaining
1927	   authority named in the record present in the registry.

1929	6.  Security Considerations

1931	   The only security issue that has been raised with language tags since
1932	   the publication of RFC 1766 [21], which stated that "Security issues
1933	   are believed to be irrelevant to this memo", is a concern with
1934	   language identifiers used in content negotiation - that they may be
1935	   used to infer the nationality of the sender, and thus identify
1936	   potential targets for surveillance.

1938	   This is a special case of the general problem that anything sent is
1939	   visible to the receiving party and possibly to third parties as well.
1940	   It is useful to be aware that such concerns can exist in some cases.

1942	   The evaluation of the exact magnitude of the threat, and any possible
1943	   countermeasures, is left to each application protocol (see BCP 72,
1944	   RFC 3552 [15] for best current practice guidance on security threats
1945	   and defenses).

1947	   Although the specification of valid subtags for an extension MUST be
1948	   available over the Internet, implementations SHOULD NOT mechanically
1949	   depend on it being always accessible, to prevent denial-of-service
1950	   attacks.

1952	7.  Character Set Considerations

1954	   The syntax in this document requires that language tags use only the
1955	   characters A-Z, a-z, 0-9, and HYPHEN-MINUS, which are present in most
1956	   character sets, so the composition of language tags should not have
1957	   any character set issues.
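
A character repertoire check along these lines can be sketched as follows. It tests only the permitted characters and is not a well-formedness check; the function name is hypothetical.

```python
import re

# Only A-Z, a-z, 0-9, and HYPHEN-MINUS are permitted.
_ALLOWED = re.compile(r"[A-Za-z0-9-]+")


def uses_only_allowed_characters(tag):
    """Return True if the tag contains only the permitted characters."""
    return _ALLOWED.fullmatch(tag) is not None
```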

1959	   Rendering of characters based on the content of a language tag is not
1960	   addressed in this memo.  Historically, some languages have relied on
1961	   the use of specific character sets or other information in order to
1962	   infer how a specific character should be rendered (notably this
1963	   applies to language and culture specific variations of Han ideographs
1964	   as used in Japanese, Chinese, and Korean).  When language tags are
1965	   applied to spans of text, rendering engines may use that information
1966	   in deciding which font to use in the absence of other information,
1967	   particularly where languages with distinct writing traditions use the
1968	   same characters.

1970	8.  Changes from RFC 3066

1972	   The main goals for this revision of language tags were the following:

1974	   *Compatibility.* All valid RFC 3066 language tags (including those
1975	   in the IANA registry) remain valid in this specification.  Thus
1976	   there is complete backward compatibility of this specification with
1977	   existing content.  In addition, this document defines language tags
1978	   in such a way as to ensure future compatibility, and processors
1979	   based solely on the RFC 3066 ABNF (such as those described in XML
1980	   Schema version 1.0 [19]) will be able to process tags described by
1981	   this document.

1983	   *Stability.* Because of the changes in underlying ISO standards, a
1984	   valid RFC 3066 language tag may become invalid (or have its meaning
1985	   change) at a later date.  With so much of the world's computing
1986	   infrastructure dependent on language tags, this is simply
1987	   unacceptable: it invalidates content that may have an extensive
1988	   shelf-life.  In this specification, once a language tag is valid, it
1989	   remains valid forever.  Previously, there was no way to determine
1990	   when two tags were equivalent.  This specification provides a stable
1991	   mechanism for doing so, through the use of canonical forms.  These
1992	   are also stable, so that implementations can depend on the use of
1993	   canonical forms to assess equivalency.

1995	   *Validity.*  The structure of language tags defined by this document
1996	   makes it possible to determine if a particular tag is well-formed
1997	   without regard for the actual content or "meaning" of the tag as a
1998	   whole.  This is important because the registry and underlying
1999	   standards change over time.  In addition, it must be possible to
2000	   determine if a tag is valid (or not) for a given point in time in
2001	   order to provide reproducible, testable results.  This process must
2002	   not be error-prone; otherwise even intelligent people will generate
2003	   implementations that give different results.  This specification
2004	   provides for that by having a single data file, with specific
2005	   versioning information, so that the validity of language tags at any
2006	   point in time can be precisely determined (instead of interpolating
2007	   values from many separate sources).

2009	   *Extensibility.* It is important to be able to differentiate between
2010	   written forms of language -- for many implementations this is more
2011	   important than distinguishing between spoken variants of a language.
2012	   Languages are written in a wide variety of different scripts, so this
2013	   document provides for the generative use of ISO 15924 script codes.
2014	   Like the generative use of ISO language and country codes in RFC
2015	   3066, this allows combinations to be produced without resorting to
2016	   the registration process.  The addition of UN codes provides for the
2017	   generation of language tags with regional scope, which is also
2018	   required for information technology.

2020	   The recast of the registry from containing whole language tags to
2021	   subtags is a key part of this.  An important feature of RFC 3066 was
2022	   that it allowed generative use of subtags.  This allows people to
2023	   meaningfully use generated tags, without the delays in registering
2024	   whole tags, and the burden on the registry of having to supply all of
2025	   the combinations that people may find useful.

2027	   Because of the widespread use of language tags, it is potentially
2028	   disruptive to have periodic revisions of the core specification,
2029	   despite demonstrated need.  The extension mechanism provides for a
2030	   way for independent RFCs to define extensions to language tags.
2031	   These extensions have a very constrained, well-defined structure to
2032	   prevent extensions from interfering with implementations of language
2033	   tags defined in this document.  The document also anticipates
2034	   features of ISO 639-3 with the addition of the extended language
2035	   subtags, as well as the possibility of other ISO 639 parts becoming
2036	   useful for the formation of language tags in the future.  The use and
2037	   definition of private use tags has also been modified, to allow
2038	   people to move as much information as possible out of private use
2039	   tags, and into the regular structure.  The goal is to dramatically
2040	   reduce the need to produce a revision of this document in the future.

2042	   The specific changes in this document to meet these goals are:

2044	   o  Defines the ABNF and rules for subtags so that the category of all
2045	      subtags can be determined without reference to the registry.

2047	   o  Adds the concept of well-formed vs. validating processors,
2048	      defining the rules by which an implementation can claim to be one
2049	      or the other.

2051	   o  Replaces the IANA language tag registry with a language subtag
2052	      registry that provides a complete list of valid subtags in the
2053	      IANA registry.  This allows for robust implementation and ease of
2054	      maintenance.  The language subtag registry becomes the canonical
2055	      source for forming language tags.

2057	   o  Provides a process that guarantees stability of language tags, by
2058	      handling reuse of values by ISO 639, ISO 15924, and ISO 3166 in
2059	      the event that they register a previously used value for a new
2060	      purpose.

2062	   o  Allows ISO 15924 script code subtags and allows them to be used
2063	      generatively.  Defines a method for indicating in the registry
2064	      when script subtags are necessary for a given language tag.

2066	   o  Adds the concept of a variant subtag and allows variants to be
2067	      used generatively.

2069	   o  Adds the ability to use a class of UN M.49 tags for supra-
2070	      national regions and to resolve conflicts in the assignment of ISO
2071	      3166 codes.

2073	   o  Defines the private-use tags in ISO 639, ISO 15924, and ISO 3166
2074	      as the mechanism for creating private-use language, script, and
2075	      region subtags respectively.

2077	   o  Adds a well-defined extension mechanism.

2079	   o  Defines an extended language subtag, possibly for use with certain
2080	      anticipated features of ISO 639-3.

2082	   Ed Note: The following items are provided for the convenience of
2083	   reviewers and will be removed from the final document.

2085	   Changes between draft-ietf-ltru-registry-01 and this version are:

2087	   o  Minor updates to the changes section (the text just above) to
2088	      reflect various updates in the WG drafts (A.Phillips)

2090	   o  Minor change to the section on the extensions registry (because
2091	      there can be 35, not 25, entries maximum).  (D.Ewell)

2093	   o  Changed "SHOULD NOT permit a subtag to be divided" to MUST NOT.
2094	      (#944) (R.Presuhn)

2096	   o  Added text to Section 3.1 and Section 4.1 describing the rationale
2097	      for Suppress-Script.  Both sentences are slight rewordings of this
2098	      text suggested in the email thread: "This field helps ensure
2099	      greater compatibility between the language tags generated
2100	      according to the rules in this document and language tags and tag
2101	      processors or consumers based on RFC 3066." (#954) (F.Ellermann,
2102	      A.Phillips)

2104	   o  Added text about case folding during canonicalization.  This also
2105	      includes rules in Section 3.2 for casing of registry entries, as
2106	      well the insertion of the text permitting case normalization in
2107	      Section 4.3 and the warning about locale-specific casing
2108	      operations in the same section. (#985) (F.Ellermann, J.Cowan,
2109	      A.Phillips)

2111	   o  Fixed the reference to Canonical-Value in Section 4.3.
2112	      (A.Phillips)

2114	   o  In Section 3.4, changed the reference from the subtag 'nv' to the
2115	      tag "nv" to be consistent with the wording in Section 3.1. (part
2116	      of #954) (D.Ewell)

2118	   o  Added missing word 'that' in Section 3.6 (A.Phillips)

2120	9.  References

2122	9.1  Normative References

2124	   [1]   International Organization for Standardization, "ISO 639-
2125	         1:2002, Codes for the representation of names of languages --
2126	         Part 1: Alpha-2 code", ISO Standard 639, 2002.

2128	   [2]   International Organization for Standardization, "ISO 639-2:1998
2129	         - Codes for the representation of names of languages -- Part 2:
2130	         Alpha-3 code - edition 1", August 1988.

2132	   [3]   ISO TC46/WG3, "ISO 15924:2003 (E/F) - Codes for the
2133	         representation of names of scripts", January 2004.

2135	   [4]   International Organization for Standardization, "Codes for the
2136	         representation of names of countries, 3rd edition",
2137	         ISO Standard 3166, August 1988.

2139	   [5]   Statistical Division, United Nations, "Standard Country or Area
2140	         Codes for Statistical Use", UN Standard Country or Area Codes
2141	         for Statistical Use, Revision 4 (United Nations publication,
2142	         Sales No. 98.XVII.9), June 1999.

2144	   [6]   International Organization for Standardization, "ISO/IEC 10646-
2145	         1:2000. Information technology -- Universal Multiple-Octet
2146	         Coded Character Set (UCS) -- Part 1: Architecture and Basic
2147	         Multilingual Plane and ISO/IEC 10646-2:2001. Information
2148	         technology -- Universal Multiple-Octet Coded Character Set
2149	         (UCS) -- Part 2: Supplementary Planes, as, from time to time,
2150	         amended, replaced by a new edition or expanded by the addition
2151	         of new parts", 2000.

2153	   [7]   Crocker, D. and P. Overell, "Augmented BNF for Syntax
2154	         Specifications: ABNF", draft-crocker-abnf-rfc2234bis-00 (work
2155	         in progress), March 2005.

2157	   [8]   Bradner, S., "The Internet Standards Process -- Revision 3",
2158	         BCP 9, RFC 2026, October 1996.

2160	   [9]   Hovey, R. and S. Bradner, "The Organizations Involved in the
2161	         IETF Standards Process", BCP 11, RFC 2028, October 1996.

2163	   [10]  Bradner, S., "Key words for use in RFCs to Indicate Requirement
2164	         Levels", BCP 14, RFC 2119, March 1997.

2166	   [11]  Narten, T. and H. Alvestrand, "Guidelines for Writing an IANA
2167	         Considerations Section in RFCs", BCP 26, RFC 2434,
2168	         October 1998.

2170	   [12]  Hoffman, P. and F. Yergeau, "UTF-16, an encoding of ISO 10646",
2171	         RFC 2781, February 2000.

2173	   [13]  Carpenter, B., Baker, F., and M. Roberts, "Memorandum of
2174	         Understanding Concerning the Technical Work of the Internet
2175	         Assigned Numbers Authority", RFC 2860, June 2000.

2177	   [14]  Klyne, G. and C. Newman, "Date and Time on the Internet:
2178	         Timestamps", RFC 3339, July 2002.

2180	   [15]  Rescorla, E. and B. Korver, "Guidelines for Writing RFC Text on
2181	         Security Considerations", BCP 72, RFC 3552, July 2003.

2183	9.2  Informative References

2185	   [16]  ISO 639 Joint Advisory Committee, "ISO 639 Joint Advisory
2186	         Committee:  Working principles for ISO 639 maintenance",
2187	         March 2000,
2188	         <http://www.loc.gov/standards/iso639-2/iso639jac_n3r.html>.

2190	   [17]  Raymond, E., "The Art of Unix Programming", 2003.

2192	   [18]  Bray, T., et al., "Extensible Markup Language (XML) 1.0",
2193	         February 2004.

2195	   [19]  Biron, P., Ed. and A. Malhotra, Ed., "XML Schema Part 2:
2196	         Datatypes Second Edition", October 2004,
2197	         <http://www.w3.org/TR/xmlschema-2/>.

2199	   [20]  Unicode Consortium, "The Unicode Consortium. The Unicode
2200	         Standard, Version 4.1.0, defined by: The Unicode Standard,
2201	         Version 4.0 (Boston, MA, Addison-Wesley, 2003. ISBN 0-321-
2202	         18578-1), as amended by Unicode 4.0.1
2203	         (http://www.unicode.org/versions/Unicode4.0.1) and by Unicode
2204	         4.1.0 (http://www.unicode.org/versions/Unicode4.1.0).",
2205	         March 2005.

2207	   [21]  Alvestrand, H., "Tags for the Identification of Languages",
2208	         RFC 1766, March 1995.

2210	   [22]  Freed, N. and K. Moore, "MIME Parameter Value and Encoded Word
2211	         Extensions: Character Sets, Languages, and Continuations",
2212	         RFC 2231, November 1997.

2214	   [23]  Alvestrand, H., "Tags for the Identification of Languages",
2215	         BCP 47, RFC 3066, January 2001.

2217	Authors' Addresses

2219	   Addison Phillips (editor)
2220	   Quest Software

2222	   Email: addison.phillips@quest.com

2224	   Mark Davis (editor)
2225	   IBM

2227	   Email: mark.davis@us.ibm.com

2229	Appendix A.  Acknowledgements

2231	   Any list of contributors is bound to be incomplete; please regard the
2232	   following as only a selection from the group of people who have
2233	   contributed to make this document what it is today.

2235	   The contributors to RFC 3066 and RFC 1766, the precursors of this
2236	   document, made enormous contributions directly or indirectly to this
2237	   document and are generally responsible for the success of language
2238	   tags.

2240	   The following people (in alphabetical order) contributed to this
2241	   document or to RFCs 1766 and 3066:

2243	   Glenn Adams, Harald Tveit Alvestrand, Tim Berners-Lee, Marc Blanchet,
2244	   Nathaniel Borenstein, Eric Brunner, Sean M. Burke, M.T. Carrasco
2245	   Benitez, Jeremy Carroll, John Clews, Jim Conklin, Peter Constable,
2246	   John Cowan, Mark Crispin, Dave Crocker, Martin Duerst, Frank
2247	   Ellermann, Michael Everson, Doug Ewell, Ned Freed, Tim Goodwin, Dirk-
2248	   Willem van Gulik, Marion Gunn, Joel Halpern, Elliotte Rusty Harold,
2249	   Paul Hoffman, Scott Hollenbeck, Richard Ishida, Olle Jarnefors, Kent
2250	   Karlsson, John Klensin, Alain LaBonte, Eric Mader, Ira McDonald,
2251	   Keith Moore, Chris Newman, Masataka Ohta, Randy Presuhn, George
2252	   Rhoten, Markus Scherer, Keld Jorn Simonsen, Thierry Sourbier, Otto
2253	   Stolz, Tex Texin, Andrea Vine, Rhys Weatherley, Misha Wolf, Francois
2254	   Yergeau and many, many others.

2256	   Very special thanks must go to Harald Tveit Alvestrand, who
2257	   originated RFCs 1766 and 3066, and without whom this document would
2258	   not have been possible.  Special thanks must go to Michael Everson,
2259	   who has served as language tag reviewer for almost the complete
2260	   period since the publication of RFC 1766.  Special thanks to Doug
2261	   Ewell, for his production of the first complete subtag registry, and
2262	   his work in producing a test parser for verifying language tags.

2264	Appendix B.  Examples of Language Tags (Informative)

2266	   Simple language subtag:

2268	      de (German)

2270	      fr (French)

2272	      ja (Japanese)

2274	      i-enochian (example of a grandfathered tag)

2276	   Language subtag plus Script subtag:

2278	      zh-Hant (Chinese written using the Traditional Chinese script)

2280	      zh-Hans (Chinese written using the Simplified Chinese script)

2282	      sr-Cyrl (Serbian written using the Cyrillic script)

2284	      sr-Latn (Serbian written using the Latin script)

2286	   Language-Script-Region:

2288	      zh-Hans-CN (Chinese written using the Simplified script as used in
2289	      mainland China)

2291	      sr-Latn-CS (Serbian written using the Latin script as used in
2292	      Serbia and Montenegro)

2294	   Language-Variant:

2296	      en-boont (Boontling dialect of English)

2298	      en-scouse (Scouse dialect of English)

2300	   Language-Region-Variant:

2302	      en-GB-scouse (Scouse dialect of English as used in the UK)

2304	   Language-Script-Region-Variant:

2306	      sl-Latn-IT-nedis (Nadiza dialect of Slovenian written using the
2307	      Latin script as used in Italy.  Note that this tag is not
2308	      recommended because subtag 'sl' has a Suppress-Script value of
2309	      'Latn')

2311	   Language-Region:

2313	      de-DE (German for Germany)

2315	      en-US (English as used in the United States)

2317	      es-419 (Spanish for Latin America and Caribbean region using the
2318	      UN region code)

2320	   Private-use subtags:

2322	      de-CH-x-phonebk

2324	      az-Arab-x-AZE-derbend

2326	   Extended language subtags (examples ONLY: extended languages must be
2327	   defined by revision or update to this document):

2329	      zh-min

2331	      zh-min-nan-Hant-CN

2333	   Private-use registry values:

2335	      x-whatever (private use using the singleton 'x')

2337	      qaa-Qaaa-QM-x-southern (all private tags)

2339	      de-Qaaa (German, with a private script)

2341	      sr-Latn-QM (Serbian, Latin-script, private region)

2343	      sr-Qaaa-CS (Serbian, private script, for Serbia and Montenegro)

2345	   Tags that use extensions (examples ONLY: extensions must be defined
2346	   by revision or update to this document or by RFC):

2348	      en-US-u-islamCal

2350	      zh-CN-a-myExt-x-private

2352	      en-a-myExt-b-another

2354	   Some Invalid Tags:

2356	      de-419-DE (two region tags)
2357	      a-DE (use of a single character subtag in primary position; note
2358	      that there are a few grandfathered tags that start with "i-" that
2359	      are valid)

2361	      ar-a-aaa-b-bbb-a-ccc (two extensions with same single letter
2362	      prefix)
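
The last invalid example (a repeated extension singleton) can be detected with a sketch such as the following. It assumes that single-character subtags after the first position are extension singletons and that everything after 'x' is private use. The function name is hypothetical, and the other kinds of invalid tag shown above are not covered.

```python
def duplicate_singletons(tag):
    """Return extension singletons that appear more than once in the tag."""
    seen = set()
    duplicates = []
    for subtag in tag.lower().split("-")[1:]:
        if len(subtag) != 1:
            continue
        if subtag == "x":
            break  # the rest of the tag is private use
        if subtag in seen and subtag not in duplicates:
            duplicates.append(subtag)
        seen.add(subtag)
    return duplicates
```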

2364	Appendix C.  Example Registry

2366	   Example Registry

2368	   File-Date: 2005-04-18
2369	   %%
2370	   Type: language
2371	   Subtag: aa
2372	   Description: Afar
2373	   Added: 2004-07-06
2374	   %%
2375	   Type: language
2376	   Subtag: ab
2377	   Description: Abkhazian
2378	   Added: 2004-07-06
2379	   %%
2380	   Type: language
2381	   Subtag: ae
2382	   Description: Avestan
2383	   Added: 2004-07-06
2384	   %%
2385	   Type: language
2386	   Subtag: ar
2387	   Description: Arabic
2388	   Added: 2004-07-06
2389	   Suppress-Script: Arab
2390	   Comment: Arabic text is usually written in Arabic script
2391	   %%
2392	   Type: language
2393	   Subtag: qaa..qtz
2394	   Description: PRIVATE USE
2395	   Added: 2004-08-01
2396	   Comment: Use private use codes in preference
2397	     to the x- singleton for primary language
2398	   Comment: This is an example of two comments.
2399	   %%
2400	   Type: script
2401	   Subtag: Arab
2402	   Description: Arabic
2403	   Added: 2004-07-06
2404	   %%
2405	   Type: script
2406	   Subtag: Armn
2407	   Description: Armenian
2408	   Added: 2004-07-06
2409	   %%
2410	   Type: script
2411	   Subtag: Bali
2412	   Description: Balinese
2413	   Added: 2004-07-06
2414	   %%
2415	   Type: script
2416	   Subtag: Batk
2417	   Description: Batak
2418	   Added: 2004-07-06
2419	   %%
2420	   Type: region
2421	   Subtag: AA
2422	   Description: PRIVATE USE
2423	   Added: 2004-08-01
2424	   %%
2425	   Type: region
2426	   Subtag: AD
2427	   Description: Andorra
2428	   Added: 2004-07-06
2429	   %%
2430	   Type: region
2431	   Subtag: AE
2432	   Description: United Arab Emirates
2433	   Added: 2004-07-06
2434	   %%
2435	   Type: region
2436	   Subtag: AX
2437	   Description: &#xC5;land Islands
2438	   Added: 2004-07-06
2439	   Comment: The description shows a Unicode escape
2440	     for the letter A-ring.
2441	   %%
2442	   Type: region
2443	   Subtag: 001
2444	   Description: World
2445	   Added: 2004-07-06
2446	   %%
2447	   Type: region
2448	   Subtag: 002
2449	   Description: Africa
2450	   Added: 2004-07-06
2451	   %%
2452	   Type: region
2453	   Subtag: 003
2454	   Description: North America
2455	   Added: 2004-07-06
2456	   %%
2457	   Type: variant
2458	   Subtag: 1901
2459	   Description: Traditional German
2460	      orthography
2461	   Added: 2004-09-09
2462	   Recommended-Prefix: de
2463	   Comment: <shows continuation>
2464	   %%
2465	   Type: variant
2466	   Subtag: 1996
2467	   Description: German orthography of 1996
2468	   Added: 2004-09-09
2469	   Recommended-Prefix: de
2470	   %%
2471	   Type: variant
2472	   Subtag: boont
2473	   Description: Boontling
2474	   Added: 2003-02-14
2475	   Recommended-Prefix: en
2476	   %%
2477	   Type: variant
2478	   Subtag: gaulish
2479	   Description: Gaulish
2480	   Added: 2001-05-25
2481	   Recommended-Prefix: cel
2482	   %%
2483	   Type: grandfathered
2484	   Tag: art-lojban
2485	   Description: Lojban
2486	   Added: 2001-11-11
2487	   Canonical: jbo
2488	   Deprecated: 2003-09-02
2489	   %%
2490	   Type: grandfathered
2491	   Tag: en-GB-oed
2492	   Description: English, Oxford English Dictionary spelling
2493	   Added: 2003-07-09
2494	   %%
2495	   Type: grandfathered
2496	   Tag: i-ami
2497	   Description: 'Amis
2498	   Added: 1999-05-25
2499	   %%
2500	   Type: grandfathered
2501	   Tag: i-bnn
2502	   Description: Bunun
2503	   Added: 1999-05-25
2504	   %%
2505	   Type: redundant
2506	   Tag: az-Arab
2507	   Description: Azerbaijani in Arabic script
2508	   Added: 2003-05-30
2509	   %%
2510	   Type: redundant
2511	   Tag: az-Cyrl
2512	   Description: Azerbaijani in Cyrillic script
2513	   Added: 2003-05-30
2514	   %%

2516	                 Figure 7: Example of the Registry Format
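
   The record layout shown in Figure 7 can be read with a short parser.
   The following Python is a sketch under stated assumptions, not a
   reference implementation: the name parse_registry is hypothetical,
   the input is assumed to have the document's left margin removed (so
   that only continuation lines begin with whitespace), and folded lines
   are joined with a single space.  Fields are kept as an ordered list
   because a field such as Comment may appear more than once in a
   record.

```python
def parse_registry(text):
    """Split registry text into records; each record is a list of
    (field-name, field-body) tuples in their original order."""
    records = []
    current = []  # fields of the record being read, as [name, body]

    for raw in text.splitlines():
        if not raw.strip():
            continue  # ignore blank lines
        if raw.strip() == "%%":
            # Record separator: close the current record, if any.
            if current:
                records.append(current)
            current = []
        elif raw[0] in " \t" and current:
            # Continuation line: fold into the previous field body.
            # Joining with one space is an assumption of this sketch.
            current[-1][1] += " " + raw.strip()
        else:
            name, _, body = raw.partition(":")
            current.append([name.strip(), body.strip()])

    if current:
        records.append(current)
    return [[tuple(field) for field in record] for record in records]


# Excerpt of Figure 7 with the left margin removed.
SAMPLE = """\
File-Date: 2005-04-18
%%
Type: language
Subtag: qaa..qtz
Description: PRIVATE USE
Added: 2004-08-01
Comment: Use private use codes in preference
  to the x- singleton for primary language
Comment: This is an example of two comments.
%%
Type: variant
Subtag: 1901
Description: Traditional German
   orthography
Added: 2004-09-09
Recommended-Prefix: de
%%
"""

records = parse_registry(SAMPLE)
```

   In this sketch the File-Date line forms a record of its own, and the
   folded Description of the "1901" variant is returned as the single
   value "Traditional German orthography".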

2518	Intellectual Property Statement

2520	   The IETF takes no position regarding the validity or scope of any
2521	   Intellectual Property Rights or other rights that might be claimed to
2522	   pertain to the implementation or use of the technology described in
2523	   this document or the extent to which any license under such rights
2524	   might or might not be available; nor does it represent that it has
2525	   made any independent effort to identify any such rights.  Information
2526	   on the procedures with respect to rights in RFC documents can be
2527	   found in BCP 78 and BCP 79.

2529	   Copies of IPR disclosures made to the IETF Secretariat and any
2530	   assurances of licenses to be made available, or the result of an
2531	   attempt made to obtain a general license or permission for the use of
2532	   such proprietary rights by implementers or users of this
2533	   specification can be obtained from the IETF on-line IPR repository at
2534	   http://www.ietf.org/ipr.

2536	   The IETF invites any interested party to bring to its attention any
2537	   copyrights, patents or patent applications, or other proprietary
2538	   rights that may cover technology that may be required to implement
2539	   this standard.  Please address the information to the IETF at
2540	   ietf-ipr@ietf.org.

2542	Disclaimer of Validity

2544	   This document and the information contained herein are provided on an
2545	   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
2546	   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
2547	   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
2548	   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
2549	   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
2550	   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

2552	Copyright Statement

2554	   Copyright (C) The Internet Society (2005).  This document is subject
2555	   to the rights, licenses and restrictions contained in BCP 78, and
2556	   except as set forth therein, the authors retain all their rights.

2558	Acknowledgment

2560	   Funding for the RFC Editor function is currently provided by the
2561	   Internet Society.