idnits 2.17.1 

draft-ietf-ltru-registry-07.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** It looks like you're using RFC 3978 boilerplate.  You should update this
     to the boilerplate described in the IETF Trust License Policy document
     (see https://trustee.ietf.org/license-info), which is required now.

  -- Found old boilerplate from RFC 3978, Section 5.1 on line 16.

  -- Found old boilerplate from RFC 3978, Section 5.5 on line 2558.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 2535.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 2542.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 2548.

  ** This document has an original RFC 3978 Section 5.4 Copyright Line,
     instead of the newer IETF Trust Copyright according to RFC 4748.

  ** This document has an original RFC 3978 Section 5.5 Disclaimer, instead
     of the newer disclaimer which includes the IETF Trust according to RFC
     4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  -- The abstract seems to indicate that this document obsoletes RFC3066, but
     the header doesn't have an 'Obsoletes:' line to match this.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not
     match the current year

  == Line 140 has weird spacing: '...  being  spoke...'

  == Line 772 has weird spacing: '...logical  line ...'

  == Line 773 has weird spacing: '...prising  a fie...'

  == Line 774 has weird spacing: '...ld-body  porti...'

  == Line 775 has weird spacing: '...   this  conce...'

  == (13 more instances...)

  == The document seems to lack the recommended RFC 2119 boilerplate, even if
     it appears to use RFC 2119 keywords. 

     (The document does seem to have the reference to RFC 2119 which the
     ID-Checklist requires).
  == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD',
     or 'RECOMMENDED' is not an accepted usage according to RFC 2119.  Please
     use uppercase 'NOT' together with RFC 2119 keywords (if that is what you
     mean).
     
     Found 'MUST not' in this paragraph:
     
     The tags and their subtags, including private-use and extensions,
     are to be treated as case insensitive: there exist conventions for the
     capitalization of some of the subtags, but these MUST not be taken to
     carry meaning.

  == The expression 'MAY NOT', while looking like RFC 2119 requirements text,
     is not defined in RFC 2119, and should not be used.  Consider using 'MUST
     NOT' instead (if that is what you mean).
     
     Found 'MAY NOT' in this paragraph:
     
     Note that 'Preferred-Value' mappings in records of type 'region'
     MAY NOT represent exactly the same meaning as the original value.  There
     are many reasons for a country code to be changed and the effect this has
     on the formation of language tags will depend on the nature of the change
     in question.

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (June 24, 2005) is 6881 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Unused Reference: 'RFC1766' is defined on line 2366, but no explicit
     reference was found in the text

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO639-1'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO639-2'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO15924'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO3166'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO10646'

  ** Obsolete normative reference: RFC 2028 (Obsoleted by RFC 9281)

  ** Obsolete normative reference: RFC 2434 (Obsoleted by RFC 5226)

  ** Downref: Normative reference to an Informational RFC: RFC 2781

  ** Downref: Normative reference to an Informational RFC: RFC 2860

  -- Obsolete informational reference (is this intentional?): RFC 1766
     (Obsoleted by RFC 3066, RFC 3282)

  -- Obsolete informational reference (is this intentional?): RFC 3066
     (Obsoleted by RFC 4646, RFC 4647)


     Summary: 7 errors (**), 0 flaws (~~), 12 warnings (==), 15 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                                   A. Phillips, Ed.
3	Internet-Draft                                            Quest Software
4	Expires: December 26, 2005                                 M. Davis, Ed.
5	                                                                     IBM
6	                                                           June 24, 2005

8	                     Tags for Identifying Languages
9	                      draft-ietf-ltru-registry-07

11	Status of this Memo

13	   By submitting this Internet-Draft, each author represents that any
14	   applicable patent or other IPR claims of which he or she is aware
15	   have been or will be disclosed, and any of which he or she becomes
16	   aware will be disclosed, in accordance with Section 6 of BCP 79.

18	   Internet-Drafts are working documents of the Internet Engineering
19	   Task Force (IETF), its areas, and its working groups.  Note that
20	   other groups may also distribute working documents as Internet-
21	   Drafts.

23	   Internet-Drafts are draft documents valid for a maximum of six months
24	   and may be updated, replaced, or obsoleted by other documents at any
25	   time.  It is inappropriate to use Internet-Drafts as reference
26	   material or to cite them other than as "work in progress."

28	   The list of current Internet-Drafts can be accessed at
29	   http://www.ietf.org/ietf/1id-abstracts.txt.

31	   The list of Internet-Draft Shadow Directories can be accessed at
32	   http://www.ietf.org/shadow.html.

34	   This Internet-Draft will expire on December 26, 2005.

36	Copyright Notice

38	   Copyright (C) The Internet Society (2005).

40	Abstract

42	   This document describes the structure, content, construction, and
43	   semantics of language tags for use in cases where it is desirable to
44	   indicate the language used in an information object.  It also
45	   describes how to register values for use in language tags and the
46	   creation of user defined extensions for private interchange.  This
47	   document obsoletes RFC 3066 (which replaced RFC 1766).

49	Table of Contents

51	   1.  Introduction
52	   2.  The Language Tag
53	     2.1   Syntax
54	     2.2   Language Subtag Sources and Interpretation
55	       2.2.1   Primary Language Subtag
56	       2.2.2   Extended Language Subtags
57	       2.2.3   Script Subtag
58	       2.2.4   Region Subtag
59	       2.2.5   Variant Subtags
60	       2.2.6   Extension Subtags
61	       2.2.7   Private Use Subtags
62	       2.2.8   Pre-Existing RFC 3066 Registrations
63	       2.2.9   Classes of Conformance
64	   3.  Registry Format and Maintenance
65	     3.1   Format of the IANA Language Subtag Registry
66	     3.2   Maintenance of the Registry
67	     3.3   Stability of IANA Registry Entries
68	     3.4   Registration Procedure for Subtags
69	     3.5   Possibilities for Registration
70	     3.6   Extensions and Extensions Namespace
71	     3.7   Initialization of the Registry
72	   4.  Formation and Processing of Language Tags
73	     4.1   Choice of Language Tag
74	     4.2   Meaning of the Language Tag
75	     4.3   Length Considerations
76	       4.3.1   Working with Limited Buffer Sizes
77	       4.3.2   Truncation of Language Tags
78	     4.4   Canonicalization of Language Tags
79	     4.5   Considerations for Private Use Subtags
80	   5.  IANA Considerations
81	     5.1   Language Subtag Registry
82	     5.2   Extensions Registry
83	   6.  Security Considerations
84	   7.  Character Set Considerations
85	   8.  Changes from RFC 3066
86	   9.  References
87	     9.1   Normative References
88	     9.2   Informative References
89	       Authors' Addresses
90	   A.  Acknowledgements
91	   B.  Examples of Language Tags (Informative)
92	       Intellectual Property and Copyright Statements

94	1.  Introduction

96	   Human beings on our planet have, past and present, used a number of
97	   languages.  There are many reasons why one would want to identify the
98	   language used when presenting or requesting information.

100	   Users' language preferences often need to be identified so that
101	   appropriate processing can be applied.  For example, the user's
102	   language preferences in a Web browser can be used to select Web pages
103	   appropriately.  Language preferences can also be used to select among
104	   tools (such as dictionaries) to assist in the processing or
105	   understanding of content in different languages.

107	   In addition, knowledge about the particular language used by some
108	   piece of information content might be useful or even required by some
109	   types of processing; for example, spell-checking, computer-synthesized
110	   speech, Braille transcription, or high-quality print renderings.

112	   One means of indicating the language used is by labeling the
113	   information content with an identifier or "tag".  These tags can be
114	   used to specify user preferences when selecting information content,
115	   or for labeling additional attributes of content and associated
116	   resources.

118	   Tags can also be used to indicate additional language attributes of
119	   content.  For example, indicating specific information about the
120	   dialect, writing system, or orthography used in a document or
121	   resource may enable the user to obtain information in a form that
122	   they can understand, or it can be important in processing or
123	   rendering the given content into an appropriate form or style.

125	   This document specifies a particular identifier mechanism (the
126	   language tag) and a registration function for values to be used to
127	   form tags.  It also defines a mechanism for private use values and
128	   future extension.

130	   This document replaces RFC 3066, which replaced RFC 1766.  For a list
131	   of changes in this document, see Section 8.

133	   The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
134	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
135	   document are to be interpreted as described in [RFC2119].

137	2.  The Language Tag

139	   The language tag always defines a language as used (which includes
140	   being spoken, written, signed, or otherwise signaled) by human
141	   beings for communication of information to other human beings.
142	   Computer languages such as programming languages are explicitly
143	   excluded.

145	2.1  Syntax

147	   The language tag is composed of one or more parts or "subtags".  Each
148	   subtag consists of a sequence of alpha-numeric characters.  Subtags
149	   are distinguished and separated from one another by a hyphen ("-").
150	   A language tag consists of a "primary language" subtag and a
151	   (possibly empty) series of subsequent subtags, each of which refines
152	   or narrows the range of language identified by the overall tag.

154	   Each type of subtag is distinguished by length, position in the tag,
155	   and content: subtags can be recognized solely by these features.
156	   This makes it possible to construct a parser that can extract and
157	   assign some semantic information to the subtags, even if the specific
158	   subtag values are not recognized.  Thus a parser need not have an up-
159	   to-date copy (or any copy at all) of the subtag registry to perform
160	   most searching and matching operations.

162	   The syntax of the language tag in ABNF [RFC2234bis] is:

164	   Language-Tag = (lang
165	                   *3("-" extlang)
166	                   ["-" script]
167	                   ["-" region]
168	                   *("-" variant)
169	                   *("-" extension)
170	                   ["-" privateuse])
171	                   / privateuse         ; private-use tag
172	                   / grandfathered      ; grandfathered registrations

174	   lang            = 2*4ALPHA           ; shortest ISO 639 code
175	                   / registered-lang
176	   extlang         = 3ALPHA             ; reserved for future use
177	   script          = 4ALPHA             ; ISO 15924 code
178	   region          = 2ALPHA             ; ISO 3166 code
179	                   / 3DIGIT             ; UN country number
180	   variant         =  5*8alphanum       ; registered variants
181	                   / ( DIGIT 3alphanum )
182	   extension       = singleton 1*("-" (2*8alphanum))
183	   privateuse      = ("x"/"X") 1*("-" (1*8alphanum))
184	   singleton       = %x41-57 / %x59-5A / %x61-77 / %x79-7A / DIGIT
185	                   ; "A"-"W" / "Y"-"Z" / "a"-"w" / "y"-"z" / "0"-"9"
186	                   ; Single letters: x/X is reserved for private use
187	   registered-lang = 4*8ALPHA          ; registered language subtag
188	   grandfathered   = 1*3ALPHA 1*2("-" (2*8alphanum))
189	                                       ; grandfathered registration
190	                                       ; Note: i is the only singleton
191	                                       ; that starts a grandfathered tag
192	   alphanum        = (ALPHA / DIGIT)   ; letters and numbers

194	                        Figure 1: Language Tag ABNF

196	   The character "-" is HYPHEN-MINUS (ABNF: %x2D).  All subtags have a
197	   maximum length of eight characters.  Note that there is a subtlety in
198	   the ABNF for 'variant': variants starting with a digit MAY be four
199	   characters long, while those starting with a letter MUST be at least
200	   five characters long.

202	   Whitespace is not permitted in a language tag.  For examples of
203	   language tags, see Appendix B.
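
   As an illustration only, the grammar in Figure 1 can be approximated
   by a regular expression.  The sketch below is written in Python and
   is not part of this specification; the names LANGTAG_RE and
   parse_tag are hypothetical.  It checks well-formedness against the
   ABNF only and does not consult the IANA registry, so it accepts
   syntactically valid tags whose subtags are unregistered or reserved.

```python
import re

# One alternative per top-level production in Figure 1.
LANGTAG_RE = re.compile(
    r"""
    (?:
        (?P<lang>[A-Za-z]{2,8})                 # 2*4ALPHA / registered-lang
        (?P<extlang>(?:-[A-Za-z]{3}){0,3})      # up to three extlang subtags
        (?:-(?P<script>[A-Za-z]{4}))?           # optional script
        (?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?  # optional region
        (?P<variants>(?:-(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}))*)
        (?P<extensions>(?:-[A-WY-Za-wy-z0-9](?:-[A-Za-z0-9]{2,8})+)*)
        (?:-(?P<privateuse>[xX](?:-[A-Za-z0-9]{1,8})+))?
      | (?P<private_tag>[xX](?:-[A-Za-z0-9]{1,8})+)
      | (?P<grandfathered>[A-Za-z]{1,3}(?:-[A-Za-z0-9]{2,8}){1,2})
    )
    """,
    re.VERBOSE,
)


def parse_tag(tag):
    """Return the non-empty parts of a well-formed tag, or None."""
    match = LANGTAG_RE.fullmatch(tag)
    if match is None:
        return None
    return {name: value for name, value in match.groupdict().items() if value}


print(parse_tag("fr-Latn-CA"))
print(parse_tag("i-enochian"))
print(parse_tag("en US"))
```

   Because the first alternative is tried first, a tag such as "zh-gan"
   is reported as a language plus an extended language subtag even
   though it also fits the 'grandfathered' production; an implementation
   that needs to tell these apart would have to check the registry.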

205	   Note that although [RFC2234bis] refers to octets, the language tags
206	   described in this document are sequences of characters from the US-
207	   ASCII repertoire.  Language tags MAY be used in documents and
208	   applications that use other encodings, so long as these encompass the
209	   US-ASCII repertoire.  An example of this would be an XML document
210	   that uses the UTF-16LE [RFC2781] encoding of [Unicode].

212	   The tags and their subtags, including private-use and extensions, are
213	   to be treated as case insensitive: there exist conventions for the
214	   capitalization of some of the subtags, but these MUST NOT be taken to
215	   carry meaning.

217	   For example:

219	   o  [ISO639-1] recommends that language codes be written in lower case
220	      ('mn' Mongolian).

222	   o  [ISO3166] recommends that country codes be capitalized ('MN'
223	      Mongolia).

225	   o  [ISO15924] recommends that script codes use lower case with the
226	      initial letter capitalized ('Cyrl' Cyrillic).

228	   However, in the tags defined by this document, the uppercase US-ASCII
229	   letters in the range 'A' through 'Z' are considered equivalent and
230	   mapped directly to their US-ASCII lowercase equivalents in the range
231	   'a' through 'z'.  Thus the tag "mn-Cyrl-MN" is not distinct from "MN-
232	   cYRL-mn" or "mN-cYrL-Mn" (or any other combination) and each of these
233	   variations conveys the same meaning: Mongolian written in the
234	   Cyrillic script as used in Mongolia.
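
   A minimal sketch of this comparison rule, in Python and for
   illustration only (the helper names fold_tag and same_tag are
   hypothetical): only the US-ASCII letters 'A' through 'Z' are mapped,
   so no locale-sensitive or Unicode case folding is involved.

```python
import string

# Map only 'A'-'Z' to 'a'-'z'; every other character is left untouched.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def fold_tag(tag):
    """Return the tag with US-ASCII uppercase letters mapped to lowercase."""
    return tag.translate(_ASCII_LOWER)


def same_tag(a, b):
    """Compare two language tags without regard to US-ASCII letter case."""
    return fold_tag(a) == fold_tag(b)


print(same_tag("mn-Cyrl-MN", "MN-cYRL-mn"))
print(same_tag("mn-Cyrl-MN", "mn-Latn-MN"))
```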

236	2.2  Language Subtag Sources and Interpretation

238	   The namespace of language tags and their subtags is administered by
239	   the Internet Assigned Numbers Authority (IANA) [RFC2860] according to
240	   the rules in Section 5 of this document.  The registry maintained by
241	   IANA is the source for valid subtags: other standards referenced in
242	   this section provide the source material for that registry.

244	   Terminology in this section:

246	   o  Tag or tags refers to a complete language tag, such as
247	      "fr-Latn-CA".  Examples of tags in this document are enclosed in
248	      double-quotes ("en-US").

250	   o  Subtag refers to a specific section of a tag, delimited by hyphen,
251	      such as the subtag 'Latn' in "fr-Latn-CA".  Examples of subtags in
252	      this document are enclosed in single quotes ('Latn').

254	   o  Code or codes refers to values defined in external standards (and
255	      which are used as subtags in this document).  For example, 'Latn'
256	      is an [ISO15924] script code which was used to define the 'Latn'
257	      script subtag for use in a language tag.  Examples of codes in
258	      this document are enclosed in single quotes ('en', 'Latn').

260	   The definitions in this section apply to the various subtags within
261	   the language tags defined by this document, excepting those
262	   "grandfathered" tags defined in Section 2.2.8.

264	   Language tags are designed so that each subtag type has unique length
265	   and content restrictions.  These make identification of the subtag's
266	   type possible, even if the content of the subtag itself is
267	   unrecognized.  This allows tags to be parsed and processed without
268	   reference to the latest version of the underlying standards or the
269	   IANA registry and makes the associated exception handling when
270	   parsing tags simpler.

272	   Subtags in the IANA registry that do not come from an underlying
273	   standard can only appear in specific positions in a tag.
274	   Specifically, they can only occur as primary language subtags or as
275	   variant subtags.

277	   Note that sequences of private-use and extension subtags MUST occur
278	   at the end of the sequence of subtags and MUST NOT be interspersed
279	   with subtags defined elsewhere in this document.

281	   Single letter and digit subtags are reserved for current or future
282	   use.  These include the following current uses:

284	   o  The single letter subtag 'x' is reserved to introduce a sequence
285	      of private-use subtags.  The interpretation of any private-use
286	      subtags is defined solely by private agreement and is not defined
287	      by the rules in this section or in any standard or registry
288	      defined in this document.

290	   o  All other single letter subtags are reserved to introduce
291	      standardized extension subtag sequences as described in
292	      Section 3.6.

294	   The single letter subtag 'i' is used by some grandfathered tags, such
295	   as "i-enochian", where it always appears in the first position and
296	   cannot be confused with an extension.
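
   The positional rules above suggest a simple way to find where the
   extension and private-use sequences begin.  The following Python
   sketch is illustrative only; the function name split_tail is
   hypothetical, and it assumes the input is already a well-formed tag.

```python
def split_tail(tag):
    """Split a tag into (language-related subtags, extension/private-use tail).

    The first single-character subtag after the first position starts the
    tail.  A leading 'x' makes the whole tag private use; a leading 'i'
    belongs to a grandfathered tag and is not treated as an extension.
    """
    subtags = tag.split("-")
    if subtags[0].lower() == "x":
        return [], subtags
    for index, subtag in enumerate(subtags):
        if index > 0 and len(subtag) == 1:
            return subtags[:index], subtags[index:]
    return subtags, []


print(split_tail("en-US-x-foo"))
print(split_tail("i-enochian"))
```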

298	2.2.1  Primary Language Subtag

300	   The primary language subtag is the first subtag in a language tag
301	   (with the exception of private-use and certain grandfathered tags)
302	   and cannot be omitted.  The following rules apply to the primary
303	   language subtag:

305	   1.  All two character language subtags were defined in the IANA
306	       registry according to the assignments found in the standard ISO
307	       639 Part 1, "ISO 639-1:2002, Codes for the representation of
308	       names of languages -- Part 1: Alpha-2 code" [ISO639-1], or using
309	       assignments subsequently made by the ISO 639 Part 1 maintenance
310	       agency or governing standardization bodies.

312	   2.  All three character language subtags were defined in the IANA
313	       registry according to the assignments found in ISO 639 Part 2,
314	       "ISO 639-2:1998 - Codes for the representation of names of
315	       languages -- Part 2: Alpha-3 code - edition 1" [ISO639-2], or
316	       assignments subsequently made by the ISO 639 Part 2 maintenance
317	       agency or governing standardization bodies.

319	   3.  The subtags in the range 'qaa' through 'qtz' are reserved for
320	       private use in language tags.  These subtags correspond to codes
321	       reserved by ISO 639-2 for private use.  These codes MAY be used
322	       for non-registered primary-language subtags (instead of using
323	       private-use subtags following 'x-').  Please refer to Section 4.5
324	       for more information on private use subtags.

326	   4.  All four character language subtags are reserved for possible
327	       future standardization.

329	   5.  All language subtags of 5 to 8 characters in length in the IANA
330	       registry were defined via the registration process in Section 3.4
331	       and MAY be used to form the primary language subtag.  At the time
332	       this document was created, there were no examples of this kind of
333	       subtag and future registrations of this type will be discouraged:
334	       primary languages are strongly RECOMMENDED for registration with
335	       ISO 639 and proposals rejected by ISO 639/RA will be closely
336	       scrutinized before they are registered with IANA.

338	   6.  The single character subtag 'x' as the primary subtag indicates
339	       that the language tag consists solely of subtags whose meaning is
340	       defined by private agreement.  For example, in the tag "x-fr-CH",
341	       the subtags 'fr' and 'CH' SHOULD NOT be taken to represent the
342	       French language or the country of Switzerland (or any other value
343	       in the IANA registry) unless there is a private agreement in
344	       place to do so.  See Section 4.5.

346	   7.  The single character subtag 'i' is used by some grandfathered
347	       tags (see Section 2.2.8) such as "i-klingon" and "i-bnn".  (Other
348	       grandfathered tags have a primary language subtag in their first
349	       position.)

351	   8.  Other values MUST NOT be assigned to the primary subtag except by
352	       revision or update of this document.
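
   The rules above can be summarized as a classification by length and
   content.  The Python sketch below is illustrative only: the function
   name classify_primary and the returned labels are hypothetical, and
   the result describes only where a subtag of that shape would come
   from, not whether it is actually present in the registry.

```python
def classify_primary(subtag):
    """Classify a primary subtag by its shape, following rules 1 through 8."""
    s = subtag.lower()
    if s == "x":
        return "private-use tag"          # rule 6
    if s == "i":
        return "grandfathered prefix"     # rule 7
    if not (s.isascii() and s.isalpha()):
        return "not permitted"            # primary subtags are letters only
    if len(s) == 2:
        return "ISO 639-1"                # rule 1
    if len(s) == 3:
        if "qaa" <= s <= "qtz":
            return "private use"          # rule 3
        return "ISO 639-2"                # rule 2
    if len(s) == 4:
        return "reserved"                 # rule 4
    if 5 <= len(s) <= 8:
        return "registered"               # rule 5
    return "not permitted"                # rule 8


for example in ("mn", "haw", "qaa", "enochian", "x"):
    print(example, "->", classify_primary(example))
```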

354	   Note: For languages that have both an ISO 639-1 two character code
355	   and an ISO 639-2 three character code, only the ISO 639-1 two
356	   character code is defined in the IANA registry.

358	   Note: For languages that have no ISO 639-1 two character code and for
359	   which the ISO 639-2/T (Terminology) code and the ISO 639-2/B
360	   (Bibliographic) codes differ, only the Terminology code is defined in
361	   the IANA registry.  At the time this document was created, all
362	   languages that had both kinds of three character code were also
363	   assigned a two character code; it is not expected that future
364	   assignments of this nature will occur.

366	   Note: To avoid problems with versioning and subtag choice as
367	   experienced during the transition between RFC 1766 and RFC 3066, and
368	   to preserve the canonical nature of subtags defined by this document, the
369	   ISO 639 Registration Authority Joint Advisory Committee (ISO 639/
370	   RA-JAC) has included the following statement in [iso639.principles]:

372	   "A language code already in ISO 639-2 at the point of freezing ISO
373	   639-1 shall not later be added to ISO 639-1.  This is to ensure
374	   consistency in usage over time, since users are directed in Internet
375	   applications to employ the alpha-3 code when an alpha-2 code for that
376	   language is not available."

378	   In order to avoid instability of the canonical form of tags, if a two
379	   character code is added to ISO 639-1 for a language for which a three
380	   character code was already included in ISO 639-2, the two character
381	   code will not be added as a subtag in the registry.  See Section 3.3.

383	   For example, if some content were tagged with 'haw' (Hawaiian), which
384	   currently has no two character code, the tag would not be invalidated
385	   if ISO 639-1 were to assign a two character code to the Hawaiian
386	   language at a later date.

388	   For example, one of the grandfathered IANA registrations is
389	   "i-enochian".  The subtag 'enochian' could be registered in the IANA
390	   registry as a primary language subtag (assuming that ISO 639 does not
391	   register this language first), making tags such as "enochian-AQ" and
392	   "enochian-Latn" valid.

394	2.2.2  Extended Language Subtags

396	   The following rules apply to the extended language subtags:

398	   1.  Three letter subtags immediately following the primary subtag are
399	       reserved for future standardization, anticipating work that is
400	       currently under way on ISO 639.

402	   2.  Extended language subtags MUST follow the primary subtag and
403	       precede any other subtags.

405	   3.  There MAY be up to three extended language subtags.

407	   4.  Extended language subtags MUST NOT be registered or used to form
408	       language tags.  Their syntax is described here so that
409	       implementations can be compatible with any future revision of
410	       this document which does provide for their registration.

412	   Extended language subtag records, once they appear in the registry,
413	   MUST include exactly one 'Prefix' field indicating an appropriate
414	   language subtag or sequence of subtags that MUST always appear as a
415	   prefix to the extended language subtag.

417	   Example: In a future revision or update of this document, the tag
418	   "zh-gan" (registered under RFC 3066) might become a valid non-
419	   grandfathered (that is, redundant) tag in which the subtag 'gan'
420	   might represent the Chinese dialect 'Gan'.

422	2.2.3  Script Subtag

424	   Script subtags are used to indicate the script or writing system
425	   variations that distinguish the written forms of a language or its
426	   dialects.  The following rules apply to the script subtags:

428	   1.  All four character subtags were defined according to
429	       [ISO15924]--"Codes for the representation of the names of
430	       scripts": alpha-4 script codes, or subsequently assigned by the
431	       ISO 15924 maintenance agency or governing standardization bodies,
432	       denoting the script or writing system used in conjunction with
433	       this language.

435	   2.  Script subtags MUST immediately follow the primary language
436	       subtag and all extended language subtags and MUST occur before
437	       any other type of subtag described below.

439	   3.  The script subtags 'Qaaa' through 'Qabx' are reserved for private
440	       use in language tags.  These subtags correspond to codes reserved
441	       by ISO 15924 for private use.  These codes MAY be used for non-
442	       registered script values.  Please refer to Section 4.5 for more
443	       information on private-use subtags.

445	   4.  Script subtags cannot be registered using the process in
446	       Section 3.4 of this document.  Variant subtags MAY be considered
447	       for registration for that purpose.

449	   5.  There MUST be at most one script subtag in a language tag and the
450	       script subtag SHOULD be omitted when it adds no distinguishing
451	       value to the tag or when the primary language subtag's record
452	       includes a Suppress-Script field listing the applicable script
453	       subtag.

455	   Example: "sr-Latn" represents Serbian written using the Latin script.
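
   As a small illustration of rule 3, the private-use script range can
   be tested with a case-insensitive comparison.  This Python sketch is
   not part of the specification, and the function name
   is_private_use_script is hypothetical.

```python
def is_private_use_script(subtag):
    """Return True if the subtag lies in the range 'Qaaa' through 'Qabx'."""
    s = subtag.lower()
    return (
        len(s) == 4
        and s.isascii()
        and s.isalpha()
        and "qaaa" <= s <= "qabx"
    )


print(is_private_use_script("Qaaa"))
print(is_private_use_script("Latn"))
```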

457	2.2.4  Region Subtag

459	   Region subtags are used to indicate linguistic variations associated
460	   with or appropriate to a specific country, territory, or region.
461	   Typically, a region subtag is used to indicate regional dialects or
462	   usage, or region-specific spelling conventions.  A region subtag can
463	   also be used to indicate that content is expressed in a way that is
464	   appropriate for use throughout a region; for instance, Spanish
465	   content tailored to be useful throughout Latin America.

467	   The following rules apply to the region subtags:

469	   1.  Region subtags MUST follow any language, extended language, or
470	       script subtags and MUST precede all other subtags.

472	   2.  All two character subtags following the primary subtag were
473	       defined in the IANA registry according to the assignments found
474	       in [ISO3166]--"Codes for the representation of names of countries
475	       and their subdivisions - Part 1: Country codes"--alpha-2 country
476	       codes or assignments subsequently made by the ISO 3166
477	       maintenance agency or governing standardization bodies.

479	   3.  All three character subtags consisting of digit (numeric)
480	       characters following the primary subtag were defined in the IANA
481	       registry according to the assignments found in UN Standard
482	       Country or Area Codes for Statistical Use [UN_M.49] or
483	       assignments subsequently made by the governing standards body.
484	       Note that not all of the UN M.49 codes are defined in the IANA
485	       registry.  The following rules define which codes are entered
486	       into the registry as valid subtags:

488	       A.  UN numeric codes assigned to 'macro-geographical
489	           (continental)' or sub-regions MUST be registered in the
490	           registry.  These codes are not associated with an assigned
491	           ISO 3166 alpha-2 code and represent supra-national areas,
492	           usually covering more than one nation, state, province, or
493	           territory.

495	       B.  UN numeric codes for 'economic groupings' or 'other
496	           groupings' MUST NOT be registered in the IANA registry and
497	           MUST NOT be used to form language tags.

499	       C.  UN numeric codes for countries or areas with ambiguous ISO
500	           3166 alpha-2 codes, when entered into the registry, MUST be
501	           defined according to the rules in Section 3.3 and MUST be
502	           used to form language tags that represent the country or
503	           region for which they are defined.

505	       D.  UN numeric codes for countries or areas for which there is an
506	           associated ISO 3166 alpha-2 code in the registry MUST NOT be
507	           entered into the registry and MUST NOT be used to form
508	           language tags.  Note that the ISO 3166-based subtag in the
509	           registry MUST actually be associated with the UN M.49 code in
510	           question.

512	       E.  All other UN numeric codes for countries or areas which do
513	           not have an associated ISO 3166 alpha-2 code MUST NOT be
514	           entered into the registry and MUST NOT be used to form
515	           language tags.  For more information about these codes, see
516	           Section 3.3.

518	   4.  Note: The alphanumeric codes in Appendix X of the UN document
519	       MUST NOT be entered into the registry and MUST NOT be used to
520	       form language tags.  (At the time this document was created these
521	       values match the ISO 3166 alpha-2 codes.)

523	   5.  There MUST be at most one region subtag in a language tag and the
524	       region subtag MAY be omitted, as when it adds no distinguishing
525	       value to the tag.

527	   6.  The region subtags 'AA', 'QM'-'QZ', 'XA'-'XZ', and 'ZZ' are
528	       reserved for private use in language tags.  These subtags
529	       correspond to codes reserved by ISO 3166 for private use.  These
530	       codes MAY be used for private use region subtags (instead of
531	       using a private-use subtag sequence).  Please refer to
532	       Section 4.5 for more information on private use subtags.

534	   "de-CH" represents German ('de') as used in Switzerland ('CH').

536	   "sr-Latn-CS" represents Serbian ('sr') written using Latin script
537	   ('Latn') as used in Serbia and Montenegro ('CS').

539	   "es-419" represents Spanish ('es') appropriate to the UN-defined
540	   Latin America and Caribbean region ('419').
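
   The shape rules above (two letters for ISO 3166 codes, three digits
   for UN M.49 codes, and the private-use ranges in rule 6) can be
   sketched as follows.  This is only an illustration: the function
   names are hypothetical, and the check looks at the shape of the
   subtag only.  It does not consult the IANA registry, so it cannot
   tell whether a given code is actually registered.

```python
import re


def is_private_use_region(subtag: str) -> bool:
    """True for 'AA', 'QM'-'QZ', 'XA'-'XZ', and 'ZZ' (rule 6)."""
    s = subtag.upper()
    if not re.fullmatch(r"[A-Z]{2}", s):
        return False
    return s in ("AA", "ZZ") or s[0] == "X" or (s[0] == "Q" and s[1] >= "M")


def region_kind(subtag: str) -> str:
    """Classify a candidate region subtag by shape only (no registry lookup)."""
    if re.fullmatch(r"[A-Za-z]{2}", subtag):
        if is_private_use_region(subtag):
            return "private-use"
        return "iso3166-alpha2"
    if re.fullmatch(r"[0-9]{3}", subtag):
        return "un-m49-numeric"
    return "not-a-region"
```

   For instance, region_kind("CH") and region_kind("419") cover the two
   registered shapes used in the examples above.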

542	2.2.5  Variant Subtags

544	   Variant subtags are used to indicate additional, well-recognized
545	   variations that define a language or its dialects which are not
546	   covered by other available subtags.  The following rules apply to the
547	   variant subtags:

549	   1.  Variant subtags are not associated with any external standard.
550	       Variant subtags and their meanings are defined by the
551	       registration process defined in Section 3.4.

553	   2.  Variant subtags MUST follow all of the other defined subtags, but
554	       precede any extension or private-use subtag sequences.

556	   3.  More than one variant MAY be used to form the language tag.

558	   4.  Variant subtags MUST be registered with IANA according to the
559	       rules in Section 3.4 of this document before being used to form
560	       language tags.  In order to distinguish variants from other types
561	       of subtags, registrations MUST meet the following length and
562	       content restrictions:

564	       1.  Variant subtags that begin with a letter (a-z, A-Z) MUST be
565	           at least five characters long.

567	       2.  Variant subtags that begin with a digit (0-9) MUST be at
568	           least four characters long.
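
   The two length restrictions can be expressed as a single pattern.
   The sketch below is an assumption-laden illustration: the function
   name is hypothetical, and the upper bound of eight characters is
   taken from the general limit on subtag length rather than from the
   two rules above.  Passing this check does not mean a variant is
   registered.

```python
import re

# Letter-initial variants: 5 to 8 characters.  Digit-initial: 4 to 8.
# The maximum of 8 is an assumption based on the general subtag limit.
_VARIANT_SHAPE = re.compile(
    r"(?:[A-Za-z][A-Za-z0-9]{4,7}|[0-9][A-Za-z0-9]{3,7})"
)


def has_variant_shape(subtag: str) -> bool:
    """Check only the length and content restrictions for variants."""
    return _VARIANT_SHAPE.fullmatch(subtag) is not None
```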

570	   Variant subtag records in the language subtag registry MAY include
571	   one or more 'Prefix' fields, which indicate the language tag or tags
572	   that would make a suitable prefix (with other subtags, as
573	   appropriate) in forming a language tag with the variant.  For
574	   example, the subtag 'nedis' has a Prefix of "sl", making it suitable
575	   to form language tags such as "sl-nedis" and "sl-IT-nedis", but not
576	   suitable for use in a tag such as "zh-nedis" or "it-IT-nedis".

578	   "sl-nedis" represents the Natisone or Nadiza dialect of Slovenian.

580	   "de-CH-1996" represents German as used in Switzerland and as written
581	   using the spelling reform beginning in the year 1996 C.E.

583	   Most variants that share a prefix are mutually exclusive.  For
584	   example, the German orthographic variations '1996' and '1901' SHOULD
585	   NOT be used in the same tag, as they represent the dates of different
586	   spelling reforms.  A variant that can meaningfully be used in
587	   combination with another variant SHOULD include a 'Prefix' field in
588	   its registry record that lists that other variant.  For example, if
589	   another German variant 'example' were created that made sense to use
590	   with '1996', then 'example' should include two Prefix fields: "de"
591	   and "de-1996".

593	2.2.6  Extension Subtags

595	   Extensions provide a mechanism for extending language tags for use in
596	   various applications.  See Section 3.6.  The following rules apply
597	   to extensions:

599	   1.   Extension subtags are separated from the other subtags defined
600	        in this document by a single-letter subtag ("singleton").  The
601	        singleton MUST be one allocated to a registration authority via
602	        the mechanism described in Section 3.6 and cannot be the letter
603	        'x', which is reserved for private-use subtag sequences.

605	   2.   Note: Private-use subtag sequences starting with the singleton
606	        subtag 'x' are described below.

608	   3.   An extension MUST follow at least a primary language subtag.
609	        That is, a language tag cannot begin with an extension.
610	        Extensions extend language tags; they do not override or replace
611	        them.  For example, "a-value" is not a well-formed language tag,
612	        while "de-a-value" is.

614	   4.   Each singleton subtag MUST appear at most one time in each tag
615	        (other than as a private-use subtag).  That is, singleton
616	        subtags MUST NOT be repeated.  For example, the tag "en-a-bbb-a-
617	        ccc" is invalid because the subtag 'a' appears twice.  Note that
618	        the tag "en-a-bbb-x-a-ccc" is valid because the second
619	        appearance of the singleton 'a' is in a private use sequence.

621	   5.   Extension subtags MUST meet all of the requirements for the
622	        content and format of subtags defined in this document.

624	   6.   Extension subtags MUST meet whatever requirements are set by the
625	        document that defines their singleton prefix and whatever
626	        requirements are provided by the maintaining authority.

628	   7.   Each extension subtag MUST be from two to eight characters long
629	        and consist solely of letters or digits, with each subtag
630	        separated by a single '-'.

632	   8.   Each singleton MUST be followed by at least one extension
633	        subtag.  For example, the tag "tlh-a-b-foo" is invalid because
634	        the first singleton 'a' is followed immediately by another
635	        singleton 'b'.

637	   9.   Extension subtags MUST follow all language, extended language,
638	        script, region and variant subtags in a tag.

640	   10.  All subtags following the singleton and before another singleton
641	        are part of the extension.  Example: In the tag "fr-a-Latn", the
642	        subtag 'Latn' does not represent the script subtag 'Latn'
643	        defined in the IANA Language Subtag Registry.  Its meaning is
644	        defined by the extension 'a'.

646	   11.  In the event that more than one extension appears in a single
647	        tag, the tag SHOULD be canonicalized as described in
648	        Section 4.4.

650	   For example, if the prefix singleton 'r' and the shown subtags were
651	   defined, then the following tag would be a valid example: "en-Latn-
652	   GB-boont-r-extended-sequence-x-private"
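
   Several of the rules above (3, 4, 7, and 8) are purely structural
   and can be checked without knowing which singletons have been
   allocated.  The following is a minimal sketch under that assumption.
   The function name and the wording of the messages are hypothetical,
   and grandfathered tags such as those beginning with 'i' are not
   handled.

```python
def extension_problems(tag: str) -> list[str]:
    """Return structural problems found in the extension part of a tag."""
    subtags = tag.lower().split("-")
    problems: list[str] = []
    seen: set[str] = set()
    for i, sub in enumerate(subtags):
        if sub == "x":
            break  # everything after 'x' is private use; stop checking
        if len(sub) != 1:
            continue  # not a singleton
        if i == 0:
            problems.append("tag begins with an extension")
        if sub in seen:
            problems.append(f"singleton '{sub}' is repeated")
        seen.add(sub)
        # Rules 7 and 8: a 2-8 character alphanumeric subtag must follow.
        nxt = subtags[i + 1] if i + 1 < len(subtags) else ""
        if not (2 <= len(nxt) <= 8 and nxt.isalnum()):
            problems.append(
                f"singleton '{sub}' is not followed by an extension subtag"
            )
    return problems
```

   An empty list means that none of these structural checks failed; it
   does not mean the extension itself is valid.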

654	2.2.7  Private Use Subtags

656	   Private use subtags are used to indicate distinctions in language
657	   important in a given context by private agreement.  The following
658	   rules apply to private-use subtags:

660	   1.  Private-use subtags are separated from the other subtags defined
661	       in this document by the reserved single-character subtag 'x'.

663	   2.  Private-use subtags MUST follow all language, extended language,
664	       script, region, variant, and extension subtags in the tag.
665	       Another way of saying this is that all subtags following the
666	       singleton 'x' MUST be considered private use.  Example: The
667	       subtag 'US' in the tag "en-x-US" is a private use subtag.

669	   3.  A tag MAY consist entirely of private-use subtags.

671	   4.  No source is defined for private use subtags.  Use of private use
672	       subtags is by private agreement only.

674	   For example: Users who wished to utilize SIL Ethnologue for
675	   identification might agree to exchange tags such as "az-Arab-x-AZE-
676	   derbend".  This example contains two private-use subtags.  The first
677	   is 'AZE' and the second is 'derbend'.
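
   Because every subtag after the singleton 'x' is private use, a
   processor can separate the two parts of a tag with a single scan.
   The helper below is a hypothetical sketch, not a required interface.

```python
def split_private_use(tag: str) -> tuple[list[str], list[str]]:
    """Split a tag into (ordinary subtags, private-use subtags)."""
    subtags = tag.split("-")
    for i, sub in enumerate(subtags):
        if sub.lower() == "x":
            # The 'x' itself is a separator and is not returned.
            return subtags[:i], subtags[i + 1:]
    return subtags, []
```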

679	2.2.8  Pre-Existing RFC 3066 Registrations

681	   Existing IANA-registered language tags from RFC 1766 and/or RFC 3066
682	   maintain their validity.  IANA will maintain these tags in the
683	   registry under either the "grandfathered" or "redundant" type.  For
684	   more information see Section 3.7.

686	   It is important to note that all language tags formed under the
687	   guidelines in this document were either legal, well-formed tags or
688	   could have been registered under RFC 3066.

690	2.2.9  Classes of Conformance

692	   Implementations sometimes need to describe their capabilities with
693	   regard to the rules and practices described in this document.  There
694	   are two classes of conforming implementations described by this
695	   document: "well-formed" processors and "validating" processors.
696	   Claims of conformance SHOULD explicitly reference one of these
697	   definitions.

699	   An implementation that claims to check for well-formed language tags
700	   MUST:

702	   o  Check that the tag and all of its subtags, including extension and
703	      private-use subtags, conform to the ABNF or that the tag is on the
704	      list of grandfathered tags.

706	   o  Check that singleton subtags that identify extensions do not
707	      repeat.  For example, the tag "en-a-xx-b-yy-a-zz" is not well-
708	      formed.

710	   Well-formed processors are strongly encouraged to implement the
711	   canonicalization rules contained in Section 4.4.

713	   An implementation that claims to be validating MUST:

715	   o  Check that the tag is well-formed.

717	   o  Specify the particular registry date for which the implementation
718	      performs validation of subtags.

720	   o  Check that either the tag is a grandfathered tag, or that all
721	      language, script, region, and variant subtags consist of valid
722	      codes for use in language tags according to the IANA registry as
723	      of the particular date specified by the implementation.

725	   o  Specify which, if any, extension RFCs as defined in Section 3.6
726	      are supported, including version, revision, and date.

728	   o  For any such extensions supported, check that all subtags used in
729	      that extension are valid.

731	   o  For variant and extended language subtags, if the registry
732	      contains one or more 'Prefix' fields for that subtag, check that
733	      the tag matches at least one prefix.  The tag matches if all the
734	      subtags in the 'Prefix' also appear in the tag.  For example, the
735	      prefix "es-CO" matches the tag "es-Latn-CO-x-private" because both
736	      the 'es' language subtag and 'CO' region subtag appear in the tag.
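
   The prefix check in the last item can be sketched as follows.  The
   function names are hypothetical.  One assumption goes beyond the
   text: subtags that follow a singleton (extension or private-use
   sequences) are ignored when matching, so that "en-x-US" does not
   match the prefix "en-US".

```python
def matches_prefix(tag: str, prefix: str) -> bool:
    """True if every subtag of `prefix` also appears in `tag`."""
    subtags = []
    for sub in tag.lower().split("-"):
        if len(sub) == 1:
            break  # assumption: stop at the first singleton
        subtags.append(sub)
    return all(p in subtags for p in prefix.lower().split("-"))


def matches_any_prefix(tag: str, prefixes: list[str]) -> bool:
    """A record with no 'Prefix' fields imposes no prefix check."""
    return not prefixes or any(matches_prefix(tag, p) for p in prefixes)
```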

738	3.  Registry Format and Maintenance

740	   This section defines the Language Subtag Registry and the maintenance
741	   and update procedures associated with it.

743	   The language subtag registry will be maintained so that, except for
744	   extension subtags, it is possible to validate all of the subtags that
745	   appear in a language tag under the provisions of this document or its
746	   revisions or successors.  In addition, the meaning of the various
747	   subtags will be unambiguous and stable over time.  (The meaning of
748	   private-use subtags, of course, is not defined by the IANA registry.)

750	   The registry defined under this document contains a comprehensive
751	   list of all of the subtags valid in language tags.  This allows
752	   implementers a straightforward and reliable way to validate language
753	   tags.

755	3.1  Format of the IANA Language Subtag Registry

757	   The IANA Language Subtag Registry ("the registry") will consist of a
758	   text file that is machine readable in the format described in this
759	   section, plus copies of the registration forms approved by the
760	   Language Subtag Reviewer in accordance with the process described in
761	   Section 3.4.  With the exception of the registration forms for
762	   grandfathered and redundant tags, no registration records will be
763	   maintained for the initial set of subtags.

765	   The registry will be in a modified record-jar format text file
766	   [record-jar].  Lines are limited to 72 characters, including all
767	   whitespace.

769	   Records are separated by lines containing only the sequence "%%"
770	   (%x25.25).

772	   Each field can be viewed as a single, logical line of ASCII
773	   characters, comprising a field-name and a field-body separated by a
774	   COLON character (%x3A).  For convenience, the field-body portion of
775	   this conceptual entity can be split into a multiple-line
776	   representation; this is called "folding".  The format of the registry
777	   is described by the following ABNF (per [RFC2234bis]):

779	   registry   = record *("%%" CRLF record)
780	   record     = 1*( field-name *SP ":" *SP field-body CRLF )
781	   field-name = *(ALPHA / DIGIT / "-")
782	   field-body = *(ASCCHAR/LWSP)
783	   ASCCHAR    = %x21-25 / %x27-7E / UNICHAR ; Note: AMPERSAND is %x26
784	   UNICHAR    = "&#x" 2*6HEXDIG ";"

785	   The sequence '..' (%x2E.2E) in a field-body denotes a range of
786	   values.  Such a range represents all subtags of the same length that
787	   are alphabetically within that range, including the values explicitly
788	   mentioned.  For example 'a..c' denotes the values 'a', 'b', and 'c'.
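
   A range can be expanded by treating each endpoint as a base-26
   number.  The sketch below assumes that both endpoints consist only
   of ASCII letters and that the case pattern of the first endpoint
   applies to every generated value.  Neither assumption is stated in
   the text, and the function names are hypothetical.

```python
from string import ascii_lowercase


def _to_int(s: str) -> int:
    """Interpret a lowercase ASCII string as a base-26 number."""
    n = 0
    for ch in s:
        n = n * 26 + ascii_lowercase.index(ch)
    return n


def _from_int(n: int, length: int) -> str:
    """Inverse of _to_int, padded to a fixed length."""
    chars = []
    for _ in range(length):
        n, r = divmod(n, 26)
        chars.append(ascii_lowercase[r])
    return "".join(reversed(chars))


def expand_range(body: str) -> list[str]:
    """Expand 'a..c' into ['a', 'b', 'c']; other values pass through."""
    if ".." not in body:
        return [body]
    start, end = body.split("..", 1)
    if len(start) != len(end):
        raise ValueError("range endpoints must have the same length")
    lo, hi = _to_int(start.lower()), _to_int(end.lower())
    values = [_from_int(n, len(start)) for n in range(lo, hi + 1)]
    # Assumption: reuse the case pattern of the first endpoint.
    return [
        "".join(c.upper() if t.isupper() else c for c, t in zip(v, start))
        for v in values
    ]
```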

790	   Characters from outside the US-ASCII repertoire, as well as the
791	   AMPERSAND character ("&", %x26) when it occurs in a field-body are
792	   represented by a "Numeric Character Reference" using hexadecimal
793	   notation in the style used by [XML10] (see
794	   <http://www.w3.org/TR/REC-xml/#dt-charref>).  This consists of the
795	   sequence "&#x" (%x26.23.78) followed by a hexadecimal representation
796	   of the character's code point in [ISO10646] followed by a closing
797	   semicolon (%x3B).  For example, the EURO SIGN, U+20AC, would be
798	   represented by the sequence "&#x20AC;".  Note that the hexadecimal
799	   notation MAY have between two and six digits.
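
   Converting between a field-body and the text it represents is a
   simple substitution.  The following sketch uses hypothetical
   function names and performs no validation beyond the two-to-six
   digit limit.

```python
import re

# "&#x", two to six hexadecimal digits, then a closing semicolon.
_NCR = re.compile(r"&#x([0-9A-Fa-f]{2,6});")


def decode_field_body(body: str) -> str:
    """Replace each numeric character reference with its character.

    Raises ValueError for a code point above U+10FFFF.
    """
    return _NCR.sub(lambda m: chr(int(m.group(1), 16)), body)


def encode_field_body(text: str) -> str:
    """Escape '&' and every character outside the US-ASCII repertoire."""
    return "".join(
        ch if ch != "&" and ord(ch) < 0x80 else f"&#x{ord(ch):02X};"
        for ch in text
    )
```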

801	   All fields whose field-body contains a date value use the "full-date"
802	   format specified in [RFC3339].  For example: "2004-06-28" represents
803	   June 28, 2004 in the Gregorian calendar.

805	   The first record in the file contains the single field whose field-
806	   name is "File-Date".  The field-body of this record contains the last
807	   modification date of this copy of the registry, making it possible to
808	   compare different versions of the registry.  The registry on the IANA
809	   website is the most current.  Versions with an older date than that
810	   one are not up-to-date.

812	   File-Date: 2004-06-28
813	   %%
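
   A reader for this format needs to handle three things: the "%%"
   record separator, the colon between field-name and field-body, and
   folded continuation lines.  The sketch below is one possible
   approach, not a reference parser.  It assumes that a continuation
   line begins with whitespace and that folding is undone by joining
   with a single space.  The function name and the sample text are for
   illustration only.

```python
from datetime import date


def parse_registry(text: str) -> list[list[tuple[str, str]]]:
    """Parse record-jar text into records of (field-name, field-body)."""
    records: list[list[tuple[str, str]]] = [[]]
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        if line.strip() == "%%":
            records.append([])  # record separator
        elif line[:1] in (" ", "\t") and records[-1]:
            # Folded line: append to the body of the previous field.
            name, body = records[-1][-1]
            records[-1][-1] = (name, body + " " + line.strip())
        elif ":" in line:
            name, body = line.split(":", 1)
            records[-1].append((name.strip(), body.strip()))
    return [r for r in records if r]


SAMPLE = """File-Date: 2005-01-02
%%
Type: variant
Subtag: nedis
Description: Natisone dialect
Description: Nadiza dialect
Added: 2003-10-09
Prefix: sl
Comments: This is a comment shown
  as an example.
%%
"""

records = parse_registry(SAMPLE)
# The first record holds only File-Date, in RFC 3339 full-date format.
file_date = date.fromisoformat(dict(records[0])["File-Date"])
```

   Fields are kept as a list of pairs rather than a dictionary because
   some fields, such as 'Description', can appear more than once.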

815	   Subsequent records represent subtags in the registry.  Each of the
816	   fields in each record MUST occur no more than once, unless otherwise
817	   noted below.  Each record MUST contain the following fields:

819	   o  'Type'

821	      *  Type's field-value MUST consist of one of the following
822	         strings: "language", "extlang", "script", "region", "variant",
823	         "grandfathered", and "redundant" and denotes the type of tag or
824	         subtag.

826	   o  Either 'Subtag' or 'Tag'

828	      *  Subtag's field-value contains the subtag being defined.  This
829	         field MUST only appear in records whose Type has one of
830	         these values: "language", "extlang", "script", "region", or
831	         "variant".

833	      *  Tag's field-value contains a complete language tag.  This field
834	         MUST only appear in records whose Type has one of these values:
835	         "grandfathered" or "redundant".

837	   o  Description

839	      *  Description's field-value contains a non-normative description
840	         of the subtag or tag.

842	   o  Added

844	      *  Added's field-value contains the date the record was added to
845	         the registry.

847	   The 'Subtag' or 'Tag' field MUST use lowercase letters to form the
848	   subtag or tag, with two exceptions.  Subtags whose 'Type' field is
849	   'script' (in other words, subtags defined by ISO 15924) MUST use
850	   titlecase.  Subtags whose 'Type' field is 'region' (in other words,
851	   subtags defined by ISO 3166) MUST use uppercase.  These exceptions
852	   mirror the use of case in the underlying standards.
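
   The case convention depends only on the record's 'Type'.  A minimal
   sketch, with a hypothetical function name:

```python
def registry_case(subtag: str, record_type: str) -> str:
    """Return the subtag in the case used by the registry for its Type."""
    if record_type == "script":
        return subtag.capitalize()  # titlecase, as in ISO 15924
    if record_type == "region":
        return subtag.upper()  # uppercase, as in ISO 3166
    return subtag.lower()  # every other type is lowercase
```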

854	   The field 'Description' MAY appear more than one time.  At least one
855	   of the 'Description' fields MUST contain a description of the tag
856	   being registered written or transcribed into the Latin script; the
857	   same or additional fields MAY also include a description in a non-
858	   Latin script.  The 'Description' field is used for identification
859	   purposes and SHOULD NOT be taken to represent the actual native name
860	   of the language or variation or to be in any particular language.
861	   Most descriptions are taken directly from source standards such as
862	   ISO 639 or ISO 3166.

864	   Note: Descriptions in registry entries that correspond to ISO 639,
865	   ISO 15924, ISO 3166 or UN M.49 codes are intended only to indicate
866	   the meaning of that identifier as defined in the source standard at
867	   the time it was added to the registry.  The description does not
868	   replace the content of the source standard itself.  The descriptions
869	   are not intended to be the English localized names for the subtags.
870	   Localization or translation of language tag and subtag descriptions
871	   is out of scope of this document.

873	   Each record MAY also contain the following fields:

875	   o  Preferred-Value

877	      *  For records of type 'language', 'extlang', 'script', 'region',
878	         and 'variant', 'Preferred-Value' contains a subtag of the same
879	         'Type' which is preferred for forming the language tag.

881	      *  For records of type 'grandfathered' and 'redundant', it
882	         contains a canonical mapping to a complete language tag.

884	   o  Deprecated

886	      *  Deprecated's field-value contains the date the record was
887	         deprecated.

889	   o  Prefix

891	      *  Prefix's field-value contains a language tag with which this
892	         subtag MAY be used to form a new language tag, perhaps with
893	         other subtags as well.  This field MUST only appear in records
894	         whose 'Type' field-value is 'variant' or 'extlang'.  For
895	         example, the 'Prefix' for the variant 'nedis' is 'sl', meaning
896	         that the tags "sl-nedis" and "sl-IT-nedis" might be appropriate
897	         while the tag "is-nedis" is not.

899	   o  Comments

901	      *  Comments contains additional information about the subtag, as
902	         deemed appropriate for understanding the registry and
903	         implementing language tags using the subtag or tag.

905	   o  Suppress-Script

907	      *  Suppress-Script contains a script subtag that SHOULD NOT be
908	         used to form language tags with the associated primary language
909	         subtag.  This field MUST only appear in records whose 'Type'
910	         field-value is 'language'.  See Section 4.1.

912	   The field 'Deprecated' MAY be added to any record via the maintenance
913	   process described in Section 3.2 or via the registration process
914	   described in Section 3.4.  Usually the addition of a 'Deprecated'
915	   field is due to the action of one of the standards bodies, such as
916	   ISO 3166, withdrawing a code.  In some historical cases it might not
917	   have been possible to reconstruct the original deprecation date.
918	   For these cases, an approximate date appears in the registry.
919	   Although valid in language tags, subtags and tags with a 'Deprecated'
920	   field are deprecated and validating processors SHOULD NOT generate
921	   these subtags.  Note that a record that contains a 'Deprecated' field
922	   and no corresponding 'Preferred-Value' field has no replacement
923	   mapping.

925	   The field 'Preferred-Value' contains a mapping between the record in
926	   which it appears and a tag or subtag which SHOULD be preferred when
927	   selecting language tags.  These values form three groups:

929	      ISO 639 language codes which were later withdrawn in favor of
930	      other codes.  These values are mostly a historical curiosity.

932	      ISO 3166 region codes which have been withdrawn in favor of a new
933	      code.  This sometimes happens when a country changes its name or
934	      administration in such a way that warrants a new region code.

936	      Tags grandfathered from RFC 3066.  In many cases these tags have
937	      become obsolete because the values they represent were later
938	      encoded by ISO 639.

940	   Records that contain a 'Preferred-Value' field MUST also have a
941	   'Deprecated' field.  This field contains a date of deprecation.  Thus
942	   a language tag processor can use the registry to construct the valid,
943	   non-deprecated set of subtags for a given date.  In addition, for any
944	   given tag, a processor can construct the set of valid language tags
945	   that correspond to that tag for all dates up to the date of the
946	   registry.  The ability to do these mappings MAY be beneficial to
947	   applications that are matching, selecting, or filtering content
948	   based on its language tags.
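
   Constructing the non-deprecated set for a given date amounts to
   comparing the 'Added' and 'Deprecated' dates of each record against
   that date.  The sketch below makes several assumptions.  Records are
   represented as dictionaries, and a subtag counts as deprecated on
   the deprecation date itself.  The 'Added' dates in the sample data
   are placeholders; only the 'TP' deprecation date and its
   'Preferred-Value' come from the example in Section 3.3.  All names
   are hypothetical.

```python
from datetime import date

RECORDS = [
    # 'Added' values are placeholders for illustration only.
    {"Type": "region", "Subtag": "TP", "Added": "2000-01-01",
     "Deprecated": "2004-07-06", "Preferred-Value": "TL"},
    {"Type": "region", "Subtag": "TL", "Added": "2004-07-06"},
]


def non_deprecated(records: list[dict], on: date) -> set[str]:
    """Subtags that were registered and not deprecated as of `on`."""
    result = set()
    for rec in records:
        if date.fromisoformat(rec["Added"]) > on:
            continue  # not yet in the registry on that date
        deprecated = rec.get("Deprecated")
        if deprecated is not None and date.fromisoformat(deprecated) <= on:
            continue  # already deprecated on that date
        result.add(rec["Subtag"])
    return result


def preferred_value(records: list[dict], subtag: str) -> str:
    """Follow 'Preferred-Value' if present; otherwise return the subtag."""
    for rec in records:
        if rec["Subtag"] == subtag:
            return rec.get("Preferred-Value", subtag)
    return subtag
```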

950	   Note that 'Preferred-Value' mappings in records of type 'region'
951	   might not represent exactly the same meaning as the original value.
952	   There are many reasons for a country code to be changed and the
953	   effect this has on the formation of language tags will depend on the
954	   nature of the change in question.

956	   In particular, the 'Preferred-Value' field does not imply retagging
957	   content that uses the affected subtag.

959	   The field 'Preferred-Value' MUST NOT be modified once created in the
960	   registry.  The field MAY be added to records of type "grandfathered"
961	   and "region" according to the rules in Section 3.2.  Otherwise the
962	   field MUST NOT be added to any record already in the registry.

964	   The 'Preferred-Value' field in records of type "grandfathered" and
965	   "redundant" contains whole language tags that are strongly
966	   RECOMMENDED for use in place of the record's value.  In many cases
967	   the mappings were created by deprecation of the tags during the
968	   period before this document was adopted.  For example, the tag "no-
969	   nyn" was deprecated in favor of the ISO 639-1 defined language code
970	   'nn'.

972	   Records of type 'variant' MAY have more than one field of type
973	   'Prefix'.  Additional fields of this type MAY be added to a 'variant'
974	   record via the registration process.

976	   Records of type 'extlang' MUST have _exactly_ one 'Prefix' field.

978	   The field-value of the 'Prefix' field consists of a language tag
979	   whose subtags are appropriate to use with this subtag.  For example,
980	   the variant subtag '1996' has a Prefix field of "de".  This means
981	   that tags starting with the sequence "de-" are appropriate with this
982	   subtag, so "de-Latg-1996" and "de-CH-1996" are both acceptable, while
983	   the tag "fr-1996" is an inappropriate choice.

985	   The field of type 'Prefix' MUST NOT be removed from any record.  The
986	   field-value for this type of field MUST NOT be modified.

988	   The field 'Comments' MAY appear more than once per record.  This
989	   field MAY be inserted or changed via the registration process and no
990	   guarantee of stability is provided.  The content of this field is not
991	   restricted, except by the need to register the information, the
992	   suitability of the request, and by reasonable practical size
993	   limitations.  Long screeds about a particular subtag are frowned
994	   upon.

996	   The field 'Suppress-Script' MUST only appear in records whose 'Type'
997	   field-value is 'language'.  This field MAY appear at most one time in
998	   a record.  This field indicates a script used to write the
999	   overwhelming majority of documents for the given language and which
1000	   therefore adds no distinguishing information to a language tag.  It
1001	   helps ensure greater compatibility between the language tags
1002	   generated according to the rules in this document and language tags
1003	   and tag processors or consumers based on RFC 3066.  For example,
1004	   virtually all Icelandic documents are written in the Latin script,
1005	   making the subtag 'Latn' redundant in the tag "is-Latn".
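
   A tag generator could consult 'Suppress-Script' to avoid emitting a
   redundant script subtag.  The sketch below is hypothetical in every
   name it uses.  It assumes that the script subtag, when present,
   directly follows the primary language subtag (that is, no extended
   language subtags are present), and its table holds only the
   Icelandic example from the text.

```python
# Maps a primary language subtag to its 'Suppress-Script' value.
# Only the Icelandic example from the text is included here.
SUPPRESS_SCRIPT = {"is": "Latn"}


def drop_suppressed_script(tag: str, suppress: dict = SUPPRESS_SCRIPT) -> str:
    """Remove the script subtag if it equals the language's Suppress-Script."""
    subtags = tag.split("-")
    if len(subtags) >= 2:
        suppressed = suppress.get(subtags[0].lower())
        if suppressed and subtags[1].lower() == suppressed.lower():
            del subtags[1]
    return "-".join(subtags)
```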

1007	3.2  Maintenance of the Registry

1009	   Maintenance of the registry requires that as codes are assigned or
1010	   withdrawn by ISO 639, ISO 15924, ISO 3166, and UN M.49, the Language
1011	   Subtag Reviewer will evaluate each change, determine whether it
1012	   conflicts with existing registry entries, and submit the information
1013	   to IANA for inclusion in the registry.  If a change takes place and
1014	   the Language Subtag Reviewer does not do this in a timely manner,
1015	   then any interested party MAY use the procedure in Section 3.4 to
1016	   register the appropriate update.

1018	   Note: The redundant and grandfathered entries together are the
1019	   complete list of tags registered under [RFC3066].  The redundant tags
1020	   are those that can now be formed using the subtags defined in the
1021	   registry together with the rules of Section 2.2.  The grandfathered
1022	   entries are those that can never be legal under those same
1023	   provisions.

1025	   The set of redundant and grandfathered tags is permanent and stable:
1027	   no new entries will be added and none of the entries will be removed.
1028	   Records of type 'grandfathered' MAY have their type converted to
1029	   'redundant': see Section 3.7 for more information.

1031	   RFC 3066 tags that were deprecated prior to the adoption of this
1032	   document are part of the list of grandfathered tags and their
1033	   component subtags were not included as registered variants (although
1034	   they remain eligible for registration).  For example, the tag "art-
1035	   lojban" was deprecated in favor of the language subtag 'jbo'.

1037	   The Language Subtag Reviewer MUST ensure that new subtags meet the
1038	   requirements in Section 4.1 or submit an appropriate alternate subtag
1039	   as described in that section.  When either a change or addition to
1040	   the registry is needed, the Language Subtag Reviewer MUST prepare the
1041	   complete record, including all fields, and forward it to IANA for
1042	   insertion into the registry.

1044	   If the record represents a new subtag that does not currently exist in
1045	   the registry, then the message's subject line MUST include the word
1046	   "INSERT".  If the record represents a change to an existing subtag,
1047	   then the subject line of the message MUST include the word "MODIFY".
1048	   The message MUST contain both the record for the subtag being
1049	   inserted or modified and the new File-Date record.  Here is an
1050	   example of what the body of the message might contain:

1052	   LANGUAGE SUBTAG MODIFICATION
1053	   File-Date: 2005-01-02
1054	   %%
1055	   Type: variant
1056	   Subtag: nedis
1057	   Description: Natisone dialect
1058	   Description: Nadiza dialect
1059	   Added: 2003-10-09
1060	   Prefix: sl
1061	   Comments: This is a comment shown
1062	     as an example.
1063	   %%

1065	                                 Figure 4

1067	   Whenever an entry is created or modified in the registry, the 'File-
1068	   Date' record at the start of the registry is updated to reflect the
1069	   most recent modification date in the [RFC3339] "full-date" format.

1071	   Values in the 'Subtag' field MUST be lowercase except as provided for
1072	   in Section 3.1.

1074	3.3  Stability of IANA Registry Entries

1076	   The stability of entries and their meaning in the registry is
1077	   critical to the long term stability of language tags.  The rules in
1078	   this section guarantee that a specific language tag's meaning is
1079	   stable over time and will not change.

1081	   These rules specifically deal with how changes to codes (including
1082	   withdrawal and deprecation of codes) maintained by ISO 639, ISO
1083	   15924, ISO 3166, and UN M.49 are reflected in the IANA Language
1084	   Subtag Registry.  Assignments to the IANA Language Subtag Registry
1085	   MUST follow the following stability rules:

1087	   o  Values in the fields 'Type', 'Subtag', 'Tag', 'Added',
1088	      'Deprecated' and 'Preferred-Value' MUST NOT be changed and are
1089	      guaranteed to be stable over time.

1091	   o  Values in the 'Description' field MUST NOT be changed in a way
1092	      that would invalidate previously-existing tags.  They MAY be
1093	      broadened somewhat in scope, changed to add information, or
1094	      adapted to the most common modern usage.  For example, countries
1095	      occasionally change their official names: an historical example of
1096	      this would be "Upper Volta" changing to "Burkina Faso".

1098	   o  Values in the field 'Prefix' MAY be added to records of type
1099	      'variant' via the registration process.

1101	   o  Values in the field 'Prefix' MAY be modified, so long as the
1102	      modifications broaden the set of prefixes.  That is, a prefix MAY
1103	      be replaced by one of its own prefixes.  For example, the prefix
1104	      "en-US" could be replaced by "en", but not by the prefixes "en-
1105	      Latn", "fr", or "en-US-boont".  If one of those prefixes were
1106	      needed, a new Prefix SHOULD be registered.

1108	   o  Values in the field 'Prefix' MUST NOT be removed.

1110	   o  The field 'Comments' MAY be added, changed, modified, or removed
1111	      via the registration process or any of the processes or
1112	      considerations described in this section.

1114	   o  The field 'Suppress-Script' MAY be added or removed via the
1115	      registration process.

1117	   o  Codes assigned by ISO 639, ISO 15924, and ISO 3166 that do not
1118	      conflict with existing subtags of the associated type and whose
1119	      meaning is not the same as an existing subtag of the same type are
1120	      entered into the IANA registry as new records.

1122	   o  Codes assigned by ISO 639, ISO 15924, or ISO 3166 that are
1123	      withdrawn by their respective maintenance or registration
1124	      authority remain valid in language tags.  A 'Deprecated' field
1125	      containing the date of withdrawal is added to the record.  If a
1126	      new record of the same type is added that represents a replacement
1127	      value, then a 'Preferred-Value' field MAY also be added.  The
1128	      registration process MAY be used to add comments about the
1129	      withdrawal of the code by the respective standard.

1131	      *  The region code 'TL' was assigned to the country 'Timor-Leste',
1132	         replacing the code 'TP' (which was assigned to 'East Timor'
1133	         when it was under administration by Portugal).  The subtag 'TP'
1134	         remains valid in language tags, but its record contains a
1135	         'Preferred-Value' of 'TL' and its field 'Deprecated' contains
1136	         the date the new code was assigned ('2004-07-06').

1138	   o  Codes assigned by ISO 639, ISO 15924, or ISO 3166 that conflict
1139	      with existing subtags of the associated type, including subtags
1140	      that are deprecated, MUST NOT be entered into the registry.  The
1141	      following additional considerations apply to subtag values that
1142	      are reassigned:

1144	      *  For ISO 639 codes, if the newly assigned code's meaning is not
1145	         represented by a subtag in the IANA registry, the Language
1146	         Subtag Reviewer, as described in Section 3.4, SHALL prepare a
1147	         proposal for entering in the IANA registry as soon as practical
1148	         a registered language subtag as an alternate value for the new
1149	         code.  The form of the registered language subtag will be at
1150	         the discretion of the Language Subtag Reviewer and MUST conform
1151	         to other restrictions on language subtags in this document.

1153	      *  For all subtags whose meaning is derived from an external
1154	         standard (i.e., ISO 639, ISO 15924, ISO 3166, or UN M.49), if a
1155	         new meaning is assigned to an existing code and the new meaning
1156	         broadens the meaning of that code, then the meaning for the
1157	         associated subtag MAY be changed to match.  The meaning of a
1158	         subtag MUST NOT be narrowed, however, as this can result in an
1159	         unknown proportion of the existing uses of a subtag becoming
1160	         invalid.  Note: ISO 639 MA/RA has adopted a similar stability
1161	         policy.

1163	      *  For ISO 15924 codes, if the newly assigned code's meaning is
1164	         not represented by a subtag in the IANA registry, the Language
1165	         Subtag Reviewer, as described in Section 3.4, SHALL prepare a
1166	         proposal for entering in the IANA registry as soon as practical
1167	         a registered variant subtag as an alternate value for the new
1168	         code.  The form of the registered variant subtag will be at the
1169	         discretion of the Language Subtag Reviewer and MUST conform to
1170	         other restrictions on variant subtags in this document.

1172	      *  For ISO 3166 codes, if the newly assigned code's meaning is
1173	         associated with the same UN M.49 code as another 'region'
1174	         subtag, then the existing region subtag remains as the
1175	         preferred value for that region and no new entry is created.  A
1176	         comment MAY be added to the existing region subtag indicating
1177	         the relationship to the new ISO 3166 code.

1179	      *  For ISO 3166 codes, if the newly assigned code's meaning is
1180	         associated with a UN M.49 code that is not represented by an
1181	         existing region subtag, then the Language Subtag Reviewer, as
1182	         described in Section 3.4, SHALL prepare a proposal for entering
1183	         the appropriate UN M.49 country code as an entry in the IANA
1184	         registry.

1186	      *  Codes assigned by UN M.49 to countries or areas (as opposed to
1187	         geographical regions and sub-regions) for which there is no
1188	         corresponding ISO 3166 code MUST NOT be registered, except
1189	         under the previous provision.  If it is necessary to identify a
1190	         region for which only a UN M.49 code exists in language tags,
1191	         then the registration authority for ISO 3166 SHOULD be
1192	         petitioned to assign a code, which can then be registered for
1193	         use in language tags.  At the time this document was written,
1194	         there were only four such codes: 830 (Channel Islands), 831
1195	         (Guernsey), 832 (Jersey), and 833 (Isle of Man).  This rule
1196	         exists so that UN M.49 codes remain available as the value of
1197	         last resort in cases where ISO 3166 reassigns a deprecated
1198	         value in the registry.

1200	      *  For ISO 3166 codes, if there is no associated UN numeric code,
1201	         then the Language Subtag Reviewer SHALL petition the UN to
1202	         create one.  If there is no response from the UN within ninety
1203	         days of the request being sent, the Language Subtag Reviewer
1204	         SHALL prepare a proposal for entering in the IANA registry as
1205	         soon as practical a registered variant subtag as an alternate
1206	         value for the new code.  The form of the registered variant
1207	         subtag will be at the discretion of the Language Subtag
1208	         Reviewer and MUST conform to other restrictions on variant
1209	         subtags in this document.  This situation is very unlikely to
1210	         ever occur.

1212	   o  Stability provisions apply to grandfathered tags with this
1213	      exception: should all of the subtags in a grandfathered tag become
1214	      valid subtags in the IANA registry, then the field 'Type' in that
1215	      record is changed from 'grandfathered' to 'redundant'.  Note that
1216	      this will not affect language tags that match the grandfathered
1217	      tag, since these tags will now match valid generative subtag
1218	      sequences.  For example, if the subtag 'gan' in the language tag
1219	      "zh-gan" were to be registered as an extended language subtag,
1220	      then the grandfathered tag "zh-gan" would be deprecated (but
1221	      existing content or implementations that use "zh-gan" would remain
1222	      valid).
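
   The 'Prefix' modification rule in the list above (a prefix MAY be
   replaced by one of its own prefixes) can be illustrated with a short
   sketch.  This is only an illustration, not part of the registration
   procedure: the function name is hypothetical, the comparison is made
   subtag by subtag ignoring case, and a replacement identical to the
   original is assumed not to count as a broadening.

```python
def is_broadening(old_prefix: str, new_prefix: str) -> bool:
    """Return True if new_prefix is a proper, subtag-wise prefix of old_prefix."""
    old = old_prefix.lower().split("-")
    new = new_prefix.lower().split("-")
    # The replacement must be shorter and must equal the leading subtags.
    return len(new) < len(old) and old[:len(new)] == new


# Candidates taken from the example in the text, checked against "en-US".
for candidate in ("en", "en-Latn", "fr", "en-US-boont"):
    print(candidate, is_broadening("en-US", candidate))
```

   Under these assumptions only "en" is accepted as a replacement for
   "en-US"; the other candidates would need a new Prefix registration.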

1224	3.4  Registration Procedure for Subtags

1226	   The procedure given here MUST be used by anyone who wants to use a
1227	   subtag not currently in the IANA Language Subtag Registry.

1229	   Only subtags of type 'language' and 'variant' will be considered for
1230	   independent registration of new subtags.  Handling of subtags needed
1231	   for stability and subtags necessary to keep the registry synchronized
1232	   with ISO 639, ISO 15924, ISO 3166, and UN M.49 within the limits
1233	   defined by this document is described in Section 3.2.  Stability
1234	   provisions are described in Section 3.3.

1236	   This procedure MAY also be used to register or alter the information
1237	   for the "Description", "Comments", "Deprecated", or "Prefix" fields
1238	   in a subtag's record as described in Section 3.3.  Changes to all
1239	   other fields in the IANA registry are NOT permitted.

1241	   Registering a new subtag or requesting modifications to an existing
1242	   tag or subtag starts with the requester filling out the registration
1243	   form reproduced below.  Note that each response is not limited in
1244	   size so that the request can adequately describe the registration.
1245	   The fields in the "Record Requested" section SHOULD follow the
1246	   requirements in Section 3.1.

1248	   LANGUAGE SUBTAG REGISTRATION FORM
1249	   1. Name of requester:
1250	   2. E-mail address of requester:
1251	   3. Record Requested:

1253	   Type:
1254	   Subtag:
1255	   Description:
1256	   Prefix:
1257	   Preferred-Value:
1258	   Deprecated:
1259	   Suppress-Script:
1260	   Comments:

1262	   4. Intended meaning of the subtag:
1263	   5. Reference to published description
1264	   of the language (book or article):
1265	   6. Any other relevant information:

1267	                                 Figure 5

1269	   The subtag registration form MUST be sent to
1270	   <ietf-languages@iana.org> for a two week review period before it can
1271	   be submitted to IANA.  (This is an open list and can be joined by
1272	   sending a request to <ietf-languages-request@iana.org>.)

1274	   Variant and extlang subtags are always registered for use with a
1275	   particular range of language tags.  For example, the subtag 'rozaj'
1276	   is intended for use with language tags that start with the primary
1277	   language subtag "sl", since Resian is a dialect of Slovenian.  Thus
1278	   the subtag 'rozaj' could be included in tags such as "sl-Latn-rozaj"
1279	   or "sl-IT-rozaj".  This information is stored in the "Prefix" field
1280	   in the registry.  Variant registration requests are REQUIRED to
1281	   include at least one "Prefix" field in the registration form.
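
   As an informal illustration of how the "Prefix" field guides usage, the
   following sketch checks whether a tag that contains a variant subtag
   begins with one of that variant's registered prefixes.  The function
   name and the way prefixes are passed in are assumptions made for this
   example; a real implementation would read the prefixes from the
   registry.

```python
from typing import Sequence


def uses_registered_prefix(tag: str, variant: str, prefixes: Sequence[str]) -> bool:
    """Return True if `tag` contains `variant` and starts with a listed prefix."""
    subtags = tag.lower().split("-")
    # The variant can never be the primary language subtag itself.
    if variant.lower() not in subtags[1:]:
        return False
    for prefix in prefixes:
        wanted = prefix.lower().split("-")
        if subtags[:len(wanted)] == wanted:
            return True
    return False


# 'rozaj' is registered for use with tags that start with "sl".
for tag in ("sl-Latn-rozaj", "sl-IT-rozaj", "de-rozaj"):
    print(tag, uses_registered_prefix(tag, "rozaj", ["sl"]))
```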

1283	   The 'Prefix' field for a given registered subtag will be maintained
1284	   in the IANA registry as a guide to usage.  Additional prefixes MAY be
1285	   added by filing an additional registration form.  In that form, the
1286	   "Any other relevant information:" field MUST indicate that it is the
1287	   addition of a prefix.

1289	   Requests to add a prefix to a variant subtag that imply a different
1290	   semantic meaning will probably be rejected.  For example, a request
1291	   to add the prefix "de" to the subtag 'nedis' so that the tag "de-
1292	   nedis" represented some German dialect would be rejected.  The
1293	   'nedis' subtag represents a particular Slovenian dialect and the
1294	   additional registration would change the semantic meaning assigned to
1295	   the subtag.  A separate subtag SHOULD be proposed instead.

1297	   The 'Description' field MUST contain a description of the tag being
1298	   registered written or transcribed into the Latin script; it MAY also
1299	   include a description in a non-Latin script.  Non-ASCII characters
1300	   MUST be escaped using the syntax described in Section 3.1.  The
1301	   'Description' field is used for identification purposes; it does not
1302	   necessarily represent the actual native name of the language or
1303	   variation, nor does it need to be in any particular language.

1305	   While the 'Description' field itself is not guaranteed to be stable
1306	   and errata corrections MAY be undertaken from time to time, attempts
1307	   to provide translations or transcriptions of entries in the registry
1308	   itself will probably be frowned upon by the community or rejected
1309	   outright, as changes of this nature have an impact on the provisions
1310	   in Section 3.3.

1312	   The Language Subtag Reviewer is responsible for responding to
1313	   requests for the registration of subtags through the registration
1314	   process and is appointed by the IESG.

1316	   When the two week period has passed, the Language Subtag Reviewer
1317	   either forwards the record to be inserted or modified to
1318	   iana@iana.org according to the procedure described in Section 3.2, or
1319	   rejects the request because of significant objections raised on the
1320	   list or due to problems with constraints in this document (which MUST
1321	   be explicitly cited).  The reviewer MAY also extend the review period
1322	   in two week increments to permit further discussion.  The reviewer
1323	   MUST indicate on the list whether the registration has been accepted,
1324	   rejected, or extended following each two week period.

1326	   Note that the reviewer can raise objections on the list if he or she
1327	   so desires.  The important thing is that the objection MUST be made
1328	   publicly.

1330	   The applicant is free to modify a rejected application with
1331	   additional information and submit it again; this restarts the two
1332	   week comment period.

1334	   Decisions made by the reviewer MAY be appealed to the IESG [RFC2028]
1335	   under the same rules as other IETF decisions [RFC2026].

1337	   All approved registration forms are available online in the directory
1338	   http://www.iana.org/numbers.html under "languages".

1340	   Updates or changes to existing records follow the same procedure as
1341	   new registrations.  The Language Subtag Reviewer decides whether
1342	   there is consensus to update the registration following the two week
1343	   review period; normally objections by the original registrant will
1344	   carry extra weight in forming such a consensus.

1346	   Registrations are permanent and stable.  Once registered, subtags
1347	   will not be removed from the registry and will remain a valid way in
1348	   which to specify a specific language or variant.

1350	   Note: The "Description" in the registration form is intended as an
1351	   aid to people trying to verify whether a language is registered or
1352	   what language or language variation a particular subtag
1353	   refers to.  In most cases, reference to an authoritative grammar or
1354	   dictionary of that language will be useful; in cases where no such
1355	   work exists, other well known works describing that language or in
1356	   that language MAY be appropriate.  The subtag reviewer decides what
1357	   constitutes "good enough" reference material.  This requirement is
1358	   not intended to exclude particular languages or dialects due to the
1359	   size of the speaker population or lack of a standardized orthography.
1360	   Minority languages will be considered equally on their own merits.

1362	3.5  Possibilities for Registration

1364	   Possibilities for registration of subtags or information about
1365	   subtags include:

1367	   o  Primary language subtags for languages not listed in ISO 639 that
1368	      are not variants of any listed or registered language can be
1369	      registered.  At the time this document was created there were no
1370	      examples of this form of subtag.  Before attempting to register a
1371	      language subtag, there MUST be an attempt to register the language
1372	      with ISO 639.  No language subtags will be registered for codes
1373	      that exist in ISO 639-1 or ISO 639-2, which are under
1374	      consideration by the ISO 639 maintenance or registration
1375	      authorities, or which have never been attempted for registration
1376	      with those authorities.  If ISO 639 has previously rejected a
1377	      language for registration, it is reasonable to assume that there
1378	      must be additional very compelling evidence of need before it will
1379	      be registered in the IANA registry (to the extent that it is very
1380	      unlikely that any subtags will be registered of this type).

1382	   o  Dialect or other divisions or variations within a language, its
1383	      orthography, writing system, regional or historical usage,
1384	      transliteration or other transformation, or distinguishing
1385	      variation MAY be registered as variant subtags.  An example is the
1386	      'rozaj' subtag (the Resian dialect of Slovenian).

1388	   o  The addition or maintenance of fields (generally of an
1389	      informational nature) in Tag or Subtag records as described in
1390	      Section 3.1 and subject to the stability provisions in
1391	      Section 3.3.  This includes descriptions; comments; deprecation
1392	      and preferred values for obsolete or withdrawn codes; or the
1393	      addition of script or extlang information to primary language
1394	      subtags.

1396	   o  The addition of records and related field value changes necessary
1397	      to reflect assignments made by ISO 639, ISO 15924, ISO 3166, and
1398	      UN M.49 as described in Section 3.3.

1400	   This document leaves the decision on what subtags or changes to
1401	   subtags are appropriate (or not) to the registration process
1402	   described in Section 3.4.

1404	   Note: four character primary language subtags are reserved to allow
1405	   for the possibility of alpha4 codes in some future addition to the
1406	   ISO 639 family of standards.

1408	   ISO 639 defines a maintenance agency for additions to and changes in
1409	   the list of languages in ISO 639.  This agency is:

1411	   International Information Centre for Terminology (Infoterm)
1412	   Aichholzgasse 6/12, AT-1120
1413	   Wien, Austria
1414	   Phone: +43 1 26 75 35 Ext. 312 Fax: +43 1 216 32 72

1416	   ISO 639-2 defines a maintenance agency for additions to and changes
1417	   in the list of languages in ISO 639-2.  This agency is:

1419	   Library of Congress
1420	   Network Development and MARC Standards Office
1421	   Washington, D.C. 20540 USA
1422	   Phone: +1 202 707 6237  Fax: +1 202 707 0115
1423	   URL: http://www.loc.gov/standards/iso639

1425	   The maintenance agency for ISO 3166 (country codes) is:

1427	   ISO 3166 Maintenance Agency
1428	   c/o International Organization for Standardization
1429	   Case postale 56
1430	   CH-1211 Geneva 20 Switzerland
1431	   Phone: +41 22 749 72 33  Fax: +41 22 749 73 49
1432	   URL: http://www.iso.org/iso/en/prods-services/iso3166ma/index.html

1434	   The registration authority for ISO 15924 (script codes) is:

1436	   Unicode Consortium Box 391476
1437	   Mountain View, CA 94039-1476, USA
1438	   URL: http://www.unicode.org/iso15924

1440	   The Statistics Division of the United Nations Secretariat maintains
1441	   the Standard Country or Area Codes for Statistical Use and can be
1442	   reached at:

1444	   Statistical Services Branch
1445	   Statistics Division
1446	   United Nations, Room DC2-1620
1447	   New York, NY 10017, USA

1449	   Fax: +1-212-963-0623
1450	   E-mail: statistics@un.org
1451	   URL: http://unstats.un.org/unsd/methods/m49/m49alpha.htm

1453	3.6  Extensions and Extensions Namespace

1455	   Extension subtags are those introduced by single-letter subtags other
1456	   than 'x'.  They are reserved for the generation of identifiers which
1457	   contain a language component, and are compatible with applications
1458	   that understand language tags.  For example, they might be used to
1459	   define locale identifiers, which are generally based on language.

1461	   The structure and form of extensions are defined by this document so
1462	   that implementations can be created that are forward compatible with
1463	   applications that might be created using single-letter subtags in the
1464	   future.  In addition, defining a mechanism for maintaining
1465	   single-letter subtags will contribute to the stability of this
1466	   document by reducing the likely need for future revisions or updates.

1468	   Allocation of a single-letter subtag SHALL take the form of an RFC
1469	   defining the name, purpose, processes, and procedures for maintaining
1470	   the subtags.  The maintaining or registering authority, including
1471	   name, contact email, discussion list email, and URL location of the
1472	   registry MUST be indicated clearly in the RFC.  The RFC MUST specify
1473	   or include each of the following:

1475	   o  The specification MUST reference the specific version or revision
1476	      of this document that governs its creation and MUST reference this
1477	      section of this document.

1479	   o  The specification and all subtags defined by the specification
1480	      MUST follow the ABNF and other rules for the formation of tags and
1481	      subtags as defined in this document.  In particular it MUST
1482	      specify that case is not significant and that subtags MUST NOT
1483	      exceed eight characters in length.

1485	   o  The specification MUST specify a canonical representation.

1487	   o  The specification of valid subtags MUST be available over the
1488	      Internet and at no cost.

1490	   o  The specification MUST be in the public domain or available via a
1491	      royalty-free license acceptable to the IETF and specified in the
1492	      RFC.

1494	   o  The specification MUST be versioned and each version of the
1495	      specification MUST be numbered, dated, and stable.

1497	   o  The specification MUST be stable.  That is, extension subtags,
1498	      once defined by a specification, MUST NOT be retracted or change
1499	      in meaning in any substantial way.

1501	   o  The specification MUST include in a separate section the
1502	      registration form reproduced in this section (below) to be used in
1503	      registering the extension upon publication as an RFC.

1505	   o  IANA MUST be informed of changes to the contact information and
1506	      URL for the specification.

1508	   IANA will maintain a registry of allocated single-letter (singleton)
1509	   subtags.  This registry will use the record-jar format described by
1510	   the ABNF in Section 3.1.  Upon publication of an extension as an RFC,
1511	   the maintaining authority defined in the RFC MUST forward this
1512	   registration form to iesg@ietf.org, who will forward the request to
1513	   iana@iana.org.  The maintaining authority of the extension MUST
1514	   maintain the accuracy of the record by sending an updated full copy
1515	   of the record to iana@iana.org with the subject line "LANGUAGE TAG
1516	   EXTENSION UPDATE" whenever content changes.  Only the 'Comments',
1517	   'Contact_Email', 'Mailing_List', and 'URL' fields MAY be modified in
1518	   these updates.

1520	   Failure to maintain this record, the corresponding registry, or meet
1521	   other conditions imposed by this section of this document MAY be
1522	   appealed to the IESG [RFC2028] under the same rules as other IETF
1523	   decisions (see [RFC2026]) and MAY result in the authority to maintain
1524	   the extension being withdrawn or reassigned by the IESG.

1526	   %%
1527	   Identifier:
1528	   Description:
1529	   Comments:
1530	   Added:
1531	   RFC:
1532	   Authority:
1533	   Contact_Email:
1534	   Mailing_List:
1535	   URL:
1536	   %%

1538	    Figure 6: Format of Records in the Language Tag Extensions Registry

1540	   'Identifier' contains the single letter subtag (singleton) assigned
1541	   to the extension.  The Internet-Draft submitted to define the
1542	   extension SHOULD specify which letter to use, although the IESG MAY
1543	   change the assignment when approving the RFC.

1545	   'Description' contains the name and description of the extension.

1547	   'Comments' is an OPTIONAL field and MAY contain a broader description
1548	   of the extension.

1550	   'Added' contains the date the RFC was published in the "full-date"
1551	   format specified in [RFC3339].  For example: 2004-06-28 represents
1552	   June 28, 2004, in the Gregorian calendar.
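
   A minimal sketch of validating an 'Added' value follows.  It assumes
   that only the plain YYYY-MM-DD form of "full-date" is acceptable; the
   function name is hypothetical.

```python
import re
from datetime import date

# Four-digit year, two-digit month, two-digit day, ASCII digits only.
FULL_DATE = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")


def parse_added(value: str) -> date:
    """Parse an 'Added' field value, rejecting anything but YYYY-MM-DD."""
    match = FULL_DATE.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"not a full-date: {value!r}")
    year, month, day = (int(part) for part in match.groups())
    # date() itself raises ValueError for impossible dates such as 2004-02-30.
    return date(year, month, day)


print(parse_added("2004-06-28"))
```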

1554	   'RFC' contains the RFC number assigned to the extension.

1556	   'Authority' contains the name of the maintaining authority for the
1557	   extension.

1559	   'Contact_Email' contains the email address used to contact the
1560	   maintaining authority.

1562	   'Mailing_List' contains the URL or subscription email address of the
1563	   mailing list used by the maintaining authority.

1565	   'URL' contains the URL of the registry for this extension.

1567	   The determination of whether an Internet-Draft meets the above
1568	   conditions and the decision to grant or withhold such authority rests
1569	   solely with the IESG, and is subject to the normal review and appeals
1570	   process associated with the RFC process.

1572	   Extension authors are strongly cautioned that many (including most
1573	   well-formed) processors will be unaware of any special relationships
1574	   or meaning inherent in the order of extension subtags.  Extension
1575	   authors SHOULD avoid subtag relationships or canonicalization
1576	   mechanisms that interfere with matching or with length restrictions
1577	   that sometimes exist in common protocols where the extension is used.
1578	   In particular, applications MAY truncate the subtags in doing
1579	   matching or in fitting into limited lengths, so it is RECOMMENDED
1580	   that the most significant information be in the most significant
1581	   (left-most) subtags, and that the specification gracefully handle
1582	   truncated subtags.
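
   The kind of truncation that extension authors need to anticipate can be
   sketched as follows.  This is one plausible strategy, not a procedure
   defined by this document: whole subtags are removed from the right
   until the tag fits, and (as an assumption of this sketch) a trailing
   single-character subtag is removed as well, since it would introduce
   an extension with nothing following it.

```python
def truncate_tag(tag: str, max_length: int) -> str:
    """Drop right-most subtags until the tag fits in max_length characters.

    The primary language subtag is never removed, so the result can still
    exceed max_length if that subtag alone is too long.
    """
    subtags = tag.split("-")
    while len(subtags) > 1 and len("-".join(subtags)) > max_length:
        subtags.pop()
        # Assumption: do not leave a dangling single-character subtag.
        while len(subtags) > 1 and len(subtags[-1]) == 1:
            subtags.pop()
    return "-".join(subtags)


print(truncate_tag("zh-Hant-TW", 8))
```

   Because the left-most subtags survive longest, placing the most
   significant information there keeps a truncated tag useful.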

1584	   When a language tag is to be used in a specific, known, protocol, it
1585	   is RECOMMENDED that the language tag not contain extensions not
1586	   supported by that protocol.  In addition, note that some protocols
1587	   MAY impose upper limits on the length of the strings used to store or
1588	   transport the language tag.

1590	3.7  Initialization of the Registry

1592	   Adoption of this document will REQUIRE an initial version of the
1593	   registry containing the various subtags initially valid in a language
1594	   tag.  This collection of subtags, along with a description of the
1595	   process used to create it, is described by [initial-registry].

1597	   Registrations that are in process under the rules defined in
1598	   [RFC3066] when this document is adopted MAY be completed under the
1599	   former rules, at the discretion of the language tag reviewer.  Any
1600	   new registrations submitted under the former rules after the adoption
1601	   of this document MUST be rejected.

1603	4.  Formation and Processing of Language Tags

1605	   This section addresses how to use the information in the registry
1606	   with the tag syntax to choose, form and process language tags.

1608	4.1  Choice of Language Tag

1610	   One is sometimes faced with the choice between several possible tags
1611	   for the same body of text.

1613	   Interoperability is best served when all users use the same language
1614	   tag in order to represent the same language.  If an application has
1615	   requirements that make the rules here inapplicable, then that
1616	   application risks damaging interoperability.  It is strongly
1617	   RECOMMENDED that users not define their own rules for language tag
1618	   choice.

1620	   Subtags SHOULD only be used where they add useful distinguishing
1621	   information; extraneous subtags interfere with the meaning,
1622	   understanding, and processing of language tags.  In particular, users
1623	   and implementations SHOULD follow the 'Prefix' and 'Suppress-Script'
1624	   fields in the registry (defined in Section 3.1): these fields provide
1625	   guidance on when specific additional subtags SHOULD (and SHOULD NOT)
1626	   be used in a language tag.

1628	   Of particular note, many applications can benefit from the use of
1629	   script subtags in language tags, as long as the use is consistent for
1630	   a given context.  Script subtags were not formally defined in RFC
1631	   3066 and their use can affect matching and subtag identification by
1632	   implementations of RFC 3066, as these subtags appear between the
1633	   primary language and region subtags.  For example, if a user requests
1634	   content in an implementation of Section 2.5 of [RFC3066] using the
1635	   language range "en-US", content labeled "en-Latn-US" will not match
1636	   the request.  Therefore it is important to know when script subtags
1637	   will customarily be used and when they ought not be used.  In the
1638	   registry, the Suppress-Script field helps ensure greater
1639	   compatibility between the language tags generated according to the
1640	   rules in this document and language tags and tag processors or
1641	   consumers based on RFC 3066 by defining when users SHOULD NOT include
1642	   a script subtag with a particular primary language subtag.
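
   The matching failure described above can be reproduced with a small
   sketch of prefix-style matching as described in Section 2.5 of
   [RFC3066]: a language range matches a tag if the two are equal, or if
   the range is a prefix of the tag and the next character is "-".  The
   function name is hypothetical and the wildcard range is not handled.

```python
def rfc3066_matches(language_range: str, tag: str) -> bool:
    """Return True if `language_range` matches `tag` by simple prefix matching."""
    wanted = language_range.lower()
    candidate = tag.lower()
    return candidate == wanted or candidate.startswith(wanted + "-")


for tag in ("en-US", "en-Latn-US"):
    print(tag, rfc3066_matches("en-US", tag))
```

   The script subtag sits between the language and region subtags, so
   the range "en-US" is no longer a prefix of "en-Latn-US", although the
   shorter range "en" still matches both tags.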

1644	   Extended language subtags (type 'extlang' in the registry, see
1645	   Section 3.1) also appear between the primary language and region
1646	   subtags and are reserved for future standardization.  Applications
1647	   might benefit from their judicious use in forming language tags in
1648	   the future.  Similar recommendations are expected to apply to their
1649	   use as apply to script subtags.

1651	   Standards, protocols and applications that reference this document
1652	   normatively but apply different rules to the ones given in this
1653	   section MUST specify how the procedure varies from the one given
1654	   here.

1656	   The choice of subtags used to form a language tag SHOULD be guided by
1657	   the following rules:

1659	   1.  Use as precise a tag as possible, but no more specific than is
1660	       justified.  Avoid using subtags that are not important for
1661	       distinguishing content in an application.

1663	       *  For example, 'de' might suffice for tagging an email written
1664	          in German, while "de-CH-1996" is probably unnecessarily
1665	          precise for such a task.

1667	   2.  The script subtag SHOULD NOT be used to form language tags unless
1668	       the script adds some distinguishing information to the tag.  The
1669	       field 'Suppress-Script' in the primary language record in the
1670	       registry indicates which script subtags do not add distinguishing
1671	       information for most applications.

1673	       *  For example, the subtag 'Latn' should not be used with the
1674	          primary language 'en' because nearly all English documents are
1675	          written in the Latin script and it adds no distinguishing
1676	          information.  However, if a document were written in English
1677	          mixing Latin script with another script such as Braille
1678	          ('Brai'), then it might be appropriate to choose to indicate
1679	          both scripts to aid in content selection, such as the
1680	          application of a stylesheet.

1682	   3.  If a tag or subtag has a 'Preferred-Value' field in its registry
1683	       entry, then the value of that field SHOULD be used to form the
1684	       language tag in preference to the tag or subtag in which the
1685	       preferred value appears.

1687	       *  For example, use 'he' for Hebrew in preference to 'iw'.

1689	   4.  The 'und' (Undetermined) primary language subtag SHOULD NOT be
1690	       used to label content, even if the language is unknown.  Omitting
1691	       the language tag altogether is preferred to using a tag with a
1692	       primary language subtag of 'und'.  The 'und' subtag MAY be useful
1693	       for protocols that require a language tag to be provided.  The
1694	       'und' subtag MAY also be useful when matching language tags in
1695	       certain situations.

1697	   5.  The 'mul' (Multiple) primary language subtag SHOULD NOT be used
1698	       whenever the protocol allows separate tags for multiple
1699	       languages, as is the case for the Content-Language header in
1700	       HTTP.  The 'mul' subtag conveys little useful information:
1701	       content in multiple languages SHOULD individually tag the
1702	       languages where they appear or otherwise indicate the actual
1703	       language in preference to the 'mul' subtag.

1705	   6.  The same variant subtag SHOULD NOT be used more than once within
1706	       a language tag.

1708	       *  For example, do not use "de-DE-1901-1901".
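
   Rules 2, 3, and 6 above lend themselves to a mechanical check.  The
   sketch below is illustrative only.  The two tables are hypothetical
   excerpts holding just the examples from the text, not registry data;
   the tag is assumed to have the simple form
   language[-script][-region][-variant...]; and removing a subtag
   automatically is a choice made for this sketch, since the rules
   themselves are SHOULD-level guidance.

```python
# Hypothetical excerpts of registry fields, limited to the examples above.
PREFERRED_VALUE = {"iw": "he"}    # rule 3: 'iw' has Preferred-Value 'he'
SUPPRESS_SCRIPT = {"en": "Latn"}  # rule 2: 'Latn' adds nothing to 'en'


def tidy_tag(tag: str) -> str:
    """Apply rules 2, 3, and 6 to a simple language tag."""
    subtags = tag.split("-")
    # Rule 3: prefer the registered Preferred-Value of the primary language.
    language = PREFERRED_VALUE.get(subtags[0].lower(), subtags[0])
    rest = subtags[1:]
    # Rule 2: drop a script subtag that the registry marks as suppressed.
    suppressed = SUPPRESS_SCRIPT.get(language.lower())
    if rest and suppressed and rest[0].lower() == suppressed.lower():
        rest = rest[1:]
    # Rule 6: keep only the first occurrence of a repeated subtag.
    seen = set()
    unique = []
    for subtag in rest:
        if subtag.lower() not in seen:
            seen.add(subtag.lower())
            unique.append(subtag)
    return "-".join([language] + unique)


for tag in ("iw", "en-Latn-US", "de-DE-1901-1901"):
    print(tag, "->", tidy_tag(tag))
```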

1710	   To ensure consistent backward compatibility, this document contains
1711	   several provisions to account for potential instability in the
1712	   standards used to define the subtags that make up language tags.
1713	   These provisions mean that no language tag created under the rules in
1714	   this document will become obsolete.

1716	4.2  Meaning of the Language Tag

1718	   The relationship between the tag and the information it relates to is
1719	   defined by the context in which the tag appears.  Accordingly,
1720	   this section can only give possible examples of its usage.

1722	   o  For a single information object, the associated language tags
1723	      might be interpreted as the set of languages that is necessary for
1724	      a complete comprehension of the complete object.  Example: Plain
1725	      text documents.

1727	   o  For an aggregation of information objects, the associated language
1728	      tags could be taken as the set of languages used inside components
1729	      of that aggregation.  Examples: Document stores and libraries.

1731	   o  For information objects whose purpose is to provide alternatives,
1732	      the associated language tags could be regarded as a hint that the
1733	      content is provided in several languages, and that one has to
1734	      inspect each of the alternatives in order to find its language or
1735	      languages.  In this case, the presence of multiple tags might not
1736	      mean that one needs to be multi-lingual to get complete
1737	      understanding of the document.  Example: MIME multipart/
1738	      alternative.

1740	   o  In markup languages, such as HTML and XML, language information
1741	      can be added to each part of the document identified by the markup
1742	      structure (including the whole document itself).  For example, one
1743	      could write <span lang="fr">C'est la vie.</span> inside a
1744	      Norwegian document; the Norwegian-speaking user could then access
1745	      a French-Norwegian dictionary to find out what the marked section
1746	      meant.  If the user were listening to that document through a
1747	      speech synthesis interface, this information could be used to signal
1748	      the synthesizer to appropriately apply French text-to-speech
1749	      pronunciation rules to that span of text, instead of applying the
1750	      inappropriate Norwegian rules.

1752	   Language tags are related when they contain a similar sequence of
1753	   subtags.  For example, if a language tag B contains language tag A as
1754	   a prefix, then B is typically "narrower" or "more specific" than A.
1755	   Thus "zh-Hant-TW" is more specific than "zh-Hant".
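
   As an illustration only, the prefix relationship described above
   can be sketched in Python.  The name is_prefix_of is hypothetical
   and is not defined by this document.  The sketch assumes that the
   comparison is made subtag by subtag (so that "zh-Han" is not treated
   as a prefix of "zh-Hant") and that case is ignored.

```python
def is_prefix_of(shorter: str, longer: str) -> bool:
    """Return True if `shorter` is a subtag-wise prefix of `longer`."""
    # Compare whole subtags, not raw characters, and ignore case.
    a = shorter.lower().split("-")
    b = longer.lower().split("-")
    return len(a) <= len(b) and b[:len(a)] == a


print(is_prefix_of("zh-Hant", "zh-Hant-TW"))
print(is_prefix_of("zh-Han", "zh-Hant-TW"))
```

   A True result says only that one tag is typically narrower than the
   other; it makes no claim about mutual intelligibility.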

1757	   This relationship is not guaranteed in all cases: specifically,
1758	   languages that begin with the same sequence of subtags are NOT
1759	   guaranteed to be mutually intelligible, although they might be.  For
1760	   example, the tag "az" shares a prefix with both "az-Latn"
1761	   (Azerbaijani written using the Latin script) and "az-Cyrl"
1762	   (Azerbaijani written using the Cyrillic script).  A person fluent in
1763	   one script might not be able to read the other, even though the text
1764	   might be identical.  Content tagged as "az" most probably is written
1765	   in just one script and thus might not be intelligible to a reader
1766	   familiar with the other script.

1768	4.3  Length Considerations

1770	   [RFC3066] did not provide an upper limit on the size of language
1771	   tags.  While RFC 3066 did define the semantics of particular subtags
1772	   in such a way that most language tags consisted of language and
1773	   region subtags with a combined total length of up to six characters,
1774	   larger registered tags were not only possible but were actually
1775	   registered.

1777	   Neither the language tag syntax nor other requirements in this
1778	   document impose a fixed upper limit on the number of subtags in a
1779	   language tag (and thus an upper bound on the size of a tag).  The
1780	   language tag syntax suggests that, depending on the specific
1781	   language, more subtags (and thus a longer tag) are sometimes
1782	   necessary to completely identify the language for certain
1783	   applications; thus it is possible to envision long or complex subtag
1784	   sequences.

1786	4.3.1  Working with Limited Buffer Sizes

1788	   Some applications and protocols are forced to allocate fixed buffer
1789	   sizes or otherwise limit the length of a language tag.  A conformant
1790	   implementation or specification MAY refuse to support the storage of
1791	   language tags which exceed a specified length.  Any such limitation
1792	   SHOULD be clearly documented, and such documentation SHOULD include
1793	   what happens to longer tags (for example, whether an error value is
1794	   generated or the language tag is truncated).  A protocol that allows
1795	   tags to be truncated at an arbitrary limit, without giving any
1796	   indication of what that limit is, has the potential for causing harm
1797	   by changing the meaning of tags in substantial ways.

1799	   In practice, most language tags do not require more than a few
1800	   subtags and will not approach reasonably sized buffer limitations:
1801	   see Section 4.1.

1803	   Some specifications or protocols have limits on tag length but do not
1804	   have a fixed length limitation.  For example, [RFC2231] has no
1805	   explicit length limitation: the length available for the language tag
1806	   is constrained by the length of other header components (such as the
1807	   charset's name) coupled with the 76 character limit in [RFC2047].
1808	   Thus the "limit" might be 50 or more characters, but it could
1809	   potentially be quite small.

1811	   The considerations for assigning a buffer limit are:

1813	      Implementations SHOULD NOT truncate language tags unless the
1814	      meaning of the tag is purposefully being changed, or unless the
1815	      tag does not fit into a limited buffer size specified by a
1816	      protocol for storage or transmission.

1818	      Implementations SHOULD warn the user when a tag is truncated since
1819	      truncation changes the semantic meaning of the tag.

1821	      Implementations of protocols or specifications that are space
1822	      constrained but do not have a fixed limit SHOULD use the longest
1823	      possible tag in preference to truncation.

1825	      Protocols or specifications that specify limited buffer sizes for
1826	      language tags MUST allow for language tags of up to 33 characters.

1828	      Protocols or specifications that specify limited buffer sizes for
1829	      language tags SHOULD allow for language tags of at least 42
1830	      characters.

1832	   The following illustration shows how the 42-character recommendation
1833	   was derived.  The combination of language and extended language
1834	   subtags was chosen for future compatibility.  At up to 15 characters,
1835	   this combination is longer than the longest possible primary language
1836	   subtag (8 characters):

1838	   language      =  3 (ISO 639-2; ISO 639-1 requires 2)
1839	   extlang1      =  4 (each subsequent subtag includes '-')
1840	   extlang2      =  4 (unlikely: needs prefix="language-extlang1")
1841	   extlang3      =  4 (extremely unlikely)
1842	   script        =  5 (if not suppressed: see Section 4.1)
1843	   region        =  4 (UN M.49; ISO 3166 requires 3)
1844	   variant1      =  9 (MUST have language as a prefix)
1845	   variant2      =  9 (MUST have language-variant1 as a prefix)

1847	   total         = 42 characters

1849	              Figure 7: Derivation of the Limit on Tag Length
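
   The arithmetic in Figure 7 can be checked with a few lines of
   Python.  This is only a restatement of the figure; the dictionary
   name is hypothetical.

```python
# Lengths copied from Figure 7.  Every subtag after the first
# includes one character for its preceding '-'.
FIGURE_7_LENGTHS = {
    "language": 3,
    "extlang1": 4,
    "extlang2": 4,
    "extlang3": 4,
    "script": 5,
    "region": 4,
    "variant1": 9,
    "variant2": 9,
}

total = sum(FIGURE_7_LENGTHS.values())

# Language plus the three extended language subtags.
language_and_extlang = sum(
    FIGURE_7_LENGTHS[name]
    for name in ("language", "extlang1", "extlang2", "extlang3")
)

print(total)                 # 42
print(language_and_extlang)  # 15
```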

1851	4.3.2  Truncation of Language Tags

1853	   Truncation of a language tag alters the meaning of the tag, and thus
1854	   SHOULD be avoided.  However, truncation of language tags is sometimes
1855	   necessary due to limited buffer sizes.  Such truncation MUST NOT
1856	   permit a subtag to be chopped off in the middle or the formation of
1857	   invalid tags (for example, one ending with the "-" character).

1859	   This means that applications or protocols which truncate tags MUST do
1860	   so by progressively removing subtags along with their preceding "-"
1861	   from the right side of the language tag until the tag is short enough
1862	   for the given buffer.  If the resulting tag ends with a single-
1863	   character subtag, that subtag and its preceding "-" MUST also be
1864	   removed.  For example:

1866	   Tag to truncate: zh-Latn-CN-variant1-a-extend1-x-wadegile-private1
1867	   1. zh-Latn-CN-variant1-a-extend1-x-wadegile
1868	   2. zh-Latn-CN-variant1-a-extend1
1869	   3. zh-Latn-CN-variant1
1870	   4. zh-Latn-CN
1871	   5. zh-Latn
1872	   6. zh

1874	                    Figure 8: Example of Tag Truncation
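
   The truncation rule can be sketched in Python as follows.  This is
   a minimal illustration, not a normative algorithm.  The name
   truncate_tag and the choice to raise an error when nothing valid
   fits are assumptions made for the example.

```python
def truncate_tag(tag: str, max_len: int) -> str:
    """Shorten `tag` to at most `max_len` characters by removing
    whole subtags from the right."""
    subtags = tag.split("-")
    while len(subtags) > 1 and len("-".join(subtags)) > max_len:
        subtags.pop()  # remove the rightmost subtag and its '-'
        # A tag must not end with a single-character subtag.
        while len(subtags) > 1 and len(subtags[-1]) == 1:
            subtags.pop()
    result = "-".join(subtags)
    # Assumption: report an error rather than return a tag that is
    # still too long or that consists of a lone singleton.
    if len(result) > max_len or len(result) == 1:
        raise ValueError("tag cannot be truncated to fit")
    return result


tag = "zh-Latn-CN-variant1-a-extend1-x-wadegile-private1"
for limit in (48, 39, 20, 10, 8, 2):
    print(limit, truncate_tag(tag, limit))
```

   With the limits shown, the loop prints the six steps of Figure 8
   in order.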

1876	4.4  Canonicalization of Language Tags

1878	   Since a particular language tag is sometimes used by many processes,
1879	   language tags SHOULD always be created or generated in a canonical
1880	   form.

1882	   A language tag is in canonical form when:

1884	   1.  The tag is well-formed according to the rules in Section 2.1 and
1885	       Section 2.2.

1887	   2.  Subtags of type 'Region' that have a Preferred-Value mapping in
1888	       the IANA registry (see Section 3.1) SHOULD be replaced with their
1889	       mapped value.

1891	   3.  Redundant or grandfathered tags that have a Preferred-Value
1892	       mapping in the IANA registry (see Section 3.1) MUST be replaced
1893	       with their mapped value.  These items are either deprecated
1894	       mappings created before the adoption of this document (such as
1895	       the mapping of "no-nyn" to "nn" or "i-klingon" to "tlh") or are
1896	       the result of later registrations or additions to this document
1897	       (for example, "zh-guoyu" might be mapped to a language-extlang
1898	       combination such as "zh-cmn" by some future update of this
1899	       document).

1901	   4.  Other subtags that have a Preferred-Value mapping in the IANA
1902	       registry (see Section 3.1) MUST be replaced with their mapped
1903	       value.  These items consist entirely of clerical corrections to
1904	       ISO 639-1 in which the deprecated subtags have been maintained
1905	       for compatibility purposes.

1907	   5.  If more than one extension subtag sequence exists, the extension
1908	       sequences are ordered into case-insensitive ASCII order by
1909	       singleton subtag.

1911	   Example: The language tag "en-A-aaa-B-ccc-bbb-x-xyz" is in canonical
1912	   form, while "en-B-ccc-bbb-A-aaa-X-xyz" is well-formed but not in
1913	   canonical form.
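
   The ordering of extension sequences in item 5 can be sketched in
   Python.  The name order_extensions is hypothetical.  The sketch
   assumes a well-formed tag, treats 'x' and everything after it as
   the private-use sequence, and leaves the case of subtags unchanged.

```python
def order_extensions(tag: str) -> str:
    """Sort extension sequences by singleton, ignoring case."""
    subtags = tag.split("-")
    head = [subtags[0]]   # language, script, region, variants
    extensions = []       # one list per singleton sequence
    private = []          # 'x' and all subtags after it
    current = head
    for subtag in subtags[1:]:
        if private:
            private.append(subtag)
        elif len(subtag) == 1 and subtag.lower() == "x":
            private.append(subtag)
        elif len(subtag) == 1:
            current = [subtag]          # start a new extension
            extensions.append(current)
        else:
            current.append(subtag)
    # Only the sequences are reordered; subtags inside a sequence
    # keep the order given by the extension.
    extensions.sort(key=lambda seq: seq[0].lower())
    ordered = [s for seq in extensions for s in seq]
    return "-".join(head + ordered + private)


print(order_extensions("en-B-ccc-bbb-A-aaa-X-xyz"))
```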

1915	   Example: The language tag "en-NH" (English as used in the New
1916	   Hebrides) is not canonical because the 'NH' subtag has a canonical
1917	   mapping to 'VU' (Vanuatu), although the tag "en-NH" maintains its
1918	   validity.
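
   The Preferred-Value replacements can be sketched in Python.  The
   two tables below are a hypothetical excerpt built only from the
   examples in this section; an actual implementation would read the
   mappings from the IANA registry.  The sketch also assumes that a
   two-letter subtag that is not the first subtag and that appears
   before any singleton is a region subtag.

```python
# Hypothetical excerpt of registry mappings (keys are lowercase).
WHOLE_TAG_PREFERRED = {"no-nyn": "nn", "i-klingon": "tlh"}
REGION_PREFERRED = {"nh": "VU"}


def apply_preferred_values(tag: str) -> str:
    """Replace mapped whole tags and region subtags."""
    whole = WHOLE_TAG_PREFERRED.get(tag.lower())
    if whole is not None:
        return whole
    result = []
    in_tail = False  # True once an extension or private use starts
    for index, subtag in enumerate(tag.split("-")):
        if index > 0 and len(subtag) == 1:
            in_tail = True
        is_region = (index > 0 and not in_tail
                     and len(subtag) == 2 and subtag.isalpha())
        if is_region:
            subtag = REGION_PREFERRED.get(subtag.lower(), subtag)
        result.append(subtag)
    return "-".join(result)


print(apply_preferred_values("en-NH"))
print(apply_preferred_values("i-klingon"))
```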

1920	   Canonicalization of language tags does not imply anything about the
1921	   use of upper or lowercase letters when processing or comparing
1922	   subtags (as described in Section 2.1).  All comparisons MUST be
1923	   performed in a case-insensitive manner.

1925	   When performing canonicalization of language tags, processors MAY
1926	   regularize the case of the subtags (that is, this process is
1927	   OPTIONAL), following the case used in the registry.  Note that this
1928	   corresponds to the following casing rules: uppercase all non-initial
1929	   two-letter subtags; titlecase all non-initial four-letter subtags;
1930	   lowercase everything else.
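
   The casing rules above can be sketched in Python.  The name
   regularize_case is hypothetical.  The sketch applies the three
   rules literally to every subtag by position and length and does not
   consult the registry.

```python
def regularize_case(tag: str) -> str:
    """Apply the optional casing rules to each subtag."""
    result = []
    for index, subtag in enumerate(tag.split("-")):
        if index > 0 and len(subtag) == 2:
            # Non-initial two-letter subtag: uppercase.
            result.append(subtag.upper())
        elif index > 0 and len(subtag) == 4:
            # Non-initial four-letter subtag: titlecase.
            result.append(subtag[:1].upper() + subtag[1:].lower())
        else:
            # Everything else: lowercase.
            result.append(subtag.lower())
    return "-".join(result)


print(regularize_case("ZH-hant-tw"))
print(regularize_case("de-de-1901"))
```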

1932	   Note: Case folding of ASCII letters in certain locales, unless
1933	   carefully handled, sometimes produces non-ASCII character values.
1934	   The Unicode Character Database file "SpecialCasing.txt" defines the
1935	   specific cases that are known to cause problems with this.  In
1936	   particular, the letter 'i' (U+0069) in Turkish and Azerbaijani is
1937	   uppercased to U+0130 (LATIN CAPITAL LETTER I WITH DOT ABOVE).
1938	   Implementers SHOULD specify a locale-neutral casing operation to
1939	   ensure that case folding of subtags does not produce this value,
1940	   which is illegal in language tags.  For example, if one were to
1941	   uppercase the region subtag 'in' using Turkish locale rules, the
1942	   sequence U+0130 U+004E would result instead of the expected 'IN'.
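
   One way to obtain a locale-neutral casing operation is to map only
   the 26 ASCII letters and leave every other character untouched.
   The Python sketch below does this with translation tables; the
   names ascii_upper and ascii_lower are hypothetical.

```python
import string

# Translation tables that cover only the letters A-Z and a-z.
_TO_UPPER = str.maketrans(string.ascii_lowercase,
                          string.ascii_uppercase)
_TO_LOWER = str.maketrans(string.ascii_uppercase,
                          string.ascii_lowercase)


def ascii_upper(subtag: str) -> str:
    """Uppercase ASCII letters only; no locale is consulted."""
    return subtag.translate(_TO_UPPER)


def ascii_lower(subtag: str) -> str:
    """Lowercase ASCII letters only; no locale is consulted."""
    return subtag.translate(_TO_LOWER)


print(ascii_upper("in"))
```

   Because the tables contain only ASCII letters, the result for an
   ASCII subtag can never contain U+0130.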

1944	   Note: if the field 'Deprecated' appears in a registry record without
1945	   an accompanying 'Preferred-Value' field, then that tag or subtag is
1946	   deprecated without a replacement.  Validating processors SHOULD NOT
1947	   generate tags that include these values, although the values are
1948	   canonical when they appear in a language tag.

1950	   An extension MUST define any relationships that exist between the
1951	   various subtags in the extension and thus MAY define an alternate
1952	   canonicalization scheme for the extension's subtags.  Extensions MAY
1953	   define how the order of the extension's subtags is interpreted.  For
1954	   example, an extension could define that its subtags are in canonical
1955	   order when the subtags are placed into ASCII order: that is, "en-a-
1956	   aaa-bbb-ccc" instead of "en-a-ccc-bbb-aaa".  Another extension might
1957	   define that the order of the subtags influences their semantic
1958	   meaning (so that "en-b-ccc-bbb-aaa" has a different value from "en-b-
1959	   aaa-bbb-ccc").  However, extension specifications SHOULD be designed
1960	   so that they are tolerant of the typical processes described in
1961	   Section 3.6.

1963	4.5  Considerations for Private Use Subtags

1965	   Private-use subtags require private agreement between the parties
1966	   that intend to use or exchange language tags that use them and great
1967	   caution SHOULD be used in employing them in content or protocols
1968	   intended for general use.  Private-use subtags are simply useless for
1969	   information exchange without prior arrangement.

1971	   The value and semantic meaning of private-use tags and of the subtags
1972	   used within such a language tag are not defined by this document.

1974	   The use of subtags defined in the IANA registry as having a specific
1975	   private use meaning conveys more information than a purely private use
1976	   tag prefixed by the singleton subtag 'x'.  For applications this
1977	   additional information MAY be useful.

1979	   For example, the region subtags 'AA', 'ZZ' and in the ranges
1980	   'QM'-'QZ' and 'XA'-'XZ' (derived from ISO 3166 private use codes) MAY
1981	   be used to form a language tag.  A tag such as "zh-Hans-XQ" conveys a
1982	   great deal of public, interchangeable information about the language
1983	   material (that it is Chinese in the simplified Chinese script and is
1984	   suitable for some geographic region 'XQ').  While the precise
1985	   geographic region is not known outside of private agreement, the tag
1986	   conveys far more information than an opaque tag such as "x-someLang",
1987	   which contains no information about the language subtag or script
1988	   subtag outside of the private agreement.

1990	   However, in some cases content tagged with private use subtags MAY
1991	   interact with other systems in a different and possibly unsuitable
1992	   manner compared to tags that use opaque, privately defined subtags,
1993	   so the choice of the best approach sometimes depends on the
1994	   particular domain in question.

1996	5.  IANA Considerations

1998	   This section deals with the processes and requirements necessary for
1999	   IANA to undertake to maintain the subtag and extension registries as
2000	   defined by this document and in accordance with the requirements of
2001	   [RFC2434].

2003	   The impact on the IANA maintainers of the two registries defined by
2004	   this document will be a small increase in the frequency of new
2005	   entries or updates.

2007	5.1  Language Subtag Registry

2009	   Upon adoption of this document, the registry will be initialized by a
2010	   companion document: [initial-registry].  The criteria and process for
2011	   selecting the initial set of records are described in that document.
2012	   The initial set of records represents no impact on IANA, since the
2013	   work to create it will be performed externally.

2015	   The new registry MUST be listed under "Language Tags" at
2016	   <http://www.iana.org/numbers.html>, replacing the existing
2017	   registrations defined by [RFC3066].  The existing set of registration
2018	   forms and RFC 3066 registrations will be relabeled as "Language Tags
2019	   (Obsolete)" and maintained (but not added to or modified).

2021	   Future work on the Language Subtag Registry will be limited to
2022	   inserting or replacing whole records preformatted for IANA by the
2023	   Language Subtag Reviewer as described in Section 3.2 of this
2024	   document.  This simplifies IANA's work by limiting it to placing the
2025	   text in the appropriate location in the registry.

2027	   Each record will be sent to iana@iana.org with a subject line
2028	   indicating whether the enclosed record is an insertion of a new
2029	   record (indicated by the word "INSERT" in the subject line) or a
2030	   replacement of an existing record (indicated by the word "MODIFY" in
2031	   the subject line).  Records MUST NOT be deleted from the registry.
2032	   IANA MUST place any inserted or modified records into the appropriate
2033	   section of the language subtag registry, grouping the records by
2034	   their "Type" field.  Inserted records MAY be placed anywhere in the
2035	   appropriate section; there is no guarantee of the order of the
2036	   records beyond grouping them together by 'Type'.  Modified records
2037	   MUST overwrite the record they replace.

2039	   Included in any request to insert or modify records MUST be a new
2040	   File-Date record.  This record MUST be placed first in the registry.
2041	   In the event that the File-Date record present in the registry has a
2042	   later date than the record being inserted or modified, the existing
2043	   record MUST be preserved.

2045	5.2  Extensions Registry

2047	   The Language Tag Extensions registry will also be generated and sent
2048	   to IANA as described in Section 3.6.  This registry can contain at
2049	   most 35 records and thus changes to this registry are expected to be
2050	   very infrequent.

2052	   Future work by IANA on the Language Tag Extensions Registry is
2053	   limited to two cases.  First, the IESG MAY request that new records
2054	   be inserted into this registry from time to time.  These requests
2055	   will include the record to insert in the exact format described in
2056	   Section 3.6.  In addition, there MAY be occasional requests from the
2057	   maintaining authority for a specific extension to update the contact
2058	   information or URLs in the record.  These requests MUST include the
2059	   complete, updated record.  IANA is not responsible for validating the
2060	   information provided, only for checking that it is properly
2061	   formatted.  Each request should reasonably be seen to come from the
2062	   maintaining authority named in the record present in the registry.

2064	6.  Security Considerations

2066	   Language tags used in content negotiation, like any other information
2067	   exchanged on the Internet, might be a source of concern because they
2068	   might be used to infer the nationality of the sender, and thus
2069	   identify potential targets for surveillance.

2071	   This is a special case of the general problem that anything sent is
2072	   visible to the receiving party and possibly to third parties as well.
2073	   It is useful to be aware that such concerns can exist in some cases.

2075	   The evaluation of the exact magnitude of the threat, and any possible
2076	   countermeasures, is left to each application protocol (see BCP 72
2077	   [RFC3552] for best current practice guidance on security threats and
2078	   defenses).

2080	   The language tag associated with a particular information item is of
2081	   no consequence whatsoever in determining whether that content might
2082	   contain possible homographs.  The fact that a text is tagged as being
2083	   in one language or using a particular script subtag provides no
2084	   assurance whatsoever that it does not contain characters from scripts
2085	   other than the one(s) associated with or specified by that language
2086	   tag.

2088	   Since there is no limit to the number of variant, private use, and
2089	   extension subtags, and consequently no limit on the possible length
2090	   of a tag, implementations need to guard against buffer overflow
2091	   attacks.  See Section 4.3 for details on language tag truncation,
2092	   which can occur as a consequence of defenses against buffer overflow.

2094	   Although the specification of valid subtags for an extension (see:
2095	   Section 3.6) MUST be available over the Internet, implementations
2096	   SHOULD NOT mechanically depend on it being always accessible, to
2097	   prevent denial-of-service attacks.

2099	7.  Character Set Considerations

2101	   The syntax in this document requires that language tags use only the
2102	   characters A-Z, a-z, 0-9, and HYPHEN-MINUS, which are present in most
2103	   character sets, so the composition of language tags should not have
2104	   any character set issues.
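
   A check of the character repertoire alone can be sketched in
   Python.  The name uses_only_tag_characters is hypothetical.  The
   pattern verifies only that a tag consists of the characters listed
   above, with HYPHEN-MINUS appearing solely between subtags; it does
   not test the full syntax or the validity of any subtag.

```python
import re

# ASCII letters and digits, with single hyphens between subtags.
_TAG_CHARACTERS = re.compile(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*")


def uses_only_tag_characters(tag: str) -> bool:
    """Return True if `tag` uses only the permitted characters."""
    return _TAG_CHARACTERS.fullmatch(tag) is not None


print(uses_only_tag_characters("zh-Hant-TW"))
print(uses_only_tag_characters("en_US"))
```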

2106	   Rendering of characters based on the content of a language tag is not
2107	   addressed in this memo.  Historically, some languages have relied on
2108	   the use of specific character sets or other information in order to
2109	   infer how a specific character should be rendered (notably this
2110	   applies to language and culture specific variations of Han ideographs
2111	   as used in Japanese, Chinese, and Korean).  When language tags are
2112	   applied to spans of text, rendering engines can use that information
2113	   in deciding which font to use in the absence of other information,
2114	   particularly where languages with distinct writing traditions use the
2115	   same characters.

2117	8.  Changes from RFC 3066

2119	   The main goals for this revision of language tags were the following:

2121	   *Compatibility.* All valid RFC 3066 language tags (including those
2122	   in the IANA registry) remain valid in this specification.  Thus
2123	   there is complete backward compatibility of this specification with
2124	   existing content.  In addition, this document defines language tags
2125	   in such a way as to ensure future compatibility, and processors
2126	   based solely on the RFC 3066 ABNF (such as those described in
2127	   [XMLSchema]) will be able to process tags described by this document.

2129	   *Stability.* Because of the changes in underlying ISO standards, a
2130	   valid RFC 3066 language tag may become invalid (or have its meaning
2131	   change) at a later date.  With so much of the world's computing
2132	   infrastructure dependent on language tags, this is simply
2133	   unacceptable: it invalidates content that may have an extensive
2134	   shelf-life.  In this specification, once a language tag is valid, it
2135	   remains valid forever.  Previously, there was no way to determine
2136	   when two tags were equivalent.  This specification provides a stable
2137	   mechanism for doing so, through the use of canonical forms.  These
2138	   are also stable, so that implementations can depend on the use of
2139	   canonical forms to assess equivalency.

2141	   *Validity.* The structure of language tags defined by this document
2142	   makes it possible to determine if a particular tag is well-formed
2143	   without regard for the actual content or "meaning" of the tag as a
2144	   whole.  This is important because the registry and underlying
2145	   standards change over time.  In addition, it must be possible to
2146	   determine if a tag is valid (or not) for a given point in time in
2147	   order to provide reproducible, testable results.  This process must
2148	   not be error-prone; otherwise even intelligent people will generate
2149	   implementations that give different results.  This specification
2150	   provides for that by having a single data file, with specific
2151	   versioning information, so that the validity of language tags at any
2152	   point in time can be precisely determined (instead of interpolating
2153	   values from many separate sources).

2155	   *Extensibility.* It is important to be able to differentiate between
2156	   written forms of language -- for many implementations this is more
2157	   important than distinguishing between spoken variants of a language.
2158	   Languages are written in a wide variety of different scripts, so this
2159	   document provides for the generative use of ISO 15924 script codes.
2160	   Like the generative use of ISO language and country codes in RFC
2161	   3066, this allows combinations to be produced without resorting to
2162	   the registration process.  The addition of UN codes provides for the
2163	   generation of language tags with regional scope, which is also
2164	   required for information technology.

2166	   The recast of the registry from containing whole language tags to
2167	   subtags is a key part of this.  An important feature of RFC 3066 was
2168	   that it allowed generative use of subtags.  This allows people to
2169	   meaningfully use generated tags, without the delays in registering
2170	   whole tags, and the burden on the registry of having to supply all of
2171	   the combinations that people may find useful.

2173	   Because of the widespread use of language tags, it is potentially
2174	   disruptive to have periodic revisions of the core specification,
2175	   despite demonstrated need.  The extension mechanism provides for a
2176	   way for independent RFCs to define extensions to language tags.
2177	   These extensions have a very constrained, well-defined structure to
2178	   prevent extensions from interfering with implementations of language
2179	   tags defined in this document.  The document also anticipates
2180	   features of ISO 639-3 with the addition of the extended language
2181	   subtags, as well as the possibility of other ISO 639 parts becoming
2182	   useful for the formation of language tags in the future.  The use and
2183	   definition of private use tags has also been modified, to allow
2184	   people to move as much information as possible out of private use
2185	   tags, and into the regular structure.  The goal is to dramatically
2186	   reduce the need to produce a revision of this document in the future.

2188	   The specific changes in this document to meet these goals are:

2190	   o  Defines the ABNF and rules for subtags so that the category of all
2191	      subtags can be determined without reference to the registry.

2193	   o  Adds the concept of well-formed vs. validating processors,
2194	      defining the rules by which an implementation can claim to be one
2195	      or the other.

2197	   o  Replaces the IANA language tag registry with a language subtag
2198	      registry that provides a complete list of valid subtags in the
2199	      IANA registry.  This allows for robust implementation and ease of
2200	      maintenance.  The language subtag registry becomes the canonical
2201	      source for forming language tags.

2203	   o  Provides a process that guarantees stability of language tags, by
2204	      handling reuse of values by ISO 639, ISO 15924, and ISO 3166 in
2205	      the event that they register a previously used value for a new
2206	      purpose.

2208	   o  Allows ISO 15924 script code subtags and allows them to be used
2209	      generatively.  Defines a method for indicating in the registry
2210	      when script subtags are necessary for a given language tag.

2212	   o  Adds the concept of a variant subtag and allows variants to be
2213	      used generatively.

2215	   o  Adds the ability to use a class of UN M.49 tags for supra-national
2216	      regions and to resolve conflicts in the assignment of ISO 3166
2217	      codes.

2219	   o  Defines the private-use tags in ISO 639, ISO 15924, and ISO 3166
2220	      as the mechanism for creating private-use language, script, and
2221	      region subtags respectively.

2223	   o  Adds a well-defined extension mechanism.

2225	   o  Defines an extended language subtag, possibly for use with certain
2226	      anticipated features of ISO 639-3.

2228	   Ed Note: The following items are provided for the convenience of
2229	   reviewers and will be removed from the final document.

2231	   Changes between draft-ietf-ltru-registry-06 and this version are:

2233	   o  Modified the rules for creating the initial-registry draft to
2234	      require purposefully omitted but eligible codes to be listed
2235	      (#1034)(R.Presuhn)

2237	   o  Removed the example registry.  The initial-registry draft is a
2238	      better example.  Added an informative reference to that document.
2239	      (A.Phillips)

2241	   o  Modified the introduction to Section 2.2.4 and changed the use of
2242	      "as used in" for some examples to clarify how UN M.49 codes and
2243	      other larger regional codes are related to language tags.
2244	      (K.Broome, P.Constable)

2246	   o  Removed nearly all of the text from Section 3.7 to [initial-
2247	      registry].  A bit of new glue text pointing to that document was
2248	      added.  (F.Ellermann)

2250	   o  Updated Section 5 to reflect the removal of most of
2251	      the text in Section 3.7 and to generally clean it up.  This
2252	      includes breaking it into subsections.  (A.Phillips)

2254	9.  References

2256	9.1  Normative References

2258	   [ISO639-1]
2259	              International Organization for Standardization, "ISO 639-
2260	              1:2002, Codes for the representation of names of languages
2261	              -- Part 1: Alpha-2 code", ISO Standard 639, 2002, <ISO
2262	              639-1>.

2264	   [ISO639-2]
2265	              International Organization for Standardization, "ISO 639-
2266	              2:1998 - Codes for the representation of names of
2267	              languages -- Part 2: Alpha-3 code - edition 1",
2268	              1998, <ISO 639-2>.

2270	   [ISO15924]
2271	              ISO TC46/WG3, "ISO 15924:2003 (E/F) - Codes for the
2272	              representation of names of scripts", January 2004, <ISO
2273	              15924>.

2275	   [ISO3166]  International Organization for Standardization, "Codes for
2276	              the representation of names of countries, 3rd edition",
2277	              ISO Standard 3166, August 1988, <ISO 3166>.

2279	   [UN_M.49]  Statistical Division, United Nations, "Standard Country or
2280	              Area Codes for Statistical Use", UN Standard Country or
2281	              Area Codes for Statistical Use, Revision 4 (United Nations
2282	              publication, Sales No. 98.XVII.9), June 1999, <UN M.49>.

2284	   [ISO10646]
2285	              International Organization for Standardization, "ISO/IEC
2286	              10646-1:2000. Information technology -- Universal
2287	              Multiple-Octet Coded Character Set (UCS) -- Part 1:
2288	              Architecture and Basic Multilingual Plane and ISO/IEC
2289	              10646-2:2001. Information technology -- Universal
2290	              Multiple-Octet Coded Character Set (UCS) -- Part 2:
2291	              Supplementary Planes, as, from time to time, amended,
2292	              replaced by a new edition or expanded by the addition of
2293	              new parts", 2000, <ISO/IEC 10646>.

2295	   [RFC2234bis]
2296	              Crocker, D. and P. Overell, "Augmented BNF for Syntax
2297	              Specifications: ABNF", draft-crocker-abnf-rfc2234bis-00
2298	              (work in progress), March 2005.

2300	   [RFC2026]  Bradner, S., "The Internet Standards Process -- Revision
2301	              3", BCP 9, RFC 2026, October 1996.

2303	   [RFC2028]  Hovey, R. and S. Bradner, "The Organizations Involved in
2304	              the IETF Standards Process", BCP 11, RFC 2028,
2305	              October 1996.

2307	   [RFC2047]  Moore, K., "MIME (Multipurpose Internet Mail Extensions)
2308	              Part Three: Message Header Extensions for Non-ASCII Text",
2309	              RFC 2047, November 1996.

2311	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
2312	              Requirement Levels", BCP 14, RFC 2119, March 1997.

2314	   [RFC2434]  Narten, T. and H. Alvestrand, "Guidelines for Writing an
2315	              IANA Considerations Section in RFCs", BCP 26, RFC 2434,
2316	              October 1998.

2318	   [RFC2781]  Hoffman, P. and F. Yergeau, "UTF-16, an encoding of ISO
2319	              10646", RFC 2781, February 2000.

2321	   [RFC2860]  Carpenter, B., Baker, F., and M. Roberts, "Memorandum of
2322	              Understanding Concerning the Technical Work of the
2323	              Internet Assigned Numbers Authority", RFC 2860, June 2000.

2325	   [RFC3339]  Klyne, G. and C. Newman, "Date and Time on the Internet:
2326	              Timestamps", RFC 3339, July 2002.

2328	   [RFC3552]  Rescorla, E. and B. Korver, "Guidelines for Writing RFC
2329	              Text on Security Considerations", BCP 72, RFC 3552,
2330	              July 2003.

2332	9.2  Informative References

2334	   [initial-registry]
2335	              Ewell, D., Ed., "Initial Language Subtag Registry",
2336	              June 2005, <http://www.ietf.org/internet-drafts/
2337	              draft-ietf-ltru-initial-registry-00.txt>.

2339	   [iso639.principles]
2340	              ISO 639 Joint Advisory Committee, "ISO 639 Joint Advisory
2341	              Committee:  Working principles for ISO 639 maintenance",
2342	              March 2000,
2343	              <http://www.loc.gov/standards/iso639-2/
2344	              iso639jac_n3r.html>.

2346	   [record-jar]
2347	              Raymond, E., "The Art of Unix Programming", 2003.

2349	   [XML10]    Bray, T., et al., "Extensible Markup Language (XML) 1.0",
2350	              February 2004.

2352	   [XMLSchema]
2353	              Biron, P., Ed. and A. Malhotra, Ed., "XML Schema Part 2:
2354	              Datatypes Second Edition", October 2004,
2355	              <http://www.w3.org/TR/xmlschema-2/>.

2357	   [Unicode]  Unicode Consortium, "The Unicode Consortium. The Unicode
2358	              Standard, Version 4.1.0, defined by: The Unicode Standard,
2359	              Version 4.0 (Boston, MA, Addison-Wesley, 2003. ISBN 0-321-
2360	              18578-1), as amended by Unicode 4.0.1
2361	              (http://www.unicode.org/versions/Unicode4.0.1) and by
2362	              Unicode 4.1.0
2363	              (http://www.unicode.org/versions/Unicode4.1.0).",
2364	              March 2005.

2366	   [RFC1766]  Alvestrand, H., "Tags for the Identification of
2367	              Languages", RFC 1766, March 1995.

2369	   [RFC2231]  Freed, N. and K. Moore, "MIME Parameter Value and Encoded
2370	              Word Extensions: Character Sets, Languages, and
2371	              Continuations", RFC 2231, November 1997.

2373	   [RFC3066]  Alvestrand, H., "Tags for the Identification of
2374	              Languages", BCP 47, RFC 3066, January 2001.

2376	Authors' Addresses

2378	   Addison Phillips (editor)
2379	   Quest Software

2381	   Email: addison.phillips@quest.com

2383	   Mark Davis (editor)
2384	   IBM

2386	   Email: mark.davis@us.ibm.com

2388	Appendix A.  Acknowledgements

2390	   Any list of contributors is bound to be incomplete; please regard the
2391	   following as only a selection from the group of people who have
2392	   contributed to make this document what it is today.

2394	   The contributors to RFC 3066 and RFC 1766, the precursors of this
2395	   document, made enormous contributions directly or indirectly to this
2396	   document and are generally responsible for the success of language
2397	   tags.

2399	   The following people (in alphabetical order) contributed to this
2400	   document or to RFCs 1766 and 3066:

2402	   Glenn Adams, Harald Tveit Alvestrand, Tim Berners-Lee, Marc Blanchet,
2403	   Nathaniel Borenstein, Karen Broome, Eric Brunner, Sean M. Burke, M.T.
2404	   Carrasco Benitez, Jeremy Carroll, John Clews, Jim Conklin, Peter
2405	   Constable, John Cowan, Mark Crispin, Dave Crocker, Martin Duerst,
2406	   Frank Ellerman, Michael Everson, Doug Ewell, Ned Freed, Tim Goodwin,
2407	   Dirk-Willem van Gulik, Marion Gunn, Joel Halpern, Elliotte Rusty
2408	   Harold, Paul Hoffman, Scott Hollenbeck, Richard Ishida, Olle
2409	   Jarnefors, Kent Karlsson, John Klensin, Alain LaBonte, Eric Mader,
2410	   Ira McDonald, Keith Moore, Chris Newman, Masataka Ohta, Randy
2411	   Presuhn, George Rhoten, Markus Scherer, Keld Jorn Simonsen, Thierry
2412	   Sourbier, Otto Stolz, Tex Texin, Andrea Vine, Rhys Weatherley, Misha
2413	   Wolf, Francois Yergeau and many, many others.

2415	   Very special thanks must go to Harald Tveit Alvestrand, who
2416	   originated RFCs 1766 and 3066, and without whom this document would
2417	   not have been possible.  Special thanks must go to Michael Everson,
2418	   who has served as language tag reviewer for almost the entire
2419	   period since the publication of RFC 1766.  Special thanks to Doug
2420	   Ewell, for his production of the first complete subtag registry, and
2421	   his work in producing a test parser for verifying language tags.

2423	Appendix B.  Examples of Language Tags (Informative)

2425	   Simple language subtag:

2427	      de (German)

2429	      fr (French)

2431	      ja (Japanese)

2433	      i-enochian (example of a grandfathered tag)

2435	   Language subtag plus Script subtag:

2437	      zh-Hant (Chinese written using the Traditional Chinese script)

2439	      zh-Hans (Chinese written using the Simplified Chinese script)

2441	      sr-Cyrl (Serbian written using the Cyrillic script)

2443	      sr-Latn (Serbian written using the Latin script)

2445	   Language-Script-Region:

2447	      zh-Hans-CN (Chinese written using the Simplified script as used in
2448	      mainland China)

2450	      sr-Latn-CS (Serbian written using the Latin script as used in
2451	      Serbia and Montenegro)

2453	   Language-Variant:

2455	      sl-rozaj (Resian dialect of Slovenian)

2457	      sl-nedis (Nadiza dialect of Slovenian)

2459	   Language-Region-Variant:

2461	      de-CH-1901 (German as used in Switzerland using the 1901 variant
2462	      [orthography])

2464	      sl-IT-nedis (Slovenian as used in Italy, Nadiza dialect)

2466	   Language-Script-Region-Variant:

2468	      sl-Latn-IT-nedis (Nadiza dialect of Slovenian written using the
2469	      Latin script as used in Italy.  Note that this tag is NOT
2470	      RECOMMENDED because subtag 'sl' has a Suppress-Script value of
2471	      'Latn')
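
   The Suppress-Script note above can be illustrated with a minimal
   sketch.  It is not part of this specification.  The table
   SUPPRESS_SCRIPT and the function strip_suppressed_script are
   hypothetical names, the table is a hand-written one-entry excerpt
   rather than real registry data, and the sketch assumes the script
   subtag, when present, directly follows the primary language subtag
   (that is, no extended language subtags are in use).

```python
# Assumption: a hand-written excerpt standing in for the Suppress-Script
# fields of the registry; only the 'sl' entry comes from the text above.
SUPPRESS_SCRIPT = {"sl": "Latn"}


def strip_suppressed_script(tag):
    """Drop the script subtag when it equals the language's Suppress-Script.

    Assumes the script subtag, if any, is the second subtag of the tag.
    """
    subtags = tag.split("-")
    if len(subtags) >= 2:
        suppressed = SUPPRESS_SCRIPT.get(subtags[0].lower())
        # Subtag comparison is case-insensitive.
        if suppressed is not None and subtags[1].lower() == suppressed.lower():
            del subtags[1]
    return "-".join(subtags)


print(strip_suppressed_script("sl-Latn-IT-nedis"))
print(strip_suppressed_script("sr-Latn-CS"))
```

   Under these assumptions the first call yields the preferred form
   'sl-IT-nedis', while 'sr-Latn-CS' is returned unchanged because the
   excerpt table holds no entry for 'sr'.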

2473	   Language-Region:

2475	      de-DE (German for Germany)

2477	      en-US (English as used in the United States)

2479	      es-419 (Spanish appropriate for the Latin America and Caribbean
2480	      region using the UN region code)

2482	   Private-use subtags:

2484	      de-CH-x-phonebk

2486	      az-Arab-x-AZE-derbend

2488	   Extended language subtags (examples ONLY: extended languages MUST be
2489	   defined by revision or update to this document):

2491	      zh-min

2493	      zh-min-nan-Hant-CN

2495	   Private-use registry values:

2497	      x-whatever (private use using the singleton 'x')

2499	      qaa-Qaaa-QM-x-southern (all private tags)

2501	      de-Qaaa (German, with a private script)

2503	      sr-Latn-QM (Serbian, Latin-script, private region)

2505	      sr-Qaaa-CS (Serbian, private script, for Serbia and Montenegro)

2507	   Tags that use extensions (examples ONLY: extensions MUST be defined
2508	   by revision or update to this document or by RFC):

2510	      en-US-u-islamCal

2512	      zh-CN-a-myExt-x-private
2513	      en-a-myExt-b-another

2515	   Some Invalid Tags:

2517	      de-419-DE (two region tags)

2519	      a-DE (use of a single character subtag in primary position; note
2520	      that there are a few grandfathered tags that start with "i-" that
2521	      are valid)

2523	      ar-a-aaa-b-bbb-a-ccc (two extensions with same single letter
2524	      prefix)
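
   The examples in this appendix can be exercised with a small
   well-formedness checker.  The following is a sketch only, not a
   normative parser: the function parse_tag, the helper fits, the
   pattern constants, and the one-entry GRANDFATHERED set are all
   hypothetical, and the subtag lengths and character classes are
   assumptions inferred from the examples above.  It does not consult
   the registry, so it cannot tell whether a subtag is actually
   registered.

```python
import re

# Assumption: a one-entry stand-in for the registered grandfathered tags.
GRANDFATHERED = {"i-enochian"}

# Assumed subtag shapes, inferred from the examples in this appendix.
LANGUAGE = r"[A-Za-z]{2,8}"
EXTLANG = r"[A-Za-z]{3}"
SCRIPT = r"[A-Za-z]{4}"
REGION = r"[A-Za-z]{2}|[0-9]{3}"
VARIANT = r"[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}"
SINGLETON = r"[A-WY-Za-wy-z0-9]"  # any single letter or digit except 'x'
EXT_SUBTAG = r"[A-Za-z0-9]{2,8}"
PRIVATE_SUBTAG = r"[A-Za-z0-9]{1,8}"


def fits(pattern, subtag):
    """True when the whole subtag matches the pattern."""
    return re.fullmatch(pattern, subtag) is not None


def parse_tag(tag):
    """Split a tag into its parts or raise ValueError if it is malformed."""
    if tag.lower() in GRANDFATHERED:
        return {"grandfathered": tag}
    subtags = tag.split("-")
    n = len(subtags)
    out = {"language": None, "extlang": [], "script": None, "region": None,
           "variants": [], "extensions": {}, "privateuse": []}
    i = 0
    if subtags[0].lower() != "x":  # a tag may consist of private use only
        if not fits(LANGUAGE, subtags[0]):
            raise ValueError(f"bad primary subtag {subtags[0]!r} in {tag!r}")
        out["language"] = subtags[0]
        i = 1
        if len(subtags[0]) <= 3:  # extlang only follows a short language
            while i < n and fits(EXTLANG, subtags[i]):
                out["extlang"].append(subtags[i])
                i += 1
        if i < n and fits(SCRIPT, subtags[i]):
            out["script"] = subtags[i]
            i += 1
        if i < n and fits(REGION, subtags[i]):
            out["region"] = subtags[i]
            i += 1
        while i < n and fits(VARIANT, subtags[i]):
            out["variants"].append(subtags[i])
            i += 1
        while i < n and fits(SINGLETON, subtags[i]):
            singleton = subtags[i].lower()
            if singleton in out["extensions"]:
                raise ValueError(f"repeated extension {singleton!r} in {tag!r}")
            i += 1
            parts = []
            while i < n and fits(EXT_SUBTAG, subtags[i]):
                parts.append(subtags[i])
                i += 1
            if not parts:
                raise ValueError(f"empty extension {singleton!r} in {tag!r}")
            out["extensions"][singleton] = parts
    if i < n and subtags[i].lower() == "x":
        rest = subtags[i + 1:]
        if not rest or not all(fits(PRIVATE_SUBTAG, s) for s in rest):
            raise ValueError(f"bad private-use sequence in {tag!r}")
        out["privateuse"] = rest
        i = n
    if i != n:
        raise ValueError(f"unexpected subtag {subtags[i]!r} in {tag!r}")
    return out


for example in ("zh-min-nan-Hant-CN", "de-CH-1901", "de-419-DE", "a-DE",
                "ar-a-aaa-b-bbb-a-ccc"):
    try:
        print(example, "->", parse_tag(example))
    except ValueError as err:
        print(example, "-> invalid:", err)
```

   With these assumed shapes, the three tags listed under "Some Invalid
   Tags" are rejected: 'de-419-DE' because 'DE' cannot follow a region,
   'a-DE' because the primary subtag is a single character, and
   'ar-a-aaa-b-bbb-a-ccc' because the singleton 'a' appears twice.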

2526	Intellectual Property Statement

2528	   The IETF takes no position regarding the validity or scope of any
2529	   Intellectual Property Rights or other rights that might be claimed to
2530	   pertain to the implementation or use of the technology described in
2531	   this document or the extent to which any license under such rights
2532	   might or might not be available; nor does it represent that it has
2533	   made any independent effort to identify any such rights.  Information
2534	   on the procedures with respect to rights in RFC documents can be
2535	   found in BCP 78 and BCP 79.

2537	   Copies of IPR disclosures made to the IETF Secretariat and any
2538	   assurances of licenses to be made available, or the result of an
2539	   attempt made to obtain a general license or permission for the use of
2540	   such proprietary rights by implementers or users of this
2541	   specification can be obtained from the IETF on-line IPR repository at
2542	   http://www.ietf.org/ipr.

2544	   The IETF invites any interested party to bring to its attention any
2545	   copyrights, patents or patent applications, or other proprietary
2546	   rights that may cover technology that may be required to implement
2547	   this standard.  Please address the information to the IETF at
2548	   ietf-ipr@ietf.org.

2550	Disclaimer of Validity

2552	   This document and the information contained herein are provided on an
2553	   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
2554	   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
2555	   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
2556	   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
2557	   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
2558	   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

2560	Copyright Statement

2562	   Copyright (C) The Internet Society (2005).  This document is subject
2563	   to the rights, licenses and restrictions contained in BCP 78, and
2564	   except as set forth therein, the authors retain all their rights.

2566	Acknowledgment

2568	   Funding for the RFC Editor function is currently provided by the
2569	   Internet Society.