idnits 2.17.1 

draft-ietf-ltru-registry-09.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** It looks like you're using RFC 3978 boilerplate.  You should update this
     to the boilerplate described in the IETF Trust License Policy document
     (see https://trustee.ietf.org/license-info), which is required now.

  -- Found old boilerplate from RFC 3978, Section 5.1 on line 16.

  -- Found old boilerplate from RFC 3978, Section 5.5 on line 2592.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 2569.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 2576.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 2582.

  ** This document has an original RFC 3978 Section 5.4 Copyright Line,
     instead of the newer IETF Trust Copyright according to RFC 4748.

  ** This document has an original RFC 3978 Section 5.5 Disclaimer, instead
     of the newer disclaimer which includes the IETF Trust according to RFC
     4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not
     match the current year

  == Line 785 has weird spacing: '...al line  of AS...'

  == Line 787 has weird spacing: '...portion  of...'

  == Line 788 has weird spacing: '...   this  conce...'

  == Line 1247 has weird spacing: '...subtags  of ty...'

  == Line 1320 has weird spacing: '...ssarily  repre...'

  == (10 more instances...)

  == The document seems to lack the recommended RFC 2119 boilerplate, even if
     it appears to use RFC 2119 keywords. 

     (The document does seem to have the reference to RFC 2119 which the
     ID-Checklist requires).

  == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD',
     or 'RECOMMENDED' is not an accepted usage according to RFC 2119.  Please
     use uppercase 'NOT' together with RFC 2119 keywords (if that is what you
     mean).
     
     Found 'MUST not' in this paragraph:
     
     The tags and their subtags, including private use and extensions,
     are to be treated as case insensitive: there exist conventions for the
     capitalization of some of the subtags, but these MUST not be taken to
     carry meaning.

  == The expression 'MAY NOT', while looking like RFC 2119 requirements text,
     is not defined in RFC 2119, and should not be used.  Consider using 'MUST
     NOT' instead (if that is what you mean).
     
     Found 'MAY NOT' in this paragraph:
     
     Note that 'Preferred-Value' mappings in records of type 'region'
     MAY NOT represent exactly the same meaning as the original value.  There
     are many reasons for a country code to be changed and the effect this has
     on the formation of language tags will depend on the nature of the change
     in question.

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (July 11, 2005) is 6862 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Unused Reference: 'RFC1766' is defined on line 2399, but no explicit
     reference was found in the text

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO639-1'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO639-2'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO15924'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO3166'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO10646'

  ** Obsolete normative reference: RFC 2028 (Obsoleted by RFC 9281)

  ** Obsolete normative reference: RFC 2434 (Obsoleted by RFC 5226)

  ** Downref: Normative reference to an Informational RFC: RFC 2781

  ** Downref: Normative reference to an Informational RFC: RFC 2860

  -- Obsolete informational reference (is this intentional?): RFC 1766
     (Obsoleted by RFC 3066, RFC 3282)

  -- Obsolete informational reference (is this intentional?): RFC 3066
     (Obsoleted by RFC 4646, RFC 4647)


     Summary: 7 errors (**), 0 flaws (~~), 12 warnings (==), 14 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                                   A. Phillips, Ed.
3	Internet-Draft                                            Quest Software
4	Expires: January 12, 2006                                  M. Davis, Ed.
5	                                                                     IBM
6	                                                           July 11, 2005

8	                     Tags for Identifying Languages
9	                      draft-ietf-ltru-registry-09

11	Status of this Memo

13	   By submitting this Internet-Draft, each author represents that any
14	   applicable patent or other IPR claims of which he or she is aware
15	   have been or will be disclosed, and any of which he or she becomes
16	   aware will be disclosed, in accordance with Section 6 of BCP 79.

18	   Internet-Drafts are working documents of the Internet Engineering
19	   Task Force (IETF), its areas, and its working groups.  Note that
20	   other groups may also distribute working documents as Internet-
21	   Drafts.

23	   Internet-Drafts are draft documents valid for a maximum of six months
24	   and may be updated, replaced, or obsoleted by other documents at any
25	   time.  It is inappropriate to use Internet-Drafts as reference
26	   material or to cite them other than as "work in progress."

28	   The list of current Internet-Drafts can be accessed at
29	   http://www.ietf.org/ietf/1id-abstracts.txt.

31	   The list of Internet-Draft Shadow Directories can be accessed at
32	   http://www.ietf.org/shadow.html.

34	   This Internet-Draft will expire on January 12, 2006.

36	Copyright Notice

38	   Copyright (C) The Internet Society (2005).

40	Abstract

42	   This document describes the structure, content, construction, and
43	   semantics of language tags for use in cases where it is desirable to
44	   indicate the language used in an information object.  It also
45	   describes how to register values for use in language tags and the
46	   creation of user defined extensions for private interchange.

48	Table of Contents

50	   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  3
51	   2.  The Language Tag . . . . . . . . . . . . . . . . . . . . . . .  4
52	     2.1   Syntax . . . . . . . . . . . . . . . . . . . . . . . . . .  4
53	     2.2   Language Subtag Sources and Interpretation . . . . . . . .  6
54	       2.2.1   Primary Language Subtag  . . . . . . . . . . . . . . .  7
55	       2.2.2   Extended Language Subtags  . . . . . . . . . . . . . .  9
56	       2.2.3   Script Subtag  . . . . . . . . . . . . . . . . . . . . 10
57	       2.2.4   Region Subtag  . . . . . . . . . . . . . . . . . . . . 11
58	       2.2.5   Variant Subtags  . . . . . . . . . . . . . . . . . . . 13
59	       2.2.6   Extension Subtags  . . . . . . . . . . . . . . . . . . 14
60	       2.2.7   Private Use Subtags  . . . . . . . . . . . . . . . . . 15
61	       2.2.8   Pre-Existing RFC 3066 Registrations  . . . . . . . . . 16
62	       2.2.9   Classes of Conformance . . . . . . . . . . . . . . . . 16
63	   3.  Registry Format and Maintenance  . . . . . . . . . . . . . . . 18
64	     3.1   Format of the IANA Language Subtag Registry  . . . . . . . 18
65	     3.2   Maintenance of the Registry  . . . . . . . . . . . . . . . 23
66	     3.3   Stability of IANA Registry Entries . . . . . . . . . . . . 25
67	     3.4   Registration Procedure for Subtags . . . . . . . . . . . . 28
68	     3.5   Possibilities for Registration . . . . . . . . . . . . . . 31
69	     3.6   Extensions and Extensions Namespace  . . . . . . . . . . . 33
70	     3.7   Initialization of the Registry . . . . . . . . . . . . . . 36
71	   4.  Formation and Processing of Language Tags  . . . . . . . . . . 37
72	     4.1   Choice of Language Tag . . . . . . . . . . . . . . . . . . 37
73	     4.2   Meaning of the Language Tag  . . . . . . . . . . . . . . . 39
74	     4.3   Length Considerations  . . . . . . . . . . . . . . . . . . 40
75	       4.3.1   Working with Limited Buffer Sizes  . . . . . . . . . . 40
76	       4.3.2   Truncation of Language Tags  . . . . . . . . . . . . . 42
77	     4.4   Canonicalization of Language Tags  . . . . . . . . . . . . 42
78	     4.5   Considerations for Private Use Subtags . . . . . . . . . . 44
79	   5.  IANA Considerations  . . . . . . . . . . . . . . . . . . . . . 46
80	     5.1   Language Subtag Registry . . . . . . . . . . . . . . . . . 46
81	     5.2   Extensions Registry  . . . . . . . . . . . . . . . . . . . 47
82	   6.  Security Considerations  . . . . . . . . . . . . . . . . . . . 48
83	   7.  Character Set Considerations . . . . . . . . . . . . . . . . . 49
84	   8.  Changes from RFC 3066  . . . . . . . . . . . . . . . . . . . . 50
85	   9.  References . . . . . . . . . . . . . . . . . . . . . . . . . . 54
86	     9.1   Normative References . . . . . . . . . . . . . . . . . . . 54
87	     9.2   Informative References . . . . . . . . . . . . . . . . . . 55
88	       Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . 56
89	   A.  Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 57
90	   B.  Examples of Language Tags (Informative)  . . . . . . . . . . . 58
91	       Intellectual Property and Copyright Statements . . . . . . . . 61

93	1.  Introduction

95	   Human beings on our planet have, past and present, used a number of
96	   languages.  There are many reasons why one would want to identify the
97	   language used when presenting or requesting information.

99	   Users' language preferences often need to be identified so that
100	   appropriate processing can be applied.  For example, the user's
101	   language preferences in a Web browser can be used to select Web pages
102	   appropriately.  Language preferences can also be used to select among
103	   tools (such as dictionaries) to assist in the processing or
104	   understanding of content in different languages.

106	   In addition, knowledge about the particular language used by some
107	   piece of information content might be useful or even required by some
108	   types of processing; for example spell-checking, computer-synthesized
109	   speech, Braille transcription, or high-quality print renderings.

111	   One means of indicating the language used is by labeling the
112	   information content with an identifier or "tag".  These tags can be
113	   used to specify user preferences when selecting information content,
114	   or for labeling additional attributes of content and associated
115	   resources.

117	   Tags can also be used to indicate additional language attributes of
118	   content.  For example, indicating specific information about the
119	   dialect, writing system, or orthography used in a document or
120	   resource may enable the user to obtain information in a form that
121	   they can understand, or may be important in processing or
122	   rendering the given content into an appropriate form or style.

124	   This document specifies a particular identifier mechanism (the
125	   language tag) and a registration function for values to be used to
126	   form tags.  It also defines a mechanism for private use values and
127	   future extension.

129	   This document replaces RFC 3066, which replaced RFC 1766.  For a list
130	   of changes in this document, see Section 8.

132	   The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
133	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
134	   document are to be interpreted as described in [RFC2119].

136	2.  The Language Tag

138	   The language tag always defines a language as used (which includes
139	   being spoken, written, signed, or otherwise signaled) by human beings
140	   for communication of information to other human beings.  Computer
141	   languages such as programming languages are explicitly excluded.

143	2.1  Syntax

145	   The language tag is composed of one or more parts or "subtags".  Each
146	   subtag consists of a sequence of alpha-numeric characters.  Subtags
147	   are distinguished and separated from one another by a hyphen ("-",
148	   ABNF %x2D).  A language tag consists of a "primary language" subtag
149	   and a (possibly empty) series of subsequent subtags, each of which
150	   refines or narrows the range of language identified by the overall
151	   tag.

153	   Each type of subtag is distinguished by length, position in the tag,
154	   and content: subtags can be recognized solely by these features.
155	   This makes it possible to construct a parser that can extract and
156	   assign some semantic information to the subtags, even if the specific
157	   subtag values are not recognized.  Thus a parser need not have an up-
158	   to-date copy (or any copy at all) of the subtag registry to perform
159	   most searching and matching operations.
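   As a minimal sketch of this idea, the following Python labels each
   subtag of a tag using only its length, position, and content, following
   the ABNF in Figure 1. The function name classify_subtags and its label
   strings are illustrative, not part of this document. It does not handle
   grandfathered tags such as "i-enochian" and does not enforce every
   length limit, so it is not a full validator.

```python
def _is_ascii_alpha(s: str) -> bool:
    return s.isascii() and s.isalpha()


def _is_ascii_alnum(s: str) -> bool:
    return s.isascii() and s.isalnum()


def classify_subtags(tag: str) -> list[tuple[str, str]]:
    """Label each subtag by length, position, and content alone (no registry).

    Grandfathered tags such as "i-enochian" are out of scope for this sketch.
    """
    labels: list[tuple[str, str]] = []
    stage = 0       # 1 = after lang/extlang, 3 = after script,
                    # 4 = after region/variant, 5 = inside an extension sequence
    extlangs = 0
    private = False
    for i, sub in enumerate(tag.split("-")):
        n = len(sub)
        if private:
            labels.append((sub, "private"))          # everything after 'x'
        elif sub.lower() == "x":
            private = True
            labels.append((sub, "private-intro"))
        elif i == 0:
            if not (_is_ascii_alpha(sub) and 2 <= n <= 8):
                raise ValueError(f"unsupported first subtag: {sub!r}")
            labels.append((sub, "language"))
            stage = 1
        elif n == 1 and _is_ascii_alnum(sub):
            labels.append((sub, "extension-intro"))  # singleton other than 'x'
            stage = 5
        elif stage == 5:
            labels.append((sub, "extension"))
        elif stage == 1 and extlangs < 3 and n == 3 and _is_ascii_alpha(sub):
            extlangs += 1
            labels.append((sub, "extlang"))
        elif stage <= 1 and n == 4 and _is_ascii_alpha(sub):
            labels.append((sub, "script"))
            stage = 3
        elif stage <= 3 and ((n == 2 and _is_ascii_alpha(sub))
                             or (n == 3 and sub.isascii() and sub.isdigit())):
            labels.append((sub, "region"))
            stage = 4
        elif _is_ascii_alnum(sub) and (5 <= n <= 8 or (n == 4 and sub[0].isdigit())):
            labels.append((sub, "variant"))
            stage = 4
        else:
            raise ValueError(f"cannot classify subtag {sub!r} at position {i}")
    return labels
```

   For example, classify_subtags("zh-Hant-TW") labels 'Hant' as a script
   and 'TW' as a region without consulting the registry.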

161	   The syntax of the language tag in ABNF [RFC2234bis] is:

163	   Language-Tag = (lang
164	                   *3("-" extlang)
165	                   ["-" script]
166	                   ["-" region]
167	                   *("-" variant)
168	                   *("-" extension)
169	                   ["-" privateuse])
170	                   / privateuse         ; private use tag
171	                   / grandfathered      ; grandfathered registrations

173	   lang            = 2*4ALPHA           ; shortest ISO 639 code
174	                   / registered-lang
175	   extlang         = 3ALPHA             ; reserved for future use
176	   script          = 4ALPHA             ; ISO 15924 code
177	   region          = 2ALPHA             ; ISO 3166 code
178	                   / 3DIGIT             ; UN country number
179	   variant         =  5*8alphanum       ; registered variants
180	                   / ( DIGIT 3alphanum )
181	   extension       = singleton 1*("-" (2*8alphanum))
182	   privateuse      = ("x"/"X") 1*("-" (1*8alphanum))
183	   singleton       = %x41-57 / %x59-5A / %x61-77 / %x79-7A / DIGIT
184	                   ; "a"-"w" / "y"-"z" / "A"-"W" / "Y"-"Z" / "0"-"9"
185	                   ; Single letters: x/X is reserved for private use
186	   registered-lang = 4*8ALPHA          ; registered language subtag
187	   grandfathered   = 1*3ALPHA 1*2("-" (2*8alphanum))
188	                                       ; grandfathered registration
189	                                       ; Note: i is the only singleton
190	                                       ; that starts a grandfathered tag
191	   alphanum        = (ALPHA / DIGIT)   ; letters and numbers

193	                        Figure 1: Language Tag ABNF

195	   Note: There is a subtlety in the ABNF for 'variant': variants
196	   starting with a digit MAY be four characters long, while those
197	   starting with a letter MUST be at least five characters long.
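   One way to check that a string is well-formed according to Figure 1 is
   to translate each ABNF rule into a regular expression fragment, as in
   the Python sketch below (the constant and function names are
   illustrative). Note that the 'grandfathered' production is broad: it
   also accepts strings such as "en-Latn-Cyrl", so an implementation would
   presumably also compare such tags against the actual list of
   grandfathered registrations (Section 2.2.8).

```python
import re

ALPHA, DIGIT, ALNUM = "[A-Za-z]", "[0-9]", "[A-Za-z0-9]"

# One regular-expression fragment per ABNF rule in Figure 1.
LANG = f"{ALPHA}{{2,8}}"                  # 2*4ALPHA / registered-lang (4*8ALPHA)
EXTLANG = f"{ALPHA}{{3}}"
SCRIPT = f"{ALPHA}{{4}}"
REGION = f"(?:{ALPHA}{{2}}|{DIGIT}{{3}})"
VARIANT = f"(?:{ALNUM}{{5,8}}|{DIGIT}{ALNUM}{{3}})"
SINGLETON = "[0-9A-WYZa-wyz]"             # any letter or digit except x/X
EXTENSION = f"{SINGLETON}(?:-{ALNUM}{{2,8}})+"
PRIVATEUSE = f"[xX](?:-{ALNUM}{{1,8}})+"
GRANDFATHERED = f"{ALPHA}{{1,3}}(?:-{ALNUM}{{2,8}}){{1,2}}"

LANGTAG = (
    f"{LANG}(?:-{EXTLANG}){{0,3}}(?:-{SCRIPT})?(?:-{REGION})?"
    f"(?:-{VARIANT})*(?:-{EXTENSION})*(?:-{PRIVATEUSE})?"
)
LANGUAGE_TAG_RE = re.compile(f"{LANGTAG}|{PRIVATEUSE}|{GRANDFATHERED}")


def is_well_formed(tag: str) -> bool:
    """True if the whole string matches the Language-Tag production."""
    return LANGUAGE_TAG_RE.fullmatch(tag) is not None
```

   Both letter cases appear in every character class, so the check is case
   insensitive, as Section 2.1 requires.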

199	   All subtags have a maximum length of eight characters and whitespace
200	   is not permitted in a language tag.  For examples of language tags,
201	   see Appendix B.

203	   Note that although [RFC2234bis] refers to octets, the language tags
204	   described in this document are sequences of characters from the US-
205	   ASCII repertoire.  Language tags MAY be used in documents and
206	   applications that use other encodings, so long as these encompass the
207	   US-ASCII repertoire.  An example of this would be an XML document
208	   that uses the UTF-16LE [RFC2781] encoding of [Unicode].

210	   The tags and their subtags, including private use and extensions, are
211	   to be treated as case insensitive: there exist conventions for the
212	   capitalization of some of the subtags, but these MUST NOT be taken to
213	   carry meaning.

215	   For example:

217	   o  [ISO639-1] recommends that language codes be written in lower case
218	      ('mn' Mongolian).

220	   o  [ISO3166] recommends that country codes be capitalized ('MN'
221	      Mongolia).

223	   o  [ISO15924] recommends that script codes use lower case with the
224	      initial letter capitalized ('Cyrl' Cyrillic).

226	   However, in the tags defined by this document, the uppercase US-ASCII
227	   letters in the range 'A' through 'Z' are considered equivalent and
228	   mapped directly to their US-ASCII lowercase equivalents in the range
229	   'a' through 'z'.  Thus the tag "mn-Cyrl-MN" is not distinct from "MN-
230	   cYRL-mn" or "mN-cYrL-Mn" (or any other combination) and each of these
231	   variations conveys the same meaning: Mongolian written in the
232	   Cyrillic script as used in Mongolia.
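   Because only the US-ASCII letters 'A' through 'Z' are folded, a
   comparison can map exactly those letters to lowercase and compare the
   results. The Python sketch below (function names are illustrative) also
   shows how an application might re-apply the ISO casing conventions
   listed above for display only, since that casing carries no meaning. It
   assumes that, before any singleton, a four-letter subtag after the
   first is a script and a two-letter subtag is a region.

```python
import string

# Map only US-ASCII 'A'-'Z' to 'a'-'z'; no other case mapping is applied.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def fold_case(tag: str) -> str:
    return tag.translate(_ASCII_LOWER)


def tags_equal(a: str, b: str) -> bool:
    """Case-insensitive comparison of two language tags."""
    return fold_case(a) == fold_case(b)


def conventional_case(tag: str) -> str:
    """Apply ISO display conventions ('mn', 'Cyrl', 'MN'); cosmetic only."""
    out = []
    after_singleton = False
    for i, sub in enumerate(fold_case(tag).split("-")):
        if len(sub) == 1:
            after_singleton = True        # 'x', 'i', or an extension: keep lowercase
        is_alpha = sub.isascii() and sub.isalpha()
        if i == 0 or after_singleton or not is_alpha:
            out.append(sub)
        elif len(sub) == 4:
            out.append(sub.capitalize())  # script, e.g. 'Cyrl'
        elif len(sub) == 2:
            out.append(sub.upper())       # region, e.g. 'MN'
        else:
            out.append(sub)
    return "-".join(out)
```

   With these helpers, "mn-Cyrl-MN" and "MN-cYRL-mn" compare equal, and
   conventional_case("MN-cYRL-mn") yields "mn-Cyrl-MN".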

234	2.2  Language Subtag Sources and Interpretation

236	   The namespace of language tags and their subtags is administered by
237	   the Internet Assigned Numbers Authority (IANA) [RFC2860] according to
238	   the rules in Section 5 of this document.  The registry maintained by
239	   IANA is the source for valid subtags: other standards referenced in
240	   this section provide the source material for that registry.

242	   Terminology in this section:

244	   o  Tag or tags refers to a complete language tag, such as
245	      "fr-Latn-CA".  Examples of tags in this document are enclosed in
246	      double-quotes ("en-US").

248	   o  Subtag refers to a specific section of a tag, delimited by hyphen,
249	      such as the subtag 'Latn' in "fr-Latn-CA".  Examples of subtags in
250	      this document are enclosed in single quotes ('Latn').

252	   o  Code or codes refers to values defined in external standards (and
253	      which are used as subtags in this document).  For example, 'Latn'
254	      is an [ISO15924] script code which was used to define the 'Latn'
255	      script subtag for use in a language tag.  Examples of codes in
256	      this document are enclosed in single quotes ('en', 'Latn').

258	   The definitions in this section apply to the various subtags within
259	   the language tags defined by this document, excepting those
260	   "grandfathered" tags defined in Section 2.2.8.

262	   Language tags are designed so that each subtag type has unique length
263	   and content restrictions.  These make identification of the subtag's
264	   type possible, even if the content of the subtag itself is
265	   unrecognized.  This allows tags to be parsed and processed without
266	   reference to the latest version of the underlying standards or the
267	   IANA registry and makes the associated exception handling when
268	   parsing tags simpler.

270	   Subtags in the IANA registry that do not come from an underlying
271	   standard can only appear in specific positions in a tag.
272	   Specifically, they can only occur as primary language subtags or as
273	   variant subtags.

275	   Note that sequences of private use and extension subtags MUST occur
276	   at the end of the sequence of subtags and MUST NOT be interspersed
277	   with subtags defined elsewhere in this document.

279	   Single letter and digit subtags are reserved for current or future
280	   use.  These include the following current uses:

282	   o  The single letter subtag 'x' is reserved to introduce a sequence
283	      of private use subtags.  The interpretation of any private use
284	      subtags is defined solely by private agreement and is not defined
285	      by the rules in this section or in any standard or registry
286	      defined in this document.

288	   o  All other single letter subtags are reserved to introduce
289	      standardized extension subtag sequences as described in
290	      Section 3.6.

292	   The single letter subtag 'i' is used by some grandfathered tags, such
293	   as "i-enochian", where it always appears in the first position and
294	   cannot be confused with an extension.
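   Since a singleton introduces a sequence that runs to the next singleton
   or to the end of the tag, and private use and extension sequences come
   last, a tag can be split into core subtags, extension sequences, and
   private use subtags in one left-to-right pass. The Python sketch below
   uses a hypothetical helper, split_tag; it assumes an ASCII tag and
   treats the leading 'i' of grandfathered tags as an ordinary core subtag
   rather than as an extension.

```python
def split_tag(tag: str) -> tuple[list[str], list[tuple[str, list[str]]], list[str]]:
    """Return (core subtags, [(singleton, subtags), ...], private use subtags).

    Subtags are lowercased, which is safe because tags are case insensitive.
    """
    subtags = tag.lower().split("-")
    core: list[str] = []
    extensions: list[tuple[str, list[str]]] = []
    private: list[str] = []
    for i, sub in enumerate(subtags):
        if sub == "x":
            private = subtags[i + 1:]        # 'x' ends parsing: the rest is private use
            break
        if len(sub) == 1 and i > 0:
            extensions.append((sub, []))     # start a new extension sequence
        elif extensions:
            extensions[-1][1].append(sub)    # continue the current extension
        else:
            core.append(sub)                 # language, script, region, variants
    return core, extensions, private
```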

296	2.2.1  Primary Language Subtag

298	   The primary language subtag is the first subtag in a language tag
299	   (with the exception of private use and certain grandfathered tags)
300	   and cannot be omitted.  The following rules apply to the primary
301	   language subtag:

303	   1.  All two character language subtags were defined in the IANA
304	       registry according to the assignments found in the standard ISO
305	       639 Part 1, "ISO 639-1:2002, Codes for the representation of
306	       names of languages -- Part 1: Alpha-2 code" [ISO639-1], or using
307	       assignments subsequently made by the ISO 639 Part 1 maintenance
308	       agency or governing standardization bodies.

310	   2.  All three character language subtags were defined in the IANA
311	       registry according to the assignments found in ISO 639 Part 2,
312	       "ISO 639-2:1998 - Codes for the representation of names of
313	       languages -- Part 2: Alpha-3 code - edition 1" [ISO639-2], or
314	       assignments subsequently made by the ISO 639 Part 2 maintenance
315	       agency or governing standardization bodies.

317	   3.  The subtags in the range 'qaa' through 'qtz' are reserved for
318	       private use in language tags.  These subtags correspond to codes
319	       reserved by ISO 639-2 for private use.  These codes MAY be used
320	       for non-registered primary-language subtags (instead of using
321	       private use subtags following 'x-').  Please refer to Section 4.5
322	       for more information on private use subtags.

324	   4.  All four character language subtags are reserved for possible
325	       future standardization.

327	   5.  All language subtags of 5 to 8 characters in length in the IANA
328	       registry were defined via the registration process in Section 3.4
329	       and MAY be used to form the primary language subtag.  At the time
330	       this document was created, there were no examples of this kind of
331	       subtag and future registrations of this type will be discouraged:
332	       primary languages are strongly RECOMMENDED for registration with
333	       ISO 639 and proposals rejected by ISO 639/RA will be closely
334	       scrutinized before they are registered with IANA.

336	   6.  The single character subtag 'x' as the primary subtag indicates
337	       that the language tag consists solely of subtags whose meaning is
338	       defined by private agreement.  For example, in the tag "x-fr-CH",
339	       the subtags 'fr' and 'CH' SHOULD NOT be taken to represent the
340	       French language or the country of Switzerland (or any other value
341	       in the IANA registry) unless there is a private agreement in
342	       place to do so.  See Section 4.5.

344	   7.  The single character subtag 'i' is used by some grandfathered
345	       tags (see Section 2.2.8) such as "i-klingon" and "i-bnn".  (Other
346	       grandfathered tags have a primary language subtag in their first
347	       position.)

349	   8.  Other values MUST NOT be assigned to the primary subtag except by
350	       revision or update of this document.
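   The rules above make it possible to tell, from the primary language
   subtag alone, which rule governs its value. The Python sketch below
   (primary_subtag_source and its return strings are illustrative) encodes
   rules 1 through 7; it reports only which rule applies and does not
   check whether a particular code is actually present in the IANA
   registry.

```python
def primary_subtag_source(subtag: str) -> str:
    """Report which rule of Section 2.2.1 governs a primary language subtag."""
    s = subtag.lower()
    if s == "x":
        return "private use tag"            # rule 6
    if s == "i":
        return "grandfathered"              # rule 7
    if not (s.isascii() and s.isalpha()) or not 2 <= len(s) <= 8:
        raise ValueError(f"not a primary language subtag: {subtag!r}")
    if len(s) == 2:
        return "ISO 639-1"                  # rule 1
    if len(s) == 3:
        # rule 3 ('qaa'..'qtz' is reserved for private use), otherwise rule 2
        return "private use" if "qaa" <= s <= "qtz" else "ISO 639-2"
    if len(s) == 4:
        return "reserved"                   # rule 4
    return "registered"                     # rule 5: 5 to 8 letters
```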

352	   Note: For languages that have both an ISO 639-1 two character code
353	   and an ISO 639-2 three character code, only the ISO 639-1 two
354	   character code is defined in the IANA registry.

356	   Note: For languages that have no ISO 639-1 two character code and for
357	   which the ISO 639-2/T (Terminology) code and the ISO 639-2/B
358	   (Bibliographic) codes differ, only the Terminology code is defined in
359	   the IANA registry.  At the time this document was created, all
360	   languages that had both kinds of three character code were also
361	   assigned a two character code; it is not expected that future
362	   assignments of this nature will occur.

364	   Note: To avoid problems with versioning and subtag choice as
365	   experienced during the transition between RFC 1766 and RFC 3066, and
366	   to preserve the canonical nature of subtags defined by this document,
367	   the ISO 639 Registration Authority Joint Advisory Committee (ISO 639/
368	   RA-JAC) has included the following statement in [iso639.principles]:

370	   "A language code already in ISO 639-2 at the point of freezing ISO
371	   639-1 shall not later be added to ISO 639-1.  This is to ensure
372	   consistency in usage over time, since users are directed in Internet
373	   applications to employ the alpha-3 code when an alpha-2 code for that
374	   language is not available."

376	   In order to avoid instability of the canonical form of tags, if a two
377	   character code is added to ISO 639-1 for a language for which a three
378	   character code was already included in ISO 639-2, the two character
379	   code will not be added as a subtag in the registry.  See Section 3.3.

381	   For example, if some content were tagged with 'haw' (Hawaiian), which
382	   currently has no two character code, the tag would not be invalidated
383	   if ISO 639-1 were to assign a two character code to the Hawaiian
384	   language at a later date.
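   The notes above amount to a simple selection rule for which ISO 639
   code becomes the registry subtag. The Python sketch below assumes a
   hypothetical Iso639Entry record; its alpha2_added_later flag stands for
   the case described above, in which a two character code is assigned
   after the language already had an ISO 639-2 code. The French and
   Hawaiian three character codes are real; the two character code "hw"
   used for Hawaiian is purely hypothetical.

```python
from typing import NamedTuple, Optional


class Iso639Entry(NamedTuple):
    alpha2: Optional[str]             # ISO 639-1 code, if any
    alpha3_t: str                     # ISO 639-2/T (Terminology) code
    alpha3_b: str                     # ISO 639-2/B (Bibliographic) code
    alpha2_added_later: bool = False  # alpha-2 assigned after an ISO 639-2 code existed


def registry_subtag(entry: Iso639Entry) -> str:
    """Choose the code that is defined as a subtag in the IANA registry."""
    if entry.alpha2 and not entry.alpha2_added_later:
        return entry.alpha2           # prefer the two character code
    return entry.alpha3_t             # otherwise the Terminology code, never /B
```

   Under these assumptions French maps to 'fr', and Hawaiian stays 'haw'
   even if a two character code were assigned later, so content already
   tagged "haw" remains valid.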

386	   For example, one of the grandfathered IANA registrations is
387	   "i-enochian".  The subtag 'enochian' could be registered in the IANA
388	   registry as a primary language subtag (assuming that ISO 639 does not
389	   register this language first), making tags such as "enochian-AQ" and
390	   "enochian-Latn" valid.

392	2.2.2  Extended Language Subtags

394	   The following rules apply to the extended language subtags:

396	   1.  Three letter subtags immediately following the primary subtag are
397	       reserved for future standardization, anticipating work that is
398	       currently under way on ISO 639.

400	   2.  Extended language subtags MUST follow the primary subtag and
401	       precede any other subtags.

403	   3.  There MAY be up to three extended language subtags.

405	   4.  Extended language subtags MUST NOT be registered or used to form
406	       language tags.  Their syntax is described here so that
407	       implementations can be compatible with any future revision of
408	       this document which does provide for their registration.

410	   Extended language subtag records, once they appear in the registry,
411	   MUST include exactly one 'Prefix' field indicating an appropriate
412	   language subtag or sequence of subtags that MUST always appear as a
413	   prefix to the extended language subtag.

415	   Example: In a future revision or update of this document, the tag
416	   "zh-gan" (registered under RFC 3066) might become a valid non-
417	   grandfathered (that is, redundant) tag in which the subtag 'gan'
418	   might represent the Chinese dialect 'Gan'.
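   Because extended language subtags MUST NOT currently be registered or
   used, the following Python sketch only illustrates how a future
   implementation might enforce the 'Prefix' requirement above. The
   dictionary record format and the 'gan' record (taken from the
   hypothetical "zh-gan" example) are assumptions, not registry content.

```python
def check_extlang_prefix(tag: str, record: dict) -> bool:
    """Check that the subtags before an extlang begin with the record's Prefix.

    `record` is a hypothetical parsed registry record such as
    {"Type": "extlang", "Subtag": "gan", "Prefix": ["zh"]}.
    """
    prefixes = record.get("Prefix", [])
    if len(prefixes) != 1:
        raise ValueError("an extlang record must have exactly one Prefix field")
    prefix = prefixes[0].lower().split("-")
    subtags = tag.lower().split("-")
    extlang = record["Subtag"].lower()
    if extlang not in subtags[1:]:
        raise ValueError(f"tag does not use the extlang subtag {extlang!r}")
    position = subtags.index(extlang, 1)   # first occurrence after the primary subtag
    return subtags[:position][:len(prefix)] == prefix
```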

420	2.2.3  Script Subtag

422	   Script subtags are used to indicate the script or writing system
423	   variations that distinguish the written forms of a language or its
424	   dialects.  The following rules apply to the script subtags:

426	   1.  All four character subtags were defined according to
427	       [ISO15924]--"Codes for the representation of the names of
428	       scripts": alpha-4 script codes, or subsequently assigned by the
429	       ISO 15924 maintenance agency or governing standardization bodies,
430	       denoting the script or writing system used in conjunction with
431	       this language.

433	   2.  Script subtags MUST immediately follow the primary language
434	       subtag and all extended language subtags and MUST occur before
435	       any other type of subtag described below.

437	   3.  The script subtags 'Qaaa' through 'Qabx' are reserved for private
438	       use in language tags.  These subtags correspond to codes reserved
439	       by ISO 15924 for private use.  These codes MAY be used for non-
440	       registered script values.  Please refer to Section 4.5 for more
441	       information on private use subtags.

443	   4.  Script subtags cannot be registered using the process in
444	       Section 3.4 of this document.  Variant subtags MAY be considered
445	       for registration for that purpose.

447	   5.  There MUST be at most one script subtag in a language tag and the
448	       script subtag SHOULD be omitted when it adds no distinguishing
449	       value to the tag or when the primary language subtag's record
450	       includes a Suppress-Script field listing the applicable script
451	       subtag.

453	   Example: "sr-Latn" represents Serbian written using the Latin script.
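   Rules 3 and 5 translate into two small checks. The Python sketch below
   (function names are illustrative) tests whether a script subtag falls
   in the private use range 'Qaaa' through 'Qabx', and drops a script
   subtag that merely repeats the primary language's Suppress-Script
   value. The one-entry SUPPRESS_SCRIPT table is an assumed excerpt; an
   implementation would read the Suppress-Script fields from the IANA
   registry itself.

```python
# Assumed excerpt: primary language subtag -> its Suppress-Script value.
SUPPRESS_SCRIPT = {"en": "latn"}


def _is_alpha(s: str) -> bool:
    return s.isascii() and s.isalpha()


def is_private_use_script(subtag: str) -> bool:
    s = subtag.lower()
    return len(s) == 4 and _is_alpha(s) and "qaaa" <= s <= "qabx"


def drop_suppressed_script(tag: str, suppress: dict[str, str] = SUPPRESS_SCRIPT) -> str:
    """Remove the script subtag when it equals the language's Suppress-Script."""
    subtags = tag.split("-")
    i = 1
    while i < len(subtags) and len(subtags[i]) == 3 and _is_alpha(subtags[i]):
        i += 1                        # skip any extended language subtags
    if i < len(subtags) and len(subtags[i]) == 4 and _is_alpha(subtags[i]):
        if suppress.get(subtags[0].lower()) == subtags[i].lower():
            del subtags[i]
    return "-".join(subtags)
```

   With this table, "en-Latn-US" becomes "en-US", while "sr-Latn" is kept
   because the script distinguishes it.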

455	2.2.4  Region Subtag

457	   Region subtags are used to indicate linguistic variations associated
458	   with or appropriate to a specific country, territory, or region.
459	   Typically, a region subtag is used to indicate regional dialects or
460	   usage, or region-specific spelling conventions.  A region subtag can
461	   also be used to indicate that content is expressed in a way that is
462	   appropriate for use throughout a region; for instance, Spanish
463	   content tailored to be useful throughout Latin America.

465	   The following rules apply to the region subtags:

467	   1.  Region subtags MUST follow any language, extended language, or
468	       script subtags and MUST precede all other subtags.

470	   2.  All two character subtags following the primary subtag were
471	       defined in the IANA registry according to the assignments found
472	       in [ISO3166]--"Codes for the representation of names of countries
473	       and their subdivisions - Part 1: Country codes"--alpha-2 country
474	       codes or assignments subsequently made by the ISO 3166
475	       maintenance agency or governing standardization bodies.

477	   3.  All three character subtags consisting of digit (numeric)
478	       characters following the primary subtag were defined in the IANA
479	       registry according to the assignments found in UN Standard
480	       Country or Area Codes for Statistical Use [UN_M.49] or
481	       assignments subsequently made by the governing standards body.
482	       Note that not all of the UN M.49 codes are defined in the IANA
483	       registry.  The following rules define which codes are entered
484	       into the registry as valid subtags:

486	       A.  UN numeric codes assigned to 'macro-geographical
487	           (continental)' or sub-regions MUST be registered in the
488	           registry.  These codes are not associated with an assigned
489	           ISO 3166 alpha-2 code and represent supra-national areas,
490	           usually covering more than one nation, state, province, or
491	           territory.

493	       B.  UN numeric codes for 'economic groupings' or 'other
494	           groupings' MUST NOT be registered in the IANA registry and
495	           MUST NOT be used to form language tags.

497	       C.  UN numeric codes for countries or areas with ambiguous ISO
498	           3166 alpha-2 codes, when entered into the registry, MUST be
499	           defined according to the rules in Section 3.3 and MUST be
500	           used to form language tags that represent the country or
501	           region for which they are defined.

503	       D.  UN numeric codes for countries or areas for which there is an
504	           associated ISO 3166 alpha-2 code in the registry MUST NOT be
505	           entered into the registry and MUST NOT be used to form
506	           language tags.  Note that the ISO 3166-based subtag in the
507	           registry MUST actually be associated with the UN M.49 code in
508	           question.

510	       E.  UN numeric codes and ISO 3166 alpha-2 codes for countries or
511	           areas listed as eligible for registration in [initial-
512	           registry] but not presently registered MAY be entered into
513	           the IANA registry via the process described in Section 3.4.
514	           Once registered, these codes MAY be used to form language
515	           tags.

517	       F.  All other UN numeric codes for countries or areas which do
518	           not have an associated ISO 3166 alpha-2 code MUST NOT be
519	           entered into the registry and MUST NOT be used to form
520	           language tags.  For more information about these codes, see
521	           Section 3.3.

523	   4.  Note: The alphanumeric codes in Appendix X of the UN document
524	       MUST NOT be entered into the registry and MUST NOT be used to
525	       form language tags.  (At the time this document was created,
526	       these values matched the ISO 3166 alpha-2 codes.)

528	   5.  There MUST be at most one region subtag in a language tag and the
529	       region subtag MAY be omitted, as when it adds no distinguishing
530	       value to the tag.

532	   6.  The region subtags 'AA', 'QM'-'QZ', 'XA'-'XZ', and 'ZZ' are
533	       reserved for private use in language tags.  These subtags
534	       correspond to codes reserved by ISO 3166 for private use.  These
535	       codes MAY be used for private use region subtags (instead of
536	       using a private use subtag sequence).  Please refer to
537	       Section 4.5 for more information on private use subtags.

539	   "de-CH" represents German ('de') as used in Switzerland ('CH').

541	   "sr-Latn-CS" represents Serbian ('sr') written using Latin script
542	   ('Latn') as used in Serbia and Montenegro ('CS').

544	   "es-419" represents Spanish ('es') appropriate to the UN-defined
545	   Latin America and Caribbean region ('419').
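   The following Python sketch, offered as an illustration rather than
   a normative algorithm, classifies a region subtag by its form
   according to rules 2, 3, and 6 above.  It does not consult the
   registry, so it cannot tell whether a given UN M.49 code is actually
   registered under rules 3.A through 3.F.  The function names are
   hypothetical.

```python
import re


def is_private_use_region(subtag: str) -> bool:
    """Rule 6: 'AA', 'QM'..'QZ', 'XA'..'XZ', and 'ZZ' are private use."""
    s = subtag.upper()
    if not re.fullmatch(r"[A-Z]{2}", s):
        return False
    return s in ("AA", "ZZ") or s[0] == "X" or (s[0] == "Q" and "M" <= s[1] <= "Z")


def region_kind(subtag: str) -> str:
    """Classify a region subtag by its form only (no registry lookup)."""
    if re.fullmatch(r"[A-Za-z]{2}", subtag):
        # Two letters: ISO 3166 alpha-2 form (rule 2), unless reserved (rule 6).
        return "private-use" if is_private_use_region(subtag) else "iso3166-alpha2"
    if re.fullmatch(r"[0-9]{3}", subtag):
        # Three digits: UN M.49 form (rule 3); only some M.49 codes are registered.
        return "un-m49"
    raise ValueError(f"{subtag!r} does not have the form of a region subtag")


for example in ("CH", "CS", "419", "QM", "xa"):
    print(example, region_kind(example))
```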

547	2.2.5  Variant Subtags

549	   Variant subtags are used to indicate additional, well-recognized
550	   variations that define a language or its dialects which are not
551	   covered by other available subtags.  The following rules apply to the
552	   variant subtags:

554	   1.  Variant subtags are not associated with any external standard.
555	       Variant subtags and their meanings are defined by the
556	       registration process defined in Section 3.4.

558	   2.  Variant subtags MUST follow all of the other defined subtags, but
559	       precede any extension or private use subtag sequences.

561	   3.  More than one variant MAY be used to form the language tag.

563	   4.  Variant subtags MUST be registered with IANA according to the
564	       rules in Section 3.4 of this document before being used to form
565	       language tags.  In order to distinguish variants from other types
566	       of subtags, registrations MUST meet the following length and
567	       content restrictions:

569	       1.  Variant subtags that begin with a letter (a-z, A-Z) MUST be
570	           at least five characters long.

572	       2.  Variant subtags that begin with a digit (0-9) MUST be at
573	           least four characters long.
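   As a minimal sketch of the length and content restrictions in rule 4,
   a form check for a variant subtag might look like the following.  It
   assumes the general limit of eight letters or digits per subtag from
   the tag ABNF.  Registration with IANA (Section 3.4) must still be
   checked separately.  The function name is hypothetical.

```python
import re


def is_variant_form(subtag: str) -> bool:
    """Check rule 4's length and content restrictions on a variant subtag."""
    # Assumption: subtags are one to eight ASCII letters or digits.
    if not re.fullmatch(r"[A-Za-z0-9]{1,8}", subtag):
        return False
    if subtag[0].isdigit():
        return len(subtag) >= 4   # digit-initial: at least four characters
    return len(subtag) >= 5       # letter-initial: at least five characters
```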

575	   Variant subtag records in the language subtag registry MAY include
576	   one or more 'Prefix' fields, each of which indicates a language tag
577	   that would make a suitable prefix (with other subtags, as
578	   appropriate) in forming a language tag with the variant.  For
579	   example, the subtag 'nedis' has a Prefix of "sl", making it suitable
580	   to form language tags such as "sl-nedis" and "sl-IT-nedis", but not
581	   suitable for use in a tag such as "zh-nedis" or "it-IT-nedis".

583	   "sl-nedis" represents the Natisone or Nadiza dialect of Slovenian.

585	   "de-CH-1996" represents German as used in Switzerland and as written
586	   using the spelling reform beginning in the year 1996 C.E.

588	   Most variants that share a prefix are mutually exclusive.  For
589	   example, the German orthographic variations '1996' and '1901' SHOULD
590	   NOT be used in the same tag, as they represent the dates of different
591	   spelling reforms.  A variant that can meaningfully be used in
592	   combination with another variant SHOULD include a 'Prefix' field in
593	   its registry record that lists that other variant.  For example, if
594	   another German variant 'example' were created that made sense to use
595	   with '1996', then 'example' should include two Prefix fields: "de"
596	   and "de-1996".

598	2.2.6  Extension Subtags

600	   Extensions provide a mechanism for extending language tags for use in
601	   various applications.  See Section 3.6.  The following rules apply
602	   to extensions:

604	   1.   Extension subtags are separated from the other subtags defined
605	        in this document by a single-letter subtag ("singleton").  The
606	        singleton MUST be one allocated to a registration authority via
607	        the mechanism described in Section 3.6 and cannot be the letter
608	        'x', which is reserved for private use subtag sequences.

610	   2.   Note: Private use subtag sequences starting with the singleton
611	        subtag 'x' are described below.

613	   3.   An extension MUST follow at least a primary language subtag.
614	        That is, a language tag cannot begin with an extension.
615	        Extensions extend language tags; they do not override or replace
616	        them.  For example, "a-value" is not a well-formed language tag,
617	        while "de-a-value" is.

619	   4.   Each singleton subtag MUST appear at most one time in each tag
620	        (other than as a private use subtag).  That is, singleton
621	        subtags MUST NOT be repeated.  For example, the tag "en-a-bbb-a-
622	        ccc" is invalid because the subtag 'a' appears twice.  Note that
623	        the tag "en-a-bbb-x-a-ccc" is valid because the second
624	        appearance of the singleton 'a' is in a private use sequence.

626	   5.   Extension subtags MUST meet all of the requirements for the
627	        content and format of subtags defined in this document.

629	   6.   Extension subtags MUST meet whatever requirements are set by the
630	        document that defines their singleton prefix and whatever
631	        requirements are provided by the maintaining authority.

633	   7.   Each extension subtag MUST be from two to eight characters long
634	        and consist solely of letters or digits, with each subtag
635	        separated by a single '-'.

637	   8.   Each singleton MUST be followed by at least one extension
638	        subtag.  For example, the tag "tlh-a-b-foo" is invalid because
639	        the first singleton 'a' is followed immediately by another
640	        singleton 'b'.

642	   9.   Extension subtags MUST follow all language, extended language,
643	        script, region and variant subtags in a tag.

645	   10.  All subtags following the singleton and before another singleton
646	        are part of the extension.  Example: In the tag "fr-a-Latn", the
647	        subtag 'Latn' does not represent the script subtag 'Latn'
648	        defined in the IANA Language Subtag Registry.  Its meaning is
649	        defined by the extension 'a'.

651	   11.  In the event that more than one extension appears in a single
652	        tag, the tag SHOULD be canonicalized as described in
653	        Section 4.4.

655	   For example, if the prefix singleton 'r' and the shown subtags were
656	   defined, then the following tag would be a valid example: "en-Latn-
657	   GB-boont-r-extended-sequence-x-private"
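   The rules above can be sketched in Python as a small parser that
   returns the extensions of a tag, keyed by singleton.  This is a
   hypothetical helper, not a complete well-formedness check.  It does
   not validate the subtags before the first singleton, and it assumes
   grandfathered tags have already been recognized before it is called.

```python
import re

EXT_SUBTAG = re.compile(r"[A-Za-z0-9]{2,8}")  # rule 7


def parse_extensions(tag: str) -> dict[str, list[str]]:
    """Return {singleton: [subtags]} for the extensions in a tag.

    Enforces rules 3, 4, 7, and 8 of this section and stops at the
    private use singleton 'x'.
    """
    extensions: dict[str, list[str]] = {}
    current = None
    for index, subtag in enumerate(tag.split("-")):
        lowered = subtag.lower()
        if lowered == "x":
            break  # everything after 'x' is private use (Section 2.2.7)
        if len(subtag) == 1:
            if index == 0:
                raise ValueError("a tag cannot begin with an extension")  # rule 3
            if lowered in extensions:
                raise ValueError(f"singleton {lowered!r} repeated")  # rule 4
            if current is not None and not extensions[current]:
                raise ValueError(f"singleton {current!r} has no subtags")  # rule 8
            current = lowered
            extensions[current] = []
        elif current is not None:
            if not EXT_SUBTAG.fullmatch(subtag):
                raise ValueError(f"bad extension subtag {subtag!r}")  # rule 7
            extensions[current].append(subtag)
    if current is not None and not extensions[current]:
        raise ValueError(f"singleton {current!r} has no subtags")  # rule 8
    return extensions
```

   For example, the tag above yields {'r': ['extended', 'sequence']},
   and "en-a-bbb-x-a-ccc" yields {'a': ['bbb']}, because the second 'a'
   lies inside the private use sequence.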

659	2.2.7  Private Use Subtags

661	   Private use subtags are used to indicate distinctions in language
662	   important in a given context by private agreement.  The following
663	   rules apply to private use subtags:

665	   1.  Private use subtags are separated from the other subtags defined
666	       in this document by the reserved single-character subtag 'x'.

668	   2.  Private use subtags MUST conform to the format and content
669	       constraints defined in the ABNF for all subtags.

671	   3.  Private use subtags MUST follow all language, extended language,
672	       script, region, variant, and extension subtags in the tag.
673	       Another way of saying this is that all subtags following the
674	       singleton 'x' MUST be considered private use.  Example: The
675	       subtag 'US' in the tag "en-x-US" is a private use subtag.

677	   4.  A tag MAY consist entirely of private use subtags.

679	   5.  No source is defined for private use subtags.  Use of private use
680	       subtags is by private agreement only.

682	   6.  Private use subtags are NOT RECOMMENDED where alternatives exist
683	       or for general interchange.  See Section 4.5 for more information
684	       on private use subtag choice.

686	   For example, users who wish to use codes from the Ethnologue
687	   publication of SIL International for language identification might
688	   agree to exchange tags such as "az-Arab-x-AZE-derbend".  This example
689	   contains two private use subtags.  The first is 'AZE' and the second
690	   is 'derbend'.
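   A minimal sketch of rule 3: every subtag after the first 'x'
   singleton is private use.  The check that each private use subtag
   has one to eight letters or digits, and that at least one follows
   'x', is an assumption about the ABNF referred to in rule 2.  The
   function name is hypothetical.

```python
import re

PRIVATE_SUBTAG = re.compile(r"[A-Za-z0-9]{1,8}")  # assumed ABNF constraint


def private_use_subtags(tag: str) -> list[str]:
    """Return the subtags after the first 'x' singleton, or [] if none."""
    subtags = tag.split("-")
    for index, subtag in enumerate(subtags):
        if subtag.lower() == "x":
            private = subtags[index + 1:]
            if not private or not all(PRIVATE_SUBTAG.fullmatch(s) for s in private):
                raise ValueError(f"malformed private use sequence in {tag!r}")
            return private
    return []
```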

692	2.2.8  Pre-Existing RFC 3066 Registrations

694	   Existing IANA-registered language tags from RFC 1766 and/or RFC 3066
695	   maintain their validity.  IANA will maintain these tags in the
696	   registry under either the "grandfathered" or "redundant" type.  For
697	   more information see Section 3.7.

699	   It is important to note that all language tags formed under the
700	   guidelines in this document were either legal, well-formed tags or
701	   could have been registered under RFC 3066.

703	2.2.9  Classes of Conformance

705	   Implementations sometimes need to describe their capabilities with
706	   regard to the rules and practices described in this document.  There
707	   are two classes of conforming implementations described by this
708	   document: "well-formed" processors and "validating" processors.
709	   Claims of conformance SHOULD explicitly reference one of these
710	   definitions.

712	   An implementation that claims to check for well-formed language tags
713	   MUST:

715	   o  Check that the tag and all of its subtags, including extension and
716	      private use subtags, conform to the ABNF or that the tag is on the
717	      list of grandfathered tags.

719	   o  Check that singleton subtags that identify extensions do not
720	      repeat.  For example, the tag "en-a-xx-b-yy-a-zz" is not well-
721	      formed.

723	   Well-formed processors are strongly encouraged to implement the
724	   canonicalization rules contained in Section 4.4.

726	   An implementation that claims to be validating MUST:

728	   o  Check that the tag is well-formed.

730	   o  Specify the particular registry date for which the implementation
731	      performs validation of subtags.

733	   o  Check that either the tag is a grandfathered tag, or that all
734	      language, script, region, and variant subtags consist of valid
735	      codes for use in language tags according to the IANA registry as
736	      of the particular date specified by the implementation.

738	   o  Specify which, if any, extension RFCs as defined in Section 3.6
739	      are supported, including version, revision, and date.

741	   o  For any such extensions supported, check that all subtags used in
742	      that extension are valid.

744	   o  For variant and extended language subtags, if the registry
745	      contains one or more 'Prefix' fields for that subtag, check that
746	      the tag matches at least one prefix.  The tag matches if all the
747	      subtags in the 'Prefix' also appear in the tag.  For example, the
748	      prefix "es-CO" matches the tag "es-Latn-CO-x-private" because both
749	      the 'es' language subtag and 'CO' region subtag appear in the tag.
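   The 'Prefix' check in the last item can be sketched as follows.  The
   sketch assumes that only subtags before the first singleton (that
   is, outside extensions and private use sequences) count as appearing
   in the tag.  Under that assumption, "es-x-CO" does not match the
   prefix "es-CO".  The function names are hypothetical.

```python
def core_subtags(tag: str) -> list[str]:
    """Lowercased subtags before the first singleton (extension or 'x')."""
    core = []
    for subtag in tag.lower().split("-"):
        if len(subtag) == 1:
            break
        core.append(subtag)
    return core


def prefix_matches(prefix: str, tag: str) -> bool:
    """True if every subtag of the 'Prefix' also appears in the tag."""
    present = set(core_subtags(tag))
    return all(subtag in present for subtag in prefix.lower().split("-"))


def satisfies_prefixes(tag: str, prefixes: list[str]) -> bool:
    """No 'Prefix' fields means no constraint; otherwise one must match."""
    return not prefixes or any(prefix_matches(p, tag) for p in prefixes)
```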

751	3.  Registry Format and Maintenance

753	   This section defines the Language Subtag Registry and the maintenance
754	   and update procedures associated with it.

756	   The language subtag registry will be maintained so that, except for
757	   extension subtags, it is possible to validate all of the subtags that
758	   appear in a language tag under the provisions of this document or its
759	   revisions or successors.  In addition, the meaning of the various
760	   subtags will be unambiguous and stable over time.  (The meaning of
761	   private use subtags, of course, is not defined by the IANA registry.)

763	   The registry defined under this document contains a comprehensive
764	   list of all of the subtags valid in language tags.  This allows
765	   implementers a straightforward and reliable way to validate language
766	   tags.

768	3.1  Format of the IANA Language Subtag Registry

770	   The IANA Language Subtag Registry ("the registry") will consist of a
771	   text file that is machine readable in the format described in this
772	   section, plus copies of the registration forms approved by the
773	   Language Subtag Reviewer in accordance with the process described in
774	   Section 3.4.  With the exception of the registration forms for
775	   grandfathered and redundant tags, no registration records will be
776	   maintained for the initial set of subtags.

778	   The registry will be in a modified record-jar format text file
779	   [record-jar].  Lines are limited to 72 characters, including all
780	   whitespace.

782	   Records are separated by lines containing only the sequence "%%"
783	   (%x25.25).

785	   Each field can be viewed as a single, logical line of ASCII
786	   characters, comprising a field-name and a field-body separated by a
787	   COLON character (%x3A).  For convenience, the field-body portion of
788	   this conceptual entity can be split into a multiple-line
789	   representation; this is called "folding".  The format of the registry
790	   is described by the following ABNF (per [RFC2234bis]):

792	   registry   = record *("%%" CRLF record)
793	   record     = 1*( field-name *SP ":" *SP field-body CRLF )
794	   field-name = *(ALPHA / DIGIT / "-")
795	   field-body = *(ASCCHAR/LWSP)
796	   ASCCHAR    = %x21-25 / %x27-7E / UNICHAR ; Note: AMPERSAND is %x26
797	   UNICHAR    = "&#x" 2*6HEXDIG ";"

798	   The sequence '..' (%x2E.2E) in a field-body denotes a range of
799	   values.  Such a range represents all subtags of the same length that
800	   are alphabetically within that range, including the values explicitly
801	   mentioned.  For example 'a..c' denotes the values 'a', 'b', and 'c'.
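   As an illustration of the '..' range notation, the following sketch
   expands a range into its members.  It assumes that both endpoints
   consist only of ASCII letters.  It reuses the letter case of the
   start value, so "QM..QZ" yields uppercase region subtags.  The
   function name is hypothetical.

```python
import string

LETTERS = string.ascii_lowercase


def expand_range(spec: str) -> list[str]:
    """Expand a registry range such as 'a..c' or 'QM..QZ' (letters only)."""
    start, end = spec.split("..")
    both = start + end
    if len(start) != len(end) or not both.isascii() or not both.isalpha():
        raise ValueError(f"unsupported range {spec!r}")

    def to_number(s: str) -> int:
        # Same-length strings sort alphabetically exactly as base-26 numbers.
        n = 0
        for ch in s.lower():
            n = n * 26 + LETTERS.index(ch)
        return n

    def to_string(n: int, width: int) -> str:
        chars = []
        for _ in range(width):
            n, r = divmod(n, 26)
            chars.append(LETTERS[r])
        return "".join(reversed(chars))

    low, high = to_number(start), to_number(end)
    if low > high:
        raise ValueError(f"empty range {spec!r}")
    upper = [ch.isupper() for ch in start]  # copy the start value's case
    return [
        "".join(c.upper() if u else c for c, u in zip(to_string(n, len(start)), upper))
        for n in range(low, high + 1)
    ]
```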

803	   Characters from outside the US-ASCII repertoire, as well as the
804	   AMPERSAND character ("&", %x26) when it occurs in a field-body, are
805	   represented by a "Numeric Character Reference" using hexadecimal
806	   notation in the style used by [XML10] (see
807	   <http://www.w3.org/TR/REC-xml/#dt-charref>).  This consists of the
808	   sequence "&#x" (%x26.23.78) followed by a hexadecimal representation
809	   of the character's code point in [ISO10646] followed by a closing
810	   semicolon (%x3B).  For example, the EURO SIGN, U+20AC, would be
811	   represented by the sequence "&#x20AC;".  Note that the hexadecimal
812	   notation MAY have between two and six digits.
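   A minimal sketch of the Numeric Character Reference convention,
   using hypothetical function names.  Decoding replaces each "&#x...;"
   sequence with the character it names.  Encoding escapes the
   AMPERSAND and every character outside printable US-ASCII, using at
   least two hexadecimal digits.

```python
import re

NCR = re.compile(r"&#x([0-9A-Fa-f]{2,6});")


def decode_field_body(body: str) -> str:
    """Replace each &#x...; reference with the character it names."""
    return NCR.sub(lambda m: chr(int(m.group(1), 16)), body)


def encode_field_body(text: str) -> str:
    """Escape AMPERSAND and every character outside printable US-ASCII."""
    out = []
    for ch in text:
        if ch == "&" or ord(ch) > 0x7E:
            out.append(f"&#x{ord(ch):02X};")  # at least two hex digits
        else:
            out.append(ch)
    return "".join(out)
```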

814	   All fields whose field-body contains a date value use the "full-date"
815	   format specified in [RFC3339].  For example: "2004-06-28" represents
816	   June 28, 2004 in the Gregorian calendar.

818	   The first record in the file contains the single field whose field-
819	   name is "File-Date".  The field-body of this record contains the last
820	   modification date of this copy of the registry, making it possible to
821	   compare different versions of the registry.  The registry on the IANA
822	   website is the most current.  Versions with an older date than that
823	   one are not up-to-date.

825	   File-Date: 2004-06-28
826	   %%
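   The record format described above can be read with a short parser
   such as the following hypothetical sketch.  Each record becomes a
   mapping from field-name to a list of field-bodies, because some
   fields may repeat.  As a simplification, a folded continuation line
   is joined to the preceding field-body with a single space, and
   Numeric Character References are left undecoded.

```python
def parse_registry(text: str) -> list[dict[str, list[str]]]:
    """Parse record-jar text into records mapping field-name -> bodies."""
    records: list[dict[str, list[str]]] = []
    current: dict[str, list[str]] = {}
    last_name = None
    for line in text.splitlines():
        if line == "%%":                       # record separator
            if current:
                records.append(current)
            current, last_name = {}, None
        elif not line.strip():
            continue                           # ignore blank lines
        elif line[:1] in (" ", "\t"):          # folded continuation line
            if last_name is None:
                raise ValueError("continuation line outside a field")
            current[last_name][-1] += " " + line.strip()
        else:
            name, sep, body = line.partition(":")
            if not sep:
                raise ValueError(f"missing COLON in {line!r}")
            last_name = name.strip()
            current.setdefault(last_name, []).append(body.strip())
    if current:
        records.append(current)
    return records
```

   Because File-Date values use the "full-date" format, two copies of
   the registry can be ordered by comparing their File-Date strings
   directly.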

828	   Subsequent records represent subtags in the registry.  Each of the
829	   fields in each record MUST occur no more than once, unless otherwise
830	   noted below.  Each record MUST contain the following fields:

832	   o  'Type'

834	      *  Type's field-value MUST consist of one of the following
835	         strings: "language", "extlang", "script", "region", "variant",
836	         "grandfathered", or "redundant"; it denotes the type of tag or
837	         subtag.

839	   o  Either 'Subtag' or 'Tag'

841	      *  Subtag's field-value contains the subtag being defined.  This
842	         field MUST only appear in records whose 'Type' has one of
843	         these values: "language", "extlang", "script", "region", or
844	         "variant".

846	      *  Tag's field-value contains a complete language tag.  This field
847	         MUST only appear in records whose Type has one of these values:
848	         "grandfathered" or "redundant".

850	   o  Description

852	      *  Description's field-value contains a non-normative description
853	         of the subtag or tag.

855	   o  Added

857	      *  Added's field-value contains the date the record was added to
858	         the registry.
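   The required fields listed above can be checked with a sketch like
   the following.  It operates on a record represented as a mapping
   from field-name to a list of field-bodies.  The set of repeatable
   fields reflects the later statements in this section that
   'Description', 'Comments', and 'Prefix' may appear more than once.
   The names are hypothetical.

```python
SUBTAG_TYPES = {"language", "extlang", "script", "region", "variant"}
TAG_TYPES = {"grandfathered", "redundant"}
REPEATABLE = {"Description", "Comments", "Prefix"}  # allowed to repeat


def check_record(record: dict[str, list[str]]) -> list[str]:
    """Return a list of problems with one parsed subtag or tag record."""
    problems = []
    for name, values in record.items():
        if len(values) > 1 and name not in REPEATABLE:
            problems.append(f"'{name}' occurs more than once")
    record_type = record.get("Type", [None])[0]
    if record_type in SUBTAG_TYPES:
        if "Subtag" not in record or "Tag" in record:
            problems.append("needs 'Subtag' (and no 'Tag')")
    elif record_type in TAG_TYPES:
        if "Tag" not in record or "Subtag" in record:
            problems.append("needs 'Tag' (and no 'Subtag')")
    else:
        problems.append(f"missing or unknown 'Type': {record_type!r}")
    for required in ("Description", "Added"):
        if required not in record:
            problems.append(f"missing '{required}'")
    return problems
```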

860	   The 'Subtag' or 'Tag' field MUST use lowercase letters to form the
861	   subtag or tag, with two exceptions.  Subtags whose 'Type' field is
862	   'script' (in other words, subtags defined by ISO 15924) MUST use
863	   titlecase.  Subtags whose 'Type' field is 'region' (in other words,
864	   subtags defined by ISO 3166) MUST use uppercase.  These exceptions
865	   mirror the use of case in the underlying standards.
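   The case conventions above amount to a simple rule per record type,
   sketched here for 'Subtag' field-values with a hypothetical function
   name:

```python
def registry_case(record_type: str, subtag: str) -> str:
    """Letter case a 'Subtag' field-value uses, by record type."""
    if record_type == "script":
        return subtag.title()   # ISO 15924 style, e.g. 'Latn'
    if record_type == "region":
        return subtag.upper()   # ISO 3166 style, e.g. 'CH'; digits unchanged
    return subtag.lower()       # all other types are lowercase
```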

867	   The field 'Description' MAY appear more than one time.  At least one
868	   of the 'Description' fields MUST contain a description of the tag
869	   being registered written or transcribed into the Latin script; the
870	   same or additional fields MAY also include a description in a non-
871	   Latin script.  The 'Description' field is used for identification
872	   purposes and SHOULD NOT be taken to represent the actual native name
873	   of the language or variation or to be in any particular language.
874	   Most descriptions are taken directly from source standards such as
875	   ISO 639 or ISO 3166.

877	   Note: Descriptions in registry entries that correspond to ISO 639,
878	   ISO 15924, ISO 3166, or UN M.49 codes are intended only to indicate
879	   the meaning of that identifier as defined in the source standard at
880	   the time it was added to the registry.  The description does not
881	   replace the content of the source standard itself.  The descriptions
882	   are not intended to be the English localized names for the subtags.
883	   Localization or translation of language tag and subtag descriptions
884	   is out of scope of this document.

886	   Each record MAY also contain the following fields:

888	   o  Preferred-Value

890	      *  For records of type 'language', 'extlang', 'script', 'region',
891	         and 'variant', 'Preferred-Value' contains a subtag of the same
892	         'Type' which is preferred for forming the language tag.

894	      *  For records of type 'grandfathered' and 'redundant', it
895	         contains a canonical mapping to a complete language tag.

897	   o  Deprecated

899	      *  Deprecated's field-value contains the date the record was
900	         deprecated.

902	   o  Prefix

904	      *  Prefix's field-value contains a language tag with which this
905	         subtag MAY be used to form a new language tag, perhaps with
906	         other subtags as well.  This field MUST only appear in records
907	         whose 'Type' field-value is 'variant' or 'extlang'.  For
908	         example, the 'Prefix' for the variant 'nedis' is 'sl', meaning
909	         that the tags "sl-nedis" and "sl-IT-nedis" might be appropriate
910	         while the tag "is-nedis" is not.

912	   o  Comments

914	      *  Comments contains additional information about the subtag, as
915	         deemed appropriate for understanding the registry and
916	         implementing language tags using the subtag or tag.

918	   o  Suppress-Script

920	      *  Suppress-Script contains a script subtag that SHOULD NOT be
921	         used to form language tags with the associated primary language
922	         subtag.  This field MUST only appear in records whose 'Type'
923	         field-value is 'language'.  See Section 4.1.

925	   The field 'Deprecated' MAY be added to any record via the maintenance
926	   process described in Section 3.2 or via the registration process
927	   described in Section 3.4.  Usually the addition of a 'Deprecated'
928	   field is due to the action of one of the standards bodies, such as
929	   ISO 3166, withdrawing a code.  In some historical cases it might not
930	   have been possible to reconstruct the original deprecation date.  For
931	   these cases, an approximate date appears in the registry.  Although
932	   valid in language tags, subtags and tags with a 'Deprecated' field
933	   are deprecated and validating processors SHOULD NOT generate these
934	   subtags.  Note that a record that contains a 'Deprecated' field and
935	   no corresponding 'Preferred-Value' field has no replacement mapping.

937	   The field 'Preferred-Value' contains a mapping between the record in
938	   which it appears and a tag or subtag which SHOULD be preferred when
939	   selecting language tags.  These values form three groups:

941	      ISO 639 language codes which were later withdrawn in favor of
942	      other codes.  These values are mostly a historical curiosity.

944	      ISO 3166 region codes which have been withdrawn in favor of a new
945	      code.  This sometimes happens when a country changes its name or
946	      administration in such a way that warrants a new region code.

948	      Tags grandfathered from RFC 3066.  In many cases these tags have
949	      become obsolete because the values they represent were later
950	      encoded by ISO 639.

952	   Records that contain a 'Preferred-Value' field MUST also have a
953	   'Deprecated' field.  This field contains a date of deprecation.  Thus
954	   a language tag processor can use the registry to construct the valid,
955	   non-deprecated set of subtags for a given date.  In addition, for any
956	   given tag, a processor can construct the set of valid language tags
957	   that correspond to that tag for all dates up to the date of the
958	   registry.  The ability to do these mappings MAY be beneficial to
959	   applications that are matching, selecting, or filtering content
960	   based on its language tags.
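   A minimal sketch of the date-based construction described above,
   assuming records parsed into mappings from field-name to lists of
   field-bodies.  Because "full-date" values have the form YYYY-MM-DD,
   they can be compared as plain strings.  Treating a subtag as no
   longer current on its 'Deprecated' date is an assumption.  The
   function names are hypothetical.

```python
def active_subtags(records: list[dict[str, list[str]]], as_of: str) -> set[tuple[str, str]]:
    """(Type, Subtag) pairs added on or before as_of and not yet deprecated."""
    active = set()
    for record in records:
        if "Subtag" not in record:
            continue  # skip the File-Date record and 'Tag' records
        added = record["Added"][0]
        deprecated = record.get("Deprecated", [None])[0]
        if added <= as_of and (deprecated is None or as_of < deprecated):
            active.add((record["Type"][0], record["Subtag"][0]))
    return active


def preferred_values(records: list[dict[str, list[str]]]) -> dict[tuple[str, str], str]:
    """Map each (Type, Subtag) that has a 'Preferred-Value' to that value."""
    return {
        (r["Type"][0], r["Subtag"][0]): r["Preferred-Value"][0]
        for r in records
        if "Subtag" in r and "Preferred-Value" in r
    }
```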

962	   Note that 'Preferred-Value' mappings in records of type 'region' do
963	   not always represent exactly the same meaning as the original
964	   value.  There are many reasons for a country code to be changed, and
965	   the effect this has on the formation of language tags will depend on
966	   the nature of the change in question.

968	   In particular, the 'Preferred-Value' field does not imply retagging
969	   content that uses the affected subtag.

971	   The field 'Preferred-Value' MUST NOT be modified once created in the
972	   registry.  The field MAY be added to records of type "grandfathered"
973	   and "region" according to the rules in Section 3.2.  Otherwise the
974	   field MUST NOT be added to any record already in the registry.

976	   The 'Preferred-Value' field in records of type "grandfathered" and
977	   "redundant" contains whole language tags that are strongly
978	   RECOMMENDED for use in place of the record's value.  In many cases
979	   the mappings were created by deprecation of the tags during the
980	   period before this document was adopted.  For example, the tag "no-
981	   nyn" was deprecated in favor of the ISO 639-1 defined language code
982	   'nn'.

984	   Records of type 'variant' MAY have more than one field of type
985	   'Prefix'.  Additional fields of this type MAY be added to a 'variant'
986	   record via the registration process.

988	   Records of type 'extlang' MUST have _exactly_ one 'Prefix' field.

990	   The field-value of the 'Prefix' field consists of a language tag
991	   whose subtags are appropriate to use with this subtag.  For example,
992	   the variant subtag '1996' has a Prefix field of "de".  This means
993	   that tags starting with the sequence "de-" are appropriate with this
994	   subtag, so "de-Latg-1996" and "de-CH-1996" are both acceptable, while
995	   the tag "fr-1996" is an inappropriate choice.

997	   The field of type 'Prefix' MUST NOT be removed from any record.  The
998	   field-value for this type of field MUST NOT be modified.

1000	   The field 'Comments' MAY appear more than once per record.  This
1001	   field MAY be inserted or changed via the registration process and no
1002	   guarantee of stability is provided.  The content of this field is not
1003	   restricted, except by the need to register the information, the
1004	   suitability of the request, and by reasonable practical size
1005	   limitations.  Long screeds about a particular subtag are frowned
1006	   upon.

1008	   The field 'Suppress-Script' MUST only appear in records whose 'Type'
1009	   field-value is 'language'.  This field MAY appear at most one time in
1010	   a record.  This field indicates a script used to write the
1011	   overwhelming majority of documents for the given language and which
1012	   therefore adds no distinguishing information to a language tag.  It
1013	   helps ensure greater compatibility between the language tags
1014	   generated according to the rules in this document and language tags
1015	   and tag processors or consumers based on RFC 3066.  For example,
1016	   virtually all Icelandic documents are written in the Latin script,
1017	   making the subtag 'Latn' redundant in the tag "is-Latn".
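   As an illustration, a tag generator might drop a script subtag that
   matches the language's 'Suppress-Script' value.  This sketch makes
   two assumptions.  First, the mapping from language subtag to
   'Suppress-Script' value has already been built from the registry.
   Second, the script subtag, if present, directly follows the primary
   language subtag, with no extended language subtags.  The function
   name is hypothetical.

```python
def drop_suppressed_script(tag: str, suppress_script: dict[str, str]) -> str:
    """Remove a script subtag equal to the language's Suppress-Script."""
    subtags = tag.split("-")
    # A four-letter subtag right after the language is taken to be the script.
    if len(subtags) >= 2 and len(subtags[1]) == 4 and subtags[1].isalpha():
        expected = suppress_script.get(subtags[0].lower())
        if expected is not None and expected.lower() == subtags[1].lower():
            return "-".join([subtags[0]] + subtags[2:])
    return tag
```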

1019	3.2  Maintenance of the Registry

1021	   Maintenance of the registry requires that as codes are assigned or
1022	   withdrawn by ISO 639, ISO 15924, ISO 3166, and UN M.49, the Language
1023	   Subtag Reviewer will evaluate each change, determine whether it
1024	   conflicts with existing registry entries, and submit the information
1025	   to IANA for inclusion in the registry.  If a change takes place and
1026	   the Language Subtag Reviewer does not do this in a timely manner,
1027	   then any interested party MAY use the procedure in Section 3.4 to
1028	   register the appropriate update.

1030	   Note: The redundant and grandfathered entries together are the
1031	   complete list of tags registered under [RFC3066].  The redundant tags
1032	   are those that can now be formed using the subtags defined in the
1033	   registry together with the rules of Section 2.2.  The grandfathered
1034	   entries are those that can never be legal under those same
1035	   provisions.

1037	   The set of redundant and grandfathered tags is permanent and stable:
1039	   no new entries will be added and none of the entries will be removed.
1040	   Records of type 'grandfathered' MAY have their type converted to
1041	   'redundant': see Section 3.7 for more information.

1043	   RFC 3066 tags that were deprecated prior to the adoption of this
1044	   document are part of the list of grandfathered tags and their
1045	   component subtags were not included as registered variants (although
1046	   they remain eligible for registration).  For example, the tag "art-
1047	   lojban" was deprecated in favor of the language subtag 'jbo'.

1049	   The Language Subtag Reviewer MUST ensure that new subtags meet the
1050	   requirements in Section 4.1 or submit an appropriate alternate subtag
1051	   as described in that section.  When either a change or addition to
1052	   the registry is needed, the Language Subtag Reviewer MUST prepare the
1053	   complete record, including all fields, and forward it to IANA for
1054	   insertion into the registry.

1056	   If the record represents a new subtag that does not currently exist in
1057	   the registry, then the message's subject line MUST include the word
1058	   "INSERT".  If the record represents a change to an existing subtag,
1059	   then the subject line of the message MUST include the word "MODIFY".
1060	   The message MUST contain both the record for the subtag being
1061	   inserted or modified and the new File-Date record.  Here is an
1062	   example of what the body of the message might contain:

1064	   LANGUAGE SUBTAG MODIFICATION
1065	   File-Date: 2005-01-02
1066	   %%
1067	   Type: variant
1068	   Subtag: nedis
1069	   Description: Natisone dialect
1070	   Description: Nadiza dialect
1071	   Added: 2003-10-09
1072	   Prefix: sl
1073	   Comments: This is a comment shown
1074	     as an example.
1075	   %%

1077	                                 Figure 4
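   The following hypothetical sketch assembles such a message.  It folds
   long fields so that no line exceeds the 72-character limit of
   Section 3.1.  The exact subject wording is an assumption; this
   section only requires that the subject include "INSERT" or "MODIFY".

```python
import textwrap


def registry_message(fields: list[tuple[str, str]], file_date: str,
                     new_subtag: bool) -> tuple[str, str]:
    """Build a hypothetical (subject, body) pair for a request to IANA.

    fields is an ordered list of (field-name, field-body) pairs, so that
    repeated fields such as 'Description' keep their order.
    """
    subject = "LANGUAGE SUBTAG " + ("INSERT" if new_subtag else "MODIFY")
    lines = [f"File-Date: {file_date}", "%%"]
    for name, body in fields:
        # Fold long fields; continuation lines start with whitespace.
        lines.extend(textwrap.wrap(f"{name}: {body}", width=72,
                                   subsequent_indent="  ",
                                   break_long_words=False,
                                   break_on_hyphens=False))
    lines.append("%%")
    return subject, "\n".join(lines) + "\n"
```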

1079	   Whenever an entry is created or modified in the registry, the 'File-
1080	   Date' record at the start of the registry is updated to reflect the
1081	   most recent modification date in the [RFC3339] "full-date" format.

1083	   Values in the 'Subtag' field MUST be lowercase except as provided for
1084	   in Section 3.1.

1086	3.3  Stability of IANA Registry Entries

1088	   The stability of entries and their meaning in the registry is
1089	   critical to the long-term stability of language tags.  The rules in
1090	   this section guarantee that a specific language tag's meaning is
1091	   stable over time and will not change.

1093	   These rules specifically deal with how changes to codes (including
1094	   withdrawal and deprecation of codes) maintained by ISO 639, ISO
1095	   15924, ISO 3166, and UN M.49 are reflected in the IANA Language
1096	   Subtag Registry.  Assignments to the IANA Language Subtag Registry
1097	   MUST comply with the following stability rules:

1099	   1.   Values in the fields 'Type', 'Subtag', 'Tag', 'Added',
1100	        'Deprecated' and 'Preferred-Value' MUST NOT be changed and are
1101	        guaranteed to be stable over time.

1103	   2.   Values in the 'Description' field MUST NOT be changed in a way
1104	        that would invalidate previously-existing tags.  They MAY be
1105	        broadened somewhat in scope, changed to add information, or
1106	        adapted to the most common modern usage.  For example, countries
1107	        occasionally change their official names: an historical example
1108	        of this would be "Upper Volta" changing to "Burkina Faso".

1110	   3.   Values in the field 'Prefix' MAY be added to records of type
1111	        'variant' via the registration process.

1113	   4.   Values in the field 'Prefix' MAY be modified, so long as the
1114	        modifications broaden the set of prefixes.  That is, a prefix
1115	        MAY be replaced by one of its own prefixes.  For example, the
1116	        prefix "en-US" could be replaced by "en", but not by the
1117	        prefixes "en-Latn", "fr", or "en-US-boont".  If one of those
1118	        prefixes were needed, a new Prefix SHOULD be registered.
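   Rule 4 can be checked mechanically by comparing subtag sequences.  In
   this hypothetical sketch, a replacement is accepted only if it is a
   strictly shorter leading sequence of the original prefix's subtags.

```python
def is_allowed_prefix_change(old: str, new: str) -> bool:
    """Rule 4: a Prefix may only be replaced by one of its own prefixes."""
    old_subtags = old.lower().split("-")
    new_subtags = new.lower().split("-")
    return (len(new_subtags) < len(old_subtags)
            and old_subtags[:len(new_subtags)] == new_subtags)
```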

1120	   5.   Values in the field 'Prefix' MUST NOT be removed.

1122	   6.   The field 'Comments' MAY be added, modified, or removed
1123	        via the registration process or any of the processes or
1124	        considerations described in this section.

1126	   7.   The field 'Suppress-Script' MAY be added or removed via the
1127	        registration process.

1129	   8.   Codes assigned by ISO 639, ISO 15924, and ISO 3166 that do not
1130	        conflict with existing subtags of the associated type and whose
1131	        meaning is not the same as an existing subtag of the same type
1132	        are entered into the IANA registry as new records.

1134	   9.   Codes assigned by ISO 639, ISO 15924, or ISO 3166 that are
1135	        withdrawn by their respective maintenance or registration
1136	        authority remain valid in language tags.  A 'Deprecated' field
1137	        containing the date of withdrawal is added to the record.  If a
1138	        new record of the same type is added that represents a
1139	        replacement value, then a 'Preferred-Value' field MAY also be
1140	        added.  The registration process MAY be used to add comments
1141	        about the withdrawal of the code by the respective standard.

1143	        1.  The region code 'TL' was assigned to the country 'Timor-
1144	            Leste', replacing the code 'TP' (which was assigned to 'East
1145	            Timor' when it was under administration by Portugal).  The
1146	            subtag 'TP' remains valid in language tags, but its record
1147	            contains a 'Preferred-Value' of 'TL' and its field
1148	            'Deprecated' contains the date the new code was assigned
1149	            ('2004-07-06').

1151	   10.  Codes assigned by ISO 639, ISO 15924, or ISO 3166 that conflict
1152	        with existing subtags of the associated type, including subtags
1153	        that are deprecated, MUST NOT be entered into the registry.  The
1154	        following additional considerations apply to subtag values that
1155	        are reassigned:

1157	        A.  For ISO 639 codes, if the newly assigned code's meaning is
1158	            not represented by a subtag in the IANA registry, the
1159	            Language Subtag Reviewer, as described in Section 3.4, SHALL,
1160	            as soon as practical, prepare a proposal to enter a
1161	            registered language subtag in the IANA registry as an
1162	            alternate value for the new code.  The form of the registered
1163	            language subtag will be at the discretion of the Language
1164	            Subtag Reviewer and MUST conform to other restrictions on
1165	            language subtags in this document.

1167	        B.  For all subtags whose meaning is derived from an external
1168	            standard (i.e., ISO 639, ISO 15924, ISO 3166, or UN M.49),
1169	            if a new meaning is assigned to an existing code and the new
1170	            meaning broadens the meaning of that code, then the meaning
1171	            for the associated subtag MAY be changed to match.  The
1172	            meaning of a subtag MUST NOT be narrowed, however, as this
1173	            can result in an unknown proportion of the existing uses of
1174	            a subtag becoming invalid.  Note: ISO 639 MA/RA has adopted
1175	            a similar stability policy.

1177	        C.  For ISO 15924 codes, if the newly assigned code's meaning is
1178	            not represented by a subtag in the IANA registry, the
1179	            Language Subtag Reviewer, as described in Section 3.4, SHALL,
1180	            as soon as practical, prepare a proposal to enter a
1181	            registered variant subtag in the IANA registry as an
1182	            alternate value for the new code.  The form of the registered
1183	            variant subtag will be at the discretion of the Language
1184	            Subtag Reviewer and MUST conform to other restrictions on
1185	            variant subtags in this document.

1187	        D.  For ISO 3166 codes, if the newly assigned code's meaning is
1188	            associated with the same UN M.49 code as another 'region'
1189	            subtag, then the existing region subtag remains as the
1190	            preferred value for that region and no new entry is created.
1191	            A comment MAY be added to the existing region subtag
1192	            indicating the relationship to the new ISO 3166 code.

1194	        E.  For ISO 3166 codes, if the newly assigned code's meaning is
1195	            associated with a UN M.49 code that is not represented by an
1196	            existing region subtag, then the Language Subtag Reviewer,
1197	            as described in Section 3.4, SHALL prepare a proposal for
1198	            entering the appropriate UN M.49 country code as an entry in
1199	            the IANA registry.

1201	        F.  For ISO 3166 codes, if there is no associated UN M.49
1202	            numeric code, then the Language Subtag Reviewer SHALL
1203	            petition the UN to create one.  If there is no response from
1204	            the UN within ninety days of the request being sent, the
1205	            Language Subtag Reviewer SHALL, as soon as practical,
1206	            prepare a proposal to enter a registered variant subtag in
1207	            the IANA registry as an alternate value for the new code.
1208	            The form of the registered variant subtag will be at the
1209	            discretion of the Language Subtag Reviewer and MUST conform
1210	            to other restrictions on variant subtags in this document.
1211	            This situation is very unlikely to occur.

1213	   11.  UN M.49 has codes for both countries and areas (such as '276'
1214	        for Germany) and geographical regions and sub-regions (such as
1215	        '150' for Europe).  UN M.49 country or area codes for which
1216	        there is no corresponding ISO 3166 code SHOULD NOT be
1217	        registered, except as a surrogate for an ISO 3166 code that is
1218	        blocked from registration by an existing subtag.  If such a code
1219	        becomes necessary, then the registration authority for ISO 3166
1220	        SHOULD first be petitioned to assign a code to the region.  If
1221	        the petition for a code assignment by ISO 3166 is refused or not
1222	        acted on in a timely manner, the registration process described
1223	        in Section 3.4 MAY then be used to register the corresponding UN
1224	        M.49 code.  At the time this document was written, there were
1225	        only four such codes: 830 (Channel Islands), 831 (Guernsey), 832
1226	        (Jersey), and 833 (Isle of Man).  This way UN M.49 codes remain
1227	        available as the value of last resort in cases where ISO 3166
1228	        reassigns a deprecated value in the registry.

1230	   12.  Stability provisions apply to grandfathered tags with this
1231	        exception: should all of the subtags in a grandfathered tag
1232	        become valid subtags in the IANA registry, then the field 'Type'
1233	        in that record is changed from 'grandfathered' to 'redundant'.
1234	        Note that this will not affect language tags that match the
1235	        grandfathered tag, since these tags will now match valid
1236	        generative subtag sequences.  For example, if the subtag 'gan'
1237	        in the language tag "zh-gan" were to be registered as an
1238	        extended language subtag, then the grandfathered tag "zh-gan"
1239	        would be deprecated (but existing content or implementations
1240	        that use "zh-gan" would remain valid).
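   Several of these stability rules are mechanical enough to check whenever a record is updated.  The following Python sketch is illustrative only.  It models a record as a hypothetical dictionary that maps field names to lists of values.  It checks rules 1, 4, and 5, and it applies the 'Type' change permitted by rule 12.  All names are hypothetical.  The sketch does not check the type of each subtag for rule 12, and it does not implement every rule in this section.

```python
from typing import Dict, List, Set

# Hypothetical in-memory model of a registry record: field name -> values.
Record = Dict[str, List[str]]

# Rule 1: values in these fields never change once present.
STABLE_FIELDS = ("Type", "Subtag", "Tag", "Added", "Deprecated", "Preferred-Value")


def is_prefix_of(shorter: str, longer: str) -> bool:
    """True if `shorter` equals `longer` or is a prefix of it at a subtag boundary."""
    a, b = shorter.lower().split("-"), longer.lower().split("-")
    return len(a) <= len(b) and b[: len(a)] == a


def check_update(old: Record, new: Record) -> List[str]:
    """List stability violations when updating `old` to `new` (rules 1, 4, 5)."""
    problems = []
    for field in STABLE_FIELDS:
        if field not in old or old[field] == new.get(field):
            continue  # unchanged, or newly added (e.g. 'Deprecated' per rule 9)
        if (field == "Type" and old[field] == ["grandfathered"]
                and new.get(field) == ["redundant"]):
            continue  # the one change that rule 12 allows
        problems.append(f"rule 1: '{field}' changed")
    # Rules 4 and 5: each old prefix must survive, either unchanged or
    # replaced by one of its own prefixes (for example "en-US" -> "en").
    for old_prefix in old.get("Prefix", []):
        if not any(is_prefix_of(p, old_prefix) for p in new.get("Prefix", [])):
            problems.append(f"rules 4/5: prefix '{old_prefix}' removed or not broadened")
    return problems


def reclassify_grandfathered(record: Record, registered: Set[str]) -> Record:
    """Rule 12: a grandfathered tag whose subtags are all registered becomes 'redundant'.

    `registered` holds lowercase subtags; checking each subtag's type is omitted.
    """
    if record.get("Type") != ["grandfathered"]:
        return record
    if all(s in registered for s in record["Tag"][0].lower().split("-")):
        return {**record, "Type": ["redundant"]}
    return record
```

   With this sketch, replacing the prefix "en-US" with "en" passes.  Replacing it with "fr" or with "en-US-boont" is reported as a violation.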

1242	3.4  Registration Procedure for Subtags

1244	   The procedure given here MUST be used by anyone who wants to use a
1245	   subtag not currently in the IANA Language Subtag Registry.

1247	   Only subtags of type 'language' and 'variant' will be considered for
1248	   independent registration of new subtags.  Handling of subtags needed
1249	   for stability and subtags necessary to keep the registry synchronized
1250	   with ISO 639, ISO 15924, ISO 3166, and UN M.49 within the limits
1251	   defined by this document is described in Section 3.2.  Stability
1252	   provisions are described in Section 3.3.

1254	   This procedure MAY also be used to register or alter the information
1255	   for the "Description", "Comments", "Deprecated", or "Prefix" fields
1256	   in a subtag's record as described in Section 3.3.  Changes to all
1257	   other fields in the IANA registry are NOT permitted.

1259	   Registering a new subtag or requesting modifications to an existing
1260	   tag or subtag starts with the requester filling out the registration
1261	   form reproduced below.  Responses are not limited in size, so that
1262	   the request can adequately describe the registration.
1263	   The fields in the "Record Requested" section SHOULD follow the
1264	   requirements in Section 3.1.

1266	   LANGUAGE SUBTAG REGISTRATION FORM
1267	   1. Name of requester:
1268	   2. E-mail address of requester:
1269	   3. Record Requested:

1271	   Type:
1272	   Subtag:
1273	   Description:
1274	   Prefix:
1275	   Preferred-Value:
1276	   Deprecated:
1277	   Suppress-Script:
1278	   Comments:

1280	   4. Intended meaning of the subtag:
1281	   5. Reference to published description
1282	   of the language (book or article):
1283	   6. Any other relevant information:

1285	                                 Figure 5

1287	   The subtag registration form MUST be sent to
1288	   <ietf-languages@iana.org> for a two-week review period before it can
1289	   be submitted to IANA.  (This is an open list and can be joined by
1290	   sending a request to <ietf-languages-request@iana.org>.)

1292	   Variant and extlang subtags are always registered for use with a
1293	   particular range of language tags.  For example, the subtag 'rozaj'
1294	   is intended for use with language tags that start with the primary
1295	   language subtag "sl", since Resian is a dialect of Slovenian.  Thus
1296	   the subtag 'rozaj' could be included in tags such as "sl-Latn-rozaj"
1297	   or "sl-IT-rozaj".  This information is stored in the "Prefix" field
1298	   in the registry.  Variant registration requests are REQUIRED to
1299	   include at least one "Prefix" field in the registration form.

1301	   The 'Prefix' field for a given registered subtag will be maintained
1302	   in the IANA registry as a guide to usage.  Additional prefixes MAY be
1303	   added by filing an additional registration form.  In that form, the
1304	   "Any other relevant information:" field MUST indicate that it is the
1305	   addition of a prefix.

1307	   Requests to add a prefix to a variant subtag that imply a different
1308	   semantic meaning will probably be rejected.  For example, a request
1309	   to add the prefix "de" to the subtag 'nedis' so that the tag "de-
1310	   nedis" represented some German dialect would be rejected.  The
1311	   'nedis' subtag represents a particular Slovenian dialect and the
1312	   additional registration would change the semantic meaning assigned to
1313	   the subtag.  A separate subtag SHOULD be proposed instead.

1315	   The 'Description' field MUST contain a description of the tag being
1316	   registered written or transcribed into the Latin script; it MAY also
1317	   include a description in a non-Latin script.  Non-ASCII characters
1318	   MUST be escaped using the syntax described in Section 3.1.  The
1319	   'Description' field is used for identification purposes and does not
1320	   necessarily represent the actual native name of the language or
1321	   variation, nor need it be in any particular language.
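   The following Python sketch illustrates the escaping requirement for a 'Description' value.  It assumes that the syntax in Section 3.1 writes each non-ASCII character, and also the AMPERSAND, as an XML-style hexadecimal character reference such as "&#xE7;".  That assumption is the sketch's, not a restatement of Section 3.1.  The function name escape_description is hypothetical.

```python
def escape_description(text: str) -> str:
    """Escape a 'Description' value for storage in the registry.

    Assumption: non-ASCII characters and '&' are written as XML-style
    hexadecimal character references such as "&#xE7;".
    """
    escaped = []
    for ch in text:
        code_point = ord(ch)
        if code_point > 0x7F or ch == "&":
            escaped.append("&#x%X;" % code_point)  # uppercase hex, no padding
        else:
            escaped.append(ch)  # plain ASCII passes through unchanged
    return "".join(escaped)


print(escape_description("Fran\u00e7ais"))  # Fran&#xE7;ais
```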

1323	   While the 'Description' field itself is not guaranteed to be stable
1324	   and errata corrections MAY be undertaken from time to time, attempts
1325	   to provide translations or transcriptions of entries in the registry
1326	   itself will probably be frowned upon by the community or rejected
1327	   outright, as changes of this nature have an impact on the provisions
1328	   in Section 3.3.

1330	   The Language Subtag Reviewer is responsible for responding to
1331	   requests for the registration of subtags through the registration
1332	   process and is appointed by the IESG.

1334	   When the two-week period has passed, the Language Subtag Reviewer
1335	   either forwards the record to be inserted or modified to
1336	   iana@iana.org according to the procedure described in Section 3.2, or
1337	   rejects the request because of significant objections raised on the
1338	   list or due to problems with constraints in this document (which MUST
1339	   be explicitly cited).  The reviewer MAY also extend the review period
1340	   in two-week increments to permit further discussion.  The reviewer
1341	   MUST indicate on the list whether the registration has been accepted,
1342	   rejected, or extended following each two-week period.
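   The review timeline reduces to simple date arithmetic.  The following Python sketch computes the date on which a review period closes.  It assumes that one review period is exactly 14 calendar days, counted from the date the form is posted to the list.  It also assumes that each extension adds one further 14-day period.  The helper review_period_end is hypothetical and is not part of any IANA tooling.

```python
from datetime import date, timedelta

# Assumption: one two-week review period is exactly 14 calendar days.
REVIEW_PERIOD = timedelta(days=14)


def review_period_end(sent: date, extensions: int = 0) -> date:
    """Return the date on which review closes for a form posted on `sent`.

    `extensions` is the number of two-week extensions the reviewer granted.
    A modified and resubmitted application restarts the count, so call this
    again with the new posting date.
    """
    if extensions < 0:
        raise ValueError("extensions must be non-negative")
    return sent + REVIEW_PERIOD * (1 + extensions)


print(review_period_end(date(2005, 3, 1)))     # 2005-03-15
print(review_period_end(date(2005, 3, 1), 2))  # 2005-04-12
```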

1344	   Note that the reviewer can raise objections on the list if he or she
1345	   so desires.  The important thing is that the objection MUST be made
1346	   publicly.

1348	   The applicant is free to modify a rejected application with
1349	   additional information and submit it again; this restarts the
1350	   two-week comment period.

1352	   Decisions made by the reviewer MAY be appealed to the IESG [RFC2028]
1353	   under the same rules as other IETF decisions [RFC2026].

1355	   All approved registration forms are available online in the directory
1356	   http://www.iana.org/numbers.html under "languages".

1358	   Updates or changes to existing records follow the same procedure as
1359	   new registrations.  The Language Subtag Reviewer decides whether
1360	   there is consensus to update the registration following the two-week
1361	   review period; normally objections by the original registrant will
1362	   carry extra weight in forming such a consensus.

1364	   Registrations are permanent and stable.  Once registered, subtags
1365	   will not be removed from the registry and will remain a valid way
1366	   to specify a particular language or variant.

1368	   Note: The "Reference to published description" form field is
1369	   intended as an aid to people trying to verify whether a language is
1370	   registered or what language or language variation a particular subtag
1371	   refers to.  In most cases, reference to an authoritative grammar or
1372	   dictionary of that language will be useful; in cases where no such
1373	   work exists, other well-known works describing that language or in
1374	   that language MAY be appropriate.  The subtag reviewer decides what
1375	   constitutes "good enough" reference material.  This requirement is
1376	   not intended to exclude particular languages or dialects due to the
1377	   size of the speaker population or lack of a standardized orthography.
1378	   Minority languages will be considered equally on their own merits.

1380	3.5  Possibilities for Registration

1382	   Possibilities for registration of subtags or information about
1383	   subtags include:

1385	   o  Primary language subtags for languages not listed in ISO 639 that
1386	      are not variants of any listed or registered language can be
1387	      registered.  At the time this document was created there were no
1388	      examples of this form of subtag.  Before attempting to register a
1389	      language subtag, there MUST be an attempt to register the language
1390	      with ISO 639.  No language subtags will be registered for
1391	      languages that have codes in ISO 639-1 or ISO 639-2, that are
1392	      under consideration by the ISO 639 maintenance or registration
1393	      authorities, or that have never been submitted for registration
1394	      with those authorities.  If ISO 639 has previously rejected a
1395	      language for registration, it is reasonable to assume that there
1396	      must be additional very compelling evidence of need before it will
1397	      be registered in the IANA registry (so much so that it is very
1398	      unlikely that any subtags of this type will be registered).

1400	   o  Dialect or other divisions or variations within a language, its
1401	      orthography, writing system, regional or historical usage,
1402	      transliteration or other transformation, or distinguishing
1403	      variation MAY be registered as variant subtags.  An example is the
1404	      'rozaj' subtag (the Resian dialect of Slovenian).

1406	   o  The addition or maintenance of fields (generally of an
1407	      informational nature) in Tag or Subtag records as described in
1408	      Section 3.1 and subject to the stability provisions in
1409	      Section 3.3.  This includes descriptions; comments; deprecation
1410	      and preferred values for obsolete or withdrawn codes; or the
1411	      addition of script or extlang information to primary language
1412	      subtags.

1414	   o  The addition of records and related field value changes necessary
1415	      to reflect assignments made by ISO 639, ISO 15924, ISO 3166, and
1416	      UN M.49 as described in Section 3.3.

1418	   This document leaves the decision on what subtags or changes to
1419	   subtags are appropriate (or not) to the registration process
1420	   described in Section 3.4.

1422	   Note: four-character primary language subtags are reserved to allow
1423	   for the possibility of alpha4 codes in some future addition to the
1424	   ISO 639 family of standards.
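   The length distinctions for primary language subtags can be summarized in a few lines of code.  This Python sketch makes three assumptions about the tag syntax defined elsewhere in this document:

   o  Two- and three-letter subtags are ISO 639 codes.

   o  Four-letter subtags are reserved, as noted above.

   o  Five- to eight-letter subtags are available only through registration.

   The function name and its return labels are hypothetical.

```python
def classify_primary_language(subtag: str) -> str:
    """Classify a primary language subtag by length (assumed syntax; see above)."""
    if not (subtag.isascii() and subtag.isalpha()):
        return "invalid"
    length = len(subtag)
    if length in (2, 3):
        return "iso639"      # code taken from ISO 639
    if length == 4:
        return "reserved"    # held back for possible future alpha4 ISO 639 codes
    if 5 <= length <= 8:
        return "registered"  # only via the Section 3.4 registration procedure
    return "invalid"
```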

1426	   ISO 639 defines a maintenance agency for additions to and changes in
1427	   the list of languages in ISO 639.  This agency is:

1429	   International Information Centre for Terminology (Infoterm)
1430	   Aichholzgasse 6/12, AT-1120
1431	   Wien, Austria
1432	   Phone: +43 1 26 75 35 Ext. 312 Fax: +43 1 216 32 72

1434	   ISO 639-2 defines a maintenance agency for additions to and changes
1435	   in the list of languages in ISO 639-2.  This agency is:

1437	   Library of Congress
1438	   Network Development and MARC Standards Office
1439	   Washington, D.C. 20540 USA
1440	   Phone: +1 202 707 6237  Fax: +1 202 707 0115
1441	   URL: http://www.loc.gov/standards/iso639

1443	   The maintenance agency for ISO 3166 (country codes) is:

1445	   ISO 3166 Maintenance Agency
1446	   c/o International Organization for Standardization
1447	   Case postale 56
1448	   CH-1211 Geneva 20 Switzerland
1449	   Phone: +41 22 749 72 33  Fax: +41 22 749 73 49
1450	   URL: http://www.iso.org/iso/en/prods-services/iso3166ma/index.html

1452	   The registration authority for ISO 15924 (script codes) is:

1454	   Unicode Consortium, Box 391476
1455	   Mountain View, CA 94039-1476, USA
1456	   URL: http://www.unicode.org/iso15924

1458	   The Statistics Division of the United Nations Secretariat maintains
1459	   the Standard Country or Area Codes for Statistical Use and can be
1460	   reached at:

1462	   Statistical Services Branch
1463	   Statistics Division
1464	   United Nations, Room DC2-1620
1465	   New York, NY 10017, USA

1467	   Fax: +1-212-963-0623
1468	   E-mail: statistics@un.org
1469	   URL: http://unstats.un.org/unsd/methods/m49/m49alpha.htm

1471	3.6  Extensions and Extensions Namespace

1473	   Extension subtags are those introduced by single-letter subtags other
1474	   than 'x'.  They are reserved for the generation of identifiers which
1475	   contain a language component, and are compatible with applications
1476	   that understand language tags.  For example, they might be used to
1477	   define locale identifiers, which are generally based on language.

1479	   The structure and form of extensions are defined by this document so
1480	   that implementations can be created that are forward compatible with
1481	   applications that might be created using single-letter subtags in the
1482	   future.  In addition, defining a mechanism for maintaining single-
1483	   letter subtags will contribute to the stability of this document by
1484	   reducing the likely need for future revisions or updates.

1486	   Allocation of a single-letter subtag SHALL take the form of an RFC
1487	   defining the name, purpose, processes, and procedures for maintaining
1488	   the subtags.  The maintaining or registering authority, including
1489	   name, contact email, discussion list email, and URL location of the
1490	   registry MUST be indicated clearly in the RFC.  The RFC MUST specify
1491	   or include each of the following:

1493	   o  The specification MUST reference the specific version or revision
1494	      of this document that governs its creation and MUST reference this
1495	      section of this document.

1497	   o  The specification and all subtags defined by the specification
1498	      MUST follow the ABNF and other rules for the formation of tags and
1499	      subtags as defined in this document.  In particular it MUST
1500	      specify that case is not significant and that subtags MUST NOT
1501	      exceed eight characters in length.

1503	   o  The specification MUST specify a canonical representation.

1505	   o  The specification of valid subtags MUST be available over the
1506	      Internet and at no cost.

1508	   o  The specification MUST be in the public domain or available via a
1509	      royalty-free license acceptable to the IETF and specified in the
1510	      RFC.

1512	   o  The specification MUST be versioned and each version of the
1513	      specification MUST be numbered, dated, and stable.

1515	   o  The specification MUST be stable.  That is, extension subtags,
1516	      once defined by a specification, MUST NOT be retracted or change
1517	      in meaning in any substantial way.

1519	   o  The specification MUST include in a separate section the
1520	      registration form reproduced in this section (below) to be used in
1521	      registering the extension upon publication as an RFC.

1523	   o  IANA MUST be informed of changes to the contact information and
1524	      URL for the specification.
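   The syntactic requirements above can be checked with a short validator.  The following Python sketch makes two assumptions:

   o  As this section states, the singleton is a single ASCII letter other than 'x'.

   o  As the ABNF is assumed to require, each following subtag has 2 to 8 ASCII letters or digits.

   Comparison ignores case.  Lowercasing is used as an example canonical form only, because each extension RFC must define its own.  The function names and the sample sequence "a-myext-12345" are hypothetical.

```python
import re

# Assumed shape of an extension sequence: a singleton letter other than 'x',
# followed by one or more subtags of 2 to 8 ASCII letters or digits.
_EXTENSION = re.compile(r"[a-wyz](?:-[a-z0-9]{2,8})+")


def is_valid_extension(sequence: str) -> bool:
    """Check an extension sequence; case is not significant."""
    # Test isascii() first so that characters such as KELVIN SIGN, which
    # lowercase to an ASCII 'k', cannot slip through.
    return sequence.isascii() and _EXTENSION.fullmatch(sequence.lower()) is not None


def canonicalize_extension(sequence: str) -> str:
    """Return one possible canonical form (lowercase); each RFC defines its own."""
    if not is_valid_extension(sequence):
        raise ValueError(f"not a valid extension sequence: {sequence!r}")
    return sequence.lower()
```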

1526	   IANA will maintain a registry of allocated single-letter (singleton)
1527	   subtags.  This registry will use the record-jar format described by
1528	   the ABNF in Section 3.1.  Upon publication of an extension as an RFC,
1529	   the maintaining authority defined in the RFC MUST forward this
1530	   registration form to iesg@ietf.org, who will forward the request to
1531	   iana@iana.org.  The maintaining authority of the extension MUST
1532	   maintain the accuracy of the record by sending an updated full copy
1533	   of the record to iana@iana.org with the subject line "LANGUAGE TAG
1534	   EXTENSION UPDATE" whenever content changes.  Only the 'Comments',
1535	   'Contact_Email', 'Mailing_List', and 'URL' fields MAY be modified in
1536	   these updates.

1538	   Failure to maintain this record or the corresponding registry, or to
1539	   meet other conditions imposed by this section of this document MAY be
1540	   appealed to the IESG [RFC2028] under the same rules as other IETF
1541	   decisions (see [RFC2026]) and MAY result in the authority to maintain
1542	   the extension being withdrawn or reassigned by the IESG.

1544	   %%
1545	   Identifier:
1546	   Description:
1547	   Comments:
1548	   Added:
1549	   RFC:
1550	   Authority:
1551	   Contact_Email:
1552	   Mailing_List:
1553	   URL:
1554	   %%

1556	    Figure 6: Format of Records in the Language Tag Extensions Registry

1558	   'Identifier' contains the single letter subtag (singleton) assigned
1559	   to the extension.  The Internet-Draft submitted to define the
1560	   extension SHOULD specify which letter to use, although the IESG MAY
1561	   change the assignment when approving the RFC.

1563	   'Description' contains the name and description of the extension.

1565	   'Comments' is an OPTIONAL field and MAY contain a broader description
1566	   of the extension.

1568	   'Added' contains the date the RFC was published in the "full-date"
1569	   format specified in [RFC3339].  For example: 2004-06-28 represents
1570	   June 28, 2004, in the Gregorian calendar.

1572	   'RFC' contains the RFC number assigned to the extension.

1574	   'Authority' contains the name of the maintaining authority for the
1575	   extension.

1577	   'Contact_Email' contains the email address used to contact the
1578	   maintaining authority.

1580	   'Mailing_List' contains the URL or subscription email address of the
1581	   mailing list used by the maintaining authority.

1583	   'URL' contains the URL of the registry for this extension.
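   To make the record format of Figure 6 concrete, the following Python sketch parses records from a record-jar file.  It only approximates the ABNF in Section 3.1, under three assumptions:

   o  Records are separated by lines containing only "%%".

   o  Each field starts with "Field-Name:".

   o  A line that begins with whitespace continues the previous field.

   The function name is hypothetical.

```python
from typing import Dict, List, Optional

Record = Dict[str, List[str]]


def parse_record_jar(text: str) -> List[Record]:
    """Parse record-jar text into records mapping field names to value lists.

    Approximation: "%%" lines separate records, "Name: value" starts a field,
    and an indented line continues the previous field's value.
    """
    records: List[Record] = []
    current: Record = {}
    last_field: Optional[str] = None
    for line in text.splitlines():
        if line.strip() == "%%":
            if current:
                records.append(current)
            current, last_field = {}, None
        elif line[:1] in (" ", "\t") and line.strip() and last_field is not None:
            current[last_field][-1] += " " + line.strip()  # folded continuation
        elif ":" in line:
            # Split at the first colon only, so URLs in values stay intact.
            name, _, value = line.partition(":")
            last_field = name.strip()
            current.setdefault(last_field, []).append(value.strip())
    if current:
        records.append(current)
    return records
```

   An 'Added' value produced by this parser could then be checked with Python's datetime.date.fromisoformat, which accepts the YYYY-MM-DD form used by the RFC 3339 "full-date".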

1585	   The determination of whether an Internet-Draft meets the above
1586	   conditions and the decision to grant or withhold such authority rests
1587	   solely with the IESG, and is subject to the normal review and appeals
1588	   process associated with the RFC process.

1590	   Extension authors are strongly cautioned that many (including most
1591	   well-formed) processors will be unaware of any special relationships
1592	   or meaning inherent in the order of extension subtags.  Extension
1593	   authors SHOULD avoid subtag relationships or canonicalization
1594	   mechanisms that interfere with matching or with length restrictions
1595	   that sometimes exist in common protocols where the extension is used.
1596	   In particular, applications MAY truncate the subtags in doing
1597	   matching or in fitting into limited lengths, so it is RECOMMENDED
1598	   that the most significant information be in the most significant
1599	   (left-most) subtags, and that the specification gracefully handle
1600	   truncated subtags.
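   The following Python sketch shows one hypothetical way an application might shorten a tag to fit a length limit.  It drops whole subtags from the right.  It also drops any singleton left at the end, because a singleton carries no meaning without a subtag after it.  This is not the truncation rule of any particular protocol.  It shows why the most significant information belongs in the left-most subtags.

```python
def truncate_tag(tag: str, max_len: int) -> str:
    """Shorten `tag` to at most `max_len` characters by dropping whole subtags
    from the right; returns "" if not even the first subtag fits."""
    subtags = tag.split("-")
    while subtags and len("-".join(subtags)) > max_len:
        subtags.pop()
        # A singleton (e.g. "a" or "x") is meaningless without a subtag after it.
        while len(subtags) > 1 and len(subtags[-1]) == 1:
            subtags.pop()
    return "-".join(subtags)


print(truncate_tag("en-Latn-US-a-bbb-x-ccc", 14))  # en-Latn-US
```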

1602	   When a language tag is to be used in a specific, known protocol, it
1603	   is RECOMMENDED that the language tag not contain extensions not
1604	   supported by that protocol.  In addition, note that some protocols
1605	   may impose upper limits on the length of the strings used to store or
1606	   transport the language tag.

1608	3.7  Initialization of the Registry

1610	   Adoption of this document requires an initial version of the
1611	   registry containing the various subtags initially valid in a language
1612	   tag.  This collection of subtags, along with a description of the
1613	   process used to create it, is described by [initial-registry].

1615	   Registrations that are in process under the rules defined in
1616	   [RFC3066] when this document is adopted MAY be completed under the
1617	   former rules, at the discretion of the language tag reviewer.  Any
1618	   new registrations submitted under the former rules after the adoption
1619	   of this document MUST be rejected.

1621	4.  Formation and Processing of Language Tags

1623	   This section addresses how to use the information in the registry
1624	   with the tag syntax to choose, form and process language tags.

1626	4.1  Choice of Language Tag

1628	   One is sometimes faced with the choice between several possible tags
1629	   for the same body of text.

1631	   Interoperability is best served when all users use the same language
1632	   tag in order to represent the same language.  If an application has
1633	   requirements that make the rules here inapplicable, then that
1634	   application risks damaging interoperability.  It is strongly
1635	   RECOMMENDED that users not define their own rules for language tag
1636	   choice.

1638	   Subtags SHOULD only be used where they add useful distinguishing
1639	   information; extraneous subtags interfere with the meaning,
1640	   understanding, and processing of language tags.  In particular, users
1641	   and implementations SHOULD follow the 'Prefix' and 'Suppress-Script'
1642	   fields in the registry (defined in Section 3.1): these fields provide
1643	   guidance on when specific additional subtags SHOULD (and SHOULD NOT)
1644	   be used in a language tag.

1646	   Of particular note, many applications can benefit from the use of
1647	   script subtags in language tags, as long as the use is consistent for
1648	   a given context.  Script subtags were not formally defined in RFC
1649	   3066 and their use can affect matching and subtag identification by
1650	   implementations of RFC 3066, as these subtags appear between the
1651	   primary language and region subtags.  For example, if a user requests
1652	   content in an implementation of Section 2.5 of [RFC3066] using the
1653	   language range "en-US", content labeled "en-Latn-US" will not match
1654	   the request.  Therefore it is important to know when script subtags
1655	   will customarily be used and when they ought not be used.  In the
1656	   registry, the Suppress-Script field helps ensure greater
1657	   compatibility between the language tags generated according to the
1658	   rules in this document and language tags and tag processors or
1659	   consumers based on RFC 3066 by defining when users SHOULD NOT include
1660	   a script subtag with a particular primary language subtag.
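   The mismatch described above follows from the prefix-matching rule in Section 2.5 of [RFC3066].  Under that rule, a language range matches a tag if it equals the tag.  It also matches if it equals a prefix of the tag and the next character after that prefix is "-".  Matching ignores case, and the special range "*" matches any tag.  The following Python sketch implements that rule; the function name is hypothetical.

```python
def rfc3066_matches(language_range: str, tag: str) -> bool:
    """Prefix matching as described in Section 2.5 of RFC 3066 (case-insensitive)."""
    r, t = language_range.lower(), tag.lower()
    if r == "*":
        return True  # the special range "*" matches any tag
    return t == r or t.startswith(r + "-")


print(rfc3066_matches("en-US", "en-US"))       # True
print(rfc3066_matches("en-US", "en-Latn-US"))  # False: 'Latn' sits between 'en' and 'US'
print(rfc3066_matches("en", "en-Latn-US"))     # True
```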

1662	   Extended language subtags (type 'extlang' in the registry, see
1663	   Section 3.1) also appear between the primary language and region
1664	   subtags and are reserved for future standardization.  Applications
1665	   might benefit from their judicious use in forming language tags in
1666	   the future.  The recommendations for script subtags are expected to
1667	   apply to their use as well.

1669	   Standards, protocols and applications that reference this document
1670	   normatively but apply rules different from the ones given in this
1671	   section MUST specify how the procedure varies from the one given
1672	   here.

1674	   The choice of subtags used to form a language tag SHOULD be guided by
1675	   the following rules:

1677	   1.  Use as precise a tag as possible, but no more specific than is
1678	       justified.  Avoid using subtags that are not important for
1679	       distinguishing content in an application.

1681	       *  For example, 'de' might suffice for tagging an email written
1682	          in German, while "de-CH-1996" is probably unnecessarily
1683	          precise for such a task.

1685	   2.  The script subtag SHOULD NOT be used to form language tags unless
1686	       the script adds some distinguishing information to the tag.  The
1687	       field 'Suppress-Script' in the primary language record in the
1688	       registry indicates which script subtags do not add distinguishing
1689	       information for most applications.

1691	       *  For example, the subtag 'Latn' should not be used with the
1692	          primary language 'en' because nearly all English documents are
1693	          written in the Latin script and it adds no distinguishing
1694	          information.  However, if a document were written in English
1695	          mixing Latin script with another script such as Braille
1696	          ('Brai'), then it might be appropriate to choose to indicate
1697	          both scripts to aid in content selection, such as the
1698	          application of a style sheet.

1700	   3.  If a tag or subtag has a 'Preferred-Value' field in its registry
1701	       entry, then the value of that field SHOULD be used to form the
1702	       language tag in preference to the tag or subtag in which the
1703	       preferred value appears.

1705	       *  For example, use 'he' for Hebrew in preference to 'iw'.

1707	   4.  The 'und' (Undetermined) primary language subtag SHOULD NOT be
1708	       used to label content, even if the language is unknown.  Omitting
1709	       the language tag altogether is preferred to using a tag with a
1710	       primary language subtag of 'und'.  The 'und' subtag MAY be useful
1711	       for protocols that require a language tag to be provided.  The
1712	       'und' subtag MAY also be useful when matching language tags in
1713	       certain situations.

1715	   5.  The 'mul' (Multiple) primary language subtag SHOULD NOT be used
1716	       whenever the protocol allows separate tags for multiple
1717	       languages, as is the case for the Content-Language header in
1718	       HTTP.  The 'mul' subtag conveys little useful information:
1719	       content in multiple languages SHOULD individually tag the
1720	       languages where they appear or otherwise indicate the actual
1721	       language in preference to the 'mul' subtag.

1723	   6.  The same variant subtag SHOULD NOT be used more than once within
1724	       a language tag.

1726	       *  For example, do not use "de-DE-1901-1901".
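   Rules 2, 3, and 6 above can be applied mechanically once the relevant registry fields are available.  The following Python sketch uses small hypothetical lookup tables in place of the registry's 'Preferred-Value' and 'Suppress-Script' fields.  The 'iw' to 'he', 'Latn' for 'en', and 'TP' to 'TL' entries come from examples in this document.  The sketch assumes a tag of the form language[-script][-region][-variant...].  It does not handle extlang, extension, or private-use subtags, and it keeps the letter case of the input.

```python
from typing import Dict, List, Set

# Hypothetical excerpts of registry data; a real tool would load these values
# from the 'Preferred-Value' and 'Suppress-Script' fields of the registry.
PREFERRED_LANGUAGE: Dict[str, str] = {"iw": "he"}
PREFERRED_REGION: Dict[str, str] = {"tp": "TL"}
SUPPRESS_SCRIPT: Dict[str, str] = {"en": "latn"}


def tidy_tag(tag: str) -> str:
    """Apply choice rules 2, 3 and 6 to language[-script][-region][-variant...]."""
    first, *rest = tag.split("-")
    language = PREFERRED_LANGUAGE.get(first.lower(), first)  # rule 3
    result: List[str] = [language]
    seen_variants: Set[str] = set()
    for sub in rest:
        key = sub.lower()
        if len(sub) == 4 and sub.isalpha():                  # script subtag
            if SUPPRESS_SCRIPT.get(language.lower()) == key:
                continue                                     # rule 2: adds nothing
            result.append(sub)
        elif (len(sub) == 2 and sub.isalpha()) or (len(sub) == 3 and sub.isdigit()):
            result.append(PREFERRED_REGION.get(key, sub))    # region, rule 3
        else:                                                # variant subtag
            if key in seen_variants:
                continue                                     # rule 6: repeated
            seen_variants.add(key)
            result.append(sub)
    return "-".join(result)


print(tidy_tag("iw-IL"))            # he-IL
print(tidy_tag("en-Latn-US"))       # en-US
print(tidy_tag("de-DE-1901-1901"))  # de-DE-1901
```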

1728	   To ensure consistent backward compatibility, this document contains
1729	   several provisions to account for potential instability in the
1730	   standards used to define the subtags that make up language tags.
1731	   These provisions mean that no language tag created under the rules in
1732	   this document will become obsolete.

1734	4.2  Meaning of the Language Tag

1736	   The relationship between the tag and the information it relates to is
1737	   defined by the context in which the tag appears.  Accordingly, this
1738	   section can only give possible examples of its usage.

1740	   o  For a single information object, the associated language tags
1741	      might be interpreted as the set of languages that is necessary for
1742	      a full comprehension of the complete object.  Example: Plain
1743	      text documents.

1745	   o  For an aggregation of information objects, the associated language
1746	      tags could be taken as the set of languages used inside components
1747	      of that aggregation.  Examples: Document stores and libraries.

1749	   o  For information objects whose purpose is to provide alternatives,
1750	      the associated language tags could be regarded as a hint that the
1751	      content is provided in several languages, and that one has to
1752	      inspect each of the alternatives in order to find its language or
1753	      languages.  In this case, the presence of multiple tags might not
1754	      mean that one needs to be multilingual to get complete
1755	      understanding of the document.  Example: MIME multipart/
1756	      alternative.

1758	   o  In markup languages, such as HTML and XML, language information
1759	      can be added to each part of the document identified by the markup
1760	      structure (including the whole document itself).  For example, one
1761	      could write <span lang="fr">C'est la vie.</span> inside a
1762	      Norwegian document; the Norwegian-speaking user could then access
1763	      a French-Norwegian dictionary to find out what the marked section
1764	      meant.  If the user were listening to that document through a
1765	      speech synthesis interface, this information could be used to signal
1766	      the synthesizer to appropriately apply French text-to-speech
1767	      pronunciation rules to that span of text, instead of applying the
1768	      inappropriate Norwegian rules.

1770	   Language tags are related when they contain a similar sequence of
1771	   subtags.  For example, if a language tag B contains language tag A as
1772	   a prefix, then B is typically "narrower" or "more specific" than A.
1773	   Thus "zh-Hant-TW" is more specific than "zh-Hant".

1775	   This relationship is not guaranteed in all cases: specifically,
1776	   languages whose tags begin with the same sequence of subtags are NOT
1777	   guaranteed to be mutually intelligible, although they might be.  For
1778	   example, the tag "az" shares a prefix with both "az-Latn"
1779	   (Azerbaijani written using the Latin script) and "az-Cyrl"
1780	   (Azerbaijani written using the Cyrillic script).  A person fluent in
1781	   one script might not be able to read the other, even though the text
1782	   might be identical.  Content tagged as "az" is most probably written
1783	   in just one script and thus might not be intelligible to a reader
1784	   familiar with the other script.
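   The prefix relationship works on whole subtags rather than on
   characters.  For example, "zh-Hant" is a prefix of "zh-Hant-TW", but
   "zh-Han" is not a prefix of "zh-Hant".  A minimal Python check (the
   function name is hypothetical) compares case-insensitively and
   requires the match to end at a "-" boundary.  As noted above, a true
   result says only that one tag is narrower than the other.  It says
   nothing about mutual intelligibility.

```python
def is_subtag_prefix(a: str, b: str) -> bool:
    """True if tag a equals tag b or is a prefix of b ending at a subtag boundary."""
    a, b = a.lower(), b.lower()  # language tags compare case-insensitively
    return b == a or b.startswith(a + "-")


print(is_subtag_prefix("zh-Hant", "zh-Hant-TW"))  # True
print(is_subtag_prefix("zh-Han", "zh-Hant"))      # False
```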

1786	4.3  Length Considerations

1788	   [RFC3066] did not provide an upper limit on the size of language
1789	   tags.  While RFC 3066 did define the semantics of particular subtags
1790	   in such a way that most language tags consisted of language and
1791	   region subtags with a combined total length of up to six characters,
1792	   longer registered tags were not only possible but were actually
1793	   registered.

1795	   Neither the language tag syntax nor other requirements in this
1796	   document impose a fixed upper limit on the number of subtags in a
1797	   language tag (and thus no upper bound on the size of a tag).  The
1798	   language tag syntax suggests that, depending on the specific
1799	   language, more subtags (and thus a longer tag) are sometimes
1800	   necessary to completely identify the language for certain
1801	   applications; thus it is possible to envision long or complex subtag
1802	   sequences.

1804	4.3.1  Working with Limited Buffer Sizes

1806	   Some applications and protocols are forced to allocate fixed buffer
1807	   sizes or otherwise limit the length of a language tag.  A conformant
1808	   implementation or specification MAY refuse to support the storage of
1809	   language tags which exceed a specified length.  Any such limitation
1810	   SHOULD be clearly documented, and such documentation SHOULD include
1811	   what happens to longer tags (for example, whether an error value is
1812	   generated or the language tag is truncated).  A protocol that allows
1813	   tags to be truncated at an arbitrary limit, without giving any
1814	   indication of what that limit is, has the potential for causing harm
1815	   by changing the meaning of tags in substantial ways.

1817	   In practice, most language tags do not require more than a few
1818	   subtags and will not approach reasonably sized buffer limitations:
1819	   see Section 4.1.

1821	   Some specifications or protocols have limits on tag length but do not
1822	   have a fixed length limitation.  For example, [RFC2231] has no
1823	   explicit length limitation: the length available for the language tag
1824	   is constrained by the length of other header components (such as the
1825	   charset's name) coupled with the 76-character limit in [RFC2047].
1826	   Thus the "limit" might be 50 or more characters, but it could
1827	   potentially be quite small.

1829	   The considerations for assigning a buffer limit are:

1831	      Implementations SHOULD NOT truncate language tags unless the
1832	      meaning of the tag is purposefully being changed, or unless the
1833	      tag does not fit into a limited buffer size specified by a
1834	      protocol for storage or transmission.

1836	      Implementations SHOULD warn the user when a tag is truncated, since
1837	      truncation changes the semantic meaning of the tag.

1839	      Implementations of protocols or specifications that are space
1840	      constrained but do not have a fixed limit SHOULD use the longest
1841	      possible tag in preference to truncation.

1843	      Protocols or specifications that specify limited buffer sizes for
1844	      language tags MUST allow for language tags of up to 33 characters.

1846	      Protocols or specifications that specify limited buffer sizes for
1847	      language tags SHOULD allow for language tags of at least 42
1848	      characters.

1850	   The following illustration shows how the 42-character recommendation
1851	   was derived.  The combination of language and extended language
1852	   subtags was chosen for future compatibility.  At up to 15 characters,
1853	   this combination is longer than the longest possible primary language
1854	   subtag (8 characters):

1856	   language      =  3 (ISO 639-2; ISO 639-1 requires 2)
1857	   extlang1      =  4 (each subsequent subtag includes '-')
1858	   extlang2      =  4 (unlikely: needs prefix="language-extlang1")
1859	   extlang3      =  4 (extremely unlikely)
1860	   script        =  5 (if not suppressed: see Section 4.1)
1861	   region        =  4 (UN M.49; ISO 3166 requires 3)
1862	   variant1      =  9 (MUST have language as a prefix)
1863	   variant2      =  9 (MUST have language-variant1 as a prefix)

1865	   total         = 42 characters

1867	              Figure 7: Derivation of the Limit on Tag Length
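   The arithmetic in Figure 7 is a running sum.  The Python sketch below
   (variable names are illustrative) reproduces it.  The running total
   is 15 after extlang3, matching the figure quoted above for the
   language and extended language combination.  It is 33 after
   variant1, which equals the MUST-level minimum, and 42 after variant2.

```python
from itertools import accumulate

# Components from Figure 7 and the characters each contributes; every
# subtag after the first includes its leading "-".
COMPONENTS = [
    ("language", 3),
    ("extlang1", 4),
    ("extlang2", 4),
    ("extlang3", 4),
    ("script", 5),
    ("region", 4),
    ("variant1", 9),
    ("variant2", 9),
]

names = [name for name, _ in COMPONENTS]
running = dict(zip(names, accumulate(length for _, length in COMPONENTS)))

for name in names:
    print(f"{name:<10}{running[name]:>3}")
```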

1869	4.3.2  Truncation of Language Tags

1871	   Truncation of a language tag alters the meaning of the tag, and thus
1872	   SHOULD be avoided.  However, truncation of language tags is sometimes
1873	   necessary due to limited buffer sizes.  Such truncation MUST NOT
1874	   permit a subtag to be chopped off in the middle or the formation of
1875	   invalid tags (for example, one ending with the "-" character).

1877	   This means that applications or protocols which truncate tags MUST do
1878	   so by progressively removing subtags along with their preceding "-"
1879	   from the right side of the language tag until the tag is short enough
1880	   for the given buffer.  If the resulting tag ends with a single-
1881	   character subtag, that subtag and its preceding "-" MUST also be
1882	   removed.  For example:

1884	   Tag to truncate: zh-Latn-CN-variant1-a-extend1-x-wadegile-private1
1885	   1. zh-Latn-CN-variant1-a-extend1-x-wadegile
1886	   2. zh-Latn-CN-variant1-a-extend1
1887	   3. zh-Latn-CN-variant1
1888	   4. zh-Latn-CN
1889	   5. zh-Latn
1890	   6. zh

1892	                    Figure 8: Example of Tag Truncation
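   The truncation rule can be expressed as a loop over subtags.  In this
   Python sketch, the function names are hypothetical and the input is
   assumed to be a well-formed tag.  When not even the primary language
   subtag fits, truncate() returns None rather than an invalid tag.
   This is one possible documented behavior of the kind Section 4.3.1
   asks for.  Applied to
   "zh-Latn-CN-variant1-a-extend1-x-wadegile-private1", it prints steps
   1 through 6 of Figure 8.

```python
from typing import List, Optional


def truncation_steps(tag: str) -> List[str]:
    """Successively shorter tags produced by the rule above, longest first."""
    subtags = tag.split("-")
    steps: List[str] = []
    while subtags:
        subtags.pop()  # drop the rightmost subtag and its preceding "-"
        # A tag must not end with a single-character subtag.
        while subtags and len(subtags[-1]) == 1:
            subtags.pop()
        if not subtags:
            break
        steps.append("-".join(subtags))
    return steps


def truncate(tag: str, limit: int) -> Optional[str]:
    """Fit tag into limit characters, or return None if no valid tag fits."""
    if len(tag) <= limit:
        return tag
    for candidate in truncation_steps(tag):
        if len(candidate) <= limit:
            return candidate
    return None


for step in truncation_steps("zh-Latn-CN-variant1-a-extend1-x-wadegile-private1"):
    print(step)
```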

1894	4.4  Canonicalization of Language Tags

1896	   Since a particular language tag is sometimes used by many processes,
1897	   language tags SHOULD always be created or generated in a canonical
1898	   form.

1900	   A language tag is in canonical form when:

1902	   1.  The tag is well-formed according to the rules in Section 2.1 and
1903	       Section 2.2.

1905	   2.  Subtags of type 'Region' that have a Preferred-Value mapping in
1906	       the IANA registry (see Section 3.1) SHOULD be replaced with their
1907	       mapped value.

1909	   3.  Redundant or grandfathered tags that have a Preferred-Value
1910	       mapping in the IANA registry (see Section 3.1) MUST be replaced
1911	       with their mapped value.  These items are either deprecated
1912	       mappings created before the adoption of this document (such as
1913	       the mapping of "no-nyn" to "nn" or "i-klingon" to "tlh") or are
1914	       the result of later registrations or additions to this document
1915	       (for example, "zh-guoyu" might be mapped to a language-extlang
1916	       combination such as "zh-cmn" by some future update of this
1917	       document).

1919	   4.  Other subtags that have a Preferred-Value mapping in the IANA
1920	       registry (see Section 3.1) MUST be replaced with their mapped
1921	       value.  These items consist entirely of clerical corrections to
1922	       ISO 639-1 in which the deprecated subtags have been maintained
1923	       for compatibility purposes.

1925	   5.  If more than one extension subtag sequence exists, the extension
1926	       sequences are ordered into case-insensitive ASCII order by
1927	       singleton subtag.

1929	   Example: The language tag "en-A-aaa-B-ccc-bbb-x-xyz" is in canonical
1930	   form, while "en-B-ccc-bbb-A-aaa-X-xyz" is well-formed but not in
1931	   canonical form.
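   Step 5 can be sketched in Python as follows.  The function name is
   hypothetical, and a well-formed tag is assumed.  Only whole extension
   sequences move.  The subtags inside each sequence keep their order,
   since an extension can define its own internal ordering.  A private
   use sequence introduced by 'x' always stays last.  Case is left
   untouched because canonical form does not prescribe it.

```python
from typing import List


def order_extensions(tag: str) -> str:
    """Sort extension sequences by singleton, case-insensitively (step 5)."""
    subtags = tag.split("-")
    if len(subtags[0]) == 1:
        return tag  # private use ("x-...") or grandfathered ("i-...") tag
    # Split off the private use sequence, which always stays last.
    private: List[str] = []
    for i, subtag in enumerate(subtags):
        if subtag.lower() == "x":
            subtags, private = subtags[:i], subtags[i:]
            break
    base: List[str] = []
    extensions: List[List[str]] = []
    for subtag in subtags:
        if len(subtag) == 1:
            extensions.append([subtag])  # a singleton opens a new sequence
        elif extensions:
            extensions[-1].append(subtag)
        else:
            base.append(subtag)
    # Only the sequences move; subtags inside each sequence keep their order.
    extensions.sort(key=lambda seq: seq[0].lower())
    flat = [subtag for seq in extensions for subtag in seq]
    return "-".join(base + flat + private)


print(order_extensions("en-B-ccc-bbb-A-aaa-X-xyz"))  # en-A-aaa-B-ccc-bbb-X-xyz
```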

1933	   Example: The language tag "en-NH" (English as used in the New
1934	   Hebrides) is not canonical because the 'NH' subtag has a canonical
1935	   mapping to 'VU' (Vanuatu), although the tag "en-NH" maintains its
1936	   validity.
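   Steps 2 through 4 amount to table lookups against Preferred-Value
   fields.  The Python sketch below hard-codes a hypothetical excerpt of
   the registry.  It contains only mappings mentioned in this document:
   "no-nyn" to "nn", "i-klingon" to "tlh", 'iw' to 'he', and 'NH' to
   'VU'.  A real processor would read these from the registry itself.
   The sketch assumes a well-formed tag and recognizes the region subtag
   by shape (two letters or three digits before any singleton).  It
   applies the region mapping even though step 2 is only a SHOULD.

```python
from typing import Dict

# Hypothetical registry excerpt: only mappings mentioned in this document.
TAG_PREFERRED: Dict[str, str] = {"no-nyn": "nn", "i-klingon": "tlh"}
LANGUAGE_PREFERRED: Dict[str, str] = {"iw": "he"}
REGION_PREFERRED: Dict[str, str] = {"nh": "VU"}


def apply_preferred_values(tag: str) -> str:
    # Step 3: whole redundant or grandfathered tags are replaced outright.
    whole = TAG_PREFERRED.get(tag.lower())
    if whole is not None:
        return whole
    subtags = tag.split("-")
    # Step 4: other subtags (here, only the primary language subtag).
    subtags[0] = LANGUAGE_PREFERRED.get(subtags[0].lower(), subtags[0])
    # Step 2: the region subtag, recognized by shape before any singleton.
    for i, subtag in enumerate(subtags[1:], start=1):
        if len(subtag) == 1:
            break
        if (len(subtag) == 2 and subtag.isalpha()) or (len(subtag) == 3 and subtag.isdigit()):
            subtags[i] = REGION_PREFERRED.get(subtag.lower(), subtag)
            break
    return "-".join(subtags)


print(apply_preferred_values("en-NH"))  # en-VU
```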

1938	   Canonicalization of language tags does not imply anything about the
1939	   use of uppercase or lowercase letters when processing or comparing
1940	   subtags (as described in Section 2.1).  All comparisons MUST be
1941	   performed in a case-insensitive manner.

1943	   When performing canonicalization of language tags, processors MAY
1944	   regularize the case of the subtags (that is, this process is
1945	   OPTIONAL), following the case used in the registry.  Note that this
1946	   corresponds to the following casing rules: uppercase all non-initial
1947	   two-letter subtags; titlecase all non-initial four-letter subtags;
1948	   lowercase everything else.

1950	   Note: Case folding of ASCII letters in certain locales, unless
1951	   carefully handled, sometimes produces non-ASCII character values.
1952	   The Unicode Character Database file "SpecialCasing.txt" defines the
1953	   specific cases that are known to cause problems with this.  In
1954	   particular, the letter 'i' (U+0069) in Turkish and Azerbaijani is
1955	   uppercased to U+0130 (LATIN CAPITAL LETTER I WITH DOT ABOVE).
1956	   Implementers SHOULD specify a locale-neutral casing operation to
1957	   ensure that case folding of subtags does not produce this value,
1958	   which is illegal in language tags.  For example, if one were to
1959	   uppercase the region subtag 'in' using Turkish locale rules, the
1960	   sequence U+0130 U+004E would result instead of the expected 'IN'.
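   Both casing notes can be illustrated together.  The Python sketch
   below (function name hypothetical) applies the three casing rules
   literally to every subtag, including any after a singleton.  It uses
   explicit ASCII-only translation tables.  Python's str.upper() is not
   locale-sensitive in any case.  The explicit tables, however, make the
   intent portable to environments where the default casing operation
   follows the user's locale, such as Java's String.toUpperCase() with
   no argument.

```python
import string

# ASCII-only case mappings. Because they touch only A-Z and a-z, they can
# never turn 'i' into U+0130 the way Turkish-locale uppercasing can.
_TO_UPPER = str.maketrans(string.ascii_lowercase, string.ascii_uppercase)
_TO_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def regularize_case(tag: str) -> str:
    """Apply the casing rules literally, to every subtag of the tag."""
    result = []
    for index, subtag in enumerate(tag.split("-")):
        lower = subtag.translate(_TO_LOWER)
        if index > 0 and len(subtag) == 2:
            result.append(lower.translate(_TO_UPPER))  # e.g. region 'in' -> 'IN'
        elif index > 0 and len(subtag) == 4:
            result.append(lower[:1].translate(_TO_UPPER) + lower[1:])  # 'hant' -> 'Hant'
        else:
            result.append(lower)
    return "-".join(result)


print(regularize_case("ZH-hant-tw"))  # zh-Hant-TW
print(regularize_case("en-in"))       # en-IN
```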

1962	   Note: if the field 'Deprecated' appears in a registry record without
1963	   an accompanying 'Preferred-Value' field, then that tag or subtag is
1964	   deprecated without a replacement.  Validating processors SHOULD NOT
1965	   generate tags that include these values, although the values are
1966	   canonical when they appear in a language tag.

1968	   An extension MUST define any relationships that exist between the
1969	   various subtags in the extension and thus MAY define an alternate
1970	   canonicalization scheme for the extension's subtags.  Extensions MAY
1971	   define how the order of the extension's subtags is interpreted.  For
1972	   example, an extension could define that its subtags are in canonical
1973	   order when the subtags are placed into ASCII order: that is,
1974	   "en-a-aaa-bbb-ccc" instead of "en-a-ccc-bbb-aaa".  Another extension
1975	   might define that the order of the subtags influences their semantic
1976	   meaning (so that "en-b-ccc-bbb-aaa" has a different value from
1977	   "en-b-aaa-bbb-ccc").  However, extension specifications SHOULD be
1978	   designed so that they are tolerant of the typical processes described
1979	   in Section 3.6.

1981	4.5  Considerations for Private Use Subtags

1983	   Private use subtags, like all other subtags, MUST conform to the
1984	   format and content constraints in the ABNF.  Private use subtags have
1985	   no meaning outside the private agreement between the parties that
1986	   intend to use or exchange language tags that employ them.  The same
1987	   subtags could be used with a different meaning under a separate
1988	   private agreement.  They SHOULD NOT be used where alternatives exist
1989	   and SHOULD NOT be used in content or protocols intended for general
1990	   use.

1992	   Private use subtags are simply useless for information exchange
1993	   without prior arrangement.  The value and semantic meaning of private
1994	   use tags and of the subtags used within such a language tag are not
1995	   defined by this document.

1997	   Subtags defined in the IANA registry as having a specific private use
1998	   meaning convey more information than a purely private use tag
1999	   prefixed by the singleton subtag 'x'.  For applications this
2000	   additional information MAY be useful.

2002	   For example, the region subtags 'AA', 'ZZ', and those in the ranges
2003	   'QM'-'QZ' and 'XA'-'XZ' (derived from ISO 3166 private use codes) MAY
2004	   be used to form a language tag.  A tag such as "zh-Hans-XQ" conveys a
2005	   great deal of public, interchangeable information about the language
2006	   material (that it is Chinese in the simplified Chinese script and is
2007	   suitable for some geographic region 'XQ').  While the precise
2008	   geographic region is not known outside of private agreement, the tag
2009	   conveys far more information than an opaque tag such as "x-someLang",
2010	   which contains no information about the language subtag or script
2011	   subtag outside of the private agreement.
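   The private use region subtags listed above can be recognized with a
   short check.  This Python sketch (function name hypothetical) accepts
   any ASCII case, since subtag comparisons are case-insensitive.

```python
def is_private_use_region(subtag: str) -> bool:
    """True for 'AA', 'ZZ', 'QM'-'QZ', and 'XA'-'XZ', in any case."""
    if len(subtag) != 2 or not (subtag.isascii() and subtag.isalpha()):
        return False
    code = subtag.upper()  # region subtags compare case-insensitively
    return (code in ("AA", "ZZ")
            or (code[0] == "Q" and "M" <= code[1] <= "Z")
            or code[0] == "X")  # 'XA'-'XZ' covers every X-initial pair


print(is_private_use_region("XQ"))  # True
print(is_private_use_region("US"))  # False
```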

2013	   However, in some cases content tagged with private use subtags can
2014	   interact with other systems in a different and possibly unsuitable
2015	   manner compared to tags that use opaque, privately defined subtags,
2016	   so the choice of the best approach sometimes depends on the
2017	   particular domain in question.

2019	5.  IANA Considerations

2021	   This section deals with the processes and requirements necessary for
2022	   IANA to undertake to maintain the subtag and extension registries as
2023	   defined by this document and in accordance with the requirements of
2024	   [RFC2434].

2026	   The impact on the IANA maintainers of the two registries defined by
2027	   this document will be a small increase in the frequency of new
2028	   entries or updates.

2030	5.1  Language Subtag Registry

2032	   Upon adoption of this document, the registry will be initialized by a
2033	   companion document: [initial-registry].  The criteria and process for
2034	   selecting the initial set of records are described in that document.
2035	   The initial set of records has no impact on IANA, since the
2036	   work to create it will be performed externally.

2038	   The new registry MUST be listed under "Language Tags" at
2039	   <http://www.iana.org/numbers.html>, replacing the existing
2040	   registrations defined by [RFC3066].  The existing set of registration
2041	   forms and RFC 3066 registrations will be relabeled as "Language Tags
2042	   (Obsolete)" and maintained (but not added to or modified).

2044	   Future work on the Language Subtag Registry will be limited to
2045	   inserting or replacing whole records preformatted for IANA by the
2046	   Language Subtag Reviewer as described in Section 3.2 of this
2047	   document.  This simplifies IANA's work by limiting it to placing the
2048	   text in the appropriate location in the registry.

2050	   Each record will be sent to iana@iana.org with a subject line
2051	   indicating whether the enclosed record is an insertion of a new
2052	   record (indicated by the word "INSERT" in the subject line) or a
2053	   replacement of an existing record (indicated by the word "MODIFY" in
2054	   the subject line).  Records MUST NOT be deleted from the registry.
2055	   IANA MUST place any inserted or modified records into the appropriate
2056	   section of the language subtag registry, grouping the records by
2057	   their "Type" field.  Inserted records MAY be placed anywhere in the
2058	   appropriate section; there is no guarantee of the order of the
2059	   records beyond grouping them together by 'Type'.  Modified records
2060	   MUST overwrite the record they replace.

2062	   Included in any request to insert or modify records MUST be a new
2063	   File-Date record.  This record MUST be placed first in the registry.
2064	   In the event that the File-Date record present in the registry has a
2065	   later date than the record being inserted or modified, the existing
2066	   record MUST be preserved.
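   The insertion and modification rules can be modeled as below.  This
   is a Python sketch under stated assumptions, not a description of
   IANA's tooling.  The registry is modeled as a dictionary mapping each
   'Type' to a list of records.  A record is identified by a
   hypothetical 'Subtag' or 'Tag' field.  File-Date values are assumed
   to be YYYY-MM-DD strings, which compare correctly as text.  A MODIFY
   request that matches no record is treated as an error.

```python
from typing import Dict, List

Record = Dict[str, str]  # field name -> value, e.g. {"Type": "region", ...}


def record_key(record: Record) -> str:
    # Hypothetical: identify a record by its Subtag (or Tag) field.
    return (record.get("Subtag") or record.get("Tag") or "").lower()


def apply_request(registry: Dict[str, List[Record]], file_date: str,
                  action: str, record: Record, new_file_date: str) -> str:
    """Apply one INSERT or MODIFY request and return the File-Date to keep."""
    if action == "INSERT":
        # Any position within the record's Type group is allowed.
        registry.setdefault(record["Type"], []).append(record)
    elif action == "MODIFY":
        section = registry.get(record["Type"], [])
        for i, existing in enumerate(section):
            if record_key(existing) == record_key(record):
                section[i] = record  # overwrite in place; nothing is deleted
                break
        else:
            raise KeyError(f"no {record['Type']} record {record_key(record)!r}")
    else:
        raise ValueError(f"unknown action {action!r}")
    # An existing File-Date that is later than the request's is preserved.
    return max(file_date, new_file_date)
```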

2068	5.2  Extensions Registry

2070	   The Language Tag Extensions registry will also be generated and sent
2071	   to IANA as described in Section 3.6.  This registry can contain at
2072	   most 35 records and thus changes to this registry are expected to be
2073	   very infrequent.

2075	   Future work by IANA on the Language Tag Extensions Registry is
2076	   limited to two cases.  First, the IESG MAY request that new records
2077	   be inserted into this registry from time to time.  These requests
2078	   will include the record to insert in the exact format described in
2079	   Section 3.6.  In addition, there MAY be occasional requests from the
2080	   maintaining authority for a specific extension to update the contact
2081	   information or URLs in the record.  These requests MUST include the
2082	   complete, updated record.  IANA is not responsible for validating the
2083	   information provided, only for checking that it is properly
2084	   formatted.  The request should reasonably be seen to come from the
2085	   maintaining authority named in the record present in the registry.

2087	6.  Security Considerations

2089	   Language tags used in content negotiation, like any other information
2090	   exchanged on the Internet, might be a source of concern because they
2091	   might be used to infer the nationality of the sender, and thus
2092	   identify potential targets for surveillance.

2094	   This is a special case of the general problem that anything sent is
2095	   visible to the receiving party and possibly to third parties as well.
2096	   It is useful to be aware that such concerns can exist in some cases.

2098	   The evaluation of the exact magnitude of the threat, and any possible
2099	   countermeasures, is left to each application protocol (see BCP 72
2100	   [RFC3552] for best current practice guidance on security threats and
2101	   defenses).

2103	   The language tag associated with a particular information item is of
2104	   no consequence whatsoever in determining whether that content might
2105	   contain possible homographs.  The fact that a text is tagged as being
2106	   in one language or using a particular script subtag provides no
2107	   assurance whatsoever that it does not contain characters from scripts
2108	   other than the one(s) associated with or specified by that language
2109	   tag.

2111	   Since there is no limit to the number of variant, private use, and
2112	   extension subtags, and consequently no limit on the possible length
2113	   of a tag, implementations need to guard against buffer overflow
2114	   attacks.  See Section 4.3 for details on language tag truncation,
2115	   which can occur as a consequence of defenses against buffer overflow.

2117	   Although the specification of valid subtags for an extension (see
2118	   Section 3.6) MUST be available over the Internet, implementations
2119	   SHOULD NOT mechanically depend on its always being accessible, to
2120	   prevent denial-of-service attacks.

2122	7.  Character Set Considerations

2124	   The syntax in this document requires that language tags use only the
2125	   characters A-Z, a-z, 0-9, and HYPHEN-MINUS, which are present in most
2126	   character sets, so the composition of language tags should not have
2127	   any character set issues.
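   A tag can be checked against this repertoire directly.  The Python
   sketch below (function name hypothetical) spells out the permitted
   characters because character classes such as \w, or str.isalnum(),
   also accept non-ASCII letters like U+0130.

```python
import re

# Exactly the repertoire named above. Classes such as \w, or str.isalnum(),
# would also accept non-ASCII letters like U+0130.
_TAG_CHARS = re.compile(r"[A-Za-z0-9-]+")


def uses_tag_repertoire(tag: str) -> bool:
    return _TAG_CHARS.fullmatch(tag) is not None


print(uses_tag_repertoire("zh-Hans-XQ"))  # True
print(uses_tag_repertoire("\u0130N"))      # False
```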

2129	   Rendering of characters based on the content of a language tag is not
2130	   addressed in this memo.  Historically, some languages have relied on
2131	   the use of specific character sets or other information in order to
2132	   infer how a specific character should be rendered (notably this
2133	   applies to language and culture specific variations of Han ideographs
2134	   as used in Japanese, Chinese, and Korean).  When language tags are
2135	   applied to spans of text, rendering engines can use that information
2136	   in deciding which font to use in the absence of other information,
2137	   particularly where languages with distinct writing traditions use the
2138	   same characters.

2140	8.  Changes from RFC 3066

2142	   The main goals for this revision of language tags were the following:

2144	   *Compatibility.* All RFC 3066 language tags (including those in the
2145	   IANA registry) remain valid in this specification.  The changes in
2146	   this document represent additional constraints on language tags.
2147	   That is, in no case is the syntax more permissive, and processors
2148	   based on the RFC 3066 ABNF (such as those described in [XMLSchema])
2149	   will be able to process the tags described by this document.  In
2150	   addition, this document defines language tags in such a way as to
2151	   ensure future compatibility.

2153	   *Stability.* Because of changes in the past in the underlying ISO
2154	   standards, a valid RFC 3066 language tag could become invalid or have
2155	   its meaning change.  This has the potential of invalidating content
2156	   that may have an extensive shelf-life.  In this specification, once a
2157	   language tag is valid, it remains valid forever.

2159	   *Validity.* The structure of language tags defined by this document
2160	   makes it possible to determine if a particular tag is well-formed
2161	   without regard for the actual content or "meaning" of the tag as a
2162	   whole.  This is important because the registry grows and underlying
2163	   standards change over time.  In addition, it must be possible to
2164	   determine if a tag is valid (or not) for a given point in time in
2165	   order to provide reproducible, testable results.  This process must
2166	   not be error-prone; otherwise implementations might give different
2167	   results.  By having an authoritative registry with specific
2168	   versioning information, the validity of language tags at any point in
2169	   time can be precisely determined (instead of interpolating values
2170	   from many separate sources).

2172	   *Utility.* It is sometimes important to be able to differentiate
2173	   between written forms of a language -- for many implementations this
2174	   is more important than distinguishing between the spoken variants of
2175	   a language.  Languages are written in a wide variety of different
2176	   scripts, so this document provides for the generative use of ISO
2177	   15924 script codes.  Like the generative use of ISO language and
2178	   country codes in RFC 3066, this allows combinations to be produced
2179	   without resorting to the registration process.  The addition of UN
2180	   M.49 codes provides for the generation of language tags with regional
2181	   scope, which is also required by some applications.

2183	   The recast of the registry from containing whole language tags to
2184	   subtags is a key part of this.  An important feature of RFC 3066 was
2185	   that it allowed generative use of subtags.  This allows people to
2186	   meaningfully use generated tags, without the delays in registering
2187	   whole tags or the need to register all of the combinations that might
2188	   be useful.

2190	   The choice of placing the extended language and script subtags
2191	   between the primary language and region subtags was widely debated.
2192	   This design was chosen because the prevalent matching and content
2193	   negotiation schemes rely on the subtags being arranged in order of
2194	   increasing specificity.  That is, the subtags that mark a greater
2195	   barrier to mutual intelligibility appear left-most in a tag.  For
2196	   example, when selecting content written in Azerbaijani, the script
2197	   (Arabic, Cyrillic, or Latin) represents a greater barrier to
2198	   understanding than any regional variations (those associated with
2199	   Azerbaijan or Iran, for example).  Individuals who prefer documents
2200	   in a particular script, but can deal with the minor regional
2201	   differences, can therefore select appropriate content.  Applications
2202	   that do not deal with written content will continue to omit these
2203	   subtags.

2205	   *Extensibility.* Because of the widespread use of language tags, it
2206	   is disruptive to have periodic revisions of the core specification,
2207	   even in the face of demonstrated need.  The extension mechanism
2208	   provides a way for independent RFCs to define extensions to
2209	   language tags.  These extensions have a very constrained,
2210	   well-defined structure that prevents extensions from interfering
2211	   with implementations of language tags defined in this document.

2213	   The document also anticipates features of ISO 639-3 with the addition
2214	   of the extended language subtags, as well as the possibility of other
2215	   ISO 639 parts becoming useful for the formation of language tags in
2216	   the future.

2218	   The use and definition of private use tags has also been modified, to
2219	   allow people to use private use subtags to extend or modify defined
2220	   tags and to move as much information as possible out of private use
2221	   and into the regular structure.

2223	   The goal for each of these modifications is to reduce or eliminate
2224	   the need for future revisions of this document.

2226	   The specific changes in this document to meet these goals are:

2228	   o  Defines the ABNF and rules for subtags so that the category of all
2229	      subtags can be determined without reference to the registry.

2231	   o  Adds the concept of well-formed vs. validating processors,
2232	      defining the rules by which an implementation can claim to be one
2233	      or the other.

2235	   o  Replaces the IANA language tag registry with a language subtag
2236	      registry that provides a complete list of valid subtags in the
2237	      IANA registry.  This allows for robust implementation and ease of
2238	      maintenance.  The language subtag registry becomes the canonical
2239	      source for forming language tags.

2241	   o  Provides a process that guarantees stability of language tags, by
2242	      handling reuse of values by ISO 639, ISO 15924, and ISO 3166 in
2243	      the event that they register a previously used value for a new
2244	      purpose.

2246	   o  Allows ISO 15924 script code subtags and allows them to be used
2247	      generatively.  Defines a method for indicating in the registry
2248	      when script subtags are necessary for a given language tag.

2250	   o  Adds the concept of a variant subtag and allows variants to be
2251	      used generatively.

2253	   o  Adds the ability to use a class of UN M.49 tags for supra-national
2254	      regions and to resolve conflicts in the assignment of ISO 3166
2255	      codes.

2257	   o  Defines the private use tags in ISO 639, ISO 15924, and ISO 3166
2258	      as the mechanism for creating private use language, script, and
2259	      region subtags respectively.

2261	   o  Adds a well-defined extension mechanism.

2263	   o  Defines an extended language subtag, possibly for use with certain
2264	      anticipated features of ISO 639-3.

2266	   Ed Note: The following items are provided for the convenience of
2267	   reviewers and will be removed from the final document.

2269	   Changes between draft-ietf-ltru-registry-08 and this version are:

2271	   o  Added a reference URI to the editor's address.  (F.Ellermann)

2273	   o  Various nit fixes.

2275	   o  Fixed rule #11 in Section 3.3 to allow UN M.49 codes to be
2276	      registered in extreme situations (#1026) (F.Ellermann, R.Presuhn,
2277	      etc.)

2279	   o  Added more cautionary text about private use subtags to
2280	      Section 4.5. (#1061) (D.Pierce)

2282	   o  Regularized "private-use" to always use the form "private use".
2283	      (A.Phillips)

2285	   o  Additional wordsmithing on rule #11 in Section 3.3.  (F.Ellermann)

2287	9.  References

2289	9.1  Normative References

2291	   [ISO639-1]
2292	              International Organization for Standardization, "ISO 639-
2293	              1:2002, Codes for the representation of names of languages
2294	              -- Part 1: Alpha-2 code", ISO Standard 639, 2002, <ISO
2295	              639-1>.

2297	   [ISO639-2]
2298	              International Organization for Standardization, "ISO 639-
2299	              2:1998 - Codes for the representation of names of
2300	              languages -- Part 2: Alpha-3 code - edition 1",
2301	              1998, <ISO 639-2>.

2303	   [ISO15924]
2304	              ISO TC46/WG3, "ISO 15924:2003 (E/F) - Codes for the
2305	              representation of names of scripts", January 2004, <ISO
2306	              15924>.

2308	   [ISO3166]  International Organization for Standardization, "Codes for
2309	              the representation of names of countries, 3rd edition",
2310	              ISO Standard 3166, August 1988, <ISO 3166>.

2312	   [UN_M.49]  Statistical Division, United Nations, "Standard Country or
2313	              Area Codes for Statistical Use", UN Standard Country or
2314	              Area Codes for Statistical Use, Revision 4 (United Nations
2315	              publication, Sales No. 98.XVII.9), June 1999, <UN M.49>.

2317	   [ISO10646]
2318	              International Organization for Standardization, "ISO/IEC
2319	              10646-1:2000. Information technology -- Universal
2320	              Multiple-Octet Coded Character Set (UCS) -- Part 1:
2321	              Architecture and Basic Multilingual Plane and ISO/IEC
2322	              10646-2:2001. Information technology -- Universal
2323	              Multiple-Octet Coded Character Set (UCS) -- Part 2:
2324	              Supplementary Planes, as, from time to time, amended,
2325	              replaced by a new edition or expanded by the addition of
2326	              new parts", 2000, <ISO/IEC 10646>.

2328	   [RFC2234bis]
2329	              Crocker, D. and P. Overell, "Augmented BNF for Syntax
2330	              Specifications: ABNF", draft-crocker-abnf-rfc2234bis-00
2331	              (work in progress), March 2005.

2333	   [RFC2026]  Bradner, S., "The Internet Standards Process -- Revision
2334	              3", BCP 9, RFC 2026, October 1996.

2336	   [RFC2028]  Hovey, R. and S. Bradner, "The Organizations Involved in
2337	              the IETF Standards Process", BCP 11, RFC 2028,
2338	              October 1996.

2340	   [RFC2047]  Moore, K., "MIME (Multipurpose Internet Mail Extensions)
2341	              Part Three: Message Header Extensions for Non-ASCII Text",
2342	              RFC 2047, November 1996.

2344	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
2345	              Requirement Levels", BCP 14, RFC 2119, March 1997.

2347	   [RFC2434]  Narten, T. and H. Alvestrand, "Guidelines for Writing an
2348	              IANA Considerations Section in RFCs", BCP 26, RFC 2434,
2349	              October 1998.

2351	   [RFC2781]  Hoffman, P. and F. Yergeau, "UTF-16, an encoding of ISO
2352	              10646", RFC 2781, February 2000.

2354	   [RFC2860]  Carpenter, B., Baker, F., and M. Roberts, "Memorandum of
2355	              Understanding Concerning the Technical Work of the
2356	              Internet Assigned Numbers Authority", RFC 2860, June 2000.

2358	   [RFC3339]  Klyne, G. and C. Newman, "Date and Time on the Internet:
2359	              Timestamps", RFC 3339, July 2002.

2361	   [RFC3552]  Rescorla, E. and B. Korver, "Guidelines for Writing RFC
2362	              Text on Security Considerations", BCP 72, RFC 3552,
2363	              July 2003.

2365	9.2  Informative References

2367	   [initial-registry]
2368	              Ewell, D., Ed., "Initial Language Subtag Registry",
2369	              June 2005, <http://www.ietf.org/internet-drafts/
2370	              draft-ietf-ltru-initial-registry-00.txt>.

2372	   [iso639.principles]
2373	              ISO 639 Joint Advisory Committee, "ISO 639 Joint Advisory
2374	              Committee:  Working principles for ISO 639 maintenance",
2375	              March 2000,
2376	              <http://www.loc.gov/standards/iso639-2/
2377	              iso639jac_n3r.html>.

2379	   [record-jar]
2380	              Raymond, E., "The Art of Unix Programming", 2003.

2382	   [XML10]    Bray, T., et al., "Extensible Markup Language (XML) 1.0",
2383	              February 2004.

2385	   [XMLSchema]
2386	              Biron, P., Ed. and A. Malhotra, Ed., "XML Schema Part 2:
2387	              Datatypes Second Edition", October 2004,
2388	              <http://www.w3.org/TR/xmlschema-2/>.

2390	   [Unicode]  Unicode Consortium, "The Unicode Consortium. The Unicode
2391	              Standard, Version 4.1.0, defined by: The Unicode Standard,
2392	              Version 4.0 (Boston, MA, Addison-Wesley, 2003. ISBN 0-321-
2393	              18578-1), as amended by Unicode 4.0.1
2394	              (http://www.unicode.org/versions/Unicode4.0.1) and by
2395	              Unicode 4.1.0
2396	              (http://www.unicode.org/versions/Unicode4.1.0).",
2397	              March 2005.

2399	   [RFC1766]  Alvestrand, H., "Tags for the Identification of
2400	              Languages", RFC 1766, March 1995.

2402	   [RFC2231]  Freed, N. and K. Moore, "MIME Parameter Value and Encoded
2403	              Word Extensions: Character Sets, Languages, and
2404	              Continuations", RFC 2231, November 1997.

2406	   [RFC3066]  Alvestrand, H., "Tags for the Identification of
2407	              Languages", BCP 47, RFC 3066, January 2001.

2409	Authors' Addresses

2411	   Addison Phillips (editor)
2412	   Quest Software

2414	   Email: addison.phillips@quest.com
2415	   URI:   http://www.inter-locale.com

2417	   Mark Davis (editor)
2418	   IBM

2420	   Email: mark.davis@us.ibm.com

2422	Appendix A.  Acknowledgements

2424	   Any list of contributors is bound to be incomplete; please regard the
2425	   following as only a selection from the group of people who have
2426	   contributed to make this document what it is today.

2428	   The contributors to RFC 3066 and RFC 1766, the precursors of this
2429	   document, made enormous contributions directly or indirectly to this
2430	   document and are generally responsible for the success of language
2431	   tags.

2433	   The following people (in alphabetical order) contributed to this
2434	   document or to RFCs 1766 and 3066:

2436	   Glenn Adams, Harald Tveit Alvestrand, Tim Berners-Lee, Marc Blanchet,
2437	   Nathaniel Borenstein, Karen Broome, Eric Brunner, Sean M. Burke, M.T.
2438	   Carrasco Benitez, Jeremy Carroll, John Clews, Jim Conklin, Peter
2439	   Constable, John Cowan, Mark Crispin, Dave Crocker, Martin Duerst,
2440	   Frank Ellermann, Michael Everson, Doug Ewell, Ned Freed, Tim Goodwin,
2441	   Dirk-Willem van Gulik, Marion Gunn, Joel Halpern, Elliotte Rusty
2442	   Harold, Paul Hoffman, Scott Hollenbeck, Richard Ishida, Olle
2443	   Jarnefors, Kent Karlsson, John Klensin, Alain LaBonte, Eric Mader,
2444	   Ira McDonald, Keith Moore, Chris Newman, Masataka Ohta, Dylan Pierce,
2445	   Randy Presuhn, George Rhoten, Markus Scherer, Keld Jorn Simonsen,
2446	   Thierry Sourbier, Otto Stolz, Tex Texin, Andrea Vine, Rhys
2447	   Weatherley, Misha Wolf, Francois Yergeau and many, many others.

2449	   Very special thanks must go to Harald Tveit Alvestrand, who
2450	   originated RFCs 1766 and 3066, and without whom this document would
2451	   not have been possible.  Special thanks must go to Michael Everson,
2452	   who has served as language tag reviewer for almost the entire
2453	   period since the publication of RFC 1766.  Special thanks to Doug
2454	   Ewell for his production of the first complete subtag registry and
2455	   for his work in producing a test parser for verifying language tags.

2457	Appendix B.  Examples of Language Tags (Informative)

2459	   Simple language subtag:

2461	      de (German)

2463	      fr (French)

2465	      ja (Japanese)

2467	      i-enochian (example of a grandfathered tag)

2469	   Language subtag plus Script subtag:

2471	      zh-Hant (Chinese written using the Traditional Chinese script)

2473	      zh-Hans (Chinese written using the Simplified Chinese script)

2475	      sr-Cyrl (Serbian written using the Cyrillic script)

2477	      sr-Latn (Serbian written using the Latin script)

2479	   Language-Script-Region:

2481	      zh-Hans-CN (Chinese written using the Simplified script as used in
2482	      mainland China)

2484	      sr-Latn-CS (Serbian written using the Latin script as used in
2485	      Serbia and Montenegro)

2487	   Language-Variant:

2489	      sl-rozaj (Resian dialect of Slovenian)

2491	      sl-nedis (Nadiza dialect of Slovenian)

2493	   Language-Region-Variant:

2495	      de-CH-1901 (German as used in Switzerland using the 1901 variant
2496	      [orthography])

2498	      sl-IT-nedis (Slovenian as used in Italy, Nadiza dialect)

2500	   Language-Script-Region-Variant:

2502	      sl-Latn-IT-nedis (Nadiza dialect of Slovenian written using the
2503	      Latin script as used in Italy.  Note that this tag is NOT
2504	      RECOMMENDED because subtag 'sl' has a Suppress-Script value of
2505	      'Latn')
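   The Suppress-Script rule behind the last example can be checked
   mechanically.  The following Python sketch is illustrative only: the
   function name has_redundant_script and the one-entry SUPPRESS_SCRIPT
   table are hypothetical, and a real implementation would load every
   Suppress-Script field from the registry rather than hard-code it.

```python
# HYPOTHETICAL registry excerpt: only the 'sl' -> 'Latn' pairing is taken
# from the example above; a real checker would read all Suppress-Script
# fields from the Language Subtag Registry.
SUPPRESS_SCRIPT = {"sl": "Latn"}


def has_redundant_script(tag):
    """Return True if the tag's script subtag equals its language's Suppress-Script."""
    subtags = tag.split("-")
    language, rest = subtags[0].lower(), subtags[1:]
    # Skip any three-letter extended language subtags (e.g. 'min' in 'zh-min').
    while rest and len(rest[0]) == 3 and rest[0].isalpha():
        rest = rest[1:]
    # A script subtag, if present, comes next and consists of four letters.
    if not rest or len(rest[0]) != 4 or not rest[0].isalpha():
        return False
    suppressed = SUPPRESS_SCRIPT.get(language)
    # Subtags are case-insensitive, so compare in lowercase.
    return suppressed is not None and suppressed.lower() == rest[0].lower()
```

   For example, has_redundant_script("sl-Latn-IT-nedis") is True, while
   has_redundant_script("sl-IT-nedis") is False.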

2507	   Language-Region:

2509	      de-DE (German for Germany)

2511	      en-US (English as used in the United States)

2513	      es-419 (Spanish appropriate for the Latin America and Caribbean
2514	      region using the UN region code)
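   The tags in the preceding examples differ only in which subtags are
   present, and each kind of subtag has a distinctive shape (its length
   and whether it uses letters or digits).  The following Python sketch
   labels the subtags of such a tag by shape alone.  The function
   split_tag and its regular expressions are hypothetical
   approximations of the ABNF in this document; they do not consult the
   registry, and tags that begin with a singleton (such as "i-enochian"
   or "x-whatever") are outside their scope.

```python
import re

# Subtag shapes approximating this document's ABNF (an assumption of this
# sketch). Tags starting with a singleton ('i-...', 'x-...') are not handled.
EXTLANG = re.compile(r"[A-Za-z]{3}")
SCRIPT = re.compile(r"[A-Za-z]{4}")
REGION = re.compile(r"[A-Za-z]{2}|[0-9]{3}")
VARIANT = re.compile(r"[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}")


def _matches(pattern, subtags, i):
    """True if subtags[i] exists and matches the pattern in full."""
    return i < len(subtags) and pattern.fullmatch(subtags[i]) is not None


def split_tag(tag):
    """Label the subtags of a tag, in order, up to its first singleton."""
    subtags = tag.split("-")
    parts = {"language": subtags[0], "extlang": [], "script": None,
             "region": None, "variants": [], "rest": []}
    i = 1
    # Up to three extended language subtags may follow the primary subtag.
    while len(parts["extlang"]) < 3 and _matches(EXTLANG, subtags, i):
        parts["extlang"].append(subtags[i])
        i += 1
    if _matches(SCRIPT, subtags, i):
        parts["script"] = subtags[i]
        i += 1
    if _matches(REGION, subtags, i):
        parts["region"] = subtags[i]
        i += 1
    while _matches(VARIANT, subtags, i):
        parts["variants"].append(subtags[i])
        i += 1
    parts["rest"] = subtags[i:]  # extensions and/or private use, unparsed
    return parts
```

   With this sketch, split_tag("es-419") places "419" in the region
   slot, and split_tag("de-CH-1901") reports "1901" as a variant.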

2516	   Private use subtags:

2518	      de-CH-x-phonebk

2520	      az-Arab-x-AZE-derbend
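   Everything after the singleton 'x' is private use, with meaning only
   by private agreement.  A minimal Python sketch that separates the
   private use sequence from the rest of a tag follows; the function
   name split_private_use is hypothetical.

```python
def split_private_use(tag):
    """Return (public part, private use subtags) for a tag."""
    subtags = tag.split("-")
    lowered = [s.lower() for s in subtags]
    # A lone 'x' can only be the private use singleton, because extension
    # subtags are at least two characters long.
    if "x" in lowered:
        i = lowered.index("x")
        return "-".join(subtags[:i]), subtags[i + 1:]
    return tag, []
```

   For a tag consisting only of private use subtags, such as
   "x-whatever", the public part is the empty string.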

2522	   Extended language subtags (examples ONLY: extended languages MUST be
2523	   defined by revision or update to this document):

2525	      zh-min

2527	      zh-min-nan-Hant-CN

2529	   Private use registry values:

2531	      x-whatever (private use using the singleton 'x')

2533	      qaa-Qaaa-QM-x-southern (all private tags)

2535	      de-Qaaa (German, with a private script)

2537	      sr-Latn-QM (Serbian, Latin-script, private region)

2539	      sr-Qaaa-CS (Serbian, private script, for Serbia and Montenegro)
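   The private use registry values above (qaa, Qaaa, QM) belong to
   ranges reserved for private use.  The Python sketch below tests a
   subtag against such ranges.  The exact ranges coded here (qaa-qtz,
   Qaaa-Qabx, AA, QM-QZ, XA-XZ, ZZ) are an assumption and should be
   checked against the reserved ranges defined in the body of this
   document; the function is_private_use is hypothetical.

```python
def _in_range(value, low, high):
    # For equal-length lowercase ASCII codes, string order matches code order.
    return len(value) == len(low) and low <= value <= high


def is_private_use(kind, subtag):
    """Return True if the subtag falls in an (assumed) private use range."""
    s = subtag.lower()
    if kind == "language":
        return _in_range(s, "qaa", "qtz")
    if kind == "script":
        return _in_range(s, "qaaa", "qabx")
    if kind == "region":
        return (s in ("aa", "zz") or _in_range(s, "qm", "qz")
                or _in_range(s, "xa", "xz"))
    raise ValueError(f"unknown subtag kind: {kind}")
```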

2541	   Tags that use extensions (examples ONLY: extensions MUST be defined
2542	   by revision or update to this document or by RFC):

2544	      en-US-u-islamCal

2546	      zh-CN-a-myExt-x-private
2547	      en-a-myExt-b-another
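   Each extension is a single-character subtag (the singleton) followed
   by one or more subtags; it ends at the next singleton or at the
   private use singleton 'x'.  The Python sketch below collects
   extension sequences in the order they appear.  The function
   parse_extensions is hypothetical and performs no validation (for
   example, it does not reject a repeated singleton).

```python
def parse_extensions(tag):
    """Return extension sequences as (singleton, subtags) pairs, in tag order."""
    sequences = []
    for subtag in tag.split("-")[1:]:  # the primary subtag is never an extension
        if subtag.lower() == "x":
            break  # private use: everything after 'x' is opaque here
        if len(subtag) == 1:
            sequences.append((subtag.lower(), []))
        elif sequences:
            # Any longer subtag after a singleton belongs to that extension.
            sequences[-1][1].append(subtag)
    return sequences
```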

2549	   Some Invalid Tags:

2551	      de-419-DE (two region tags)

2553	      a-DE (use of a single character subtag in primary position; note
2554	      that there are a few grandfathered tags that start with "i-" that
2555	      are valid)

2557	      ar-a-aaa-b-bbb-a-ccc (two extensions with the same single-letter
2558	      prefix)
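   Each of the three invalid tags above can be detected from the tag's
   structure alone.  The Python sketch below reports which of those
   three problems, if any, a tag exhibits.  It is not a complete
   validator: the function invalid_reason is hypothetical, it exempts
   tags beginning with 'i' or 'x' from the primary-subtag check
   (grandfathered "i-" tags would really have to be looked up in the
   registry), and it checks nothing beyond the three cases listed.

```python
import re

REGION = re.compile(r"[A-Za-z]{2}|[0-9]{3}")  # alpha-2 or UN M.49 numeric


def invalid_reason(tag):
    """Return a description of one of the three problems above, or None."""
    subtags = tag.split("-")
    first = subtags[0].lower()
    if len(first) == 1 and first not in ("x", "i"):
        return "single-character subtag in primary position"
    singletons = []
    regions = 0
    in_extension = False
    for subtag in subtags[1:]:
        s = subtag.lower()
        if s == "x":
            break  # private use subtags are not checked
        if len(s) == 1:
            if s in singletons:
                return f"extension singleton '{s}' used twice"
            singletons.append(s)
            in_extension = True
        elif not in_extension and REGION.fullmatch(s):
            # Before any extension, only a region subtag has this shape.
            regions += 1
    if regions > 1:
        return "more than one region subtag"
    return None
```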

2560	Intellectual Property Statement

2562	   The IETF takes no position regarding the validity or scope of any
2563	   Intellectual Property Rights or other rights that might be claimed to
2564	   pertain to the implementation or use of the technology described in
2565	   this document or the extent to which any license under such rights
2566	   might or might not be available; nor does it represent that it has
2567	   made any independent effort to identify any such rights.  Information
2568	   on the procedures with respect to rights in RFC documents can be
2569	   found in BCP 78 and BCP 79.

2571	   Copies of IPR disclosures made to the IETF Secretariat and any
2572	   assurances of licenses to be made available, or the result of an
2573	   attempt made to obtain a general license or permission for the use of
2574	   such proprietary rights by implementers or users of this
2575	   specification can be obtained from the IETF on-line IPR repository at
2576	   http://www.ietf.org/ipr.

2578	   The IETF invites any interested party to bring to its attention any
2579	   copyrights, patents or patent applications, or other proprietary
2580	   rights that may cover technology that may be required to implement
2581	   this standard.  Please address the information to the IETF at
2582	   ietf-ipr@ietf.org.

2584	Disclaimer of Validity

2586	   This document and the information contained herein are provided on an
2587	   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
2588	   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
2589	   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
2590	   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
2591	   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
2592	   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

2594	Copyright Statement

2596	   Copyright (C) The Internet Society (2005).  This document is subject
2597	   to the rights, licenses and restrictions contained in BCP 78, and
2598	   except as set forth therein, the authors retain all their rights.

2600	Acknowledgment

2602	   Funding for the RFC Editor function is currently provided by the
2603	   Internet Society.