idnits 2.17.1 

draft-davis-t-langtag-ext-07.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  == There are 3 instances of lines with non-RFC2606-compliant FQDNs in the
     document.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (December 5, 2011) is 4519 days in the past.  Is this
     intentional?


  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  == Missing Reference: 'TBD' is mentioned on line 317, but not defined


     Summary: 0 errors (**), 0 flaws (~~), 3 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.
--------------------------------------------------------------------------------


2	Internet Engineering Task Force                                 M. Davis
3	Internet-Draft                                                    Google
4	Intended status: Informational                               A. Phillips
5	Expires: June 7, 2012                                             Lab126
6	                                                               Y. Umaoka
7	                                                                     IBM
8	                                                                 C. Falk
9	                                                       Infinite Automata
10	                                                        December 5, 2011

12	                BCP 47 Extension T - Transformed Content
13	                      draft-davis-t-langtag-ext-07

15	Abstract

17	   This document specifies an Extension to BCP 47 which provides subtags
18	   for specifying the source language or script of transformed content,
19	   including content that has been transliterated, transcribed, or
20	   translated, or in some other way influenced by the source.  It also
21	   provides for additional information used for identification.

23	Status of this Memo

25	   This Internet-Draft is submitted in full conformance with the
26	   provisions of BCP 78 and BCP 79.

28	   Internet-Drafts are working documents of the Internet Engineering
29	   Task Force (IETF).  Note that other groups may also distribute
30	   working documents as Internet-Drafts.  The list of current Internet-
31	   Drafts is at http://datatracker.ietf.org/drafts/current/.

33	   Internet-Drafts are draft documents valid for a maximum of six months
34	   and may be updated, replaced, or obsoleted by other documents at any
35	   time.  It is inappropriate to use Internet-Drafts as reference
36	   material or to cite them other than as "work in progress."

38	   This Internet-Draft will expire on June 7, 2012.

40	Copyright Notice

42	   Copyright (c) 2011 IETF Trust and the persons identified as the
43	   document authors.  All rights reserved.

45	   This document is subject to BCP 78 and the IETF Trust's Legal
46	   Provisions Relating to IETF Documents
47	   (http://trustee.ietf.org/license-info) in effect on the date of
48	   publication of this document.  Please review these documents
49	   carefully, as they describe your rights and restrictions with respect
50	   to this document.

52	Table of Contents

54	   1.  Introduction
55	     1.1.  Requirements Language
56	   2.  BCP47 Required Information
57	     2.1.  Overview
58	     2.2.  Structure
59	     2.3.  Canonicalization
60	     2.4.  BCP47 Registration Form
61	     2.5.  Field Definitions
62	     2.6.  Registration of Field Subtags
63	     2.7.  Registration of Additional Fields
64	     2.8.  Committee Responses to Registration Proposals
65	     2.9.  Machine-Readable Data
66	   3.  Acknowledgements
67	   4.  IANA Considerations
68	   5.  Security Considerations
69	   6.  References
70	     6.1.  Normative References
71	     6.2.  Informative References
72	   Authors' Addresses

74	1.  Introduction

76	   [BCP47] permits the definition and registration of language tag
77	   extensions "that contain a language component and are compatible with
78	   applications that understand language tags".  This document defines
79	   an extension for specifying the source of content that has been
80	   transformed, including text that has been transliterated,
81	   transcribed, or translated, or in some other way influenced by the
82	   source.  It may be used in queries to request content that has been
83	   transformed.  The "singleton" identifier for this extension is 't'.

85	   Language tags, as defined by [BCP47], are useful for identifying the
86	   language of content.  There are mechanisms for specifying variant
87	   subtags for special purposes.  However, these variants are
88	   insufficient for specifying content that has undergone
89	   transformations, including content that has been transliterated,
90	   transcribed, or translated.  The correct interpretation of the
91	   content may depend upon knowledge of the conventions used for the
92	   transformation.

94	   Suppose that Italian or Russian cities on a map are transcribed for
95	   Japanese users.  Each name needs to be transliterated into katakana
96	   using rules appropriate for the specific source and target language.
97	   When tagging such data, it is important to be able to indicate not
98	   only the resulting content language ("ja" in this case), but also the
99	   source language.

101	   Transforms such as transliterations may vary depending not only on
102	   the source and target script, but also on the source and target
103	   language.  Thus the Russian <U+041F U+0443 U+0442 U+0438
104	   U+043D> (which corresponds to the Cyrillic <PE, U, TE, I, EN>)
105	   transliterates into "Putin" in English but "Poutine" in French.  The
106	   identifier could be used to indicate a desired mechanical
107	   transformation in an API, or could be used to tag data that has been
108	   converted (mechanically or by hand) according to a transliteration
109	   method.

111	   In addition, many different conventions have arisen for how to
112	   transform text, even between the same languages and scripts.  For
113	   example, "Gaddafi" is commonly transliterated from Arabic to English
114	   as any of (G/Q/K/Kh)a(d/dh/dd/dhdh/th/zz)af(i/y).  Some examples of
115	   standardized conventions used for transcribing or transliterating
116	   text include:

118	   a.  United Nations Group of Experts on Geographical Names (UNGEGN)

120	   b.  US Library of Congress (LOC)
121	   c.  US Board on Geographic Names (BGN)

123	   d.  Korean Ministry of Culture, Sports and Tourism (MCST)

125	   e.  International Organization for Standardization (ISO)

127	   The usage of this extension is not limited to formal transformations,
128	   and may include other instances where the content is in some other
129	   way influenced by the source.  For example, this extension could be
130	   used to designate a request for a speech recognizer that is tailored
131	   specifically for 2nd-language speakers who are 1st-language speakers
132	   of a particular language (e.g. a recognizer for "English spoken with
133	   a Chinese accent").

135	1.1.  Requirements Language

137	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
138	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
139	   document are to be interpreted as described in RFC 2119.

141	2.  BCP47 Required Information

143	2.1.  Overview

145	   Identification of transformed content can be done using the 't'
146	   extension defined in this document.  This extension is formed by the
147	   't' singleton followed by a sequence of subtags that would form a
148	   language tag as defined by [BCP47].  This allows for the source
149	   language or script to be specified to the degree of precision
150	   required.  There are restrictions on the sequence of subtags.  They
151	   MUST form a regular, valid, canonical language tag, and MUST neither
152	   include extensions nor private use sequences introduced by the
153	   singleton 'x'.  Where only the script is relevant (such as
154	   identifying a script-script transliteration) then 'und' is used for
155	   the primary language subtag.

157	   For example:

159	   +---------------------+---------------------------------------------+
160	   | Language Tag        | Description                                 |
161	   +---------------------+---------------------------------------------+
162	   | ja-t-it             | The content is Japanese, transformed from   |
163	   |                     | Italian.                                    |
164	   | ja-Kana-t-it        | The content is Japanese Katakana,           |
165	   |                     | transformed from Italian.                   |
166	   | und-Latn-t-und-cyrl | The content is in the Latin script,         |
167	   |                     | transformed from the Cyrillic script.       |
168	   +---------------------+---------------------------------------------+

170	   Note that the sequence of subtags governed by 't' cannot contain a
171	   singleton (a single-character subtag), because that would start a new
172	   extension.  For example, the tag "ja-t-i-ami" does not indicate that
173	   the source is in "i-ami", because "i-ami" is not a regular language
174	   tag in [BCP47].  That tag would express an empty 't' extension
175	   followed by an 'i' extension.
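
The singleton rule above can be illustrated with a short Python sketch.
The function name split_extensions is hypothetical; the code only splits
a tag on single-character subtags and does not validate anything
against [BCP47].

```python
def split_extensions(tag):
    """Split a language tag into base subtags and per-singleton extensions.

    Returns (base, extensions), where extensions maps each singleton to
    the list of subtags that follow it.  A private use 'x' takes the
    rest of the tag.  This is an illustrative helper, not a validator.
    """
    subtags = tag.lower().split("-")
    base, extensions = [], {}
    current = base
    for index, subtag in enumerate(subtags):
        if index > 0 and len(subtag) == 1:
            if subtag == "x":
                extensions["x"] = subtags[index + 1:]
                break
            # Any other singleton starts a new extension.
            current = extensions.setdefault(subtag, [])
        else:
            current.append(subtag)
    return base, extensions


print(split_extensions("ja-Kana-t-it"))
print(split_extensions("ja-t-i-ami"))
```

For "ja-t-i-ami" the 't' entry comes back empty and "ami" is attached to
'i', which matches the reading described above.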

177	   The 't' extension is not intended for use in structured data that
178	   already provides separate source and target language identifiers.
179	   For example, this is the case in localization interchange formats
180	   such as XLIFF.  In such cases, it would be inappropriate to use "ja-
181	   t-it" for the target language tag because the source language tag
182	   "it" would already be present in the data.  Instead one would use the
183	   language tag "ja".

185	   As noted earlier, it is sometimes necessary to indicate additional
186	   information about a transformation.  This additional information is
187	   optionally supplied after the source in a series of one or more
188	   fields, where each field consists of a field separator subtag
189	   followed by one or more non-separator subtags.  Each field separator
190	   subtag consists of a single letter followed by a single digit.

192	   A transformation mechanism is an optional field that indicates the
193	   specification used for the transformation, such as "UNGEGN" for the
194	   United Nations Group of Experts on Geographical Names
195	   transliterations and transcriptions.  It uses the 'm0' field
196	   separator followed by certain subtags.

198	   For example:

200	   +------------------------------------+------------------------------+
201	   | Language Tag                       | Description                  |
202	   +------------------------------------+------------------------------+
203	   | und-Cyrl-t-und-latn-m0-ungegn-2007 | The content is in Cyrillic,  |
204	   |                                    | transformed from Latin,      |
205	   |                                    | according to a UNGEGN        |
206	   |                                    | specification dated 2007.    |
207	   +------------------------------------+------------------------------+

209	   The field separator subtags such as 'm0' were chosen because they are
210	   short, visually distinctive, and cannot occur in a language tag
211	   (except within an extension or after 'x'), thus eliminating the
212	   potential for collision or confusion with the source language tag.

214	   The field subtags are defined by Section 3 [1] of Unicode Technical
215	   Standard #35: Unicode Locale Data Markup Language [UTS35] (LDML), the
216	   main specification for the Unicode Common Locale Data Repository
217	   (CLDR) project.  As required by BCP 47, subtags follow the language
218	   tag ABNF and other rules for the formation of language tags and
219	   subtags, are restricted to the ASCII letters and digits, are not case
220	   sensitive, and do not exceed eight characters in length.

222	   EDITORIAL NOTE: This new facility has been accepted by the Unicode
223	   CLDR committee for incorporation into the next versions of CLDR and
224	   LDML, parallel with the structure of the 'u' extension [RFC6067], for
225	   which it is already the maintaining authority.  The data and
226	   specification will be available by the time this Internet-Draft has
227	   been approved.

229	   The LDML specification is available over the Internet and at no cost,
230	   and is available via a royalty-free license at
231	   http://unicode.org/copyright.html.  LDML is versioned, and each
232	   version of LDML is numbered, dated, and stable.  Extension subtags,
233	   once defined by LDML, are never retracted or substantially changed in
234	   meaning.

236	   The maintaining authority for the 't' extension is the Unicode
237	   Consortium:

239	   +---------------+---------------------------------------------------+
240	   | Item          | Value                                             |
241	   +---------------+---------------------------------------------------+
242	   | Name          | Unicode Consortium                                |
243	   | Contact Email | cldr-contact@unicode.org                          |
244	   | Discussion    | cldr-users@unicode.org                            |
245	   | List Email    |                                                   |
246	   | URL Location  | cldr.unicode.org                                  |
247	   | Specification | Unicode Technical Standard #35 Unicode Locale     |
248	   |               | Data Markup Language (LDML),                      |
249	   |               | http://unicode.org/reports/tr35/                  |
250	   | Section       | Section 3 Unicode Language and Locale Identifiers |
251	   +---------------+---------------------------------------------------+

253	2.2.  Structure

255	   The subtags in the 't' extension are of the following form:

257	   t-ext=    "t"                      ; Extension
258	             (("-" lang *("-" field)) ; Source + optional field(s)
259	             / 1*("-" field))         ; Field(s) only (no source)

261	   lang=     language                 ; BCP47, with restrictions
262	             ["-" script]
263	             ["-" region]
264	             *("-" variant)

266	   field=    sep 1*("-" 3*8alphanum)  ; With restrictions

268	   sep=      ALPHA DIGIT              ; Subtag separators
269	   alphanum= ALPHA / DIGIT

271	   where the <language>, <script>, <region>, and <variant> rules are
272	   specified in [BCP47], and the <ALPHA> and <DIGIT> rules in [RFC5234].

274	   Description and restrictions:

276	   a.  The 't' extension MUST have at least one subtag.

278	   b.  The 't' extension normally starts with a source language tag,
279	       which MUST be a regular, canonical language tag as specified by
280	       [BCP47].  Tags described by the 'irregular' production in BCP 47
281	       MUST NOT be used to form the language tag.  The source language
282	       tag MAY be omitted: some field values do not require it.

284	   c.  There is optionally a sequence of fields, where each field has a
285	       separator followed by a sequence of one or more subtags.  Two
286	       identical field separators MUST NOT be present in the language
287	       tag.

289	   d.  The order of the fields in a 't' extension is not significant.
290	       The order of subtags within a field is significant.  (See
291	       Section 2.3 Canonicalization.)

293	   e.  The 't' subtag fields are defined by Section 3 [1] of Unicode
294	       Technical Standard #35: Unicode Locale Data Markup Language
295	       [UTS35].
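
The ABNF and restrictions above might be checked along the following
lines.  This is a minimal Python sketch, not a reference implementation:
the names parse_t_extension, SEP, FIELD_SUBTAG, and SOURCE are
illustrative, the input is assumed to be the subtags that follow the 't'
singleton, extlang subtags are not handled, and only well-formedness is
checked (not validity against the registry or [UTS35]).

```python
import re

SEP = re.compile(r"[a-z][0-9]")            # sep = ALPHA DIGIT
FIELD_SUBTAG = re.compile(r"[a-z0-9]{3,8}")  # 3*8alphanum
SOURCE = re.compile(
    r"[a-z]{2,8}"                            # language (no extlang here)
    r"(-[a-z]{4})?"                          # script
    r"(-([a-z]{2}|[0-9]{3}))?"               # region
    r"(-([a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*"  # variants
)


def parse_t_extension(subtags):
    """Parse the subtags after 't' into (source, fields).

    source is the source language tag or None; fields maps each field
    separator to its list of subtags.  Raises ValueError on violations.
    """
    subtags = [s.lower() for s in subtags]
    if not subtags:
        raise ValueError("the 't' extension needs at least one subtag")
    # Everything before the first field separator is the source tag.
    first_sep = next(
        (i for i, s in enumerate(subtags) if SEP.fullmatch(s)), len(subtags)
    )
    source = "-".join(subtags[:first_sep]) or None
    if source is not None and not SOURCE.fullmatch(source):
        raise ValueError(f"malformed source language tag: {source!r}")
    fields = {}
    current = None
    for subtag in subtags[first_sep:]:
        if SEP.fullmatch(subtag):
            if subtag in fields:
                raise ValueError(f"duplicate field separator: {subtag!r}")
            current = fields[subtag] = []
        elif FIELD_SUBTAG.fullmatch(subtag):
            current.append(subtag)
        else:
            raise ValueError(f"malformed field subtag: {subtag!r}")
    for sep, values in fields.items():
        if not values:
            raise ValueError(f"field {sep!r} has no subtags")
    return source, fields


print(parse_t_extension("und-latn-m0-ungegn-2007".split("-")))
```

Splitting at the first separator works because a two-character
letter-digit subtag cannot appear inside the source language tag.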

297	2.3.  Canonicalization

299	   As required by [BCP47], the use of uppercase or lowercase letters is
300	   not significant in the subtags used in this extension.  The canonical
301	   form for all subtags in the extension is lowercase, with the fields
302	   ordered by the separators, alphabetically.  The order of subtags
303	   within a field is significant, and MUST NOT be changed in the process
304	   of canonicalizing.
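
As an illustration only, the canonicalization rule could be sketched in
Python as follows.  The function name canonicalize_t_extension is
hypothetical, the input is assumed to be the already well-formed subtags
that follow the 't' singleton, and the 'k0' separator in the usage line
is invented for the example (this document defines only 'm0').

```python
import re


def canonicalize_t_extension(subtags):
    """Lowercase all subtags and order the fields by separator.

    The order of subtags inside each field is left untouched.
    """
    source, fields, current = [], [], None
    for subtag in (s.lower() for s in subtags):
        if re.fullmatch(r"[a-z][0-9]", subtag):
            # A field separator starts a new field.
            current = [subtag]
            fields.append(current)
        elif current is None:
            # Still inside the source language tag.
            source.append(subtag)
        else:
            current.append(subtag)
    fields.sort(key=lambda field: field[0])
    return source + [subtag for field in fields for subtag in field]


print("-".join(canonicalize_t_extension("it-M0-BGN-1999-k0-abc".split("-"))))
```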

306	2.4.  BCP47 Registration Form

308	   Per RFC 5646, Section 3.7 [BCP47]:

310	   %%
311	   Identifier: t
312	   Description: Specifying Transformed Content
313	   Comments: Subtags for the identification of content that has been
314	   transformed, including but not limited to:
315	   transliteration, transcription, and translation.
316	   Added: 2010-mm-dd
317	   RFC: [TBD]
318	   Authority: Unicode Consortium
319	   Contact_Email: cldr-contact@unicode.org
320	   Mailing_List: cldr-users@unicode.org
321	   URL: http://www.unicode.org/Public/cldr/latest/core.zip
322	   %%

324	2.5.  Field Definitions

326	   Assignment of 't' field subtags is determined by the Unicode CLDR
327	   Technical Committee, in accordance with the policies and procedures
328	   in http://www.unicode.org/consortium/tc-procedures.html, and subject
329	   to the Unicode Consortium Policies at
330	   http://www.unicode.org/policies/policies.html.

332	   Assignments that can be made by successive versions of LDML [UTS35]
333	   by the Unicode Consortium without requiring a new RFC include:

335	   o  The allocation of new field separator subtags for use after the
336	      't' extension.

338	   o  The allocation of subtags valid after a field separator subtag.

340	   o  The addition of subtag aliases and descriptions.

342	   o  The modification of subtag descriptions.

344	   Changes to the syntax or meaning of the 't' extension would require a
345	   new RFC that obsoletes this document; such an RFC would break
346	   stability, and would thus be contrary to the policies of the Unicode
347	   Consortium.

349	   At the time this document was published, one field was specified in
350	   [UTS35]: the transform mechanism.  That field is summarized here:

352	   a.  The transform mechanism consists of a sequence of subtags
353	       starting with the 'm0' separator followed by one or more
354	       mechanism subtags.  Each mechanism subtag has a length of 3 to 8
355	       alphanumeric characters.  The sequence as a whole provides an
356	       identification of the specification for the transform, such as
357	       the mechanism subtag 'ungegn' in "und-Cyrl-t-und-latn-m0-ungegn".
358	       In many cases, only one mechanism subtag is necessary, but
359	       multiple subtags MAY be defined in [UTS35] where necessary.

361	   b.  Any purely numeric subtag is a representation of a date in the
362	       Gregorian calendar.  It MAY occur in any mechanism field, but it
363	       SHOULD only be used where necessary.  If it does occur:

365	       *  it MUST occur as the final subtag in the field

367	       *  it MUST NOT be the only subtag in the field

369	       *  it MUST only consist of a sequence of digits of the form YYYY,
370	          YYYYMM, or YYYYMMDD

372	       *  it SHOULD be as short as possible

374	       Note: The format is related to that of [RFC3339], but is not the
375	       same.  The RFC 3339 full-date cannot be used because it contains
376	       hyphens.  The offset ("Z") is not used because the date is a
377	       publication date (a "floating date").  For more information, see
378	       Section 3.3, Floating Time in [W3C-TimeZones].

380	   c.  Examples:

382	       *  20110623 represents June 23rd, 2011.

384	       *  There are 3 dated versions of the UNGEGN transliteration
385	          specification for Hebrew to Latin.  They can be represented by
386	          the following language tags:

388	          +  und-Hebr-t-und-Latn-m0-ungegn-1972

390	          +  und-Hebr-t-und-Latn-m0-ungegn-1977

392	          +  und-Hebr-t-und-Latn-m0-ungegn-2007

394	       *  Suppose that the BGN transliteration specification for
395	          Cyrillic to Latin had three versions, dated June 11th, 1999;
396	          Dec 30th, 1999; and May 1st, 2011.  In that case, the
397	          corresponding first two DATE subtags would require months to
398	          be distinctive (199906 and 199912), but the last subtag would
399	          only require the year (2011).

401	   d.  Some mechanisms may use a versioning system that is not
402	       distinguished by date, or not by date alone.  In the latter case,
403	       the version will be of a form specified by [UTS35] for that
404	       mechanism.  For example, if the mechanism XXX uses versions of
405	       the form v21a, then a tag could look like "ja-t-it-m0-xxx-v21a".
406	       If there are multiple subversions distinguished by date, then a
407	       tag could look like "ja-t-it-m0-xxx-v21a-2007".
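
The date rules in item b and the BGN example in item c can be sketched
in Python.  The function names below are hypothetical, the input to
check_mechanism_field is assumed to be the lowercase subtags of a single
mechanism field (without the 'm0' separator), and "as short as possible"
is interpreted here as the shortest of YYYY, YYYYMM, or YYYYMMDD that no
other version shares.

```python
import re
from datetime import date


def is_date_subtag(subtag):
    """True if subtag is YYYY, YYYYMM, or YYYYMMDD and names a real date."""
    if not re.fullmatch(r"[0-9]{4}(?:[0-9]{2}(?:[0-9]{2})?)?", subtag):
        return False
    year = int(subtag[0:4])
    month = int(subtag[4:6] or 1)  # a missing month or day defaults to 1
    day = int(subtag[6:8] or 1)
    try:
        date(year, month, day)
    except ValueError:
        return False
    return True


def check_mechanism_field(subtags):
    """Return a list of date-rule violations for the subtags of one field."""
    problems = []
    for index, subtag in enumerate(subtags):
        if not re.fullmatch(r"[0-9]+", subtag):
            continue  # only purely numeric subtags are dates
        if not is_date_subtag(subtag):
            problems.append(f"{subtag}: not of the form YYYY, YYYYMM, or YYYYMMDD")
        if index != len(subtags) - 1:
            problems.append(f"{subtag}: not the final subtag in the field")
        if len(subtags) == 1:
            problems.append(f"{subtag}: the only subtag in the field")
    return problems


def shortest_date_subtags(dates):
    """Pick, for each version date, the shortest distinctive date subtag."""
    full = [f"{d.year:04d}{d.month:02d}{d.day:02d}" for d in dates]
    result = []
    for value in full:
        for length in (4, 6, 8):
            prefix = value[:length]
            if sum(other.startswith(prefix) for other in full) == 1:
                break
        result.append(prefix)
    return result


print(check_mechanism_field(["ungegn", "2007"]))
print(shortest_date_subtags([date(1999, 6, 11), date(1999, 12, 30), date(2011, 5, 1)]))
```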

409	   A language tag with the 't' extension MAY be used to request a
410	   specific transform of content.  In such a case, the recipient SHOULD
411	   return content that corresponds as closely as feasible to the
412	   requested transform, including the specification of the mechanism.
413	   For example, if the request is ja-t-it-m0-xxx-v21a-2007, and the
414	   recipient has content corresponding to both ja-t-it-m0-xxx-v21a and
415	   ja-t-it-m0-xxx-v21b-2009, then the v21a version would be preferred.
416	   As is the case for language matching as discussed in [BCP47],
417	   different implementations MAY have different measures of "closeness".
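
Since the measure of "closeness" is left to implementations, the
following Python sketch shows just one simplistic possibility: counting
the leading subtags that the requested and available tags share.  The
function names are hypothetical, and real matchers would likely weigh
language, script, and mechanism differently.

```python
def shared_prefix_length(requested, available):
    """Count the leading subtags two tags have in common (case-insensitive)."""
    count = 0
    for left, right in zip(requested.lower().split("-"),
                           available.lower().split("-")):
        if left != right:
            break
        count += 1
    return count


def best_match(requested, candidates):
    """Return the candidate sharing the longest subtag prefix with the request."""
    return max(candidates, key=lambda tag: shared_prefix_length(requested, tag))


print(best_match(
    "ja-t-it-m0-xxx-v21a-2007",
    ["ja-t-it-m0-xxx-v21a", "ja-t-it-m0-xxx-v21b-2009"],
))
```

Under this measure the v21a candidate shares six subtags with the
request and the v21b candidate five, so v21a is selected, in line with
the example above.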

419	2.6.  Registration of Field Subtags

421	   Registration of transform mechanisms is requested by filing a ticket
422	   at cldr.unicode.org [2].  The proposal in the ticket MUST contain the
423	   following information:

425	   +-------------+-----------------------------------------------------+
426	   | Item        | Description                                         |
427	   +-------------+-----------------------------------------------------+
428	   | Subtag      | The proposed mechanism subtag (or subtag sequence). |
429	   | Description | A description of the proposed mechanism; that       |
430	   |             | description MUST be sufficient to distinguish it    |
431	   |             | from other mechanisms in use.                       |
432	   | Version     | If versioning for the mechanism is not done         |
433	   |             | according to date, then a description of the        |
434	   |             | versioning conventions used for the mechanism.      |
435	   +-------------+-----------------------------------------------------+

437	   Proposals for clarifications of descriptions or additional aliases
438	   may also be requested by filing a ticket.

440	   The committee MAY define a template for submissions that requests
441	   more information, if it is found that such information would be
442	   useful in evaluating proposals.

444	2.7.  Registration of Additional Fields

446	   In the event that it proves necessary to add an additional field
447	   (such as 'm2'), it can be requested by filing a ticket at
448	   cldr.unicode.org [2].  The proposal in the ticket MUST contain a full
449	   description of the proposed field semantics and subtag syntax, and
450	   MUST conform to the ABNF syntax for "field" presented in
451	   Section 2.2.

453	2.8.  Committee Responses to Registration Proposals

455	   The committee MUST post each proposal publicly within 2 weeks after
456	   reception, to allow for comments.  The committee must respond
457	   publicly to each proposal within 4 weeks after reception.

459	   The response MAY:

461	   o  request more information or clarification

463	   o  accept the proposal, optionally with modifications to the subtag
464	      or description

466	   o  reject the proposal, because of significant objections raised on
467	      the mailing list or due to problems with constraints in this
468	      document or in [UTS35]

470	   Accepted tickets result in a new entry in the machine-readable CLDR
471	   BCP47 data, or in the case of a clarified description, modifications
472	   to the description attribute value for an existing entry.

474	2.9.  Machine-Readable Data

476	   EDITORIAL NOTE: The following parallels the structure used for the
477	   'u' extension [RFC6067], for which the Unicode Consortium is the
478	   maintaining authority.  The data and specification will be available
479	   by the time this Internet-Draft has been approved.  The description
480	   field is in the process of being added to CLDR.

482	   Beginning with CLDR version 1.7.2, machine-readable files are
483	   available listing the data defined for BCP47 extensions for each
484	   successive version of [UTS35].  These releases are listed on
485	   http://cldr.unicode.org/index/downloads.  Each release has an
486	   associated data directory of the form
487	   "http://unicode.org/Public/cldr/<version>", where "<version>" is
488	   replaced by the release number.  For example, for version 1.7.2, the
489	   "core.zip" file is located at
490	   http://unicode.org/Public/cldr/1.7.2/core.zip [3].  The most recent
491	   version is always identified by the version "latest" and can be
492	   accessed by the URL in Section 2.4.

494	   Inside the "core.zip" file, the directory "common/bcp47" contains the
495	   data files listing the valid attributes, keys, and types for each
496	   successive version of [UTS35].  Each data file lists the keys and
497	   types relevant to that topic.  For example, transform.xml contains
498	   the subtags (types) for the 't' mechanisms.
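
A small Python sketch of how a client might locate the data, under the
URL pattern and directory layout described above.  The function names
are hypothetical, nothing is downloaded here, and bcp47_data_files
expects an already obtained core.zip (a path or a file-like object).

```python
import zipfile


def cldr_core_zip_url(version="latest"):
    """Build the core.zip URL for a CLDR release from the pattern above."""
    return f"http://unicode.org/Public/cldr/{version}/core.zip"


def bcp47_data_files(zip_file):
    """List the XML data files under common/bcp47 in a core.zip archive."""
    with zipfile.ZipFile(zip_file) as archive:
        return sorted(
            name for name in archive.namelist()
            if name.startswith("common/bcp47/") and name.endswith(".xml")
        )


print(cldr_core_zip_url("1.7.2"))
```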

500	   The XML structure lists the keys, such as <key extension="t"
501	   name="m0" alias="collation" description="Transliteration extension
502	   mechanism">, with subelements for the types, such as <type
503	   name="ungegn" description="United Nations Group of Experts on
504	   Geographical Names"/>.  The currently defined attributes for the
505	   mechanisms include:

507	   +-------------+-------------------------------+---------------------+
508	   | Attribute   | Description                   | Examples            |
509	   +-------------+-------------------------------+---------------------+
510	   | name        | The name of the mechanism,    | UNGEGN, ALALOC      |
511	   |             | limited to 3-8 characters (or |                     |
512	   |             | sequences of them).           |                     |
513	   | description | A description of the name,    | United Nations      |
514	   |             | with all and only that        | Group of Experts on |
515	   |             | information necessary to      | Geographical Names; |
516	   |             | distinguish one name from     | American Library    |
517	   |             | others with which it might be | Association-Library |
518	   |             | confused.  Descriptions are   | of Congress         |
519	   |             | not intended to provide       |                     |
520	   |             | general background            |                     |
521	   |             | information.                  |                     |
522	   | since       | Indicates the first version   | 1.9, 2.0.1          |
523	   |             | of CLDR where the name        |                     |
524	   |             | appears.  (Required for new   |                     |
525	   |             | items.)                       |                     |
526	   | alias       | Alternative name of the key   |                     |
527	   |             | or type, not limited in       |                     |
528	   |             | number of characters.         |                     |
529	   |             | Aliases are intended for      |                     |
530	   |             | backwards compatibility, not  |                     |
531	   |             | to provide all possible       |                     |
532	   |             | alternate names or            |                     |
533	   |             | designations.  (Optional)     |                     |
534	   +-------------+-------------------------------+---------------------+

536	   The file for the transform extension is "transform.xml".  The initial
537	   version of that file contains the following information.

539	   <key extension="t" name="m0" description=
540	         "Transliteration extension mechanism">
541	      <type name="ungegn" description=
542	         "United Nations Group of Experts on Geographical Names"/>
543	      <type name="alaloc" description=
544	         "American Library Association-Library of Congress"/>
545	      <type name="bgn" description=
546	         "US Board on Geographic Names"/>
547	      <type name="mcst" description=
548	         "Korean Ministry of Culture, Sports and Tourism"/>
549	      <type name="iso" description=
550	         "International Organization for Standardization"/>
551	      <type name="din" description=
552	         "Deutsches Institut fuer Normung"/>
553	      <type name="gost" description=
554	         "Euro-Asian Council for Standardization, Metrology
555	          and Certification"/>
556	   </key>
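
As a rough illustration, a fragment shaped like the listing above (with
the key element written as an open/close pair) can be read with Python's
standard xml.etree.ElementTree module.  The function name
mechanism_types and the shortened FRAGMENT are assumptions for the
example; the complete data file may wrap the key element in further
elements that are not shown in this document.

```python
import xml.etree.ElementTree as ET

# A shortened, hypothetical fragment modeled on the listing above.
FRAGMENT = """<key extension="t" name="m0"
     description="Transliteration extension mechanism">
   <type name="ungegn"
         description="United Nations Group of Experts on Geographical Names"/>
   <type name="bgn" description="US Board on Geographic Names"/>
</key>"""


def mechanism_types(xml_text):
    """Map each type name under the parsed element to its description."""
    root = ET.fromstring(xml_text)
    return {t.get("name"): t.get("description") for t in root.iter("type")}


for name, description in mechanism_types(FRAGMENT).items():
    print(name, "=", description)
```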

558	   To get the version information in XML when working with the data
559	   files, the XML parser must be validating.  When the 'core.zip' file
560	   is unzipped, the 'dtd' directory will be at the same level as the
561	   'bcp47' directory; that is required for correct validation.  For each
562	   release after CLDR 1.8, types introduced in that release are also
563	   marked in the data files by the XML attribute "since", such as in the
564	   following example:
565	   <type name="adp" since="1.9"/>

567	   The data is also currently maintained in a source code repository,
568	   with each release tagged, for viewing directly without unzipping.
569	   For example, see:

571	   o  http://unicode.org/repos/cldr/tags/release-1-7-2/common/bcp47/

573	   o  http://unicode.org/repos/cldr/tags/release-1-8/common/bcp47/

575	   For more information, see
576	   http://cldr.unicode.org/index/bcp47-extension.

578	3.  Acknowledgements

580	   Thanks to John Emmons and the rest of the Unicode CLDR Technical
581	   Committee for their work in developing the BCP 47 subtags for LDML.

583	4.  IANA Considerations

585	   This document will require IANA to insert the record of Section 2.4
586	   into the Language Extensions Registry, according to Section 3.7,
587	   Extensions and the Extensions Registry of "Tags for Identifying
588	   Languages" in [BCP47].  Per Section 5.2 of [BCP47], there might be
589	   occasional (rare) requests by the Unicode Consortium (the "Authority"
590	   listed in the record) for maintenance of this record.  Changes that
591	   can be submitted to IANA without the publication of a new RFC are
592	   limited to modification of the Comments, Contact_Email, Mailing_List,
593	   and URL fields.  Any such requested changes MUST use the domain
594	   'unicode.org' in any new addresses or URIs, MUST explicitly cite this
595	   document (so that IANA can reference these requirements), and MUST
596	   originate from the 'unicode.org' domain.  The domain or authority can
597	   only be changed via a new RFC.

599	   This document does not require IANA to create or maintain a new
600	   registry or otherwise impact IANA.

602	5.  Security Considerations

604	   The security considerations for this extension are the same as those
605	   for [BCP47].  See RFC 5646, Section 6, Security Considerations
606	   [BCP47].

608	6.  References

610	6.1.  Normative References

612	   [BCP47]    Davis, M., Ed. and A. Phillips, Ed., "Tags for Identifying
613	              Languages", BCP 47, RFC 5646, September 2009.

615	   [RFC5234]  Crocker, Ed., "Augmented BNF for Syntax Specifications:
616	              ABNF", 2008.

618	   [RFC6067]  Davis, M., Ed., Phillips, A., Ed., and Y. Umaoka, Ed.,
619	              "BCP 47 Extension U", September 2010.

621	   [UTS35]    Davis, M., "Unicode Technical Standard #35: Locale Data
622	              Markup Language (LDML)", December 2007,
623	              <http://www.unicode.org/reports/tr35/>.

625	6.2.  Informative References

627	   [RFC3339]  Klyne, Ed. and Newman, Ed., "Date and Time on the
628	              Internet: Timestamps", 2002.

630	   [W3C-TimeZones]
631	              Phillips, Ed., "W3C Working Group Note: Working with Time
632	              Zones", July 2011,
633	              <http://www.w3.org/TR/2011/NOTE-timezone-20110705/>.

635	   [ldml-registry]
636	              "Registry for Common Locale Data Repository tag elements",
637	              September 2009.

639	URIs

641	   [1]  <http://unicode.org/reports/tr35/>

643	   [2]  <http://cldr.unicode.org/>

645	   [3]  <http://unicode.org/Public/cldr/1.7.2/>

647	Authors' Addresses

649	   Mark Davis
650	   Google

652	   Email: mark@macchiato.com

654	   Addison Phillips
655	   Lab126

657	   Email: addison@lab126.com

659	   Yoshito Umaoka
660	   IBM

662	   Email: yoshito_umaoka@us.ibm.com

664	   Courtney Falk
665	   Infinite Automata

667	   Email: court@infiauto.com