Internet Engineering Task Force                                 M. Davis
Internet-Draft                                                    Google
Intended status: Informational                               A. Phillips
Expires: February 27, 2011                                        Lab126
                                                               Y. Umaoka
                                                                     IBM
                                                         August 26, 2010


                           BCP 47 Extension U
                      draft-davis-u-langtag-ext-04

Abstract

   This document specifies an Extension to BCP 47 which provides subtags
   that specify language and/or locale-based behavior or refinements to
   language tags, according to work done by the Unicode Consortium.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on February 27, 2011.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.

Table of Contents

   1.  Introduction
     1.1.  Requirements Language
   2.  BCP47 Required Information
     2.1.  Summary
       2.1.1.  Canonicalization
     2.2.  Registration Form
   3.  Acknowledgements
   4.  IANA Considerations
   5.  Security Considerations
   6.  References
     6.1.  Normative References
     6.2.  Informative References
   Authors' Addresses

1.  Introduction

   [BCP47] permits the definition and registration of language tag
   extensions "that contain a language component and are compatible with
   applications that understand language tags". This document defines
   an extension for identifying Unicode locale-based variations using
   language tags. The "singleton" identifier for this extension is 'u'.

1.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119.

2.  BCP47 Required Information

   Language tags, as defined by [BCP47], are useful for identifying the
   language of content. They are also used as locale identifiers (or
   can be mapped to locales) in many operating environments and APIs.
   However, many locale identifiers also require additional "tailorings"
   or options for specific values within a language, culture, region, or
   other variation. This extension provides a mechanism for using these
   additional tailorings within language tags for general interchange.

   The Unicode Consortium defines a standardized, structured set of
   locale data and identifiers for locale data in the "Common Locale
   Data Repository" or "CLDR".
   The maintaining authority for the extension defined by this document
   is the Unicode Consortium:

   +---------------+---------------------------------------------------+
   | Item          | Value                                             |
   +---------------+---------------------------------------------------+
   | Name          | Unicode Consortium                                |
   | Contact Email | cldr@unicode.org                                  |
   | Discussion    | cldr-users@unicode.org                            |
   | List Email    |                                                   |
   | URL Location  | cldr.unicode.org                                  |
   | Specification | Unicode Technical Standard #35 Unicode Locale     |
   |               | Data Markup Language (LDML),                      |
   |               | http://unicode.org/reports/tr35/                  |
   | Section       | Section 3 Unicode Language and Locale Identifiers |
   +---------------+---------------------------------------------------+

   The specification of extension subtags is provided by Section 3, Key
   Type Definitions of Unicode Technical Standard #35: Unicode Locale
   Data Markup Language [UTS35]. As required by BCP 47, subtags follow
   the language tag ABNF and other rules for the formation of language
   tags and subtags, are restricted to the ASCII letters and digits, are
   not case sensitive, and do not exceed eight characters in length.
   Note that any "well-formed" language tag (see RFC 5646, Section 2.2.9
   [BCP47]) is also a well-formed locale identifier.

   LDML [UTS35] specifies a canonical representation. LDML is available
   over the Internet and at no cost, and is available via a royalty-free
   license at http://unicode.org/copyright.html. LDML is versioned, and
   each version of LDML is numbered, dated, and stable. Extension
   subtags, once defined by LDML, are never retracted or change in
   meaning in a substantial way.
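
   The formation rules cited in this section (ASCII letters and digits
   only, not case sensitive, at most eight characters) can be checked
   mechanically. The following Python sketch is illustrative only and
   is not part of the extension definition. The function name is
   hypothetical, and the two-character minimum is an assumption taken
   from the extension ABNF in RFC 5646 rather than from this document.

```python
import re

# Assumption: extension subtags are 2 to 8 ASCII letters or digits,
# following the "extension" production in the RFC 5646 ABNF.
_EXTENSION_SUBTAG = re.compile(r"[A-Za-z0-9]{2,8}")


def is_wellformed_extension_subtag(subtag: str) -> bool:
    """Return True if the subtag has the shape allowed after a singleton.

    This checks shape only. It does not check that the subtag is a
    valid attribute, key, or type defined by LDML.
    """
    return _EXTENSION_SUBTAG.fullmatch(subtag) is not None
```

   Because case is not significant, "PhoneBk" and "phonebk" both pass
   this check; validity against the CLDR data files is a separate
   question.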

   The structure of the Unicode locale extension is determined by the
   Unicode CLDR Technical Committee, in accordance with the policies and
   procedures in http://www.unicode.org/consortium/tc-procedures.html,
   and subject to the Unicode Consortium Policies on
   http://www.unicode.org/policies/policies.html.

   Changes that can be made by successive versions of LDML [UTS35] by
   the Unicode Consortium without requiring a new RFC include: the
   allocation of new attributes, keywords, and types; clarifications or
   non-material changes to an existing attribute, keyword, or type; and
   compatible extensions to the overall syntactic structure of
   attributes, keywords, and types. A new RFC would be required for
   material changes to an existing attribute, keyword, or type, or an
   incompatible change to the overall syntactic structure of attributes,
   keywords, and types; however, such a change would be contrary to the
   policies of the Unicode Consortium, and thus is not anticipated.

2.1.  Summary

   The subtags available for use in the 'u' extension consist of a set
   of attributes, keys, and types. Attributes, keys, types, and their
   respective meanings are defined in Section 3 (Unicode Language and
   Locale Identifiers) of [UTS35]. The following is a summary of that
   definition:

   o  An 'attribute' is a subtag with a length of three to eight
      characters following the singleton and preceding any 'keyword'
      sequences. No attributes were defined at the time of this
      document's publication.

   o  A 'keyword' is a sequence of subtags consisting of a 'key' subtag,
      followed by zero or more 'type' subtags (so a 'key' might appear
      alone and not be accompanied by a 'type' subtag). A 'key' MUST
      NOT appear more than once in a language tag's extension string.
      The order of the 'type' subtags within a 'keyword' is sometimes
      significant to their interpretation.

      A.  A 'key' is a subtag with a length of exactly two characters.
          Each 'key' is followed by zero or more 'type' subtags.

      B.  A 'type' is a subtag with a length of three to eight
          characters following a key. 'Type' subtags are specific to a
          particular 'key' and the order of the 'type' subtags MAY be
          significant to the interpretation of the 'keyword'.

   For example, the language tag "de-DE-u-attr-co-phonebk" consists of:

   o  The base language tag "de-DE" (German as used in Germany), exactly
      as defined by [BCP47] using subtags from the IANA Language Subtag
      Registry.

   o  The singleton 'u', identifying this extension.

   o  The attribute 'attr', which is an example for illustration (no
      attributes were defined at the time this document was published).

   o  The keyword 'co-phonebk', consisting of the key 'co' (Collation)
      and the type 'phonebk' (Phonebook collation order).

   Only the first occurrence of an attribute or key conveys meaning in a
   language tag. When interpreting tags containing the Unicode locale
   extension, duplicate attributes or keywords are ignored in the
   following way: ignore any attribute that has already appeared in the
   tag and ignore any keyword whose key has already occurred in the tag.

   Successive versions of [UTS35] could define additional attributes,
   keys, and types. Once defined, attributes, keys, and types will
   never be removed.

   Beginning with CLDR version 1.7.2, machine-readable files are
   available listing the valid attributes, keys, and types for each
   successive version of [UTS35]. These releases are listed on
   http://cldr.unicode.org/index/downloads. Each release has an
   associated data directory of the form
   "http://unicode.org/Public/cldr/", where "" is
   replaced by the release number. For example, for version 1.7.2, the
   "core.zip" file is located at
   http://unicode.org/Public/cldr/1.7.2/core.zip [1].
   Inside the "core.zip" file, the path "common/bcp47" contains the data
   files defining the valid attributes, keys, and types. The most recent
   version is always identified by the version "latest" and can be
   accessed by the URL in Section 2.2.

   To get the version information in XML when working with the data
   files, the XML parser must be validating. When the 'core.zip' file
   is unzipped, the 'dtd' directory will be at the same level as the
   'bcp47' directory; that is required for correct validation. For each
   release after CLDR 1.8, types introduced in that release are also
   marked in the data files by the XML attribute "since", such as in the
   following example:

   The data is also currently maintained in a source code repository,
   with each release tagged, for viewing directly without unzipping.
   For example, see:

   o  http://unicode.org/repos/cldr/tags/release-1-7-2/common/bcp47/

   o  http://unicode.org/repos/cldr/tags/release-1-8/common/bcp47/

   Some data in the CLDR data files might require reference to LDML
   [UTS35]. For specific information, see Appendix Q in that document.
   For example, LDML reserves the type 'codepoints' to define specific
   code point ranges in Unicode for specific purposes.

2.1.1.  Canonicalization

   As required by [BCP47], the use of uppercase or lowercase letters is
   not significant in the subtags used in this extension. The canonical
   form for all subtags in the extension is lowercase. The canonical
   order of attributes is in [US-ASCII] order (that is, numbers before
   letters, with letters sorted as lowercase US-ASCII code points). The
   canonical order of keywords is in [US-ASCII] order by key. The order
   of subtags within a keyword is significant; the meaning of this
   extension is altered if those subtags are rearranged. Thus, the
   canonical form of the extension never reorders the subtags within a
   keyword.

2.2.  Registration Form

   Per RFC 5646, Section 3.7 [BCP47]:

   %%
   Identifier: u
   Description: Unicode Locale
   Comments: Subtags for the identification of language and cultural
     variations. Used to set behavior in locale APIs. Data is
     located in the "common/bcp47" directory inside the referenced
     URL. Unicode Technical Standard #35 (LDML) provides additional
     reference material defining the keys and values.
   Added: 2010-mm-dd
   RFC: [TBD]
   Authority: Unicode Consortium
   Contact_Email: cldr-contact@unicode.org
   Mailing_List: cldr-users@unicode.org
   URL: http://www.unicode.org/Public/cldr/latest/core.zip
   %%

3.  Acknowledgements

   Thanks to John Emmons and the rest of the Unicode CLDR Technical
   Committee for their work in developing the BCP 47 subtags for LDML.

   Thanks also to Doug Ewell, for his many suggestions for improvements
   to this document.

4.  IANA Considerations

   This document will require IANA to insert the record in Section 2.2
   into the Language Extensions Registry, according to Section 3.7,
   Extensions and the Extensions Registry, of "Tags for Identifying
   Languages" in [BCP47]. Per Section 5.2 of [BCP47], there might be
   occasional (rare) requests by the Unicode Consortium (the "Authority"
   listed in the record) for maintenance of this record. Changes that
   can be submitted to IANA without the publication of a new RFC are
   limited to modification of the Comments, Contact_Email, Mailing_List,
   and URL fields. Any such requested changes MUST use the domain
   'unicode.org' in any new addresses or URIs, MUST explicitly cite this
   document (so that IANA can reference these requirements), and MUST
   originate from the 'unicode.org' domain. The domain or authority can
   only be changed via a new RFC.

   This document does not require IANA to create or maintain a new
   registry or otherwise impact IANA.

5.  Security Considerations

   The security considerations for this extension are the same as those
   for [BCP47]. See RFC 5646, Section 6, Security Considerations
   [BCP47].

6.  References

6.1.  Normative References

   [BCP47]    Davis, M., Ed., "Tags for the Identification of Language
              (BCP47)", September 2009.

   [US-ASCII]
              International Organization for Standardization, "ISO/IEC
              646:1991, Information technology -- ISO 7-bit coded
              character set for information interchange.", 1991.

   [UTS35]    Davis, M., "Unicode Technical Standard #35: Locale Data
              Markup Language (LDML)", December 2007.

              Section 3: http://unicode.org/reports/tr35/
              #Unicode_Language_and_Locale_Identifiers

              Appendix Q: http://unicode.org/reports/tr35/
              #Locale_Extension_Key_and_Type_Data

6.2.  Informative References

   [ldml-registry]
              "Registry for Common Locale Data Repository tag elements",
              September 2009.

URIs

   [1]

Authors' Addresses

   Mark Davis
   Google

   Email: mark@macchiato.com


   Addison Phillips
   Lab126

   Email: addison@lab126.com


   Yoshito Umaoka
   IBM

   Email: yoshito_umaoka@us.ibm.com
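
   As a non-normative illustration of Sections 2.1 and 2.1.1, the
   following Python sketch splits the 'u' extension of a language tag
   into attributes and keywords, ignores duplicates as described in
   Section 2.1, and emits the canonical ordering from Section 2.1.1.
   The function names are hypothetical, the input is assumed to be a
   well-formed BCP 47 tag, and no subtag is checked against the CLDR
   data files. The key 'ca' and the types 'buddhist' and 'standard' in
   the usage note are illustrative values not taken from this document.

```python
def parse_u_extension(tag):
    """Return (attributes, keywords) found in the 'u' extension of tag.

    attributes is a list in order of first appearance; keywords maps
    each key to its list of type subtags, in their original order.
    Assumes a well-formed tag; validity against CLDR is not checked.
    """
    subtags = tag.lower().split("-")  # case is not significant
    start = None
    for i, subtag in enumerate(subtags[1:], start=1):
        if subtag == "x":  # everything after 'x' is private use
            break
        if subtag == "u":
            start = i + 1
            break
    if start is None:
        return [], {}

    # The extension runs until the next singleton or the end of the tag.
    ext = []
    for subtag in subtags[start:]:
        if len(subtag) == 1:
            break
        ext.append(subtag)

    attributes, keywords = [], {}
    i = 0
    # Attributes: subtags of three to eight characters before any key.
    while i < len(ext) and len(ext[i]) >= 3:
        if ext[i] not in attributes:  # ignore repeated attributes
            attributes.append(ext[i])
        i += 1
    # Keywords: a two-character key followed by zero or more types.
    while i < len(ext):
        key = ext[i]
        i += 1
        types = []
        while i < len(ext) and len(ext[i]) >= 3:
            types.append(ext[i])
            i += 1
        if key not in keywords:  # only the first occurrence counts
            keywords[key] = types
    return attributes, keywords


def canonical_u_extension(tag):
    """Return the 'u' extension of tag in canonical form, or ''."""
    attributes, keywords = parse_u_extension(tag)
    if not attributes and not keywords:
        return ""
    parts = ["u"] + sorted(attributes)  # ASCII order, digits first
    for key in sorted(keywords):        # keywords ordered by key
        parts.append(key)
        parts.extend(keywords[key])     # type order is never changed
    return "-".join(parts)
```

   Under these assumptions, parse_u_extension("de-DE-u-attr-co-phonebk")
   yields the attribute list ['attr'] and the keyword mapping
   {'co': ['phonebk']}, and the tag "de-DE-u-co-phonebk-ca-buddhist"
   has the canonical extension "u-ca-buddhist-co-phonebk".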