idnits 2.17.1

draft-falk-transliteration-tags-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** There is 1 instance of too long lines in the document, the longest one
     being 2 characters in excess of 72.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == The document seems to contain a disclaimer for pre-RFC5378 work, but was
     first submitted on or after 10 November 2008. The disclaimer is usually
     necessary only for documents that revise or obsolete older RFCs, and that
     take significant amounts of text from those RFCs. If you can contact all
     authors of the source material and they are willing to grant the BCP78
     rights to the IETF Trust, you can and should remove the disclaimer.
     Otherwise, the disclaimer is needed and you can ignore this comment.
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (June 13, 2011) is 4700 days in the past. Is this
     intentional?
  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  == Unused Reference: '1' is defined on line 181, but no explicit reference
     was found in the text

  == Unused Reference: '2' is defined on line 184, but no explicit reference
     was found in the text

  == Unused Reference: '3' is defined on line 188, but no explicit reference
     was found in the text

  == Unused Reference: '4' is defined on line 193, but no explicit reference
     was found in the text

  == Unused Reference: '5' is defined on line 199, but no explicit reference
     was found in the text

  == Unused Reference: '6' is defined on line 202, but no explicit reference
     was found in the text

  == Unused Reference: '7' is defined on line 205, but no explicit reference
     was found in the text

     Summary: 1 error (**), 0 flaws (~~), 9 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1   Network Working Group                                            C. Falk
2   Internet Draft                                         Infinite Automata
3   Intended status: Informational                             June 13, 2011
4   Expires: December 2011

6              Tags for the Identification of Transliterated Text
7                    draft-falk-transliteration-tags-01.txt

9   Status of this Memo

11     This Internet-Draft is submitted in full conformance with the
12     provisions of BCP 78 and BCP 79. This document may not be modified,
13     and derivative works of it may not be created, except to publish it
14     as an RFC and to translate it into languages other than English.

16     This document may contain material from IETF Documents or IETF
17     Contributions published or made publicly available before November
18     10, 2008. The person(s) controlling the copyright in some of this
19     material may not have granted the IETF Trust the right to allow
20     modifications of such material outside the IETF Standards Process.
21     Without obtaining an adequate license from the person(s) controlling
22     the copyright in such materials, this document may not be modified
23     outside the IETF Standards Process, and derivative works of it may
24     not be created outside the IETF Standards Process, except to format
25     it for publication as an RFC or to translate it into languages other
26     than English.

28     Internet-Drafts are working documents of the Internet Engineering
29     Task Force (IETF), its areas, and its working groups. Note that
30     other groups may also distribute working documents as Internet-
31     Drafts.

33     Internet-Drafts are draft documents valid for a maximum of six
34     months and may be updated, replaced, or obsoleted by other documents
35     at any time. It is inappropriate to use Internet-Drafts as
36     reference material or to cite them other than as "work in progress."

38     The list of current Internet-Drafts can be accessed at
39     http://www.ietf.org/ietf/1id-abstracts.txt

41     The list of Internet-Draft Shadow Directories can be accessed at
42     http://www.ietf.org/shadow.html

44     This Internet-Draft will expire on December 13, 2011.

46  Copyright Notice

48     Copyright (c) 2011 IETF Trust and the persons identified as the
49     document authors. All rights reserved.

51     This document is subject to BCP 78 and the IETF Trust's Legal
52     Provisions Relating to IETF Documents
53     (http://trustee.ietf.org/license-info) in effect on the date of
54     publication of this document. Please review these documents
55     carefully, as they describe your rights and restrictions with
56     respect to this document.

58  Abstract

60     This document describes the structure, content, creation, and
61     semantics of language tags for use in describing text that was
62     transliterated from one orthographic system to another.

64  Table of Contents

66     1. Introduction...................................................2
67        1.1. Problems Concerning Language Tags.........................2
68        1.2.
Tags for Identifying Languages............................3
69     2. Transliteration Tags...........................................4
70     3. Security Considerations........................................4
71     4. IANA Considerations............................................4
72     5. Conclusions....................................................4
73     6. References.....................................................4
74        6.1. Normative References......................................4
75        6.2. Informative References....................................5
76     7. Acknowledgments................................................5
77     Appendix A. Examples of Transliteration Tags (Informative)........6

79  1. Introduction

81  1.1. Problems Concerning Language Tags

83     Language tags are a common tool used on the Internet. Such tags are
84     useful in content localization and machine translation. Many
85     different standards exist for how to represent language information
86     in machine-readable formats.

88     Existing language tags all suffer from the same problem in that they
89     represent only the language and not the orthography used in writing
90     said language. Many languages such as Russian, Chinese, and Arabic
91     have multiple orthographies for written content. A few languages,
92     including Serbian, are digraphic, which means they are natively
93     written in two or more different scripts.

95     A further complication arises when including the practice of
96     transliteration, or changing orthographies. Most often this is seen
97     when languages written in non-Latin orthographies are rewritten
98     using Latin characters. These orthographies are not mutually
99     intelligible. So to say that two different pieces of text are
100    "Chinese written in Latin script" is not useful if one is
101    transliterated using the Wade-Giles system while the other is using
102    the Pinyin system.

104    The problems a complete language tag must address are:

106    1. Identify the content's language.
107    2.
Identify the language's current orthography.
108    3. Identify the original orthography used if the content was
109       subject to transliteration.
110    4. Identify the system used in the transliteration, if the current
111       content differs from the original.

113    To date no single language tag standard can address all these
114    problems.

116 1.2. Tags for Identifying Languages

118    While there are several existing language tag standards, only a
119    handful of these standards advance us toward the goal of a complete
120    language tag system. Chief among these is the RFC 5646 document as
121    edited by Phillips and Davis. RFC 5646 satisfies the first two
122    criteria of the proposed complete language tag.

124    First, RFC 5646 represents the content's language. This is the
125    very first portion of a BCP 47 language tag. If an alpha-2 code
126    belonging to the ISO 639-1 standard is available then that code is
127    used. If no alpha-2 code is available then the longer alpha-3 code
128    belonging to the ISO 639-3 standard is used.

130    Second, RFC 5646 represents the language's current orthography. This
131    is an optional portion of the BCP 47 tag. Language orthography
132    representation is handled by the alpha-4 codes defined in the ISO
133    15924 standard.

135    What RFC 5646 does not address are the last two transliteration-
136    related criteria for a complete language tag.

138 2. Transliteration Tags

140    While RFC 5646 does have its shortcomings, it provides for future
141    growth and expansion through extension sub-tags. By using these
142    extension sub-tags we can add a second layer of analysis upon the
143    existing RFC 5646 tags to satisfy our transliteration tag criteria.

145    As discussed in Section 1.1, the transliteration tag needs to
146    define two additional pieces of data:

148    1. Original orthography.
149    2. The transliteration system used.

151    There will be a new extension tag for each of these pieces of data:

153    1.
The original source orthography will be denoted by the
154       singleton "s" followed by the ISO 15924 code for the source script.
155    2. The transliteration system will be denoted by the singleton "t"
156       followed by a 2-8 character alphanumeric string abbreviation of
157       the transliteration system.

159 3. Security Considerations

161    The transliteration tag described in this document includes
162    information about the transliteration system used. Some
163    transliteration standards are proprietary, and disclosing their
164    use in a public exchange might constitute a breach of privacy.

166 4. IANA Considerations

168    There are no IANA considerations for this document.

170 5. Conclusions

172    This document shows how the extension mechanisms built into the
173    language tag standard of RFC 5646 can represent written languages
174    more completely by recording any transliteration performed upon
175    the text.

177 6. References

179 6.1. Normative References

181    [1]  Phillips, A. and Davis, M. (Editors), "Tags for Identifying
182         Languages", BCP 47, RFC 5646, September 2009.

184    [2]  International Organization for Standardization, "ISO 639-
185         1:2002. Codes for the representation of names of languages -
186         Part 1: Alpha-2 code", July 2002.

188    [3]  International Organization for Standardization, "ISO 639-
189         3:2007. Codes for the representation of names of languages -
190         Part 3: Alpha-3 code for comprehensive coverage of languages",
191         February 2007.

193    [4]  International Organization for Standardization, "ISO
194         15924:2004. Information and documentation -- Codes for the
195         representation of names of scripts", January 2004.

197 6.2. Informative References

199    [5]  Dale, I.R.H., "Digraphia", International Journal of the
200         Sociology of Language 26 (1980) pp. 5-13.

202    [6]  Buckwalter, T., "Buckwalter Arabic Transliteration", Qamus,
203         2002.

205    [7]  International Organization for Standardization, "ISO 9:1995.
206         Transliteration of Cyrillic characters into Latin characters -
207         Slavic and non-Slavic languages", 1995.

209 7. Acknowledgments

211    Thanks to Tim Buckwalter of the University of Maryland for patiently
212    answering questions about his Arabic transliteration system.

214    This document was prepared using 2-Word-v2.0.template.dot.

216 Appendix A. Examples of Transliteration Tags (Informative)

218    ar-Latn-s-Arab-t-buckwalt (Arabic-language text transliterated from
219    the Arabic script into the Latin script via the Buckwalter
220    transliteration system)

222    ru-Latn-s-Cyrl-t-iso9 (Russian-language text transliterated from the
223    Cyrillic script into the Latin script via the ISO 9 transliteration
224    system)

226    zh-Latn-s-Hans-t-pinyin (Mandarin Chinese-language text
227    transliterated from the simplified Han script into the Latin script
228    via the Pinyin transliteration system)

230 Authors' Addresses

232    Courtney Falk
233    Infinite Automata

235    Email: court@infiauto.com
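
Informative note: the tag structure described in Section 2 and illustrated in Appendix A could be checked mechanically along the following lines. This is a minimal Python sketch under stated assumptions, not part of the proposal. The names TransliterationTag and parse_transliteration_tag are hypothetical, and the pattern accepts only the shape language[-Script]-s-SourceScript-t-system seen in the examples, not the full RFC 5646 grammar (no region, variant, or private-use subtags).

```python
import re
from typing import NamedTuple, Optional

# Assumed shape, taken from the Appendix A examples only:
#   language (2-3 letters), optional 4-letter script,
#   "s" + 4-letter source script, "t" + 2-8 alphanumeric system name.
_TAG_PATTERN = re.compile(
    r"(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"-s-(?P<source_script>[A-Za-z]{4})"
    r"-t-(?P<system>[A-Za-z0-9]{2,8})"
)


class TransliterationTag(NamedTuple):
    language: str            # ISO 639 code, e.g. "ar"
    script: Optional[str]    # current ISO 15924 script, e.g. "Latn"
    source_script: str       # original ISO 15924 script, e.g. "Arab"
    system: str              # transliteration system, e.g. "buckwalt"


def parse_transliteration_tag(tag: str) -> TransliterationTag:
    """Split a tag into its parts, normalizing letter case."""
    match = _TAG_PATTERN.fullmatch(tag)
    if match is None:
        raise ValueError(f"not a transliteration tag: {tag!r}")
    script = match["script"]
    return TransliterationTag(
        language=match["language"].lower(),
        script=script.title() if script else None,
        source_script=match["source_script"].title(),
        system=match["system"].lower(),
    )


if __name__ == "__main__":
    for example in (
        "ar-Latn-s-Arab-t-buckwalt",
        "ru-Latn-s-Cyrl-t-iso9",
        "zh-Latn-s-Hans-t-pinyin",
    ):
        print(example, "->", parse_transliteration_tag(example))
```

The sketch treats the current script as optional because Section 1.2 describes it as an optional portion of the BCP 47 tag, while the "s" and "t" parts are both required here. Whether either may be omitted is not stated in this document, so that choice is an assumption.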