Network Working Group                                       A. Bouilland
Internet-Draft
Intended status: Experimental                          February 15, 2019
Expires: August 5, 2019


                   Multi-languages string: Polystring
                     draft-bouilland-polystring-02

Abstract

   Managing multi-language support for a service made of autonomous
   parts can be complex.  One solution is to keep the internal parts
   polyglot and to coalesce to the end-user's language only on display.

   To that end, this document discusses a format to store and exchange
   multi-language strings, and algorithms to consume them.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on August 5, 2019.

Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Conventions
   2.  Polystring
     2.1.  ABNF grammar
     2.2.  Identifier
     2.3.  String
     2.4.  Base
   3.  Security consideration
   4.  Consumer algorithm
   5.  IANA Considerations
   6.  References
     6.1.  Normative References
     6.2.  Informative References
   7.  Author's Address

1.  Introduction

   A service may be made of different parts: platforms, runtimes,
   back-ends and front-ends.  Each part has its own way to handle
   multi-language support, and there is no standardized way for the
   parts to collaborate.  This makes managing multi-language support
   complex.  One solution to this complexity is to keep the internal
   parts polyglot and to coalesce to the end-user's language only on
   display.

   A common way of storing multi-language text is to split localization
   into separate "packages".  This pulls strings with the same meaning
   and formatting apart from each other and away from their context,
   which makes translation and maintenance harder and more error-prone.

   To exchange text, one part must also know the end-user's language
   beforehand, which requires tight collaboration with the other parts.
   For example, a server-side API must know the client's language
   before it can return a proper response or error message.  Otherwise
   it might resort to integer codes.

   This document presents a format to 1) store multi-language strings
   while keeping all translations together in the source, and 2)
   exchange and consume them without prior knowledge of the end-user's
   locale.

1.1.  Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

   The grammar rules in this document use the Augmented Backus-Naur
   Form (ABNF) notation described in [RFC5234].

2.  Polystring

   A widespread string format is the NUL-terminated string of [C].  Its
   literal syntax is shared by C-like languages and their descendants,
   such as ECMAScript [ECMA] and [Swift].

   A NUL-terminated C string can be extended to carry several strings,
   each tagged with an identifier.  The result holds multiple languages
   while still being stored and exchanged as a single string.  This
   format, called polystring hereafter, is structured using two
   characters: '\' (backslash, %x5C) and NUL (%x00).

   Consumers must implement polystring support anyway, so backward
   compatibility with consumers of plain (non-poly) strings is not
   deemed necessary.

2.1.  ABNF grammar

      Polystring = *( Identifier String *%x20 ) Base
      Identifier = *ANYBUT5C %x5C
      String     = *%x01-FF %x00
      Base       = *ANYBUT5C %x00
      ANYBUT5C   = %x01-5B / %x5D-FF

   This grammar describes the string as laid out in memory.  It MUST
   be stored and exchanged in escaped form, as in the C samples that
   follow, where '\' is written "\\" and NUL is written "\0".

2.2.  Identifier

   An Identifier SHOULD be [ASCII] encoded.  It MUST NOT contain NUL or
   '\', and it MUST be terminated by '\'.  A producer and a consumer
   can agree on a custom set of identifiers, but it is recommended to
   use IETF language tags [Tag].

   The consumer compares each Identifier with its own target identifier
   over the Identifier's length only.  The Identifier matches if these
   leading characters are equal.  In other words, an Identifier matches
   any target it is a prefix of.  The first match wins, so longer
   Identifiers should appear before shorter ones sharing the same
   prefix.  Otherwise they are never selected.  For example, "pt-PT"
   (European Portuguese) must appear before "pt" (here used for
   Brazilian Portuguese).

2.3.  String

   A String SHOULD be [UTF-8] encoded and MUST NOT contain NUL.  Unlike
   an Identifier, it may contain '\'.  Each String is paired with the
   Identifier preceding it and is terminated by NUL.  The consumer
   chooses the String of the first matching Identifier for display to
   the end-user.  To improve readability, spaces (%x20) following a
   String are ignored.

   C samples (both are equivalent):

      "fr\\Bonjour\0it\\Ciao\0"
      "fr\\Bonjour\0 it\\Ciao\0 "

2.4.  Base

   Base is the default string, chosen when no Identifier matches.  It
   MUST NOT contain NUL or '\'.
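   The following C sketch checks a buffer against the grammar of
   Section 2.1.  It is an illustration only.  The function name
   polystring_valid() is hypothetical and not part of this
   specification.

   Because a polystring contains embedded NULs, the sketch assumes the
   caller passes the buffer length including the final NUL, as given
   by sizeof on a string literal.  It also accepts a missing Base only
   after an all-match (zero-length) Identifier, as allowed in Section
   2.4.

```c
#include <stdbool.h>
#include <stddef.h>

// Hypothetical helper, not part of this specification: returns true
// if the first len bytes of buf (len counts the final NUL) form a
// polystring as laid out in memory by the grammar of Section 2.1.
// Assumption: Base may be dropped only after a zero-length Identifier.
bool polystring_valid(const char *buf, size_t len)
{
    size_t i = 0;
    bool all_match = false;          // zero-length Identifier seen

    while (i < len) {
        size_t start = i;

        // Identifier or Base: any byte except NUL and '\'
        while (i < len && buf[i] != '\0' && buf[i] != '\\')
            ++i;
        if (i == len)
            return false;            // neither '\' nor NUL terminator

        if (buf[i] == '\0')          // Base: its NUL must end the buffer
            return i + 1 == len;

        if (i == start)              // '\' right away: all-match
            all_match = true;
        ++i;                         // skip the '\' ending the Identifier

        // String: any byte except NUL ('\' allowed), ended by NUL
        while (i < len && buf[i] != '\0')
            ++i;
        if (i == len)
            return false;            // String without its NUL
        ++i;

        // optional spaces (%x20) before the next entry
        while (i < len && buf[i] == ' ')
            ++i;
    }

    // Buffer ended right after a String: Base was dropped, which is
    // only acceptable if an all-match Identifier guarantees a match.
    return all_match;
}
```

   For example, given static const char s[] = "fr\\Bonjour\0 it\\Ciao\0
   Hello", the call polystring_valid(s, sizeof s) returns true.  The
   same call on "fr\\Bonjour" returns false, because that polystring
   has neither a Base nor an all-match Identifier.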

   Base is equivalent to a plain (non-poly) string, so a
   single-language string can be used as a polystring as-is.  It is
   recommended that Base holds the "en-US" String.

   C samples (both are valid polystrings with the same Base):

      "Hello"
      "fr\\Bonjour\0 it\\Ciao\0 Hello"

   Base cannot contain '\'.  The same result can be achieved with an
   all-match (zero-length) Identifier.  Such an Identifier always
   matches, so the consumer never reaches Base, which MAY then be
   dropped.  If kept, this unreachable Base MAY instead carry an
   internal identifier or describe the usage context.

   C samples (the first one drops the unusable Base):

      "fr\\Avec \\ dedans\0 \\With \\ inside"
      "fr\\Avec \\ dedans\0 \\With \\ inside\0 #1234"
      "fr\\Avec \\ dedans\0 \\With \\ inside\0 a sample"
                            ^
                            zero-length Identifier

   A polystring can span multiple lines when the source language
   supports it.

   Sample for C and ECMAScript:

      "es\\Con cada lengua que se extingue, se borra una imagen del hombre\0 \
      fr\\Chaque langue qui s'eteint est une image de l'homme qui s'efface\0\
      For every language that becomes extinct, an image of man disappears"

   Sample for Swift:

      """
      es\\Con cada lengua que se extingue, se borra una imagen del hombre\0 \
      fr\\Chaque langue qui s'eteint est une image de l'homme qui s'efface\0 \
      For every language that becomes extinct, an image of man disappears
      """

3.  Security consideration

   This format is intended to be read-only and to convey text only.  It
   is as safe as a standard string, as long as the format is strictly
   respected.

   Producers that generate polystrings dynamically MUST ensure that no
   '\' or NUL creeps into an Identifier or a Base, and that no NUL
   creeps into a String.  A misplaced separator can break the consumer.
   It can select the wrong String or, in the worst case, read past the
   end of the buffer and crash.

4.  Consumer algorithm

   C sample:

      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>

      // Returns the String of the first Identifier that is a prefix
      // of target, or Base when no Identifier matches.
      const char *localize(const char *text, const char *target)
      {
          for (;;)
          {
              const char *separator = strchr(text, '\\');
              if (!separator)
                  return text;                    // Base

              // strncmp stops at target's NUL, so a target shorter
              // than the Identifier is never over-read.
              if (!strncmp(text, target, separator - text))
                  return separator + 1;

              text += strlen(text) + 1;           // skip past String
              while (*text == ' ')
                  ++text;
          }
      }

      // usage: LANG may be unset, and uses '_' (e.g. fr_FR.UTF-8)
      const char *lang = getenv("LANG");
      puts(localize("fr\\Bonjour\0 it\\Ciao\0 Hello", lang ? lang : ""));

   ECMAScript sample:

      function localize(text, target) {
        for (;;) {
          var sep = text.indexOf('\\')
          if (sep < 0)
            return text                         // Base

          var end = text.indexOf('\0')
          if (end < 0)                          // JavaScript strings have
            end = text.length                   // no implicit final NUL
          if (text.substring(0, sep) === target.substring(0, sep))
            return text.substring(sep + 1, end)

          text = text.substring(end + 1)        // skip past String
          while (text[0] === ' ')
            text = text.substring(1)
        }
      }

      // usage
      var lang = navigator.language || navigator.userLanguage || ''
      alert(localize("fr\\Bonjour\0 it\\Ciao\0 Hello", lang))

   Other samples can be found at: http://github.com/blld/polystring

5.  IANA Considerations

   This document has no actions for IANA.

6.  References

6.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, May 2017.

   [RFC5234]  Crocker, D. and P. Overell, "Augmented BNF for Syntax
              Specifications: ABNF", RFC 5234, January 2008.

   [C]        ISO/IEC 9899:1999, "Programming languages - C", 1999.

   [ASCII]    Cerf, V., "ASCII format for network interchange", RFC 20,
              October 1969.

   [Tag]      Phillips, A., "Tags for Identifying Languages", BCP 47,
              RFC 5646, September 2009.

   [UTF-8]    The Unicode Consortium, "The Unicode Standard".

6.2.  Informative References

   [ECMA]     Ecma International, "ECMAScript Language Specification",
              ECMA-262, 9th Edition, June 2018.

   [Swift]    Apple Inc., "About Swift", 2018.

7.  Author's Address

   Aurelien Bouilland
   Email: aurelien.bouilland@gmail.com