idnits 2.17.1
draft-bouilland-polystring-02.txt:
Checking boilerplate required by RFC 5378 and the IETF Trust (see
https://trustee.ietf.org/license-info):
----------------------------------------------------------------------------
No issues found here.
Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
----------------------------------------------------------------------------
No issues found here.
Checking nits according to https://www.ietf.org/id-info/checklist :
----------------------------------------------------------------------------
No issues found here.
Miscellaneous warnings:
----------------------------------------------------------------------------
== The copyright year in the IETF Trust and authors Copyright Line does not
match the current year
-- The document date (February 15, 2019) is 1896 days in the past. Is this
intentional?
Checking references for intended status: Experimental
----------------------------------------------------------------------------
== Unused Reference: 'ECMA' is defined on line 280, but no explicit
reference was found in the text
Summary: 0 errors (**), 0 flaws (~~), 2 warnings (==), 1 comment (--).
Run idnits with the --verbose option for more detailed information about
the items above.
--------------------------------------------------------------------------------
1 Network Working Group A. Bouilland
2 Internet-Draft
3 Intended status: Experimental February 15, 2019
4 Expires: August 5, 2019
6 Multi-languages string: Polystring
7 draft-bouilland-polystring-02
9 Abstract
11 Managing multi-language support for a service with autonomous parts
12 can be complex. Having its internal parts be polyglot, and coalesce
13 to the end-user's language only on display, is one solution.
15 This paper discusses a format to store and exchange multi-language
16 strings, and algorithms to consume them, toward this goal.
18 Status of This Memo
20 This Internet-Draft is submitted in full conformance with the
21 provisions of BCP 78 and BCP 79.
23 Internet-Drafts are working documents of the Internet Engineering
24 Task Force (IETF). Note that other groups may also distribute
25 working documents as Internet-Drafts. The list of current Internet-
26 Drafts is at http://datatracker.ietf.org/drafts/current/.
28 Internet-Drafts are draft documents valid for a maximum of six months
29 and may be updated, replaced, or obsoleted by other documents at any
30 time. It is inappropriate to use Internet-Drafts as reference
31 material or to cite them other than as "work in progress."
33 This Internet-Draft will expire on August 5, 2019.
35 Copyright Notice
37 Copyright (c) 2019 IETF Trust and the persons identified as the
38 document authors. All rights reserved.
40 This document is subject to BCP 78 and the IETF Trust's Legal
41 Provisions Relating to IETF Documents
42 (http://trustee.ietf.org/license-info) in effect on the date of
43 publication of this document. Please review these documents
44 carefully, as they describe your rights and restrictions with respect
45 to this document. Code Components extracted from this document must
46 include Simplified BSD License text as described in Section 4.e of
47 the Trust Legal Provisions and are provided without warranty as
48 described in the Simplified BSD License.
50 Table of Contents
52 1. Introduction
53 1.1. Conventions
54 2. Polystring
55 2.1. ABNF grammar
56 2.2. Identifier
57 2.3. String
58 2.4. Base
59 3. Security consideration
60 4. Consumer algorithm
61 5. IANA Considerations
62 6. References
63 6.1. Normative References
64 6.2. Informative References
65 7. Author's Address
67 1. Introduction
69 Managing multi-language support for a service made of different parts
70 (platforms, runtimes, back-ends, and front-ends), each having its own
71 way to achieve this and no standardized way to collaborate with the
72 others, can be complex. Having internal parts be polyglot, and
73 coalesce to the end-user's language only on display, is one solution to
74 this complexity.
76 A common way of storing multiple languages is to split localization
77 into different "packages", splitting strings with the same meaning and
78 formatting apart and away from their context. This makes
79 translation and maintenance efforts harder and more error-prone.
81 To exchange text, one part must also know the end-user's language
82 beforehand, requiring tight collaboration with other parts;
83 for example, a server-side API must know the client's language before
84 sending a proper response or error, or it might use integer codes.
86 This paper presents a format to 1) store multi-language strings,
87 keeping them together in source, and 2) exchange and consume
88 them without requiring prior knowledge of the end-user's locale.
90 1.1. Conventions
92 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
93 NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED",
94 "MAY", and "OPTIONAL" in this document are to be interpreted as
95 described in BCP 14 [RFC2119] [RFC8174] when, and only when, they
96 appear in all capitals, as shown here.
98 The grammatical rules in this document are to be interpreted as
99 described in [RFC5234].
101 2. Polystring
103 The most common string format is the nul-terminated [C] string,
104 compatible with C-like sources and descendants such as [ECMA]Script,
105 [Swift], etc.
107 By extending a (nul-terminated) C-formatted string to carry multiple
108 strings-with-identifiers, it can be multi-language while being stored
109 and exchanged as a single string. The character '\' and the NUL
110 character is used to structure this format, called polystring
111 hereafter.
113 As consumers of this string must implement polystring support,
114 backward compatibility for consumers of non-poly string is not
115 deemed necessary.
117 2.1. ABNF grammar
119 Polystring = *( Identifier String *%x20 ) Base
120 Identifier = *ANYBUT5C %x5C
121 String = *%x01-FF %x00
122 Base = *ANYBUT5C %x00
123 ANYBUT5C = %x01-5B / %x5D-FF
125 This grammar represents the string in memory.
126 Note that it MUST be stored and exchanged as an escaped string.
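As an illustration of the grammar above, the following C sketch checks
whether a byte buffer is a well-formed in-memory polystring. The name
polystring_is_valid and its explicit length parameter are assumptions
of this example, not part of the format; a length is needed because
NUL bytes are part of the structure. In C source, an escaped literal
such as "fr\\Bonjour\0 Hello" yields those bytes, and sizeof gives
their count including the final NUL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not defined by the draft): returns true when the
 * `len` bytes at `buf` form exactly one Polystring per the ABNF above,
 * including the NUL that terminates Base. */
bool polystring_is_valid(const char *buf, size_t len)
{
    size_t pos = 0;

    assert(buf != NULL);
    while (pos < len) {
        /* Every element, Base included, must end with NUL. */
        const char *nul = memchr(buf + pos, '\0', len - pos);
        if (!nul)
            return false;               /* unterminated element */
        size_t end = (size_t)(nul - buf);

        /* No '\' before the NUL: this element is Base and must be last. */
        if (!memchr(buf + pos, '\\', end - pos))
            return end + 1 == len;

        /* Otherwise it is Identifier '\' String NUL; skip trailing spaces. */
        pos = end + 1;
        while (pos < len && buf[pos] == ' ')
            ++pos;
    }
    return false;                       /* no Base after the last element */
}
```

A caller holding a literal in an array, for example
static const char s[] = "fr\\Bonjour\0 Hello", would pass s and
sizeof s.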
128 2.2. Identifier
130 Identifier SHOULD be [ASCII] encoded, MUST NOT contain NUL nor '\',
131 and MUST end with '\'. Although a custom set can be agreed upon by
132 the producer and the consumer, it is recommended for it to be an
133 IETF language [Tag]. It is compared to the consumer's target identifier
134 up to the Identifier's own length, and matches if they are equal; longer
135 Identifiers should therefore appear first or they will be ignored, e.g.
136 "pt-PT" (European Portuguese) before "pt" (here, Brazilian Portuguese).
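The matching rule and the ordering advice above can be sketched in C
as follows. The helper names identifier_matches and first_match are
hypothetical, and Identifiers are passed here without their trailing
'\' for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True when the first id_len bytes of target equal the Identifier.
 * strncmp stops at target's NUL, so a target shorter than the
 * Identifier never matches; a zero-length Identifier always matches. */
bool identifier_matches(const char *id, size_t id_len, const char *target)
{
    assert(id != NULL && target != NULL);
    return strncmp(id, target, id_len) == 0;
}

/* Index of the first Identifier in ids[] that matches target, or -1. */
int first_match(const char *const ids[], size_t count, const char *target)
{
    for (size_t i = 0; i < count; ++i)
        if (identifier_matches(ids[i], strlen(ids[i]), target))
            return (int)i;
    return -1;
}
```

With the order { "pt-PT", "pt" }, a target of "pt-PT" selects index 0
and "pt-BR" selects index 1. With the order { "pt", "pt-PT" }, the
target "pt-PT" already matches "pt" at index 0, so the longer
Identifier is never reached.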
138 2.3. String
140 String SHOULD be [UTF-8] encoded and MUST NOT contain NUL. It is
141 paired with an Identifier, and terminated by NUL. The String of the
142 first matching Identifier is chosen by the consumer for display to
143 the end-user. To improve readability, spaces following a String are
144 ignored.
146 C samples : (both are equivalent)
148 "fr\\Bonjour\0it\\Ciao\0"
149 "fr\\Bonjour\0 it\\Ciao\0 "
151 2.4. Base
153 Base is the default string chosen when no Identifier matches. It
154 MUST NOT contain NUL nor '\'. It is equivalent to a non-poly
155 string, so that a single-language string can be used as-is. It is
156 recommended for Base to be the "en-US" String.
158 C samples : (both are valid polystrings, with the same Base)
160 "Hello"
161 "fr\\Bonjour\0 it\\Ciao\0 Hello"
163 Base cannot contain '\', but it is possible to achieve equivalent
164 functionality by using an all-match (zero-length) Identifier. As
165 it always matches in this case, Base MAY be dropped. It MAY also
166 be used to add an internal identifier or describe the usage context.
168 C samples : (first one drops the unusable Base)
170 "fr\\Avec \\ dedans\0 \\With \\ inside"
171 "fr\\Avec \\ dedans\0 \\With \\ inside\0 #1234"
172 "fr\\Avec \\ dedans\0 \\With \\ inside\0 a sample"
173                       ^
174               zero-length Identifier
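The behaviour of Base and of a zero-length Identifier can be tried
with a small consumer. The sketch below assumes a localize() function
equivalent to the one given in Section 4, written here with strncmp;
the target values used afterwards are arbitrary examples.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the String of the first Identifier that is a prefix of target,
 * or Base when no Identifier matches. Assumes text is a well-formed
 * polystring; malformed input is not detected. */
const char *localize(const char *text, const char *target)
{
    assert(text != NULL && target != NULL);
    for (;;) {
        const char *separator = strchr(text, '\\');
        if (!separator)
            return text;                 /* no Identifier left: Base */

        /* A zero-length Identifier compares 0 bytes and always matches. */
        if (strncmp(text, target, (size_t)(separator - text)) == 0)
            return separator + 1;

        text += strlen(text) + 1;        /* step over String and its NUL */
        while (*text == ' ')
            ++text;                      /* spaces after a String are ignored */
    }
}
```

For the target "it", localize("fr\\Avec \\ dedans\0 \\With \\ inside",
"it") returns the text "With \ inside" through the zero-length
Identifier, and the same holds when "\0 #1234" is appended: that Base
is never reached. For the target "de", localize("fr\\Bonjour\0
it\\Ciao\0 Hello", "de") falls back to the Base "Hello".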
176 Polystring can be used on multiple lines if the storing source
177 supports it.
179 sample for C and ECMAScript :
180 "es\\Con cada lengua que se extingue, se borra una imagen del hombre\0 \
181 fr\\Chaque langue qui s'eteint est une image de l'homme qui s'efface\0\
182 For every language that becomes extinct, an image of man disappears"
184 sample for Swift :
185 """
186 es\\Con cada lengua que se extingue, se borra una imagen del hombre\0 \
187 fr\\Chaque langue qui s'eteint est une image de l'homme qui s'efface\0 \
188 For every language that becomes extinct, an image of man disappears
189 """
191 3. Security consideration
193 This format is intended to be read-only, and to convey text only. It
194 is as safe as a standard string, as long as the formatting is strictly
195 respected.
197 A producer generating polystrings dynamically MUST ensure that
198 no '\' nor NUL creeps into a part that forbids it, as this could break
199 the consumer, leading to crashes in the worst-case scenario.
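One way a producer might enforce this is to validate each part before
writing it. The following C sketch is only an illustration: the
function names, the explicit lengths, and the convention of returning
0 on rejection are all assumptions of this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Identifier (without its trailing '\') and Base: no NUL, no '\'.
 * An explicit length lets an embedded NUL be seen instead of
 * silently truncating the part. */
bool part_has_no_nul_or_backslash(const char *s, size_t len)
{
    return !memchr(s, '\0', len) && !memchr(s, '\\', len);
}

/* String: '\' is allowed, NUL is not. */
bool string_part_is_safe(const char *s, size_t len)
{
    return !memchr(s, '\0', len);
}

/* Writes Identifier '\' String NUL into out (capacity cap). Returns the
 * number of bytes written, or 0 if a part is rejected or does not fit. */
size_t polystring_append(char *out, size_t cap,
                         const char *id, size_t id_len,
                         const char *str, size_t str_len)
{
    assert(out != NULL && id != NULL && str != NULL);

    /* Room is needed for both parts plus the '\' and the NUL. */
    if (cap < 2 || id_len > cap - 2 || str_len > cap - 2 - id_len)
        return 0;
    if (!part_has_no_nul_or_backslash(id, id_len) ||
        !string_part_is_safe(str, str_len))
        return 0;

    memcpy(out, id, id_len);
    out[id_len] = '\\';
    memcpy(out + id_len + 1, str, str_len);
    out[id_len + 1 + str_len] = '\0';
    return id_len + str_len + 2;
}
```

A producer would call polystring_append once per language, advancing
the output pointer by the returned count, and finish by copying a Base
that passes part_has_no_nul_or_backslash, followed by its NUL.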
201 4. Consumer algorithm
203 C sample :
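204 #include <string.h>
205 const char *localize(const char *text, const char *target)
206 {
207     for (;;)
208     {
209         const char *separator = strchr(text, '\\');
210         if (!separator)
211             return text; /* no Identifier left: this is Base */
212         /* strncmp, unlike memcmp, stops at the NUL of a short target */
213         if (!strncmp(text, target, (size_t)(separator - text)))
214             return separator + 1;
215         /* skip to the next element and any spaces before it */
216         text += strlen(text) + 1;
217         while (*text == ' ')
218             ++text;
219     }
220 }
222 // usage (needs <stdio.h> and <stdlib.h>); getenv() may return NULL
223 const char *lang = getenv("LANG");
224 puts(localize("fr\\Bonjour\0 it\\Ciao\0 Hello", lang ? lang : ""));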
226 ECMAScript sample :
228 function localize(text, target) {
229     for (;;) {
230         var sep = text.indexOf('\\')
231         if (sep < 0)
232             return text // no Identifier left: this is Base
233         // a last element without NUL (dropped Base) ends with the text
234         var end = (text + '\0').indexOf('\0')
235         if (text.substring(0, sep) === target.substring(0, sep))
236             return text.substring(sep + 1, end)
237         // skip to the next element and any spaces before it
238         text = text.substring(end + 1)
239         while (text[0] === ' ')
240             text = text.substring(1)
241     }
242 }
244 // usage
245 var lang = navigator.language || navigator.userLanguage
246 alert(localize("fr\\Bonjour\0 it\\Ciao\0 Hello", lang))
248 Other samples can be found at : http://github.com/blld/polystring
250 5. IANA Considerations
252 This document currently has no actions for IANA.
254 6. References
256 6.1. Normative References
258 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
259 Requirement Levels", BCP 14, RFC 2119, March 1997.
261 [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
262 2119 Key Words", BCP 14, RFC 8174, May 2017.
264 [RFC5234] Crocker, D. and P. Overell, "Augmented BNF for Syntax
265 Specifications: ABNF", RFC 5234, January 2008.
267 [C] ISO/IEC 9899:1999, "Programming languages - C", 1999.
269 [ASCII] Cerf, V., "ASCII format for network interchange", RFC 20,
270 October 1969.
272 [Tag] Phillips, A., "Tags for Identifying Languages", BCP 47,
273 RFC 5646, September 2009.
275 [UTF-8] The Unicode Consortium, "The Unicode Standard",
276 .
278 6.2. Informative References
280 [ECMA] European Computer Manufacturers Association, "ECMAScript
281 Language Specification 9th Edition", June 2018,
282 .
285 [Swift] Apple Inc., "About Swift", 2018,
286 .
288 7. Author's Address
290 Aurelien Bouilland
291 email : aurelien.bouilland@gmail.com