Network Working Group                                       A. Bouilland
Internet-Draft
Intended status: Experimental                          February 15, 2019
Expires: August 5, 2019


                   Multi-languages string: Polystring
                     draft-bouilland-polystring-01

Abstract

   Managing multi-language support for a service made of autonomous
   parts can be complex.  One solution is to keep its internal parts
   polyglot and to coalesce to the end-user's language only at display
   time.  This document describes a format to store and exchange
   multi-language strings, and algorithms to consume them, to this end.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on August 5, 2019.

Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.



Bouilland                Expires August 5, 2019                 [Page 1]

Internet-Draft                 Polystring                  February 2019


Table of Contents

   1.  Introduction
     1.1.  Conventions
   2.  Polystring
     2.1.  ABNF grammar
     2.2.  Identifier
     2.3.  String
     2.4.  Base
   3.  Security consideration
   4.  Consumer algorithm
   5.  IANA Considerations
   6.  References
     6.1.  Normative References
     6.2.  Informative References
   7.  Author's Address

1.  Introduction

   Managing multi-language support for a service made of different
   parts, platforms, runtimes, back-ends, and front-ends, each having
   its own way to achieve it and no standardized way to collaborate with
   the others, can be complex.  Keeping the service's internal parts
   polyglot, and coalescing to the end-user's language only on display,
   is one solution to this complexity.

   A common way of storing multi-language strings is to split
   localization into separate "packages", which moves strings with the
   same meaning and formatting apart from each other and away from
   their context.  This makes translation and maintenance harder and
   more error-prone.

   To exchange text, one part must also know the end-user's language
   beforehand, which requires tight collaboration with other parts: for
   example, a server-side API must know the client's language before it
   can send a properly localized response or error, or else resort to
   integer codes.

   This document presents a format to 1) store multi-language strings
   while keeping them together in source, and 2) exchange and consume
   them without prior knowledge of the end-user's locale.

1.1.  Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

   The grammar rules in this document are to be interpreted as
   described in [RFC5234].

2.  Polystring

   A widely used string representation is the NUL-terminated string of
   [C], written with an escaped literal syntax that C-like languages and
   their descendants, such as [ECMA]Script and [Swift], share.  By
   extending such a string to carry several strings, each with an
   identifier, it can hold multiple languages while still being stored
   and exchanged as a single string.

   The escape sequences "\\" (backslash) and "\0" (NUL) are used to
   structure this format, hereafter called polystring.  Since consumers
   must implement polystring support anyway, backward compatibility
   with consumers of "normal" strings is not deemed necessary.

2.1.  ABNF grammar

     Polystring = *( Identifier String *%x20 ) Base
     Identifier = *ANYBUT5C %x5C
     String     = *%x01-FF %x00
     Base       = *ANYBUT5C %x00
     ANYBUT5C   = %x01-5B / %x5D-FF

   This grammar describes the string after parsing, that is, the bytes
   obtained once the escape sequences of the source literal have been
   resolved.  A polystring MUST nevertheless be stored and exchanged as
   an escaped string.

2.2.  Identifier

   Identifier SHOULD be [ASCII] encoded, MUST NOT contain NUL or '\',
   and MUST end with '\'.  Although a custom set of identifiers can be
   agreed upon by the producer and the consumer, it is recommended to
   use IETF language tags [Tag].

   An Identifier is compared to the consumer's target identifier up to
   the Identifier's own length, and matches if both are equal; for
   example, a target of "fr-CA" matches the Identifier "fr".  As a
   consequence, a longer Identifier should appear before any shorter
   Identifier that is a prefix of it, otherwise it will never match:
   e.g. "pt-PT" (European Portuguese) before "pt" (here, Brazilian
   Portuguese).

2.3.  String

   String SHOULD be [UTF-8] encoded and MUST NOT contain NUL.  It is
   paired with an Identifier and terminated by NUL.
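   The prefix-matching rule of Section 2.2 means that an Identifier can
   be silently shadowed by an earlier one.  The C sketch below, offered
   as an illustration and not as part of this specification, walks a
   polystring the same way as the consumer of Section 4 and reports the
   first unreachable Identifier.  The names "polystring_first_shadowed"
   and "next_segment" are hypothetical, and a well-formed input is
   assumed.  Scanning stops at a zero-length Identifier, because Base
   may be absent after it (Section 2.4).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: skip one Identifier+String segment, its NUL,
   and the spaces that follow, as the Section 4 consumer does. */
static const char *next_segment(const char *text)
{
    text += strlen(text) + 1;
    while (*text == ' ')
        ++text;
    return text;
}

/* Hypothetical lint: returns the 0-based index of the first Identifier
   that can never match because an earlier Identifier is a prefix of
   (or equal to) it, or -1 if none is found before Base or before a
   zero-length Identifier. Assumes a well-formed polystring. */
int polystring_first_shadowed(const char *poly)
{
    int index = 0;
    for (const char *cur = poly;; cur = next_segment(cur), ++index) {
        const char *sep = strchr(cur, '\\');
        if (!sep)
            return -1;              /* reached Base */
        size_t len = (size_t)(sep - cur);

        /* Any target matching cur would first match an earlier prefix. */
        for (const char *prev = poly; prev != cur; prev = next_segment(prev)) {
            size_t plen = (size_t)(strchr(prev, '\\') - prev);
            if (plen <= len && memcmp(prev, cur, plen) == 0)
                return index;
        }

        /* A zero-length Identifier matches everything and Base may be
           dropped after it (Section 2.4), so stop scanning here. */
        if (len == 0)
            return -1;
    }
}
```

   With "pt" listed first, as in
   polystring_first_shadowed("pt\\Portugues\0 pt-PT\\Portugues europeu\0 Portuguese"),
   the sketch returns 1, pointing at the unreachable "pt-PT".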
   The String of the first matching Identifier is chosen by the consumer
   for display to the end-user.  To improve readability, spaces
   following a String's terminating NUL are ignored.

   C samples (both are equivalent):

     "fr\\Bonjour\0it\\Ciao\0"
     "fr\\Bonjour\0 it\\Ciao\0 "

2.4.  Base

   Base is the default string chosen when no Identifier matches.  It
   MUST NOT contain NUL or '\'.  It is equivalent to a normal
   (NUL-terminated) string, so that a single-language string can be
   used as-is.  It is recommended that Base be the "en-US" String.

   C samples (both are valid polystrings, with the same Base):

     "Hello"
     "fr\\Bonjour\0 it\\Ciao\0 Hello"

   Base cannot contain '\', but equivalent functionality can be
   achieved with an all-match (zero-length) Identifier.  As such an
   Identifier always matches, Base MAY then be dropped.  Base MAY also
   be used to hold an internal identifier or to describe the usage
   context.

   C samples (the first one drops the unreachable Base):

     "fr\\Avec \\ dedans\0 \\With \\ inside"
     "fr\\Avec \\ dedans\0 \\With \\ inside\0 #1234"
     "fr\\Avec \\ dedans\0 \\With \\ inside\0 a sample"
                           ^ zero-length Identifier

   A polystring can span multiple lines in source languages that
   support line continuation inside string literals.

   Sample for C and ECMAScript:

     "es\\Con cada lengua que se extingue, se borra una imagen del hombre\0 \
     fr\\Chaque langue qui s'eteint est une image de l'homme qui s'efface\0\
     For every language that becomes extinct, an image of man disappears"

   Sample for Swift:

     """
     es\\Con cada lengua que se extingue, se borra una imagen del hombre\0 \
     fr\\Chaque langue qui s'eteint est une image de l'homme qui s'efface\0 \
     For every language that becomes extinct, an image of man disappears
     """

3.  Security consideration

   This format is intended to be read-only and to convey text only.  It
   is as safe as a standard string, as long as the format is strictly
   respected.
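   Whether a buffer strictly respects the format can be checked before
   it is handed to a consumer.  The C sketch below is one possible
   reading of the grammar of Section 2.1, not a normative algorithm; the
   name "polystring_is_valid" is hypothetical.  It takes the buffer
   length explicitly (for a C literal, its sizeof, which includes the
   implicit final NUL) so that it never reads past the end, and it
   assumes, following Section 2.4, that Base may be omitted only after a
   zero-length Identifier.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical checker: returns true if the len bytes at buf form a
   polystring per the grammar of Section 2.1. len counts every byte,
   including the final NUL. Assumption (Section 2.4): the trailing
   Base may be omitted only after a zero-length Identifier. */
bool polystring_is_valid(const char *buf, size_t len)
{
    size_t i = 0;
    bool all_match_seen = false;

    while (i < len) {
        size_t end = i;             /* index of this segment's NUL */
        size_t sep = len;           /* index of its first '\', len if none */
        while (end < len && buf[end] != '\0') {
            if (buf[end] == '\\' && sep == len)
                sep = end;
            ++end;
        }
        if (end == len)
            return false;           /* segment is not NUL-terminated */
        if (sep == len)
            return end + 1 == len;  /* Base: must be the last segment */
        if (sep == i)
            all_match_seen = true;  /* zero-length Identifier */

        i = end + 1;                /* skip Identifier, String and NUL */
        while (i < len && buf[i] == ' ')
            ++i;                    /* spaces after a String are ignored */
    }
    /* Ran out of bytes right after a String: Base was dropped. */
    return all_match_seen;
}
```

   For instance, polystring_is_valid("fr\\Bonjour", sizeof "fr\\Bonjour")
   returns false: the String is followed by no Base and no zero-length
   Identifier precedes it.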
   When a polystring is generated dynamically, the producer MUST ensure
   that no NUL creeps into any part and that no '\' creeps into an
   Identifier or into Base, as either could break the consumer; in the
   worst case, a consumer such as the ones in Section 4 could read past
   the end of the buffer and crash.

4.  Consumer algorithm

   The following samples return the String of the first Identifier
   matching the target, or Base when none matches.

   C sample:

     #include <stdio.h>
     #include <stdlib.h>
     #include <string.h>

     const char *localize(const char *text, const char *target)
     {
         for (;;) {
             /* The Identifier ends at the first '\'; none means Base. */
             const char *separator = strchr(text, '\\');
             if (!separator)
                 return text;
             /* strncmp() stops at target's NUL, so a target shorter
                than the Identifier is never over-read. */
             if (!strncmp(text, target, (size_t)(separator - text)))
                 return separator + 1;
             /* Skip Identifier, String and NUL, then ignored spaces. */
             text += strlen(text) + 1;
             while (*text == ' ')
                 ++text;
         }
     }

     int main(void)
     {
         const char *lang = getenv("LANG");  /* may be unset */
         puts(localize("fr\\Bonjour\0 it\\Ciao\0 Hello", lang ? lang : ""));
         return 0;
     }

   ECMAScript sample:

     function localize(text, target) {
       target = target || ''
       for (;;) {
         var sep = text.indexOf('\\')
         if (sep < 0) return text               // Base
         var end = text.indexOf('\0')
         if (end < 0) end = text.length         // Base was dropped
         if (text.substring(0, sep) === target.substring(0, sep))
           return text.substring(sep + 1, end)
         text = text.substring(end + 1)
         while (text[0] === ' ') text = text.substring(1)
       }
     }

     // usage
     var lang = navigator.language || navigator.userLanguage
     alert(localize("fr\\Bonjour\0 it\\Ciao\0 Hello", lang))

   Other samples can be found at: http://github.com/blld/polystring

5.  IANA Considerations

   This document currently has no actions for IANA.

6.  References

6.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, May 2017.

   [RFC5234]  Crocker, D. and P. Overell, "Augmented BNF for Syntax
              Specifications: ABNF", RFC 5234, January 2008.

   [C]        ISO/IEC 9899:1999, "Programming languages - C", 1999.

   [ASCII]    Cerf, V., "ASCII format for network interchange", RFC 20,
              October 1969.

   [Tag]      Phillips, A., "Tags for Identifying Languages", BCP 47,
              RFC 5646, September 2009.

   [UTF-8]    The Unicode Consortium, "The Unicode Standard", .

6.2.  Informative References

   [ECMA]     European Computer Manufacturers Association, "ECMAScript
              Language Specification 9th Edition", June 2018, .

   [JSON]     Bray, T., "The JavaScript Object Notation (JSON) Data
              Interchange Format", RFC 8259, December 2017.

   [Swift]    Apple Inc., "About Swift", 2018, .

7.  Author's Address

   Aurelien Bouilland

   Email: aurelien.bouilland@gmail.com