idnits 2.17.1 

draft-ietf-acap-mlsf-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in
     this document.

     Expected boilerplate is as follows today (2024-04-23) according to
     https://trustee.ietf.org/license-info :

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.a:
        This Internet-Draft is submitted in full conformance with the provisions
        of BCP 78 and BCP 79.

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2:
        Copyright (c) 2024 IETF Trust and the persons identified as the document
        authors.  All rights reserved.

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3:
        This document is subject to BCP 78 and the IETF Trust's Legal Provisions
        Relating to IETF Documents
        (https://trustee.ietf.org/license-info) in effect on the date of
        publication of this document.  Please review these documents
        carefully, as they describe your rights and restrictions with
        respect to this document.  Code Components extracted from this
        document must include Simplified BSD License text as described in
        Section 4.e of the Trust Legal Provisions and are provided
        without warranty as described in the Simplified BSD License.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** Missing expiration date.  The document expiration date should appear on
     the first and last page.

  ** The document seems to lack a 1id_guidelines paragraph about
     Internet-Drafts being working documents. 

  ** The document seems to lack a 1id_guidelines paragraph about 6 months
     document validity. 

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     current Internet-Drafts. 

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     Shadow Directories. 

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.

  ** There are 3 instances of too long lines in the document, the longest one
     being 2 characters in excess of 72.

  ** The abstract seems to contain references ([UTF-8], [IAB-CHARSET]), which
     it shouldn't.  Please replace those with straight textual mentions of the
     documents in question.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The document seems to lack the recommended RFC 2119 boilerplate, even if
     it appears to use RFC 2119 keywords -- however, there's a paragraph with
     a matching beginning. Boilerplate error?

     (The document does seem to have the reference to RFC 2119 which the
     ID-Checklist requires).
  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (June 1997) is 9809 days in the past.  Is this
     intentional?

  -- Found something which looks like a code comment -- if you have code
     sections in the document, please surround them with '<CODE BEGINS>' and
     '<CODE ENDS>' lines.


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: 'UTF-8' is mentioned on line 46, but not defined

  == Missing Reference: 'LANG-TAG' is mentioned on line 122, but not defined

  == Missing Reference: 'MLSF-LANG-TAG' is mentioned on line 153, but not
     defined

  -- Looks like a reference, but probably isn't: '256' on line 487

  -- Looks like a reference, but probably isn't: '1' on line 465

  -- Looks like a reference, but probably isn't: '0' on line 465

  == Unused Reference: 'MIME-IMB' is defined on line 247, but no explicit
     reference was found in the text

  == Unused Reference: 'UTF8' is defined on line 257, but no explicit
     reference was found in the text

  -- No information found for draft-ietf-drums-abnf-xx - is the name correct?

  -- Possible downref: Normative reference to a draft: ref. 'ABNF' 

  ** Downref: Normative reference to an Informational RFC: RFC 1896 (ref.
     'ENRICHED')

  ** Obsolete normative reference: RFC 2070 (ref. 'HTML-I18N') (Obsoleted by
     RFC 2854)

  ** Downref: Normative reference to an Informational RFC: RFC 2130 (ref.
     'IAB-CHARSET')

  ** Obsolete normative reference: RFC 2060 (ref. 'IMAP4') (Obsoleted by RFC
     3501)

  ** Obsolete normative reference: RFC 1766 (ref. 'LANG-TAGS') (Obsoleted by
     RFC 3066, RFC 3282)

  -- Possible downref: Non-RFC (?) normative reference: ref. 'MIME-LANG'

  ** Obsolete normative reference: RFC 2044 (ref. 'UTF8') (Obsoleted by RFC
     2279)


     Summary: 16 errors (**), 0 flaws (~~), 7 warnings (==), 9 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                                          C. Newman
3	Internet Draft: Multi-Lingual String Format                     Innosoft
4	Document: draft-ietf-acap-mlsf-01.txt                          June 1997
5	                                                   Expires in six months

7	                   Multi-Lingual String Format (MLSF)

9	Status of this memo

11	     This document is an Internet Draft.  Internet Drafts are working
12	     documents of the Internet Engineering Task Force (IETF), its Areas,
13	     and its Working Groups.  Note that other groups may also distribute
14	     working documents as Internet Drafts.

16	     Internet Drafts are draft documents valid for a maximum of six
17	     months.  Internet Drafts may be updated, replaced, or obsoleted by
18	     other documents at any time.  It is not appropriate to use Internet
19	     Drafts as reference material or to cite them other than as a
20	     "working draft" or "work in progress".

22	     To learn the current status of any Internet-Draft, please check the
23	     1id-abstracts.txt listing contained in the Internet-Drafts Shadow
24	     Directories on ds.internic.net, nic.nordu.net, ftp.isi.edu, or
25	     munnari.oz.au.

27	     A revised version of this draft document will be submitted to the
28	     RFC editor as a Proposed Standard for the Internet Community.
29	     Discussion and suggestions for improvement are requested.  This
30	     document will expire six months after publication.  Distribution of
31	     this draft is unlimited.

33	Abstract

35	     The IAB charset workshop [IAB-CHARSET] concluded that for human
36	     readable text there should always be a way to specify the natural
37	     language.  Many protocols are designed with an attribute-value
38	     model (including RFC 822, HTTP, LDAP, SNMP, DHCP, and ACAP) which
39	     stores many small human readable text strings.  The primary
40	     function of an attribute-value model is to simplify both
41	     extensibility and searchability.  A solution is needed to provide
42	     language tags in these small human readable text strings, which
43	     does not interfere with these primary functions.

45	     This specification defines MLSF (Multi-Lingual String Format) which
46	     applies another layer of encoding on top of UTF-8 [UTF-8] to permit
47	     the addition of language tags anywhere within a text string.  In
48	     addition, it defines an alternate form which can be used to include
49	     alternative representations of the same text in different character
50	     sets.  MLSF has the property that UTF-8 is a proper subset of MLSF.
51	     This preserves the searchability requirement of the attribute-value
52	     model.

54	     Appendix F of this document includes a brief discussion of the
55	     background behind MLSF and why some other potential solutions were
56	     rejected for this purpose.

58	1. Conventions used in this document

60	     The key words "MUST", "MUST NOT", "SHOULD", "SHOULD NOT", and "MAY"
61	     in this document are to be interpreted as defined in "Key words for
62	     use in RFCs to Indicate Requirement Levels" [KEYWORDS].

64	2. MLSF simple form

66	     MLSF uses "Tags for the Identification of Languages" [LANG-TAGS] as
67	     the basis for language identification.

69	     Language tags are encoded by mapping them to upper-case, then
70	     adding hexadecimal A0 to each octet.  The result is broken up into
71	     groups of five octets followed by a final group of five or fewer
72	     octets.  Each group is prefixed by a UTF-8-style length count with
73	     the low bits set to 0.  See Appendix D for sample source code to
74	     perform this conversion.
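
     As an illustration of the steps above (not part of the
     specification), the tag "en-US" becomes "EN-US", adding A0 to
     each octet gives E5 EE CD F5 F3, and the single five-octet group
     takes the prefix FC, yielding FC E5 EE CD F5 F3.  The C sketch
     below performs the same steps.  The name example_encode_tag and
     its buffer-size check are assumptions made for this example; they
     differ from the sample code in Appendix D.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of the draft: encode a language tag
 * into out[].  Returns the number of octets written, or 0 if out[]
 * is too small.  No NUL terminator is appended.
 */
size_t example_encode_tag(const char *tag, unsigned char *out,
                          size_t outsize)
{
    /* length prefixes for groups of 1..5 octets, low bits set to 0 */
    static const unsigned char prefix[5] =
        { 0xC0, 0xE0, 0xF0, 0xF8, 0xFC };
    size_t left, group, i, n = 0;

    assert(tag != NULL && out != NULL);
    left = strlen(tag);
    while (left > 0) {
        /* groups of five octets, then a final group of five or fewer */
        group = left >= 5 ? 5 : left;
        if (n + 1 + group > outsize) return 0;
        out[n++] = prefix[group - 1];
        for (i = 0; i < group; ++i) {
            /* map to upper case, then add hexadecimal A0 */
            out[n++] = (unsigned char)
                (toupper((unsigned char) *tag++) + 0xA0);
        }
        left -= group;
    }
    return n;
}
```

     With a 16-octet buffer, example_encode_tag("en", buf, 16) writes
     E0 E5 EE and returns 3.  An empty tag also returns 0 in this
     sketch, so a caller that needs to tell the two cases apart would
     have to check for the empty tag first.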

76	     MLSF simple form is defined by the MLSF-SIMPLE rule in section 6.
77	     A quoted version of MLSF simple form is defined by the MLSF-
78	     SIMPLE-QUOTED rule.

80	     Note that MLSF is not compatible with UTF-8.  A program which uses
81	     MLSF MUST downconvert it to UTF-8 prior to using it in a context
82	     where UTF-8 is required.  Sample code for this down conversion is
83	     included in Appendix B.

85	3. MLSF alternative form

87	     An MLSF alternative form string may contain alternative
88	     representations of the same text in different primary languages.
89	     The octet with hexadecimal representation of FE is used to
90	     introduce a new alternative.  This MUST be followed by an MLSF
91	     language tag for the primary language of the alternative.

93	     The component of the MLSF string prior to the first FE octet is
94	     considered the "preferred" representation for the string.  This is
95	     the version which will be displayed by MLSF clients which choose
96	     not to support alternative representations.  The preferred
97	     representation MAY be prefixed by an MLSF language tag.

99	     MLSF alternate form is defined by the MLSF-ALT rule in section 6.
100	     A quoted version of MLSF alternate form is defined by the
101	     MLSF-ALT-QUOTED rule.

103	     Note that MLSF alternate form is not compatible with UTF-8.  A
104	     program which uses MLSF MUST downconvert it to UTF-8 prior to using
105	     it in a context where UTF-8 is required.  Sample code for this down
106	     conversion is included in Appendix B.
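
     According to the grammar in section 6, the FE octet occurs
     neither inside UTF-8 text nor inside an encoded language tag, so
     alternatives can be located with a plain octet scan.  The
     following is a minimal sketch under that assumption for a NUL
     terminated string; example_alternative is a hypothetical name
     introduced for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: return a pointer to alternative number
 * `index` in a NUL terminated MLSF string, where 0 is the preferred
 * representation.  Returns NULL if the string has fewer
 * alternatives.  The result points at the language tag (if any)
 * that begins the alternative.
 */
const unsigned char *example_alternative(const unsigned char *s,
                                         size_t index)
{
    assert(s != NULL);
    while (index > 0) {
        /* scan forward to the next FE introducer */
        while (*s != '\0' && *s != 0xFE) {
            ++s;
        }
        if (*s == '\0') return NULL;
        ++s;            /* step over the FE octet */
        --index;
    }
    return s;
}
```

     For the octets E0 E5 EE "Hello" FE E0 E6 F2 "Bonjour" (an "EN"
     preferred text with an "FR" alternative), index 0 returns the
     start of the string and index 1 returns the position just after
     the FE octet.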

108	4. MLSF MIME character sets

110	     The character set label "XXXX-simple" will be registered to
111	     indicate the use of MLSF simple form.  The character set label
112	     "XXXX-alt" will be registered to indicate the use of MLSF alternate
113	     form.

115	     MLSF may be used in conjunction with MIME header [MIME-HDR]
116	     encoding to permit language tagging and alternative representations
117	     in header fields.  A work in progress [MIME-LANG] will propose a
118	     mechanism for language tagging in headers which is not dependent on
119	     the use of UTF-8.

121	     For single language MIME body parts, the UTF-8 character set with
122	     an appropriate Content-Language [LANG-TAG] header SHOULD be used
123	     instead of MLSF.  Text/enriched [ENRICHED] or HTML with language
124	     tags [HTML-I18N] are preferred to using MLSF for MIME bodies when
125	     possible.

127	5. Security Considerations

129	     Multi-Lingual String Format is not believed to have any security
130	     considerations beyond those for simple US-ASCII strings.  In
131	     particular, unfiltered display of certain US-ASCII control
132	     characters by a terminal emulator may result in modifying the
133	     behavior of the terminal emulator (e.g. by redefining function
134	     keys) such that security can be breached.  Programs which display
135	     text to a potentially insecure terminal emulator channel are
136	     encouraged to remove control characters to avoid these problems.
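
     One way to follow this advice is sketched below.  It is an
     assumption of this example, not a requirement of the draft, that
     every US-ASCII control character (hexadecimal 01 through 1F, and
     7F) is dropped, including TAB, CR and LF; a real program may
     prefer to keep some of them.  The name example_strip_controls is
     hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical filter: remove US-ASCII control characters in place
 * from a NUL terminated string and return the new length.  Octets
 * with the high bit set (UTF-8 and MLSF data) are copied unchanged.
 */
size_t example_strip_controls(unsigned char *s)
{
    unsigned char *dst = s;
    const unsigned char *src = s;

    assert(s != NULL);
    for (; *src != '\0'; ++src) {
        if (*src < 0x20 || *src == 0x7F) {
            continue;   /* drop control character */
        }
        *dst++ = *src;
    }
    *dst = '\0';
    return (size_t) (dst - s);
}
```

     Applied to a string holding "ok", an ESC octet, "[2Jtext" and a
     trailing LF, the sketch leaves "ok[2Jtext" and returns 9.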

138	6. Formal Grammar

140	     This section defines the formal grammar for MLSF using Augmented
141	     BNF [ABNF] notation.

143	     MLSF-ALT           = [[MLSF-LANG-TAG] MLSF-COMPONENT
144	                           *(MLSF-ALTERNATE MLSF-COMPONENT)]

146	     MLSF-ALT-QUOTED    = <"> [[MLSF-LANG-TAG] MLSF-COMPONENT-Q
147	                           *(MLSF-ALTERNATE MLSF-COMPONENT-Q)] <">

149	     MLSF-ALTERNATE     = %xFE MLSF-LANG-TAG

151	     MLSF-COMPONENT     = UTF8-NON-NUL *([MLSF-LANG-TAG] UTF8-NON-NUL)

153	     MLSF-COMPONENT-Q   = UTF8-QUOTED *([MLSF-LANG-TAG] UTF8-QUOTED)

155	     MLSF-LANG-TAG      = *MLSF-LANG-5 (MLSF-LANG-1 / MLSF-LANG-2 /
156	                          MLSF-LANG-3 / MLSF-LANG-4 / MLSF-LANG-5)
157	                          ;; Encoded version of Language-Tag from RFC 1766
158	                          ;; characters converted to uppercase, with
159	                          ;; A0 added and broken into MLSF-LANG components

161	     MLSF-LANG-CONT     = %xCD / %xE1..FA

163	     MLSF-LANG-1        = %xC0 MLSF-LANG-CONT

165	     MLSF-LANG-2        = %xE0 2MLSF-LANG-CONT

167	     MLSF-LANG-3        = %xF0 3MLSF-LANG-CONT

169	     MLSF-LANG-4        = %xF8 4MLSF-LANG-CONT

171	     MLSF-LANG-5        = %xFC 5MLSF-LANG-CONT

173	     MLSF-SIMPLE        = [[MLSF-LANG-TAG] MLSF-COMPONENT]

175	     MLSF-SIMPLE-QUOTED = <"> [[MLSF-LANG-TAG] MLSF-COMPONENT-Q] <">

177	     QUOTED             = "\" QUOTED-SPECIAL

179	     QUOTED-SPECIAL     = "\" / <">

181	     US-ASCII-SAFE      = %x01..09 / %x0B..0C / %x0E..21
182	                          / %x23..5B / %x5D..7F
183	                         ;; US-ASCII except QUOTED-SPECIALs, CR, LF, NUL

185	     UTF8-NON-NUL       = UTF8-SAFE / CR / LF / QUOTED-SPECIAL
186	     UTF8-QUOTED        = UTF8-SAFE / QUOTED

188	     UTF8-SAFE          = US-ASCII-SAFE / UTF8-1 / UTF8-2 / UTF8-3
189	                          / UTF8-4 / UTF8-5

191	     UTF8-CONT          = %x80..BF

193	     UTF8-1             = %xC0..DF UTF8-CONT

195	     UTF8-2             = %xE0..EF 2UTF8-CONT

197	     UTF8-3             = %xF0..F7 3UTF8-CONT

199	     UTF8-4             = %xF8..FB 4UTF8-CONT

201	     UTF8-5             = %xFC..FD 5UTF8-CONT
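
     The MLSF-LANG-TAG rules above can be checked octet by octet.  The
     sketch below returns the length of a well-formed tag at the start
     of a NUL terminated string.  It relies on the MLSF-LANG-CONT
     octets (CD, E1..FA) not overlapping UTF8-CONT (80..BF), which is
     what distinguishes a tag group from an ordinary UTF-8 character
     with the same lead octet.  All function names here are
     hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* MLSF-LANG-CONT = %xCD / %xE1..FA */
static int is_lang_cont(unsigned char c)
{
    return c == 0xCD || (c >= 0xE1 && c <= 0xFA);
}

/* number of MLSF-LANG-CONT octets announced by a lead octet, or 0 */
static size_t group_size(unsigned char lead)
{
    switch (lead) {
    case 0xC0: return 1;
    case 0xE0: return 2;
    case 0xF0: return 3;
    case 0xF8: return 4;
    case 0xFC: return 5;
    default:   return 0;
    }
}

/* Hypothetical checker: return the number of octets occupied by the
 * MLSF-LANG-TAG at the start of s, or 0 if s does not begin with a
 * well-formed tag.
 */
size_t example_lang_tag_len(const unsigned char *s)
{
    size_t n = 0, count, i;

    assert(s != NULL);
    for (;;) {
        count = group_size(s[n]);
        if (count == 0) return 0;
        for (i = 1; i <= count; ++i) {
            if (!is_lang_cont(s[n + i])) return 0;
        }
        n += 1 + count;
        /* only a full five-octet group may be followed by another */
        if (count < 5) return n;
        if (group_size(s[n]) == 0 || !is_lang_cont(s[n + 1])) {
            return n;
        }
    }
}
```

     For example, the octets E0 E5 EE followed by "Hello" give 3,
     while plain "Hello" and the UTF-8 sequence C3 A9 both give 0.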

203	7. References

205	     [ABNF] Crocker, D., "Augmented BNF for Syntax Specifications:
206	     ABNF", Work in progress: draft-ietf-drums-abnf-xx.txt

208	     [ENRICHED] Resnick, Walker, "The text/enriched MIME Content-type",
209	     RFC 1896, Qualcomm, InterCon, February 1996.

211	         <ftp://ds.internic.net/rfc/rfc1896.txt>

213	     [HTML-I18N] Yergeau, Nicol, Adams, Duerst, "Internationalization of
214	     the Hypertext Markup Language", RFC 2070,  Alis Technologies,
215	     Electronic Book Technologies, Spyglass, University of Zurich,
216	     January 1997.

218	         <ftp://ds.internic.net/rfc/rfc2070.txt>

220	     [IAB-CHARSET] Weider, Preston, Simonsen, Alvestrand, Atkinson,
221	     Crispin, Svanberg, "The Report of the IAB Character Set Workshop
222	     held 29 February - 1 March, 1996", RFC 2130, April 1997.

224	         <ftp://ds.internic.net/rfc/rfc2130.txt>

226	     [IMAP4] Crispin, "Internet Message Access Protocol - Version
227	     4rev1", RFC 2060, University of Washington, December 1996.

229	         <ftp://ds.internic.net/rfc/rfc2060.txt>

231	     [KEYWORDS] Bradner, "Key words for use in RFCs to Indicate
232	     Requirement Levels", RFC 2119, Harvard University, March 1997.

234	         <ftp://ds.internic.net/rfc/rfc2119.txt>

236	     [LANG-TAGS] Alvestrand, H., "Tags for the Identification of
237	     Languages", RFC 1766.

239	         <ftp://ds.internic.net/rfc/rfc1766.txt>

241	     [MIME-HDR] Moore, "MIME (Multipurpose Internet Mail Extensions)
242	     Part Three: Message Header Extensions for Non-ASCII Text", RFC
243	     2047, University of Tennessee, November 1996.

245	         <ftp://ds.internic.net/rfc/rfc2047.txt>

247	     [MIME-IMB] Freed, Borenstein, "Multipurpose Internet Mail
248	     Extensions (MIME) Part One: Format of Internet Message Bodies", RFC
249	     2045, Innosoft, First Virtual, November 1996.

251	         <ftp://ds.internic.net/rfc/rfc2045.txt>

253	     [MIME-LANG] Freed, Moore, "MIME Parameter Value and Encoded Words:
254	     Character Sets, Language, and Continuations", work in progress,
255	     March 1997.

257	     [UTF8] Yergeau, F. "UTF-8, a transformation format of Unicode and
258	     ISO 10646", RFC 2044, Alis Technologies, October 1996.

260	         <ftp://ds.internic.net/rfc/rfc2044.txt>

262	8. Acknowledgements

264	     Special thanks to Mark Crispin for the idea of using unused UTF-8
265	     codes for this purpose.   Thanks are also due to participants of
266	     the ACAP WG mailing list who helped review this proposal.

268	9. Author's Address

270	     Chris Newman
271	     Innosoft International, Inc.
272	     1050 East Garvey Ave. South
273	     West Covina, CA 91790 USA

275	     Email: chris.newman@innosoft.com

277	Appendix A.  Client advice

279	     A simple UTF-8 client is likely to find the source code in Appendix
280	     B useful.  A simple Latin-1 based client is likely to find the
281	     source code in Appendix C useful.

283	     A more sophisticated client will allow the user to select a
284	     preferred language and use something like the source code in
285	     Appendix E to find the best alternative in an MLSF string.  Such
286	     clients should also be aware that sometimes the client's preferred
287	     language is misconfigured, and the user may wish to have the last
288	     few messages repeated after they have changed languages.  For this
289	     reason, such a client may wish to cache the last few MLSF strings
290	     displayed to the user.
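
     The caching suggestion above could be realized with a small ring
     buffer.  The sketch below is one possible shape, not something
     the draft prescribes: the slot count, the maximum string length
     and all names are assumptions chosen for the example, and the
     structure must be zero-initialized before first use.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define EXAMPLE_CACHE_SLOTS  4      /* assumed meaning of "last few" */
#define EXAMPLE_CACHE_MAXLEN 256    /* assumed bound, including NUL */

struct example_cache {
    unsigned char text[EXAMPLE_CACHE_SLOTS][EXAMPLE_CACHE_MAXLEN];
    size_t next;        /* slot the next string will overwrite */
    size_t count;       /* number of slots holding a string */
};

/* store a NUL terminated MLSF string, overwriting the oldest entry;
 * returns 0 on success, -1 if the string is too long to cache
 */
int example_cache_put(struct example_cache *c,
                      const unsigned char *mlsf)
{
    size_t len;

    assert(c != NULL && mlsf != NULL);
    len = strlen((const char *) mlsf);
    if (len >= EXAMPLE_CACHE_MAXLEN) return -1;
    memcpy(c->text[c->next], mlsf, len + 1);
    c->next = (c->next + 1) % EXAMPLE_CACHE_SLOTS;
    if (c->count < EXAMPLE_CACHE_SLOTS) ++c->count;
    return 0;
}

/* fetch a cached string; age 0 is the most recent.  Returns NULL if
 * fewer than age + 1 strings have been stored.
 */
const unsigned char *example_cache_get(const struct example_cache *c,
                                       size_t age)
{
    assert(c != NULL);
    if (age >= c->count) return NULL;
    return c->text[(c->next + EXAMPLE_CACHE_SLOTS - 1 - age)
                   % EXAMPLE_CACHE_SLOTS];
}
```

     After the user changes languages, a client could walk the cache
     from the oldest entry to age 0 and redisplay each string with the
     newly selected alternative.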

292	Appendix B.  Sample code to convert to UTF-8

294	Here is sample C source code to convert from MLSF to UTF-8.

296	#include <stdio.h>
297	#include <ctype.h>
298	#include <string.h>

299	/* a UTF8 lookup table */
300	#define BAD 0x80
301	#define SEP 0x40
302	#define EXT 0x20
303	static unsigned char utlen[256] = {
304	        /* 0x00 */ BAD,   1,   1,   1,   1,   1,   1,   1,
305	        /* 0x08 */   1,   1,   1,   1,   1,   1,   1,   1,
306	        /* 0x10 */   1,   1,   1,   1,   1,   1,   1,   1,
307	        /* 0x18 */   1,   1,   1,   1,   1,   1,   1,   1,
308	        /* 0x20 */   1,   1,   1,   1,   1,   1,   1,   1,
309	        /* 0x28 */   1,   1,   1,   1,   1,   1,   1,   1,
310	        /* 0x30 */   1,   1,   1,   1,   1,   1,   1,   1,
311	        /* 0x38 */   1,   1,   1,   1,   1,   1,   1,   1,
312	        /* 0x40 */   1,   1,   1,   1,   1,   1,   1,   1,
313	        /* 0x48 */   1,   1,   1,   1,   1,   1,   1,   1,
314	        /* 0x50 */   1,   1,   1,   1,   1,   1,   1,   1,
315	        /* 0x58 */   1,   1,   1,   1,   1,   1,   1,   1,
316	        /* 0x60 */   1,   1,   1,   1,   1,   1,   1,   1,
317	        /* 0x68 */   1,   1,   1,   1,   1,   1,   1,   1,
318	        /* 0x70 */   1,   1,   1,   1,   1,   1,   1,   1,
319	        /* 0x78 */   1,   1,   1,   1,   1,   1,   1,   1,
320	        /* 0x80 */ EXT, EXT, EXT, EXT, EXT, EXT, EXT, EXT,
321	        /* 0x88 */ EXT, EXT, EXT, EXT, EXT, EXT, EXT, EXT,
322	        /* 0x90 */ EXT, EXT, EXT, EXT, EXT, EXT, EXT, EXT,
323	        /* 0x98 */ EXT, EXT, EXT, EXT, EXT, EXT, EXT, EXT,
324	        /* 0xA0 */ EXT, EXT, EXT, EXT, EXT, EXT, EXT, EXT,
325	        /* 0xA8 */ EXT, EXT, EXT, EXT, EXT, EXT, EXT, EXT,
326	        /* 0xB0 */ EXT, EXT, EXT, EXT, EXT, EXT, EXT, EXT,
327	        /* 0xB8 */ EXT, EXT, EXT, EXT, EXT, EXT, EXT, EXT,
328	        /* 0xC0 */   2,   2,   2,   2,   2,   2,   2,   2,
329	        /* 0xC8 */   2,   2,   2,   2,   2,   2,   2,   2,
330	        /* 0xD0 */   2,   2,   2,   2,   2,   2,   2,   2,
331	        /* 0xD8 */   2,   2,   2,   2,   2,   2,   2,   2,
332	        /* 0xE0 */   3,   3,   3,   3,   3,   3,   3,   3,
333	        /* 0xE8 */   3,   3,   3,   3,   3,   3,   3,   3,
334	        /* 0xF0 */   4,   4,   4,   4,   4,   4,   4,   4,
335	        /* 0xF8 */   5,   5,   5,   5,   6,   6, SEP, BAD
336	};
337	/* Down conversion from NUL terminated MLSF string to UTF-8.
338	 *  this strips the language tags and only keeps the preferred
339	 *  representation.
340	 * It returns the length of the final string.
341	 * The destination string will not be longer than the source string.
342	 *  dst and src may be the same for in-place conversion.
343	 */
344	int MLSFtoUTF8(unsigned char *dst, unsigned char *src)
345	{
346	    unsigned char *start = dst;
347	    int len;

349	    for (;;) {
350	        len = utlen[*src];
351	        if (len > 6) break;
352	        /* skip language tags */
353	        if (len > 1 && src[1] > 0xC0U) {
354	            while (len && *src != '\0') {
355	                ++src;
356	                --len;
357	            }
358	            continue;
359	        }
360	        /* copy UTF8 character */
361	        while (len && *src != '\0') {
362	            *dst = *src;
363	            ++dst;
364	            ++src;
365	            --len;
366	        }
367	    }
368	    *dst = '\0';

370	    return (dst - start);
371	}
372	Appendix C. Sample code to convert to Latin-1

374	/* Down conversion from NUL terminated MLSF string to 8859-1
375	 * The destination string will not be longer than the source string.
376	 *  fillc is used to fill untranslatable characters,
377	 *  if fillc is NUL, untranslatable characters are ignored.
378	 * returns 0 if source only contained latin-1, returns -1 otherwise.
379	 */
380	int MLSFtoLatin1(unsigned char *dst, unsigned char *src, int fillc)
381	{
382	    int len, result = 0;

384	    for (;;) {
385	        len = utlen[*src];
386	        /* copy US-ASCII */
387	        if (len == 1) {
388	            *dst = *src;
389	            ++dst;
390	            ++src;
391	            continue;
392	        }
393	        /* stop at illegal character or end of string */
394	        if (len > 6) break;
395	        /* skip non-latin1 glyphs and language tags */
396	        if (*src > 0xC3U || src[1] > 0xC0U) {
397	            if (src[1] <= 0xC0U) {
398	                /* non-latin1 glyph found */
399	                result = -1;
400	                if (fillc) {
401	                    *dst = fillc;
402	                    ++dst;
403	                }
404	            }
405	            while (len && *src != '\0') {
406	                ++src;
407	                --len;
408	            }
409	            continue;
410	        }
411	        /* copy latin 1 character */
412	        *dst = ((src[0] & 0x03) << 6) | (src[1] & 0x3F);
413	        ++dst;
414	        src += 2;
415	    }
416	    *dst = '\0';

418	    return (result);
419	}
420	Appendix D. Sample code for encoding/decoding language tags

422	/* encode a language tag
423	 *  the destination size (counting the terminating NUL) must be at least:
424	 *        (6 * strlen(src) + 9) / 5
425	 *  returns the length of the destination.
426	 */
427	int MLSFlangencode(unsigned char *dst, unsigned char *src)
428	{
429	    static unsigned char prefix[] = { 0xC0, 0xE0, 0xF0, 0xF8, 0xFC };
430	    unsigned char *start = dst;
431	    int len;                    /* source length */
432	    int complen;                /* component length */
433	    int i;

435	    for (len = (int) strlen((char *) src); len > 0; len -= complen) {
436	        /* find maximal component length */
437	        complen = len;
438	        if (len >= 5) {
439	            complen = 5;
440	        }
441	        /* look up component prefix */
442	        *dst = prefix[complen - 1];
443	        ++dst;
444	        /* copy and map characters in component */
445	        for (i = 0; i < complen; ++i) {
446	            *dst = (islower(*src) ? toupper(*src) : *src) + 0xA0U;
447	            ++dst;
448	            ++src;
449	        }
450	    }
451	    *dst = '\0';

453	    return (dst - start);
454	}
455	/* decode a language tag
456	 *  the destination will not be longer than the source
457	 *  dst and src may be the same for in-place conversion
458	 * returns the length of the destination
459	 */
460	int MLSFlangdecode(unsigned char *dst, unsigned char *src)
461	{
462	    unsigned char *start = dst;
463	    int complen;

465	    while (src[0] >= 0xC0U && src[1] > 0xC0U) {
466	        for (complen = utlen[*src++]; complen > 1; --complen) {
467	            *dst = *src - 0xA0U;
468	            ++dst;
469	            ++src;
470	        }
471	    }
472	    *dst = '\0';

474	    return (dst - start);
475	}

477	Appendix E. Sample code for selecting the "best" alternative

479	/* select the "best" language match from an MLSF string
480	 *  assume input language tag has been converted to upper case
481	 *  assume language tags in string won't exceed 255 characters
482	 *  "best" is calculated by matching RFC 1766 language tag components
483	 * returns a pointer to the start of best matching component
484	 */
485	unsigned char *MLSFselect(unsigned char *str, unsigned char *tag)
486	{
487	    unsigned char ltag[256];
488	    unsigned char *best, *match1, *match2;
489	    int bestlen, mlen;

491	    /* start with match on preferred alternative */
492	    best = str;
493	    bestlen = 0;

495	    /* skip test if no language tag */
496	    if (tag != NULL && *tag != '\0') {
497	        do {
498	            /* get language tag for this component */
499	            MLSFlangdecode(ltag, str);
500	            /* calculate match length of language tags */
501	            match1 = ltag;
502	            match2 = tag;
503	            mlen = 0;
504	            while (*match1 != '\0' && *match1 == *match2) {
505	                ++match1, ++match2;
506	                /* save length of partial match */
507	                if (*match2 == '-'
508	                    && (*match1 == '-' || *match1 == '\0')) {
509	                    mlen = match1 - ltag;
510	                }
511	            }

513	            /* finish on exact match */
514	            if (*match2 == '\0'
515	                && (*match1 == '-' || *match1 == '\0')) {
516	                best = str;
517	                break;
518	            }

520	            /* remember best match */
521	            if (mlen > bestlen) {
522	                best = str;
523	                bestlen = mlen;
524	            }

526	            /* skip to next MLSF component */
527	            while (*str != '\0' && *str++ != 0xFEU)
528	                ;
529	        } while (*str != '\0');
530	    }

532	    return (best);
533	}

535	Appendix F. Background and Alternate Solutions

537	     MLSF was designed to deal with language tagging in the context of
538	     the ACAP protocol, but is believed to be useful in other contexts.
539	     Specific scenarios cited during discussion were human names in
540	     address books, system administrator alert error messages, and error
541	     messages which include identifiers potentially in a different
542	     language from the client's preferred error message language.  Since
543	     ACAP is an arbitrary attribute-value protocol, it is impossible to
544	     imagine all possible scenarios in advance, so a general purpose
545	     mechanism was needed.

547	     There have been several attempts to solve language tagging in
548	     attribute-value protocols.  RFC 822 poses a particularly
549	     troublesome scenario, since headers must be 7-bit.  The MIME
550	     solution to label character sets [MIME-HDR] and languages [MIME-
551	     LANG] in headers is thus a necessary evil.  The result of this is
552	     to make header searching services such as those provided by IMAP
553	     [IMAP4] massively more complex.  If 8-bit headers were permitted a
554	     solution like MLSF would have been far simpler and more efficient.

556	     Another approach taken is demonstrated by the current vCard,
557	     iCalendar, and LDAPv3 proposals (all works in progress).  These
558	     proposals overload the attribute namespace to provide language
559	     tagging and create a concept roughly described as attributes of
560	     the attribute.  The result of this is that clients have to deal
561	     with a multiple attribute response to a query where each attribute
562	     may have multiple values.  The additional complexity this adds to
563	     client processing was deemed unacceptable for ACAP where client
564	     simplicity was an important design goal.

566	     Another possible approach is the use of a markup language such as
567	     text/enriched [ENRICHED].  While this is certainly a suitable
568	     language tagging solution for large text objects such as MIME
569	     bodies, it is unsuitable for the attribute-value model where
570	     searching is a primary function.