idnits 2.17.1

draft-ietf-acap-mlsf-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in
     this document. Expected boilerplate is as follows today (2024-04-23)
     according to https://trustee.ietf.org/license-info :

       IETF Trust Legal Provisions of 28-dec-2009, Section 6.a:

          This Internet-Draft is submitted in full conformance with the
          provisions of BCP 78 and BCP 79.

       IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2:

          Copyright (c) 2024 IETF Trust and the persons identified as the
          document authors. All rights reserved.

       IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3:

          This document is subject to BCP 78 and the IETF Trust's Legal
          Provisions Relating to IETF Documents
          (https://trustee.ietf.org/license-info) in effect on the date of
          publication of this document. Please review these documents
          carefully, as they describe your rights and restrictions with
          respect to this document. Code Components extracted from this
          document must include Simplified BSD License text as described in
          Section 4.e of the Trust Legal Provisions and are provided without
          warranty as described in the Simplified BSD License.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** Missing expiration date. The document expiration date should appear on
     the first and last page.

  ** The document seems to lack a 1id_guidelines paragraph about
     Internet-Drafts being working documents.

  ** The document seems to lack a 1id_guidelines paragraph about 6 months
     document validity.

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     current Internet-Drafts.
  ** The document seems to lack a 1id_guidelines paragraph about the list of
     Shadow Directories.

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section. (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack separate sections for Informative/Normative
     References. All references will be assumed normative when checking for
     downward references.

  ** There are 3 instances of too long lines in the document, the longest one
     being 2 characters in excess of 72.

  ** The abstract seems to contain references ([UTF-8], [IAB-CHARSET]), which
     it shouldn't. Please replace those with straight textual mentions of the
     documents in question.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The document seems to lack the recommended RFC 2119 boilerplate, even if
     it appears to use RFC 2119 keywords -- however, there's a paragraph with
     a matching beginning. Boilerplate error?

     (The document does seem to have the reference to RFC 2119 which the
     ID-Checklist requires).

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008. If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment. If not, you may need to add the pre-RFC5378 disclaimer.
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (June 1997) is 9809 days in the past. Is this
     intentional?
  -- Found something which looks like a code comment -- if you have code
     sections in the document, please surround them with '<CODE BEGINS>' and
     '<CODE ENDS>' lines.

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: 'UTF-8' is mentioned on line 46, but not defined

  == Missing Reference: 'LANG-TAG' is mentioned on line 122, but not defined

  == Missing Reference: 'MLSF-LANG-TAG' is mentioned on line 153, but not
     defined

  -- Looks like a reference, but probably isn't: '256' on line 487

  -- Looks like a reference, but probably isn't: '1' on line 465

  -- Looks like a reference, but probably isn't: '0' on line 465

  == Unused Reference: 'MIME-IMB' is defined on line 247, but no explicit
     reference was found in the text

  == Unused Reference: 'UTF8' is defined on line 257, but no explicit
     reference was found in the text

  -- No information found for draft-ietf-drums-abnf-xx - is the name correct?

  -- Possible downref: Normative reference to a draft: ref. 'ABNF'

  ** Downref: Normative reference to an Informational RFC: RFC 1896 (ref.
     'ENRICHED')

  ** Obsolete normative reference: RFC 2070 (ref. 'HTML-I18N') (Obsoleted by
     RFC 2854)

  ** Downref: Normative reference to an Informational RFC: RFC 2130 (ref.
     'IAB-CHARSET')

  ** Obsolete normative reference: RFC 2060 (ref. 'IMAP4') (Obsoleted by RFC
     3501)

  ** Obsolete normative reference: RFC 1766 (ref. 'LANG-TAGS') (Obsoleted by
     RFC 3066, RFC 3282)

  -- Possible downref: Non-RFC (?) normative reference: ref. 'MIME-LANG'

  ** Obsolete normative reference: RFC 2044 (ref. 'UTF8') (Obsoleted by RFC
     2279)

     Summary: 16 errors (**), 0 flaws (~~), 7 warnings (==), 9 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.
--------------------------------------------------------------------------------

2    Network Working Group                                     C. Newman
3    Internet Draft: Multi-Lingual String Format                Innosoft
4    Document: draft-ietf-acap-mlsf-01.txt                     June 1997
5                                                  Expires in six months

7                    Multi-Lingual String Format (MLSF)

9    Status of this memo

11      This document is an Internet Draft. Internet Drafts are working
12      documents of the Internet Engineering Task Force (IETF), its Areas,
13      and its Working Groups. Note that other groups may also distribute
14      working documents as Internet Drafts.

16      Internet Drafts are draft documents valid for a maximum of six
17      months. Internet Drafts may be updated, replaced, or obsoleted by
18      other documents at any time. It is not appropriate to use Internet
19      Drafts as reference material or to cite them other than as a
20      "working draft" or "work in progress".

22      To learn the current status of any Internet-Draft, please check the
23      1id-abstracts.txt listing contained in the Internet-Drafts Shadow
24      Directories on ds.internic.net, nic.nordu.net, ftp.isi.edu, or
25      munnari.oz.au.

27      A revised version of this draft document will be submitted to the
28      RFC editor as a Proposed Standard for the Internet Community.
29      Discussion and suggestions for improvement are requested. This
30      document will expire six months after publication. Distribution of
31      this draft is unlimited.

33   Abstract

35      The IAB charset workshop [IAB-CHARSET] concluded that for human
36      readable text there should always be a way to specify the natural
37      language. Many protocols are designed with an attribute-value
38      model (including RFC 822, HTTP, LDAP, SNMP, DHCP, and ACAP) which
39      stores many small human readable text strings. The primary
40      function of an attribute-value model is to simplify both
41      extensibility and searchability. A solution is needed to provide
42      language tags in these small human readable text strings, which
43      does not interfere with these primary functions.

45      This specification defines MLSF (Multi-Lingual String Format) which
46      applies another layer of encoding on top of UTF-8 [UTF-8] to permit
47      the addition of language tags anywhere within a text string. In
48      addition, it defines an alternate form which can be used to include
49      alternative representations of the same text in different character
50      sets. MLSF has the property that UTF-8 is a proper subset of MLSF.
51      This preserves the searchability requirement of the attribute-value
52      model.

54      Appendix F of this document includes a brief discussion of the
55      background behind MLSF and why some other potential solutions were
56      rejected for this purpose.

58   1. Conventions used in this document

60      The key words "MUST", "MUST NOT", "SHOULD", "SHOULD NOT", and "MAY"
61      in this document are to be interpreted as defined in "Key words for
62      use in RFCs to Indicate Requirement Levels" [KEYWORDS].

64   2. MLSF simple form

66      MLSF uses "Tags for the Identification of Languages" [LANG-TAGS] as
67      the basis for language identification.

69      Language tags are encoded by mapping them to upper-case, then
70      adding hexadecimal A0 to each octet. The result is broken up into
71      groups of five octets followed by a final group of five or fewer
72      octets. Each group is prefixed by a UTF-8-style length count with
73      the low bits set to 0. See Appendix D for sample source code to
74      perform this conversion.

76      MLSF simple form is defined by the MLSF-SIMPLE rule in section 6.
77      A quoted version of MLSF simple form is defined by the MLSF-
78      SIMPLE-QUOTED rule.

80      Note that MLSF is not compatible with UTF-8. A program which uses
81      MLSF MUST downconvert it to UTF-8 prior to using it in a context
82      where UTF-8 is required. Sample code for this down conversion is
83      included in Appendix B.

85   3. MLSF alternative form

87      An MLSF alternative form string may contain alternative
88      representations of the same text in different primary languages.
89      The octet with hexadecimal representation of FE is used to
90      introduce a new alternative. This MUST be followed by an MLSF
91      language tag for the primary language of the alternative.

93      The component of the MLSF string prior to the first FE octet is
94      considered the "preferred" representation for the string. This is
95      the version which will be displayed by MLSF clients which choose
96      not to support alternative representations. The preferred
97      representation MAY be prefixed by an MLSF language tag.

99      MLSF alternate form is defined by the MLSF-ALT rule in section 6.
100     A quoted version of MLSF alternate form is defined by the
101     MLSF-ALT-QUOTED rule.

103     Note that MLSF alternate form is not compatible with UTF-8. A
104     program which uses MLSF MUST downconvert it to UTF-8 prior to using
105     it in a context where UTF-8 is required. Sample code for this down
106     conversion is included in Appendix B.

108  4. MLSF MIME character sets

110     The character set label "XXXX-simple" will be registered to
111     indicate the use of MLSF simple form. The character set label
112     "XXXX-alt" will be registered to indicate the use of MLSF alternate
113     form.

115     MLSF may be used in conjunction with MIME header [MIME-HDR]
116     encoding to permit language tagging and alternative representations
117     in header fields. A work in progress [MIME-LANG] will propose a
118     mechanism for language tagging in headers which is not dependent on
119     the use of UTF-8.

121     For single language MIME body parts, the UTF-8 character set with
122     an appropriate Content-Language [LANG-TAG] header SHOULD be used
123     instead of MLSF. Text/enriched [ENRICHED] or HTML with language
124     tags [HTML-I18N] are preferred to using MLSF for MIME bodies when
125     possible.

127  5. Security Considerations

129     Multi-Lingual String Format is not believed to have any security
130     considerations beyond those for simple US-ASCII strings. In
131     particular, unfiltered display of certain US-ASCII control
132     characters by a terminal emulator may result in modifying the
133     behavior of the terminal emulator (e.g. by redefining function
134     keys) such that security can be breached. Programs which display
135     text to a potentially insecure terminal emulator channel are
136     encouraged to remove control characters to avoid these problems.

138  6. Formal Grammar

140     This section defines the formal grammar for MLSF using Augmented
141     BNF [ABNF] notation.

143     MLSF-ALT           = [[MLSF-LANG-TAG] MLSF-COMPONENT
144                          *(MLSF-ALTERNATE MLSF-COMPONENT)]

146     MLSF-ALT-QUOTED    = <"> [[MLSF-LANG-TAG] MLSF-COMPONENT-Q
147                          *(MLSF-ALTERNATE MLSF-COMPONENT-Q)] <">

149     MLSF-ALTERNATE     = %xFE MLSF-LANG-TAG

151     MLSF-COMPONENT     = UTF8-NON-NUL *([MLSF-LANG-TAG] UTF8-NON-NUL)

153     MLSF-COMPONENT-Q   = UTF8-QUOTED *([MLSF-LANG-TAG] UTF8-QUOTED)

155     MLSF-LANG-TAG      = *MLSF-LANG-5 (MLSF-LANG-1 / MLSF-LANG-2 /
156                          MLSF-LANG-3 / MLSF-LANG-4 / MLSF-LANG-5)
157                          ;; Encoded version of Language-Tag from RFC 1766
158                          ;; characters converted to uppercase, with
159                          ;; A0 added and broken into MLSF-LANG components

161     MLSF-LANG-CONT     = %xCD / %xE1..FA

163     MLSF-LANG-1        = %xC0 MLSF-LANG-CONT

165     MLSF-LANG-2        = %xE0 2MLSF-LANG-CONT

167     MLSF-LANG-3        = %xF0 3MLSF-LANG-CONT

169     MLSF-LANG-4        = %xF8 4MLSF-LANG-CONT

171     MLSF-LANG-5        = %xFC 5MLSF-LANG-CONT

173     MLSF-SIMPLE        = [[MLSF-LANG-TAG] MLSF-COMPONENT]

175     MLSF-SIMPLE-QUOTED = <"> [[MLSF-LANG-TAG] MLSF-COMPONENT-Q] <">

177     QUOTED             = "\" QUOTED-SPECIAL

179     QUOTED-SPECIAL     = "\" / <">

181     US-ASCII-SAFE      = %x01..09 / %x0B..0C / %x0E..21
182                          / %x23..5B / %x5D..7F
183                          ;; US-ASCII except QUOTED-SPECIALs, CR, LF, NUL

185     UTF8-NON-NUL       = UTF8-SAFE / CR / LF / QUOTED-SPECIAL
186     UTF8-QUOTED        = UTF8-SAFE / QUOTED

188     UTF8-SAFE          = US-ASCII-SAFE / UTF8-1 / UTF8-2 / UTF8-3
189                          / UTF8-4 / UTF8-5

191     UTF8-CONT          = %x80..BF

193     UTF8-1             = %xC0..DF UTF8-CONT

195     UTF8-2             = %xE0..EF 2UTF8-CONT

197     UTF8-3             = %xF0..F7 3UTF8-CONT

199     UTF8-4             = %xF8..FB 4UTF8-CONT

201     UTF8-5             = %xFC..FD 5UTF8-CONT

203  7. References

205     [ABNF] Crocker, D., "Augmented BNF for Syntax Specifications:
206     ABNF", Work in progress: draft-ietf-drums-abnf-xx.txt

208     [ENRICHED] Resnick, Walker, "The text/enriched MIME Content-type",
209     RFC 1896, Qualcomm, InterCon, February 1996.

211

213     [HTML-I18N] Yergeau, Nicol, Adams, Duerst, "Internationalization of
214     the Hypertext Markup Language", RFC 2070, Alis Technologies,
215     Electronic Book Technologies, Spyglass, University of Zurich,
216     January 1997.

218

220     [IAB-CHARSET] Weider, Preston, Simonsen, Alvestrand, Atkinson,
221     Crispin, Svanberg, "The Report of the IAB Character Set Workshop
222     held 29 February - 1 March, 1996", RFC 2130, April 1997.

224

226     [IMAP4] Crispin, "Internet Message Access Protocol - Version
227     4rev1", RFC 2060, University of Washington, December 1996.

229

231     [KEYWORDS] Bradner, "Key words for use in RFCs to Indicate
232     Requirement Levels", RFC 2119, Harvard University, March 1997.

234

236     [LANG-TAGS] Alvestrand, H., "Tags for the Identification of
237     Languages", RFC 1766.

239

241     [MIME-HDR] Moore, "MIME (Multipurpose Internet Mail Extensions)
242     Part Three: Message Header Extensions for Non-ASCII Text", RFC
243     2047, University of Tennessee, November 1996.

245

247     [MIME-IMB] Freed, Borenstein, "Multipurpose Internet Mail
248     Extensions (MIME) Part One: Format of Internet Message Bodies", RFC
249     2045, Innosoft, First Virtual, November 1996.

251

253     [MIME-LANG] Freed, Moore, "MIME Parameter Value and Encoded Words:
254     Character Sets, Language, and Continuations", work in progress,
255     March 1997.

257     [UTF8] Yergeau, F. "UTF-8, a transformation format of Unicode and
258     ISO 10646", RFC 2044, Alis Technologies, October 1996.

260

262  8. Acknowledgements

264     Special thanks to Mark Crispin for the idea of using unused UTF-8
265     codes for this purpose. Thanks are also due to participants of
266     the ACAP WG mailing list who helped review this proposal.

268  9. Author's Address

270     Chris Newman
271     Innosoft International, Inc.
272     1050 East Garvey Ave. South
273     West Covina, CA 91790 USA

275     Email: chris.newman@innosoft.com

277  Appendix A. Client advice

279     A simple UTF-8 client is likely to find the source code in Appendix
280     B useful. A simple Latin-1 based client is likely to find the
281     source code in Appendix C useful.

283     A more sophisticated client will allow the user to select a
284     preferred language and use something like the source code in
285     Appendix E to find the best alternative in an MLSF string. Such
286     clients should also be aware that sometimes the client's preferred
287     language is misconfigured, and the user may wish to have the last
288     few messages repeated after they have changed languages. For this
289     reason, such a client may wish to cache the last few MLSF strings
290     displayed to the user.

292  Appendix B. Sample code to convert to UTF-8

294     Here is sample C source code to convert from MLSF to UTF-8.

296     #include <ctype.h>   /* islower, toupper */
297     #include <string.h>  /* strlen, NULL */

299     /* a UTF8 lookup table */
300     #define BAD 0x80
301     #define SEP 0x40
302     #define EXT 0x20
303     static unsigned char utlen[256] = {
304         /* 0x00 */ BAD, 1, 1, 1, 1, 1, 1, 1,
305         /* 0x08 */ 1, 1, 1, 1, 1, 1, 1, 1,
306         /* 0x10 */ 1, 1, 1, 1, 1, 1, 1, 1,
307         /* 0x18 */ 1, 1, 1, 1, 1, 1, 1, 1,
308         /* 0x20 */ 1, 1, 1, 1, 1, 1, 1, 1,
309         /* 0x28 */ 1, 1, 1, 1, 1, 1, 1, 1,
310         /* 0x30 */ 1, 1, 1, 1, 1, 1, 1, 1,
311         /* 0x38 */ 1, 1, 1, 1, 1, 1, 1, 1,
312         /* 0x40 */ 1, 1, 1, 1, 1, 1, 1, 1,
313         /* 0x48 */ 1, 1, 1, 1, 1, 1, 1, 1,
314         /* 0x50 */ 1, 1, 1, 1, 1, 1, 1, 1,
315         /* 0x58 */ 1, 1, 1, 1, 1, 1, 1, 1,
316         /* 0x60 */ 1, 1, 1, 1, 1, 1, 1, 1,
317         /* 0x68 */ 1, 1, 1, 1, 1, 1, 1, 1,
318         /* 0x70 */ 1, 1, 1, 1, 1, 1, 1, 1,
319         /* 0x78 */ 1, 1, 1, 1, 1, 1, 1, 1,
320         /* 0x80 */ EXT, EXT, EXT, EXT, EXT, EXT, EXT, EXT,
321         /* 0x88 */ EXT, EXT, EXT, EXT, EXT, EXT, EXT, EXT,
322         /* 0x90 */ EXT, EXT, EXT, EXT, EXT, EXT, EXT, EXT,
323         /* 0x98 */ EXT, EXT, EXT, EXT, EXT, EXT, EXT, EXT,
324         /* 0xA0 */ EXT, EXT, EXT, EXT, EXT, EXT, EXT, EXT,
325         /* 0xA8 */ EXT, EXT, EXT, EXT, EXT, EXT, EXT, EXT,
326         /* 0xB0 */ EXT, EXT, EXT, EXT, EXT, EXT, EXT, EXT,
327         /* 0xB8 */ EXT, EXT, EXT, EXT, EXT, EXT, EXT, EXT,
328         /* 0xC0 */ 2, 2, 2, 2, 2, 2, 2, 2,
329         /* 0xC8 */ 2, 2, 2, 2, 2, 2, 2, 2,
330         /* 0xD0 */ 2, 2, 2, 2, 2, 2, 2, 2,
331         /* 0xD8 */ 2, 2, 2, 2, 2, 2, 2, 2,
332         /* 0xE0 */ 3, 3, 3, 3, 3, 3, 3, 3,
333         /* 0xE8 */ 3, 3, 3, 3, 3, 3, 3, 3,
334         /* 0xF0 */ 4, 4, 4, 4, 4, 4, 4, 4,
335         /* 0xF8 */ 5, 5, 5, 5, 6, 6, SEP, BAD
336     };
337     /* Down conversion from NUL terminated MLSF string to UTF-8.
338      * this strips the language tags and only keeps the preferred
339      * representation.
340      * It returns the length of the final string.
341      * The destination string will not be longer than the source string.
342      * dst and src may be the same for in-place conversion.
343      */
344     int MLSFtoUTF8(unsigned char *dst, unsigned char *src)
345     {
346         unsigned char *start = dst;
347         int len;

349         for (;;) {
350             len = utlen[*src];
351             if (len > 6) break;
352             /* skip language tags */
353             if (len > 1 && src[1] > 0xC0U) {
354                 while (len && *src != '\0') {
355                     ++src;
356                     --len;
357                 }
358                 continue;
359             }
360             /* copy UTF8 character */
361             while (len && *src != '\0') {
362                 *dst = *src;
363                 ++dst;
364                 ++src;
365                 --len;
366             }
367         }
368         *dst = '\0';

370         return (dst - start);
371     }
372  Appendix C. Sample code to convert to Latin-1

374     /* Down conversion from NUL terminated MLSF string to 8859-1
375      * The destination string will not be longer than the source string.
376      * fillc is used to fill untranslatable characters,
377      * if fillc is NUL, untranslatable characters are ignored.
378      * returns 0 if source only contained latin-1, returns -1 otherwise.
379      */
380     int MLSFtoLatin1(unsigned char *dst, unsigned char *src, int fillc)
381     {
382         int len, result = 0;

384         for (;;) {
385             len = utlen[*src];
386             /* copy US-ASCII */
387             if (len == 1) {
388                 *dst = *src;
389                 ++dst;
390                 ++src;
391                 continue;
392             }
393             /* stop at illegal character or end of string */
394             if (len > 6) break;
395             /* skip non-latin1 glyphs and language tags */
396             if (*src > 0xC3U || src[1] > 0xC0U) {
397                 if (src[1] <= 0xC0U) {
398                     /* non-latin1 glyph found */
399                     result = -1;
400                     if (fillc) {
401                         *dst = fillc;
402                         ++dst;
403                     }
404                 }
405                 while (len && *src != '\0') {
406                     ++src;
407                     --len;
408                 }
409                 continue;
410             }
411             /* copy latin 1 character */
412             *dst = ((src[0] & 0x03) << 6) | (src[1] & 0x3F);
413             ++dst;
414             src += 2;
415         }
416         *dst = '\0';

418         return (result);
419     }
420  Appendix D. Sample code for encoding/decoding language tags

422     /* encode a language tag
423      * the destination must have a size of at least (counting terminating NUL):
424      *  (6 * strlen(src) + 9) / 5
425      * returns the length of the destination.
426      */
427     int MLSFlangencode(unsigned char *dst, unsigned char *src)
428     {
429         static unsigned char prefix[] = { 0xC0, 0xE0, 0xF0, 0xF8, 0xFC };
430         unsigned char *start = dst;
431         int len;        /* source length */
432         int complen;    /* component length */
433         int i;

435         for (len = strlen((char *) src); len > 0; len -= complen) {
436             /* find maximal component length */
437             complen = len;
438             if (len >= 5) {
439                 complen = 5;
440             }
441             /* look up component prefix */
442             *dst = prefix[complen - 1];
443             ++dst;
444             /* copy and map characters in component */
445             for (i = 0; i < complen; ++i) {
446                 *dst = (islower(*src) ? toupper(*src) : *src) + 0xA0U;
447                 ++dst;
448                 ++src;
449             }
450         }
451         *dst = '\0';

453         return (dst - start);
454     }
455     /* decode a language tag
456      * the destination will not be longer than the source
457      * dst and src may be the same for in-place conversion
458      * returns the length of the destination
459      */
460     int MLSFlangdecode(unsigned char *dst, unsigned char *src)
461     {
462         unsigned char *start = dst;
463         int complen;

465         while (src[0] >= 0xC0U && src[1] > 0xC0U) {
466             for (complen = utlen[*src++]; complen > 1; --complen) {
467                 *dst = *src - 0xA0U;
468                 ++dst;
469                 ++src;
470             }
471         }
472         *dst = '\0';

474         return (dst - start);
475     }

477  Appendix E. Sample code for selecting the "best" alternative

479     /* select the "best" language match from an MLSF string
480      * assume input language tag has been converted to upper case
481      * assume language tags in string won't exceed 256 characters
482      * "best" is calculated by matching RFC 1766 language tag components
483      * returns a pointer to the start of best matching component
484      */
485     unsigned char *MLSFselect(unsigned char *str, unsigned char *tag)
486     {
487         unsigned char ltag[256];
488         unsigned char *best, *match1, *match2;
489         int bestlen, mlen;

491         /* start with match on preferred alternative */
492         best = str;
493         bestlen = 0;

495         /* skip test if no language tag */
496         if (tag != NULL && *tag != '\0') {
497             do {
498                 /* get language tag for this component */
499                 MLSFlangdecode(ltag, str);
500                 /* calculate match length of language tags */
501                 match1 = ltag;
502                 match2 = tag;
503                 mlen = 0;
504                 while (*match1 != '\0' && *match1 == *match2) {
505                     ++match1, ++match2;
506                     /* save length of partial match */
507                     if (*match2 == '-'
508                         && (*match1 == '-' || *match1 == '\0')) {
509                         mlen = match1 - ltag;
510                     }
511                 }

513                 /* finish on exact match */
514                 if (*match2 == '\0'
515                     && (*match1 == '-' || *match1 == '\0')) {
516                     best = str;
517                     break;
518                 }

520                 /* remember best match */
521                 if (mlen > bestlen) {
522                     best = str;
523                     bestlen = mlen;
524                 }

526                 /* skip to next MLSF component */
527                 while (*str != '\0' && *str++ != 0xFEU)
528                     ;
529             } while (*str != '\0');
530         }

532         return (best);
533     }

535  Appendix F. Background and Alternate Solutions

537     MLSF was designed to deal with language tagging in the context of
538     the ACAP protocol, but is believed to be useful in other contexts.
539     Specific scenarios cited during discussion were human names in
540     address books, system administrator alert error messages, and error
541     messages which include identifiers potentially in a different
542     language from the client's preferred error message language. Since
543     ACAP is an arbitrary attribute-value protocol, it is impossible to
544     imagine all possible scenarios in advance, so a general purpose
545     mechanism was needed.

547     There have been several attempts to solve language tagging in
548     attribute value protocols. RFC 822 poses a particularly
549     troublesome scenario, since headers must be 7-bit. The MIME
550     solution to label character sets [MIME-HDR] and languages [MIME-
551     LANG] in headers is thus a necessary evil. The result of this is
552     to make header searching services such as those provided by IMAP
553     [IMAP4] massively more complex. If 8-bit headers were permitted a
554     solution like MLSF would have been far simpler and more efficient.

556     Another approach taken is demonstrated by the current vCard,
557     iCalendar, and LDAPv3 proposals (all works in progress). These
558     proposals overload the attribute namespace to provide language
559     tagging and create a concept roughly described as attributes of
560     the attribute. The result of this is that clients have to deal
561     with a multiple attribute response to a query where each attribute
562     may have multiple values. The additional complexity this adds to
563     client processing was deemed unacceptable for ACAP where client
564     simplicity was an important design goal.

566     Another possible approach is the use of a markup language such as
567     text/enriched [ENRICHED]. While this is certainly a suitable
568     language tagging solution for large text objects such as MIME
569     bodies, it is unsuitable for the attribute-value model where
570     searching is a primary function.
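
As a worked illustration of the language tag encoding described in section 2 (upper-case mapping, adding hexadecimal A0 to each octet, and grouping into runs of at most five octets with a length prefix), the following is a minimal sketch. It is not part of the draft: the function name mlsf_encode_tag and its signature are hypothetical, and it assumes the input is a plain US-ASCII language tag made of letters and hyphens.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not taken from the draft: encode a US-ASCII
 * language tag such as "en-US" as an MLSF language tag.
 * out must have room for strlen(tag) + (strlen(tag) + 4) / 5 + 1 octets.
 * Returns the number of octets written, not counting the final NUL.
 */
size_t mlsf_encode_tag(unsigned char *out, const char *tag)
{
    /* prefix octets for groups of 1, 2, 3, 4 and 5 tag characters */
    static const unsigned char prefix[5] = { 0xC0, 0xE0, 0xF0, 0xF8, 0xFC };
    size_t left = strlen(tag);
    size_t n = 0;

    while (left > 0) {
        size_t group = left < 5 ? left : 5;
        size_t i;

        /* each group starts with its UTF-8-style length prefix */
        out[n++] = prefix[group - 1];
        for (i = 0; i < group; ++i) {
            /* map to upper case, then add hexadecimal A0 */
            out[n++] = (unsigned char)(toupper((unsigned char)*tag++) + 0xA0);
        }
        left -= group;
    }
    out[n] = '\0';
    return n;
}
```

Under these assumptions, the tag "en-US" fits in one group of five and encodes to the six octets FC E5 EE CD F5 F3, while "i-klingon" needs two groups (five characters, then four) and encodes to FC E9 CD EB EC E9 F8 EE E7 EF EE.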