Network Working Group                                         P. Resnick
INTERNET-DRAFT                                                 A. Walker
To-obsolete RFCs: 1523, 1563                               December 1995
Category: Informational

                  The text/enriched MIME Content-type

Status of this Memo

   This document is an Internet-Draft. Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups. Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   To learn the current status of any Internet-Draft, please check the
   "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
   Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
   munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
   ftp.isi.edu (US West Coast).

Abstract

   MIME [RFC-1521] defines a format and general framework for the
   representation of a wide variety of data types in Internet mail.
   This document defines one particular type of MIME data, the
   text/enriched MIME type. The text/enriched MIME type is intended to
   facilitate the wider interoperation of simple enriched text across a
   wide variety of hardware and software platforms. This document is
   only a minor revision to the text/enriched MIME type that was first
   described in [RFC-1523] and [RFC-1563], and is only intended to be
   used in the short term until other MIME types for text formatting in
   Internet mail are developed and deployed.

The text/enriched MIME type

   In order to promote the wider interoperability of simple formatted
   text, this document defines an extremely simple subtype of the MIME
   content-type "text", the "text/enriched" subtype. The content-type
   line for this type may have one optional parameter, the "charset"
   parameter, with the same values permitted for the "text/plain" MIME
   content-type.

   The text/enriched subtype was designed to meet the following
   criteria:

   1.  The syntax must be extremely simple to parse, so that even
       teletype-oriented mail systems can easily strip away the
       formatting information and leave only the readable text.

   2.  The syntax must be extensible to allow for new formatting
       commands that are deemed essential for some application.

   3.  If the character set in use is ASCII or an 8-bit ASCII
       superset, then the raw form of the data must be readable enough
       to be largely unobjectionable in the event that it is displayed
       on the screen of the user of a non-MIME-conformant mail reader.

   4.  The capabilities must be extremely limited, to ensure that it
       can represent no more than is likely to be representable by the
       user's primary word processor. While this limits what can be
       sent, it increases the likelihood that what is sent can be
       properly displayed.

   There are other text formatting standards which meet some of these
   criteria. In particular, HTML and SGML have come into widespread use
   on the Internet. However, there are two important reasons that this
   document further promotes the use of text/enriched in Internet mail
   over other such standards:

   1.  Most MIME-aware Internet mail applications are already able to
       either properly format text/enriched mail or, at the very
       least, are able to strip out the formatting commands and
       display the readable text. The same is not true for HTML or
       SGML.

   2.  The current RFC on HTML [RFC-1866] and Internet-Drafts on SGML
       have many features which are not necessary for Internet mail,
       and are missing a few capabilities that text/enriched already
       has.

   For these reasons, this document is promoting the use of
   text/enriched until other Internet standards come into more
   widespread use. For those who will want to use HTML, Appendix B of
   this document contains a very simple C program that converts
   text/enriched to HTML 2.0 as described in [RFC-1866].

Syntax

   The syntax of "text/enriched" is very simple. It represents text in
   a single character set, US-ASCII by default, although a different
   character set can be specified by the use of the "charset"
   parameter. (The semantics of text/enriched in non-ASCII character
   sets are discussed later in this document.)
   All characters represent themselves, with the exception of the "<"
   character (ASCII 60), which is used to mark the beginning of a
   formatting command. A literal less-than sign ("<") can be
   represented by a sequence of two such characters, "<<".

   Formatting instructions consist of formatting commands surrounded by
   angle brackets ("<>", ASCII 60 and 62). Each formatting command may
   be no more than 60 characters in length, all in US-ASCII, restricted
   to the alphanumeric and hyphen ("-") characters. Formatting commands
   may be preceded by a solidus ("/", ASCII 47), making them negations,
   and such negations must always exist to balance the initial opening
   commands. Thus, if the formatting command "<bold>" appears at some
   point, there must later be a "</bold>" to balance it. (NOTE: The
   60-character limit on formatting commands does NOT include the "<",
   ">", or "/" characters that might be attached to such commands.)

Line break rules

   Line breaks (CRLF pairs in standard network representation) are
   handled specially. In particular, isolated CRLF pairs are translated
   into a single SPACE character. Sequences of N consecutive CRLF
   pairs, however, are translated into N-1 actual line breaks. This
   permits long lines of data to be represented in a natural-looking
   manner despite the frequency of line-wrapping in Internet mailers.
   When preparing the data for mail transport, isolated line breaks
   should be inserted wherever necessary to keep each line shorter than
   80 characters. When preparing such data for presentation to the
   user, isolated line breaks should be replaced by a single SPACE
   character, and N consecutive CRLF pairs should be presented to the
   user as N-1 line breaks.

   Thus text/enriched data that looks like this:

      This is
      a single
      line

      This is the
      next line.


      This is the
      next section.

   should be displayed by a text/enriched interpreter as follows:

      This is a single line
      This is the next line.

      This is the next section.

   The formatting commands, not all of which will be implemented by all
   implementations, are described in the following sections.

Formatting Commands

   The text/enriched formatting commands all begin with <commandname>
   and end with </commandname>, affecting the formatting of the text
   between those two tokens. The commands are described here, grouped
   according to type.

Parameter Command

   Some of the formatting commands may require one or more associated
   parameters. The "param" command is a special formatting command used
   to include these parameters.

   Param
          Marks the affected text as command parameters, to be
          interpreted or ignored by the text/enriched interpreter, but
          not to be shown to the reader. The "param" command always
          immediately follows some other formatting command, and the
          parameter data indicates some additional information about
          the formatting that is to be done. The syntax of the
          parameter data (whatever appears between the initial
          "<param>" and the terminating "</param>") is defined for each
          command that uses it. However, it is always required that the
          format of such data must not contain nested "param" commands,
          and either must not use the "<" character or must use it in a
          way that is compatible with text/enriched parsing. That is,
          the end of the parameter data should be recognizable with
          either of two algorithms: simply searching for the first
          occurrence of "</param>" or parsing until a balanced
          "</param>" command is found. In either case, however, the
          parameter data should not be shown to the human reader.

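   The "<<" rule, the line-break rules, and the "param" command above
   are together enough to build the kind of reader described later
   under "Minimal text/enriched conformance". The following C fragment
   is an illustrative sketch only, not part of this specification. The
   function names enriched_to_plain and name_is are hypothetical; the
   sketch assumes NUL-terminated input with CRLF line breaks, treats
   command names case-insensitively (an assumption), and does not
   handle the "nofill" environment.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: case-insensitive comparison of a command name
   (not NUL-terminated) against a lowercase literal. Treating names as
   case-insensitive is an assumption of this sketch. */
static int name_is(const char *name, size_t len, const char *want)
{
    if (len != strlen(want))
        return 0;
    for (size_t i = 0; i < len; i++)
        if (tolower((unsigned char)name[i]) != want[i])
            return 0;
    return 1;
}

/* Hypothetical minimal reader: copies the readable text of `in` into
   `out`, which must hold at least strlen(in) + 1 bytes (the output is
   never longer than the input). The <nofill> environment is NOT
   handled: CRLF rules are applied everywhere. */
static void enriched_to_plain(const char *in, char *out)
{
    int in_param = 0;   /* nonzero between <param> and </param> */
    size_t o = 0;
    const char *p = in;

    assert(in != NULL && out != NULL);
    while (*p) {
        if (p[0] == '<' && p[1] == '<') {          /* "<<" means "<" */
            if (!in_param)
                out[o++] = '<';
            p += 2;
        } else if (p[0] == '<') {                  /* formatting command */
            const char *name = p + 1;
            const char *end = strchr(name, '>');
            int negation;
            size_t len;

            if (end == NULL)
                break;                             /* unterminated: stop */
            negation = (*name == '/');
            if (negation)
                name++;
            len = (size_t)(end - name);
            if (name_is(name, len, "param"))
                in_param = !negation;              /* hide parameter data */
            p = end + 1;                           /* drop the command */
        } else if (p[0] == '\r' && p[1] == '\n') {
            size_t n = 0;                          /* count CRLF pairs */
            while (p[0] == '\r' && p[1] == '\n') {
                n++;
                p += 2;
            }
            if (!in_param) {
                if (n == 1) {
                    out[o++] = ' ';                /* lone CRLF -> SPACE */
                } else {
                    for (size_t i = 1; i < n; i++) { /* n CRLFs -> n-1 */
                        out[o++] = '\r';
                        out[o++] = '\n';
                    }
                }
            }
        } else {
            if (!in_param)
                out[o++] = *p;                     /* ordinary character */
            p++;
        }
    }
    out[o] = '\0';
}
```

   For example, the input "<bold>Hello</bold>", CRLF, "world<<3"
   becomes "Hello world<3": the commands are dropped, the lone CRLF
   becomes a SPACE, and "<<" becomes "<".
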
Font-Alteration Commands

   The following formatting commands are intended to alter the font in
   which text is displayed, but not to alter the indentation or
   justification state of the text:

   Bold
          causes the affected text to be in a bold font. Nested bold
          commands have the same effect as a single bold command.

   Italic
          causes the affected text to be in an italic font. Nested
          italic commands have the same effect as a single italic
          command.

   Underline
          causes the affected text to be underlined. Nested underline
          commands have the same effect as a single underline command.

   Fixed
          causes the affected text to be in a fixed-width font. Nested
          fixed commands have the same effect as a single fixed
          command.

   FontFamily
          causes the affected text to be displayed in a specified
          typeface. The "fontfamily" command requires a parameter that
          is specified by using the "param" command. The parameter data
          is a case-insensitive string containing the name of a font
          family. Any currently available font family name (e.g.
          Times, Palatino, Courier, etc.) may be used. This includes
          font families defined by commercial type foundries such as
          Adobe, BitStream, or any other such foundry. Note that
          implementations should only use the general font family name,
          not the specific font name (e.g. use "Times", not
          "TimesRoman" nor "TimesBoldItalic"). When nested, the inner
          "fontfamily" command takes precedence. Also note that the
          "fontfamily" command is advisory only; it should not be
          expected that other implementations will honor the typeface
          information in this command since the font capabilities of
          systems vary drastically.

   Color
          causes the affected text to be displayed in a specified
          color. The "color" command requires a parameter that is
          specified by using the "param" command.
          The parameter data can be one of the following:

             red
             blue
             green
             yellow
             cyan
             magenta
             black
             white

          or an RGB color value in the form:

             ####,####,####

          where '#' is a hexadecimal digit '0' through '9', 'A' through
          'F', or 'a' through 'f'. The three 4-digit hexadecimal values
          are the RGB values for red, green, and blue respectively,
          where each component is expressed as an unsigned value
          between 0 (0000) and 65535 (FFFF). The default color for the
          message is unspecified, though black is a common choice in
          many environments. When nested, the inner "color" command
          takes precedence.

   Smaller
          causes the affected text to be in a smaller font. It is
          recommended that the font size be changed by two points, but
          other amounts may be more appropriate in some environments.
          Nested smaller commands produce ever smaller fonts, to the
          limits of the implementation's capacity to reasonably display
          them, after which further smaller commands have no
          incremental effect.

   Bigger
          causes the affected text to be in a bigger font. It is
          recommended that the font size be changed by two points, but
          other amounts may be more appropriate in some environments.
          Nested bigger commands produce ever bigger fonts, to the
          limits of the implementation's capacity to reasonably display
          them, after which further bigger commands have no incremental
          effect.

   While the "bigger" and "smaller" operators are effectively inverses,
   it is not recommended, for example, that "<smaller>" be used to end
   the effect of "<bigger>". This is properly done with "</bigger>".

   Since the capabilities of implementations will vary, it is to be
   expected that some implementations will not be able to act on some
   of the font-alteration commands. However, an implementation should
   still display the text to the user in a reasonable fashion.
   In particular, the lack of capability to display a particular font
   family, color, or other text attribute does not mean that an
   implementation should fail to display text.

Fill/Justification/Indentation Commands

   Initially, text/enriched text is intended to be displayed fully
   filled (that is, using the rules specified for replacing CRLF pairs
   with spaces or removing them as appropriate) with appropriate
   kerning and letter-tracking, and using the maximum available margins
   as suits the capabilities of the receiving user agent software.

   The following commands alter that state. Each of these commands
   forces a line break before and after the formatting environment if
   there is not otherwise a line break. For example, if one of these
   commands occurs anywhere other than the beginning of a line of text
   as presented, a new line is begun.

   Center
          causes the affected text to be centered.

   FlushLeft
          causes the affected text to be left-justified with a ragged
          right margin.

   FlushRight
          causes the affected text to be right-justified with a ragged
          left margin.

   FlushBoth
          causes the affected text to be filled and padded so as to
          create smooth left and right margins, i.e., to be fully
          justified.

   ParaIndent
          causes the running margins of the affected text to be moved
          in. The recommended indentation change is the width of four
          characters, but this may differ among implementations. The
          "paraindent" command requires a parameter that is specified
          by using the "param" command. The parameter data is a
          comma-separated list of one or more of the following:

          Left
                 causes the running left margin to be moved to the
                 right.

          Right
                 causes the running right margin to be moved to the
                 left.

          In
                 causes the first line of the affected paragraph to be
                 indented in addition to the running margin.
                 The remaining lines remain flush to the running
                 margin.

          Out
                 causes all lines except for the first line of the
                 affected paragraph to be indented in addition to the
                 running margin. The first line remains flush to the
                 running margin.

   Nofill
          causes the affected text to be displayed without filling.
          That is, the text is displayed without using the rules for
          replacing CRLF pairs with spaces or removing consecutive
          sequences of CRLF pairs. However, the current state of the
          margins and justification is honored; any indentation or
          justification commands are still applied to the text within
          the scope of the "nofill".

   The "center", "flushleft", "flushright", and "flushboth" commands
   are mutually exclusive, and, when nested, the inner command takes
   precedence.

   The "nofill" command is mutually exclusive with the "in" and "out"
   parameters of the "paraindent" command; when they occur in the same
   scope, their behavior is undefined.

   The parameter data for the "paraindent" command may contain multiple
   occurrences of the same parameter (i.e. "left", "right", "in", or
   "out"). Each occurrence causes the text to be further indented in
   the manner indicated by that parameter. Nested "paraindent" commands
   cause the affected text to be further indented according to the
   parameters. Note that the "in" and "out" parameters for "paraindent"
   are mutually exclusive; when they appear together or when nested
   "paraindent" commands contain both of them, their behavior is
   undefined.

   For purposes of the "in" and "out" parameters, a paragraph is
   defined as text that is delimited by line breaks after applying the
   rules for replacing CRLF pairs with spaces or removing consecutive
   sequences of CRLF pairs. For example, within the scope of an "out",
   the line following each CRLF is made flush with the running margin,
   and subsequent lines are indented.
   Within the scope of an "in", the first line following each CRLF is
   indented, and subsequent lines remain flush to the running margin.

   Whether or not text is justified by default (that is, whether the
   default environment is "flushleft", "flushright", or "flushboth") is
   unspecified, and depends on the preferences of the user, the
   capabilities of the local software and hardware, and the nature of
   the character set in use. On systems where full justification is
   considered undesirable, the "flushboth" environment may be identical
   to the default environment. Note that full justification should
   never be performed inside of "center", "flushleft", "flushright", or
   "nofill" environments. Note also that for some non-ASCII character
   sets, full justification may be fundamentally inappropriate.

   Note that [RFC-1563] defined two additional indentation commands,
   "Indent" and "IndentRight". These commands did not force a line
   break, and therefore their behavior was unpredictable since they
   depended on the margins and character sizes that a particular
   implementation used. Therefore, their use is deprecated and they
   should be ignored just as other unrecognized commands.

Markup Commands

   Commands in this section, unlike the other text/enriched commands,
   are declarative markup commands. Text/enriched is not intended as a
   full markup language, but instead as a simple way to represent
   common formatting commands. Therefore, markup commands are purposely
   kept to a minimum. It is only because each was deemed so prevalent
   or necessary in an e-mail environment that these particular commands
   have been included at all.

   Excerpt
          causes the affected text to be interpreted as a textual
          excerpt from another source, probably a message being
          responded to.
          Typically this will be displayed using indentation and an
          alternate font, or by indenting lines and preceding them with
          "> ", but such decisions are up to the implementation. Note
          that, as with the justification commands, the "excerpt"
          command implicitly begins and ends with a line break if one
          is not already there. Nested "excerpt" commands are
          acceptable and should be interpreted as meaning that the
          excerpted text was excerpted from yet another source. Again,
          this can be displayed using additional indentation, different
          colors, etc.

          Optionally, the "excerpt" command can take a parameter by
          using the "param" command. The format of the data is
          unspecified, but it is intended to uniquely identify the text
          from which the excerpt is taken. With this information, an
          implementation should be able to uniquely identify the source
          of any particular excerpt, especially if two or more excerpts
          in the message are from the same source, and display it in
          some way that makes this apparent to the user.

   Lang
          causes the affected text to be interpreted as belonging to a
          particular language. This is most useful when two different
          languages use the same character set, but may require a
          different font or formatting depending on the language. For
          instance, Chinese and Japanese share similar character
          glyphs, and in some character sets like UNICODE share common
          code points, but it is considered very important that
          different fonts be used for the two languages, especially if
          they appear together, so that meaning is not lost. Also,
          language information can be used to allow for fancier text
          handling, like spell checking or hyphenation.

          The "lang" command requires a parameter using the "param"
          command. The parameter data can be any of the language tags
          specified in [RFC-1766], "Tags for the Identification of
          Languages".
          These tags are the two-letter language codes taken from
          [ISO-639] or can be other language codes that are registered
          according to the instructions in the Language Tags RFC.
          Consult that memo for further information.

Balancing and Nesting of Formatting Commands

   Pairs of formatting commands must be properly balanced and nested.
   Thus, a proper way to describe text in bold italics is:

      <bold><italic>the-text</italic></bold>

   or, alternately,

      <italic><bold>the-text</bold></italic>

   but, in particular, the following is illegal text/enriched:

      <bold><italic>the-text</bold></italic>

   The nesting requirement for formatting commands imposes a slightly
   higher burden upon the composers of text/enriched bodies, but
   potentially simplifies text/enriched displayers by allowing them to
   be stack-based. The main goal of text/enriched is to be simple
   enough to make multifont, formatted email widely readable, so that
   those with the capability of sending it will be able to do so with
   confidence. Thus slightly increased complexity in the composing
   software was deemed a reasonable tradeoff for simplified reading
   software. Nonetheless, implementors of text/enriched readers are
   encouraged to follow the general Internet guideline of being
   conservative in what you send and liberal in what you accept. Those
   implementations that can do so are encouraged to deal reasonably
   with improperly nested text/enriched data.

Unrecognized formatting commands

   Implementations must regard any unrecognized formatting command as a
   "no-op" command, that is, as a command having no effect, thus
   facilitating future extensions to "text/enriched". Private
   extensions may be defined using formatting commands that begin with
   "X-", by analogy to Internet mail header field names.

   In order to formally define extended commands, a new Internet
   document should be published.

White Space in Text/enriched Data

   No special behavior is required for the SPACE or TAB (HT) character.
   It is recommended, however, that, at least when fixed-width fonts
   are in use, the common semantics of the TAB (HT) character should be
   observed, namely that it moves to the next column position that is a
   multiple of 8. (In other words, if a TAB (HT) occurs in column n,
   where the leftmost column is column 0, then that TAB (HT) should be
   replaced by 8-(n mod 8) SPACE characters.) It should also be noted
   that some mail gateways are notorious for losing (or, less commonly,
   adding) white space at the end of lines, so reliance on SPACE or TAB
   characters at the end of a line is not recommended.

Initial State of a text/enriched interpreter

   Text/enriched is assumed to begin with filled text in a
   variable-width font in a normal typeface and a size that is average
   for the current display and user. The left and right margins are
   assumed to be maximal, that is, at the leftmost and rightmost
   acceptable positions.

Non-ASCII character sets

   One of the great benefits of MIME is the ability to use different
   varieties of non-ASCII text in messages. To use non-ASCII text in a
   message, normally a charset parameter is specified in the
   Content-type line that indicates the character set being used. For
   purposes of this document, any legal MIME charset parameter can be
   used with the text/enriched Content-type. However, there are two
   difficulties that arise with regard to the text/enriched
   Content-type when non-ASCII text is desired. The first problem
   involves difficulties that occur when the user wishes to create text
   which would normally require multiple non-ASCII character sets in
   the same text/enriched message. The second problem is an ambiguity
   that arises because of the text/enriched use of the "<" character in
   formatting commands.

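   The TAB (HT) recommendation under "White Space in Text/enriched
   Data" reduces to simple column arithmetic. The following C fragment
   is an illustrative sketch only; the function name expand_tabs is
   hypothetical, and it assumes a single line of text (no CRLF) in a
   single-byte character set, where each byte occupies one column.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies one line `in` (no CR or LF) to `out`,
   replacing each TAB (HT) that occurs in column n, where the leftmost
   column is column 0, with 8 - (n mod 8) SPACE characters. Assumes one
   byte per column. `out` must hold at least 8 * strlen(in) + 1 bytes.
   Returns the number of columns written. */
static size_t expand_tabs(const char *in, char *out)
{
    size_t col = 0;   /* next output column, also the output index */

    assert(in != NULL && out != NULL);
    for (; *in != '\0'; in++) {
        if (*in == '\t') {
            size_t spaces = 8 - (col % 8);   /* to next multiple of 8 */
            memset(out + col, ' ', spaces);
            col += spaces;
        } else {
            out[col++] = *in;
        }
    }
    out[col] = '\0';
    return col;
}
```

   For example, in "a<TAB>b" the TAB occurs in column 1, so it is
   replaced by 8-(1 mod 8) = 7 SPACE characters and "b" lands in
   column 8.
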
Using multiple non-ASCII character sets

   Normally, if a user wishes to produce text which contains characters
   from entirely different character sets within the same MIME message
   (for example, using Russian Cyrillic characters from ISO 8859-5 and
   Hebrew characters from ISO 8859-8), a multipart message is used.
   Every time a new character set is desired, a new MIME body part is
   started with different character sets specified in the charset
   parameter of the Content-type line. However, using multiple
   character sets this way in text/enriched messages introduces
   problems. Since a change in the charset parameter requires a new
   part, text/enriched formatting commands used in the first part would
   not be able to apply to text that occurs in subsequent parts. It is
   not possible for text/enriched formatting commands to apply across
   MIME body part boundaries.

   [RFC-1341] attempted to get around this problem in the now obsolete
   text/richtext format by introducing different character set
   formatting commands like "iso-8859-5" and "us-ascii". But this, or
   even a more general solution along the same lines, is still
   undesirable: it is common for a MIME application to decide, for
   example, what character font resources or character lookup tables it
   will require based on the information provided by the charset
   parameter of the Content-type line, before it even begins to
   interpret or display the data in that body part. By allowing the
   text/enriched interpreter to subsequently change the character set,
   perhaps to one completely different from the charset specified in
   the Content-type line (with potentially much different resource
   requirements), too much burden would be placed on the text/enriched
   interpreter itself.

   Therefore, if multiple types of non-ASCII characters are desired in
   a text/enriched document, one of the following two methods must be
   used:

   1.  For cases where the different types of non-ASCII text can be
       limited to their own paragraphs with distinct formatting, a
       multipart message can be used with each part having a
       Content-Type of text/enriched and a different charset
       parameter. The one caveat to using this method is that each new
       part must start in the initial state for a text/enriched
       document. That means that all of the text/enriched commands in
       the preceding part must be properly balanced with ending
       commands before the next text/enriched part begins. Also, each
       text/enriched part must begin a new paragraph.

   2.  If different types of non-ASCII text are to appear in the same
       line or paragraph, or if text/enriched formatting (e.g.
       margins, typeface, justification) is required across several
       different types of non-ASCII text, a single text/enriched body
       part should be used with a character set specified that
       contains all of the required characters. For example, a charset
       parameter of "UNICODE-1-1-UTF-7" as specified in [RFC-1642]
       could be used for such purposes. Not only does UNICODE contain
       all of the characters that can be represented in all of the
       other registered ISO 8859 MIME character sets, but UTF-7 is
       fully compatible with other aspects of the text/enriched
       standard, including the use of the "<" character referred to
       below. Any other character sets that are specified for use in
       MIME which contain different types of non-ASCII text can also
       be used in these instances.

Use of the "<" character in formatting commands

   If the character set specified by the charset parameter on the
   Content-type line is anything other than "US-ASCII", this means
   that the text being described by text/enriched formatting commands
   is in a non-ASCII character set. However, the commands themselves
   are still the same ASCII commands that are defined in this document.
   This creates an ambiguity only with reference to the "<" character,
   the octet with numeric value 60. In single-byte character sets, such
   as the ISO-8859 family, this is not a problem; the octet 60 can be
   quoted by including it twice, just as for ASCII. The problem is more
   complicated, however, in the case of multi-byte character sets,
   where the octet 60 might appear at any point in the byte sequence
   for any of several characters other than "<".

   In practice, most multi-byte character sets address this problem
   internally. For example, the UNICODE character sets can use the
   UTF-7 encoding, which preserves all of the important ASCII
   characters in their single-byte form. The ISO-2022 family of
   character sets can use escape sequences to switch back into ASCII
   at any point. Therefore it is specified that, before text/enriched
   formatting commands, the prevailing character set should be
   "switched back" into ASCII, and that only those characters which
   would be interpreted as "<" in plain text should be interpreted as
   token delimiters in text/enriched.

   The question of what to do for hypothetical future character sets
   that do not subsume ASCII is not addressed in this memo.

Minimal text/enriched conformance

   A minimal text/enriched implementation is one that converts "<<" to
   "<", removes everything between a <param> command and the next
   balancing </param> command, removes all other formatting commands
   (all text enclosed in angle brackets), and, outside of <nofill>
   environments, converts any series of n CRLFs to n-1 CRLFs, and
   converts any lone CRLF pairs to SPACE.

Notes for Implementors

   It is recognized that implementors of future mail systems will want
   rich text functionality far beyond that currently defined for
   text/enriched.
   The intent of text/enriched is to provide a common format for
   expressing that functionality in a form in which much of it, at
   least, will be understood by interoperating software. In
   particular, software with a richer notion of formatted text than
   text/enriched can still use text/enriched as its basic
   representation, but can extend it with new formatting commands and
   by hiding information specific to that software system in
   text/enriched <param> constructs. As such systems evolve, it is
   expected that the definition of text/enriched will be further
   refined by future published specifications, but text/enriched as
   defined here provides a platform on which evolutionary refinements
   can be based.

   One common way in which sophisticated mail programs are expected to
   generate text/enriched data is as part of a multipart/alternative
   construct. For example, a mail agent that can generate enriched mail
   in ODA format can generate that mail in a more widely interoperable
   form by generating both text/enriched and ODA versions of the same
   data, e.g.:

      Content-type: multipart/alternative; boundary=foo

      --foo
      Content-type: text/enriched

      [text/enriched version of data]
      --foo
      Content-type: application/oda

      [ODA version of data]
      --foo--

   If such a message is read using a MIME-conformant mail reader that
   understands ODA, the ODA version will be displayed; otherwise, the
   text/enriched version will be shown.

   In some environments, it might be impossible to combine certain
   text/enriched formatting commands, whereas in others they might be
   combined easily. For example, the combination of <bold> and
   <italic> might produce bold italics on systems that support such
   fonts, but there exist systems that can make text bold or italic,
   but not both. In such cases, the most recently issued (innermost)
   recognized formatting command should be preferred.
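
   The preference rule above can be sketched in C as follows. This is
   only an illustration. It assumes a hypothetical display that can
   show bold or italic text but not both at once, and the names
   push_font, pop_font and effective_font are invented for this
   example. The sketch keeps a stack of the recognized font commands
   that are currently open. Because the display cannot combine them,
   it renders text in the innermost one.

```c
#include <string.h>

/* Fonts that a hypothetical display can render, one at a time. */
enum font { FONT_PLAIN, FONT_BOLD, FONT_ITALIC };

#define FONT_STACK_MAX 64

static enum font font_stack[FONT_STACK_MAX];
static int font_depth = 0;    /* number of open font commands */

/* Map a lowercased command name to a font; FONT_PLAIN means the
   command is not a font command this display recognizes. */
static enum font font_for(const char *cmd)
{
    if (strcmp(cmd, "bold") == 0)
        return FONT_BOLD;
    if (strcmp(cmd, "italic") == 0)
        return FONT_ITALIC;
    return FONT_PLAIN;
}

/* Call on an opening command such as <bold>. */
void push_font(const char *cmd)
{
    enum font f = font_for(cmd);

    if (f == FONT_PLAIN)
        return;               /* not recognized here: ignore it */
    if (font_depth < FONT_STACK_MAX)
        font_stack[font_depth] = f;
    font_depth++;             /* count it even if the stack is full */
}

/* Call on the matching closing command such as </bold>. */
void pop_font(const char *cmd)
{
    if (font_for(cmd) != FONT_PLAIN && font_depth > 0)
        font_depth--;
}

/* Font for the current text: the innermost recognized font command
   that fit on the stack, or plain text if none is open. */
enum font effective_font(void)
{
    int top = font_depth < FONT_STACK_MAX ? font_depth : FONT_STACK_MAX;

    return top > 0 ? font_stack[top - 1] : FONT_PLAIN;
}
```

   A display that does support bold italics would instead combine
   every font on the stack. Commands such as <underline>, which this
   hypothetical display does not recognize, never reach the stack.
   That is why the rule refers to the innermost recognized command.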
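
   The multipart/alternative construct described earlier in this
   section works as follows. [RFC-1521] orders the parts from least to
   most faithful to the original, and a reader is expected to display
   the last part it is capable of displaying. This is why an ODA-aware
   reader and a text-only reader show different parts of the same
   message. The following is a minimal sketch of that selection step.
   It assumes the reader has already parsed each part's Content-type
   into a lowercased string. The functions choose_alternative and
   can_display are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical capability check.  This sketch assumes a reader that
   can show text/plain and text/enriched but not application/oda, and
   that content types have already been lowercased. */
static int can_display(const char *type)
{
    return strcmp(type, "text/plain") == 0 ||
           strcmp(type, "text/enriched") == 0;
}

/* Return the index of the part to show, or -1 if none is usable.
   Alternatives are ordered from least to most faithful, so the
   search runs backwards and stops at the first displayable part. */
int choose_alternative(const char *types[], size_t n)
{
    size_t i = n;

    while (i > 0) {
        i--;
        if (can_display(types[i]))
            return (int)i;
    }
    return -1;
}
```

   For the part types of the ODA example ("text/enriched", then
   "application/oda"), this reader selects index 0, the text/enriched
   part. A reader whose can_display also accepted application/oda
   would select index 1 instead.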

   One of the major goals in the design of text/enriched was to make it
   so simple that even text-only mailers would implement
   enriched-to-plain-text translators, thus increasing the likelihood
   that enriched text will become "safe" to use very widely. To
   demonstrate this simplicity, an extremely simple C program that
   converts text/enriched input into plain text output is included in
   Appendix A.

Extensions to text/enriched

   It is expected that various mail system authors will desire
   extensions to text/enriched. The simple syntax of text/enriched, and
   the specification that unrecognized formatting commands should
   simply be ignored, are intended to promote such extensions.

An Example

   Putting all this together, the following "text/enriched" body
   fragment:

      From: Nathaniel Borenstein
      To: Ned Freed
      Content-type: text/enriched

      <bold>Now</bold> is the time for <italic>all</italic>
      good men
       <smaller>(and <<women>)</smaller> to
      <ignoreme>come</ignoreme>

      to the aid of their

      <color><param>red</param>beloved</color>
      country.

      By the way,
      I think that <paraindent><param>left</param><<smaller>

      </paraindent>should REALLY be called

      <paraindent><param>left</param><<tinier></paraindent>
      and that I am always right.

      <center>-- the end</center>

   represents the following formatted text (which will, no doubt, look
   somewhat cryptic in the text-only version of this document):

      Now is the time for all good men (and <women>) to come
      to the aid of their
      beloved country.
      By the way, I think that
         <smaller>
      should REALLY be called
         <tinier>
      and that I am always right.
                                   -- the end

   where the word "beloved" would be in red on a color display.

Security Considerations

   Security issues are not discussed in this memo, as the mechanism
   raises no security issues.

Authors' Addresses

   For more information, the authors of this document may be contacted
   via Internet mail:

      Peter W. Resnick
      QUALCOMM Incorporated
      1009 North Busey Avenue
      Urbana, IL 61801-1607
      Phone: +1 217 337 1905
      FAX: +1 217 337 1905
      e-mail: presnick@qualcomm.com

      Amanda Walker
      InterCon Systems Corporation
      950 Herndon Parkway
      Herndon, VA 22070
      Phone: +1 703 709 5500
      FAX: +1 703 709 5555
      e-mail: amanda@intercon.com

Acknowledgements

References

   [RFC-1341] Borenstein, N., and N. Freed, "MIME (Multipurpose
              Internet Mail Extensions): Mechanisms for Specifying and
              Describing the Format of Internet Message Bodies",
              RFC 1341, June 1992.

   [RFC-1521] Borenstein, N., and N. Freed, "MIME (Multipurpose
              Internet Mail Extensions) Part One: Mechanisms for
              Specifying and Describing the Format of Internet Message
              Bodies", RFC 1521, September 1993.

   [RFC-1523] Borenstein, N., "The text/enriched MIME Content-type",
              RFC 1523, September 1993.

   [RFC-1563] Borenstein, N., "The text/enriched MIME Content-type",
              RFC 1563, January 1994.

   [RFC-1642] Goldsmith, D., and M. Davis, "UTF-7 - A Mail-Safe
              Transformation Format of Unicode", RFC 1642, July 1994.

   [RFC-1766] Alvestrand, H., "Tags for the Identification of
              Languages", RFC 1766, March 1995.

   [RFC-1866] Berners-Lee, T., and D. Connolly, "Hypertext Markup
              Language - 2.0", RFC 1866, November 1995.

Appendix A--A Simple enriched-to-plain Translator in C

   One of the major goals in the design of the text/enriched subtype of
   the text Content-Type is to make formatted text so simple that even
   text-only mailers would implement enriched-to-plain-text
   translators, thus increasing the likelihood that multifont text will
   become "safe" to use very widely. To demonstrate this simplicity,
   what follows is a simple C program that converts text/enriched input
   into plain text output. Note that the local newline convention (the
   single character represented by "\n") is assumed by this program,
   but that special CRLF handling might be necessary on some systems.

   #include <stdio.h>
   #include <ctype.h>
   #include <string.h>
   #include <stdlib.h>

   int main(void) {
       int c, i, paramct=0, newlinect=0, nofill=0;
       char token[62], *p;

       while ((c=getc(stdin)) != EOF) {
           if (c == '<') {
               if (newlinect == 1) putc(' ', stdout);
               newlinect = 0;
               c = getc(stdin);
               if (c == '<') {            /* "<<" is a literal "<" */
                   if (paramct <= 0) putc(c, stdout);
               } else {
                   ungetc(c, stdin);
                   /* read the command name, lowercased and truncated */
                   for (i=0, p=token;
                        (c=getc(stdin)) != EOF && c != '>'; i++) {
                       if (i < (int)sizeof(token)-1)
                           *p++ = isupper(c) ? tolower(c) : c;
                   }
                   *p = '\0';
                   if (c == EOF) break;
                   if (strcmp(token, "param") == 0)
                       paramct++;
                   else if (strcmp(token, "nofill") == 0)
                       nofill++;
                   else if (strcmp(token, "/param") == 0)
                       paramct--;
                   else if (strcmp(token, "/nofill") == 0)
                       nofill--;
               }
           } else {
               if (paramct > 0)
                   ; /* ignore params */
               else if (c == '\n' && nofill <= 0) {
                   /* n newlines become n-1; a lone one becomes a space */
                   if (++newlinect > 1) putc(c, stdout);
               } else {
                   if (newlinect == 1) putc(' ', stdout);
                   newlinect = 0;
                   putc(c, stdout);
               }
           }
       }
       /* The following line is only needed with line-buffering */
       putc('\n', stdout);
       exit(0);
   }

   It should be noted that one can do considerably better than this in
   displaying text/enriched data on a dumb terminal. In particular, one
   can replace font information such as "bold" with textual emphasis
   (like *this* or _T_H_I_S_). One can also properly handle the
   text/enriched formatting commands regarding indentation,
   justification, and others. However, the above program is all that is
   needed to present text/enriched on a dumb terminal without showing
   the user any formatting artifacts.

Appendix B--A Simple enriched-to-HTML Translator in C

   It is fully expected that other text formatting standards like HTML
   and SGML will supplant text/enriched in Internet mail. It is also
   likely that as this happens, recipients of text/enriched mail will
   wish to view such mail with an HTML viewer. To this end, the
   following is a simple example of a C program to convert
   text/enriched to HTML. Since the current version of HTML at the time
   of this document's publication is HTML 2.0, defined in [RFC-1866],
   this program converts to that standard. There are several
   text/enriched commands that have no HTML 2.0 equivalent. In those
   cases, this program simply puts those commands into processing
   instructions; that is, surrounded by "<?" and ">".
   As in Appendix A, the local newline convention (the single character
   represented by "\n") is assumed by this program, but special CRLF
   handling might be necessary on some systems.

   #include <stdio.h>
   #include <ctype.h>
   #include <string.h>
   #include <stdlib.h>

   int main(void) {
       int c, i, paramct=0, nofill=0;
       char token[62], *p;

       while((c=getc(stdin)) != EOF) {
           if(c == '<') {
               c = getc(stdin);
               if(c == '<') {
                   fputs("&lt;", stdout);     /* "<<" is a literal "<" */
               } else {
                   ungetc(c, stdin);
                   /* read the command name, lowercased and truncated */
                   for (i=0, p=token;
                        (c=getc(stdin)) != EOF && c != '>'; i++) {
                       if (i < (int)sizeof(token)-1)
                           *p++ = isupper(c) ? tolower(c) : c;
                   }
                   *p = '\0';
                   if(c == EOF) break;
                   if(strcmp(token, "/param") == 0) {
                       paramct--;
                       putc('>', stdout);      /* close "<?param " */
                   } else if(paramct > 0) {
                       /* commands inside a parameter are plain text */
                       fputs("&lt;", stdout);
                       fputs(token, stdout);
                       fputs("&gt;", stdout);
                   } else {
                       putc('<', stdout);
                       if(strcmp(token, "nofill") == 0) {
                           nofill++;
                           fputs("pre", stdout);
                       } else if(strcmp(token, "/nofill") == 0) {
                           nofill--;
                           fputs("/pre", stdout);
                       } else if(strcmp(token, "bold") == 0) {
                           fputs("b", stdout);
                       } else if(strcmp(token, "/bold") == 0) {
                           fputs("/b", stdout);
                       } else if(strcmp(token, "italic") == 0) {
                           fputs("i", stdout);
                       } else if(strcmp(token, "/italic") == 0) {
                           fputs("/i", stdout);
                       } else if(strcmp(token, "fixed") == 0) {
                           fputs("tt", stdout);
                       } else if(strcmp(token, "/fixed") == 0) {
                           fputs("/tt", stdout);
                       } else if(strcmp(token, "excerpt") == 0) {
                           fputs("blockquote", stdout);
                       } else if(strcmp(token, "/excerpt") == 0) {
                           fputs("/blockquote", stdout);
                       } else {
                           /* no HTML 2.0 equivalent: emit as a
                              processing instruction */
                           putc('?', stdout);
                           fputs(token, stdout);
                           if(strcmp(token, "param") == 0) {
                               paramct++;
                               putc(' ', stdout);
                               continue;   /* ">" comes at </param> */
                           }
                       }
                       putc('>', stdout);
                   }
               }
           } else if(c == '>') {
               fputs("&gt;", stdout);
           } else if(c == '&') {
               fputs("&amp;", stdout);     /* "&" is special in HTML */
           } else {
               if(c == '\n' && nofill <= 0 && paramct <= 0) {
                   /* n newlines become n-1 <br> tags plus a newline */
                   while((i=getc(stdin)) == '\n') fputs("<br>", stdout);
                   ungetc(i, stdin);
               }
               putc(c, stdout);
           }
       }
       /* The following line is only needed with line-buffering */
       putc('\n', stdout);
       exit(0);
   }
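
   Both translators note that special CRLF handling might be necessary
   on some systems. The following is one minimal way to provide it. It
   assumes that the whole body is already in memory in canonical MIME
   form (CRLF line breaks) and that the local newline convention is a
   single "\n". The buffer is rewritten before it is handed to either
   translator. The function name crlf_to_local is hypothetical.

```c
#include <stddef.h>

/* Rewrite canonical CRLF line breaks as the local "\n" convention,
   in place.  A CR not immediately followed by LF is left unchanged.
   Returns the new length of the data. */
size_t crlf_to_local(char *buf, size_t len)
{
    size_t in = 0, out = 0;

    while (in < len) {
        if (buf[in] == '\r' && in + 1 < len && buf[in + 1] == '\n') {
            buf[out++] = '\n';    /* one CRLF pair becomes one "\n" */
            in += 2;
        } else {
            buf[out++] = buf[in++];
        }
    }
    return out;
}
```

   A streaming version that reads with getc(), as both translators do,
   would also need to hold back a trailing CR. It could not emit that
   CR until the next octet shows whether it begins a CRLF pair.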
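
   The section on use of the "<" character requires that only
   characters that would be read as "<" in plain text act as command
   delimiters. With a multi-byte character set, a translator like the
   two above therefore cannot simply test each octet against "<". The
   following sketch is an illustration only. It applies that rule to
   ISO-2022-JP, the member of the ISO-2022 family that RFC 1468 defines
   for Japanese Internet mail. The sketch follows the escape sequences
   that switch character sets and reports octet 60 as a delimiter only
   while ASCII is in effect. The function first_delimiter is
   hypothetical. The sketch assumes well-formed input and ignores other
   ISO-2022 variants.

```c
#include <stddef.h>

#define ISO2022_ESC 0x1B

/* Return the offset of the first octet that should be treated as a
   text/enriched "<" delimiter in ISO-2022-JP data, or -1 if none.
   Assumes well-formed input as in RFC 1468: the data starts in ASCII
   and the only escape sequences are ESC ( B, ESC ( J, ESC $ @ and
   ESC $ B.  Only ASCII mode (ESC ( B) yields delimiters. */
long first_delimiter(const unsigned char *buf, size_t len)
{
    enum { ASCII, ONE_BYTE, TWO_BYTE } mode = ASCII;
    size_t i = 0;

    while (i < len) {
        if (buf[i] == ISO2022_ESC && i + 2 < len) {
            if (buf[i + 1] == '(')        /* single-octet set */
                mode = (buf[i + 2] == 'B') ? ASCII : ONE_BYTE;
            else                          /* ESC $ @ or ESC $ B */
                mode = TWO_BYTE;
            i += 3;
        } else if (mode == TWO_BYTE) {
            i += 2;  /* neither octet of a two-octet character counts */
        } else {
            if (mode == ASCII && buf[i] == '<')
                return (long)i;
            i++;
        }
    }
    return -1;
}
```

   Consider the octets ESC $ B, 0x3C 0x21, ESC ( B, "<".
   first_delimiter skips the 0x3C that belongs to the two-octet
   character and returns 8, the offset of the final "<". Treating
   JIS-Roman (ESC ( J) as non-delimiting is a conservative choice in
   this sketch. It matches the requirement to switch back into ASCII
   before formatting commands.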