idnits 2.17.1

draft-resnick-text-enriched-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in
     this document. Expected boilerplate is as follows today (2024-03-28)
     according to https://trustee.ietf.org/license-info :

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.a:

        This Internet-Draft is submitted in full conformance with the
        provisions of BCP 78 and BCP 79.

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2:

        Copyright (c) 2024 IETF Trust and the persons identified as the
        document authors. All rights reserved.

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3:

        This document is subject to BCP 78 and the IETF Trust's Legal
        Provisions Relating to IETF Documents
        (https://trustee.ietf.org/license-info) in effect on the date of
        publication of this document. Please review these documents
        carefully, as they describe your rights and restrictions with respect
        to this document. Code Components extracted from this document must
        include Simplified BSD License text as described in Section 4.e of
        the Trust Legal Provisions and are provided without warranty as
        described in the Simplified BSD License.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** Missing expiration date. The document expiration date should appear on
     the first and last page.

  ** The document seems to lack a 1id_guidelines paragraph about
     Internet-Drafts being working documents.

  ** The document seems to lack a 1id_guidelines paragraph about 6 months
     document validity -- however, there's a paragraph with a matching
     beginning. Boilerplate error?

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     current Internet-Drafts.

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     Shadow Directories.

  ** The document is more than 15 pages and seems to lack a Table of Contents.

  == The page length should not exceed 58 lines per page, but there was 1
     longer page, the longest (page 1) being 916 lines

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an Introduction section.

  ** The document seems to lack an IANA Considerations section. (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack separate sections for Informative/Normative
     References. All references will be assumed normative when checking for
     downward references.

  ** The abstract seems to contain references ([ISO-639], [RFC-1766],
     [RFC-1866], [RFC-1563], [RFC-1521], [RFC-1523]), which it shouldn't.
     Please replace those with straight textual mentions of the documents in
     question.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008. If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment. If not, you may need to add the pre-RFC5378 disclaimer.
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (November 1995) is 10361 days in the past. Is this
     intentional?

  -- Found something which looks like a code comment -- if you have code
     sections in the document, please surround them with '<CODE BEGINS>' and
     '<CODE ENDS>' lines.

  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  == Missing Reference: 'ISO-639' is mentioned on line 420, but not defined

  -- Looks like a reference, but probably isn't: '62' on line 837

  == Unused Reference: 'RFC-1341' is defined on line 732, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC-1642' is defined on line 736, but no explicit
     reference was found in the text

  ** Obsolete normative reference: RFC 1341 (Obsoleted by RFC 1521)

  ** Obsolete normative reference: RFC 1521 (Obsoleted by RFC 2045, RFC 2046,
     RFC 2047, RFC 2048, RFC 2049)

  ** Obsolete normative reference: RFC 1523 (Obsoleted by RFC 1563, RFC 1896)

  ** Obsolete normative reference: RFC 1563 (Obsoleted by RFC 1896)

  ** Obsolete normative reference: RFC 1642 (Obsoleted by RFC 2152)

  ** Obsolete normative reference: RFC 1766 (Obsoleted by RFC 3066, RFC 3282)

  ** Obsolete normative reference: RFC 1866 (Obsoleted by RFC 2854)

     Summary: 18 errors (**), 0 flaws (~~), 4 warnings (==), 4 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1    Network Working Group                                         P. Resnick
2    INTERNET-DRAFT                                                 A. Walker
3    To-obsolete RFCs: 1523, 1563                               November 1995
4    Category: Informational

6                      The text/enriched MIME Content-type

8    Status of this Memo

10   This document is an Internet-Draft. Internet-Drafts are working
11   documents of the Internet Engineering Task Force (IETF), its
12   areas, and its working groups. Note that other groups may also
13   distribute working documents as Internet-Drafts.

15   Internet-Drafts are draft documents valid for a maximum of six
16   months and may be updated, replaced, or obsoleted by other
17   documents at any time.
     It is inappropriate to use Internet-
18   Drafts as reference material or to cite them other than as
19   ``work in progress.''

21   To learn the current status of any Internet-Draft, please check
22   the ``1id-abstracts.txt'' listing contained in the Internet-
23   Drafts Shadow Directories on ftp.is.co.za (Africa),
24   nic.nordu.net (Europe), munnari.oz.au (Pacific Rim),
25   ds.internic.net (US East Coast), or ftp.isi.edu (US West Coast).

27   Abstract

29   MIME [RFC-1521] defines a format and general framework for the
30   representation of a wide variety of data types in Internet mail. This
31   document defines one particular type of MIME data, the text/enriched
32   MIME type. The text/enriched MIME type is intended to facilitate the
33   wider interoperation of simple enriched text across a wide variety of
34   hardware and software platforms. This document is only a minor revision
35   to the text/enriched MIME type that was first described in [RFC-1523]
36   and [RFC-1563], and is only intended to be used in the short term until
37   other MIME types for text formatting in Internet mail are developed and
38   deployed.

40   The text/enriched MIME type

42   In order to promote the wider interoperability of simple formatted
43   text, this document defines an extremely simple subtype of the MIME
44   content-type "text", the "text/enriched" subtype. The content-type line
45   for this type may have one optional parameter, the "charset" parameter,
46   with the same values permitted for the "text/plain" MIME content-type.

48   The text/enriched subtype was designed to meet the following criteria:

50   1. The syntax must be extremely simple to parse, so that even
51      teletype-oriented mail systems can easily strip away the
52      formatting information and leave only the readable text.

54   2. The syntax must be extensible to allow for new formatting commands
55      that are deemed essential for some application.

57   3.
        If the character set in use is ASCII or an 8-bit ASCII superset,
58      then the raw form of the data must be readable enough to be
59      largely unobjectionable in the event that it is displayed on the
60      screen of the user of a non-MIME-conformant mail reader.

62   4. The capabilities must be extremely limited, to ensure that it can
63      represent no more than is likely to be representable by the user's
64      primary word processor. While this limits what can be sent, it
65      increases the likelihood that what is sent can be properly
66      displayed.

68   There are other text formatting standards which meet some of these
69   criteria. In particular, HTML and SGML have come into widespread use
70   on the Internet. However, there are two important reasons that this
71   document further promotes the use of text/enriched in Internet mail
72   over other such standards:

74   1. Most MIME-aware Internet mail applications are already able to
75      either properly format text/enriched mail or, at the very least,
76      are able to strip out the formatting commands and display the
77      readable text. The same is not true for HTML or SGML.

79   2. The current RFC on HTML [RFC-1866] and Internet Drafts on SGML
80      have many features which are not necessary for Internet mail, and
81      are missing a few capabilities that text/enriched already has.

83   For these reasons, this document is promoting the use of text/enriched
84   until other Internet standards come into more widespread use. For those
85   who will want to use HTML, Appendix B of this document contains a very
86   simple C program that converts text/enriched to HTML 2.0 described in
87   [RFC-1866].

89   Syntax

91   The syntax of "text/enriched" is very simple. It represents text in a
92   single character set--US-ASCII by default, although a different
93   character set can be specified by the use of the "charset" parameter.
94   (The semantics of text/enriched in non-ASCII character sets are
95   discussed later in this document.)
     All characters represent themselves,
96   with the exception of the "<" character (ASCII 60), which is used to
97   mark the beginning of a formatting command. A literal less-than sign
98   ("<") can be represented by a sequence of two such characters, "<<".

100  Formatting instructions consist of formatting commands surrounded by
101  angle brackets ("<>", ASCII 60 and 62). Each formatting command may be
102  no more than 60 characters in length, all in US-ASCII, restricted to
103  the alphanumeric and hyphen ("-") characters. Formatting commands may
104  be preceded by a solidus ("/", ASCII 47), making them negations, and
105  such negations must always exist to balance the initial opening
106  commands. Thus, if the formatting command "<bold>" appears at some
107  point, there must later be a "</bold>" to balance it. (NOTE: The 60
108  character limit on formatting commands does NOT include the "<", ">",
109  or "/" characters that might be attached to such commands.)

111  Line break rules

113  Line breaks (CRLF pairs in standard network representation) are handled
114  specially. In particular, isolated CRLF pairs are translated into a
115  single SPACE character. Sequences of N consecutive CRLF pairs, however,
116  are translated into N-1 actual line breaks. This permits long lines of
117  data to be represented in a natural looking manner despite the
118  frequency of line-wrapping in Internet mailers. When preparing the data
119  for mail transport, isolated line breaks should be inserted wherever
120  necessary to keep each line shorter than 80 characters. When preparing
121  such data for presentation to the user, isolated line breaks should be
122  replaced by a single SPACE character, and N consecutive CRLF pairs
123  should be presented to the user as N-1 line breaks.

125  Thus text/enriched data that looks like this:

127       This is
128       a single
129       line

131       This is the
132       next line.

134       This is the
135       next paragraph.

137  should be displayed by a text/enriched interpreter as follows:

139       This is a single line
140       This is the next line.

142       This is the next paragraph.

144  The formatting commands, not all of which will be implemented by all
145  implementations, are described in the following sections.

147  Formatting Commands

149  The text/enriched formatting commands all begin with <commandname> and
150  end with </commandname>, affecting the formatting of the text between
151  those two tokens. The commands are described here, grouped according to
152  type.

154  Parameter Command

156  Some of the formatting commands may require one or more associated
157  parameters. The "param" command is a special formatting command used to
158  include these parameters.

160  Param
161       Marks the affected text as command parameters, to be
162       interpreted or ignored by the text/enriched interpreter, but
163       not to be shown to the reader. The "param" command always
164       immediately follows some other formatting command, and the
165       parameter data indicates some additional information about
166       the formatting that is to be done. The syntax of the
167       parameter data (whatever appears between the initial
168       "<param>" and the terminating "</param>") is defined for each
169       command that uses it. However, it is always required that the
170       format of such data must not contain nested "param" commands,
171       and either must not use the "<" character or must use it in a
172       way that is compatible with text/enriched parsing. That is,
173       the end of the parameter data should be recognizable with
174       either of two algorithms: simply searching for the first
175       occurrence of "</param>" or parsing until a balanced
176       "</param>" command is found. In either case, however, the
177       parameter data should not be shown to the human reader.
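
As an illustration only, the first of the two algorithms named above
(searching for the first occurrence of "</param>") might be sketched in
Python as follows. The function name read_param and its return convention
are hypothetical, and the sketch assumes that command names are written in
lowercase.

```python
def read_param(data, start):
    """Read one <param>...</param> run beginning at index `start`.

    Returns (param_text, index_after_param), or (None, start) when no
    "<param>" command begins at `start`. Hypothetical helper, not taken
    from the draft.
    """
    open_tag, close_tag = "<param>", "</param>"
    if not data.startswith(open_tag, start):
        return None, start
    body_start = start + len(open_tag)
    # First algorithm: look for the first occurrence of "</param>".
    end = data.find(close_tag, body_start)
    if end == -1:
        # Unterminated parameter: treat the remainder as parameter data
        # so that none of it is shown to the reader.
        return data[body_start:], len(data)
    return data[body_start:end], end + len(close_tag)


data = "<color><param>red</param>Hello</color>"
param, rest = read_param(data, len("<color>"))
print(param, data[rest:])
```

The parameter text is returned to the caller rather than written to the
output, which matches the requirement that parameter data never be shown
to the human reader.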

179  Font-Alteration Commands

181  The following formatting commands are intended to alter the font in
182  which text is displayed, but not to alter the indentation or
183  justification state of the text:

185  Bold
186       causes the affected text to be in a bold font. Nested bold
187       commands have the same effect as a single bold command.

189  Italic
190       causes the affected text to be in an italic font. Nested
191       italic commands have the same effect as a single italic
192       command.

194  Underline
195       causes the affected text to be underlined. Nested underline
196       commands have the same effect as a single underline command.

198  Fixed
199       causes the affected text to be in a fixed width font. Nested
200       fixed commands have the same effect as a single fixed
201       command.

203  FontFamily
204       causes the affected text to be displayed in a specified
205       typeface. The "fontfamily" command requires a parameter that
206       is specified by using the "param" command. The parameter data
207       is a case-insensitive string containing the name of a font
208       family. Any currently available font family name (e.g. Times,
209       Palatino, Courier, etc.) may be used. This includes font
210       families defined by commercial type foundries such as Adobe,
211       BitStream, or any other such foundry. Note that
212       implementations should only use the general font family name,
213       not the specific font name (e.g. use "Times", not
214       "TimesRoman" nor "TimesBoldItalic"). Also note that the
215       "fontfamily" command is advisory only; it should not be
216       expected that other implementations will honor the typeface
217       information in this command since the font capabilities of
218       systems vary drastically.

220  Color
221       causes the affected text to be displayed in a specified
222       color. The "color" command requires a parameter that is
223       specified by using the "param" command.
       The parameter data
224       can be one of the following:

226            red
227            blue
228            green
229            yellow
230            cyan
231            magenta
232            black
233            white

235       or an RGB color value in the form:

237            ####,####,####

239       where '#' is a hexadecimal digit '0' through '9', 'A' through
240       'F', or 'a' through 'f'. The three 4-digit hexadecimal values
241       are the RGB values for red, green, and blue respectively,
242       where each component is expressed as an unsigned value
243       between 0 (0000) and 65535 (FFFF). The default color for the
244       message is unspecified, though black is a common choice in
245       many environments. Text/enriched implementations should not
246       produce "color" commands that are nested, but in the event
247       that nested "color" commands are encountered, the inner
248       "color" command takes precedence.

250  Smaller
251       causes the affected text to be in a smaller font. It is
252       recommended that the font size be changed by two points, but
253       other amounts may be more appropriate in some environments.
254       Nested smaller commands produce ever smaller fonts, to the
255       limits of the implementation's capacity to reasonably display
256       them, after which further smaller commands have no
257       incremental effect.

259  Bigger
260       causes the affected text to be in a bigger font. It is
261       recommended that the font size be changed by two points, but
262       other amounts may be more appropriate in some environments.
263       Nested bigger commands produce ever bigger fonts, to the
264       limits of the implementation's capacity to reasonably display
265       them, after which further bigger commands have no incremental
266       effect.

268  While the "bigger" and "smaller" operators are effectively inverses, it
269  is not recommended, for example, that "<smaller>" be used to end the
270  effect of "<bigger>". This is properly done with "</bigger>".

272  Since the capabilities of implementations will vary, it is to be
273  expected that some implementations will not be able to act on some of
274  the font-alteration commands.
     However, an implementation should still
275  display the text to the user in a reasonable fashion. In particular,
276  the lack of capability to display a particular font family, color, or
277  other text attribute does not mean that an implementation should fail
278  to display text.

280  Fill/Justification/Indentation Commands

282  Initially, text/enriched text is intended to be displayed fully filled
283  (that is, using the rules specified for replacing CRLF pairs with
284  spaces or removing them as appropriate) with appropriate kerning and
285  letter-tracking, and using the maximum available margins as suits the
286  capabilities of the receiving user agent software.

288  The following commands alter that state. Each of these commands forces a
289  line break before and after the formatting environment if there is not
290  otherwise a line break. For example, if one of these commands occurs
291  anywhere other than the beginning of a line of text as presented, a new
292  line is begun.

294  Center
295       causes the affected text to be centered.

297  FlushLeft
298       causes the affected text to be left-justified with a ragged
299       right margin.

301  FlushRight
302       causes the affected text to be right-justified with a ragged
303       left margin.

305  FlushBoth
306       causes the affected text to be filled and padded so as to
307       create smooth left and right margins, i.e., to be fully
308       justified.

310  Nofill
311       causes the affected text to be displayed without filling or
312       justification. That is, the text is displayed without using
313       the rules for replacing CRLF pairs with spaces or removing
314       consecutive sequences of CRLF pairs and is displayed in the
315       default justification without making any adjustments for
316       flushing text to either margin.

318  ParaIndent
319       causes the running margins of the affected text to be moved
320       in. The recommended indentation change is the width of four
321       characters, but this may differ among implementations.
       The
322       "paraindent" command requires a parameter that is specified
323       by using the "param" command. The parameter data is a
324       comma-separated list of between one and four of the
325       following:

327       Left
328            causes the running left margin to be moved to the
329            right.

331       Right
332            causes the running right margin to be moved to the
333            left.

335       In
336            causes the first line of the affected text to be
337            indented in addition to the running margin. The
338            remaining lines remain flush to the running margin.

340       Out
341            causes all lines except for the first line of the
342            affected text to be indented in addition to the running
343            margin. The first line remains flush to the running
344            margin.

346  The center, flushleft, flushright, and flushboth commands are mutually
347  exclusive, and, when nested, the inner command takes precedence.

349  Nested "paraindent" commands cause the affected text to be further
350  indented according to the parameters.

352  Whether or not text is justified by default (that is, whether the
353  default environment is flushleft, flushright, or flushboth) is
354  unspecified, and depends on the preferences of the user, the
355  capabilities of the local software and hardware, and the nature of the
356  character set in use. On systems where justification is considered
357  undesirable, the flushboth environment may be identical to the default
358  environment. Note that justification should never be performed inside
359  of center, flushleft, flushright, or nofill environments. Note also
360  that for some non-ASCII character sets, full justification may be
361  fundamentally inappropriate.

363  Note that [RFC-1563] defined two additional indentation commands,
364  "Indent" and "IndentRight". These commands did not force a line break,
365  and therefore their behavior was unpredictable since they depended on
366  the margins and character sizes that a particular implementation used.
367  Therefore, their use is deprecated and they should be ignored just as
368  other unrecognized commands.

370  Markup Commands

372  Commands in this section, unlike the other text/enriched commands, are
373  declarative markup commands. Text/enriched is not intended as a full
374  markup language, but instead as a simple way to represent common
375  formatting commands. Therefore, markup commands are purposely kept to a
376  minimum. It is only because each was deemed so prevalent or necessary
377  in an e-mail environment that these particular commands have been
378  included at all.

380  Excerpt
381       causes the affected text to be interpreted as a textual
382       excerpt from another source, probably a message being
383       responded to. Typically this will be displayed using
384       indentation and an alternate font, or by indenting lines and
385       preceding them with "> ", but such decisions are up to the
386       implementation. Note that as with the justification commands,
387       the excerpt command implicitly begins and ends with a line
388       break if one is not already there. Nested "excerpt" commands
389       are acceptable and should be interpreted as meaning that the
390       excerpted text was excerpted from yet another source. Again,
391       this can be displayed using additional indentation, different
392       colors, etc.

394       Optionally, the "excerpt" command can take a parameter by
395       using the "param" command. The format of the data is
396       unspecified, but it is intended to uniquely identify the text
397       from which the excerpt is taken. With this information, an
398       implementation should be able to uniquely identify the source
399       of any particular excerpt, especially if two or more excerpts
400       in the message are from the same source, and display it in
401       some way that makes this apparent to the user.

403  Lang
404       causes the affected text to be interpreted as belonging to a
405       particular language.
       This is most useful when two different
406       languages use the same character set, but may require a
407       different font or formatting depending on the language. For
408       instance, Chinese and Japanese share similar character
409       glyphs, and in some character sets like UNICODE share common
410       code points, but it is considered very important that
411       different fonts be used for the two languages, especially if
412       they appear together, so that meaning is not lost. Also,
413       language information can be used to allow for fancier text
414       handling, like spell checking or hyphenation.

416       The "lang" command requires a parameter using the "param"
417       command. The parameter data can be any of the language tags
418       specified in [RFC-1766], "Tags for the Identification of
419       Languages". These tags are the two letter language codes
420       taken from [ISO-639] or can be other language codes that
421       are registered according to the instructions in the
422       Language Tags RFC. Consult that memo for further
423       information.

425  Balancing and Nesting of Formatting Commands

427  Pairs of formatting commands must be properly balanced and nested.
428  Thus, a proper way to describe text in bold italics is:

430       <bold><italic>the-text</italic></bold>

432  or, alternately,

434       <italic><bold>the-text</bold></italic>

436  but, in particular, the following is illegal text/enriched:

438       <bold><italic>the-text</bold></italic>

440  The nesting requirement for formatting commands imposes a slightly
441  higher burden upon the composers of text/enriched bodies, but
442  potentially simplifies text/enriched displayers by allowing them to be
443  stack-based. The main goal of text/enriched is to be simple enough to
444  make multifont, formatted email widely readable, so that those with the
445  capability of sending it will be able to do so with confidence. Thus
446  slightly increased complexity in the composing software was deemed a
447  reasonable tradeoff for simplified reading software.
     Nonetheless,
448  implementors of text/enriched readers are encouraged to follow the
449  general Internet guidelines of being conservative in what you send and
450  liberal in what you accept. Those implementations that can do so are
451  encouraged to deal reasonably with improperly nested text/enriched
452  data.

454  Unrecognized formatting commands

456  Implementations must regard any unrecognized formatting command as
457  "no-op" commands, that is, as commands having no effect, thus
458  facilitating future extensions to "text/enriched". Private extensions
459  may be defined using formatting commands that begin with "X-", by
460  analogy to Internet mail header field names.

462  In order to formally define extended commands, a new Internet document
463  should be published.

465  White Space in Text/enriched Data

467  No special behavior is required for the SPACE or TAB (HT) character. It
468  is recommended, however, that, at least when fixed-width fonts are in
469  use, the common semantics of the TAB (HT) character should be observed,
470  namely that it moves to the next column position that is a multiple of
471  8. (In other words, if a TAB (HT) occurs in column n, where the
472  leftmost column is column 0, then that TAB (HT) should be replaced by
473  8-(n mod 8) SPACE characters.) It should also be noted that some mail
474  gateways are notorious for losing (or, less commonly, adding) white
475  space at the end of lines, so reliance on SPACE or TAB characters at
476  the end of a line is not recommended.

478  Initial State of a text/enriched interpreter

480  Text/enriched is assumed to begin with filled text in a variable-width
481  font in a normal typeface and a size that is average for the current
482  display and user. The left and right margins are assumed to be maximal,
483  that is, at the leftmost and rightmost acceptable positions.
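
The TAB rule recommended in the white space section above, replacing a
TAB (HT) in column n with 8-(n mod 8) SPACE characters, can be sketched in
Python as follows. The function name expand_tabs and the tab_width
parameter are illustrative choices, not part of this specification.

```python
def expand_tabs(line, tab_width=8):
    """Replace each TAB with tab_width - (n mod tab_width) spaces,
    where n is the current column and the leftmost column is 0."""
    out = []
    column = 0
    for ch in line:
        if ch == "\t":
            pad = tab_width - (column % tab_width)
            out.append(" " * pad)
            column += pad
        else:
            out.append(ch)
            column += 1
    return "".join(out)


# A TAB in column 2 becomes 8 - (2 mod 8) = 6 spaces.
print(repr(expand_tabs("ab\tc")))
```

A TAB that falls exactly on a multiple of 8 expands to a full 8 spaces,
since 8-(n mod 8) is 8 when n mod 8 is 0.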

485  Non-ASCII character sets

487  One of the great benefits of MIME is the ability to use different
488  varieties of non-ASCII text in messages. To use non-ASCII text in a
489  message, normally a charset parameter is specified in the Content-type
490  line that indicates the character set being used. For purposes of this
491  RFC, any legal MIME charset parameter can be used with the
492  text/enriched Content-type. However, there are two difficulties that
493  arise with regard to the text/enriched Content-type when non-ASCII text
494  is desired. The first problem involves difficulties that occur when the
495  user wishes to create text which would normally require multiple
496  non-ASCII character sets in the same text/enriched message. The second
497  problem is an ambiguity that arises because of the text/enriched use of
498  the "<" character in formatting commands.

500  Using multiple non-ASCII character sets

502  Normally, if a user wishes to produce text which contains characters
503  from entirely different character sets within the same MIME message
504  (for example, using Russian Cyrillic characters from ISO 8859-5 and
505  Hebrew characters from ISO 8859-8), a multipart message is used. Every
506  time a new character set is desired, a new MIME body part is started
507  with different character sets specified in the charset parameter of the
508  Content-type line. However, using multiple character sets this way in
509  text/enriched messages introduces problems. Since a change in the
510  charset parameter requires a new part, text/enriched formatting
511  commands used in the first part would not be able to apply to text that
512  occurs in subsequent parts. It is not possible for text/enriched
513  formatting commands to apply across MIME body part boundaries.

515  RFC 1341 attempted to get around this problem in the now obsolete
516  text/richtext format by introducing different character set formatting
517  commands like "<iso-8859-5>" and "<us-ascii>".
     But this, or even a more
518  general solution along the same lines, is still undesirable: It is
519  common for a MIME application to decide, for example, what character
520  font resources or character lookup tables it will require based on the
521  information provided by the charset parameter of the Content-type line,
522  before it even begins to interpret or display the data in that body
523  part. By allowing the text/enriched interpreter to subsequently change
524  the character set, perhaps to one completely different from the charset
525  specified in the Content-type line (with potentially much different
526  resource requirements), too much burden would be placed on the
527  text/enriched interpreter itself.

529  Therefore, if multiple types of non-ASCII characters are desired in a
530  text/enriched document, one of the following two methods must be used:

532  1. For cases where the different types of non-ASCII text can be
533     limited to their own paragraphs with distinct formatting, a
534     multipart message can be used with each part having a Content-Type
535     of text/enriched and a different charset parameter. The one caveat
536     to using this method is that each new part must start in the
537     initial state for a text/enriched document. That means that all of
538     the text/enriched commands in the preceding part must be properly
539     balanced with ending commands before the next text/enriched part
540     begins. Also, each text/enriched part must begin a new paragraph.

542  2. If different types of non-ASCII text are to appear in the same
543     line or paragraph, or if text/enriched formatting (e.g. margins,
544     typeface, justification) is required across several different
545     types of non-ASCII text, a single text/enriched body part should
546     be used with a character set specified that contains all of the
547     required characters. For example, a charset parameter of
548     "UNICODE-1-1-UTF-7" as specified in RFC 1642 could be used for
549     such purposes.
        Not only does UNICODE contain all of the characters
550     that can be represented in all of the other registered ISO 8859
551     MIME character sets, but UTF-7 is fully compatible with other
552     aspects of the text/enriched standard, including the use of the
553     "<" character referred to below. Any other character sets that are
554     specified for use in MIME which contain different types of
555     non-ASCII text can also be used in these instances.

557  Use of the "<" character in formatting commands

559  If the character set specified by the charset parameter on the
560  Content-type line is anything other than "US-ASCII", this means that
561  the text being described by text/enriched formatting commands is in a
562  non-ASCII character set. However, the commands themselves are still the
563  same ASCII commands that are defined in this document. This creates an
564  ambiguity only with reference to the "<" character, the octet with
565  numeric value 60. In single byte character sets, such as the ISO-8859
566  family, this is not a problem; the octet 60 can be quoted by including
567  it twice, just as for ASCII. The problem is more complicated, however,
568  in the case of multi-byte character sets, where the octet 60 might
569  appear at any point in the byte sequence for any of several
570  characters.

572  In practice, however, most multi-byte character sets address this
573  problem internally. For example, the UNICODE character sets can use the
574  UTF-7 encoding which preserves all of the important ASCII characters in
575  their single byte form. The ISO-2022 family of character sets can use
576  certain character sequences to switch back into ASCII at any moment.
577  Therefore it is specified that, before text/enriched formatting
578  commands, the prevailing character set should be "switched back" into
579  ASCII, and that only those characters which would be interpreted as "<"
580  in plain text should be interpreted as token delimiters in
581  text/enriched.
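
One way to honor the rule above is to decode the body with its declared
charset first and only then scan the resulting characters for "<", rather
than scanning raw octets. The following Python sketch illustrates the
doubling rule for literal "<" and a simple scanner; the function names
quote_literal and tokenize are hypothetical, and the scanner makes no
attempt to validate command names.

```python
def quote_literal(text):
    """Composer side: double every "<" so it is read back as literal text."""
    return text.replace("<", "<<")


def tokenize(data):
    """Reader side: split already-decoded text into
    ("text", s) and ("command", name) tokens."""
    tokens = []
    buf = []
    i = 0
    while i < len(data):
        if data.startswith("<<", i):        # quoted literal "<"
            buf.append("<")
            i += 2
        elif data[i] == "<":                # start of a formatting command
            end = data.find(">", i)
            if end == -1:                   # unterminated: keep rest as text
                buf.append(data[i:])
                break
            if buf:
                tokens.append(("text", "".join(buf)))
                buf = []
            tokens.append(("command", data[i + 1:end]))
            i = end + 1
        else:
            buf.append(data[i])
            i += 1
    if buf:
        tokens.append(("text", "".join(buf)))
    return tokens


print(tokenize("<bold>x<<y</bold>"))
```

Because tokenize works on decoded characters, an octet with value 60 that
is merely part of a multi-byte character (as can happen between the
escape sequences of an ISO-2022 encoding) is never mistaken for a token
delimiter.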

   The question of what to do for hypothetical future character sets
   that do not subsume ASCII is not addressed in this memo.

Minimal text/enriched conformance

   A minimal text/enriched implementation is one that converts "<<" to
   "<", removes everything between a <param> command and the next
   balancing </param> command, removes all other formatting commands
   (all text enclosed in angle brackets), and, outside of <nofill>
   environments, converts any series of n CRLFs to n-1 CRLFs, and
   converts any lone CRLF pairs to SPACE.

Notes for Implementors

   It is recognized that implementors of future mail systems will want
   rich text functionality far beyond that currently defined for
   text/enriched. The intent of text/enriched is to provide a common
   format for expressing that functionality in a form in which much of
   it, at least, will be understood by interoperating software. Thus,
   in particular, software with a richer notion of formatted text than
   text/enriched can still use text/enriched as its basic
   representation, but can extend it with new formatting commands and
   by hiding information specific to that software system in
   text/enriched constructs. As such systems evolve, it is expected
   that the definition of text/enriched will be further refined by
   future published specifications, but text/enriched as defined here
   provides a platform on which evolutionary refinements can be based.

   An expected common way that sophisticated mail programs will
   generate text/enriched data is as part of a multipart/alternative
   construct. For example, a mail agent that can generate enriched mail
   in ODA format can generate that mail in a more widely interoperable
   form by generating both text/enriched and ODA versions of the same
   data, e.g.:

      Content-type: multipart/alternative; boundary=foo

      --foo
      Content-type: text/enriched

      [text/enriched version of data]
      --foo
      Content-type: application/oda

      [ODA version of data]
      --foo--

   If such a message is read using a MIME-conformant mail reader that
   understands ODA, the ODA version will be displayed; otherwise, the
   text/enriched version will be shown.

   In some environments, it might be impossible to combine certain
   text/enriched formatting commands, whereas in others they might be
   combined easily. For example, the combination of <bold> and <italic>
   might produce bold italics on systems that support such fonts, but
   there exist systems that can make text bold or italicized, but not
   both. In such cases, the most recently issued (innermost) recognized
   formatting command should be preferred.

   One of the major goals in the design of text/enriched was to make it
   so simple that even text-only mailers will implement
   enriched-to-plain-text translators, thus increasing the likelihood
   that enriched text will become "safe" to use very widely. To
   demonstrate this simplicity, an extremely simple C program that
   converts text/enriched input into plain text output is included in
   Appendix A.

Extensions to text/enriched

   It is expected that various mail system authors will desire
   extensions to text/enriched. The simple syntax of text/enriched, and
   the specification that unrecognized formatting commands should
   simply be ignored, are intended to promote such extensions.
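
   Two of the rules above lend themselves to a short illustration:
   unrecognized formatting commands are ignored, and on a system that
   cannot combine styles the innermost recognized command is preferred.
   The following is only a sketch of one possible approach. All of the
   names in it (style_stack, apply_command, effective_style) are
   hypothetical, it recognizes just two commands, and it assumes that
   commands are properly balanced.

```c
#include <string.h>

/* Hypothetical names throughout; none are defined by text/enriched. */
enum style { STYLE_PLAIN, STYLE_BOLD, STYLE_ITALIC };

#define MAX_DEPTH 32

struct style_stack {
    enum style items[MAX_DEPTH];
    int depth;
};

/* Map a command name to a style; STYLE_PLAIN means "not recognized". */
static enum style lookup(const char *name)
{
    if (strcmp(name, "bold") == 0) return STYLE_BOLD;
    if (strcmp(name, "italic") == 0) return STYLE_ITALIC;
    return STYLE_PLAIN;
}

/* Process one token without its angle brackets, e.g. "bold", "/bold". */
static void apply_command(struct style_stack *s, const char *token)
{
    int closing = (token[0] == '/');
    enum style st = lookup(closing ? token + 1 : token);

    if (st == STYLE_PLAIN)
        return;                     /* unrecognized command: ignore it */
    if (closing) {
        /* Assumes balanced input: pop only a matching innermost style. */
        if (s->depth > 0 && s->items[s->depth - 1] == st)
            s->depth--;
    } else if (s->depth < MAX_DEPTH) {
        s->items[s->depth++] = st;
    }
}

/* The style to use when styles cannot be combined: the innermost one. */
static enum style effective_style(const struct style_stack *s)
{
    return s->depth > 0 ? s->items[s->depth - 1] : STYLE_PLAIN;
}
```

   With this sketch, text nested inside <bold> and then <italic> would
   be shown in italics on a system that cannot produce bold italics,
   and would revert to bold once the </italic> command is seen. A
   system that can combine styles would instead consult the whole
   stack.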

An Example

   Putting all this together, the following "text/enriched" body
   fragment:

      From: Nathaniel Borenstein
      To: Ned Freed
      Content-type: text/enriched

      <bold>Now</bold> is the time for <italic>all</italic>
      good men
      <smaller>(and <<women>)</smaller> to
      <ignoreme>come</ignoreme>

      to the aid of their


      <color><param>red</param>beloved</color>
      country.

      By the way,
      I think that <paraindent><param>left</param><<smaller>
      </paraindent>should REALLY be called

      <paraindent><param>left</param><<tinier></paraindent>
      and that I am always right.

      -- the end

   represents the following formatted text (which will, no doubt, look
   somewhat cryptic in the text-only version of this document):

      Now is the time for all good men (and <women>) to come
      to the aid of their

      beloved country.
      By the way, I think that
           <smaller>
      should REALLY be called
           <tinier>
      and that I am always right.
      -- the end

   where the word "beloved" would be in red on a color display.

Security Considerations

   Security issues are not discussed in this memo, as the mechanism
   raises no security issues.

Authors' Addresses

   For more information, the authors of this document may be contacted
   via Internet mail:

      Peter W. Resnick
      QUALCOMM Incorporated
      1009 North Busey Avenue
      Urbana, IL 61801-1607
      Phone: +1 217 337 1905
      FAX: +1 217 337 1905
      e-mail: presnick@qualcomm.com

      Amanda Walker
      InterCon Systems Corporation
      950 Herndon Parkway
      Herndon, VA 22070
      Phone: +1 703 709 5500
      FAX: +1 703 709 5555
      e-mail: amanda@intercon.com

Acknowledgements

   (In the process of being written)

References

   (Full citations will appear in the final draft)

   [RFC-1341]
   [RFC-1521]
   [RFC-1523]
   [RFC-1563]
   [RFC-1642]
   [RFC-1766]
   [RFC-1866]

Appendix A--A Simple enriched-to-plain Translator in C

   One of the major goals in the design of the text/enriched subtype of
   the text Content-Type is to make formatted text so simple that even
   text-only mailers will implement enriched-to-plain-text translators,
   thus increasing the likelihood that multifont text will become
   "safe" to use very widely. To demonstrate this simplicity, what
   follows is a simple C program that converts text/enriched input into
   plain text output. Note that the local newline convention (the
   single character represented by "\n") is assumed by this program,
   but that special CRLF handling might be necessary on some systems.

      #include <ctype.h>
      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>

      int main(void) {
          int c, i, paramct=0, newlinect=0, nofill=0;
          char token[62], *p;

          while ((c=getc(stdin)) != EOF) {
              if (c == '<') {
                  /* a pending lone newline becomes a space */
                  if (newlinect == 1) putc(' ', stdout);
                  newlinect = 0;
                  c = getc(stdin);
                  if (c == '<') {
                      /* "<<" is a quoted "<" */
                      if (paramct <= 0) putc(c, stdout);
                  } else {
                      /* collect the command name, lower-cased */
                      ungetc(c, stdin);
                      for (i=0, p=token;
                           (c=getc(stdin)) != EOF && c != '>';
                           i++) {
                          if (i < sizeof(token)-1) {
                              *p++ = isupper(c) ? tolower(c) : c;
                          }
                      }
                      *p = '\0';
                      if (c == EOF) break;
                      if (strcmp(token, "param") == 0)
                          paramct++;
                      else if (strcmp(token, "nofill") == 0)
                          nofill++;
                      else if (strcmp(token, "/param") == 0)
                          paramct--;
                      else if (strcmp(token, "/nofill") == 0)
                          nofill--;
                      /* all other commands are simply dropped */
                  }
              } else {
                  if (paramct > 0)
                      ; /* ignore params */
                  else if (c == '\n' && nofill <= 0) {
                      /* n newlines become n-1 newlines */
                      if (++newlinect > 1) putc(c, stdout);
                  } else {
                      if (newlinect == 1) putc(' ', stdout);
                      newlinect = 0;
                      putc(c, stdout);
                  }
              }
          }
          /* The following line is only needed with line-buffering */
          putc('\n', stdout);
          exit(0);
      }

   It should be noted that one can do considerably better than this in
   displaying text/enriched data on a dumb terminal. In particular, one
   can replace font information such as "bold" with textual emphasis
   (like *this* or _T_H_I_S_). One can also properly handle the
   text/enriched formatting commands regarding indentation,
   justification, and others. However, the above program is all that is
   necessary in order to present text/enriched on a dumb terminal
   without showing the user any formatting artifacts.

Appendix B--A Simple enriched-to-HTML Translator in C

   It is fully expected that other text formatting standards like HTML
   and SGML will supplant text/enriched in Internet mail. It is also
   likely that as this happens, recipients of text/enriched mail will
   wish to view such mail with an HTML viewer. To this end, the
   following is a simple example of a C program to convert
   text/enriched to HTML. Since the current version of HTML at the time
   of this document's publication is HTML 2.0 defined in [RFC-1866],
   this program converts to that standard. There are several
   text/enriched commands that have no HTML 2.0 equivalent. In those
   cases, this program simply puts those commands into processing
   instructions; that is, surrounded by "<?" and ">".
   As in Appendix A, the local newline convention (the single character
   represented by "\n") is assumed by this program, but special CRLF
   handling might be necessary on some systems.

      #include <ctype.h>
      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>

      int main(void) {
          int c, i, paramct=0, nofill=0;
          char token[62], *p;

          while((c=getc(stdin)) != EOF) {
              if(c == '<') {
                  c = getc(stdin);
                  if(c == '<') {
                      /* "<<" is a quoted "<"; escape it for HTML */
                      fputs("&lt;", stdout);
                  } else {
                      /* collect the command name, lower-cased */
                      ungetc(c, stdin);
                      for (i=0, p=token;
                           (c=getc(stdin)) != EOF && c != '>';
                           i++) {
                          if (i < sizeof(token)-1) {
                              *p++ = isupper(c) ? tolower(c) : c;
                          }
                      }
                      *p = '\0';
                      if(c == EOF) break;
                      if(strcmp(token, "/param") == 0) {
                          /* close the "<?param ..." instruction */
                          paramct--;
                          putc('>', stdout);
                      } else if(paramct > 0) {
                          /* commands inside a param are shown as text */
                          fputs("&lt;", stdout);
                          fputs(token, stdout);
                          fputs("&gt;", stdout);
                      } else {
                          putc('<', stdout);
                          if(strcmp(token, "nofill") == 0) {
                              nofill++;
                              fputs("pre", stdout);
                          } else if(strcmp(token, "/nofill") == 0) {
                              nofill--;
                              fputs("/pre", stdout);
                          } else if(strcmp(token, "bold") == 0) {
                              fputs("b", stdout);
                          } else if(strcmp(token, "/bold") == 0) {
                              fputs("/b", stdout);
                          } else if(strcmp(token, "italic") == 0) {
                              fputs("i", stdout);
                          } else if(strcmp(token, "/italic") == 0) {
                              fputs("/i", stdout);
                          } else if(strcmp(token, "fixed") == 0) {
                              fputs("tt", stdout);
                          } else if(strcmp(token, "/fixed") == 0) {
                              fputs("/tt", stdout);
                          } else if(strcmp(token, "excerpt") == 0) {
                              fputs("blockquote", stdout);
                          } else if(strcmp(token, "/excerpt") == 0) {
                              fputs("/blockquote", stdout);
                          } else {
                              /* no HTML 2.0 equivalent: emit "<?...>" */
                              putc('?', stdout);
                              fputs(token, stdout);
                              if(strcmp(token, "param") == 0) {
                                  paramct++;
                                  putc(' ', stdout);
                                  continue;
                              }
                          }
                          putc('>', stdout);
                      }
                  }
              } else if(c == '>') {
                  fputs("&gt;", stdout);
              } else if(c == '&') {
                  /* a literal ampersand must also be escaped in HTML */
                  fputs("&amp;", stdout);
              } else {
                  if(c == '\n' && nofill <= 0 && paramct <= 0) {
                      /* each newline after the first is a line break */
                      while((i=getc(stdin)) == '\n')
                          fputs("<br>", stdout);
                      ungetc(i, stdin);
                  }
                  putc(c, stdout);
              }
          }
          /* The following line is only needed with line-buffering */
          putc('\n', stdout);
          exit(0);
      }