idnits 2.17.1 draft-resnick-text-enriched-02.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in this document. Expected boilerplate is as follows today (2024-04-26) according to https://trustee.ietf.org/license-info : IETF Trust Legal Provisions of 28-dec-2009, Section 6.a: This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2: Copyright (c) 2024 IETF Trust and the persons identified as the document authors. All rights reserved. IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3: This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** Missing expiration date. The document expiration date should appear on the first and last page. ** The document seems to lack a 1id_guidelines paragraph about Internet-Drafts being working documents. ** The document seems to lack a 1id_guidelines paragraph about the list of current Internet-Drafts. ** The document seems to lack a 1id_guidelines paragraph about the list of Shadow Directories. ** The document is more than 15 pages and seems to lack a Table of Contents. 
== The page length should not exceed 58 lines per page, but there was 1 longer page, the longest (page 1) being 954 lines Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The document seems to lack separate sections for Informative/Normative References. All references will be assumed normative when checking for downward references. ** There are 19 instances of too long lines in the document, the longest one being 28 characters in excess of 72. ** The abstract seems to contain references ([ISO-639], [RFC-1766], [RFC-1866], [RFC-1563], [RFC-1521], [RFC-1523]), which it shouldn't. Please replace those with straight textual mentions of the documents in question. Miscellaneous warnings: ---------------------------------------------------------------------------- -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (January 1996) is 10329 days in the past. Is this intentional? -- Found something which looks like a code comment -- if you have code sections in the document, please surround them with '<CODE BEGINS>' and '<CODE ENDS>' lines.
Checking references for intended status: Informational ---------------------------------------------------------------------------- == Missing Reference: 'ISO-639' is mentioned on line 448, but not defined -- Looks like a reference, but probably isn't: '62' on line 879 ** Obsolete normative reference: RFC 1341 (Obsoleted by RFC 1521) ** Obsolete normative reference: RFC 1521 (Obsoleted by RFC 2045, RFC 2046, RFC 2047, RFC 2048, RFC 2049) ** Obsolete normative reference: RFC 1523 (Obsoleted by RFC 1563, RFC 1896) ** Obsolete normative reference: RFC 1563 (Obsoleted by RFC 1896) ** Obsolete normative reference: RFC 1642 (Obsoleted by RFC 2152) ** Obsolete normative reference: RFC 1766 (Obsoleted by RFC 3066, RFC 3282) ** Obsolete normative reference: RFC 1866 (Obsoleted by RFC 2854) Summary: 17 errors (**), 0 flaws (~~), 2 warnings (==), 4 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 1 Network Working Group P. Resnick 2 INTERNET-DRAFT QUALCOMM 3 To-obsolete RFCs: 1523, 1563 A. Walker 4 Category: Informational InterCon 5 January 1996 6 8 The text/enriched MIME Content-type 10 Status of this Memo 12 This document is an Internet-Draft. Internet-Drafts are working 13 documents of the Internet Engineering Task Force (IETF), its areas, and 14 its working groups. Note that other groups may also distribute working 15 documents as Internet-Drafts. 17 Internet-Drafts are draft documents valid for a maximum of six months 18 and may be updated, replaced, or obsoleted by other documents at any 19 time. It is inappropriate to use Internet-Drafts as reference material 20 or to cite them other than as "work in progress." 
22 To learn the current status of any Internet-Draft, please check the 23 "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow 24 Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe), 25 munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or 26 ftp.isi.edu (US West Coast). 28 Abstract 30 MIME [RFC-1521] defines a format and general framework for the 31 representation of a wide variety of data types in Internet mail. This 32 document defines one particular type of MIME data, the text/enriched 33 MIME type. The text/enriched MIME type is intended to facilitate the 34 wider interoperation of simple enriched text across a wide variety of 35 hardware and software platforms. This document is only a minor revision 36 to the text/enriched MIME type that was first described in [RFC-1523] 37 and [RFC-1563], and is only intended to be used in the short term until 38 other MIME types for text formatting in Internet mail are developed and 39 deployed. 41 The text/enriched MIME type 43 In order to promote the wider interoperability of simple formatted text, 44 this document defines an extremely simple subtype of the MIME 45 content-type "text", the "text/enriched" subtype. The content-type line 46 for this type may have one optional parameter, the "charset" parameter, 47 with the same values permitted for the "text/plain" MIME content-type. 49 The text/enriched subtype was designed to meet the following criteria: 51 1. The syntax must be extremely simple to parse, so that even 52 teletype-oriented mail systems can easily strip away the formatting 53 information and leave only the readable text. 55 2. The syntax must be extensible to allow for new formatting commands 56 that are deemed essential for some application. 58 3. 
If the character set in use is ASCII or an 8-bit ASCII superset, 59 then the raw form of the data must be readable enough to be largely 60 unobjectionable in the event that it is displayed on the screen of 61 the user of a non-MIME-conformant mail reader. 63 4. The capabilities must be extremely limited, to ensure that it can 64 represent no more than is likely to be representable by the user's 65 primary word processor. While this limits what can be sent, it 66 increases the likelihood that what is sent can be properly 67 displayed. 69 There are other text formatting standards which meet some of these 70 criteria. In particular, HTML and SGML have come into widespread use on 71 the Internet. However, there are two important reasons that this 72 document further promotes the use of text/enriched in Internet mail over 73 other such standards: 75 1. Most MIME-aware Internet mail applications are already able to 76 either properly format text/enriched mail or, at the very least, 77 are able to strip out the formatting commands and display the 78 readable text. The same is not true for HTML or SGML. 80 2. The current RFC on HTML [RFC-1866] and Internet Drafts on SGML have 81 many features which are not necessary for Internet mail, and are 82 missing a few capabilities that text/enriched already has. 84 For these reasons, this document is promoting the use of text/enriched 85 until other Internet standards come into more widespread use. For those 86 who will want to use HTML, Appendix B of this document contains a very 87 simple C program that converts text/enriched to HTML 2.0 described in 88 [RFC-1866]. 90 Syntax 92 The syntax of "text/enriched" is very simple. It represents text in a 93 single character set--US-ASCII by default, although a different 94 character set can be specified by the use of the "charset" parameter. 95 (The semantics of text/enriched in non-ASCII character sets are 96 discussed later in this document.)
All characters represent themselves, 97 with the exception of the "<" character (ASCII 60), which is used to 98 mark the beginning of a formatting command. A literal less-than sign 99 ("<") can be represented by a sequence of two such characters, "<<". 101 Formatting instructions consist of formatting commands surrounded by 102 angle brackets ("<>", ASCII 60 and 62). Each formatting command may be 103 no more than 60 characters in length, all in US-ASCII, restricted to the 104 alphanumeric and hyphen ("-") characters. Formatting commands may be 105 preceded by a solidus ("/", ASCII 47), making them negations, and such 106 negations must always exist to balance the initial opening commands. 107 Thus, if the formatting command "<bold>" appears at some point, there 108 must later be a "</bold>" to balance it. (NOTE: The 60 character limit 109 on formatting commands does NOT include the "<", ">", or "/" characters 110 that might be attached to such commands.) 112 Line break rules 114 Line breaks (CRLF pairs in standard network representation) are handled 115 specially. In particular, isolated CRLF pairs are translated into a 116 single SPACE character. Sequences of N consecutive CRLF pairs, however, 117 are translated into N-1 actual line breaks. This permits long lines of 118 data to be represented in a natural looking manner despite the frequency 119 of line-wrapping in Internet mailers. When preparing the data for mail 120 transport, isolated line breaks should be inserted wherever necessary to 121 keep each line shorter than 80 characters. When preparing such data for 122 presentation to the user, isolated line breaks should be replaced by a 123 single SPACE character, and N consecutive CRLF pairs should be presented 124 to the user as N-1 line breaks. 126 Thus text/enriched data that looks like this: 128 This is 129 a single 130 line 132 This is the 133 next line. 135 This is the 136 next section.
138 should be displayed by a text/enriched interpreter as follows: 140 This is a single line 141 This is the next line. 143 This is the next section. 145 The formatting commands, not all of which will be implemented by all 146 implementations, are described in the following sections. 148 Formatting Commands 150 The text/enriched formatting commands all begin with <commandname> and 151 end with </commandname>, affecting the formatting of the text between 152 those two tokens. The commands are described here, grouped according to 153 type. 155 Parameter Command 157 Some of the formatting commands may require one or more associated 158 parameters. The "param" command is a special formatting command used to 159 include these parameters. 161 Param 162 Marks the affected text as command parameters, to be 163 interpreted or ignored by the text/enriched interpreter, 164 but not to be shown to the reader. The "param" command 165 always immediately follows some other formatting command, 166 and the parameter data indicates some additional 167 information about the formatting that is to be done. The 168 syntax of the parameter data (whatever appears between 169 the initial "<param>" and the terminating "</param>") is 170 defined for each command that uses it. However, it is 171 always required that the format of such data must not 172 contain nested "param" commands, and either must not use 173 the "<" character or must use it in a way that is 174 compatible with text/enriched parsing. That is, the end 175 of the parameter data should be recognizable with either 176 of two algorithms: simply searching for the first 177 occurrence of "</param>" or parsing until a balanced 178 "</param>" command is found. In either case, however, the 179 parameter data should not be shown to the human reader.
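The two ways of finding the end of parameter data described above can be sketched in Python. This is an illustrative sketch only, not part of the specification: the function names and the TOKEN pattern are hypothetical, and matching command names case-insensitively is an assumption of the sketch. Both functions take the index just past an opening "<param>" and return the index where the terminating "</param>" begins, or -1 if none is found.

```python
import re

# Hypothetical names for illustration; none of them come from this memo.
# A token is either the quoted literal "<<" or a command of up to 60
# alphanumeric/hyphen characters, optionally preceded by "/".
TOKEN = re.compile(r"<<|<(/?)([A-Za-z0-9-]{1,60})>")
END_PARAM = re.compile(r"</param>", re.IGNORECASE)  # case folding is assumed


def param_end_by_search(data, start):
    """Algorithm 1: search for the first occurrence of "</param>"."""
    m = END_PARAM.search(data, start)
    return m.start() if m else -1


def param_end_by_balance(data, start):
    """Algorithm 2: parse tokens until a balanced "</param>" is found."""
    depth = 0  # commands opened inside the parameter data and not yet closed
    for m in TOKEN.finditer(data, start):
        if m.group(0) == "<<":
            continue  # a quoted "<", not a command
        closing, name = m.group(1), m.group(2).lower()
        if not closing:
            depth += 1
        elif depth == 0 and name == "param":
            return m.start()
        elif depth > 0:
            depth -= 1
    return -1


data = "<color><param>red</param>text</color>"
start = len("<color><param>")
print(data[start:param_end_by_search(data, start)])   # red
print(data[start:param_end_by_balance(data, start)])  # red
```

For parameter data that obeys the restrictions above, the two functions return the same index, which is the property those restrictions are meant to guarantee.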
181 Font-Alteration Commands 183 The following formatting commands are intended to alter the font in 184 which text is displayed, but not to alter the indentation or 185 justification state of the text: 187 Bold 188 causes the affected text to be in a bold font. Nested 189 bold commands have the same effect as a single bold 190 command. 192 Italic 193 causes the affected text to be in an italic font. Nested 194 italic commands have the same effect as a single italic 195 command. 197 Underline 198 causes the affected text to be underlined. Nested 199 underline commands have the same effect as a single 200 underline command. 202 Fixed 203 causes the affected text to be in a fixed width font. 204 Nested fixed commands have the same effect as a single 205 fixed command. 207 FontFamily 208 causes the affected text to be displayed in a specified 209 typeface. The "fontfamily" command requires a parameter 210 that is specified by using the "param" command. The 211 parameter data is a case-insensitive string containing 212 the name of a font family. Any currently available font 213 family name (e.g. Times, Palatino, Courier, etc.) may be 214 used. This includes font families defined by commercial 215 type foundries such as Adobe, BitStream, or any other 216 such foundry. Note that implementations should only use 217 the general font family name, not the specific font name 218 (e.g. use "Times", not "TimesRoman" nor 219 "TimesBoldItalic"). When nested, the inner "fontfamily" 220 command takes precedence. Also note that the "fontfamily" 221 command is advisory only; it should not be expected that 222 other implementations will honor the typeface information 223 in this command since the font capabilities of systems 224 vary drastically. 226 Color 227 causes the affected text to be displayed in a specified 228 color. The "color" command requires a parameter that is 229 specified by using the "param" command. 
The parameter 230 data can be one of the following: 232 red 233 blue 234 green 235 yellow 236 cyan 237 magenta 238 black 239 white 241 or an RGB color value in the form: 243 ####,####,#### 245 where '#' is a hexadecimal digit '0' through '9', 'A' 246 through 'F', or 'a' through 'f'. The three 4-digit 247 hexadecimal values are the RGB values for red, green, and 248 blue respectively, where each component is expressed as 249 an unsigned value between 0 (0000) and 65535 (FFFF). The 250 default color for the message is unspecified, though 251 black is a common choice in many environments. When 252 nested, the inner "color" command takes precedence. 254 Smaller 255 causes the affected text to be in a smaller font. It is 256 recommended that the font size be changed by two points, 257 but other amounts may be more appropriate in some 258 environments. Nested smaller commands produce ever 259 smaller fonts, to the limits of the implementation's 260 capacity to reasonably display them, after which further 261 smaller commands have no incremental effect. 263 Bigger 264 causes the affected text to be in a bigger font. It is 265 recommended that the font size be changed by two points, 266 but other amounts may be more appropriate in some 267 environments. Nested bigger commands produce ever bigger 268 fonts, to the limits of the implementation's capacity to 269 reasonably display them, after which further bigger 270 commands have no incremental effect. 272 While the "bigger" and "smaller" operators are effectively inverses, it 273 is not recommended, for example, that "<smaller>" be used to end the 274 effect of "<bigger>". This is properly done with "</bigger>". 276 Since the capabilities of implementations will vary, it is to be 277 expected that some implementations will not be able to act on some of 278 the font-alteration commands. However, an implementation should still 279 display the text to the user in a reasonable fashion.
In particular, the 280 lack of capability to display a particular font family, color, or other 281 text attribute does not mean that an implementation should fail to 282 display text. 284 Fill/Justification/Indentation Commands 286 Initially, text/enriched text is intended to be displayed fully filled 287 (that is, using the rules specified for replacing CRLF pairs with spaces 288 or removing them as appropriate) with appropriate kerning and 289 letter-tracking, and using the maximum available margins as suits the 290 capabilities of the receiving user agent software. 292 The following commands alter that state. Each of these commands forces a 293 line break before and after the formatting environment if there is not 294 otherwise a line break. For example, if one of these commands occurs 295 anywhere other than the beginning of a line of text as presented, a new 296 line is begun. 298 Center 299 causes the affected text to be centered. 301 FlushLeft 302 causes the affected text to be left-justified with a 303 ragged right margin. 305 FlushRight 306 causes the affected text to be right-justified with a 307 ragged left margin. 309 FlushBoth 310 causes the affected text to be filled and padded so as to 311 create smooth left and right margins, i.e., to be fully 312 justified. 314 ParaIndent 315 causes the running margins of the affected text to be 316 moved in. The recommended indentation change is the width 317 of four characters, but this may differ among 318 implementations. The "paraindent" command requires a 319 parameter that is specified by using the "param" command. 320 The parameter data is a comma-separated list of one or 321 more of the following: 323 Left 324 causes the running left margin to be moved to the 325 right. 327 Right 328 causes the running right margin to be moved to the 329 left. 331 In 332 causes the first line of the affected paragraph to 333 be indented in addition to the running margin.
The 334 remaining lines remain flush to the running margin. 336 Out 337 causes all lines except for the first line of the 338 affected paragraph to be indented in addition to the 339 running margin. The first line remains flush to the 340 running margin. 342 Nofill 343 causes the affected text to be displayed without filling. 344 That is, the text is displayed without using the rules 345 for replacing CRLF pairs with spaces or removing 346 consecutive sequences of CRLF pairs. However, the current 347 state of the margins and justification is honored; any 348 indentation or justification commands are still applied 349 to the text within the scope of the "nofill". 351 The "center", "flushleft", "flushright", and "flushboth" commands are 352 mutually exclusive, and, when nested, the inner command takes 353 precedence. 355 The "nofill" command is mutually exclusive with the "in" and "out" 356 parameters of the "paraindent" command; when they occur in the same 357 scope, their behavior is undefined. 359 The parameter data for the "paraindent" command may contain multiple 360 occurrences of the same parameter (i.e. "left", "right", "in", or "out"). 361 Each occurrence causes the text to be further indented in the manner 362 indicated by that parameter. Nested "paraindent" commands cause the 363 affected text to be further indented according to the parameters. Note 364 that the "in" and "out" parameters for "paraindent" are mutually 365 exclusive; when they appear together or when nested "paraindent" 366 commands contain both of them, their behavior is undefined. 368 For purposes of the "in" and "out" parameters, a paragraph is defined as 369 text that is delimited by line breaks after applying the rules for 370 replacing CRLF pairs with spaces or removing consecutive sequences of 371 CRLF pairs. For example, within the scope of an "out", the line 372 following each CRLF is made flush with the running margin, and 373 subsequent lines are indented.
Within the scope of an "in", the first 374 line following each CRLF is indented, and subsequent lines remain flush 375 to the running margin. 377 Whether or not text is justified by default (that is, whether the 378 default environment is "flushleft", "flushright", or "flushboth") is 379 unspecified, and depends on the preferences of the user, the 380 capabilities of the local software and hardware, and the nature of the 381 character set in use. On systems where full justification is considered 382 undesirable, the "flushboth" environment may be identical to the default 383 environment. Note that full justification should never be performed 384 inside of "center", "flushleft", "flushright", or "nofill" environments. 385 Note also that for some non-ASCII character sets, full justification may 386 be fundamentally inappropriate. 388 Note that [RFC-1563] defined two additional indentation commands, 389 "Indent" and "IndentRight". These commands did not force a line break, 390 and therefore their behavior was unpredictable since they depended on 391 the margins and character sizes that a particular implementation used. 392 Therefore, their use is deprecated and they should be ignored just as 393 other unrecognized commands. 395 Markup Commands 397 Commands in this section, unlike the other text/enriched commands are 398 declarative markup commands. Text/enriched is not intended as a full 399 markup language, but instead as a simple way to represent common 400 formatting commands. Therefore, markup commands are purposely kept to a 401 minimum. It is only because each was deemed so prevalent or necessary in 402 an e-mail environment that these particular commands have been included 403 at all. 405 Excerpt 406 causes the affected text to be interpreted as a textual 407 excerpt from another source, probably a message being 408 responded to. 
Typically this will be displayed using 409 indentation and an alternate font, or by indenting lines 410 and preceding them with "> ", but such decisions are up 411 to the implementation. Note that as with the 412 justification commands, the excerpt command implicitly 413 begins and ends with a line break if one is not already 414 there. Nested "excerpt" commands are acceptable and 415 should be interpreted as meaning that the excerpted text 416 was excerpted from yet another source. Again, this can be 417 displayed using additional indentation, different colors, 418 etc. 420 Optionally, the "excerpt" command can take a parameter by 421 using the "param" command. The format of the data is 422 unspecified, but it is intended to uniquely identify the 423 text from which the excerpt is taken. With this 424 information, an implementation should be able to uniquely 425 identify the source of any particular excerpt, especially 426 if two or more excerpts in the message are from the same 427 source, and display it in some way that makes this 428 apparent to the user. 430 Lang 431 causes the affected text to be interpreted as belonging 432 to a particular language. This is most useful when two 433 different languages use the same character set, but may 434 require a different font or formatting depending on the 435 language. For instance, Chinese and Japanese share 436 similar character glyphs, and in some character sets like 437 UNICODE share common code points, but it is considered 438 very important that different fonts be used for the two 439 languages, especially if they appear together, so that 440 meaning is not lost. Also, language information can be 441 used to allow for fancier text handling, like spell 442 checking or hyphenation. 444 The "lang" command requires a parameter using the "param" 445 command. The parameter data can be any of the language 446 tags specified in [RFC-1766], "Tags for the 447 Identification of Languages". 
These tags are the two 448 letter language codes taken from [ISO-639] or can be 449 other language codes that are registered according to the 450 instructions in the Language Tags RFC. Consult that memo 451 for further information. 453 Balancing and Nesting of Formatting Commands 455 Pairs of formatting commands must be properly balanced and nested. Thus, 456 a proper way to describe text in bold italics is: 458 <bold><italic>the-text</italic></bold> 460 or, alternately, 462 <italic><bold>the-text</bold></italic> 464 but, in particular, the following is illegal text/enriched: 466 <bold><italic>the-text</bold></italic> 468 The nesting requirement for formatting commands imposes a slightly 469 higher burden upon the composers of text/enriched bodies, but 470 potentially simplifies text/enriched displayers by allowing them to be 471 stack-based. The main goal of text/enriched is to be simple enough to 472 make multifont, formatted email widely readable, so that those with the 473 capability of sending it will be able to do so with confidence. Thus 474 slightly increased complexity in the composing software was deemed a 475 reasonable tradeoff for simplified reading software. Nonetheless, 476 implementors of text/enriched readers are encouraged to follow the 477 general Internet guidelines of being conservative in what you send and 478 liberal in what you accept. Those implementations that can do so are 479 encouraged to deal reasonably with improperly nested text/enriched data. 481 Unrecognized formatting commands 483 Implementations must regard any unrecognized formatting commands as 484 "no-op" commands, that is, as commands having no effect, thus 485 facilitating future extensions to "text/enriched". Private extensions 486 may be defined using formatting commands that begin with "X-", by 487 analogy to Internet mail header field names. 489 In order to formally define extended commands, a new Internet document 490 should be published. 492 White Space in Text/enriched Data 494 No special behavior is required for the SPACE or TAB (HT) character.
It 495 is recommended, however, that, at least when fixed-width fonts are in 496 use, the common semantics of the TAB (HT) character should be observed, 497 namely that it moves to the next column position that is a multiple of 498 8. (In other words, if a TAB (HT) occurs in column n, where the leftmost 499 column is column 0, then that TAB (HT) should be replaced by 8-(n mod 8) 500 SPACE characters.) It should also be noted that some mail gateways are 501 notorious for losing (or, less commonly, adding) white space at the end 502 of lines, so reliance on SPACE or TAB characters at the end of a line is 503 not recommended. 505 Initial State of a text/enriched interpreter 507 Text/enriched is assumed to begin with filled text in a variable-width 508 font in a normal typeface and a size that is average for the current 509 display and user. The left and right margins are assumed to be maximal, 510 that is, at the leftmost and rightmost acceptable positions. 512 Non-ASCII character sets 514 One of the great benefits of MIME is the ability to use different 515 varieties of non-ASCII text in messages. To use non-ASCII text in a 516 message, normally a charset parameter is specified in the Content-type 517 line that indicates the character set being used. For purposes of this 518 RFC, any legal MIME charset parameter can be used with the text/enriched 519 Content-type. However, there are two difficulties that arise with regard 520 to the text/enriched Content-type when non-ASCII text is desired. The 521 first problem involves difficulties that occur when the user wishes to 522 create text which would normally require multiple non-ASCII character 523 sets in the same text/enriched message. The second problem is an 524 ambiguity that arises because of the text/enriched use of the "<" 525 character in formatting commands.
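As an aside on the TAB recommendation in the white space section above, the rule that a TAB in column n (leftmost column 0) becomes 8-(n mod 8) SPACE characters can be sketched in Python as follows. The function name expand_tabs and the tab_width parameter are hypothetical, introduced only for illustration, and the sketch assumes it is given a single line with no embedded line breaks.

```python
def expand_tabs(line, tab_width=8):
    """Replace each TAB with the spaces needed to reach the next tab stop.

    Illustrative only: assumes one line of text (no CR or LF inside) and
    counts every other character as one column wide.
    """
    out = []
    col = 0  # current column, leftmost column is 0
    for ch in line:
        if ch == "\t":
            pad = tab_width - (col % tab_width)  # 8-(n mod 8) for width 8
            out.append(" " * pad)
            col += pad
        else:
            out.append(ch)
            col += 1
    return "".join(out)


print(repr(expand_tabs("ab\tc")))  # 'ab      c'
```

A TAB that already sits on a tab stop (n mod 8 equal to 0) expands to a full 8 spaces, which matches the formula as written.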
527 Using multiple non-ASCII character sets 529 Normally, if a user wishes to produce text which contains characters 530 from entirely different character sets within the same MIME message (for 531 example, using Russian Cyrillic characters from ISO 8859-5 and Hebrew 532 characters from ISO 8859-8), a multipart message is used. Every time a 533 new character set is desired, a new MIME body part is started with 534 different character sets specified in the charset parameter of the 535 Content-type line. However, using multiple character sets this way in 536 text/enriched messages introduces problems. Since a change in the 537 charset parameter requires a new part, text/enriched formatting commands 538 used in the first part would not be able to apply to text that occurs in 539 subsequent parts. It is not possible for text/enriched formatting 540 commands to apply across MIME body part boundaries. 542 [RFC-1341] attempted to get around this problem in the now obsolete 543 text/richtext format by introducing different character set formatting 544 commands like "iso-8859-5" and "us-ascii". But this, or even a more 545 general solution along the same lines, is still undesirable: It is 546 common for a MIME application to decide, for example, what character 547 font resources or character lookup tables it will require based on the 548 information provided by the charset parameter of the Content-type line, 549 before it even begins to interpret or display the data in that body 550 part. By allowing the text/enriched interpreter to subsequently change 551 the character set, perhaps to one completely different from the charset 552 specified in the Content-type line (with potentially much different 553 resource requirements), too much burden would be placed on the 554 text/enriched interpreter itself. 556 Therefore, if multiple types of non-ASCII characters are desired in a 557 text/enriched document, one of the following two methods must be used: 559 1. 
For cases where the different types of non-ASCII text can be 560 limited to their own paragraphs with distinct formatting, a 561 multipart message can be used with each part having a Content-Type 562 of text/enriched and a different charset parameter. The one caveat 563 to using this method is that each new part must start in the 564 initial state for a text/enriched document. That means that all of 565 the text/enriched commands in the preceding part must be properly 566 balanced with ending commands before the next text/enriched part 567 begins. Also, each text/enriched part must begin a new paragraph. 569 2. If different types of non-ASCII text are to appear in the same line 570 or paragraph, or if text/enriched formatting (e.g. margins, 571 typeface, justification) is required across several different types 572 of non-ASCII text, a single text/enriched body part should be used 573 with a character set specified that contains all of the required 574 characters. For example, a charset parameter of "UNICODE-1-1-UTF-7" 575 as specified in [RFC-1642] could be used for such purposes. Not 576 only does UNICODE contain all of the characters that can be 577 represented in all of the other registered ISO 8859 MIME character 578 sets, but UTF-7 is fully compatible with other aspects of the 579 text/enriched standard, including the use of the "<" character 580 referred to below. Any other character sets that are specified for 581 use in MIME which contain different types of non-ASCII text can 582 also be used in these instances. 584 Use of the "<" character in formatting commands 586 If the character set specified by the charset parameter on the 587 Content-type line is anything other than "US-ASCII", this means that 588 the text being described by text/enriched formatting commands is in a 589 non-ASCII character set. However, the commands themselves are still the 590 same ASCII commands that are defined in this document.
This creates an ambiguity only with reference to the "<" character, the
octet with numeric value 60. In single byte character sets, such as the
ISO-8859 family, this is not a problem; the octet 60 can be quoted by
including it twice, just as for ASCII. The problem is more complicated,
however, in the case of multi-byte character sets, where the octet 60
might appear at any point in the byte sequence for any of several
characters.

In practice, however, most multi-byte character sets address this
problem internally. For example, the UNICODE character sets can use the
UTF-7 encoding which preserves all of the important ASCII characters in
their single byte form. The ISO-2022 family of character sets can use
certain character sequences to switch back into ASCII at any moment.
Therefore it is specified that, before text/enriched formatting
commands, the prevailing character set should be "switched back" into
ASCII, and that only those characters which would be interpreted as "<"
in plain text should be interpreted as token delimiters in
text/enriched.

The question of what to do for hypothetical future character sets that
do not subsume ASCII is not addressed in this memo.

Minimal text/enriched conformance

A minimal text/enriched implementation is one that converts "<<" to "<",
removes everything between a <param> command and the next balancing
</param> command, removes all other formatting commands (all text
enclosed in angle brackets), and, outside of <nofill> environments,
converts any series of n CRLFs to n-1 CRLFs, and converts any lone CRLF
pairs to SPACE.

Notes for Implementors

It is recognized that implementors of future mail systems will want rich
text functionality far beyond that currently defined for text/enriched.
The intent of text/enriched is to provide a common format for expressing
that functionality in a form in which much of it, at least, will be
understood by interoperating software. Thus, in particular, software
with a richer notion of formatted text than text/enriched can still use
text/enriched as its basic representation, but can extend it with new
formatting commands and by hiding information specific to that software
system in text/enriched <param> constructs. As such systems evolve, it
is expected that the definition of text/enriched will be further refined
by future published specifications, but text/enriched as defined here
provides a platform on which evolutionary refinements can be based.

An expected common way that sophisticated mail programs will generate
text/enriched data is as part of a multipart/alternative construct. For
example, a mail agent that can generate enriched mail in ODA format can
generate that mail in a more widely interoperable form by generating
both text/enriched and ODA versions of the same data, e.g.:

     Content-type: multipart/alternative; boundary=foo

     --foo
     Content-type: text/enriched

     [text/enriched version of data]
     --foo
     Content-type: application/oda

     [ODA version of data]
     --foo--

If such a message is read using a MIME-conformant mail reader that
understands ODA, the ODA version will be displayed; otherwise, the
text/enriched version will be shown.

In some environments, it might be impossible to combine certain
text/enriched formatting commands, whereas in others they might be
combined easily. For example, the combination of <bold> and <italic>
might produce bold italics on systems that support such fonts, but there
exist systems that can make text bold or italicized, but not both. In
such cases, the most recently issued (innermost) recognized formatting
command should be preferred.
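
The preference for the innermost recognized command can be sketched as
a small stack of open commands. This is only an illustration under
stated assumptions, not part of the specification: the names
style_stack, style_push, style_pop and effective_style are hypothetical,
and the list of commands a display supports is supplied by the caller.

```c
#include <assert.h>
#include <string.h>

#define MAX_DEPTH 32

/* Hypothetical stack of currently open formatting commands. */
typedef struct {
    const char *names[MAX_DEPTH];
    int depth;                    /* may exceed MAX_DEPTH; extras are not stored */
} style_stack;

/* Record an opening command such as "bold". */
static void style_push(style_stack *s, const char *name) {
    if (s->depth < MAX_DEPTH) s->names[s->depth] = name;
    s->depth++;
}

/* Record the balancing closing command. */
static void style_pop(style_stack *s) {
    if (s->depth > 0) s->depth--;
}

/*
 * Return the innermost open command that the display supports, or NULL
 * if none of the open commands is supported. Unrecognized commands are
 * skipped, so an outer recognized command still takes effect.
 */
static const char *effective_style(const style_stack *s,
                                   const char *const supported[],
                                   int nsupported) {
    int i, j;
    int top = (s->depth < MAX_DEPTH) ? s->depth : MAX_DEPTH;

    for (i = top - 1; i >= 0; i--)
        for (j = 0; j < nsupported; j++)
            if (strcmp(s->names[i], supported[j]) == 0)
                return s->names[i];
    return NULL;
}
```

A display that can render only one style at a time would call
effective_style() before writing each run of text and apply the single
command it returns.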

One of the major goals in the design of text/enriched was to make it so
simple that even text-only mailers will implement
enriched-to-plain-text translators, thus increasing the likelihood that
enriched text will become "safe" to use very widely. To demonstrate this
simplicity, an extremely simple C program that converts text/enriched
input into plain text output is included in Appendix A.

Extensions to text/enriched

It is expected that various mail system authors will desire extensions
to text/enriched. The simple syntax of text/enriched, and the
specification that unrecognized formatting commands should simply be
ignored, are intended to promote such extensions.

An Example

Putting all this together, the following "text/enriched" body fragment:

     From: Nathaniel Borenstein
     To: Ned Freed
     Content-type: text/enriched

     Now is the time for all
     good men
     (and <) to
     come

     to the aid of their

     redbeloved
     country.

     By the way,
     I think that left<

     should REALLY be called

     left<
     and that I am always right.

     -- the end

represents the following formatted text (which will, no doubt, look
somewhat cryptic in the text-only version of this document):

     Now is the time for all good men (and ) to come
     to the aid of their

     beloved country.
     By the way, I think that

     should REALLY be called

     and that I am always right.
     -- the end

where the word "beloved" would be in red on a color display.

Security Considerations

Security issues are not discussed in this memo, as the mechanism raises
no security issues.

Author's Address

For more information, the authors of this document may be contacted via
Internet mail:

     Peter W. Resnick
     QUALCOMM Incorporated
     6455 Lusk Boulevard
     San Diego, CA 92121-2779
     Phone: +1 619 587 1121
     FAX: +1 619 658 2230
     e-mail: presnick@qualcomm.com

     Amanda Walker
     InterCon Systems Corporation
     950 Herndon Parkway
     Herndon, VA 22070
     Phone: +1 703 709 5500
     FAX: +1 703 709 5555
     e-mail: amanda@intercon.com

Acknowledgements

The authors gratefully acknowledge the input of many contributors,
readers, and implementors of the specification in this document.
Particular thanks are due to Nathaniel Borenstein, the original author
of RFC 1563.

References

[RFC-1341]
     Borenstein, N., Freed, N., "MIME (Multipurpose Internet Mail
     Extensions): Mechanisms for Specifying and Describing the Format of
     Internet Message Bodies", 06/11/1992.

[RFC-1521]
     Borenstein, N., Freed, N., "MIME (Multipurpose Internet Mail
     Extensions) Part One: Mechanisms for Specifying and Describing the
     Format of Internet Message Bodies", 09/23/1993.

[RFC-1523]
     Borenstein, N., "The text/enriched MIME Content-type", 09/23/1993.

[RFC-1563]
     Borenstein, N., "The text/enriched MIME Content-type", 01/10/1994.

[RFC-1642]
     Goldsmith, D., Davis, M., "UTF-7 - A Mail-Safe Transformation
     Format of Unicode", 07/13/1994.

[RFC-1766]
     Alvestrand, H., "Tags for the Identification of Languages",
     03/02/1995.

[RFC-1866]
     Berners-Lee, T., Connolly, D., "Hypertext Markup Language - 2.0",
     11/03/1995.

Appendix A--A Simple enriched-to-plain Translator in C

One of the major goals in the design of the text/enriched subtype of the
text Content-Type is to make formatted text so simple that even
text-only mailers will implement enriched-to-plain-text translators,
thus increasing the likelihood that multifont text will become "safe" to
use very widely.
To demonstrate this simplicity, what follows is a simple C program that
converts text/enriched input into plain text output. Note that the local
newline convention (the single character represented by "\n") is assumed
by this program, but that special CRLF handling might be necessary on
some systems.

     #include <ctype.h>
     #include <stdio.h>
     #include <stdlib.h>
     #include <string.h>

     int main(void) {
         int c, i, paramct=0, newlinect=0, nofill=0;
         char token[62], *p;

         while ((c=getc(stdin)) != EOF) {
             if (c == '<') {
                 /* a single pending newline becomes a space */
                 if (newlinect == 1) putc(' ', stdout);
                 newlinect = 0;
                 c = getc(stdin);
                 if (c == '<') {
                     /* "<<" stands for a literal "<" */
                     if (paramct <= 0) putc(c, stdout);
                 } else {
                     /* read the command name, folded to lower case */
                     ungetc(c, stdin);
                     for (i=0, p=token; (c=getc(stdin)) != EOF && c != '>'; i++) {
                         if (i < (int)sizeof(token)-1) *p++ = isupper(c) ? tolower(c) : c;
                     }
                     *p = '\0';
                     if (c == EOF) break;
                     /* only param and nofill affect plain text output */
                     if (strcmp(token, "param") == 0)
                         paramct++;
                     else if (strcmp(token, "nofill") == 0)
                         nofill++;
                     else if (strcmp(token, "/param") == 0)
                         paramct--;
                     else if (strcmp(token, "/nofill") == 0)
                         nofill--;
                 }
             } else {
                 if (paramct > 0)
                     ; /* ignore params */
                 else if (c == '\n' && nofill <= 0) {
                     /* n newlines in a row produce n-1 newlines */
                     if (++newlinect > 1) putc(c, stdout);
                 } else {
                     if (newlinect == 1) putc(' ', stdout);
                     newlinect = 0;
                     putc(c, stdout);
                 }
             }
         }
         /* The following line is only needed with line-buffering */
         putc('\n', stdout);
         exit(0);
     }

It should be noted that one can do considerably better than this in
displaying text/enriched data on a dumb terminal. In particular, one can
replace font information such as "bold" with textual emphasis (like
*this* or _T_H_I_S_). One can also properly handle the text/enriched
formatting commands regarding indentation, justification, and others.
However, the above program is all that is necessary in order to present
text/enriched on a dumb terminal without showing the user any formatting
artifacts.
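
The idea of replacing font information with textual emphasis can be
sketched as follows. This is a minimal illustration, not part of the
specification: the function names emphasis_marker and
enriched_to_emphasis are hypothetical, the choice of "*" for bold and
"_" for italic and underline is an assumption, and newline folding is
deliberately left out to keep the example short.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mapping from a command name (without "/") to a marker. */
static const char *emphasis_marker(const char *name) {
    if (strcmp(name, "bold") == 0) return "*";
    if (strcmp(name, "italic") == 0) return "_";
    if (strcmp(name, "underline") == 0) return "_";
    return "";                    /* every other command is dropped */
}

/* Append ch to out if room remains, always leaving space for the NUL. */
static void append(char *out, size_t outsz, size_t *len, char ch) {
    if (*len + 1 < outsz) out[(*len)++] = ch;
}

/* Convert text/enriched in `in` to plain text with emphasis markers. */
static void enriched_to_emphasis(const char *in, char *out, size_t outsz) {
    size_t len = 0;
    int paramct = 0;
    char token[62];

    if (outsz == 0) return;
    while (*in != '\0') {
        if (*in != '<') {                   /* ordinary text */
            if (paramct <= 0) append(out, outsz, &len, *in);
            in++;
        } else if (in[1] == '<') {          /* "<<" is a literal "<" */
            if (paramct <= 0) append(out, outsz, &len, '<');
            in += 2;
        } else {                            /* a formatting command */
            size_t i = 0;
            const char *name, *m;

            in++;
            while (*in != '\0' && *in != '>') {
                if (i < sizeof(token) - 1)
                    token[i++] = (char)tolower((unsigned char)*in);
                in++;
            }
            token[i] = '\0';
            if (*in == '>') in++;
            name = (token[0] == '/') ? token + 1 : token;
            if (strcmp(token, "param") == 0)
                paramct++;
            else if (strcmp(token, "/param") == 0)
                paramct--;
            else if (paramct <= 0)          /* same marker opens and closes */
                for (m = emphasis_marker(name); *m != '\0'; m++)
                    append(out, outsz, &len, *m);
        }
    }
    out[len] = '\0';
}
```

Because the same marker is written for the opening and the closing
command, balanced commands produce balanced emphasis in the output.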

Appendix B--A Simple enriched-to-HTML Translator in C

It is fully expected that other text formatting standards like HTML and
SGML will supplant text/enriched in Internet mail. It is also likely
that as this happens, recipients of text/enriched mail will wish to view
such mail with an HTML viewer. To this end, the following is a simple
example of a C program to convert text/enriched to HTML. Since the
current version of HTML at the time of this document's publication is
HTML 2.0 defined in [RFC-1866], this program converts to that standard.
There are several text/enriched commands that have no HTML 2.0
equivalent. In those cases, this program simply puts those commands into
processing instructions; that is, surrounded by "<?" and ">". As in
Appendix A, the local newline convention (the single character
represented by "\n") is assumed by this program, but special CRLF
handling might be necessary on some systems.

     #include <ctype.h>
     #include <stdio.h>
     #include <stdlib.h>
     #include <string.h>

     int main(void) {
         int c, i, paramct=0, nofill=0;
         char token[62], *p;

         while((c=getc(stdin)) != EOF) {
             if(c == '<') {
                 c = getc(stdin);
                 if(c == '<') {
                     /* "<<" is a literal "<", which HTML needs escaped */
                     fputs("&lt;", stdout);
                 } else {
                     /* read the command name, folded to lower case */
                     ungetc(c, stdin);
                     for (i=0, p=token; (c=getc(stdin)) != EOF && c != '>'; i++) {
                         if (i < (int)sizeof(token)-1) *p++ = isupper(c) ? tolower(c) : c;
                     }
                     *p = '\0';
                     if(c == EOF) break;
                     if(strcmp(token, "/param") == 0) {
                         /* close the processing instruction opened by param */
                         paramct--;
                         putc('>', stdout);
                     } else if(paramct > 0) {
                         /* commands inside a param are passed as escaped text */
                         fputs("&lt;", stdout);
                         fputs(token, stdout);
                         fputs("&gt;", stdout);
                     } else {
                         putc('<', stdout);
                         if(strcmp(token, "nofill") == 0) {
                             nofill++;
                             fputs("pre", stdout);
                         } else if(strcmp(token, "/nofill") == 0) {
                             nofill--;
                             fputs("/pre", stdout);
                         } else if(strcmp(token, "bold") == 0) {
                             fputs("b", stdout);
                         } else if(strcmp(token, "/bold") == 0) {
                             fputs("/b", stdout);
                         } else if(strcmp(token, "italic") == 0) {
                             fputs("i", stdout);
                         } else if(strcmp(token, "/italic") == 0) {
                             fputs("/i", stdout);
                         } else if(strcmp(token, "fixed") == 0) {
                             fputs("tt", stdout);
                         } else if(strcmp(token, "/fixed") == 0) {
                             fputs("/tt", stdout);
                         } else if(strcmp(token, "excerpt") == 0) {
                             fputs("blockquote", stdout);
                         } else if(strcmp(token, "/excerpt") == 0) {
                             fputs("/blockquote", stdout);
                         } else {
                             /* no HTML 2.0 equivalent: processing instruction */
                             putc('?', stdout);
                             fputs(token, stdout);
                             if(strcmp(token, "param") == 0) {
                                 /* leave the instruction open for the param text */
                                 paramct++;
                                 putc(' ', stdout);
                                 continue;
                             }
                         }
                         putc('>', stdout);
                     }
                 }
             } else if(c == '>') {
                 fputs("&gt;", stdout);
             } else {
                 if(c == '\n' && nofill <= 0 && paramct <= 0) {
                     /* each newline after the first becomes a line break */
                     while((i=getc(stdin)) == '\n') fputs("<br>", stdout);
                     ungetc(i, stdin);
                 }
                 putc(c, stdout);
             }
         }
         /* The following line is only needed with line-buffering */
         putc('\n', stdout);
         exit(0);
     }
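
When literal text is copied into HTML, characters that are significant
to HTML need to be written as entities. The program above passes "&"
through unchanged, so a converter built on it may also want to escape
that character. A minimal sketch of such a helper follows; the function
names html_entity and put_html_char are hypothetical and are not part of
the program above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: the HTML entity for ch, or NULL if none is needed. */
static const char *html_entity(int ch) {
    switch (ch) {
    case '<': return "&lt;";
    case '>': return "&gt;";
    case '&': return "&amp;";
    default:  return NULL;
    }
}

/* Write ch to fp, replacing it with an entity where HTML requires one. */
static void put_html_char(int ch, FILE *fp) {
    const char *entity = html_entity(ch);

    if (entity != NULL)
        fputs(entity, fp);
    else
        putc(ch, fp);
}
```

A converter could call put_html_char(c, stdout) wherever it would
otherwise write a character of message text directly.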