< draft-resnick-text-enriched-01.txt   draft-resnick-text-enriched-02.txt >
Network Working Group P. Resnick Network Working Group P. Resnick
INTERNET-DRAFT A. Walker INTERNET-DRAFT QUALCOMM
To-obsolete RFCs: 1523, 1563 December 1995 To-obsolete RFCs: 1523, 1563 A. Walker
Category: Informational <draft-resnick-text-enriched-01.txt> Category: Informational InterCon
January 1996
<draft-resnick-text-enriched-02.txt>
The text/enriched MIME Content-type The text/enriched MIME Content-type
Status of this Memo Status of this Memo
This document is an Internet-Draft. Internet-Drafts are working This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas, documents of the Internet Engineering Task Force (IETF), its areas, and
and its working groups. Note that other groups may also distribute its working groups. Note that other groups may also distribute working
working documents as Internet-Drafts. documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six Internet-Drafts are draft documents valid for a maximum of six months
months and may be updated, replaced, or obsoleted by other documents and may be updated, replaced, or obsoleted by other documents at any
at any time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference material
material or to cite them other than as "work in progress." or to cite them other than as "work in progress."
To learn the current status of any Internet-Draft, please check the To learn the current status of any Internet-Draft, please check the
"1id-abstracts.txt" listing contained in the Internet-Drafts Shadow "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe), Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
ftp.isi.edu (US West Coast). ftp.isi.edu (US West Coast).
Abstract Abstract
MIME [RFC-1521] defines a format and general framework for the MIME [RFC-1521] defines a format and general framework for the
representation of a wide variety of data types in Internet mail. representation of a wide variety of data types in Internet mail. This
This document defines one particular type of MIME data, the document defines one particular type of MIME data, the text/enriched
text/enriched MIME type. The text/enriched MIME type is intended to MIME type. The text/enriched MIME type is intended to facilitate the
facilitate the wider interoperation of simple enriched text across a wider interoperation of simple enriched text across a wide variety of
wide variety of hardware and software platforms. This document is hardware and software platforms. This document is only a minor revision
only a minor revision to the text/enriched MIME type that was first to the text/enriched MIME type that was first described in [RFC-1523]
described in [RFC-1523] and [RFC-1563], and is only intended to be and [RFC-1563], and is only intended to be used in the short term until
used in the short term until other MIME types for text formatting in other MIME types for text formatting in Internet mail are developed and
Internet mail are developed and deployed. deployed.
The text/enriched MIME type The text/enriched MIME type
In order to promote the wider interoperability of simple formatted In order to promote the wider interoperability of simple formatted text,
text, this document defines an extremely simple subtype of the MIME this document defines an extremely simple subtype of the MIME
content-type "text", the "text/enriched" subtype. The content-type content-type "text", the "text/enriched" subtype. The content-type line
line for this type may have one optional parameter, the "charset" for this type may have one optional parameter, the "charset" parameter,
parameter, with the same values permitted for the "text/plain" MIME with the same values permitted for the "text/plain" MIME content-type.
content-type.
The text/enriched subtype was designed to meet the following The text/enriched subtype was designed to meet the following criteria:
criteria:
1. The syntax must be extremely simple to parse, so that even 1. The syntax must be extremely simple to parse, so that even
teletype-oriented mail systems can easily strip away the teletype-oriented mail systems can easily strip away the formatting
formatting information and leave only the readable text. information and leave only the readable text.
2. The syntax must be extensible to allow for new formatting 2. The syntax must be extensible to allow for new formatting commands
commands that are deemed essential for some application. that are deemed essential for some application.
3. If the character set in use is ASCII or an 8-bit ASCII 3. If the character set in use is ASCII or an 8-bit ASCII superset,
superset, then the raw form of the data must be readable enough then the raw form of the data must be readable enough to be largely
to be largely unobjectionable in the event that it is displayed unobjectionable in the event that it is displayed on the screen of
on the screen of the user of a non-MIME-conformant mail reader. the user of a non-MIME-conformant mail reader.
4. The capabilities must be extremely limited, to ensure that it 4. The capabilities must be extremely limited, to ensure that it can
can represent no more than is likely to be representable by the represent no more than is likely to be representable by the user's
user's primary word processor. While this limits what can be primary word processor. While this limits what can be sent, it
sent, it increases the likelihood that what is sent can be increases the likelihood that what is sent can be properly
properly displayed. displayed.
There are other text formatting standards which meet some of these There are other text formatting standards which meet some of these
criteria. In particular, HTML and SGML have come into widespread use criteria. In particular, HTML and SGML have come into widespread use on
on the Internet. However, there are two important reasons that this the Internet. However, there are two important reasons that this
document further promotes the use of text/enriched in Internet mail document further promotes the use of text/enriched in Internet mail over
over other such standards: other such standards:
1. Most MIME-aware Internet mail applications are already able to 1. Most MIME-aware Internet mail applications are already able to
either properly format text/enriched mail or, at the very either properly format text/enriched mail or, at the very least,
least, are able to strip out the formatting commands and are able to strip out the formatting commands and display the
display the readable text. The same is not true for HTML or readable text. The same is not true for HTML or SGML.
SGML.
2. The current RFC on HTML [RFC-1866] and Internet Drafts on SGML 2. The current RFC on HTML [RFC-1866] and Internet Drafts on SGML have
have many features which are not necessary for Internet mail, many features which are not necessary for Internet mail, and are
and are missing a few capabilities that text/enriched already missing a few capabilities that text/enriched already has.
has.
For these reasons, this document is promoting the use of For these reasons, this document is promoting the use of text/enriched
text/enriched until other Internet standards come into more until other Internet standards come into more widespread use. For those
widespread use. For those who will want to use HTML, Appendix B of who will want to use HTML, Appendix B of this document contains a very
this document contains a very simple C program that converts simple C program that converts text/enriched to HTML 2.0 described in
text/enriched to HTML 2.0 described in [RFC-1866]. [RFC-1866].
Syntax Syntax
The syntax of "text/enriched" is very simple. It represents text in The syntax of "text/enriched" is very simple. It represents text in a
a single character set--US-ASCII by default, although a different single character set--US-ASCII by default, although a different
character set can be specified by the use of the "charset" character set can be specified by the use of the "charset" parameter.
parameter. (The semantics of text/enriched in non-ASCII character (The semantics of text/enriched in non-ASCII character sets are
sets are discussed later in this document.) All characters represent discussed later in this document.) All characters represent themselves,
themselves, with the exception of the "<" character (ASCII 60), with the exception of the "<" character (ASCII 60), which is used to
which is used to mark the beginning of a formatting command. A mark the beginning of a formatting command. A literal less-than sign
literal less-than sign ("<") can be represented by a sequence of two ("<") can be represented by a sequence of two such characters, "<<".
such characters, "<<".
Formatting instructions consist of formatting commands surrounded by Formatting instructions consist of formatting commands surrounded by
angle brackets ("<>", ASCII 60 and 62). Each formatting command may angle brackets ("<>", ASCII 60 and 62). Each formatting command may be
be no more than 60 characters in length, all in US-ASCII, restricted no more than 60 characters in length, all in US-ASCII, restricted to the
to the alphanumeric and hyphen ("-") characters. Formatting commands alphanumeric and hyphen ("-") characters. Formatting commands may be
may be preceded by a solidus ("/", ASCII 47), making them negations, preceded by a solidus ("/", ASCII 47), making them negations, and such
and such negations must always exist to balance the initial opening negations must always exist to balance the initial opening commands.
commands. Thus, if the formatting command "<bold>" appears at some Thus, if the formatting command "<bold>" appears at some point, there
point, there must later be a "</bold>" to balance it. (NOTE: The 60 must later be a "</bold>" to balance it. (NOTE: The 60 character limit
character limit on formatting commands does NOT include the "<", on formatting commands does NOT include the "<", ">", or "/" characters
">", or "/" characters that might be attached to such commands.) that might be attached to such commands.)
Line break rules Line break rules
Line breaks (CRLF pairs in standard network representation) are Line breaks (CRLF pairs in standard network representation) are handled
handled specially. In particular, isolated CRLF pairs are translated specially. In particular, isolated CRLF pairs are translated into a
into a single SPACE character. Sequences of N consecutive CRLF single SPACE character. Sequences of N consecutive CRLF pairs, however,
pairs, however, are translated into N-1 actual line breaks. This are translated into N-1 actual line breaks. This permits long lines of
permits long lines of data to be represented in a natural looking data to be represented in a natural looking manner despite the frequency
manner despite the frequency of line-wrapping in Internet mailers. of line-wrapping in Internet mailers. When preparing the data for mail
When preparing the data for mail transport, isolated line breaks transport, isolated line breaks should be inserted wherever necessary to
should be inserted wherever necessary to keep each line shorter than keep each line shorter than 80 characters. When preparing such data for
80 characters. When preparing such data for presentation to the presentation to the user, isolated line breaks should be replaced by a
user, isolated line breaks should be replaced by a single SPACE single SPACE character, and N consecutive CRLF pairs should be presented
character, and N consecutive CRLF pairs should be presented to the to the user as N-1 line breaks.
user as N-1 line breaks.
Thus text/enriched data that looks like this: Thus text/enriched data that looks like this:
This is This is
a single a single
line line
This is the This is the
next line. next line.
skipping to change at line 155 skipping to change at line 151
This is a single line This is a single line
This is the next line. This is the next line.
This is the next section. This is the next section.
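The line break rules above can be illustrated with a short C routine. This is only a sketch, not part of either draft: the function name "fill_line_breaks" is hypothetical, line breaks are written as '\n' (an assumption about the local convention), and formatting commands are not interpreted.

```c
#include <stddef.h>

/*
 * Sketch of the fill rule: an isolated CRLF becomes one SPACE, and a run
 * of N > 1 consecutive CRLFs becomes N-1 line breaks (written as '\n',
 * an assumption about the local line-break convention).
 * Returns the output length, or (size_t)-1 if 'out' (capacity 'outsz',
 * including the terminating NUL) is too small.
 */
size_t fill_line_breaks(const char *in, char *out, size_t outsz)
{
    size_t o = 0;

    while (*in != '\0') {
        size_t run = 0;

        /* Count how many CRLF pairs appear back to back. */
        while (in[0] == '\r' && in[1] == '\n') {
            run++;
            in += 2;
        }
        if (run == 1) {
            if (o + 1 >= outsz) return (size_t)-1;
            out[o++] = ' ';
        } else if (run > 1) {
            if (o + (run - 1) >= outsz) return (size_t)-1;
            for (size_t k = 0; k + 1 < run; k++)
                out[o++] = '\n';
        } else {
            /* Ordinary character (including a bare CR): copy it through. */
            if (o + 1 >= outsz) return (size_t)-1;
            out[o++] = *in++;
        }
    }
    if (outsz == 0) return (size_t)-1;
    out[o] = '\0';
    return o;
}
```

A caller would pass the body text in network form and a buffer at least as large as the input; the output is never longer than the input.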
The formatting commands, not all of which will be implemented by all The formatting commands, not all of which will be implemented by all
implementations, are described in the following sections. implementations, are described in the following sections.
Formatting Commands Formatting Commands
The text/enriched formatting commands all begin with <commandname> The text/enriched formatting commands all begin with <commandname> and
and end with </commandname>, affecting the formatting of the text end with </commandname>, affecting the formatting of the text between
between those two tokens. The commands are described here, grouped those two tokens. The commands are described here, grouped according to
according to type. type.
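As an illustration of the command syntax (a name of at most 60 alphanumeric or hyphen characters, optionally preceded by "/", and closed by ">"), a token recognizer might look like the following C sketch. The helper name "parse_command" and its calling convention are hypothetical; the caller is assumed to have already consumed a "<" that is not part of a "<<" pair.

```c
#include <ctype.h>
#include <stddef.h>

#define MAX_COMMAND_LEN 60  /* limit excludes "<", ">", and "/" */

/*
 * Hypothetical helper: 's' points just past a "<" that is not part of "<<".
 * On success, copies the command name into 'name' (which must hold at
 * least MAX_COMMAND_LEN + 1 bytes), sets *is_negation if a "/" preceded
 * the name, and returns the number of characters consumed, including the
 * closing ">". Returns 0 if the text is not a well-formed command.
 */
size_t parse_command(const char *s, char *name, int *is_negation)
{
    size_t i = 0, n = 0;

    *is_negation = 0;
    if (s[i] == '/') {
        *is_negation = 1;
        i++;
    }
    while (isalnum((unsigned char)s[i]) || s[i] == '-') {
        if (n == MAX_COMMAND_LEN)
            return 0;           /* name longer than 60 characters */
        name[n++] = s[i++];
    }
    name[n] = '\0';
    if (n == 0 || s[i] != '>')
        return 0;
    return i + 1;
}
```

A return value of 0 leaves the decision about malformed input to the caller; this sketch does not prescribe any recovery behavior.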
Parameter Command Parameter Command
Some of the formatting commands may require one or more associated Some of the formatting commands may require one or more associated
parameters. The "param" command is a special formatting command used parameters. The "param" command is a special formatting command used to
to include these parameters. include these parameters.
Param Param
Marks the affected text as command parameters, to be Marks the affected text as command parameters, to be
interpreted or ignored by the text/enriched interpreted or ignored by the text/enriched interpreter,
interpreter, but not to be shown to the reader. The but not to be shown to the reader. The "param" command
"param" command always immediately follows some other always immediately follows some other formatting command,
formatting command, and the parameter data indicates and the parameter data indicates some additional
some additional information about the formatting that information about the formatting that is to be done. The
is to be done. The syntax of the parameter data syntax of the parameter data (whatever appears between
(whatever appears between the initial "<param>" and the initial "<param>" and the terminating "</param>") is
the terminating "</param>") is defined for each defined for each command that uses it. However, it is
command that uses it. However, it is always required always required that the format of such data must not
that the format of such data must not contain nested contain nested "param" commands, and either must not use
"param" commands, and either must not use the "<" the "<" character or must use it in a way that is
character or must use it in a way that is compatible compatible with text/enriched parsing. That is, the end
with text/enriched parsing. That is, the end of the of the parameter data should be recognizable with either
parameter data should be recognizable with either of of two algorithms: simply searching for the first
two algorithms: simply searching for the first
occurrence of "</param>" or parsing until a balanced occurrence of "</param>" or parsing until a balanced
"</param>" command is found. In either case, however, "</param>" command is found. In either case, however, the
the parameter data should not be shown to the human parameter data should not be shown to the human reader.
reader.
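The first of the two algorithms mentioned above, searching for the first occurrence of "</param>", can be sketched in C as follows. The helper name "extract_param" is hypothetical, and the sketch assumes the command is spelled in lowercase; it is an illustration only.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: 'data' points just past an opening "<param>".
 * Finds the end of the parameter data with a plain search for the first
 * "</param>". Copies the data into 'buf' (capacity 'bufsz') and returns a
 * pointer to the text following "</param>", or NULL if the terminator is
 * missing or the data does not fit.
 * Assumption: the command is spelled in lowercase.
 */
const char *extract_param(const char *data, char *buf, size_t bufsz)
{
    static const char closer[] = "</param>";
    const char *end = strstr(data, closer);
    size_t len;

    if (end == NULL)
        return NULL;
    len = (size_t)(end - data);
    if (len >= bufsz)
        return NULL;
    memcpy(buf, data, len);
    buf[len] = '\0';
    return end + (sizeof closer - 1);
}
```

The extracted data is handed to whichever command the "param" follows and is never written to the display.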
Font-Alteration Commands Font-Alteration Commands
The following formatting commands are intended to alter the font in The following formatting commands are intended to alter the font in
which text is displayed, but not to alter the indentation or which text is displayed, but not to alter the indentation or
justification state of the text: justification state of the text:
Bold Bold
causes the affected text to be in a bold font. Nested causes the affected text to be in a bold font. Nested
bold commands have the same effect as a single bold bold commands have the same effect as a single bold
command. command.
Italic Italic
causes the affected text to be in an italic font. causes the affected text to be in an italic font. Nested
Nested italic commands have the same effect as a italic commands have the same effect as a single italic
single italic command. command.
Underline Underline
causes the affected text to be underlined. Nested causes the affected text to be underlined. Nested
underline commands have the same effect as a single underline commands have the same effect as a single
underline command. underline command.
Fixed Fixed
causes the affected text to be in a fixed width font. causes the affected text to be in a fixed width font.
Nested fixed commands have the same effect as a Nested fixed commands have the same effect as a single
single fixed command. fixed command.
FontFamily FontFamily
causes the affected text to be displayed in a causes the affected text to be displayed in a specified
specified typeface. The "fontfamily" command requires typeface. The "fontfamily" command requires a parameter
a parameter that is specified by using the "param" that is specified by using the "param" command. The
command. The parameter data is a case-insensitive parameter data is a case-insensitive string containing
string containing the name of a font family. Any the name of a font family. Any currently available font
currently available font family name (e.g. Times, family name (e.g. Times, Palatino, Courier, etc.) may be
Palatino, Courier, etc.) may be used. This includes used. This includes font families defined by commercial
font families defined by commercial type foundries type foundries such as Adobe, BitStream, or any other
such as Adobe, BitStream, or any other such foundry. such foundry. Note that implementations should only use
Note that implementations should only use the general the general font family name, not the specific font name
font family name, not the specific font name (e.g. (e.g. use "Times", not "TimesRoman" nor
use "Times", not "TimesRoman" nor "TimesBoldItalic"). "TimesBoldItalic"). When nested, the inner "fontfamily"
When nested, the inner "fontfamily" command takes command takes precedence. Also note that the "fontfamily"
precedence. Also note that the "fontfamily" command command is advisory only; it should not be expected that
is advisory only; it should not be expected that other implementations will honor the typeface information
other implementations will honor the typeface in this command since the font capabilities of systems
information in this command since the font vary drastically.
capabilities of systems vary drastically.
Color Color
causes the affected text to be displayed in a causes the affected text to be displayed in a specified
specified color. The "color" command requires a color. The "color" command requires a parameter that is
parameter that is specified by using the "param" specified by using the "param" command. The parameter
command. The parameter data can be one of the data can be one of the following:
following:
red red
blue blue
green green
yellow yellow
cyan cyan
magenta magenta
black black
white white
or an RGB color value in the form: or an RGB color value in the form:
####,####,#### ####,####,####
where '#' is a hexadecimal digit '0' through '9', 'A' where '#' is a hexadecimal digit '0' through '9', 'A'
through 'F', or 'a' through 'f'. The three 4-digit through 'F', or 'a' through 'f'. The three 4-digit
hexadecimal values are the RGB values for red, green, hexadecimal values are the RGB values for red, green, and
and blue respectively, where each component is blue respectively, where each component is expressed as
expressed as an unsigned value between 0 (0000) and an unsigned value between 0 (0000) and 65535 (FFFF). The
65535 (FFFF). The default color for the message is default color for the message is unspecified, though
unspecified, though black is a common choice in many black is a common choice in many environments. When
environments. When nested, the inner "color" command nested, the inner "color" command takes precedence.
takes precedence.
Smaller Smaller
causes the affected text to be in a smaller font. It causes the affected text to be in a smaller font. It is
is recommended that the font size be changed by two recommended that the font size be changed by two points,
points, but other amounts may be more appropriate in but other amounts may be more appropriate in some
some environments. Nested smaller commands produce environments. Nested smaller commands produce ever
ever smaller fonts, to the limits of the smaller fonts, to the limits of the implementation's
implementation's capacity to reasonably display them, capacity to reasonably display them, after which further
after which further smaller commands have no smaller commands have no incremental effect.
incremental effect.
Bigger Bigger
causes the affected text to be in a bigger font. It causes the affected text to be in a bigger font. It is
is recommended that the font size be changed by two recommended that the font size be changed by two points,
points, but other amounts may be more appropriate in but other amounts may be more appropriate in some
some environments. Nested bigger commands produce environments. Nested bigger commands produce ever bigger
ever bigger fonts, to the limits of the fonts, to the limits of the implementation's capacity to
implementation's capacity to reasonably display them, reasonably display them, after which further bigger
after which further bigger commands have no commands have no incremental effect.
incremental effect.
While the "bigger" and "smaller" operators are effectively inverses, While the "bigger" and "smaller" operators are effectively inverses, it
it is not recommended, for example, that "<smaller>" be used to end is not recommended, for example, that "<smaller>" be used to end the
the effect of "<bigger>". This is properly done with "</bigger>". effect of "<bigger>". This is properly done with "</bigger>".
Since the capabilities of implementations will vary, it is to be Since the capabilities of implementations will vary, it is to be
expected that some implementations will not be able to act on some expected that some implementations will not be able to act on some of
of the font-alteration commands. However, an implementation should the font-alteration commands. However, an implementation should still
still display the text to the user in a reasonable fashion. In display the text to the user in a reasonable fashion. In particular, the
particular, the lack of capability to display a particular font lack of capability to display a particular font family, color, or other
family, color, or other text attribute does not mean that an text attribute does not mean that an implementation should fail to
implementation should fail to display text. display text.
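The parameter data of the "color" command, as described above, could be parsed along the following lines. This is a sketch under stated assumptions: the function name "parse_color" is hypothetical, color names are matched in lowercase only, and the RGB values assigned to the eight names are this sketch's own choice, since the text does not specify them.

```c
#include <ctype.h>
#include <string.h>

/*
 * Sketch: parse "color" parameter data into three 16-bit components.
 * Returns 0 on success, -1 if the data is neither a listed color name
 * nor of the form ####,####,####. 'rgb' is written only on success.
 * Assumptions: names are matched in lowercase only, and the values given
 * to the names (full-intensity primaries) are chosen for illustration.
 */
int parse_color(const char *s, unsigned rgb[3])
{
    static const struct { const char *name; unsigned r, g, b; } named[] = {
        { "red",     0xFFFF, 0x0000, 0x0000 },
        { "blue",    0x0000, 0x0000, 0xFFFF },
        { "green",   0x0000, 0xFFFF, 0x0000 },
        { "yellow",  0xFFFF, 0xFFFF, 0x0000 },
        { "cyan",    0x0000, 0xFFFF, 0xFFFF },
        { "magenta", 0xFFFF, 0x0000, 0xFFFF },
        { "black",   0x0000, 0x0000, 0x0000 },
        { "white",   0xFFFF, 0xFFFF, 0xFFFF },
    };
    unsigned tmp[3];
    size_t i;
    int c, d;

    for (i = 0; i < sizeof named / sizeof named[0]; i++) {
        if (strcmp(s, named[i].name) == 0) {
            rgb[0] = named[i].r;
            rgb[1] = named[i].g;
            rgb[2] = named[i].b;
            return 0;
        }
    }
    /* Otherwise expect exactly "####,####,####" (14 characters). */
    if (strlen(s) != 14 || s[4] != ',' || s[9] != ',')
        return -1;
    for (c = 0; c < 3; c++) {
        unsigned v = 0;
        for (d = 0; d < 4; d++) {
            int ch = (unsigned char)s[c * 5 + d];
            if (!isxdigit(ch))
                return -1;
            v = v * 16 + (unsigned)(isdigit(ch) ? ch - '0'
                                                : tolower(ch) - 'a' + 10);
        }
        tmp[c] = v;
    }
    rgb[0] = tmp[0];
    rgb[1] = tmp[1];
    rgb[2] = tmp[2];
    return 0;
}
```

An implementation that cannot display the resulting color would simply ignore the value and display the text anyway, in keeping with the paragraph above.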
Fill/Justification/Indentation Commands Fill/Justification/Indentation Commands
Initially, text/enriched text is intended to be displayed fully Initially, text/enriched text is intended to be displayed fully filled
filled (that is, using the rules specified for replacing CRLF pairs (that is, using the rules specified for replacing CRLF pairs with spaces
with spaces or removing them as appropriate) with appropriate or removing them as appropriate) with appropriate kerning and
kerning and letter-tracking, and using the maximum available margins letter-tracking, and using the maximum available margins as suits the
as suits the capabilities of the receiving user agent software. capabilities of the receiving user agent software.
The following commands alter that state. Each of these commands The following commands alter that state. Each of these commands forces a
forces a line break before and after the formatting environment if line break before and after the formatting environment if there is not
there is not otherwise a line break. For example, if one of these otherwise a line break. For example, if one of these commands occurs
commands occurs anywhere other than the beginning of a line of text anywhere other than the beginning of a line of text as presented, a new
as presented, a new line is begun. line is begun.
Center Center
causes the affected text to be centered. causes the affected text to be centered.
FlushLeft FlushLeft
causes the affected text to be left-justified with a causes the affected text to be left-justified with a
ragged right margin. ragged right margin.
FlushRight FlushRight
causes the affected text to be right-justified with a causes the affected text to be right-justified with a
ragged left margin. ragged left margin.
FlushBoth FlushBoth
causes the affected text to be filled and padded so causes the affected text to be filled and padded so as to
as to create smooth left and right margins, i.e., to create smooth left and right margins, i.e., to be fully
be fully justified. justified.
ParaIndent ParaIndent
causes the running margins of the affected text to be causes the running margins of the affected text to be
moved in. The recommended indentation change is the moved in. The recommended indentation change is the width
width of four characters, but this may differ among of four characters, but this may differ among
implementations. The "paraindent" command requires a implementations. The "paraindent" command requires a
parameter that is specified by using the "param" parameter that is specified by using the "param" command.
command. The parameter data is a comma-separated list The parameter data is a comma-separated list of one or
of one or more of the following: more of the following:
Left Left
causes the running left margin to be moved to causes the running left margin to be moved to the
the right. right.
Right Right
causes the running right margin to be moved to causes the running right margin to be moved to the
the left. left.
In In
causes the first line of the affected paragraph causes the first line of the affected paragraph to
to be indented in addition to the running be indented in addition to the running margin. The
margin. The remaining lines remain flush to the remaining lines remain flush to the running margin.
running margin.
Out Out
causes all lines except for the first line of causes all lines except for the first line of the
the affected paragraph to be indented in affected paragraph to be indented in addition to the
addition to the running margin. The first line running margin. The first line remains flush to the
remains flush to the running margin. running margin.
Nofill Nofill
causes the affected text to be displayed without causes the affected text to be displayed without filling.
filling. That is, the text is displayed without using That is, the text is displayed without using the rules
the rules for replacing CRLF pairs with spaces or for replacing CRLF pairs with spaces or removing
removing consecutive sequences of CRLF pairs. consecutive sequences of CRLF pairs. However, the current
However, the current state of the margins and state of the margins and justification is honored; any
justification is honored; any indentation or indentation or justification commands are still applied
justification commands are still applied to the text to the text within the scope of the "nofill".
within the scope of the "nofill".
The "center", "flushleft", "flushright", and "flushboth" commands The "center", "flushleft", "flushright", and "flushboth" commands are
are mutually exclusive, and, when nested, the inner command takes mutually exclusive, and, when nested, the inner command takes
precedence. precedence.
The "nofill" command is mutually exclusive with the "in" and "out" The "nofill" command is mutually exclusive with the "in" and "out"
parameters of the "paraindent" command; when they occur in the same parameters of the "paraindent" command; when they occur in the same
scope, their behavior is undefined. scope, their behavior is undefined.
The parameter data for the "paraindent" command may contain multiple The parameter data for the "paraindent" command may contain multiple
occurrences of the same parameter (i.e. "left", "right", "in", or occurrences of the same parameter (i.e. "left", "right", "in", or "out").
"out"). Each occurrence causes the text to be further indented in the Each occurrence causes the text to be further indented in the manner
manner indicated by that parameter. Nested "paraindent" commands indicated by that parameter. Nested "paraindent" commands cause the
cause the affected text to be further indented according to the affected text to be further indented according to the parameters. Note
parameters. Note that the "in" and "out" parameters for "paraindent" that the "in" and "out" parameters for "paraindent" are mutually
are mutually exclusive; when they appear together or when nested exclusive; when they appear together or when nested "paraindent"
"paraindent" commands contain both of them, their behavior is commands contain both of them, their behavior is undefined.
undefined.
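Since the "paraindent" parameter data is a comma-separated list in which each parameter may occur more than once, one way to process it is to count the occurrences of each parameter. The following C sketch does this. The names "indent_counts" and "parse_paraindent" are hypothetical, and the case-insensitive matching and tolerance of spaces around items are assumptions of the sketch rather than requirements of the text.

```c
#include <ctype.h>
#include <stddef.h>

struct indent_counts { int left, right, in, out; };

/* Case-insensitive match of the token tok[0..len) against a lowercase word. */
static int token_is(const char *tok, size_t len, const char *word)
{
    size_t i;

    for (i = 0; i < len; i++)
        if (word[i] == '\0' || tolower((unsigned char)tok[i]) != word[i])
            return 0;
    return word[len] == '\0';
}

/*
 * Sketch: count occurrences of each "paraindent" parameter.
 * Returns 0 on success, -1 on an unknown or empty item.
 * Assumptions: matching is case-insensitive and spaces around items are
 * ignored. "in" together with "out" is left for the caller to handle,
 * since the behavior of that combination is undefined.
 */
int parse_paraindent(const char *s, struct indent_counts *c)
{
    c->left = c->right = c->in = c->out = 0;
    for (;;) {
        const char *tok;
        size_t len;

        while (*s == ' ') s++;
        tok = s;
        while (*s != '\0' && *s != ',' && *s != ' ') s++;
        len = (size_t)(s - tok);
        while (*s == ' ') s++;

        if (token_is(tok, len, "left"))       c->left++;
        else if (token_is(tok, len, "right")) c->right++;
        else if (token_is(tok, len, "in"))    c->in++;
        else if (token_is(tok, len, "out"))   c->out++;
        else return -1;

        if (*s == '\0') return 0;   /* end of the list */
        if (*s != ',')  return -1;  /* items must be separated by commas */
        s++;
    }
}
```

With the recommended indentation of four characters per occurrence, parameter data of "left,left,right" would move the running left margin in by eight characters and the running right margin in by four.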
For purposes of the "in" and "out" parameters, a paragraph is For purposes of the "in" and "out" parameters, a paragraph is defined as
defined as text that is delimited by line breaks after applying the text that is delimited by line breaks after applying the rules for
rules for replacing CRLF pairs with spaces or removing consecutive replacing CRLF pairs with spaces or removing consecutive sequences of
sequences of CRLF pairs. For example, within the scope of an "out", CRLF pairs. For example, within the scope of an "out", the line
the line following each CRLF is made flush with the running margin, following each CRLF is made flush with the running margin, and
and subsequent lines are indented. Within the scope of an "in", the subsequent lines are indented. Within the scope of an "in", the first
first line following each CRLF is indented, and subsequent lines line following each CRLF is indented, and subsequent lines remain flush
remain flush to the running margin. to the running margin.
Whether or not text is justified by default (that is, whether the Whether or not text is justified by default (that is, whether the
default environment is "flushleft", "flushright", or "flushboth") is default environment is "flushleft", "flushright", or "flushboth") is
unspecified, and depends on the preferences of the user, the unspecified, and depends on the preferences of the user, the
capabilities of the local software and hardware, and the nature of capabilities of the local software and hardware, and the nature of the
the character set in use. On systems where full justification is character set in use. On systems where full justification is considered
considered undesirable, the "flushboth" environment may be identical undesirable, the "flushboth" environment may be identical to the default
to the default environment. Note that full justification should environment. Note that full justification should never be performed
never be performed inside of "center", "flushleft", "flushright", or inside of "center", "flushleft", "flushright", or "nofill" environments.
"nofill" environments. Note also that for some non-ASCII character Note also that for some non-ASCII character sets, full justification may
sets, full justification may be fundamentally inappropriate. be fundamentally inappropriate.
Note that [RFC-1563] defined two additional indentation commands, Note that [RFC-1563] defined two additional indentation commands,
"Indent" and "IndentRight". These commands did not force a line "Indent" and "IndentRight". These commands did not force a line break,
break, and therefore their behavior was unpredictable since they and therefore their behavior was unpredictable since they depended on
depended on the margins and character sizes that a particular the margins and character sizes that a particular implementation used.
implementation used. Therefore, their use is deprecated and they Therefore, their use is deprecated and they should be ignored just like
should be ignored just like other unrecognized commands. other unrecognized commands.
Markup Commands Markup Commands
Commands in this section, unlike the other text/enriched commands, Commands in this section, unlike the other text/enriched commands, are
are declarative markup commands. Text/enriched is not intended as a declarative markup commands. Text/enriched is not intended as a full
full markup language, but instead as a simple way to represent markup language, but instead as a simple way to represent common
common formatting commands. Therefore, markup commands are purposely formatting commands. Therefore, markup commands are purposely kept to a
kept to a minimum. It is only because each was deemed so prevalent minimum. It is only because each was deemed so prevalent or necessary in
or necessary in an e-mail environment that these particular commands an e-mail environment that these particular commands have been included
have been included at all. at all.
Excerpt Excerpt
causes the affected text to be interpreted as a causes the affected text to be interpreted as a textual
textual excerpt from another source, probably a excerpt from another source, probably a message being
message being responded to. Typically this will be responded to. Typically this will be displayed using
displayed using indentation and an alternate font, or indentation and an alternate font, or by indenting lines
by indenting lines and preceding them with "> ", but and preceding them with "> ", but such decisions are up
such decisions are up to the implementation. Note to the implementation. Note that as with the
that as with the justification commands, the excerpt justification commands, the excerpt command implicitly
command implicitly begins and ends with a line break begins and ends with a line break if one is not already
if one is not already there. Nested "excerpt" there. Nested "excerpt" commands are acceptable and
commands are acceptable and should be interpreted as should be interpreted as meaning that the excerpted text
meaning that the excerpted text was excerpted from was excerpted from yet another source. Again, this can be
yet another source. Again, this can be displayed displayed using additional indentation, different colors,
using additional indentation, different colors, etc. etc.
Optionally, the "excerpt" command can take a Optionally, the "excerpt" command can take a parameter by
parameter by using the "param" command. The format of using the "param" command. The format of the data is
the data is unspecified, but it is intended to unspecified, but it is intended to uniquely identify the
uniquely identify the text from which the excerpt is text from which the excerpt is taken. With this
taken. With this information, an implementation information, an implementation should be able to uniquely
should be able to uniquely identify the source of any identify the source of any particular excerpt, especially
particular excerpt, especially if two or more if two or more excerpts in the message are from the same
excerpts in the message are from the same source, and source, and display it in some way that makes this
display it in some way that makes this apparent to apparent to the user.
the user.
Lang Lang
causes the affected text to be interpreted as causes the affected text to be interpreted as belonging
belonging to a particular language. This is most to a particular language. This is most useful when two
useful when two different languages use the same different languages use the same character set, but may
character set, but may require a different font or require a different font or formatting depending on the
formatting depending on the language. For instance, language. For instance, Chinese and Japanese share
Chinese and Japanese share similar character glyphs, similar character glyphs, and in some character sets like
and in some character sets like UNICODE share common UNICODE share common code points, but it is considered
code points, but it is considered very important that very important that different fonts be used for the two
different fonts be used for the two languages, languages, especially if they appear together, so that
especially if they appear together, so that meaning meaning is not lost. Also, language information can be
is not lost. Also, language information can be used used to allow for fancier text handling, like spell
to allow for fancier text handling, like spell
checking or hyphenation. checking or hyphenation.
The "lang" command requires a parameter using the The "lang" command requires a parameter using the "param"
"param" command. The parameter data can be any of the command. The parameter data can be any of the language
language tags specified in [RFC-1766], "Tags for the tags specified in [RFC-1766], "Tags for the
Identification of Languages". These tags are the two Identification of Languages". These tags are the two
letter language codes taken from [ISO-639] or can be letter language codes taken from [ISO-639] or can be
other language codes that are registered according to other language codes that are registered according to the
the instructions in the Language Tags RFC. Consult instructions in the Language Tags RFC. Consult that memo
that memo for further information. for further information.
Balancing and Nesting of Formatting Commands Balancing and Nesting of Formatting Commands
Pairs of formatting commands must be properly balanced and nested. Pairs of formatting commands must be properly balanced and nested. Thus,
Thus, a proper way to describe text in bold italics is: a proper way to describe text in bold italics is:
<bold><italic>the-text</italic></bold> <bold><italic>the-text</italic></bold>
or, alternately, or, alternately,
<italic><bold>the-text</bold></italic> <italic><bold>the-text</bold></italic>
but, in particular, the following is illegal text/enriched: but, in particular, the following is illegal text/enriched:
<bold><italic>the-text</bold></italic> <bold><italic>the-text</bold></italic>
The nesting requirement for formatting commands imposes a slightly The nesting requirement for formatting commands imposes a slightly
higher burden upon the composers of text/enriched bodies, but higher burden upon the composers of text/enriched bodies, but
potentially simplifies text/enriched displayers by allowing them to potentially simplifies text/enriched displayers by allowing them to be
be stack-based. The main goal of text/enriched is to be simple stack-based. The main goal of text/enriched is to be simple enough to
enough to make multifont, formatted email widely readable, so that make multifont, formatted email widely readable, so that those with the
those with the capability of sending it will be able to do so with capability of sending it will be able to do so with confidence. Thus
confidence. Thus slightly increased complexity in the composing slightly increased complexity in the composing software was deemed a
software was deemed a reasonable tradeoff for simplified reading reasonable tradeoff for simplified reading software. Nonetheless,
software. Nonetheless, implementors of text/enriched readers are implementors of text/enriched readers are encouraged to follow the
encouraged to follow the general Internet guidelines of being general Internet guidelines of being conservative in what you send and
conservative in what you send and liberal in what you accept. Those liberal in what you accept. Those implementations that can do so are
implementations that can do so are encouraged to deal reasonably encouraged to deal reasonably with improperly nested text/enriched data.
with improperly nested text/enriched data.
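The stack-based approach mentioned above can be sketched as follows. This is only an illustration under stated assumptions, not part of the specification: the function name is_properly_nested and the limits MAX_DEPTH and MAX_NAME are hypothetical, command names are compared without regard to case (as the translator in Appendix A does by lowercasing tokens), and the data inside <param> commands is not treated specially.

```c
#include <ctype.h>
#include <string.h>

#define MAX_DEPTH 32   /* assumed limit on nesting depth (illustrative) */
#define MAX_NAME  61   /* assumed limit on command name length, plus NUL */

/* Return 1 if every formatting command in s is properly balanced and
 * nested, and 0 otherwise. The sequence "<<" is taken as a literal "<". */
int is_properly_nested(const char *s)
{
    char stack[MAX_DEPTH][MAX_NAME];   /* names of currently open commands */
    char name[MAX_NAME];
    int depth = 0;

    while (*s != '\0') {
        if (*s++ != '<') continue;           /* ordinary text */
        if (*s == '<') { s++; continue; }    /* "<<" quotes a "<" */

        int closing = (*s == '/');
        if (closing) s++;

        /* Collect the lowercased command name up to the closing ">". */
        size_t len = 0;
        while (*s != '\0' && *s != '>') {
            if (len >= MAX_NAME - 1) return 0;   /* name too long */
            name[len++] = (char)tolower((unsigned char)*s++);
        }
        if (*s != '>' || len == 0) return 0;     /* unterminated or empty */
        s++;
        name[len] = '\0';

        if (closing) {
            /* An end command must match the most recently opened one. */
            if (depth == 0 || strcmp(stack[depth - 1], name) != 0) return 0;
            depth--;
        } else {
            if (depth == MAX_DEPTH) return 0;    /* nested too deeply */
            strcpy(stack[depth++], name);
        }
    }
    return depth == 0;   /* everything opened must have been closed */
}
```

A composer could run such a check before sending. A reader that wishes to be liberal in what it accepts would instead recover from a mismatch rather than reject the data.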
Unrecognized formatting commands Unrecognized formatting commands
Implementations must regard any unrecognized formatting command as a Implementations must regard any unrecognized formatting command as a
"no-op" command, that is, as a command having no effect, thus "no-op" command, that is, as a command having no effect, thus
facilitating future extensions to "text/enriched". Private facilitating future extensions to "text/enriched". Private extensions
extensions may be defined using formatting commands that begin with may be defined using formatting commands that begin with "X-", by
"X-", by analogy to Internet mail header field names. analogy to Internet mail header field names.
In order to formally define extended commands, a new Internet In order to formally define extended commands, a new Internet document
document should be published. should be published.
White Space in Text/enriched Data White Space in Text/enriched Data
No special behavior is required for the SPACE or TAB (HT) character. No special behavior is required for the SPACE or TAB (HT) character. It
It is recommended, however, that, at least when fixed-width fonts is recommended, however, that, at least when fixed-width fonts are in
are in use, the common semantics of the TAB (HT) character should be use, the common semantics of the TAB (HT) character should be observed,
observed, namely that it moves to the next column position that is a namely that it moves to the next column position that is a multiple of
multiple of 8. (In other words, if a TAB (HT) occurs in column n, 8. (In other words, if a TAB (HT) occurs in column n, where the leftmost
where the leftmost column is column 0, then that TAB (HT) should be column is column 0, then that TAB (HT) should be replaced by 8-(n mod 8)
replaced by 8-(n mod 8) SPACE characters.) It should also be noted SPACE characters.) It should also be noted that some mail gateways are
that some mail gateways are notorious for losing (or, less commonly, notorious for losing (or, less commonly, adding) white space at the end
adding) white space at the end of lines, so reliance on SPACE or TAB of lines, so reliance on SPACE or TAB characters at the end of a line is
characters at the end of a line is not recommended. not recommended.
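The recommended TAB (HT) semantics can be sketched as a small C function. This is a minimal illustration, assuming a fixed-width font and a single line of input with no line breaks; the name expand_tabs and its buffer interface are hypothetical and not part of this memo.

```c
#include <stddef.h>

/* Expand TAB (HT) characters in one line of text to SPACE characters,
 * with tab stops at every multiple of 8 and the leftmost column
 * numbered 0. Writes at most outsize-1 characters plus a terminating
 * NUL into out (truncating if necessary) and returns the number of
 * characters written. */
size_t expand_tabs(const char *in, char *out, size_t outsize)
{
    size_t col = 0;   /* current output column, also the output index */

    if (outsize == 0) return 0;
    for (; *in != '\0' && col < outsize - 1; in++) {
        if (*in == '\t') {
            size_t pad = 8 - (col % 8);   /* 8-(n mod 8) SPACE characters */
            while (pad-- > 0 && col < outsize - 1)
                out[col++] = ' ';
        } else {
            out[col++] = *in;
        }
    }
    out[col] = '\0';
    return col;
}
```

For example, a TAB in column 1 is replaced by 7 SPACE characters, and a TAB in column 8 by 8 SPACE characters.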
Initial State of a text/enriched interpreter Initial State of a text/enriched interpreter
Text/enriched is assumed to begin with filled text in a Text/enriched is assumed to begin with filled text in a variable-width
variable-width font in a normal typeface and a size that is average font in a normal typeface and a size that is average for the current
for the current display and user. The left and right margins are display and user. The left and right margins are assumed to be maximal,
assumed to be maximal, that is, at the leftmost and rightmost that is, at the leftmost and rightmost acceptable positions.
acceptable positions.
Non-ASCII character sets Non-ASCII character sets
One of the great benefits of MIME is the ability to use different One of the great benefits of MIME is the ability to use different
varieties of non-ASCII text in messages. To use non-ASCII text in a varieties of non-ASCII text in messages. To use non-ASCII text in a
message, normally a charset parameter is specified in the message, normally a charset parameter is specified in the Content-type
Content-type line that indicates the character set being used. For line that indicates the character set being used. For purposes of this
purposes of this RFC, any legal MIME charset parameter can be used RFC, any legal MIME charset parameter can be used with the text/enriched
with the text/enriched Content-type. However, there are two Content-type. However, there are two difficulties that arise with regard
difficulties that arise with regard to the text/enriched to the text/enriched Content-type when non-ASCII text is desired. The
Content-type when non-ASCII text is desired. The first problem first problem involves difficulties that occur when the user wishes to
involves difficulties that occur when the user wishes to create text create text which would normally require multiple non-ASCII character
which would normally require multiple non-ASCII character sets in sets in the same text/enriched message. The second problem is an
the same text/enriched message. The second problem is an ambiguity ambiguity that arises because of the text/enriched use of the "<"
that arises because of the text/enriched use of the "<" character in character in formatting commands.
formatting commands.
Using multiple non-ASCII character sets Using multiple non-ASCII character sets
Normally, if a user wishes to produce text which contains characters Normally, if a user wishes to produce text which contains characters
from entirely different character sets within the same MIME message from entirely different character sets within the same MIME message (for
(for example, using Russian Cyrillic characters from ISO 8859-5 and example, using Russian Cyrillic characters from ISO 8859-5 and Hebrew
Hebrew characters from ISO 8859-8), a multipart message is used. characters from ISO 8859-8), a multipart message is used. Every time a
Every time a new character set is desired, a new MIME body part is new character set is desired, a new MIME body part is started with
started with different character sets specified in the charset different character sets specified in the charset parameter of the
parameter of the Content-type line. However, using multiple Content-type line. However, using multiple character sets this way in
character sets this way in text/enriched messages introduces text/enriched messages introduces problems. Since a change in the
problems. Since a change in the charset parameter requires a new charset parameter requires a new part, text/enriched formatting commands
part, text/enriched formatting commands used in the first part would used in the first part would not be able to apply to text that occurs in
not be able to apply to text that occurs in subsequent parts. It is subsequent parts. It is not possible for text/enriched formatting
not possible for text/enriched formatting commands to apply across commands to apply across MIME body part boundaries.
MIME body part boundaries.
[RFC-1341] attempted to get around this problem in the now obsolete [RFC-1341] attempted to get around this problem in the now obsolete
text/richtext format by introducing different character set text/richtext format by introducing different character set formatting
formatting commands like "iso-8859-5" and "us-ascii". But this, or commands like "iso-8859-5" and "us-ascii". But this, or even a more
even a more general solution along the same lines, is still general solution along the same lines, is still undesirable: It is
undesirable: It is common for a MIME application to decide, for common for a MIME application to decide, for example, what character
example, what character font resources or character lookup tables it font resources or character lookup tables it will require based on the
will require based on the information provided by the charset information provided by the charset parameter of the Content-type line,
parameter of the Content-type line, before it even begins to before it even begins to interpret or display the data in that body
interpret or display the data in that body part. By allowing the part. By allowing the text/enriched interpreter to subsequently change
text/enriched interpreter to subsequently change the character set, the character set, perhaps to one completely different from the charset
perhaps to one completely different from the charset specified in specified in the Content-type line (with potentially much different
the Content-type line (with potentially much different resource resource requirements), too much burden would be placed on the
requirements), too much burden would be placed on the text/enriched text/enriched interpreter itself.
interpreter itself.
Therefore, if multiple types of non-ASCII characters are desired in Therefore, if multiple types of non-ASCII characters are desired in a
a text/enriched document, one of the following two methods must be text/enriched document, one of the following two methods must be used:
used:
1. For cases where the different types of non-ASCII text can be 1. For cases where the different types of non-ASCII text can be
limited to their own paragraphs with distinct formatting, a limited to their own paragraphs with distinct formatting, a
multipart message can be used with each part having a multipart message can be used with each part having a Content-Type
Content-Type of text/enriched and a different charset of text/enriched and a different charset parameter. The one caveat
parameter. The one caveat to using this method is that each new to using this method is that each new part must start in the
part must start in the initial state for a text/enriched initial state for a text/enriched document. That means that all of
document. That means that all of the text/enriched commands in the text/enriched commands in the preceding part must be properly
the preceding part must be properly balanced with ending balanced with ending commands before the next text/enriched part
commands before the next text/enriched part begins. Also, each begins. Also, each text/enriched part must begin a new paragraph.
text/enriched part must begin a new paragraph.
2. If different types of non-ASCII text are to appear in the same 2. If different types of non-ASCII text are to appear in the same line
line or paragraph, or if text/enriched formatting (e.g. or paragraph, or if text/enriched formatting (e.g. margins,
margins, typeface, justification) is required across several typeface, justification) is required across several different types
different types of non-ASCII text, a single text/enriched body of non-ASCII text, a single text/enriched body part should be used
part should be used with a character set specified that with a character set specified that contains all of the required
contains all of the required characters. For example, a charset characters. For example, a charset parameter of "UNICODE-1-1-UTF-7"
parameter of "UNICODE-1-1-UTF-7" as specified in [RFC-1642] as specified in [RFC-1642] could be used for such purposes. Not
could be used for such purposes. Not only does UNICODE contain only does UNICODE contain all of the characters that can be
all of the characters that can be represented in all of the represented in all of the other registered ISO 8859 MIME character
other registered ISO 8859 MIME character sets, but UTF-7 is sets, but UTF-7 is fully compatible with other aspects of the
fully compatible with other aspects of the text/enriched text/enriched standard, including the use of the "<" character
standard, including the use of the "<" character referred to referred to below. Any other character sets that are specified for
below. Any other character sets that are specified for use in use in MIME which contain different types of non-ASCII text can
MIME which contain different types of non-ASCII text can also also be used in these instances.
be used in these instances.
Use of the "<" character in formatting commands Use of the "<" character in formatting commands
If the character set specified by the charset parameter on the If the character set specified by the charset parameter on the
Content-type line is anything other than "US-ASCII", this means Content-type line is anything other than "US-ASCII", this means that
that the text being described by text/enriched formatting commands the text being described by text/enriched formatting commands is in a
is in a non-ASCII character set. However, the commands themselves non-ASCII character set. However, the commands themselves are still the
are still the same ASCII commands that are defined in this document. same ASCII commands that are defined in this document. This creates an
This creates an ambiguity only with reference to the "<" character, ambiguity only with reference to the "<" character, the octet with
the octet with numeric value 60. In single byte character sets, such numeric value 60. In single byte character sets, such as the ISO-8859
as the ISO-8859 family, this is not a problem; the octet 60 can be family, this is not a problem; the octet 60 can be quoted by including
quoted by including it twice, just as for ASCII. The problem is more it twice, just as for ASCII. The problem is more complicated, however,
complicated, however, in the case of multi-byte character sets, in the case of multi-byte character sets, where the octet 60 might
where the octet 60 might appear at any point in the byte sequence appear at any point in the byte sequence for any of several characters.
for any of several characters.
In practice, however, most multi-byte character sets address this In practice, however, most multi-byte character sets address this
problem internally. For example, the UNICODE character sets can use problem internally. For example, the UNICODE character sets can use the
the UTF-7 encoding which preserves all of the important ASCII UTF-7 encoding which preserves all of the important ASCII characters in
characters in their single byte form. The ISO-2022 family of their single byte form. The ISO-2022 family of character sets can use
character sets can use certain character sequences to switch back certain character sequences to switch back into ASCII at any moment.
into ASCII at any moment. Therefore it is specified that, before Therefore it is specified that, before text/enriched formatting
text/enriched formatting commands, the prevailing character set commands, the prevailing character set should be "switched back" into
should be "switched back" into ASCII, and that only those characters ASCII, and that only those characters which would be interpreted as "<"
which would be interpreted as "<" in plain text should be in plain text should be interpreted as token delimiters in
interpreted as token delimiters in text/enriched. text/enriched.
The question of what to do for hypothetical future character sets The question of what to do for hypothetical future character sets that
that do not subsume ASCII is not addressed in this memo. do not subsume ASCII is not addressed in this memo.
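For US-ASCII and the single byte character sets, the quoting rule described above (the octet 60 is quoted by including it twice) might be applied by a composer along the following lines. This sketch is an assumption-laden illustration: the name quote_lt and its interface are hypothetical, and it is not suitable for multi-byte character sets in which the octet 60 can appear inside the byte sequence of another character.

```c
#include <stddef.h>

/* Copy the NUL-terminated string in to out, writing each "<"
 * (octet 60) as "<<". Returns the number of characters written, not
 * counting the terminating NUL, or (size_t)-1 if out is too small. */
size_t quote_lt(const char *in, char *out, size_t outsize)
{
    size_t n = 0;

    for (; *in != '\0'; in++) {
        size_t need = (*in == '<') ? 2 : 1;
        if (n + need >= outsize) return (size_t)-1;   /* keep room for NUL */
        if (*in == '<') out[n++] = '<';               /* the extra "<" */
        out[n++] = *in;
    }
    if (n >= outsize) return (size_t)-1;   /* only when outsize is 0 */
    out[n] = '\0';
    return n;
}
```

Plain text passed through such a function contains no single "<" characters, so a text/enriched reader will not mistake any of it for a formatting command.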
Minimal text/enriched conformance Minimal text/enriched conformance
A minimal text/enriched implementation is one that converts "<<" to A minimal text/enriched implementation is one that converts "<<" to "<",
"<", removes everything between a <param> command and the next removes everything between a <param> command and the next balancing
balancing </param> command, removes all other formatting commands </param> command, removes all other formatting commands (all text
(all text enclosed in angle brackets), and, outside of <nofill> enclosed in angle brackets), and, outside of <nofill> environments,
environments, converts any series of n CRLFs to n-1 CRLFs, and converts any series of n CRLFs to n-1 CRLFs, and converts any lone CRLF
converts any lone CRLF pairs to SPACE. pairs to SPACE.
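The line break portion of this rule can be isolated in a short function. The sketch below assumes, as Appendix A does, that the local newline convention (the single character "\n") stands in for CRLF, and that the input lies entirely outside <nofill> environments. The name fold_line_breaks is hypothetical.

```c
#include <stddef.h>

/* Apply the text/enriched line break rule to the NUL-terminated
 * string in: a lone newline becomes a SPACE, and a series of n
 * newlines (n > 1) becomes n-1 newlines. The result is never longer
 * than the input, so out must have room for strlen(in)+1 characters. */
void fold_line_breaks(const char *in, char *out)
{
    while (*in != '\0') {
        if (*in != '\n') {          /* ordinary character: copy it */
            *out++ = *in++;
            continue;
        }
        size_t run = 0;             /* length of this series of newlines */
        while (*in == '\n') {
            run++;
            in++;
        }
        if (run == 1) {
            *out++ = ' ';           /* lone line break becomes a SPACE */
        } else {
            while (--run > 0)       /* n line breaks become n-1 */
                *out++ = '\n';
        }
    }
    *out = '\0';
}
```

A complete minimal implementation would also handle "<<", <param>, and <nofill>, as the program in Appendix A does.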
Notes for Implementors Notes for Implementors
It is recognized that implementors of future mail systems will want It is recognized that implementors of future mail systems will want rich
rich text functionality far beyond that currently defined for text functionality far beyond that currently defined for text/enriched.
text/enriched. The intent of text/enriched is to provide a common The intent of text/enriched is to provide a common format for expressing
format for expressing that functionality in a form in which much of that functionality in a form in which much of it, at least, will be
it, at least, will be understood by interoperating software. Thus, understood by interoperating software. Thus, in particular, software
in particular, software with a richer notion of formatted text than with a richer notion of formatted text than text/enriched can still use
text/enriched can still use text/enriched as its basic text/enriched as its basic representation, but can extend it with new
representation, but can extend it with new formatting commands and formatting commands and by hiding information specific to that software
by hiding information specific to that software system in system in text/enriched <param> constructs. As such systems evolve, it
text/enriched <param> constructs. As such systems evolve, it is is expected that the definition of text/enriched will be further refined
expected that the definition of text/enriched will be further by future published specifications, but text/enriched as defined here
refined by future published specifications, but text/enriched as provides a platform on which evolutionary refinements can be based.
defined here provides a platform on which evolutionary refinements
can be based.
An expected common way that sophisticated mail programs will An expected common way that sophisticated mail programs will generate
generate text/enriched data is as part of a multipart/alternative text/enriched data is as part of a multipart/alternative construct. For
construct. For example, a mail agent that can generate enriched mail example, a mail agent that can generate enriched mail in ODA format can
in ODA format can generate that mail in a more widely interoperable generate that mail in a more widely interoperable form by generating
form by generating both text/enriched and ODA versions of the same both text/enriched and ODA versions of the same data, e.g.:
data, e.g.:
Content-type: multipart/alternative; boundary=foo Content-type: multipart/alternative; boundary=foo
--foo --foo
Content-type: text/enriched Content-type: text/enriched
[text/enriched version of data] [text/enriched version of data]
--foo --foo
Content-type: application/oda Content-type: application/oda
[ODA version of data] [ODA version of data]
--foo-- --foo--
If such a message is read using a MIME-conformant mail reader that If such a message is read using a MIME-conformant mail reader that
understands ODA, the ODA version will be displayed; otherwise, the understands ODA, the ODA version will be displayed; otherwise, the
text/enriched version will be shown. text/enriched version will be shown.
In some environments, it might be impossible to combine certain In some environments, it might be impossible to combine certain
text/enriched formatting commands, whereas in others they might be text/enriched formatting commands, whereas in others they might be
combined easily. For example, the combination of <bold> and <italic> combined easily. For example, the combination of <bold> and <italic>
might produce bold italics on systems that support such fonts, but might produce bold italics on systems that support such fonts, but there
there exist systems that can make text bold or italicized, but not exist systems that can make text bold or italicized, but not both. In
both. In such cases, the most recently issued (innermost) recognized such cases, the most recently issued (innermost) recognized formatting
formatting command should be preferred. command should be preferred.
One of the major goals in the design of text/enriched was to make it One of the major goals in the design of text/enriched was to make it so
so simple that even text-only mailers will implement enriched-to- simple that even text-only mailers will implement enriched-to-
plain-text translators, thus increasing the likelihood that enriched plain-text translators, thus increasing the likelihood that enriched
text will become "safe" to use very widely. To demonstrate this text will become "safe" to use very widely. To demonstrate this
simplicity, an extremely simple C program that converts simplicity, an extremely simple C program that converts text/enriched
text/enriched input into plain text output is included in Appendix input into plain text output is included in Appendix A.
A.
Extensions to text/enriched Extensions to text/enriched
It is expected that various mail system authors will desire It is expected that various mail system authors will desire extensions
extensions to text/enriched. The simple syntax of text/enriched, and to text/enriched. The simple syntax of text/enriched, and the
the specification that unrecognized formatting commands should specification that unrecognized formatting commands should simply be
simply be ignored, are intended to promote such extensions. ignored, are intended to promote such extensions.
An Example An Example
Putting all this together, the following "text/enriched" body Putting all this together, the following "text/enriched" body fragment:
fragment:
From: Nathaniel Borenstein <nsb@bellcore.com> From: Nathaniel Borenstein <nsb@bellcore.com>
To: Ned Freed <ned@innosoft.com> To: Ned Freed <ned@innosoft.com>
Content-type: text/enriched Content-type: text/enriched
<bold>Now</bold> is the time for <italic>all</italic> <bold>Now</bold> is the time for <italic>all</italic>
good men good men
<smaller>(and <<women>)</smaller> to <smaller>(and <<women>)</smaller> to
<ignoreme>come</ignoreme> <ignoreme>come</ignoreme>
skipping to change at line 756 skipping to change at line 726
<smaller> <smaller>
should REALLY be called should REALLY be called
<tinier> <tinier>
and that I am always right. and that I am always right.
-- the end -- the end
where the word "beloved" would be in red on a color display. where the word "beloved" would be in red on a color display.
Security Considerations Security Considerations
Security issues are not discussed in this memo, as the mechanism Security issues are not discussed in this memo, as the mechanism raises
raises no security issues. no security issues.
Author's Address Author's Address
For more information, the authors of this document may be contacted For more information, the authors of this document may be contacted via
via Internet mail: Internet mail:
Peter W. Resnick Peter W. Resnick
QUALCOMM Incorporated QUALCOMM Incorporated
1009 North Busey Avenue 6455 Lusk Boulevard
Urbana, IL 61801-1607 San Diego, CA 92121-2779
Phone: +1 217 337 1905 Phone: +1 619 587 1121
FAX: +1 217 337 1905 FAX: +1 619 658 2230
e-mail: presnick@qualcomm.com e-mail: presnick@qualcomm.com
Amanda Walker Amanda Walker
InterCon Systems Corporation InterCon Systems Corporation
950 Herndon Parkway 950 Herndon Parkway
Herndon, VA 22070 Herndon, VA 22070
Phone: +1 703 709 5500 Phone: +1 703 709 5500
FAX: +1 703 709 5555 FAX: +1 703 709 5555
e-mail: amanda@intercon.com e-mail: amanda@intercon.com
Acknowledgements Acknowledgements
The authors gratefully acknowledge the input of many contributors,
readers, and implementors of the specification in this document.
Particular thanks are due to Nathaniel Borenstein, the original author
of RFC 1563.
References References
[RFC-1341] [RFC-1341]
Borenstein, N., Freed, N., "MIME (Multipurpose Internet Mail
Extensions): Mechanisms for Specifying and Describing the Format of
Internet Message Bodies", 06/11/1992.
[RFC-1521] [RFC-1521]
Borenstein, N., Freed, N., "MIME (Multipurpose Internet Mail
Extensions) Part One: Mechanisms for Specifying and Describing the
Format of Internet Message Bodies", 09/23/1993.
[RFC-1523] [RFC-1523]
Borenstein, N., "The text/enriched MIME Content-type", 09/23/1993.
[RFC-1563] [RFC-1563]
Borenstein, N., "The text/enriched MIME Content-type", 01/10/1994.
[RFC-1642] [RFC-1642]
Goldsmith, D., Davis, M., "UTF-7 - A Mail-Safe Transformation
Format of Unicode", 07/13/1994.
[RFC-1766] [RFC-1766]
Alvestrand, H., "Tags for the Identification of Languages",
03/02/1995.
[RFC-1866] [RFC-1866]
Berners-Lee, T., Connolly, D., "Hypertext Markup Language - 2.0",
11/03/1995.
Appendix A--A Simple enriched-to-plain Translator in C Appendix A--A Simple enriched-to-plain Translator in C
One of the major goals in the design of the text/enriched subtype of One of the major goals in the design of the text/enriched subtype of the
the text Content-Type is to make formatted text so simple that even text Content-Type is to make formatted text so simple that even
text-only mailers will implement enriched-to-plain-text translators, text-only mailers will implement enriched-to-plain-text translators,
thus increasing the likelihood that multifont text will become thus increasing the likelihood that multifont text will become "safe" to
"safe" to use very widely. To demonstrate this simplicity, what use very widely. To demonstrate this simplicity, what follows is a
follows is a simple C program that converts text/enriched input into simple C program that converts text/enriched input into plain text
plain text output. Note that the local newline convention (the output. Note that the local newline convention (the single character
single character represented by "\n") is assumed by this program, represented by "\n") is assumed by this program, but that special CRLF
but that special CRLF handling might be necessary on some systems. handling might be necessary on some systems.
#include <ctype.h> #include <ctype.h>
#include <stdio.h> #include <stdio.h>
#include <stdlib.h> #include <stdlib.h>
#include <string.h> #include <string.h>
main() { main() {
int c, i, paramct=0, newlinect=0, nofill=0; int c, i, paramct=0, newlinect=0, nofill=0;
char token[62], *p; char token[62], *p;
while ((c=getc(stdin)) != EOF) { while ((c=getc(stdin)) != EOF) {
if (c == '<') { if (c == '<') {
if (newlinect == 1) putc(' ', stdout); if (newlinect == 1) putc(' ', stdout);
newlinect = 0; newlinect = 0;
c = getc(stdin); c = getc(stdin);
if (c == '<') { if (c == '<') {
if (paramct <= 0) putc(c, stdout); if (paramct <= 0) putc(c, stdout);
} else { } else {
ungetc(c, stdin); ungetc(c, stdin);
for (i=0, p=token; (c=getc(stdin)) != EOF && c != '>'; i++) { for (i=0, p=token; (c=getc(stdin)) != EOF && c != '>'; i++) {
if (i < sizeof(token)-1) *p++ = isupper(c) ? tolower(c) : c; if (i < sizeof(token)-1) *p++ = isupper(c) ? tolower(c) : c;
} }
*p = '\0'; *p = '\0';
if (c == EOF) break; if (c == EOF) break;
if (strcmp(token, "param") == 0) if (strcmp(token, "param") == 0)
paramct++; paramct++;
else if (strcmp(token, "nofill") == 0) else if (strcmp(token, "nofill") == 0)
nofill++; nofill++;
else if (strcmp(token, "/param") == 0) else if (strcmp(token, "/param") == 0)
paramct--; paramct--;
else if (strcmp(token, "/nofill") == 0) else if (strcmp(token, "/nofill") == 0)
nofill--; nofill--;
} }
} else { } else {
if (paramct > 0) if (paramct > 0)
; /* ignore params */ ; /* ignore params */
else if (c == '\n' && nofill <= 0) { else if (c == '\n' && nofill <= 0) {
if (++newlinect > 1) putc(c, stdout); if (++newlinect > 1) putc(c, stdout);
} else { } else {
if (newlinect == 1) putc(' ', stdout); if (newlinect == 1) putc(' ', stdout);
newlinect = 0; newlinect = 0;
putc(c, stdout); putc(c, stdout);
} }
}
} }
} /* The following line is only needed with line-buffering */
/* The following line is only needed with line-buffering */ putc('\n', stdout);
putc('\n', stdout); exit(0);
exit(0);
} }
It should be noted that one can do considerably better than this in It should be noted that one can do considerably better than this in
displaying text/enriched data on a dumb terminal. In particular, one displaying text/enriched data on a dumb terminal. In particular, one can
can replace font information such as "bold" with textual emphasis replace font information such as "bold" with textual emphasis (like
(like *this* or _T_H_I_S_). One can also properly handle the *this* or _T_H_I_S_). One can also properly handle the text/enriched
text/enriched formatting commands regarding indentation, formatting commands regarding indentation, justification, and others.
justification, and others. However, the above program is all that is However, the above program is all that is necessary in order to present
necessary in order to present text/enriched on a dumb terminal text/enriched on a dumb terminal without showing the user any formatting
without showing the user any formatting artifacts. artifacts.
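The textual-emphasis idea mentioned above can be sketched as follows. This is only an illustrative sketch, not part of the program in this appendix: the function names enriched_emphasis and emphasis_marker are hypothetical, and the choice of "*" for bold and "_" for italic is an assumption. It works on an in-memory string rather than stdin, and it leaves out the newline folding and <param> handling that the program above performs.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not in the original program): returns the
   plain-text marker for an emphasis command, or NULL if the command
   should simply be dropped.  The marker choice is an assumption. */
static const char *emphasis_marker(const char *token) {
    if (*token == '/') token++;  /* opening and closing commands share a marker */
    if (strcmp(token, "bold") == 0) return "*";
    if (strcmp(token, "italic") == 0) return "_";
    return NULL;
}

/* Copies text/enriched input `in` to `out` (capacity `outsz`), replacing
   <bold> and <italic> commands with markers, dropping every other
   command, and turning "<<" into "<".  Output that does not fit is
   silently truncated.  Returns the number of characters written. */
size_t enriched_emphasis(const char *in, char *out, size_t outsz) {
    char token[62];
    size_t n = 0;

    assert(in != NULL && out != NULL && outsz > 0);
    while (*in != '\0') {
        if (*in != '<') {
            if (n + 1 < outsz) out[n++] = *in;
            in++;
        } else if (in[1] == '<') {
            if (n + 1 < outsz) out[n++] = '<';
            in += 2;
        } else {
            size_t i = 0;
            const char *m;

            in++;  /* skip the opening '<' */
            while (*in != '\0' && *in != '>') {
                if (i < sizeof(token) - 1)
                    token[i++] = (char)tolower((unsigned char)*in);
                in++;
            }
            token[i] = '\0';
            if (*in == '\0') break;  /* unterminated command: stop */
            in++;  /* skip the closing '>' */
            m = emphasis_marker(token);
            if (m != NULL && n + 1 < outsz) out[n++] = *m;
        }
    }
    out[n] = '\0';
    return n;
}
```

Called on the string "<bold>Now</bold> is the <italic>time</italic>", this sketch would produce "*Now* is the _time_". A fuller version would presumably merge this marker lookup into the streaming loop of the program above, so that newline folding and parameter skipping still apply.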
Appendix B--A Simple enriched-to-HTML Translator in C Appendix B--A Simple enriched-to-HTML Translator in C
It is fully expected that other text formatting standards like HTML It is fully expected that other text formatting standards like HTML and
and SGML will supplant text/enriched in Internet mail. It is also SGML will supplant text/enriched in Internet mail. It is also likely
likely that as this happens, recipients of text/enriched mail will that as this happens, recipients of text/enriched mail will wish to view
wish to view such mail with an HTML viewer. To this end, the such mail with an HTML viewer. To this end, the following is a simple
following is a simple example of a C program to convert example of a C program to convert text/enriched to HTML. Since the
text/enriched to HTML. Since the current version of HTML at the time current version of HTML at the time of this document's publication is
of this document's publication is HTML 2.0 defined in [RFC-1866], HTML 2.0 defined in [RFC-1866], this program converts to that standard.
this program converts to that standard. There are several There are several text/enriched commands that have no HTML 2.0
text/enriched commands that have no HTML 2.0 equivalent. In those equivalent. In those cases, this program simply puts those commands into
cases, this program simply puts those commands into processing processing instructions; that is, surrounded by "<?" and ">". As in
instructions; that is, surrounded by "<?" and ">". As in Appendix A, Appendix A, the local newline convention (the single character
the local newline convention (the single character represented by represented by "\n") is assumed by this program, but special CRLF
"\n") is assumed by this program, but special CRLF handling might be handling might be necessary on some systems.
necessary on some systems.
#include <ctype.h> #include <ctype.h>
#include <stdio.h> #include <stdio.h>
#include <stdlib.h> #include <stdlib.h>
#include <string.h> #include <string.h>
main() { main() {
int c, i, paramct=0, nofill=0; int c, i, paramct=0, nofill=0;
char token[62], *p; char token[62], *p;
while((c=getc(stdin)) != EOF) { while((c=getc(stdin)) != EOF) {
if(c == '<') { if(c == '<') {
c = getc(stdin); c = getc(stdin);
if(c == '<') { if(c == '<') {
fputs("&lt;", stdout); fputs("&lt;", stdout);
} else { } else {
ungetc(c, stdin); ungetc(c, stdin);
for (i=0, p=token; (c=getc(stdin)) != EOF && c != '>'; i++) { for (i=0, p=token; (c=getc(stdin)) != EOF && c != '>'; i++) {
if (i < sizeof(token)-1) *p++ = isupper(c) ? tolower(c) : c; if (i < sizeof(token)-1) *p++ = isupper(c) ? tolower(c) : c;
} }
*p = '\0'; *p = '\0';
if(c == EOF) break; if(c == EOF) break;
if(strcmp(token, "/param") == 0) { if(strcmp(token, "/param") == 0) {
paramct--; paramct--;
putc('>', stdout); putc('>', stdout);
} else if(paramct > 0) { } else if(paramct > 0) {
fputs("&lt;", stdout); fputs("&lt;", stdout);
fputs(token, stdout); fputs(token, stdout);
fputs("&gt;", stdout); fputs("&gt;", stdout);
} else {
putc('<', stdout);
if(strcmp(token, "nofill") == 0) {
nofill++;
fputs("pre", stdout);
} else if(strcmp(token, "/nofill") == 0) {
nofill--;
fputs("/pre", stdout);
} else if(strcmp(token, "bold") == 0) {
fputs("b", stdout);
} else if(strcmp(token, "/bold") == 0) {
fputs("/b", stdout);
} else if(strcmp(token, "italic") == 0) {
fputs("i", stdout);
} else if(strcmp(token, "/italic") == 0) {
fputs("/i", stdout);
} else if(strcmp(token, "fixed") == 0) {
fputs("tt", stdout);
} else if(strcmp(token, "/fixed") == 0) {
fputs("/tt", stdout);
} else if(strcmp(token, "excerpt") == 0) {
fputs("blockquote", stdout);
} else if(strcmp(token, "/excerpt") == 0) {
fputs("/blockquote", stdout);
} else {
putc('?', stdout);
fputs(token, stdout);
if(strcmp(token, "param") == 0) {
paramct++;
putc(' ', stdout);
continue;
}
}
putc('>', stdout);
}
}
} else if(c == '>') {
fputs("&gt;", stdout);
} else { } else {
putc('<', stdout); if(c == '\n' && nofill <= 0 && paramct <= 0) {
if(strcmp(token, "nofill") == 0) { while((i=getc(stdin)) == '\n') fputs("<br>", stdout);
nofill++; ungetc(i, stdin);
fputs("pre", stdout);
} else if(strcmp(token, "/nofill") == 0) {
nofill--;
fputs("/pre", stdout);
} else if(strcmp(token, "bold") == 0) {
fputs("b", stdout);
} else if(strcmp(token, "/bold") == 0) {
fputs("/b", stdout);
} else if(strcmp(token, "italic") == 0) {
fputs("i", stdout);
} else if(strcmp(token, "/italic") == 0) {
fputs("/i", stdout);
} else if(strcmp(token, "fixed") == 0) {
fputs("tt", stdout);
} else if(strcmp(token, "/fixed") == 0) {
fputs("/tt", stdout);
} else if(strcmp(token, "excerpt") == 0) {
fputs("blockquote", stdout);
} else if(strcmp(token, "/excerpt") == 0) {
fputs("/blockquote", stdout);
} else {
putc('?', stdout);
fputs(token, stdout);
if(strcmp(token, "param") == 0) {
paramct++;
putc(' ', stdout);
continue;
} }
} putc(c, stdout);
putc('>', stdout);
} }
}
} else if(c == '>') {
fputs("&gt;", stdout);
} else {
if(c == '\n' && nofill <= 0 && paramct <= 0) {
while((i=getc(stdin)) == '\n') fputs("<br>", stdout);
ungetc(i, stdin);
}
putc(c, stdout);
} }
} /* The following line is only needed with line-buffering */
/* The following line is only needed with line-buffering */ putc('\n', stdout);
putc('\n', stdout); exit(0);
exit(0);
} }
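Both appendices assume the local newline convention and note that special CRLF handling might be necessary on some systems. One possible way to handle it, shown here only as a sketch, is to normalize the input before translation. The function name crlf_to_lf is hypothetical, and the decision to leave a lone CR untouched is an assumption rather than something this document specifies.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: rewrites `buf` in place, collapsing each CR LF
   pair into a single '\n'.  A CR that is not followed by LF is kept
   as is (an assumption).  Returns the new length of the string. */
size_t crlf_to_lf(char *buf) {
    size_t r = 0, w = 0;

    assert(buf != NULL);
    while (buf[r] != '\0') {
        /* buf[r + 1] is at worst the terminating NUL, so this is in bounds */
        if (buf[r] == '\r' && buf[r + 1] == '\n')
            r++;  /* skip the CR of a CRLF pair */
        buf[w++] = buf[r++];
    }
    buf[w] = '\0';
    return w;
}
```

A streaming translator that reads with getc, as the programs in both appendices do, could apply the same rule by reading one character ahead after each CR and pushing it back with ungetc when it is not a LF.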
 End of changes. 100 change blocks. 
598 lines changed or deleted 592 lines changed or added
