idnits 2.17.1

draft-resnick-text-enriched-02.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in
     this document.
     Expected boilerplate is as follows today (2024-04-26) according to
     https://trustee.ietf.org/license-info :

       IETF Trust Legal Provisions of 28-dec-2009, Section 6.a:

          This Internet-Draft is submitted in full conformance with the
          provisions of BCP 78 and BCP 79.

       IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2:

          Copyright (c) 2024 IETF Trust and the persons identified as the
          document authors. All rights reserved.

       IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3:

          This document is subject to BCP 78 and the IETF Trust's Legal
          Provisions Relating to IETF Documents
          (https://trustee.ietf.org/license-info) in effect on the date of
          publication of this document. Please review these documents
          carefully, as they describe your rights and restrictions with
          respect to this document. Code Components extracted from this
          document must include Simplified BSD License text as described in
          Section 4.e of the Trust Legal Provisions and are provided without
          warranty as described in the Simplified BSD License.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** Missing expiration date. The document expiration date should appear on
     the first and last page.

  ** The document seems to lack a 1id_guidelines paragraph about
     Internet-Drafts being working documents.

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     current Internet-Drafts.

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     Shadow Directories.

  ** The document is more than 15 pages and seems to lack a Table of Contents.

  == The page length should not exceed 58 lines per page, but there was 1
     longer page, the longest (page 1) being 954 lines


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section. (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack separate sections for Informative/Normative
     References. All references will be assumed normative when checking for
     downward references.

  ** There are 19 instances of too long lines in the document, the longest one
     being 28 characters in excess of 72.

  ** The abstract seems to contain references ([ISO-639], [RFC-1766],
     [RFC-1866], [RFC-1563], [RFC-1521], [RFC-1523]), which it shouldn't.
     Please replace those with straight textual mentions of the documents in
     question.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008. If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment. If not, you may need to add the pre-RFC5378 disclaimer.
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (January 1996) is 10329 days in the past. Is this
     intentional?

  -- Found something which looks like a code comment -- if you have code
     sections in the document, please surround them with '<CODE BEGINS>' and
     '<CODE ENDS>' lines.


  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  == Missing Reference: 'ISO-639' is mentioned on line 448, but not defined

  -- Looks like a reference, but probably isn't: '62' on line 879

  ** Obsolete normative reference: RFC 1341 (Obsoleted by RFC 1521)

  ** Obsolete normative reference: RFC 1521 (Obsoleted by RFC 2045, RFC 2046,
     RFC 2047, RFC 2048, RFC 2049)

  ** Obsolete normative reference: RFC 1523 (Obsoleted by RFC 1563, RFC 1896)

  ** Obsolete normative reference: RFC 1563 (Obsoleted by RFC 1896)

  ** Obsolete normative reference: RFC 1642 (Obsoleted by RFC 2152)

  ** Obsolete normative reference: RFC 1766 (Obsoleted by RFC 3066, RFC 3282)

  ** Obsolete normative reference: RFC 1866 (Obsoleted by RFC 2854)


     Summary: 17 errors (**), 0 flaws (~~), 2 warnings (==), 4 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1	Network Working Group               P. Resnick
2	INTERNET-DRAFT                      QUALCOMM
3	To-obsolete RFCs: 1523, 1563        A. Walker
4	Category: Informational             InterCon
5	                                    January 1996
6	                                    

8	                   The text/enriched MIME Content-type

10	Status of this Memo

12	This document is an Internet-Draft. Internet-Drafts are working
13	documents of the Internet Engineering Task Force (IETF), its areas, and
14	its working groups. Note that other groups may also distribute working
15	documents as Internet-Drafts.

17	Internet-Drafts are draft documents valid for a maximum of six months
18	and may be updated, replaced, or obsoleted by other documents at any
19	time. It is inappropriate to use Internet-Drafts as reference material
20	or to cite them other than as "work in progress."

22	To learn the current status of any Internet-Draft, please check the
23	"1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
24	Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
25	munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
26	ftp.isi.edu (US West Coast).

28	Abstract

30	MIME [RFC-1521] defines a format and general framework for the
31	representation of a wide variety of data types in Internet mail. This
32	document defines one particular type of MIME data, the text/enriched
33	MIME type. The text/enriched MIME type is intended to facilitate the
34	wider interoperation of simple enriched text across a wide variety of
35	hardware and software platforms. This document is only a minor revision
36	to the text/enriched MIME type that was first described in [RFC-1523]
37	and [RFC-1563], and is only intended to be used in the short term until
38	other MIME types for text formatting in Internet mail are developed and
39	deployed.

41	The text/enriched MIME type

43	In order to promote the wider interoperability of simple formatted text,
44	this document defines an extremely simple subtype of the MIME
45	content-type "text", the "text/enriched" subtype. The content-type line
46	for this type may have one optional parameter, the "charset" parameter,
47	with the same values permitted for the "text/plain" MIME content-type.

49	The text/enriched subtype was designed to meet the following criteria:

51	  1. The syntax must be extremely simple to parse, so that even
52	     teletype-oriented mail systems can easily strip away the formatting
53	     information and leave only the readable text.

55	  2. The syntax must be extensible to allow for new formatting commands
56	     that are deemed essential for some application.

58	  3. If the character set in use is ASCII or an 8-bit ASCII superset,
59	     then the raw form of the data must be readable enough to be largely
60	     unobjectionable in the event that it is displayed on the screen of
61	     the user of a non-MIME-conformant mail reader.

63	  4. The capabilities must be extremely limited, to ensure that it can
64	     represent no more than is likely to be representable by the user's
65	     primary word processor. While this limits what can be sent, it
66	     increases the likelihood that what is sent can be properly
67	     displayed.

69	There are other text formatting standards which meet some of these
70	criteria. In particular, HTML and SGML have come into widespread use on
71	the Internet. However, there are two important reasons that this
72	document further promotes the use of text/enriched in Internet mail over
73	other such standards:

75	  1. Most MIME-aware Internet mail applications are already able to
76	     either properly format text/enriched mail or, at the very least,
77	     are able to strip out the formatting commands and display the
78	     readable text. The same is not true for HTML or SGML.

80	  2. The current RFC on HTML [RFC-1866] and Internet Drafts on SGML have
81	     many features which are not necessary for Internet mail, and are
82	     missing a few capabilities that text/enriched already has.

84	For these reasons, this document is promoting the use of text/enriched
85	until other Internet standards come into more widespread use. For those
86	who will want to use HTML, Appendix B of this document contains a very
87	simple C program that converts text/enriched to HTML 2.0 described in
88	[RFC-1866].

90	Syntax

92	The syntax of "text/enriched" is very simple. It represents text in a
93	single character set--US-ASCII by default, although a different
94	character set can be specified by the use of the "charset" parameter.
95	(The semantics of text/enriched in non-ASCII character sets are
96	discussed later in this document.) All characters represent themselves,
97	with the exception of the "<" character (ASCII 60), which is used to
98	mark the beginning of a formatting command. A literal less-than sign
99	("<") can be represented by a sequence of two such characters, "<<".

101	Formatting instructions consist of formatting commands surrounded by
102	angle brackets ("<>", ASCII 60 and 62). Each formatting command may be
103	no more than 60 characters in length, all in US-ASCII, restricted to the
104	alphanumeric and hyphen ("-") characters. Formatting commands may be
105	preceded by a solidus ("/", ASCII 47), making them negations, and such
106	negations must always exist to balance the initial opening commands.
107	Thus, if the formatting command "<bold>" appears at some point, there
108	must later be a "</bold>" to balance it. (NOTE: The 60 character limit
109	on formatting commands does NOT include the "<", ">", or "/" characters
110	that might be attached to such commands.)
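
As an illustration only, the scanning rules above can be sketched in
Python. The function name "tokenize" and the token tuples are
hypothetical. Folding command names to lower case is an assumption of
this sketch, not something the text above states. A "<" that does not
begin a well-formed command is passed through as ordinary text, which is
one possible lenient choice.

```python
import re

# A command is "<" or "</" followed by 1 to 60 alphanumeric or hyphen
# characters and a closing ">". The 60 character limit excludes "<", ">", "/".
COMMAND_RE = re.compile(r"<(/?)([A-Za-z0-9-]{1,60})>")


def tokenize(text):
    """Split text/enriched data into ("text", s), ("open", name) and
    ("close", name) tuples. Hypothetical helper for illustration."""
    tokens = []
    buf = []
    i = 0
    while i < len(text):
        if text.startswith("<<", i):
            buf.append("<")            # "<<" stands for a literal "<"
            i += 2
            continue
        match = COMMAND_RE.match(text, i) if text[i] == "<" else None
        if match:
            if buf:
                tokens.append(("text", "".join(buf)))
                buf = []
            kind = "close" if match.group(1) else "open"
            # Assumption: command names are compared case-insensitively.
            tokens.append((kind, match.group(2).lower()))
            i = match.end()
        else:
            buf.append(text[i])        # ordinary character or stray "<"
            i += 1
    if buf:
        tokens.append(("text", "".join(buf)))
    return tokens
```

For example, the input "a <<b <bold>c</bold>" yields the text "a <b ",
an opening "bold", the text "c", and a closing "bold".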

112	Line break rules

114	Line breaks (CRLF pairs in standard network representation) are handled
115	specially. In particular, isolated CRLF pairs are translated into a
116	single SPACE character. Sequences of N consecutive CRLF pairs, however,
117	are translated into N-1 actual line breaks. This permits long lines of
118	data to be represented in a natural looking manner despite the frequency
119	of line-wrapping in Internet mailers. When preparing the data for mail
120	transport, isolated line breaks should be inserted wherever necessary to
121	keep each line shorter than 80 characters. When preparing such data for
122	presentation to the user, isolated line breaks should be replaced by a
123	single SPACE character, and N consecutive CRLF pairs should be presented
124	to the user as N-1 line breaks.

126	Thus text/enriched data that looks like this:

128	     This is
129	     a single
130	     line

132	     This is the
133	     next line.


135	     This is the
136	     next section.

138	should be displayed by a text/enriched interpreter as follows:

140	     This is a single line
141	     This is the next line.

143	     This is the next section.
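
The presentation rule can be sketched as a single substitution over runs
of CRLF pairs. This is a minimal illustration, not a full interpreter:
the function name is hypothetical, "\n" stands in for a displayed line
break, and the "nofill" environment is not handled.

```python
import re


def fold_line_breaks(data):
    """Apply the fill rule: one CRLF becomes a SPACE, and a run of
    N consecutive CRLF pairs becomes N-1 line breaks."""
    def replace_run(match):
        pairs = len(match.group(0)) // 2   # each CRLF pair is two characters
        return " " if pairs == 1 else "\n" * (pairs - 1)

    return re.sub(r"(?:\r\n)+", replace_run, data)
```

Under this rule a blank line in the displayed text (two line breaks)
requires three consecutive CRLF pairs in the data.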

145	The formatting commands, not all of which will be implemented by all
146	implementations, are described in the following sections.

148	Formatting Commands

150	The text/enriched formatting commands all begin with <commandname> and
151	end with </commandname>, affecting the formatting of the text between
152	those two tokens. The commands are described here, grouped according to
153	type.

155	Parameter Command

157	Some of the formatting commands may require one or more associated
158	parameters. The "param" command is a special formatting command used to
159	include these parameters.

161	     Param
162	          Marks the affected text as command parameters, to be
163	          interpreted or ignored by the text/enriched interpreter,
164	          but not to be shown to the reader. The "param" command
165	          always immediately follows some other formatting command,
166	          and the parameter data indicates some additional
167	          information about the formatting that is to be done. The
168	          syntax of the parameter data (whatever appears between
169	          the initial "<param>" and the terminating "</param>") is
170	          defined for each command that uses it. However, it is
171	          always required that the format of such data must not
172	          contain nested "param" commands, and either must not use
173	          the "<" character or must use it in a way that is
174	          compatible with text/enriched parsing. That is, the end
175	          of the parameter data should be recognizable with either
176	          of two algorithms: simply searching for the first
177	          occurrence of "</param>" or parsing until a balanced
178	          "</param>" command is found. In either case, however, the
179	          parameter data should not be shown to the human reader.
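
The first of the two algorithms, searching for the first occurrence of
the closing "param" command, might look like the sketch below. The
function name is hypothetical, and matching the closing command without
regard to case is an assumption of this sketch.

```python
import re

# Assumption: the closing command is matched case-insensitively.
PARAM_END = re.compile(r"</param>", re.IGNORECASE)


def split_param(data):
    """Given the data that follows an opening "param" command, return
    (parameter_data, remaining_data). The parameter data is for the
    interpreter only and is never shown to the reader."""
    match = PARAM_END.search(data)
    if match is None:
        raise ValueError("unterminated param command")
    return data[:match.start()], data[match.end():]
```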

181	Font-Alteration Commands

183	The following formatting commands are intended to alter the font in
184	which text is displayed, but not to alter the indentation or
185	justification state of the text:

187	     Bold
188	          causes the affected text to be in a bold font. Nested
189	          bold commands have the same effect as a single bold
190	          command.

192	     Italic
193	          causes the affected text to be in an italic font. Nested
194	          italic commands have the same effect as a single italic
195	          command.

197	     Underline
198	          causes the affected text to be underlined. Nested
199	          underline commands have the same effect as a single
200	          underline command.

202	     Fixed
203	          causes the affected text to be in a fixed width font.
204	          Nested fixed commands have the same effect as a single
205	          fixed command.

207	     FontFamily
208	          causes the affected text to be displayed in a specified
209	          typeface. The "fontfamily" command requires a parameter
210	          that is specified by using the "param" command. The
211	          parameter data is a case-insensitive string containing
212	          the name of a font family. Any currently available font
213	          family name (e.g. Times, Palatino, Courier, etc.) may be
214	          used. This includes font families defined by commercial
215	          type foundries such as Adobe, BitStream, or any other
216	          such foundry. Note that implementations should only use
217	          the general font family name, not the specific font name
218	          (e.g. use "Times", not "TimesRoman" nor
219	          "TimesBoldItalic"). When nested, the inner "fontfamily"
220	          command takes precedence. Also note that the "fontfamily"
221	          command is advisory only; it should not be expected that
222	          other implementations will honor the typeface information
223	          in this command since the font capabilities of systems
224	          vary drastically.

226	     Color
227	          causes the affected text to be displayed in a specified
228	          color. The "color" command requires a parameter that is
229	          specified by using the "param" command. The parameter
230	          data can be one of the following:

232	               red
233	               blue
234	               green
235	               yellow
236	               cyan
237	               magenta
238	               black
239	               white

241	          or an RGB color value in the form:

243	               ####,####,####

245	          where '#' is a hexadecimal digit '0' through '9', 'A'
246	          through 'F', or 'a' through 'f'. The three 4-digit
247	          hexadecimal values are the RGB values for red, green, and
248	          blue respectively, where each component is expressed as
249	          an unsigned value between 0 (0000) and 65535 (FFFF). The
250	          default color for the message is unspecified, though
251	          black is a common choice in many environments. When
252	          nested, the inner "color" command takes precedence.

254	     Smaller
255	          causes the affected text to be in a smaller font. It is
256	          recommended that the font size be changed by two points,
257	          but other amounts may be more appropriate in some
258	          environments. Nested smaller commands produce ever
259	          smaller fonts, to the limits of the implementation's
260	          capacity to reasonably display them, after which further
261	          smaller commands have no incremental effect.

263	     Bigger
264	          causes the affected text to be in a bigger font. It is
265	          recommended that the font size be changed by two points,
266	          but other amounts may be more appropriate in some
267	          environments. Nested bigger commands produce ever bigger
268	          fonts, to the limits of the implementation's capacity to
269	          reasonably display them, after which further bigger
270	          commands have no incremental effect.

272	While the "bigger" and "smaller" operators are effectively inverses, it
273	is not recommended, for example, that "<smaller>" be used to end the
274	effect of "<bigger>". This is properly done with "</bigger>".

276	Since the capabilities of implementations will vary, it is to be
277	expected that some implementations will not be able to act on some of
278	the font-alteration commands. However, an implementation should still
279	display the text to the user in a reasonable fashion. In particular, the
280	lack of capability to display a particular font family, color, or other
281	text attribute does not mean that an implementation should fail to
282	display text.
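
As an illustration of handling the "color" parameter data in this
spirit, the sketch below returns None for anything it does not
recognize, so that a caller can fall back to its default color instead
of failing. The function name is hypothetical. Accepting surrounding
white space and upper-case color names is an assumption of this sketch.

```python
import re

NAMED_COLORS = {"red", "blue", "green", "yellow",
                "cyan", "magenta", "black", "white"}

# Three 4-digit hexadecimal components separated by commas.
RGB_RE = re.compile(r"([0-9A-Fa-f]{4}),([0-9A-Fa-f]{4}),([0-9A-Fa-f]{4})")


def parse_color(param):
    """Return a color name, an (r, g, b) tuple of values in the range
    0 to 65535, or None if the parameter data is not recognized."""
    value = param.strip()              # assumption: tolerate white space
    if value.lower() in NAMED_COLORS:  # assumption: names ignore case
        return value.lower()
    match = RGB_RE.fullmatch(value)
    if match:
        return tuple(int(part, 16) for part in match.groups())
    return None
```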

284	Fill/Justification/Indentation Commands

286	Initially, text/enriched text is intended to be displayed fully filled
287	(that is, using the rules specified for replacing CRLF pairs with spaces
288	or removing them as appropriate) with appropriate kerning and
289	letter-tracking, and using the maximum available margins as suits the
290	capabilities of the receiving user agent software.

292	The following commands alter that state. Each of these commands forces a
293	line break before and after the formatting environment if there is not
294	otherwise a line break. For example, if one of these commands occurs
295	anywhere other than the beginning of a line of text as presented, a new
296	line is begun.

298	     Center
299	          causes the affected text to be centered.

301	     FlushLeft
302	          causes the affected text to be left-justified with a
303	          ragged right margin.

305	     FlushRight
306	          causes the affected text to be right-justified with a
307	          ragged left margin.

309	     FlushBoth
310	          causes the affected text to be filled and padded so as to
311	          create smooth left and right margins, i.e., to be fully
312	          justified.

314	     ParaIndent
315	          causes the running margins of the affected text to be
316	          moved in. The recommended indentation change is the width
317	          of four characters, but this may differ among
318	          implementations. The "paraindent" command requires a
319	          parameter that is specified by using the "param" command.
320	          The parameter data is a comma-separated list of one or
321	          more of the following:

323	          Left
324	               causes the running left margin to be moved to the
325	               right.

327	          Right
328	               causes the running right margin to be moved to the
329	               left.

331	          In
332	               causes the first line of the affected paragraph to
333	               be indented in addition to the running margin. The
334	               remaining lines remain flush to the running margin.

336	          Out
337	               causes all lines except for the first line of the
338	               affected paragraph to be indented in addition to the
339	               running margin. The first line remains flush to the
340	               running margin.

342	     Nofill
343	          causes the affected text to be displayed without filling.
344	          That is, the text is displayed without using the rules
345	          for replacing CRLF pairs with spaces or removing
346	          consecutive sequences of CRLF pairs. However, the current
347	          state of the margins and justification is honored; any
348	          indentation or justification commands are still applied
349	          to the text within the scope of the "nofill".

351	The "center", "flushleft", "flushright", and "flushboth" commands are
352	mutually exclusive, and, when nested, the inner command takes
353	precedence.

355	The "nofill" command is mutually exclusive with the "in" and "out"
356	parameters of the "paraindent" command; when they occur in the same
357	scope, their behavior is undefined.

359	The parameter data for the "paraindent" command may contain multiple
360	occurrences of the same parameter (i.e. "left", "right", "in", or "out").
361	Each occurrence causes the text to be further indented in the manner
362	indicated by that parameter. Nested "paraindent" commands cause the
363	affected text to be further indented according to the parameters. Note
364	that the "in" and "out" parameters for "paraindent" are mutually
365	exclusive; when they appear together or when nested "paraindent"
366	commands contain both of them, their behavior is undefined.
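
One way to interpret "paraindent" parameter data is to count the
occurrences of each parameter and scale them by the indentation width.
The sketch below is illustrative only. The function name and the keys of
the returned dictionary are hypothetical, the default of four characters
is the recommended width from the command description, and ignoring
case, white space, and unknown items is an assumption.

```python
from collections import Counter

VALID_PARAMS = ("left", "right", "in", "out")


def paraindent_offsets(param, width=4):
    """Translate comma-separated paraindent parameter data into
    indentation offsets, measured in character widths."""
    names = (item.strip().lower() for item in param.split(","))
    counts = Counter(name for name in names if name in VALID_PARAMS)
    return {
        "left": width * counts["left"],      # running left margin
        "right": width * counts["right"],    # running right margin
        "first_line": width * counts["in"],  # extra indent, first line only
        "hanging": width * counts["out"],    # extra indent, all other lines
        # "in" and "out" together have undefined behavior; flag it.
        "undefined": counts["in"] > 0 and counts["out"] > 0,
    }
```

For example, the parameter data "left,left,right,in" gives a left offset
of 8, a right offset of 4, and a first-line indent of 4.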

368	For purposes of the "in" and "out" parameters, a paragraph is defined as
369	text that is delimited by line breaks after applying the rules for
370	replacing CRLF pairs with spaces or removing consecutive sequences of
371	CRLF pairs. For example, within the scope of an "out", the line
372	following each CRLF is made flush with the running margin, and
373	subsequent lines are indented. Within the scope of an "in", the first
374	line following each CRLF is indented, and subsequent lines remain flush
375	to the running margin.

377	Whether or not text is justified by default (that is, whether the
378	default environment is "flushleft", "flushright", or "flushboth") is
379	unspecified, and depends on the preferences of the user, the
380	capabilities of the local software and hardware, and the nature of the
381	character set in use. On systems where full justification is considered
382	undesirable, the "flushboth" environment may be identical to the default
383	environment. Note that full justification should never be performed
384	inside of "center", "flushleft", "flushright", or "nofill" environments.
385	Note also that for some non-ASCII character sets, full justification may
386	be fundamentally inappropriate.

388	Note that [RFC-1563] defined two additional indentation commands,
389	"Indent" and "IndentRight". These commands did not force a line break,
390	and therefore their behavior was unpredictable since they depended on
391	the margins and character sizes that a particular implementation used.
392	Therefore, their use is deprecated and they should be ignored just as
393	other unrecognized commands.

395	Markup Commands

397	Commands in this section, unlike the other text/enriched commands, are
398	declarative markup commands. Text/enriched is not intended as a full
399	markup language, but instead as a simple way to represent common
400	formatting commands. Therefore, markup commands are purposely kept to a
401	minimum. It is only because each was deemed so prevalent or necessary in
402	an e-mail environment that these particular commands have been included
403	at all.

405	     Excerpt
406	          causes the affected text to be interpreted as a textual
407	          excerpt from another source, probably a message being
408	          responded to. Typically this will be displayed using
409	          indentation and an alternate font, or by indenting lines
410	          and preceding them with "> ", but such decisions are up
411	          to the implementation. Note that as with the
412	          justification commands, the excerpt command implicitly
413	          begins and ends with a line break if one is not already
414	          there. Nested "excerpt" commands are acceptable and
415	          should be interpreted as meaning that the excerpted text
416	          was excerpted from yet another source. Again, this can be
417	          displayed using additional indentation, different colors,
418	          etc.

420	          Optionally, the "excerpt" command can take a parameter by
421	          using the "param" command. The format of the data is
422	          unspecified, but it is intended to uniquely identify the
423	          text from which the excerpt is taken. With this
424	          information, an implementation should be able to uniquely
425	          identify the source of any particular excerpt, especially
426	          if two or more excerpts in the message are from the same
427	          source, and display it in some way that makes this
428	          apparent to the user.

430	     Lang
431	          causes the affected text to be interpreted as belonging
432	          to a particular language. This is most useful when two
433	          different languages use the same character set, but may
434	          require a different font or formatting depending on the
435	          language. For instance, Chinese and Japanese share
436	          similar character glyphs, and in some character sets like
437	          UNICODE share common code points, but it is considered
438	          very important that different fonts be used for the two
439	          languages, especially if they appear together, so that
440	          meaning is not lost. Also, language information can be
441	          used to allow for fancier text handling, like spell
442	          checking or hyphenation.

444	          The "lang" command requires a parameter using the "param"
445	          command. The parameter data can be any of the language
446	          tags specified in [RFC-1766], "Tags for the
447	          Identification of Languages". These tags are the two
448	          letter language codes taken from [ISO-639] or can be
449	          other language codes that are registered according to the
450	          instructions in the Language Tags RFC. Consult that memo
451	          for further information.

453	Balancing and Nesting of Formatting Commands

455	Pairs of formatting commands must be properly balanced and nested. Thus,
456	a proper way to describe text in bold italics is:

458	     <bold><italic>the-text</italic></bold>

460	or, alternately,

462	     <italic><bold>the-text</bold></italic>

464	but, in particular, the following is illegal text/enriched:

466	     <bold><italic>the-text</bold></italic>

468	The nesting requirement for formatting commands imposes a slightly
469	higher burden upon the composers of text/enriched bodies, but
470	potentially simplifies text/enriched displayers by allowing them to be
471	stack-based. The main goal of text/enriched is to be simple enough to
472	make multifont, formatted email widely readable, so that those with the
473	capability of sending it will be able to do so with confidence. Thus
474	slightly increased complexity in the composing software was deemed a
475	reasonable tradeoff for simplified reading software. Nonetheless,
476	implementors of text/enriched readers are encouraged to follow the
477	general Internet guidelines of being conservative in what you send and
478	liberal in what you accept. Those implementations that can do so are
479	encouraged to deal reasonably with improperly nested text/enriched data.
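
The stack-based approach mentioned above can be sketched as a checker
that pushes each opening command and pops on each closing command. This
is a minimal illustration with a hypothetical function name. It treats
"<<" as a literal less-than sign and, as an assumption, compares command
names without regard to case.

```python
import re

# Either the "<<" escape or a command with an optional leading solidus.
TOKEN_RE = re.compile(r"<<|<(/?)([A-Za-z0-9-]{1,60})>")


def is_properly_nested(data):
    """Return True if every formatting command in the data is balanced
    and closed in the reverse of the order in which it was opened."""
    stack = []
    for match in TOKEN_RE.finditer(data):
        if match.group(0) == "<<":
            continue                       # literal "<", not a command
        closing = match.group(1)
        name = match.group(2).lower()      # assumption: case-insensitive
        if not closing:
            stack.append(name)
        elif not stack or stack.pop() != name:
            return False                   # wrong or missing opener
    return not stack                       # leftovers were never closed
```

A lenient reader could use the same stack to recover from a mismatch,
for example by closing the intervening commands, instead of rejecting
the data.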

481	Unrecognized formatting commands

483	Implementations must regard any unrecognized formatting command as a
484	"no-op" command, that is, as a command having no effect, thus
485	facilitating future extensions to "text/enriched". Private extensions
486	may be defined using formatting commands that begin with "X-", by
487	analogy to Internet mail header field names.
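
To illustrate the "no-op" rule, the sketch below reduces text/enriched
data to its readable text by treating every command, recognized or not,
as having no effect, while hiding "param" data from the reader. The
function name is hypothetical, line break folding is deliberately left
out, and matching "param" without regard to case is an assumption.

```python
import re

# Either the "<<" escape or a command with an optional leading solidus.
TOKEN_RE = re.compile(r"<<|<(/?)([A-Za-z0-9-]{1,60})>")


def to_plain_text(data):
    """Strip all formatting commands, drop parameter data, and turn
    "<<" back into a literal "<"."""
    out = []
    pos = 0
    in_param = False
    for match in TOKEN_RE.finditer(data):
        if not in_param:
            out.append(data[pos:match.start()])   # text before the token
        pos = match.end()
        if match.group(0) == "<<":
            if not in_param:
                out.append("<")
        elif match.group(2).lower() == "param":
            in_param = not match.group(1)          # open: hide, close: show
        # Any other command, including "X-" extensions, is a no-op here.
    if not in_param:
        out.append(data[pos:])
    return "".join(out)
```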

489	In order to formally define extended commands, a new Internet document
490	should be published.

492	White Space in Text/enriched Data

494	No special behavior is required for the SPACE or TAB (HT) character. It
495	is recommended, however, that, at least when fixed-width fonts are in
496	use, the common semantics of the TAB (HT) character should be observed,
497	namely that it moves to the next column position that is a multiple of
498	8. (In other words, if a TAB (HT) occurs in column n, where the leftmost
499	column is column 0, then that TAB (HT) should be replaced by 8-(n mod 8)
500	SPACE characters.) It should also be noted that some mail gateways are
501	notorious for losing (or, less commonly, adding) white space at the end
502	of lines, so reliance on SPACE or TAB characters at the end of a line is
503	not recommended.
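
The recommended TAB (HT) behavior can be sketched as follows, with
columns counted from 0 and each TAB replaced by 8-(n mod 8) SPACE
characters. The function name is hypothetical, and the sketch assumes a
single line of fixed-width text in which every other character occupies
one column.

```python
def expand_tabs(line, tab_width=8):
    """Replace each TAB in a single line with the number of SPACE
    characters needed to reach the next multiple of tab_width."""
    out = []
    column = 0
    for ch in line:
        if ch == "\t":
            spaces = tab_width - (column % tab_width)
            out.append(" " * spaces)
            column += spaces
        else:
            out.append(ch)
            column += 1
    return "".join(out)
```

For a single line this gives the same result as Python's built-in
str.expandtabs(8).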

505	Initial State of a text/enriched interpreter

507	Text/enriched is assumed to begin with filled text in a variable-width
508	font in a normal typeface and a size that is average for the current
509	display and user. The left and right margins are assumed to be maximal,
510	that is, at the leftmost and rightmost acceptable positions.

512	Non-ASCII character sets

514	One of the great benefits of MIME is the ability to use different
515	varieties of non-ASCII text in messages. To use non-ASCII text in a
516	message, normally a charset parameter is specified in the Content-type
517	line that indicates the character set being used. For purposes of this
518	RFC, any legal MIME charset parameter can be used with the text/enriched
519	Content-type. However, there are two difficulties that arise with regard
520	to the text/enriched Content-type when non-ASCII text is desired. The
521	first problem involves difficulties that occur when the user wishes to
522	create text which would normally require multiple non-ASCII character
523	sets in the same text/enriched message. The second problem is an
524	ambiguity that arises because of the text/enriched use of the "<"
525	character in formatting commands.

527	Using multiple non-ASCII character sets

529	Normally, if a user wishes to produce text which contains characters
530	from entirely different character sets within the same MIME message (for
531	example, using Russian Cyrillic characters from ISO 8859-5 and Hebrew
532	characters from ISO 8859-8), a multipart message is used. Every time a
533	new character set is desired, a new MIME body part is started with
534	different character sets specified in the charset parameter of the
535	Content-type line. However, using multiple character sets this way in
536	text/enriched messages introduces problems. Since a change in the
537	charset parameter requires a new part, text/enriched formatting commands
538	used in the first part would not be able to apply to text that occurs in
539	subsequent parts. It is not possible for text/enriched formatting
540	commands to apply across MIME body part boundaries.

542	[RFC-1341] attempted to get around this problem in the now obsolete
543	text/richtext format by introducing different character set formatting
544	commands like "iso-8859-5" and "us-ascii". But this, or even a more
545	general solution along the same lines, is still undesirable: It is
546	common for a MIME application to decide, for example, what character
547	font resources or character lookup tables it will require based on the
548	information provided by the charset parameter of the Content-type line,
549	before it even begins to interpret or display the data in that body
550	part. By allowing the text/enriched interpreter to subsequently change
551	the character set, perhaps to one completely different from the charset
552	specified in the Content-type line (with potentially much different
553	resource requirements), too much burden would be placed on the
554	text/enriched interpreter itself.

556	Therefore, if multiple types of non-ASCII characters are desired in a
557	text/enriched document, one of the following two methods must be used:

559	  1. For cases where the different types of non-ASCII text can be
560	     limited to their own paragraphs with distinct formatting, a
561	     multipart message can be used with each part having a Content-Type
562	     of text/enriched and a different charset parameter. The one caveat
563	     to using this method is that each new part must start in the
564	     initial state for a text/enriched document. That means that all of
565	     the text/enriched commands in the preceding part must be properly
566	     balanced with ending commands before the next text/enriched part
567	     begins. Also, each text/enriched part must begin a new paragraph.

569	  2. If different types of non-ASCII text are to appear in the same line
570	     or paragraph, or if text/enriched formatting (e.g. margins,
571	     typeface, justification) is required across several different types
572	     of non-ASCII text, a single text/enriched body part should be used
573	     with a character set specified that contains all of the required
574	     characters. For example, a charset parameter of "UNICODE-1-1-UTF-7"
575	     as specified in [RFC-1642] could be used for such purposes. Not
576	     only does UNICODE contain all of the characters that can be
577	     represented in all of the other registered ISO 8859 MIME character
578	     sets, but UTF-7 is fully compatible with other aspects of the
579	     text/enriched standard, including the use of the "<" character
580	     referred to below. Any other character sets that are specified for
581	     use in MIME which contain different types of non-ASCII text can
582	     also be used in these instances.
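
As an illustration of the caveat in method 1, the following sketch
checks that every formatting command in one text/enriched part is
closed, in proper nesting order, before the part ends. The function name
enriched_is_balanced and the nesting limit MAX_DEPTH are assumptions
made for this example and are not defined by this memo; the 62-octet
token buffer simply mirrors the one used in Appendix A.

```c
#include <ctype.h>
#include <string.h>

#define MAX_DEPTH 32   /* arbitrary nesting limit for this sketch */
#define MAX_TOKEN 62   /* same token buffer size as Appendix A */

/* Returns 1 if every formatting command in body is closed by a
   matching "</command>" in proper nesting order, 0 otherwise. */
int enriched_is_balanced(const char *body)
{
    char stack[MAX_DEPTH][MAX_TOKEN];
    char token[MAX_TOKEN];
    int depth = 0;
    size_t i;

    while (*body != '\0') {
        if (*body++ != '<')
            continue;                 /* ordinary text */
        if (*body == '<') {           /* "<<" is a literal "<" */
            body++;
            continue;
        }
        /* Collect the command name, folded to lower case */
        for (i = 0; *body != '\0' && *body != '>'; body++) {
            if (i < sizeof(token) - 1)
                token[i++] = (char)tolower((unsigned char)*body);
        }
        token[i] = '\0';
        if (*body == '\0')
            return 0;                 /* unterminated command */
        body++;                       /* skip '>' */
        if (token[0] == '/') {
            if (depth == 0 || strcmp(stack[depth - 1], token + 1) != 0)
                return 0;             /* close without matching open */
            depth--;
        } else {
            if (depth == MAX_DEPTH)
                return 0;             /* too deep for this sketch */
            strcpy(stack[depth++], token);
        }
    }
    return depth == 0;                /* nothing may remain open */
}
```

A composing agent might run such a check on each part before writing the
next boundary, and supply the missing ending commands if it fails.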

584	Use of the "<" character in formatting commands

586	If the character set specified by the charset parameter on the
587	Content-type line is anything other than "US-ASCII", this means that
588	the text being described by text/enriched formatting commands is in a
589	non-ASCII character set. However, the commands themselves are still the
590	same ASCII commands that are defined in this document. This creates an
591	ambiguity only with reference to the "<" character, the octet with
592	numeric value 60. In single byte character sets, such as the ISO-8859
593	family, this is not a problem; the octet 60 can be quoted by including
594	it twice, just as for ASCII. The problem is more complicated, however,
595	in the case of multi-byte character sets, where the octet 60 might
596	appear at any point in the byte sequence for any of several characters.

598	In practice, however, most multi-byte character sets address this
599	problem internally. For example, the UNICODE character sets can use the
600	UTF-7 encoding which preserves all of the important ASCII characters in
601	their single byte form. The ISO-2022 family of character sets can use
602	certain character sequences to switch back into ASCII at any moment.
603	Therefore it is specified that, before text/enriched formatting
604	commands, the prevailing character set should be "switched back" into
605	ASCII, and that only those characters which would be interpreted as "<"
606	in plain text should be interpreted as token delimiters in
607	text/enriched.
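
For single-byte character sets, the quoting rule described above can be
sketched as follows. The helper name enriched_quote_lt and its
buffer-based interface are hypothetical and chosen only for
illustration. The sketch assumes that every octet 60 in the input really
is a "<" character, so it must not be applied to raw multi-byte text in
which octet 60 may form part of another character.

```c
#include <stddef.h>

/* Hypothetical helper: copy src to dst, doubling each '<' (octet 60).
   Returns the number of octets written, excluding the terminating NUL,
   or (size_t)-1 if dst is too small. */
size_t enriched_quote_lt(const char *src, char *dst, size_t dstsize)
{
    size_t n = 0;

    if (dstsize == 0)
        return (size_t)-1;
    for (; *src != '\0'; src++) {
        size_t need = (*src == '<') ? 2 : 1;
        if (n + need >= dstsize)      /* keep room for the NUL */
            return (size_t)-1;
        if (*src == '<')
            dst[n++] = '<';           /* emit the extra quoting '<' */
        dst[n++] = *src;
    }
    dst[n] = '\0';
    return n;
}
```

A composing agent would apply this to literal text only, never to the
formatting commands it generates itself.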

609	The question of what to do for hypothetical future character sets that
610	do not subsume ASCII is not addressed in this memo.

612	Minimal text/enriched conformance

614	A minimal text/enriched implementation is one that converts "<<" to "<",
615	removes everything between a <param> command and the next balancing
616	</param> command, removes all other formatting commands (all text
617	enclosed in angle brackets), and, outside of <nofill> environments,
618	converts any series of n CRLFs to n-1 CRLFs, and converts any lone CRLF
619	pairs to SPACE.

621	Notes for Implementors

623	It is recognized that implementors of future mail systems will want rich
624	text functionality far beyond that currently defined for text/enriched.
625	The intent of text/enriched is to provide a common format for expressing
626	that functionality in a form in which much of it, at least, will be
627	understood by interoperating software. Thus, in particular, software
628	with a richer notion of formatted text than text/enriched can still use
629	text/enriched as its basic representation, but can extend it with new
630	formatting commands and by hiding information specific to that software
631	system in text/enriched <param> constructs. As such systems evolve, it
632	is expected that the definition of text/enriched will be further refined
633	by future published specifications, but text/enriched as defined here
634	provides a platform on which evolutionary refinements can be based.

636	An expected common way that sophisticated mail programs will generate
637	text/enriched data is as part of a multipart/alternative construct. For
638	example, a mail agent that can generate enriched mail in ODA format can
639	generate that mail in a more widely interoperable form by generating
640	both text/enriched and ODA versions of the same data, e.g.:

642	     Content-type: multipart/alternative; boundary=foo

644	     --foo
645	     Content-type: text/enriched

647	     [text/enriched version of data]
648	     --foo
649	     Content-type: application/oda

650	     [ODA version of data]
651	     --foo--

653	If such a message is read using a MIME-conformant mail reader that
654	understands ODA, the ODA version will be displayed; otherwise, the
655	text/enriched version will be shown.

657	In some environments, it might be impossible to combine certain
658	text/enriched formatting commands, whereas in others they might be
659	combined easily. For example, the combination of <bold> and <italic>
660	might produce bold italics on systems that support such fonts, but there
661	exist systems that can make text bold or italicized, but not both. In
662	such cases, the most recently issued (innermost) recognized formatting
663	command should be preferred.
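
One way to apply this preference is sketched below. The names pick_style
and enum style, and the can_combine capability flag, are assumptions
introduced for this example; a real display layer would have its own
representation of the currently open commands.

```c
#include <string.h>

enum style { STYLE_PLAIN, STYLE_BOLD, STYLE_ITALIC, STYLE_BOLD_ITALIC };

/* active[] lists the currently open commands, outermost first.
   can_combine is nonzero when the display has a bold-italic font. */
enum style pick_style(const char *const active[], int depth,
                      int can_combine)
{
    int bold = 0, italic = 0, i;
    enum style innermost = STYLE_PLAIN;

    for (i = 0; i < depth; i++) {
        if (strcmp(active[i], "bold") == 0) {
            bold = 1;
            innermost = STYLE_BOLD;
        } else if (strcmp(active[i], "italic") == 0) {
            italic = 1;
            innermost = STYLE_ITALIC;
        }
        /* unrecognized commands are skipped */
    }
    if (bold && italic && can_combine)
        return STYLE_BOLD_ITALIC;
    /* Later entries are more deeply nested, so the last recognized
       command wins when the two cannot be combined. */
    return innermost;
}
```

With <bold> opened before <italic>, a display that cannot combine the
two would therefore show italic text.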

665	One of the major goals in the design of text/enriched was to make it so
666	simple that even text-only mailers will implement enriched-to-
667	plain-text translators, thus increasing the likelihood that enriched
668	text will become "safe" to use very widely. To demonstrate this
669	simplicity, an extremely simple C program that converts text/enriched
670	input into plain text output is included in Appendix A.

672	Extensions to text/enriched

674	It is expected that various mail system authors will desire extensions
675	to text/enriched. The simple syntax of text/enriched, and the
676	specification that unrecognized formatting commands should simply be
677	ignored, are intended to promote such extensions.

679	An Example

681	Putting all this together, the following "text/enriched" body fragment:

683	     From: Nathaniel Borenstein 
684	     To: Ned Freed 
685	     Content-type: text/enriched

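687	     <bold>Now</bold> is the time for <italic>all</italic>
688	     good men
689	     <smaller>(and <<women>)</smaller> to
690	     <ignoreme>come</ignoreme>

692	     to the aid of their

694	     <color><param>red</param>beloved</color>
695	     country.

697	     By the way,
698	     I think that <paraindent><param>left</param><<smaller>

700	     </paraindent>should REALLY be called

702	     <paraindent><param>left</param><<tinier></paraindent>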
703	     and that I am always right.

705	     -- the end

707	represents the following formatted text (which will, no doubt, look
708	somewhat cryptic in the text-only version of this document):

710	     Now is the time for all good men (and <women>) to come
711	     to the aid of their

713	     beloved country.
714	     By the way, I think that
715	          <smaller>
716	     should REALLY be called
717	          <tinier>
718	     and that I am always right.
719	     -- the end

721	where the word "beloved" would be in red on a color display.

723	Security Considerations

725	Security issues are not discussed in this memo, as the mechanism raises
726	no security issues.

728	Authors' Addresses

730	For more information, the authors of this document may be contacted via
731	Internet mail:

733	                            Peter W. Resnick
734	                          QUALCOMM Incorporated
735	                           6455 Lusk Boulevard
736	                        San Diego, CA 92121-2779
737	                         Phone: +1 619 587 1121
738	                          FAX: +1 619 658 2230
739	                      e-mail: presnick@qualcomm.com

741	                              Amanda Walker
742	                      InterCon Systems Corporation
743	                           950 Herndon Parkway
744	                            Herndon, VA 22070
745	                         Phone: +1 703 709 5500
746	                          FAX: +1 703 709 5555
747	                       e-mail: amanda@intercon.com

749	Acknowledgements

751	The authors gratefully acknowledge the input of many contributors,
752	readers, and implementors of the specification in this document.
753	Particular thanks are due to Nathaniel Borenstein, the original author
754	of RFC 1563.

756	References

758	[RFC-1341]
759	     Borenstein, N., Freed, N., "MIME (Multipurpose Internet Mail
760	     Extensions): Mechanisms for Specifying and Describing the Format of
761	     Internet Message Bodies", 06/11/1992.

763	[RFC-1521]
764	     Borenstein, N., Freed, N., "MIME (Multipurpose Internet Mail
765	     Extensions) Part One: Mechanisms for Specifying and Describing the
766	     Format of Internet Message Bodies", 09/23/1993.

768	[RFC-1523]
769	     Borenstein, N., "The text/enriched MIME Content-type", 09/23/1993.

771	[RFC-1563]
772	     Borenstein, N., "The text/enriched MIME Content-type", 01/10/1994.

774	[RFC-1642]
775	     Goldsmith, D., Davis, M., "UTF-7 - A Mail-Safe Transformation
776	     Format of Unicode", 07/13/1994.

778	[RFC-1766]
779	     Alvestrand, H., "Tags for the Identification of Languages",
780	     03/02/1995.

782	[RFC-1866]
783	     Berners-Lee, T., Connolly, D., "Hypertext Markup Language - 2.0",
784	     11/03/1995.

786	Appendix A--A Simple enriched-to-plain Translator in C

788	One of the major goals in the design of the text/enriched subtype of the
789	text Content-Type is to make formatted text so simple that even
790	text-only mailers will implement enriched-to-plain-text translators,
791	thus increasing the likelihood that multifont text will become "safe" to
792	use very widely. To demonstrate this simplicity, what follows is a
793	simple C program that converts text/enriched input into plain text
794	output. Note that the local newline convention (the single character
795	represented by "\n") is assumed by this program, but that special CRLF
796	handling might be necessary on some systems.

#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    int c, i, paramct=0, newlinect=0, nofill=0;
    char token[62], *p;

    while ((c=getc(stdin)) != EOF) {
        if (c == '<') {
            /* A pending lone newline becomes a space */
            if (newlinect == 1) putc(' ', stdout);
            newlinect = 0;
            c = getc(stdin);
            if (c == '<') {
                /* "<<" is a literal "<" */
                if (paramct <= 0) putc(c, stdout);
            } else {
                /* Read the command name, folded to lower case */
                ungetc(c, stdin);
                for (i=0, p=token;
                     (c=getc(stdin)) != EOF && c != '>'; i++) {
                    if (i < (int)sizeof(token)-1)
                        *p++ = isupper(c) ? tolower(c) : c;
                }
                *p = '\0';
                if (c == EOF) break;
                if (strcmp(token, "param") == 0)
                    paramct++;
                else if (strcmp(token, "nofill") == 0)
                    nofill++;
                else if (strcmp(token, "/param") == 0)
                    paramct--;
                else if (strcmp(token, "/nofill") == 0)
                    nofill--;
                /* All other commands are silently dropped */
            }
        } else {
            if (paramct > 0)
                ; /* ignore params */
            else if (c == '\n' && nofill <= 0) {
                /* n newlines become n-1 newlines */
                if (++newlinect > 1) putc(c, stdout);
            } else {
                if (newlinect == 1) putc(' ', stdout);
                newlinect = 0;
                putc(c, stdout);
            }
        }
    }
    /* The following line is only needed with line-buffering */
    putc('\n', stdout);
    exit(0);
}

847	It should be noted that one can do considerably better than this in
848	displaying text/enriched data on a dumb terminal. In particular, one can
849	replace font information such as "bold" with textual emphasis (like
850	*this* or _T_H_I_S_). One can also properly handle the text/enriched
851	formatting commands regarding indentation, justification, and others.
852	However, the above program is all that is necessary in order to present
853	text/enriched on a dumb terminal without showing the user any formatting
854	artifacts.
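
As a small illustration of the textual emphasis mentioned above, the
following sketch produces the "_T_H_I_S_" form for a run of text. The
function name emphasize_underscore and its buffer-based interface are
assumptions made for this example; they are not part of the program
above.

```c
#include <stddef.h>
#include <string.h>

/* Writes "_T_H_I_S_" for the input "THIS". Returns 0 on success,
   or -1 if dst is too small to hold the result. */
int emphasize_underscore(const char *src, char *dst, size_t dstsize)
{
    size_t len = strlen(src), n = 0, i;

    /* Two octets per character, plus the closing '_' and the NUL */
    if (dstsize < 2 * len + 2)
        return -1;
    for (i = 0; i < len; i++) {
        dst[n++] = '_';
        dst[n++] = src[i];
    }
    dst[n++] = '_';
    dst[n] = '\0';
    return 0;
}
```

A translator could call such a routine on the text collected between a
<bold> command and its balancing </bold> instead of discarding the
commands outright.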

856	Appendix B--A Simple enriched-to-HTML Translator in C

858	It is fully expected that other text formatting standards like HTML and
859	SGML will supplant text/enriched in Internet mail. It is also likely
860	that as this happens, recipients of text/enriched mail will wish to view
861	such mail with an HTML viewer. To this end, the following is a simple
862	example of a C program to convert text/enriched to HTML. Since the
863	current version of HTML at the time of this document's publication is
864	HTML 2.0 defined in [RFC-1866], this program converts to that standard.
865	There are several text/enriched commands that have no HTML 2.0
866	equivalent. In those cases, this program simply puts those commands into
867	processing instructions; that is, surrounded by "<?" and ">". As in
868	Appendix A, the local newline convention (the single character
869	represented by "\n") is assumed by this program, but special CRLF
870	handling might be necessary on some systems.

#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    int c, i, paramct=0, nofill=0;
    char token[62], *p;

    while((c=getc(stdin)) != EOF) {
        if(c == '<') {
            c = getc(stdin);
            if(c == '<') {
                /* "<<" is a literal "<"; escape it for HTML */
                fputs("&lt;", stdout);
            } else {
                /* Read the command name, folded to lower case */
                ungetc(c, stdin);
                for (i=0, p=token;
                     (c=getc(stdin)) != EOF && c != '>'; i++) {
                    if (i < (int)sizeof(token)-1)
                        *p++ = isupper(c) ? tolower(c) : c;
                }
                *p = '\0';
                if(c == EOF) break;
                if(strcmp(token, "/param") == 0) {
                    /* Close the "<?param " processing instruction */
                    paramct--;
                    putc('>', stdout);
                } else if(paramct > 0) {
                    /* Commands inside a param are emitted as text */
                    fputs("&lt;", stdout);
                    fputs(token, stdout);
                    fputs("&gt;", stdout);
                } else {
                    putc('<', stdout);
                    if(strcmp(token, "nofill") == 0) {
                        nofill++;
                        fputs("pre", stdout);
                    } else if(strcmp(token, "/nofill") == 0) {
                        nofill--;
                        fputs("/pre", stdout);
                    } else if(strcmp(token, "bold") == 0) {
                        fputs("b", stdout);
                    } else if(strcmp(token, "/bold") == 0) {
                        fputs("/b", stdout);
                    } else if(strcmp(token, "italic") == 0) {
                        fputs("i", stdout);
                    } else if(strcmp(token, "/italic") == 0) {
                        fputs("/i", stdout);
                    } else if(strcmp(token, "fixed") == 0) {
                        fputs("tt", stdout);
                    } else if(strcmp(token, "/fixed") == 0) {
                        fputs("/tt", stdout);
                    } else if(strcmp(token, "excerpt") == 0) {
                        fputs("blockquote", stdout);
                    } else if(strcmp(token, "/excerpt") == 0) {
                        fputs("/blockquote", stdout);
                    } else {
                        /* No HTML 2.0 equivalent: emit "<?command>" */
                        putc('?', stdout);
                        fputs(token, stdout);
                        if(strcmp(token, "param") == 0) {
                            /* Left open until the matching /param */
                            paramct++;
                            putc(' ', stdout);
                            continue;
                        }
                    }
                    putc('>', stdout);
                }
            }
        } else if(c == '>') {
            fputs("&gt;", stdout);
        } else if(c == '&') {
            /* Escape "&" so it is not read as an entity start */
            fputs("&amp;", stdout);
        } else {
            if(c == '\n' && nofill <= 0 && paramct <= 0) {
                /* Each newline after the first becomes <br> */
                while((i=getc(stdin)) == '\n')
                    fputs("<br>", stdout);
                ungetc(i, stdin);
            }
            putc(c, stdout);
        }
    }
    /* The following line is only needed with line-buffering */
    putc('\n', stdout);
    exit(0);
}