Network Working Group                                          C. Newman
Internet Draft: The Text/Paragraph Media Type                   N. Freed
Document: draft-newman-mime-textpara-00.txt                     Innosoft
                                                           February 1998


                     The Text/Paragraph Media Type


Status of this memo

   This document is an Internet-Draft.  Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups.  Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time.  It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as "work in
   progress."

   To view the entire list of current Internet-Drafts, please check
   the "1id-abstracts.txt" listing contained in the Internet-Drafts
   Shadow Directories on ftp.is.co.za (Africa), ftp.nordu.net
   (Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East
   Coast), or ftp.isi.edu (US West Coast).

Introduction

   The text/plain media type is defined to represent plain text where
   the CRLF sequence represents a line break [MIME-IMT].  Many modern
   computer systems have a different concept of "plain text" from the
   systems where the text/plain media type originated.  These modern
   systems usually use a proportionally spaced font and use CRLF to
   represent paragraph breaks.  Numerous software products have
   erroneously labelled such paragraph-based text as text/plain.  In
   order to correct this interoperability problem, the text/paragraph
   media type is defined.

   [NOTE: This proposal may be discussed publicly on the
   ietf-822@imc.org mailing list.  The subscription address is
   ietf-822-request@imc.org.]

1. Conventions Used in this Document

   The key words "REQUIRED", "MUST", "MUST NOT", "SHOULD", "SHOULD
   NOT", and "MAY" in this document are to be interpreted as
   described in "Key words for use in RFCs to Indicate Requirement
   Levels" [KEYWORDS].

2. The text/paragraph Media Type

   MIME media type name: text

   MIME subtype name: paragraph

   Required parameters: none

   Optional parameters: charset

   Encoding considerations:

      The text/paragraph media type is likely to include paragraphs
      longer than 1000 characters (including the terminating CRLF
      characters).  Therefore the quoted-printable content transfer
      encoding is usually necessary for transport over systems with
      line length limits such as SMTP [SMTP].  If all paragraphs fit
      within the line length limits of the underlying transport,
      then the "7bit" or "8bit" content transfer encoding may be
      used if appropriate for the specified charset.

   Security considerations:

      This media type has the same security considerations as
      text/plain.

   Interoperability considerations:

      An agent which receives text/paragraph and does not understand
      it is REQUIRED by MIME [MIME-CONF] to display it as
      text/plain.  This could result in paragraphs being displayed
      as long lines which extend outside the visible display area,
      but many clients offer the ability to wrap such lines.  This
      is an improvement over the current practice, where
      paragraph-based text is mislabelled as text/plain and the user
      agent therefore has to choose between risking the previous
      problem and damaging genuine text/plain content by word
      wrapping it.

      It is simple to convert text/paragraph to text/plain (by word
      wrapping near the 72nd character column), but there is no
      algorithmic method to reliably create a text/paragraph media
      type from a text/plain media type.
      There is also no algorithmic way to distinguish text/plain
      from text/paragraph when they are unlabelled.  Heuristics can
      often produce satisfactory results in these cases, but when
      the heuristics fail, the results can be unpleasant.

   Published specification:

      The text/paragraph media type follows the definition of the
      text/plain media type as specified in [MIME-IMT] except that a
      CRLF represents the end of a paragraph rather than the end of
      a line.

   Additional information:

      Magic number(s): none

      File extension(s):

         No clear distinction is made between text/plain and
         text/paragraph for file extensions.  The ".txt" extension
         is used for both (this Internet-Draft is text/plain).  The
         ".asc" or ".ascii" extension is traditionally used for
         text/plain with the US-ASCII character set.

      Macintosh File Type Code(s): TEXT

         NOTE: the TEXT file type on MacOS is used for both
         text/paragraph and text/plain.  The system application
         "SimpleText" generates a text/paragraph file with type TEXT
         (which also may contain MacOS-specific out-of-band markup
         in the resource fork).  Text editors such as those which
         come with compilers use text/plain.

   Intended usage: COMMON

3. User-Visible Differences between text/plain and text/paragraph

   The text/plain media type is not intended to be word wrapped.  In
   fact, word wrapping text/plain can make the text difficult to read,
   as it results in situations like the following:

        The text/plain media type is not intended to be word
        wrapped.  In
        fact, word wrapping text/plain can make the text difficult
        to read,
        as it results in situations like the following:

             The text/plain media type is not intended to be
             word
             wrapped.  In
             fact, word wrapping text/plain can make the text
             difficult
             to read,
             as it results in situations like the following:

             ...
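
   The contrast between the two media types can be sketched in code.
   The following Python fragment is an illustration only and is not
   part of this specification: the helper name paragraph_to_plain and
   the sample strings are hypothetical, and the standard textwrap
   module merely stands in for whatever wrapping algorithm an agent
   actually uses.  It converts text/paragraph to text/plain by
   wrapping each paragraph near the 72nd column, and it shows that
   applying the same operation to text which already contains line
   breaks leaves the kind of short orphaned lines illustrated above.

```python
import textwrap

CRLF = "\r\n"


def paragraph_to_plain(body: str, width: int = 72) -> str:
    """Wrap each CRLF-delimited paragraph so lines stay within `width`.

    Hypothetical helper: words longer than `width` (such as URLs) are
    left unbroken, so a line can exceed `width` in that case.
    """
    lines = []
    for paragraph in body.split(CRLF):
        wrapped = textwrap.wrap(
            paragraph,
            width=width,
            break_long_words=False,
            break_on_hyphens=False,
        )
        # textwrap.wrap() returns [] for an empty string; keep empty
        # paragraphs as blank lines.
        lines.extend(wrapped or [""])
    return CRLF.join(lines)


if __name__ == "__main__":
    # One text/paragraph paragraph: a single long line ended by CRLF.
    paragraph = (
        "The text/paragraph media type is intended to be word wrapped "
        "for display and will not cause this problem." + CRLF
    )
    print(paragraph_to_plain(paragraph, width=40))

    # text/plain that already has line breaks: wrapping it again at a
    # narrower width leaves short orphaned lines.
    plain = (
        "The text/plain media type is not intended to be word wrapped.  In"
        + CRLF
        + "fact, word wrapping text/plain can make the text difficult to read,"
        + CRLF
    )
    print(paragraph_to_plain(plain, width=60))
```

   The reverse direction is deliberately absent: once paragraphs have
   been broken into lines, the original paragraph boundaries cannot be
   recovered reliably.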

   The text/paragraph media type, on the other hand, is intended to be
   word wrapped for display and will not cause this problem.

   There is an unwritten convention that text/plain is displayed in a
   fixed-width font, thus permitting the use of ASCII art to represent
   graphics and tables.  The text/paragraph media type is suitable
   for display in a proportional-width font and for line wrapping,
   and thus it is not suitable for ASCII art (such as is commonly
   used in message signatures).

   A common convention in Internet messages with the text/plain media
   type is to use the ">" character at the beginning of a line to
   quote text from a previous text/plain message when generating a
   reply.  Agents receiving text/paragraph in a message MAY take this
   convention into account when displaying it.

   It is important to note that the conversion from text/paragraph to
   text/plain is algorithmically a one-way process and thus is to be
   avoided unless it is necessary.

4. Requirements

   User agents MUST offer an option to display text/plain without
   line wrapping and SHOULD display it in a fixed-width font.  User
   agents MAY offer an option to word wrap text/plain to deal with
   the unfortunately common practice of mislabelling text/paragraph
   as text/plain.

   User agents SHOULD word wrap text/paragraph on display and MAY
   display it in either a proportional or fixed-width font.

   Because the algorithm for quality word wrapping can vary by
   language, generating agents SHOULD include a Content-Language
   header [LANG].

   Non-Internet mail systems often use paragraph-based text formats.
   A gateway from such a system MUST NOT label paragraph-based text
   as text/plain.  Instead, it MAY label such paragraph-based text as
   text/paragraph (or a suitable markup format) or convert it to
   text/plain by word wrapping near the 72nd column.

5. Security Considerations

   This media type introduces no security considerations beyond those
   which apply to text/plain.
   Considerations that apply to both follow.

   Some text display processors will scan text media types for
   recognizable sequences such as URLs or even nonstandard embedded
   commands.  Embedded commands can have the same sorts of security
   issues as PostScript.  (The discussion of the
   "application/PostScript" media type [MIME-IMT] considers these
   risks in detail.)  In general, it may be possible to specify
   commands that perform unauthorized file operations or make changes
   to the display processor's environment that affect subsequent
   operations.

   Such nonstandard commands are often added for debugging purposes.
   However, implementors should never assume that the existence of
   such commands can be kept secret, and should therefore make sure
   that any nonstandard commands with the potential to cause harm
   are disabled by default.

   Plain text (or other subtypes of text displayed as plain text) can
   contain embedded control characters and escape sequences which
   also have the potential to change the display processor
   environment in ways that adversely affect subsequent operations.
   Possible effects include, but are not limited to, locking the
   keyboard, changing display parameters so subsequent displayed text
   is unreadable, or even changing display parameters to deliberately
   obscure or distort subsequent displayed material so that its
   meaning is lost or altered.  Display processors should either
   filter such material from displayed text or else make sure to
   reset all important settings after a given display operation is
   complete.

   Some terminal devices have keys whose output when pressed can be
   changed by sending the display processor a character sequence.  If
   this is possible, the display of a text object containing such
   character sequences could reprogram keys to perform some illicit
   or dangerous action when the key is subsequently pressed by the
   user.
   In some cases not only can keys be programmed, they can also be
   triggered remotely, making it possible for a text display
   operation to directly perform some unwanted action.  As such, the
   ability to program keys should be blocked either by filtering or
   by disabling the ability to program keys entirely.

6. References

   [KEYWORDS]  Bradner, S., "Key words for use in RFCs to Indicate
               Requirement Levels", RFC 2119, Harvard University,
               March 1997.

   [LANG]      Alvestrand, H., "Tags for the Identification of
               Languages", RFC 1766, UNINETT, March 1995.

   [MIME-IMT]  Freed, N., Borenstein, N., "Multipurpose Internet
               Mail Extensions (MIME) Part Two: Media Types", RFC
               2046, Innosoft, First Virtual, November 1996.

   [MIME-CONF] Freed, N., Borenstein, N., "Multipurpose Internet
               Mail Extensions (MIME) Part Five: Conformance
               Criteria and Examples", RFC 2049, Innosoft, First
               Virtual, November 1996.

   [SMTP]      Postel, J., "Simple Mail Transfer Protocol", RFC 821,
               Information Sciences Institute, August 1982.

7. Authors' Addresses

   Chris Newman
   Innosoft International, Inc.
   1050 Lakes Drive
   West Covina, CA 91790 USA

   Email: chris.newman@innosoft.com

   Ned Freed
   Innosoft International, Inc.
   1050 Lakes Drive
   West Covina, CA 91790 USA

   Email: ned.freed@innosoft.com
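
   The filtering of control characters and escape sequences
   recommended in the Security Considerations section could look
   roughly like the following Python sketch.  It is an illustration
   under stated assumptions, not a complete defence: the function
   name strip_control_chars is hypothetical, the text is assumed to
   have already been decoded into a Unicode string, and only the C0
   controls (other than TAB, CR and LF), DEL and the C1 controls are
   removed.

```python
def strip_control_chars(text: str) -> str:
    """Drop C0 controls (except TAB, CR, LF), DEL, and C1 controls.

    Hypothetical helper.  Removing ESC (0x1B) and the C1 range
    (0x80-0x9F) takes away the introducer of an escape sequence; the
    printable remainder of the sequence is left as visible text.
    """
    allowed = {"\t", "\r", "\n"}
    return "".join(
        ch
        for ch in text
        if ch in allowed or not (ord(ch) < 0x20 or 0x7F <= ord(ch) <= 0x9F)
    )


if __name__ == "__main__":
    # The ESC that would start a "clear screen" sequence is removed, so
    # the display device sees only ordinary printable characters.
    print(repr(strip_control_chars("safe\x1b[2Jtext\r\n")))
```

   A display processor that cannot filter its input in this way would
   instead need to reset its settings after each display operation,
   as described in the Security Considerations section.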