INTERNET-DRAFT                                         Maurizio Codogno
draft-codogno-mime-nntp8bit-00.txt                                CSELT
Expires: February 11, 1999                        Date: August 06, 1998


               The MIME application/nntp8bit Content-type


Status of this Memo

This document is an Internet Draft; Internet Drafts are working documents of the Internet Engineering Task Force (IETF), its Areas, and its Working Groups. Note that other groups may also distribute working documents as Internet Drafts.

Internet Drafts are draft documents valid for a maximum of six months. They may be updated, replaced, or obsoleted by other documents at any time. It is not appropriate to use Internet Drafts as reference material or to cite them other than as a "working draft" or "work in progress".

Please check the abstract listing in each Internet Draft directory for the current status of this or any other Internet Draft. To view the entire list of current Internet-Drafts, please check the "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow Directories on ftp.is.co.za (Africa), ftp.nordu.net (Northern Europe), ftp.nis.garr.it (Southern Europe), munnari.oz.au (Pacific Rim), ftp.ietf.org (US East Coast), or ftp.isi.edu (US West Coast).

Abstract

The application/nntp8bit content-type is proposed and defined as an efficient and simple way to transmit raw ("binary") data over an NNTP connection, taking into account the foreseeable limitations of that standard.

1. Introduction

Usenet News [NNTP, NEWS] is a very popular data transmission medium: at the time of writing, there are tens of thousands of different discussion groups, and the traffic per site can be as much as 10 GB/day. The vast majority of this data consists of binary files (images, audio or video clips, software programs, ...), which make up as much as 90% of the global traffic. Unfortunately, the two main ways used to encode binary data, namely UUENCODE and MIME application/octet-stream with Content-Transfer-Encoding base64, increase the size of the file sent by at least 33%.
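The 33% figure follows from base64 mapping every 3 input octets onto 4 output characters. Since MIME also limits base64 lines to 76 characters, each followed by CRLF, the actual increase is somewhat higher. The following C sketch computes the encoded size; the function name base64_encoded_size is hypothetical, and the sketch assumes that every line, including the last one, ends with CRLF.

```c
#include <stddef.h>

/* Hypothetical helper: size in octets of the base64 encoding of n input
   octets, assuming 76-character lines each terminated by CRLF (the
   last, possibly partial, line included). */
size_t base64_encoded_size(size_t n)
{
    size_t chars = 4 * ((n + 2) / 3);   /* 3 octets -> 4 chars, padded */
    size_t lines = (chars + 75) / 76;   /* number of output lines */
    return chars + 2 * lines;           /* one CRLF per line */
}
```

For example, under these assumptions a 57,000-octet file is encoded into 78,000 octets, an overhead of about 37%.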
The revised NNTP specification currently being prepared [NEWNNTP] requires an 8-bit-wide channel, and the companion new definition of the Usenet message format [USEFOR] does not object to the presence of 8-bit data. There is, however, a problem that prevents raw data from being sent directly: the body of an article may not contain the ASCII NUL character (0x00), and the ASCII CR and LF characters (0x0d, 0x0a) may only appear together, as a CR-LF pair. Moreover, each line in the body must be at most 998 octets long and must end with the CR-LF sequence (which is not counted in the 998-octet limit).

Codogno                    Expires February 1999                 [Page 1]

Internet Draft             application/nntp8bit              August, 1998

A rather simple way to cope with these limitations is to define a MIME Content-Type whose encoding complies with them. This solution has been preferred to the definition of a new Content-Transfer-Encoding because it is easier to deploy: if a newsreader does not understand the format, the article can still be saved and processed with an external filter.

2. application/nntp8bit Registration Information

The following form is copied from RFC 1590, Appendix A; registration of the new media type will be duly performed.

To: IANA@isi.edu
Subject: Registration of new Media Type application/nntp8bit

Media Type name: application
Media subtype name: nntp8bit
Required parameters: Type, the media type/subtype of the encoded data
Optional parameters: Name, the name of the file
Encoding considerations: the entity must have Content-Transfer-Encoding "8bit" or "binary".
Security considerations: None beyond those discussed in Section 6.
Published specification: RFC-REL (this document).
Person & email address to contact for further information:
   Maurizio Codogno
   CSELT CF/IM Dept.
   Via G. Reiss Romoli, 274
   I-10148 Torino TO
   Italy
   +39 011 228 6132

3. Definition of the coding

Since it is expected that, at least initially, the MIME type application/nntp8bit will not be widely deployed, the specification of the coding has deliberately been kept simple.
Moreover, it can be assumed that most binary files sent via Usenet News are already compressed; it was therefore judged sufficient simply to escape the offending octets. A single exception has been made: since some users may send uncompressed files, which are likely to contain many NUL characters, NUL is coded with a single octet (0x80).

Since no chunk of data between CRLF pairs can be longer than 998 octets, CRLF pairs must also be inserted at suitable places. The resulting coding is:

   0x00 (NUL)  ->  0x80
   0x0d (CR)   ->  0x81 0x8d
   0x0a (LF)   ->  0x81 0x8a
   0x80        ->  0x81 0x80
   0x81        ->  0x81 0x81
   any other octet is copied unchanged.

A CRLF pair is inserted as soon as 997 or more octets have been written on the current line, so no line is longer than 998 octets.

The coding algorithm, written in C, runs as follows:

----------------- cut ----------------------
#include <stdio.h>

/* Encoder: reads raw octets on stdin, writes application/nntp8bit
   data on stdout. */
int main(void)
{
    int c;          /* int, not char, so that EOF is distinguishable */
    int nchar = 0;  /* octets written on the current line */

    while ((c = getchar()) != EOF) {
        if (c == 0x00) {                /* NUL is coded as a single 0x80 */
            putchar(0x80);
            nchar++;
        } else if (c == 0x0d) {         /* CR   -> 0x81 0x8d */
            putchar(0x81); putchar(0x8d);
            nchar += 2;
        } else if (c == 0x0a) {         /* LF   -> 0x81 0x8a */
            putchar(0x81); putchar(0x8a);
            nchar += 2;
        } else if (c == 0x80) {         /* 0x80 -> 0x81 0x80 */
            putchar(0x81); putchar(0x80);
            nchar += 2;
        } else if (c == 0x81) {         /* 0x81 -> 0x81 0x81 */
            putchar(0x81); putchar(0x81);
            nchar += 2;
        } else {                        /* any other octet is copied */
            putchar(c);
            nchar++;
        }
        if (nchar >= 997) {             /* keep lines within 998 octets */
            putchar(0x0d); putchar(0x0a);
            nchar = 0;
        }
    }
    return 0;
}
----------------- cut ----------------------

while the decoding algorithm is the following:

----------------- cut ----------------------
#include <stdio.h>

/* Decoder: reads application/nntp8bit data on stdin, writes the
   original octets on stdout. */
int main(void)
{
    int c;

    while ((c = getchar()) != EOF) {
        if (c == 0x0d) {
            c = getchar();              /* eat the LF of an inserted CRLF */
        } else if (c == 0x80) {
            putchar(0x00);              /* 0x80 stands for NUL */
        } else if (c == 0x81) {
            c = getchar();              /* get the escaped octet */
            if (c == 0x80)      putchar(0x80);
            else if (c == 0x81) putchar(0x81);
            else if (c == 0x8a) putchar(0x0a);
            else if (c == 0x8d) putchar(0x0d);
            /* anything else (including EOF) is malformed input */
        } else {
            putchar(c);
        }
    }
    return 0;
}
----------------- cut ----------------------

Note that a real implementation should, of course, check for malformed input data and return an appropriate error
message.

The overhead introduced by this coding can be roughly estimated as follows, assuming compressed input in which all 256 octet values are about equally likely:

- four octet values out of 256 are coded with two octets, increasing the total size by about 1.6% (4/256) on average;
- two extra octets (CRLF) are added every 997 or 998 octets, adding a further 0.2%;
- the MIME header overhead is negligible for large files.

It is therefore possible to code a typical article with about 2% overhead, rather than the 33% of UUENCODE or base64 encoding.

4. User Agent Requirements

User agents that do not recognize application/nntp8bit shall, in accordance with [MIME], treat the entire entity as application/octet-stream. This is acceptable, since the data can then be saved as an external file and processed offline.

MIME User Agents that recognize application/nntp8bit will decode the stream of data and present it to the user as a file whose content type is given by the Type parameter.

4.1 Recursion

MIME is a recursive structure. Hence one must expect an application/nntp8bit entity to contain other application/nntp8bit entities. When an application/nntp8bit entity is being processed for display or storage, any enclosed application/nntp8bit entities shall be processed as though they were being stored.

5. Further work

A way could be defined to process files that are split across several articles before transmission because of their large size. Two possible ways to do this are:

- add an optional MIME parameter stating which part of the file is being sent;
- use an escape sequence "0x81 0xnn", with nn ranging from 01 to 79, at the beginning of the encoded data to indicate which part is being sent.

The latter approach limits the number of parts, and therefore the size of the complete file being sent, but it is more compact.

6. Security considerations

It may be possible to craft an encoded stream that executes malicious programs when it is read with a newsreader that does not understand this MIME Media Type.
It should be noted, however, that the specifications for Usenet messages would allow such a message anyway, so no new security issue is introduced.

7. Acknowledgments

[I hope someone in the USEFOR IETF group will help me!]

The author, however, takes full responsibility for all errors contained in this document.

8. References

[MIME]    Borenstein, N. and Freed, N., "MIME (Multipurpose Internet
          Mail Extensions): Mechanisms for Specifying and Describing
          the Format of Internet Message Bodies", RFC 1341, June 1992.

[NEWS]    Horton, M. and Adams, R., "Standard for Interchange of USENET
          Messages", RFC 1036, AT&T Bell Labs and Center for Seismic
          Studies, December 1987.

[NEWNNTP] Barber, S., "Network News Transport Protocol", work in
          progress,
          ftp://ds.internic.net/internet-drafts/draft-ietf-nntpext-base-04.txt

[NNTP]    Kantor, B. and Lapsley, P., "Network News Transfer Protocol",
          RFC 977, U.C. San Diego and U.C. Berkeley, February 1986.

[USEFOR]  Ritter, D., N., "User Article Format", work in progress,
          ftp://ds.internic.net/internet-drafts/draft-ietf-usefor-article-01.txt

9. Author's address

   Maurizio Codogno
   CSELT CF/IM Dept.
   Via G. Reiss Romoli, 274
   I-10148 Torino TO
   Italy
   +39 011 228 6132