INTERNET DRAFT              EXPIRES SEPT 1998              INTERNET DRAFT

Network Working Group                                      Heiko W.Rupp
Experimental                                                   8.3.1998


     A Protocol for the Transmission of Net News Articles over IP
                              multicast.

Status of This Memo

   This document is an Internet-Draft. Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups. Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   To learn the current status of any Internet-Draft, please check the
   "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
   Directories on ftp.is.co.za (Africa), ftp.nordu.net (Europe),
   munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
   ftp.isi.edu (US West Coast).

   Distribution of this document is unlimited.

Abstract

   Mcntp (Multicast News Transfer Protocol) provides a way to use the
   IP multicast infrastructure to transmit NetNews articles between
   news servers. Doing so reduces the bandwidth needed to transmit
   articles, which is currently done mostly via NNTP. This does not
   affect how news reading clients communicate with servers.

Overview and Rationale

   NetNews articles are bulk data produced in large quantities every
   day around the world. On the Internet they are usually distributed
   with NNTP[1]. In order to get a fast and redundant distribution,
   many news servers communicate with many others, thus imposing a
   higher load on the underlying network than necessary. Assume the
   following scenario:

                       +--------- R1
                      /
      S -- A ------- B -------- R2
                      \
                       +--------- R3

   A sender S that wants to transmit articles via NNTP to receivers
   R1...R3 will transmit them three times across the link from A to B.

   IP Multicast[2] provides an efficient way to distribute datagrams to
   groups of users in the Internet. With it, articles would traverse
   the link from A to B only once, reducing the load on that link.

   This cannot be done with existing news transfer technology, as it is
   based on TCP[10], which cannot be multicast.

   The protocol described in this memo is designed to put news articles
   into datagrams and distribute them via IP multicast to receivers
   that are interested in the specific newsgroup.

   For more information about NetNews, refer to [1] and [7].

Protocol overview

   This section shows how news articles are propagated with Mcntp.
   Basically, three parties are involved:

   +  A multicast directory service, MD, coordinates the assignment
      between multicast groups and newsgroups.

   +  A multicast sender, MS, sends news articles over an IP multicast
      infrastructure.

   +  A multicast receiver, MR, gets packets from the IP multicast
      infrastructure and processes them further.

   This can be pictured as follows:

          directory     +---------+    directory
       +------------- > |         | ----------------+
      \|/               |   MD    |                \|/
   +---------+          |         |            +--------+
   |         |          +---------+            |        |
   |   MS    |                                 |   MR   |
   |         | ------------ articles ------->  |        |
   +---------+                                 +--------+

   MS and MR can be implemented inside existing news server software,
   or as separate processes that communicate with the news servers
   (e.g. via NNTP); this does not matter to the protocol. MD can either
   be implemented within MS, or as separate processes that communicate
   with each other. A practical way is to have one MD per sender host,
   so that communication between MD and MS is fast and reliable while
   not too many resources are needed.

   The protocol itself consists of two parts, which are presented in
   the next two sections: distribution of articles and the directory
   service.

Packet format -- Distribution of articles

   To send articles via IP multicast, they have to be encapsulated into
   UDP packets.
   The following diagram shows how this can be done:

   +---------------------------------------+
   |                 Magic                 |
   +----+----+----+----+---------+---------+
   | Ver|Rev |Comp|Cryp|Reserved | Offset  |
   +----+----+----+----+---------+---------+
   |            Original length            |
   +---------------------------------------+
   |            Length as sent             |
   +---------------------------------------+
   |               Sender-ID               |
   +---------------------------------------+
   |              Message-id               |
   +---------------------------------------+
   |                 Data                  |
   +---------------------------------------+

   All entries are in network byte order. The fields have the following
   meaning and types:

   +  Magic (32-bit): The string ``McNt''.

   +  Ver (4-bit): Protocol version -- currently 1.

   +  Rev (4-bit): Protocol revision -- currently 1.

   +  Comp (4-bit): Compression method used. Currently only two methods
      are defined:

         0  Article is not compressed
         1  Article is compressed via zlib [8]

   +  Cryp (4-bit): Encryption method used. See below.

   +  Reserved (8-bit): Reserved for future extensions.

   +  Offset (8-bit): Offset of the article data from the packet start.

   +  Original length (32-bit): Length of the article before
      compression, encryption and signing.

   +  Length as sent (32-bit): Size of the ``Data'' field, including
      the digital signature (see below).

   +  Sender-ID: Identification of the sender host, terminated by a
      null byte (see below).

   +  Message-ID: The message id of the article in the form defined in
      RFC 1036 [7], terminated by a null byte.

   +  Data: The signed article data after possible compression and
      encryption.

   This memo does not specify an encryption method for the case that
   the field ``Cryp'' is set to anything other than 0; the involved
   parties (i.e. the senders of encrypted news and their receivers)
   have to agree on the method they want to use.

   If both encryption and compression are used, the article data is
   first compressed and the result is then encrypted.

   All articles must be signed before they are sent.
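   The header layout described above can be illustrated with a short
   Python sketch. It is not taken from the reference implementation.
   The function names are hypothetical, and placing Ver and Comp in
   the high nibble of their bytes is an assumption, since the memo
   does not state the bit order within a byte.

```python
import struct

MAGIC = b"McNt"
# Magic, Ver/Rev byte, Comp/Cryp byte, Reserved, Offset,
# Original length, Length as sent -- all in network byte order.
FIXED = struct.Struct("!4sBBBBII")  # 16 bytes


def pack_article(sender_id, message_id, data, original_length,
                 comp=0, cryp=0, ver=1, rev=1):
    """Build one Mcntp article packet; `data` is the already signed payload."""
    ids = sender_id + b"\x00" + message_id + b"\x00"
    offset = FIXED.size + len(ids)
    if offset > 255:
        # Offset is an 8-bit field, so the header cannot exceed 255 bytes.
        raise ValueError("Sender-ID and Message-ID too long for Offset field")
    # Assumption: Ver and Comp occupy the high nibble of their bytes.
    fixed = FIXED.pack(MAGIC, (ver << 4) | rev, (comp << 4) | cryp,
                       0, offset, original_length, len(data))
    return fixed + ids + data


def unpack_article(packet):
    """Split a packet back into its header fields and payload."""
    magic, ver_rev, comp_cryp, _reserved, offset, orig_len, sent_len = \
        FIXED.unpack_from(packet)
    if magic != MAGIC:
        raise ValueError("not an Mcntp article packet")
    # Between the fixed part and Offset: two null-terminated strings.
    sender_id, message_id, _rest = packet[FIXED.size:offset].split(b"\x00", 2)
    return {
        "ver": ver_rev >> 4, "rev": ver_rev & 0x0F,
        "comp": comp_cryp >> 4, "cryp": comp_cryp & 0x0F,
        "original_length": orig_len,
        "sender_id": sender_id, "message_id": message_id,
        "data": packet[offset:offset + sent_len],
    }
```

   The sketch treats Data as opaque bytes; producing the signature
   that the memo requires before sending is a separate step.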
   Signing is accomplished by running the RIPEMD-160 message digest
   algorithm [11] over the (possibly compressed and encrypted) article
   and then RSA-encrypting the message digest with the private key that
   belongs to the sender-id. The receiver decrypts the signature of the
   article with the public key of the sender-id and runs RIPEMD-160
   over the data to see whether it has been altered on the way. An
   article with an invalid signature or a non-matching message digest
   has to be thrown away.

   The sender-id can be the path entry or the hostname of the sending
   site; there can also be more than one key pair per site, e.g. to
   have different keys for different newsgroups. The sender-id has to
   be treated in a case-insensitive manner.

   Encryption of the message digest is done in the following way. The
   20-byte RIPEMD-160 message digest and the first 28 bytes of the
   (possibly compressed and encrypted) message are concatenated to form
   a 48-byte buffer. This buffer is then encrypted with the appropriate
   RSA private key and prepended to the original message without its
   first 28 bytes:

   +----------------+---------------------------------+
   | Message digest |             message             |
   +----------------+-----------+---------------------+
                    1          28                     n
                                |                      \
   +-----------------------+----------------------------------+
   |       Signature       |  message without first 28 Bytes  |
   +-----------------------+----------------------------------+

   To send an article, it is encapsulated and then simply sent to the
   appropriate multicast group. There is no feedback from the receiver
   to the sender when an article is received.

Directory service

   In order to map newsgroups to multicast groups, a directory service
   exists; this has been referred to as MD above.

   When a sender MS wants to propagate a newsgroup, it asks the
   directory service for a multicast group it can use to distribute
   articles, waits for the reply, and starts to send.
   The directory server registers this group in its tables and
   periodically distributes this table over IP multicast. For this
   purpose, the multicast group ``mcntp-directory.mcast.net'' has been
   officially assigned by the IANA. The UDP port to which announcements
   are sent has been officially assigned by the IANA as UDP port number
   5418 with the name ``mcntp''.

   Announcements should not be sent too often, to keep traffic low, but
   often enough that new receivers don't have to wait too long before
   they can receive articles. Once a minute is assumed to be a good
   value here. Announcements can be sent less often if they are
   transmitted immediately after a change in the directory.

   If more than one directory server is involved (e.g. if there is more
   than one sender site), the directory servers have to listen for
   announcement packets on ``mcntp-directory.mcast.net''. If a
   directory server does not receive a packet within five times the
   waiting period (e.g. five minutes), it can consider itself alone on
   the net and can choose the multicast groups as it wishes. See the
   usage scenarios below, which explain this further.

   Groups that are local to an organisation (e.g. an ISP), or that
   should stay within its bounds, must be transported within the range
   of the administratively scoped multicast addresses [12].

   When a receiver (MR) wants to receive a newsgroup, it listens on
   ``mcntp-directory.mcast.net'' for announcements, parses them, and
   then joins the appropriate multicast groups.

   Multicast groups that are no longer in use (e.g. because the sender
   has stopped working) must be removed from the announcement.
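
   The timing rule above (one announcement per minute, "alone" after
   five silent periods) can be sketched in Python as follows. The class
   and attribute names are hypothetical and the example group address
   and port are made up; timestamps are plain seconds supplied by the
   caller, so no real network access is involved.

```python
ANNOUNCE_INTERVAL = 60.0   # seconds; "once a minute" from the text
ALONE_FACTOR = 5           # "five times the waiting period"


class DirectoryTable:
    """Bookkeeping a directory server might keep about other MD processes."""

    def __init__(self, started_at):
        self.started_at = started_at
        self.last_heard = None   # time of the most recent foreign announcement
        self.groups = {}         # newsgroup pattern -> (group, port, ttl)

    def on_announcement(self, now, entries):
        """Record an announcement received from another directory server."""
        self.last_heard = now
        self.groups.update(entries)

    def is_alone(self, now):
        """True once no announcement has been seen for five intervals."""
        reference = self.started_at if self.last_heard is None else self.last_heard
        return now - reference >= ALONE_FACTOR * ANNOUNCE_INTERVAL
```

   A server started at time 0 would consider itself alone from second
   300 onwards unless on_announcement() has been called in the
   meantime.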

   The format of those announcement packets is:

   +-----------+------+-----+--------+
   |   Magic   | Vers | Rev | Offset |
   +-----------+------+-----+--------+
   |             Length              |
   +---------------------------------+
   |             rmd160              |
   +---------------------------------+------+
   |            Sender-ID            | pad1 |
   +-----------------+------+---+-----------+---+  -+
   | Multicast group | Port |TTL| Newsgroup |pad|   |
   +-----------------+------+---+-----------+---+   |
   ...                 repeat                 ...   | NG lines
   +-----------------+------+---+-----------+---+   |
   | Multicast group | Port |TTL| Newsgroup |pad|   |
   +-----------------+------+---+-----------+---+  -+

   All numbers are in network byte order. The fields have the following
   meaning and types:

   +  Magic (16-bit): The bytes 0xabba.

   +  Vers (4-bit): Protocol version (see below).

   +  Rev (4-bit): Protocol revision (currently 1).

   +  Offset (8-bit): Offset of the NG lines from the packet start.

   +  Length (32-bit): Total packet length.

   +  rmd160 (160-bit): RIPEMD-160 message digest over the rest of the
      packet.

   +  Sender-ID: Identification of the sender host, terminated by a
      null byte (see below).

   +  Pad1: Padding to the next 4-byte boundary, filled with null
      bytes.

   +  Multicast group (32-bit *): The associated multicast group.

   +  Port (16-bit): UDP port to use for this group.

   +  TTL (8-bit): Time to live for multicast packets.

   +  Newsgroup: Name of the newsgroup(s), terminated by a null byte.
      See also below.

   +  Pad: Padding of the string to the next 4-byte boundary, filled
      with null bytes.

   The protocol version (Vers) is currently 1 for IPv4 and 11 for IPv6.
   The multicast group field (*) is 32 bits in size for IPv4 and 128
   bits for IPv6. The length field is 32 bits in size to support IPv6
   jumbo datagrams.

   The sender-ID is normally the fully qualified domain name of the
   host that sends the announcement. As is common practice with
   NetNews, this can also be the (possibly shorter) entry that the host
   puts in the ``Path:'' header when an article passes through it.
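
   A receiver-side reading of this layout can be sketched in Python
   for the IPv4 case (Vers 1). The function names are hypothetical.
   Three points are assumptions, because the memo leaves them open:
   Vers sits in the high nibble of its byte, padding aligns to 4-byte
   boundaries counted from the packet start, and no padding is added
   when a string already ends on such a boundary. The sketch does not
   verify the rmd160 digest.

```python
import socket
import struct

ANN_MAGIC = 0xABBA
ANN_FIXED = struct.Struct("!HBBI")   # Magic, Vers/Rev, Offset, Length
DIGEST_LEN = 20                      # rmd160 field, 160 bits
NG_FIXED = struct.Struct("!4sHB")    # IPv4 group, Port, TTL


def align4(pos):
    """Round a byte position up to the next multiple of four."""
    return (pos + 3) & ~3


def parse_announcement(packet):
    """Return (sender_id, [(group, port, ttl, newsgroup), ...])."""
    magic, vers_rev, offset, length = ANN_FIXED.unpack_from(packet)
    if magic != ANN_MAGIC:
        raise ValueError("not an Mcntp announcement")
    if vers_rev >> 4 != 1:
        raise ValueError("only the IPv4 layout (Vers 1) is handled here")
    if length != len(packet):
        raise ValueError("Length field does not match packet size")

    # Sender-ID follows the fixed header and the digest.
    sid_start = ANN_FIXED.size + DIGEST_LEN
    sid_end = packet.index(b"\x00", sid_start)
    sender_id = packet[sid_start:sid_end]

    entries = []
    pos = offset                     # first NG line
    while pos < length:
        group, port, ttl = NG_FIXED.unpack_from(packet, pos)
        name_start = pos + NG_FIXED.size
        name_end = packet.index(b"\x00", name_start)
        name = packet[name_start:name_end].decode("ascii")
        entries.append((socket.inet_ntoa(group), port, ttl, name))
        pos = align4(name_end + 1)   # skip the null byte and the pad
    return sender_id, entries
```

   The parser hands back the sender-ID exactly as it appears on the
   wire, whether that is the host name or the ``Path:'' entry.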
   The sender-ID has to be treated in a case-insensitive manner.

   The rmd160 field is the RIPEMD-160 message digest computed over the
   sender-id field and all newsgroup-to-multicast-group lines in the
   packet.

   The lines with newsgroup-to-multicast-group relations are repeated
   as often as needed to announce all groups. The TTL can be used by
   clients to find out whether packets from this source can reach them,
   or whether the sender is too far away. Note that all entries have to
   fit into one UDP packet.

   The sender-id and the newsgroup entries are padded to the next
   4-byte boundary in order to make processing easier.

   TTL values of articles have to be chosen, especially for use on the
   MBONE, in such a way that newsgroups of only local relevance (e.g.
   campus groups or groups local to a town) are not distributed outside
   their normal distribution area. As already mentioned above, articles
   that are only of local relevance must be distributed within the
   administratively scoped group range [12].

   The relation between multicast groups and newsgroups can range from
   one multicast group per newsgroup, through one multicast group per
   news hierarchy (e.g. comp.*), to all articles in a single group. As
   current implementations of kernels and routers become inefficient
   with too many multicast groups, the use of one multicast group per
   newsgroup is deprecated.

Reliability Considerations

   As UDP is an unreliable service, provisions for reliable
   distribution of articles are needed. There exist some approaches to
   reliable multicast (XTP [4], KLG [5], RMTP [6] and others), all of
   which suffer from some problem or other. Specifically, they need
   additional hardware or software and usually require kernel
   modifications.

   As there is already a reliable transport of NetNews via NNTP, there
   is no need for a reliable transport via IP multicast: articles need
   not be in order, so it is no problem if one is missing in the
   multicast.
   Since articles need not arrive in order, lost or missing articles
   can easily be transmitted via an additional NNTP feed.

   As UDP packets can be at most 64kBytes in size and every Mcntp
   packet has to fit in one UDP packet, there is no provision for
   distributing news articles larger than about 63kBytes (other than
   compressing them). This does not matter much in practice, as recent
   research has shown that more than 95% of all news articles are
   smaller than 64kBytes [9]. The remaining 5% can still be transferred
   via NNTP. Some hosts may have problems receiving UDP packets as
   large as 64kBytes, so in practical use an article size limit of
   16kBytes would be appropriate. This still covers over 90% of all
   articles.

Usage Scenarios

   These scenarios show how mcntp can be used in daily operation. The
   main difference between local and MBone-wide usage lies in the
   multicast groups that are used for distribution, as stated above.

   For local use within an organisation there could be one central
   sending site that redistributes all news articles it receives via
   mcntp. No further action is needed.

   When more than one directory server (MD) is involved, each directory
   server must wait on startup for announcement packets from other MD
   processes, register the contained groups in its tables and make
   decisions based on those tables. The decisions can be divided into
   the following:

   Use
      If the group in which the sender (MS) wants to send is already
      distributed over multicast, the articles are distributed in the
      existing group; otherwise a new multicast group is used. For
      example: if de.* is already distributed over multicast group
      a.b.c.d, then use that group.

   New
      Always create own multicast groups that don't clash with the
      ones that already exist. For example: if comp.* is already
      distributed on group a.b.c.e and the sender (MS) wants to
      distribute comp.foo, don't use group a.b.c.e, but create a new
      one.

   Standby
      Only send articles for a specified newsgroup when no one else is
      doing it.
      This can be used to implement backup functionality. For example:
      sender A is sending comp.*. If a directory packet that no longer
      contains comp.* arrives at site B, B can start to send comp.*.
      When it sees announcements from A again, it stops the
      distribution of comp.*.

   For use in an environment where multiple organisations are involved
   (e.g. on the MBone), the following could be deployed: everyone who
   participates uses the ``use'' method described above. Each site only
   sends articles that are locally produced (e.g. by its customers) and
   that are not distributed via mcntp by another site. No articles
   received from news peers should be distributed that way. After some
   delay (at least 10 seconds), articles that are distributed via mcntp
   are offered to peers over NNTP as usual. The set of groups that is
   distributed must be negotiated between the involved organisations.
   With the current Usenet groups this could be:

   -  rec.*

   -  comp.*,news.*,gnu.*

   -  talk.*,misc.*

   -  humanities.*,sci.*,soc.*

   -  alt.*

Usage over Satellite connections

   While terrestrial bandwidth is cheap in some regions of the world,
   there are other regions where this is not the case. But those
   regions can be reached by satellite beams. Some NetNews-over-
   satellite mechanisms are already in place, often using their own
   proprietary protocols. With mcntp, transport of articles can go over
   the same equipment that is in place for IP communications. A
   possible setup can be found in [13]. This setup also has the
   advantage that no backchannel is needed, which allows the use of
   small and cheap antennas on the receiver side.

Summary

   The distribution of NetNews articles via IP multicast can be a way
   to decrease the network bandwidth used to distribute them. Articles
   are delivered quickly via an unreliable protocol; later, the holes
   are filled via a reliable, already existing protocol. Compression of
   articles can further reduce the network load.
   With encryption, private newsgroups can be established on a public
   IP multicast infrastructure.

   A prototype of a reference implementation [13] already shows that
   Mcntp is fast and can be used as an alternative to classical
   transports. Using zlib to compress articles reduces the transferred
   volume (including protocol headers) to about 65% of the original
   article volume.

   In cooperation with Orion Network Systems [14], mcntp has proven its
   use for the distribution of NetNews over a unidirectional satellite
   connection.

Security Considerations

   With the classical NNTP-based distribution, every host on the path
   of an article keeps track of it in its logfiles, making it possible
   to find the sender of forged or abusive articles with the aid of the
   administrators of the news hosts along the path.

   For the distribution of NetNews over IP multicast, this is no longer
   true: routers don't log the packets flowing by, and as the sender
   address of IP packets can be forged, a sender can't be traced. This
   fact can be used to inject forged news articles without being
   traceable.

   To prevent the unnoticed injection of articles, an mcntp receiver
   only accepts articles from senders that it trusts. This trust is
   built by digitally signing the article with the private key of the
   sender and verifying the signature at the receiver site. Receivers
   must accept only articles with good signatures.

   The RIPEMD-160 message digest algorithm has been chosen because it
   is more secure than MD5 while still being fast enough. The RSA
   encryption algorithm has been chosen because reference
   implementations exist for usage inside the US (from RSA Inc.) and
   outside (rsaeuro by J.S.A.Kapp). The RSA key must be at least 512
   bits long to prevent cracking of the key.

References

   [1]  RFC 977 -- B. Kantor, P. Lapsley, "Network News Transfer
        Protocol: A Proposed Standard for the Stream-Based Transmission
        of News".

   [2]  RFC 1112 -- S.
        Deering, "Host extensions for IP multicasting", 08/01/1989.

   [3]  RFC 768 -- J. Postel, "User Datagram Protocol", 08/28/1980.

   [4]  XTP -- W. T. Strayer, D.J. Dempsey, B.C. Weaver, "XTP: The
        Xpress Transfer Protocol", Addison-Wesley.

   [5]  KLG -- M. Hofmann, "Zuverlaessige Kommunikation in heterogenen
        Netzen", Thesis at "Institut fuer Telematik, CS Dept. Univ
        Karlsruhe".

   [6]  RMTP -- Lin, John C., Paul Sanjoy, "RMTP: A Reliable Multicast
        Transport Protocol".

   [7]  RFC 1036 -- M. Horton, R. Adams, "Standard for interchange of
        USENET messages", 12/01/1987.

   [8]  RFC 1950 -- L. Deutsch, J. Gailly, "ZLIB Compressed Data Format
        Specification version 3.3", 05/23/1996.

   [9]  http://www.pilhuhn.de/mcntp/histo/ -- Some Statistics about
        size distribution of NetNews.

   [10] RFC 793 -- J. Postel, "Transmission Control Protocol",
        09/01/1981.

   [11] H. Dobbertin, A. Bosselaers, B. Preneel, "RIPEMD-160: A
        Strengthened Version of RIPEMD", 04/18/1996. An earlier version
        appeared in "Fast Software Encryption, LNCS 1039", Springer
        Verlag, 1996, pp. 71-82.

   [12] [draft-ietf-mboned-admin-ip-space-04.txt/number of rfc] David
        Meyer, "Administratively Scoped IP Multicast", [date of rfc]

   [13] http://www.pilhuhn.de/mcntp/

   [14] http://www.onsi.com/

Author's Address

   Heiko W.Rupp
   Gerwigstr. 5
   D-76131 Karlsruhe

   Phone: +49 721 9661524
   EMail: hwr@pilhuhn.de