INTERNET-DRAFT                                          Carsten Bormann
Expires: September 1998                         Universitaet Bremen TZI
                                                              March 1998

 Network News Distribution Protocol: Architecture and Design Guidelines
                      draft-bormann-mnnp-nndp-00.txt

Status of this memo

This document is an Internet-Draft.  Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas,
and its working groups.  Note that other groups may also distribute
working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time.  It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as ``work in progress.''

To learn the current status of any Internet-Draft, please check the
``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow
Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
ftp.isi.edu (US West Coast).

Distribution of this document is unlimited.

Abstract

This document describes an architecture and a set of protocols for
distributing Netnews [RFC0977, RFC1036] via IP multicast enabled
networks.  The architecture is designed to be useful in the global
Internet.  In particular, it allows multiple news servers to
cooperate on multicasting each new article only once.  To facilitate
scalability to tens of thousands of news servers, it also provides
for receive-only multicast participants (that continue to send
articles via conventional NNTP).

This document is a submission to the IETF MNNP working group.
Comments are solicited and should be addressed to the working group's
mailing list at ietf-mnnp@va.pubnix.com and/or to the author.

1. Introduction

Netnews (or Usenet news) is one of the more important systems for
electronic communication that make up what is now loosely called
``the Internet'' in the media.  Usenet operates by flood-distributing
messages called articles between participating systems, called news
servers.  Like any other element of the thriving Internet
environment, Usenet is experiencing growth problems.

It is widely recognized that NNTP, the article distribution system in
use in the Usenet, is running into scaling problems.  Some ISPs
report that NNTP contributes between 7 and 12 % of their backbone
traffic -- this for a data stream that is less than 64 kbit/s in
total (see below).

As Usenet is fundamentally a multicasting system, an obvious approach
is to apply the emerging Internet network layer multicasting
technology to Usenet distribution.
One experiment described in the literature, MUSE [firehose paper],
transmitted Usenet articles as UDP multicast packets between
participating sites.  While this experiment was moderately
successful, it suffered from packet loss problems (which increase
exponentially with the number of fragments generated from one
article).  Also, a scalable security architecture was not defined for
this experiment.

This document defines an architecture and sketches two protocols to
make network layer multicasting more useful for news distribution.
The architecture will, in reference to an earlier experiment
[newscaster], be called Newscaster-2 or simply Newscaster; the two
protocols will be called NNDP (Network News Distribution Protocol)
and NNDCP (Network News Distribution Coordination Protocol),
respectively.

1.1. Benefits of multicasting Netnews

Distributing Netnews via network layer multicast provides a number of
benefits.  For ISPs, Newscaster can help to significantly reduce the
backbone NNTP load: each article traverses each link (in the best
case) only once instead of traversing the backbone links multiple
times, once to each target news server.

Another benefit of Newscaster will be reduced article propagation
times -- while current NNTP servers can be very fast, Newscaster
replaces multiple unicast hops between news servers with a single
multicast hop.  As propagation times are currently on the order of
hours, a reduction to the order of minutes would be a nice
achievement; a reduction below that (to seconds) is, however, not
intended.  (As a side benefit, Newscaster will reduce the link
bandwidth consumed by a leaf news receiver by using batching and
compression and by reducing the NNTP/TCP/IP overhead incurred per
article.)

1.2. Basic Assumptions

This document makes a number of assumptions about the basic technical
parameters of the Netnews system.  We assume a total number of new
news articles to be distributed per day in the few hundred thousands,
i.e., one to a few articles per second.  We also assume that the
total volume of those articles is on the order of hundreds of
megabytes per day, i.e., tens to a few hundreds of kbit/s.
Newscaster-2 is scalable beyond those numbers, but not infinitely so.
[In particular, ``similar'' problems with different technical
parameters (such as live stock price feeds) are not necessarily
supported as efficiently as the actual worldwide Netnews system;
solving such similar problems is explicitly a non-goal of the
architecture.]

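As a quick plausibility check of these figures, the following minimal
sketch converts daily totals into per-second rates.  The concrete
numbers (300,000 articles and 500 megabytes per day) are merely
illustrative picks within the ranges stated above, not measurements:

   # Back-of-envelope check of the traffic assumptions above.  The
   # figures are example points within the stated ranges only.
   ARTICLES_PER_DAY = 300_000
   VOLUME_MB_PER_DAY = 500
   SECONDS_PER_DAY = 24 * 60 * 60           # 86,400

   articles_per_second = ARTICLES_PER_DAY / SECONDS_PER_DAY
   kbit_per_second = VOLUME_MB_PER_DAY * 8_000_000 / SECONDS_PER_DAY / 1000

   print(f"{articles_per_second:.1f} articles/s")    # ~3.5 articles/s
   print(f"{kbit_per_second:.0f} kbit/s sustained")  # ~46 kbit/s
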
In addition, we assume that the concept of news servers that receive
a full feed of news articles continues to be useful.  On-demand
retrieval of news articles from neighboring servers is an interesting
concept, but it is outside the scope of this document.  We believe
that most news servers will want to receive most of the articles in
the Netnews system; Newscaster does not support elaborate mechanisms
for receiving a specific subset of articles that covers exactly the
newsgroups ``subscribed'' to by a news server.  (Newscaster does
support partitioning the global news-feed into a few general subsets,
such as alt.* and comp.*/sci.*.)

One very important point in the design of a multicast Netnews
distribution system is that, even if it takes off quickly, news
server administrators will not simply turn off their existing, well-
understood and robust system of NNTP feeds.  To make a feature out of
what could be considered a bug, the Newscaster system is intended to
work with and be supplementary to the NNTP system.  Newscaster-based
news servers continue to speak NNTP to neighboring systems, using
NNTP as a background scheme to fill in articles that they might have
missed in the multicast distribution.  Therefore, Newscaster can be a
much more light-weight protocol, as it need not be 100 % reliable.

1.3. The multiple-entry problem

Given that Newscaster is not replacing, but supplementing, NNTP, and
that the Newscaster system will for a long time cover only a subset
of the global Netnews system, the two distribution mechanisms need to
cooperate.  The most significant problem here is that a single news
article may be flood-distributed from its source via NNTP and reach
multiple Newscaster systems at about the same time (observations in
the live network show that this now often happens for multiple well-
connected news servers within a second).  As, in a multicast
scenario, there is no way to ask all the receivers whether they have
already received an article, this would, without further mechanisms,
mean that Newscasters regularly send multiple redundant copies of a
single article.

This document proposes a coordination protocol between Newscaster
systems to decide which Newscaster system distributes a particular
article.  The coordination protocol is separate from the distribution
protocol; receive-only sites need not be involved in the coordination
protocol.  Note that correctness of the coordination protocol is not
a prerequisite to correctness of the overall system, only to its
efficiency, i.e., an occasional slip (multiple transmission of one
article) is tolerable.

2. The Newscaster Architecture

2.1. Protocols

Newscaster assumes an underlying IP multicast network such as the
experimental Mbone and/or the operational IP multicast networks being
deployed by many ISPs.  The multicast network is assumed to be able
to sustain a rate-controlled low-bandwidth stream of packets for
extended periods; the only form of congestion control envisaged is
that receivers can drop out if they experience consistent congestion.

To achieve a degree of performance in the presence of losses in the
experimental Mbone, some form of error control is required.  To
achieve good scalability without router support, the distribution
protocol uses only forward error correction; as news servers gain
multicast connectivity, they can simply start listening to the feed
without having to send any (unicast or multicast) data, as sketched
below.

The coordination protocol does not need to be as scalable as the
distribution protocol: it would be hard to impossible to coordinate
among a few tens of thousands of news servers, and various features
of the distribution protocol (batching, compression, digital
signatures) argue for limiting the number of active Newscaster
servers.  We assume that new articles travel via NNTP to the nearest
active Newscaster system and are multicast from there to the rest of
the world.

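To illustrate the receive-only mode of operation mentioned above, the
following minimal sketch joins the multicast feed and collects NNDP
packets without ever sending data.  The group address and UDP port
are hypothetical placeholders chosen for the example (this document
does not assign them), and the reassembly step is left as a stub:

   # Minimal sketch of a receive-only Newscaster participant: it joins
   # the NNDP multicast group and never sends any data.  GROUP and
   # PORT are hypothetical example values; this document does not
   # assign a multicast address or UDP port for NNDP.
   import socket
   import struct

   GROUP = "224.0.23.1"
   PORT = 9119

   def handle_nndp_packet(packet):
       """Placeholder for NNDP reassembly/FEC processing (Section 3)."""

   sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM,
                        socket.IPPROTO_UDP)
   sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
   sock.bind(("", PORT))

   # Join the multicast group on the default interface.
   mreq = struct.pack("4s4s", socket.inet_aton(GROUP),
                      socket.inet_aton("0.0.0.0"))
   sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)

   while True:
       packet, _sender = sock.recvfrom(1500)
       handle_nndp_packet(packet)
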
Appendix A defines a preliminary coordination protocol based on a
multicast transport protocol called MTP-2.  (This protocol is a
version of MTP (RFC1301) that was developed further to be more useful
in WANs.  It allows multicasting a sequence of arbitrary-size
messages, each of which can consist of one or more multicast packets.
The MTP-2 protocol provides a global sequencing of the messages, as
well as global rate control.)

Other coordination protocols may be defined.  Passive, receive-only
Newscaster systems need not be aware of the coordination protocol
being used -- they only need to understand the distribution protocol.
In particular, the distribution protocol can be used from a single
source to a local (e.g., per-ISP) set of receivers; the coordination
protocol then becomes trivial.

2.2. Operation of active Newscasters

A news server actively participating in the Newscaster system is
simply called a Newscaster.  The set of cooperating Newscasters is
called the Newscaster Web.  The entire Web is a single news system
from the point of view of RFC1036 Path headers.  For the global
Newscaster Web, the name of the news system as it occurs in the Path
header is "newscaster-2.mcast.net".  Additional local Newscaster Webs
can be created, if needed, under different names.

Each Newscaster examines each article it receives via NNTP or other
means to determine whether it already contains a Newscaster Path
header entry; if so, the article is immediately removed from further
consideration in the Newscaster Web (in the INN implementation of the
Netnews protocols, this is done automatically if the outgoing link is
identified by the Web name, e.g., "newscaster-2.mcast.net").

Those articles that do not contain a Newscaster Path header entry are
then prepared for being multicast into the Web.  Several such
articles will generally be sent together as a batch.  The
coordination protocol is used to decide, for each article, whether it
is actually this Newscaster that will distribute the article.  At the
service interface, an implementation of a coordination protocol
receives a set of message-IDs (a tentative batch) as input and
returns a (possibly empty) subset of the message-IDs to be sent in an
actual batch.  In general, each Newscaster should have only one set
of articles in progress with the coordination protocol at any point
in time.  Further articles arriving during processing by the
coordination protocol should be collected for a future tentative
batch.  Also, Newscasters should wait a few seconds for further
articles to arrive before submitting a new batch to the coordination
protocol.

Actual batches are then formed out of the selected articles according
to RFC 1036, section 4.3.  They are then compressed using the gzip
format (RFC1952) and digitally signed (see below).  Finally, they are
distributed using the distribution protocol.

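Putting the steps of this section together, a minimal sketch of the
sender-side pipeline might look as follows.  The article objects,
coordination_select() and sign_batch() are hypothetical stand-ins for
the news server's article store, the NNDCP implementation and the
(still open, see Section 2.3) signature mechanism; only the gzip step
maps onto a concrete format (RFC1952):

   # Sketch of the batch preparation pipeline of Section 2.2.
   import gzip

   WEB_NAME = "newscaster-2.mcast.net"

   def eligible(article):
       """An article already carrying the Web name in its Path header
       has been in the Newscaster Web before and must not be re-sent."""
       path = article.headers.get("Path", "")
       return WEB_NAME not in path.split("!")

   def prepare_batch(tentative_articles, coordination_select, sign_batch):
       candidates = {a.message_id: a for a in tentative_articles
                     if eligible(a)}
       # The coordination protocol returns the (possibly empty) subset
       # of message-IDs that this Newscaster is to distribute itself.
       selected = coordination_select(set(candidates))
       if not selected:
           return None
       # Concatenate the selected articles in RFC 1036, section 4.3
       # batch format, then compress (RFC1952) and sign the result.
       batch = b"".join(b"#! rnews %d\n" % len(candidates[mid].raw)
                        + candidates[mid].raw
                        for mid in selected)
       return sign_batch(gzip.compress(batch))
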
2.3. Security

Any system that transports Netnews must provide some basic security
against spoofing attacks.  Since the multicasting system itself
provides only very limited assurance that a source address is
correct, we resort to cryptographic measures.

Simple shared-secret authentication is not scalable -- in a
production version, thousands of news server administrators would
have to be in possession of the key.  Instead, a public key system is
used, based on a web-of-trust security policy.

In the current NNTP system, each news server administrator trusts its
neighbor news server administrators to institute a good local usage
policy and to respond to incidents in a manner that helps to preserve
the integrity of the news system.  The transitive closure of this web
of trust equals the actual connectivity of the news system.  A news
administrator who misbehaves runs the risk of being disconnected.

The Newscaster security policy attempts to mimic this existing policy
by cryptographic means.  Instead of creating NNTP links to
``neighboring'' systems, a news administrator creates certificates
for all the Newscasters that she trusts.  These certificates are
regularly distributed in a newsgroup that is reserved for this
purpose (such as news.config.newscaster), ensuring that they can be
received even by sites that are not yet in possession of all the
certificates.  Every receive-only system has to trust one or more
sites (e.g., the Newscaster equivalent of a ``well-connected site'')
to root its certificate chain.  If a receiver of a Newscaster batch
does not find a certificate chain that verifies the signature of the
batch, it discards the batch.

* Issue *: What type of key system and digital signature is used?
Newscaster should provide relatively fast signature checking; signing
performance can be modest (due to batching) and need not be stellar.
The author would tend to use RFC1991-type (PGP) formats, using RSA
and MD5.

3. NNDP: The distribution protocol

The NNDP distribution protocol is used to distribute payloads to all
receivers.  Payloads will generally be small, up to a few dozen
kilobytes, but may be much larger in case a large article needs to be
transferred.  The job of the distribution protocol is to:

- partition the payload into packets that can be multicast without
  being fragmented on the way.  We assume an Internet-wide MTU of
  1280 (based on the IPv6 MTU) and reserve 80 bytes for header
  overhead (IP, UDP, other), leaving 1200 bytes for the distribution
  protocol data.

- add forward error correction.  We use Vandermonde matrices as
  implemented by Luigi Rizzo
  [http://www.iet.unipi.it/~luigi/vdm.tgz].  The amount of error
  correction to be added is a system parameter: for small batches,
  we always add at least one FEC packet.  For larger batches, the
  FEC overhead is defined by a constant expansion factor.  (This
  factor could be chosen to match the TCP equation at the rate
  intended.)  For very large batches, the batch is split into
  units, which are independently subjected to FEC (packets from all
  units of a batch are interleaved to spread out the transmission).

- multicast the data at a defined rate (leaky bucket model).  It is
  the job of the coordination protocol to assign a rate to each
  batch to be sent.  (The rate should be relatively low to space
  out the packets, allowing FEC to work around burst losses.)

- enable reassembly/erasure processing at the receiver.  The
  batches are tagged with a unique, 80-bit global ID, which is
  assigned by the coordination protocol (e.g., global source
  ID/sequence number).  (Note that reassembly errors are not
  catastrophic, as an incorrectly reassembled batch will be
  rejected at the signature check.)  Each packet carries the total
  batch size, a unit number within the batch, a packet number
  within the unit, and the number of packets to be sent per unit
  (N).

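As an illustration of the sender side of these four jobs, a minimal
packetization sketch (not part of the protocol specification)
follows; the exact bit layout of the 24-byte header it builds is
given in the packet layout diagram below.  The single XOR parity
packet per unit is a deliberately simplified stand-in for the
Vandermonde-matrix FEC cited above, interleaving of units is omitted,
and UNIT_SIZE is a placeholder (unit size is an open issue, see
below):

   # Sender-side packetization sketch for one compressed, signed
   # batch.  Header fields: 80-bit global ID, 16-bit N (packets per
   # unit), 16-bit packet index, 16-bit unit index, 32-bit total
   # batch size, 32-bit rate (see the layout diagram below).
   import struct

   PAYLOAD = 1200           # 1280-byte MTU minus 80 bytes header overhead
   UNIT_SIZE = 128 * 1024   # placeholder; good unit size is an open issue

   def nndp_header(global_id, n, pkt_idx, unit_idx, batch_size, rate):
       assert len(global_id) == 10        # 80-bit global ID
       return struct.pack("!10sHHHII", global_id, n, pkt_idx, unit_idx,
                          batch_size, rate)

   def packetize(batch, global_id, rate):
       """Yield NNDP packets (header + data) for one batch."""
       units = [batch[i:i + UNIT_SIZE]
                for i in range(0, len(batch), UNIT_SIZE)]
       for unit_idx, unit in enumerate(units):
           chunks = [unit[i:i + PAYLOAD]
                     for i in range(0, len(unit), PAYLOAD)]
           parity = bytearray(PAYLOAD)    # simplistic one-erasure "FEC"
           for chunk in chunks:
               for j, byte in enumerate(chunk):
                   parity[j] ^= byte
           chunks.append(bytes(parity))
           n = len(chunks)                # packets to be sent per unit
           for pkt_idx, chunk in enumerate(chunks):
               yield nndp_header(global_id, n, pkt_idx, unit_idx,
                                 len(batch), rate) + chunk
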
distribution protocol packet layout

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           global ID                           |
+                                                               +
|                                                               |
+                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               |               N               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|            pkt idx            |           unit idx            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       total batch size                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                              rate                             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                              data                             |
|                              ....                             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

(For a discussion of the rate parameter, see NNDCP below.)

* Issue *: What is a good unit size?  E.g., 128 KB?  Should we
actually use the TCP equivalence equation to compute an expansion
factor from the rate?

4. Acknowledgments

This document has been prompted by the discussions in the MNNP BOF at
the Washington IETF.  In particular, the author would like to thank
Joe Malcolm for the thought-provoking discussions at this IETF.

5. References

TBD

6. Addresses

6.1. Working Group

[The MNNP working group is in creation.]

6.2. Author's address

Carsten Bormann
Universitaet Bremen FB3 TZI
Postfach 330440
D-28334 Bremen, GERMANY
cabo@tzi.org
phone +49.421.218-7024
fax +49.421.218-7000

7. Annex A: MTP-2 based coordination protocol

When a batch is being prepared, a short MTP-2 message (an
announcement) is sent that just contains the message IDs of the
articles in the batch.  When this message has been transmitted in the
MTP-2 Web and all lower-numbered messages have arrived, the
Newscaster removes from the batch those articles that have been
announced in lower-numbered announcements.  In the steady-state case,
this makes it unlikely that two Newscasters will be transmitting the
same article concurrently.  However, Newscasters that return after a
multicast outage would start to transmit old articles (that they have
received via NNTP while other systems got them via Newscaster).  To
minimize the impact of such late-comers on Newscast efficiency,
Newscasters only newscast articles they have newly received while
being active in the Web (i.e., no spooling).

For IPv4, the global ID of a batch is composed of the concatenation
of the IP address of the MTP-2 master at the time of receiving the
announcement and the 24-bit MTP-2 sequence number, filled with zeroes
at the end.

Rate control is performed in the following way: each Newscaster is
aware of the total system rate defined for the Web (e.g., 128
kbit/s).  Newscasters that are transmitting batches share this
bandwidth by setting up short-term reservations.  Each Newscaster
also maintains a running tally of all the reservations currently in
effect.  Upon reception of an announcement, the receiving Newscaster
considers half the unreserved system rate to be reserved for the
announcer.  This reservation is corrected by the actual rate used by
the sender, once an NNDP packet is received for this batch (rate
field).  The sender of a batch is allowed to use up to half of what
it considers to be the unreserved rate at the time it receives its
own announcement for this batch.  Each Newscaster deletes a
reservation for a batch once the sender should have stopped sending
data, according to its actual chosen rate and the size of the batch
as indicated in the NNDP packets, or (if no NNDP packets were
received at all) after a timeout of T_SEND (T_SEND is initially set
to 15 seconds).  Newscasters avoid using silly rates (i.e., less than
a very small fraction of the system rate for a large batch).

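A minimal sketch of this reservation bookkeeping, as it might sit on
top of an NNDCP announcement/packet event stream, follows.  The class
and method names are illustrative placeholders, not part of the
protocol; batch sizes are assumed to be in bytes and rates in bit/s:

   # Sketch of the Annex A rate-sharing rule: on each announcement,
   # reserve half of the currently unreserved system rate for the
   # announcer; correct that guess once the sender's actual rate is
   # seen in an NNDP packet; expire reservations after the batch
   # should have finished, or after T_SEND if no packet arrives.
   import time

   SYSTEM_RATE = 128_000   # total Web rate in bit/s (example from the text)
   T_SEND = 15.0           # initial timeout in seconds

   class RateReservations:
       def __init__(self):
           self.reservations = {}   # global_id -> (rate, expiry_time)

       def unreserved(self):
           now = time.monotonic()
           self.reservations = {g: (r, t) for g, (r, t)
                                in self.reservations.items() if t > now}
           return SYSTEM_RATE - sum(r for r, _ in
                                    self.reservations.values())

       def on_announcement(self, global_id):
           """Tentatively reserve half the unreserved rate for the
           announcer; a sender uses the returned value for its own
           announcement."""
           rate = self.unreserved() / 2
           self.reservations[global_id] = (rate,
                                           time.monotonic() + T_SEND)
           return rate

       def on_nndp_packet(self, global_id, rate, total_batch_size):
           """Correct the reservation from the rate field and the
           batch size carried in the NNDP packets."""
           if rate <= 0:
               return
           duration = total_batch_size * 8 / rate
           self.reservations[global_id] = (rate,
                                           time.monotonic() + duration)
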
8. Annex B: Newscasters: Active vs. Passive

Given that there are tens of thousands of news servers in operation,
and that NNDCP is intended to work between maybe a thousand active
Newscasters, the question immediately arises which news servers
should be active Newscasters and which should only listen to the
global Netnews distribution.  In essence, this is of course a
judgment call, which may be guided by:

- Multicast connectivity.  An active Newscaster obviously needs to
  be able to source multicast traffic, not just receive it.  Given
  the current tendency of ISPs to charge extra for multicast
  sourcing, many news servers may not want to become active
  Newscasters.

- Path lengths.  While the Newscaster architecture takes many hops
  out of the Netnews distribution paths, an article needs to
  traverse NNTP hops up to the first active Newscaster before it
  can be efficiently multicast to the rest of the world.  Often, a
  (topological) region will want to maintain at least one active
  Newscaster to minimize those path lengths.

- Maintaining the web of trust.  Maintainers of active Newscasters
  need to actively work on maintaining their position in the web
  of trust that is used as the security foundation of Newscaster.