idnits 2.17.1 

draft-ietf-rmt-design-space-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate.  This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** Missing expiration date.  The document expiration date should appear on
     the first and last page.

  ** The document seems to lack a 1id_guidelines paragraph about 6 months
     document validity -- however, there's a paragraph with a matching
     beginning. Boilerplate error?

  ** The document is more than 15 pages and seems to lack a Table of Contents.

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  == It seems as if not all pages are separated by form feeds - found 0 form
     feeds but 19 pages


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.

  ** There are 309 instances of weird spacing in the document.  Is it really
     formatted ragged-right, rather than justified?

  ** There are 8 instances of too long lines in the document, the longest one
     being 3 characters in excess of 72.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not
     match the current year

  == Line 14 has weird spacing: '...This  document...'

  == Line 22 has weird spacing: '...and  may  be  ...'

  == Line 38 has weird spacing: '...lticast  is  r...'

  == Line 39 has weird spacing: '...lutions  havin...'

  == Line 40 has weird spacing: '...atively  small...'

  == (304 more instances...)

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (June 1999) is 9081 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Unused Reference: 'KCW98' is defined on line 763, but no explicit
     reference was found in the text

  -- Possible downref: Non-RFC (?) normative reference: ref. 'BC94'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'CDT99'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'FLST98'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'FJM95'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'Ha99'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'HC97'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'KCW98'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'PPV98'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'Ri97'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'RV97'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'RMWT98'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'WHA98'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'WKM94'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'WGL97'


     Summary: 8 errors (**), 0 flaws (~~), 10 warnings (==), 16 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1	Internet Engineering Task Force                                      RMT WG
2	INTERNET-DRAFT                                            M. Handley, ACIRI
3	draft-ietf-rmt-design-space-00.txt                     B. Whetten, Talarian
4	                                                       R. Kermode, Motorola
5	                                                            S. Floyd, ACIRI
6	                                                         L. Vicisano, Cisco
7	                                                             16th June 1999
8	                                                          Expires: Dec 1999

10	       The Reliable Multicast Design Space for Bulk Data Transfer

12	Status of this Memo

14	This  document  is an Internet-Draft and is in full conformance with all
15	provisions of Section 10 of RFC2026.

17	Internet-Drafts are working documents of the Internet  Engineering  Task
18	Force (IETF), its areas, and its working groups.  Note that other groups
19	may also distribute working documents as Internet-Drafts.

21	Internet-Drafts are draft documents valid for a maximum  of  six  months
22	and  may  be  updated,  replaced, or obsoleted by other documents at any
23	time.  It is inappropriate to use Internet- Drafts as reference material
24	or to cite them other than as "work in progress."

26	The list of current Internet-Drafts can be accessed at
27	http://www.ietf.org/ietf/1id-abstracts.txt

29	The list of Internet-Draft Shadow Directories can be accessed at
30	http://www.ietf.org/shadow.html.

32	Copyright Notice

34	Copyright (C) The Internet Society (1999).  All Rights Reserved.

36	Abstract

38	The design space for reliable multicast  is  rich,  with  many  possible
39	solutions  having been devised.  However, application requirements serve
40	to constrain this design space to a  relatively  small  solution  space.
41	This  document  provides an overview of the design space and the ways in
42	which application constraints affect possible solutions.

44	1.  Introduction

46	The term ``general purpose reliable multicast protocol'' is something of
47	an  oxymoron.   Different  applications have different requirements of a
48	reliable multicast protocol, and these requirements constrain the design
49	space  in  ways  that two applications with differing requirements often
50	cannot share a single solution.  There are however many successful reli-
51	able multicast protocol designs that serve more special purpose require-
52	ments well.

54	In this document we attempt to review the design space for reliable mul-
55	ticast  protocols  intended  for bulk data transfer.  The term bulk data
56	transfer should be taken as having broad meaning - the main  limitations
57	are that the data stream is continuous and long lived - constraints nec-
58	essary for the forms of congestion control we currently understand.  The
59	purpose  of  this  review is to gather together an overview of the field
60	and to make explicit the constraints imposed by  particular  mechanisms.
61	The aim is to provide guidance to the standardization process for proto-
62	cols and protocol building blocks.  In doing this, we cluster  potential
63	solutions into a number of loose categories - real protocols may be com-
64	posed of mechanisms from more than one of these clusters.

66	The main constraint on solutions is imposed by  the  need  to  scale  to
67	large  receiver  sets.  For small receiver sets the design space is much
68	less restricted.

70	2.  Application Constraints

72	Application requirements for reliable multicast (RM) are  as  broad  and
73	varied  as  the  applications  themselves.   However, there are a set of
74	requirements that significantly affect the design of an RM protocol.   A
75	brief list includes:

77	o    Does  the application need to know that everyone received the data?

79	o    Does  the  application  need  to  constrain   differences   between
80	     receivers?

82	o    Does the application need to scale to large numbers of receivers?

84	o    Does the application need to be totally reliable?

86	o    Does the application need to provide low-delay delivery?

88	o    Does the application need to provide time-bounded delivery?
89	o    Does the application need many interacting senders?

91	o    Is the application data flow intermittent?

93	o    Does the application need to work in the public Internet?

95	o    Does  the application need to work without a return path (eg satel-
96	     lite)?

98	o    Does the application need to provide secure delivery?

100	In the context of standardizing bulk data  transfer  protocols,  we  can
101	rule out applications with multiple interacting senders and intermittent
102	data flows.  It is not that these applications are unimportant, but that
103	we do not yet have effective congestion control for such applications.

105	2.1.  Did everyone receive the data?

107	Some  applications  have  a strong requirement for confirmation that all
108	the receivers got the data, or if not, to be informed of which  specific
109	receivers failed to receive all the data.  Examples include applications
110	where receivers pay for data, and reliable file-system replication.

112	Other applications do not have such a requirement.  An  example  is  the
113	distribution of free software.

115	If  the application does need to know that everyone got the data, then a
116	positive acknowledgment must be received from every  receiver,  although
117	it  may be possible to aggregate these acknowledgments.  If the applica-
118	tion needs to know precisely which receivers failed  to  get  the  data,
119	additional constraints are placed on acknowledgment aggregation.

121	Note that this requirement is on application data unit (ADU) acknowledg-
122	ment and not necessarily on packet acknowledgment.

124	2.2.  Constraining differences

126	Some applications need to constrain  differences  between  receivers  so
127	that  the  data reception characteristics for all receivers falls within
128	some range.  An example is a stock price feed, where it is  unacceptable
129	for  a  receiver  to  suffer delivery that is delayed significantly more
130	than any other receiver.

132	This requirement is difficult to satisfy  without  harming  performance.
133	Typically  solutions  involve  not sending more than a limited amount of
134	new data until positive acknowledgments have been received from all  the
135	receivers.   Such  a  solution does not cope with network and end-system
136	failures well.

138	2.3.  Receiver Set Scaling

140	There are many applications for RM that do not need to  scale  to  large
141	numbers  of  receivers.  For such applications, a range of solutions may
142	be available that are not available for applications  where  scaling  to
143	large receiver sets is a requirement.

145	Receiver  set scaling is one of the most important constraining require-
146	ments, because it places strict limits on the mechanisms used to achieve
147	reliability.

149	In  a  very  small  system,  it  may be acceptable to have the receivers
150	acknowledge every packet.  This approach provides the  sender  with  the
151	maximum  amount  of  information  about  reception conditions at all the
152	receivers.

154	For larger systems, such ``flat ACK'' schemes cause  acknowledge  implo-
155	sions  at the sender.  Attempts have been made to reduce this problem by
156	sending aggregate ACKs infrequently [RMWT98, BC94], but it is very  dif-
157	ficult  to  incorporate effective congestion control into such protocols
158	because of the spareness of feedback.

160	Using negative acknowledgments (NACKs)  instead  of  ACKs  reduces  this
161	problem  to  one  of NACK implosion (only from the receivers missing the
162	packets), and because the sender really only needs to know that at least
163	one receiver is missing data, various NACK suppression mechanisms can be
164	applied.

166	An alternative to NACKs is ACK aggregation, which can be done by arrang-
167	ing  the  receivers into a logical tree, so that each leaf sends ACKs to
168	its parent which aggregates them, and passes them on up the tree.  Tree-
169	based protocols scale well, but tree formation can be problematic.

171	Other ACK topologies such as rings are also possible, but are often more
172	difficult to form and maintain than trees are.  An alternative  strategy
173	is  to  add mechanisms to routers so that they can help out in achieving
174	reliability or in reducing the cost of achieving reliability.

176	All these solutions improve receiver set scaling, but they all have lim-
177	its  of  one form or another.  One class of solutions scales to an infi-
178	nite number of receivers  by  having  no  feedback  channel  whatsoever.
179	These  solutions  take the initial data and encode it using an FEC-style
180	mechanism.  This encoded data is transmitted  in  a  continuous  stream.
181	Receivers  then  join  the  session  and receive packets until they have
182	sufficient packets to decode the original  data,  at  which  point  they
183	leave the session.

185	Thus,  it is clear that the intended scale of the session constrains the
186	possible solutions.  All solutions will work for  very  small  sessions,
187	but  as  the intended receive set increases, the range of possible solu-
188	tions that can be deployed safely decreases.

190	2.4.  Total vs Semi-reliable

192	Many applications require delivery to be totally reliable; if any of the
193	data is missing, none of the data is useful.  File transfer applications
194	are a good example of applications requiring total reliability.

196	However, some applications do not need total reliability.  An example is
197	audio  broadcasting,  where  missing  packets  reduce the quality of the
198	received audio but do not render it  unusable.   Such  applications  can
199	sometimes get by without any additional reliability over native IP reli-
200	ability, but often having a semi-reliable multicast protocol  is  desir-
201	able.

203	2.5.  Time-bounded Delivery

205	Many  applications just require data to be delivered to the receivers as
206	fast as possible.  They have no absolute deadline for delivery.

208	However, some applications have hard delivery constraints - if the  data
209	does  not arrive at the receiver by a certain time, there is no point in
210	delivering it at all.  Such time-boundedness may be as a result of real-
211	time constraints such as with audio or video streaming, or as the result
212	of new data superseding old data.  In both cases, the requirement is for
213	the  application to have a greater degree of control over precisely what
214	the application sends at which time than might be required with applica-
215	tions such as file transfer.

217	Time-bounded delivery usually also implies a semi-reliable protocol, but
218	the converse does not necessarily hold.

220	3.  Network Constraints

222	The properties of the network in which the application is being deployed
223	may themselves constrain the reliable multicast design space.

225	3.1.  Internet vs Intranet

227	In  principle the Internet and intranets are the same.  In practice how-
228	ever, the fact that an intranet is under one administration might  allow
229	for solutions to be configured that can not easily be done in the public
230	Internet.  Thus, if the data is of very high value, it might  be  appro-
231	priate to enhance the routers to provide assistance to a reliable multi-
232	cast transport protocol.  In the public Internet, it  is  unlikely  that
233	the  additional  expense  required  to support this state in the routers
234	would be acceptable.

236	3.2.  Return Path

238	In principle, when feedback is required from  receivers,  this  feedback
239	can  be  multicast or unicast.  Multicast feedback has advantages, espe-
240	cially in NACK-based protocols where it is valuable  for  NACK  suppres-
241	sion.  However, it is not clear at this time whether all ISPs will allow
242	all members of a session to send to that session.  If multicast feedback
243	is  not allowed, then unicast feedback can almost always be substituted,
244	although often at the expense of additional messages and mechanisms.

246	Some networks may not allow any form of feedback however.   The  primary
247	example  of this occurs with satellite broadcasts where the back channel
248	may be very narrow or even non-existent.  For such networks the solution
249	space  is  very  constrained  -  only  FEC-based encodings have any real
250	chance of working.  If the receivers  are  direct  satellite  receivers,
251	then  no  congestion control is needed, but it is dangerous to make such
252	assumptions because it is possible for a satellite  hop  to  feed  down-
253	stream  networks.  Thus, congestion control still needs to be considered
254	with solutions that do not have a return path.

256	3.3.  Network Assistance

258	A reliable multicast protocol must involve  mechanisms  running  in  end
259	hosts,  and  must involve routers forwarding multicast packets.  However
260	under some circumstances, it is possible  to  rely  on  some  additional
261	degree  of  assistance  from  network elements.  Broadly speaking we can
262	cluster RM protocols into four classes depending on the degree  of  sup-
263	port received from other network elements.

265	No Additional Support
266	    The  routers  merely  forward  packets,  and  only  the  sender  and
267	    receivers have any reliable multicast protocol state.

269	Layered Approaches
270	    Data is split across  multiple  multicast  groups.   Receivers  join
271	    appropriate  groups  to receive only the traffic they require.  This
272	    may in some cases require fast join or leave functionality from  the
273	    routers, and may require more forwarding state in the routers.

275	Server-based Approaches
276	    Additional  nodes  are used to assist with data delivery or feedback
277	    aggregation.  These additional nodes might not be normal senders  or
278	    receivers,  and  may be present on the distribution or feedback tree
279	    only to provide assistance to the reliable multicast protocol.  They
280	    would not otherwise receive the multicast traffic.

282	Router-based Approaches
283	    With  router-based  approaches, routers on the normal data distribu-
284	    tion tree from the sender to the receivers assist in the delivery of
285	    data  or  feedback  aggregation  or  suppression.   As  routers  can
286	    directly influence multicast routing, they have  more  control  over
287	    which   traffic  goes  to  which  group  members  than  server-based
288	    approaches.  However routers do not normally have a large amount  of
289	    spare memory or processing power, which restricts how much function-
290	    ality can be placed in the routers.  In  addition,  router  code  is
291	    normally more difficult to upgrade than application code, so router-
292	    based approaches need to be very general as they are more  difficult
293	    to deploy and to change.

295	4.  The RM Solution Space

297	RM solutions can be roughly categorized as using one or more of the fol-
298	lowing techniques:

300	o   Data packet acknowledgment.

302	o   Negative acknowledgment of missing data packets.

304	o   Redundancy allowing not all packets to be received.

306	These techniques themselves can be usefully subdivided, so that  we  can
307	examine  the  parts of the requirement space in which each mechanism can
308	be deployed.

310	4.1.  ACK-based Mechanisms

312	The simplest ACK-based mechanism involves every receiver sending an  ACK
313	packet  for  every data packet it receives.  Such mechanisms are limited
314	to very small receiver groups by the implosion of ACKs received  at  the
315	sender,  and for this reason they are impractical for most applications.

317	Putting multiple ACKs into a single data  packet  [RMWT98]  reduces  the
318	implosion  problem  by  a  constant  amount,  allowing  slightly  larger
319	receiver groups.  However a limit is soon reached  whereby  feedback  to
320	the  sender is too infrequent for sender-based congestion control mecha-
321	nisms to work reliably.

323	Arranging the receivers into a ring [WKM94] whereby an ``ACK-token''  is
324	passed around the ring prevents the implosion problem for data.  However
325	ring creation and maintenance may itself be problematic.  Also  if  ring
326	creation does not take into account network topology (something which is
327	difficult to achieve in practice), then the number of ACK packets cross-
328	ing  the  network  backbone  for each data packet sent may increase O(n)
329	with the number of receivers.

331	4.1.1.  Tree-based ACK Mechanisms

333	Arranging the receivers into a tree [MWB+98,  KCW98]  whereby  receivers
334	generate  ACKs to a parent node, which aggregates those ACKs to its par-
335	ent in turn, is both more robust and more easily configured than a ring.
336	The  ACK-tree  is typically only used for ACK-aggregation - data packets
337	are multicast from the sender to the receivers  as  normal.   Trees  are
338	easier  to  construct  than  rings because more local information can be
339	used in their construction.  Also they can be more fault  tolerant  than
340	rings  because  node failures only affect a subset of receivers, each of
341	which can easily and locally decide to by-pass  its  parent  and  report
342	directly  to  the node one level higher in the tree.  With good ACK-tree
343	formation, tree-based ACK mechanisms have the potential to be one of the
344	most scalable RM solutions.

346	To  be  simple to deploy, tree-based protocols must be self-organizing -
347	the receivers must form the tree themselves using local information in a
348	scalable  manner.   Such  mechanisms  are possible, but are not trivial.
349	The main scaling limitations of tree-based protocols therefore come from
350	the  tree  formation and maintenance mechanisms rather than from the use
351	of ACKs.  Without such a scalable and  automatic  tree-formation  mecha-
352	nism, tree-based protocols must rely on manual configuration, which sig-
353	nificantly limits their applicability (often to intranets) and  (due  to
354	the complexity of configuration) their scalability.

356	Orthogonal  to  the  issue  of  tree  formation  is the issue of subtree
357	retransmission.  With appropriate router mechanisms, or the use of  mul-
358	tiple  multicast  groups,  it is possible to allow the intermediate tree
359	nodes to retransmit missing data to the nodes below  them  in  the  tree
360	rather than relying on the original sender to retransmit the data.  This
361	relies on there being a good correlation at the point of the  intermedi-
362	ate node between the ACK tree and the actual data tree, as well as there
363	being a mechanism to constrain the retransmission  to  the  subtree.   A
364	good  automatic tree formation mechanism combined with the use of admin-
365	istrative scoped multicast groups might provide such a solution. Without
366	such  tree  formation mechanisms, subtree retransmission is difficult to
367	deploy in large groups in the  public  internet.   This  could  also  be
368	solved by the use of transport-level router mechanisms to assist or per-
369	form retransmission, although existing router mechanisms  [FLST98]  sup-
370	port NACK-based rather than ACK-based protocols.

372	Another  important  issue  is the nature of the aggregation performed at
373	interior nodes on the ACK-tree.  Such nodes could:

375	1.  aggregate ACKs by sending a single ACK when all their children  have
376	    ACKed,

378	2.  aggregate ACKs by listing all the children that have ACKed,

380	3.  send an aggregated ACK with a NACK-like exception list.

382	For  data packets, 1. is clearly more scalable, and should be preferred.
383	However if the sender needs to know exactly which receivers received the
384	data, 2. and 3. provide this information.  Fortunately, there is usually
385	no need to do this on a per-packet basis, but rather on a per-ADU basis.
386	Doing  1.  on  a per packet basis, and 3. on a per ADU basis is the most
387	scalable solution for applications that need this information, and  suf-
388	fers virtually no disadvantage compared to the other solutions used on a
389	per-packet basis.

391	4.2.  NACK-based mechanisms

393	Instead of sending an ACK for every data packet received, receivers  can
394	send  a  negative  acknowledgment (NACK) for every data packet they dis-
395	cover they did not receive.  This has a number of advantages  over  ACK-
396	based mechanisms:

398	o   The  sender no longer needs to know exactly how many receivers there
399	    are.  This removes the topology-building phase needed for  ring-  or
400	    tree-style ACK-based algorithms.

402	o   Fault-tolerance is made somewhat simpler by making receivers respon-
403	    sible for reliability.

405	o   Sender state can be significantly reduced because  the  sender  does
406	    not need to keep track of the receivers state.

408	o   Only  a single NACK is needed from any receiver to indicate a packet
409	    that is missing by any number of receivers.  Thus  NACK  suppression
410	    is possible.

412	The  disadvantages  are that it is more difficult for the sender to know
413	that it can free transmission buffers, and that additional session level
414	mechanisms are needed if the sender really needs to know if a particular
415	receiver actually received all the data.  However for many applications,
416	neither of these is an issue.

418	4.2.1.  NACK Suppression

420	The key differences between NACK-based protocols is in how NACK-suppres-
421	sion is performed.  The goal is for only one NACK to  reach  the  sender
422	(or  a  node that can resend the missing data) as soon as possible after
423	the loss is first noticed, and for only one copy of the missing data  to
424	be received by those nodes needing retransmission.

426	Different  mechanisms  come close to satisfying these goals in different
427	ways.

429	o   SRM [FJM95] uses random timers  weighted  by  the  round  trip  time
430	    between  the  sender and each node missing the data.  This is effec-
431	    tive, but requires computing the RTT to each  receiver  before  sup-
432	    pression works properly.

434	o   NTE  [HC97]  uses  a sender-triggered mechanism based on random keys
435	    and sliding masks.  This does not require random timers,  and  works
436	    for  very large sessions, but makes it difficult to provide the con-
437	    stant low-level stream of feedback needed to perform congestion con-
438	    trol.

440	o   AAP  [Ha99]  uses  exponentially  distributed  random  timers and is
441	    effective for large sessions without needing to compute the  RTT  to
442	    each receiver.

444	o   PGM [FLST98] and LMS [PPV98] use additional mechanisms in routers to
445	    suppress duplicate NACKs.

447	The most general of these mechanisms is probably exponentially  weighted
448	random  timers.   Although  SRM  style timers can reduce feedback delay,
449	they are harder to use correctly in situations where all  the  RTTs  are
450	not  known,  or where the number of respondees is unknown.  In contrast,
451	exponentially weighted random timers work well across a large  range  of
452	session sizes with good worst case delay characteristics.

454	Either  form  of  random  timer  based  mechanism can be supplemented by
455	router-support where it is available.  Sender triggered NACK  mechanisms
456	are more difficult to integrate with router-based support mechanisms.

458	4.3.  Redundancy

460	Some RM protocols can be designed so as to not need explicit reliability
461	mechanisms except in comparatively rare cases.  An example is in a  mul-
462	ticast  game, where the position of a moving object is continuously mul-
463	ticast.  This positional stream does not require additional  reliability
464	because  a  new position superseding the old one will be sent before any
465	retransmission could take place.  However, when the moving object inter-
466	acts  with  other  objects or stops moving, then an explicit reliability
467	mechanism is required to reliably send the  interaction  information  or
468	last position.

470	It  is  not just games that can be built in this manner - the NTE shared
471	text editor[HC97] uses just such a mechanism with changes to a  line  of
472	text.   For every change the whole line is sent, and so long as the user
473	keeps typing no explicit reliability mechanism  is  needed.   The  major
474	advantage  of  redundancy  is  that  it  is not susceptible to spatially
475	uncorrelated packet loss.  With a traditional ACK or NACK  based  proto-
476	col,  the probability of any particular packet being received by all the
477	receivers in a large group can be very low.  This leads to high retrans-
478	mission rates.  In contrast, redundant streams do not suffer as the size
479	of the receiver group increases -  different  receivers  lose  different
480	packets, but this does not increase network traffic.

482	4.4.  Packet-level Forward Error Correction

484	Forward  Error Correction (FEC) is a well known technique for protecting
485	data against corruption.  For reliable multicast it is  most  useful  in
486	the form of erasure codes.

488	The simplest form of packet-level FEC is to take a group of packets that
489	is to be sent, and to XOR the packets together  to  form  a  new  packet
490	which  is  also sent.  If there were three original packets plus the XOR
491	packet sent, then if a receiver is missing any one of the original  data
492	packets,  but receives the XOR packet, then it can reproduce the missing
493	original packet.

495	More general erasure codes exist[Ri97] that allow the  generation  of  n
496	packets  from  k  original  data  packets.  In such cases, so long as at
497	least k of the n packets are received, then the  original  data  can  be
498	reproduced.

500	Using  erasure  codes to repair packet loss is a significant improvement
501	over simple retransmission because the dependency on which packets  have
502	been  lost  is  removed.  Thus, the amount of repair traffic required to
503	repair spatially uncorrelated packet loss is considerably lessened.

505	We can divide packet-level FEC schemes into two  categories:  pro-active
506	FEC and reactive FEC.  The difference between the two is that the latter
507	relies on feedback from the receivers as to how many packets  were  lost
508	by the worst case receiver, and then only that number of FEC packets are
509	sent.  These FEC packets will then also serve  to  repair  loss  at  the
510	other receivers that are missing fewer packets.

512	To  apply  reactive  FEC the sender must group data packets into rounds,
513	and the receivers report via ACKs or NACKs how many packets are  missing
514	from each round.  With NACKs, only the receiver missing the most packets
515	need send a NACK for this round, so this is used to  weight  the  random
516	timers in the NACK calculation.

518	Reactive FEC is very effective at reducing the repair traffic for packet
519	loss.  However, it requires that the data  to  be  sent  is  split  into
520	rounds, which can add to end-to-end latency.  For bulk-data applications
521	this is typically not a problem, but this may be an issue  for  interac-
522	tive applications where redundancy may be a better solution.

524	4.5.  Layered FEC

526	An  alternative  use of packet level FEC is possible when data is spread
527	across several multicast groups[RV97].  In such cases,  the  original  k
528	data  packets are used to generate n FEC packets, where n is much larger
529	than k.  The n encoded packets are then striped across  multiple  multi-
530	cast  groups,  and  repeated  transmitted.   When  a  receiver wishes to
531	receive the original data it joins one or more of the multicast  groups,
532	and  receives  the  FEC  packets.   Once it has received k different FEC
533	packets, the receiver can then leave all the multicast groups and recon-
534	struct the original data.

536	The  primary  importance  of such a layering is that it allows different
537	receivers to be able to receive the traffic at different rates according
538	to  the  available  capacity.   Such  schemes do not require any form of
539	feedback from the receivers to the sender, and therefore  the  need  for
540	reliability  does  not constrain the size of the receiver set.  However,
541	to perform adequate network congestion control using receiver joins  and
542	leaves  in this manner may require coordination between members that are
543	behind the same congested link from the sender.  As a  result,  although
544	such  schemes  do  not  require  feedback  to  the  sender, they are not
545	entirely free of feedback.

547	5.  Congestion Control Mechanisms

549	The basic delivery model of the Internet  is  best-effort  service.   No
550	guarantees  are  given as to throughput, delay or packet loss.  End-sys-
551	tems are expected to be adaptive, and to reduce their transmission  rate
552	to  a  level  appropriate  for  the  congestion  state  of  the network.
553	Although increasingly the Internet will start to support reserved  band-
554	width  and  differentiated  service classes for specialist applications,
555	unless an end-system knows explicitly that it has reserved bandwidth, it
556	must still perform congestion control.

558	Broadly speaking, there are five classes of single-sender multicast con-
559	gestion control solution:

561	o   Sender-controlled, one group.

563	    A single multicast group is used for  data  distribution.   Feedback
564	    from  the  group  members is used to control the rate of this group.
565	    The goal is to transmit at a rate dictated by the slowest  receiver.

567	o   Sender-controlled, multiple groups.

569	    One  initial  multicast group is adaptively subdivided into multiple
570	    subgroups with subdivisions centered on  congestion  points  in  the
571	    network.   Application-level  relays buffer data from a group nearer
572	    the original sender, and retransmit it at a slower rate into a group
573	    further  from the original sender.  In this way, different receivers
574	    can receiver the data at different rates.   Sender-based  congestion
575	    control  takes  place  between  the  members of a subgroup and their
576	    relay.

578	o   Receiver-controlled, one group.

580	    A single  multicast  group  is  used  for  data  distribution.   The
581	    receivers  determine  if  the sender is transmitting too rapidly for
582	    the current congestion state of the  network,  and  they  leave  the
583	    group if this is the case.

585	o   Receiver-controlled, layered organization.

587	    The  sender stripes data across multiple multicast groups simultane-
588	    ously.  Receivers join and leave these layered groups  depending  on
589	    their  measurements  of the congestion state of the network, so that
590	    the amount of data being received is always appropriate.

592	o   Router-based congestion control.

594	    It is possible to add additional mechanisms to multicast routers  to
595	    assist  in  multicast  congestion  control.   Such  mechanisms could
596	    include:

598	    o   Conditional joins (a multicast join that specifies a  loss  rate
599	        above which it is acceptable for the router to reject the join).

601	    o   Router filtering of traffic that exceeds a reasonable rate.

603	    o   Fair queuing schemes combined with end-to-end adaptation.

605	    Router-based schemes generally require more state in network routers
606	    than  has traditionally been acceptable for backbone routers.  Thus,
607	    in the near-term, such schemes are only likely to be applicable  for
608	    intranet solutions.

610	For reliable multicast protocols, it is important to consider congestion
611	control at the same time as reliability is being considered.   The  same
612	feedback  mechanisms  that are required to provide reliability will nor-
613	mally be used to provide feedback for congestion control, although often
614	the  timing constraints for congestion control will determine the timing
615	of reliability feedback, or will require additional feedback messages to
616	be sent.

618	In  the  case  of  receiver-based  congestion control, FEC is the likely
619	choice for reliability for bulk-data transfer as receiver-based  conges-
620	tion-control  operates without feedback from the receiver to the sender.

622	6.  Security Considerations

624	Generally  speaking,  security  considerations  have  relatively  little
625	effect  on  constraining  the design space for reliable multicast proto-
626	cols.  The primary issues constraining the design space are all  related
627	to  receiver-set  scaling.  For authentication of the source and of data
628	integrity, receiver-set scaling is not a  significant  issue.   However,
629	for  data encryption, key distribution and particularly re-keying may be
630	significantly affected by receiver-set scaling.  Tree  and  graph  based
631	re-keying  solutions[WHA98,WGL97]  would  appear to be appropriate solu-
632	tions to these problems.  It is not clear however  that  such  re-keying
633	solutions  need  to  directly affect the design of the data distribution
634	part of a reliable multicast protocol.

636	The primary question to consider for the security of reliable  multicast
637	protocols  is the role of third-parties.  If nodes other than the origi-
638	nal source of the data are allowed to send or resend data packets,  then
639	the  security  model  for  the protocol must take this into account.  In
640	particular, it must be clear whether such third parties are  trusted  or
641	untrusted.   A  requirement for trusted third parties can make protocols
642	difficult to deploy on the Internet.

644	Untrusted third parties (such as receivers that retransmit the data) may
645	be  used  so  long  as the data authentication mechanisms take this into
646	account.  Typically this means that the original sender digitally  signs
647	and  timestamps  the data, and that the third parties resend this signed
648	timestamped payload unmodified.

650	Unlike unicast protocols, denial-of-service attacks on multicast  trans-
651	port  state  are  easy if the protocol design does not take such attacks
652	into account.  This is because any receiver can join  the  session,  and
653	can  then  produce  feedback  that  influences the progress of a session
654	involving many other receivers.  Hence protection against denial-of-ser-
655	vice  attacks  on reliable multicast protocols must be carefully consid-
656	ered.  A receiver that requests retransmission of every packet, or  that
657	refuses  to acknowledge packets in an ACK-based protocol can potentially
658	bring a reliable multicast session to a standstill.  Senders  must  have
659	appropriate policy to deal with such conditions, and if necessary, evict
660	the receiver from the group.  A single receiver masquerading as a  large
661	number  of receivers may still be an issue under such circumstances with
662	protocols  that  support  NACK-like  functionality.   Providing   unique
663	``keys''  to  each  NACKer when they first NACK using a unicast response
664	might potentially prevent such attacks.

666	Denial-of-service attacks caused by traffic flooding are  however  some-
667	what  easier to protect against than with unicast.  Unwanted senders can
668	simply be pruned from the distribution tree using the mechanisms  imple-
669	mented in IGMP v3[CDT99].

671	7.  Conclusions

673	In this document we present an overview of the design space for reliable
674	multicast within the context of one-to-many  bulk-data  transfer.  Other
675	flavors  of  multicast  application are not considered in this document,
676	and hence the overview given should not be considered inclusive  of  the
677	design  space for protocols that fall outside the context of one-to-many
678	bulk-data transfer. During the course of this overview,  we  have  reaf-
679	firmed the notion that the process of reliable multicast protocol design
680	is affected by a number of factors that render the generation of a  "one
681	size  fits  all solution" moot. These factors are then described to show
682	how an application's needs serve to constrain the set of available tech-
683	niques  that  may  be  used  to create a reliable multicast protocol. We
684	examined a number of basic techniques and to show how well they can meet
685	the needs of certain types of applications.

687	This  document  is  intended  to  provide guidance to the IETF community
688	regarding the standardization of reliable multicast protocols for  bulk-
689	data  transfer.  Given the degree to which application requirements con-
690	strain reliable multicast solutions, and the diverse set of applications
691	that  need  to be supported, it should be clear that any standardization
692	work should take great pains to be future-proof.   This  would  seem  to
693	imply  not standardizing complete reliable multicast transport protocols
694	in one pass, but rather examining the degree to which such protocols are
695	separable  into  functional  building  blocks,  and  standardizing these
696	blocks separately to the maxmimum degree  that  makes  sense.   Such  an
697	approach allows for protocol evolution, and allows applications with new
698	constraints to be supported with maximal reuse of  existing  and  tested
699	mechanisms.

701	8.  Acknowledgments

703	This  document  represents  an overview of the reliable multicast design
704	space.  The ideas presented are not those of the authors, but  are  col-
705	lected  from  the varied presentations and discussions in the IRTF Reli-
706	able Multicast Research Group.  Although they are too numerous  to  list
707	here,  we  thank  everyone who has participated in these discussions for
708	their contributions.

710	9.  Author's Addresses

712	Mark Handley, Sally Floyd
713	ATT Center for Internet Research at ICSI,
714	International Computer Science Institute,
715	1947 Center Street, Suite 600,
716	Berkeley, CA 94704, USA
717	mjh@aciri.org, floyd@aciri.org

719	Brian Whetten
720	Talarian Corporation,
721	333 Distel Circle,
722	Los Altos, CA 94022, USA
723	whetten@talarian.com
724	Roger Kermode
725	Motorola Australian Research Centre
726	Level 3, 12 Lord St,
727	Botany  NSW  2019,
728	Australia.
729	ark008@email.mot.com

731	Lorenzo Vicisano
732	Cisco Systems,
733	170 West Tasman Dr.
734	San Jose, CA 95134, USA
735	lorenzo@cisco.com

737	10.  References

739	[BC94] K. Birman, T. Clark.  ``Performance of the Isis Distributed  Com-
740	     puting  Toolkit.''   Technical Report TR-94-1432, Dept. of Computer
741	     Science, Cornell University.

743	[CDT99] B. Cain, S. Deering, A. Thyagarajan, ``Internet Group Management
744	     Protocol,  Version 3'', Internet Draft, Work-in-progress, Feb 1999.

746	[FLST98] D. Farinacci, A. Lin, T. Speakman, and A. Tweedly, ``PGM  reli-
747	     able  transport  protocol specification,'' Internet Draft, Internet
748	     Engineering Task Force, Aug. 1998. Work in progress.

750	[FJM95] S. Floyd, V. Jacobson, S. McCanne, ``A Reliable Multicast Frame-
751	     work  for  Light-weight  Sessions  and Application Level Framing'',
752	     Proc ACM SIGCOMM 95, Aug 1995 pp. 342-356.

754	[Ha99] M. Handley,  ``Multicast  address  allocation  protocol  (AAP),''
755	     Internet  Draft, Internet Engineering Task Force, Jun 1999. Work in
756	     progress.

758	[HC97] M. Handley and J. Crowcroft, ``Network text editor (NTE) a  scal-
759	     able  shared  text  editor  for MBone,'' ACM Computer Communication
760	     Review, vol. 27, pp. 197-208,  Oct.  1997.  ACM  SIGCOMM'97,  Sept.
761	     1997.

763	[KCW98]  M. Kadansky, D. Chiu, and J. Wesley, ``Tree-based reliable mul-
764	     ticast (TRAM),'' Internet Draft, Internet Engineering  Task  Force,
765	     Nov. 1998. Work in progress.

767	[MWB+98]  T.  Montgomery, B. Whetten, M. Basavaiah, S. Paul, N. Rastogi,
768	     J. Conlan, and T. Yeh, ``THE RMTP-II  PROTOCOL,''  Internet  Draft,
769	     Internet Engineering Task Force, Apr. 1998. Work in progress

771	[PPV98]  C.  Papadopoulos, G. Parulkar, and G. Varghese, ``An error con-
772	     trol scheme for large-scale multicast applications,''  in  Proceed-
773	     ings  of  the Conference on Computer Communications (IEEE Infocom),
774	     (San Francisco, California), p. 1188, March/April 1998.

776	[Ri97] L. Rizzo, ``Effective erasure codes for reliable computer  commu-
777	     nication  protocols,'' ACM Computer Communication Review, vol.  27,
778	     pp. 24-36, Apr. 1997.

780	[RV97] L. Rizzo, L. Vicisano, ``A Reliable Multicast  data  Distribution
781	     Protocol  based  on  software FEC techniques'', Proc. of The Fourth
782	     IEEE Workshop on the Architecture and Implementation of  High  Per-
783	     formance  Communication  Systems (HPCS'97), Sani Beach, Chalkidiki,
784	     Greece June 23-25, 1997.

786	[RMWT98] K. Robertson, K. Miller, M. White, and A. Tweedly,  ``StarBurst
787	     multicast  file  transfer protocol (MFTP) specification,'' Internet
788	     Draft,  Internet  Engineering  Task  Force,  Apr.  1998.  Work   in
789	     progress.

791	[WHA98]  D.  Wallner, E. Hardler, R. Agee, ``Key Management for   Multi-
792	     cast:  Issues     and  Architectures'',  Internet     Draft,  Work-
793	     in-progress, Sept 1998.

795	[WKM94]  Brian Whetten, Simon Kaplan, and Todd Montgomery, ``A high per-
796	     formance totally ordered multicast protocol,'' research memorandum,
797	     Aug. 1994.

799	[WGL97] C.K. Wong, M. Gouda, S. Lam, ``Secure Group Communications Using
800	     Key Graphs,'' Technical Report TR  97-23,  Department  of  Computer
801	     Sciences, The University of Texas at Austin, July 1997.

803	11.  Full Copyright Notice

805	Copyright (C) The Internet Society (1998).  All Rights Reserved.

807	This document and translations of it may be copied and furnished to oth-
808	ers, and derivative works that comment on or  otherwise  explain  it  or
809	assist in its implementation may be prepared, copied, published and dis-
810	tributed, in whole or in part, without restriction of any kind, provided
811	that  the  above copyright notice and this paragraph are included on all
812	such copies and derivative works. However, this document itself may  not
813	be modified in any way, such as by removing the copyright notice or ref-
814	erences to the Internet Society or other Internet organizations,  except
815	as needed for the purpose of developing Internet standards in which case
816	the procedures for copyrights defined in the  Internet  languages  other
817	than English.

819	The  limited  permissions  granted  above  are perpetual and will not be
820	revoked by the Internet Society or its successors or assigns.

822	This document and the information contained herein is provided on an "AS
823	IS"  basis  and  THE  INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK
824	FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED,  INCLUDING  BUT  NOT
825	LIMITED  TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT
826	INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FIT-
827	NESS FOR A PARTICULAR PURPOSE."