idnits 2.17.1 draft-ietf-rmt-bb-pgmcc-03.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Looks like you're using RFC 2026 boilerplate. This must be updated to follow RFC 3978/3979, as updated by RFC 4748. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** The document seems to lack a 1id_guidelines paragraph about 6 months document validity. ** The document seems to lack a 1id_guidelines paragraph about the list of Shadow Directories. == No 'Intended status' indicated for this document; assuming Proposed Standard Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an Abstract section. ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** There is 1 instance of too long lines in the document, the longest one being 2 characters in excess of 72. ** There are 37 instances of lines with control characters in the document. ** The document seems to lack a both a reference to RFC 2119 and the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. RFC 2119 keyword, line 640: '...eouts the sender MUST force the electi...' RFC 2119 keyword, line 667: '...N-marked packets, that ACK MUST report...' RFC 2119 keyword, line 789: '.... A PGMCC sender SHOULD compare the re...' RFC 2119 keyword, line 792: '...PGMCC sender SHOULD mark the current a...' RFC 2119 keyword, line 796: '...Also, the sender MAY include some form...' Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not match the current year -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (12 July 2004) is 7228 days in the past. Is this intentional? -- Found something which looks like a code comment -- if you have code sections in the document, please surround them with '
' and
     '' lines.


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Missing reference section? '2' on line 835 looks like a reference

  -- Missing reference section? '7' on line 853 looks like a reference

  -- Missing reference section? '6' on line 850 looks like a reference

  -- Missing reference section? '3' on line 839 looks like a reference

  -- Missing reference section? '1' on line 832 looks like a reference

  -- Missing reference section? '4' on line 843 looks like a reference

  -- Missing reference section? '5' on line 847 looks like a reference


     Summary: 8 errors (**), 0 flaws (~~), 2 warnings (==), 10 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1	Internet Engineering Task Force 				  RMT WG
2	INTERNET-DRAFT					     Luigi Rizzo/U. Pisa
3	draft-ietf-rmt-bb-pgmcc-03.txt		       Gianluca Iannaccone/Intel
4							  Lorenzo Vicisano/Cisco
5								Mark Handley/UCL
6								    12 July 2004
7							   Expires: January 2005

9		    PGMCC single rate multicast congestion control:
10				 Protocol Specification

12	Status of this Document

14	This document is an Internet-Draft and is in full conformance with all
15	provisions of Section 10 of RFC2026.

17	Internet-Drafts are working documents of the Internet Engineering Task
18	Force (IETF), its areas, and its working groups.  Note that other groups
19	may also distribute working documents as Internet-Drafts.

21	Internet-Drafts are valid for a maximum of six months and may be
22	updated, replaced, or obsoleted by other documents at any time.  It is
23	inappropriate to use Internet-Drafts as reference material or to cite
24	them other than as a "work in progress".

26	The list of current Internet-Drafts can be accessed at
27	http://www.ietf.org/ietf/1id-abstracts.txt

29	To view the list Internet-Draft Shadow Directories, see
30	http://www.ietf.org/shadow.html.

32	This document is a product of the IETF RMT WG.	Comments should be
33	addressed to the authors, or the WG's mailing list at rmt@lbl.gov.

35					Abstract

37	     This document describes PGMCC, a single rate multicast
38	     congestion control scheme which is TCP-friendly and achieves
39	     scalability, stability and fast response to variations in
40	     network conditions.  PGMCC is suitable for both non-reliable
41	     and reliable data transfers.  It is mainly designed for NAK-
42	     based multicast protocols, and uses a window-based, TCP-like
43	     control loop using positive ACKs between one representative of
44	     the receiver group (the ACKER) and the sender.  The ACKER is
45	     selected dynamically and may change over time.

47	     PGMCC is made of two components: a window-based control loop,
48	     which closely mimics TCP behavior, and a fast and low-overhead
49	     procedure to select (and track changes of) the ACKER.  The
50	     scheme is robust to measurement errors, and supports fast
51	     response to changes in the receiver set and/or network
52	     conditions.

54				   Table of Contents

56	     1. Introduction. . . . . . . . . . . . . . . . . . . . . .   4
57	      1.1. Terminology. . . . . . . . . . . . . . . . . . . . .   4
58	     2. Protocol Overview . . . . . . . . . . . . . . . . . . .   4
59	      2.1. Packet Contents. . . . . . . . . . . . . . . . . . .   6
60	       2.1.1. Data Packets. . . . . . . . . . . . . . . . . . .   6
61	       2.1.2. Feedback Packets. . . . . . . . . . . . . . . . .   7
62	       2.1.3. Field sizes and formats . . . . . . . . . . . . .   8
63	      2.2. Window-based controller. . . . . . . . . . . . . . .   9
64	      2.3. Acker Selection. . . . . . . . . . . . . . . . . . .  11
65	       2.3.1. Initial Acker election. . . . . . . . . . . . . .  11
66	       2.3.2. Acker dropouts. . . . . . . . . . . . . . . . . .  12
67	      2.4. TCP Throughput Equation. . . . . . . . . . . . . . .  12
68	      2.5. RTT measurement. . . . . . . . . . . . . . . . . . .  13
69	       2.5.1. Explicit Timestamp. . . . . . . . . . . . . . . .  13
70	       2.5.2. Implicit timestamp. . . . . . . . . . . . . . . .  13
71	       2.5.3. Sequence numbers. . . . . . . . . . . . . . . . .  14
72	       2.5.4. Recommendations . . . . . . . . . . . . . . . . .  15
73	      2.6. Loss rate measurement. . . . . . . . . . . . . . . .  15
74	      2.7. Timeouts . . . . . . . . . . . . . . . . . . . . . .  16
75	      2.8. Interaction with feedback suppression
76	      schemes . . . . . . . . . . . . . . . . . . . . . . . . .  16
77	      2.9. Interaction with ECN . . . . . . . . . . . . . . . .  17
78	     3. Procedures - Sender . . . . . . . . . . . . . . . . . .  17
79	     4. Procedures -- Receiver. . . . . . . . . . . . . . . . .  18
80	     5. Security Considerations . . . . . . . . . . . . . . . .  19
81	     6. Authors' Addresses. . . . . . . . . . . . . . . . . . .  20
82	     7. Acknowledgments . . . . . . . . . . . . . . . . . . . .  20
83	     8. Full Copyright Statement. . . . . . . . . . . . . . . .  21

85	1.  Introduction

87	This document describes PGMCC, a single rate multicast congestion
88	control scheme which is TCP-friendly and achieves scalability, stability
89	and fast response to variations in network conditions.

91	PGMCC is designed for multicast sessions with one sender and one or more
92	receivers, and is a good match for transport protocols using negative
93	acknowledgements (NAKs) to collect feedback from the receivers.  The
94	congestion control scheme implemented by PGMCC closely mimics the
95	congestion-control behavior of TCP, as it uses a window-based control
96	loop which is run between the sender and a selected receiver called the
97	ACKER. The role of the ACKER is to provide timely feedback in the same
98	way as a TCP receiver; additionally, the ACKER is selected dynamically
99	among the receivers as the one which would experience the lowest
100	throughput if separate TCP sessions were run between the sender and each
101	of the receivers.

103	Scalability in PGMCC comes from the use of negative acknowledgements
104	(NAKs) for collecting feedback from receivers other than the ACKER.  As
105	a consequence, the usual techniques for NAK suppression and aggregation
106	can be used to reduce the amount of feedback to the source and improve
107	the scalability of the scheme.

109	PGMCC is designed to completely decouple congestion control from data
110	integrity. As a consequence, the scheme can work with both reliable data
111	transfer and unreliable communication protocols such as those used for
112	video or audio streaming.

114	While designed with multicast in mind, PGMCC can be equally used as a
115	replacement for TCP for unicast sessions which require a lower degree of
116	reliability than what TCP offers.

118	1.1.  Terminology

120	In this document, the key words "MUST", "MUST NOT", "REQUIRED", "SHALL",
121	"SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
122	"OPTIONAL" are to be interpreted as described in RFC 2119 and indicate
123	requirement levels for compliant PGMCC implementations.

125	2.  Protocol Overview

127	PGMCC is based on two separate but complementary mechanisms:

129	  o A window-based control loop which closely emulates TCP congestion
130	    control.

132	    The window-based control loop is simply an adaptation of the TCP
133	    congestion control scheme to transport protocols where missing
134	    (because of network errors or congestion) data packets are not
135	    necessarily retransmitted, and so the congestion control scheme
136	    cannot rely on cumulative acknowledgements.  In PGMCC, the
137	    ``congestion window'' is simulated using a token-based scheme which
138	    permits congestion control to be decoupled from retransmission
139	    state. One of the receivers in the group operates as the ACKER, i.e.
140	    the node in charge of sending positive acknowledgements back to the
141	    source and thus controlling the rate of the transfer.

143	  o A procedure to select the ACKER.
144	    The purpose of this procedure is to make sure that, in presence of
145	    multiple receivers, the ACKER is dynamically selected to be the
146	    receiver which would have the lowest throughput if separate TCP
147	    sessions were run between the sender and each receiver.
148	    For the acker selection mechanism, PGMCC uses a throughput equation
149	    to determine the expected throughput for a given receiver as a
150	    function of the loss rate and round-trip time. Unlike other schemes
151	    [2], the TCP throughput equation is not used to determine the actual
152	    sending rate, which is completely controlled by the window-based
153	    control loop.

155	In principle, PGMCC's congestion control mechanism works as follows:

157	  o Receivers measure the loss rate and feed this information back to
158	    the sender, either in ACK or NAK messages.

160	  o The sender also uses these feedback messages to measure the round-
161	    trip time (RTT) to each receiver.

163	  o The loss rate and RTT are then fed into PGMCC's throughput equation,
164	    to determine the expected throughput to that receiver.

166	  o The sender then selects as the acker the receiver with the lowest
167	    expected throughput, as computed by the equation.

169	The dynamics of the acker selection mechanism are sensitive to how the
170	measurements are performed and applied. In the rest of this document we
171	suggest specific mechanisms to perform and apply these measurements.
172	Other mechanisms are possible, but it is important to understand how the
173	interactions between mechanisms affect the dynamics of PGMCC.

175	2.1.  Packet Contents

177	Before specifying the sender and receiver functionality, we describe the
178	information required by PGMCC to perform its tasks. This information is
179	carried in the data packets sent by the sender, and in the feedback
180	packets sent by the receiver.  As PGMCC will be used along with some
181	transport protocol, the actual data and feedback packets will contain
182	further information for use by the protocol itself. For this reason, we
183	do not specify packet formats, as these depend on the details of the
184	transport protocol used.

186	Note that the requirements of the transport protocol in terms of packet
187	generation may differ from those of PGMCC. As an example, most NAK-based
188	reliable multicast protocols do not use positive acknowledgements, but
189	PGMCC requires ACKs for clocking out data packets; unreliable transport
190	protocols might have no interest in generating NAKs for data integrity
191	purposes, yet PGMCC depends on NAKs reaching the data sender in order to
192	elect the ACKER.

194	Implementors may decide to insert PGMCC-related information in already
195	existing protocol packets whenever possible, but in cases such as the
196	ones described in the previous paragraph, it might be necessary to
197	define and generate new packets exclusively for congestion control
198	purposes. As an example, in a prototype implementation of PGMCC on top
199	of the PGM protocol [7], some of the information used by PGMCC is
200	already present in the original protocol packets, and PGMCC-specific
201	information is carried as PGM options in ODATA and NAK packets. However,
202	a new packet type has been defined for ACKs, which are generated
203	according to the rules defined in this document.

205	2.1.1.	Data Packets

207	Each data packet sent by the data sender contains the following
208	information:

210	  o A SEQUENCE NUMBER. This number is incremented by one for each data
211	    packet transmitted.  The field must be sufficiently large that it
212	    does not wrap causing two different packets with the same sequence
213	    number to be in the receiver's recent packet history at the same
214	    time.

216	  o A TIMESTAMP (or equivalent information, see Section 2.5) indicating
217	    when the packet with this sequence number has been sent.  There is
218	    no requirement for synchronized clocks between the sender and the
219	    receivers.	The timestamp is used to measure network round-trip
220	    times, so needs sufficient resolution for this task.  A resolution
221	    of 1ms would be adequate.

223	  o The ACKER IDENTITY, i.e. the identity of the receiver in charge of
224	    sending an acknowledgement for this data packet.  The ACKER is
225	    elected as a result of the process described in Section 2.3.
226	    A special value is used to indicate that no ACKER is designated for
227	    this packet -- this can happen at the beginning of a session or when
228	    the current ACKER leaves the group.  Receivers interpret this value
229	    as a request to elect a new acker.

231	2.1.2.	Feedback Packets

233	There are two types of feedback packets used by PGMCC: ACK packets and
234	NAK packets.
235	ACK packets are generated by the current ACKER, and are used to detect
236	loss or successful delivery of packets, and to regulate the throughput
237	accordingly. ACK packets also contain information used to determine the
238	TCP-equivalent throughput for the ACKER.
239	NAK packets are sent by any receiver who experiences loss.  They contain
240	information used to determine the TCP-equivalent throughput for that
241	receiver. In an actual protocol instantiation (such as PGM [7]), NAK
242	packets might also be used by the protocol to request the retransmission
243	of specific packets, and indicate the identity of the packet being
244	requested.

246	Both ACK and NAK packets are sent by data receivers, and contain the
247	following information:

249	  o The TIMESTAMP (or equivalent information) derived from the most
250	    recently received data packet according to one of the techniques
251	    described in Section 2.5.
252	    This value is used by the sender to measure the RTT to the receiver
253	    who generated this feedback packet.

255	  o ``p'', the receiver's current estimate of the LOSS RATE.  The loss
256	    rate is measured by receivers as described in Section 2.6

258	In addition to the above, ACK packets (sent by the acker designated in
259	the corresponding data packets) must also contain the following
260	information:

262	  o RX_MAX, the highest sequence number among received data packets
263	    (taking care to deal with sequence number wrapping correctly).

265	  o ACK_BITMAP, a bitmap indicating the receive status of the latest N
266	    (typically N=32) data packets with sequence numbers RX_MAX-(N-1) to
267	    RX_MAX.

269	This information is used by the sender to record which packets have been
270	received or lost, and manipulate the transmit window accordingly.  Note
271	that each ACK packet contains information about multiple packets, and
272	this increases the robustness of the scheme to loss of ACK packets.
273	This is necessary because ACKs are not sent reliably (unlike TCP's ACKs,
274	which are cumulative).

276	2.1.3.	Field sizes and formats

278	The following sizes and formats are suggested for the various fields
279	used by PGMCC and transmitted over the network:

281	  o SEQUENCE NUMBERS
282	    32 bit, unsigned, network order.

284	  o TIMESTAMPS
285	    32 bit, unsigned, network order.  A resolution of 1ms or better is
286	    desirable.

288	  o ACKER IDENTITY
289	    Same size and format of a network layer address (e.g. 32 bit for
290	    IPv4).  Note though that using an IP address for the Acker Identify
291	    will cause problems with NAT traversal.  Transport protocol
292	    designers might examine the SSRC mechanism used by RTP [6] as an
293	    alternative form of node identifier that could be used as Acker
294	    Identity.

296	  o LOSS RATE (``p'')
297	    16-bit unsigned integer, in network format, with 0 indicating no
298	    loss and 2^16-1 indicating 100% loss.

300	  o ACK BITMAP
301	    32-bit, in network format, with least significant bit indicating
302	    receive status of packet RX_MAX.

304	2.2.  Window-based controller

306	In a window-based congestion control scheme such as TCP, the
307	``congestion window'' represents, among other things, the maximum amount
308	of packets in flight at any time, which in turn controls the throughput
309	of the session. The sender keeps track of the actual number of packets
310	in flight, basing on its transmissions and the reception of
311	acknowledgements.

313	The sender may dynamically change the size of the window, according to
314	the congestion control scheme being used. In TCP, and PGMCC, an
315	``Additive Increase Multiplicative Decrease'' (AIMD) scheme is used: in
316	absence of loss, the window is increased by some fixed amount (typically
317	one packet) per round trip time (RTT), whereas upon loss the window is
318	reduced to a fraction of its original value (typically halved) in each
319	RTT in which a loss event is experienced.

321	In PGMCC the window is managed using a token-based mechanism, controlled
322	by two variables:

324	  o A ``Window Size'', W, which describes the current window size in
325	    packets.

327	  o A ``Token Count'', T, which indicates the number of packets that can
328	    be transmitted.  T is bounded above by W.  It is decremented every
329	    time a packet is transmitted, and incremented every time an ACK is
330	    received, according to the rules below.

332	Note that these two variables need to hold non-integer data.  Typically
333	a fixed point representation with at least 16 bits for both integer and
334	fractional parts would be acceptable for implementation purposes.

336	The information contained in each ACK is used to determine how many new
337	packets are acknowledged by that ACK, and whether there are
338	unacknowledged packets which were not reported in previous ACKs.  The
339	sender also schedules a timeout to react in case no ACKs are received.

341	The sender behaves as follows:

343	  o INITIALIZATION
344	    At startup, or after a timeout, both W and T are set to 1.

346	  o ACK RECEPTION, NO LOSS DETECTED
347	    If the incoming ACK reports new acknowledged packets, and no loss
348	    (as defined in the next paragraph) is detected, then the window is
349	    inflated by one packet per RTT.

351	    NOTE: during the slow-start phase, TCP opens the window
352	    exponentially up to the SSTHRESH value, which is computed by TCP
353	    according to the dynamics of the session and updated upon losses.

355	    We do recommend that PGMCC uses a similar strategy, but using a
356	    fixed, small value for SSTHRESH (e.g. 4 packets).  In fact, due to
357	    the dynamicity of the ACKER, which might change on every single
358	    packet, it is hard to compute a reliable estimate of the SSTHRESH
359	    without keeping state for multiple receivers, and the benefits are
360	    small in any event.

362	    In summary, the reaction to ACK reception on no loss modifies T and
363	    W as follows (here, N is the number of new packets acknowledged by
364	    the incoming ACK):

366	      if (W < SSTHRESH) then
367		  D = min(N, SSTHRESH - W) // use the first D acks for
368	    exp.opening
369		  N = N - D		       // and the remaining ones for
370	    linear opening
371		  T = T + 2*D
372		  W = W + D
373	      endif
374	      // do linear window opening with the remaining acks
375	      T = T + N * ( 1 + 1/W )
376	      W = W + N/W

378	  o PACKET TRANSMISSION
379	    One token is consumed for each packet transmitted:

381					 T = T - 1

383	  o ACK RECEPTION, LOSS DETECTED
384	    If the incoming ACK reports an unacknowledged data packet which is
385	    followed by at least 3 acknowledged data packets, then the packet is
386	    assumed to be lost and PGMCC reacts by halving the window, in the
387	    same way as TCP after 3 duplicate acknowledgements.  This is
388	    achieved by modifying T and W as follows:

390				   T = T - W/2 , W = W/2

392	    to simulate the multiplicative decrease.
393	    Additionally, all window manipulation is suspended for the
394	    subsequent RTT.  This is achieved by recording the current transmit
395	    sequence number, and canceling any further manipulation of the
396	    window until feedback is received for the next transmitted packet,
397	    or until a timeout occurs.

399	2.3.  Acker Selection

401	The ACKER selection process in PGMCC aims at locating the receiver which
402	would have the lowest throughput if each receiver were using a separate
403	TCP connection to transfer data.

405	Because the steady-state throughput of a TCP connection can be
406	characterized in a reasonably accurate way in terms of its loss rate and
407	round trip time [3], the throughput for each receiver can be estimated
408	by using these two parameters.

410	Whenever an ACK or NAK packet from any of the receivers reaches it, the
411	sender is able to compute the expected throughput T_i for that receiver
412	by using the equation shown in Section 2.4, with the round trip time RTT
413	and loss rate p and measured as described in Sections 2.5 and 2.6,
414	respectively.  At any given time, the sender stores the expected
415	throughput for the current ACKER, T_acker. This value is updated every
416	time an ACK or NAK from the current ACKER is received (note that, after
417	a new ACKER is selected, the sender will typically receive ACKs from the
418	old ACKER for one RTT, and the feedback from different ACKERs might be
419	interleaved if the paths leading to them have different round trip
420	times).

422	Whenever an ACK or NAK is received from another node i (a previous ACKER
423	or some other receiver), the expected throughput T_i for that node is
424	computed, and compared with T_acker.  Node i is selected as the new
425	acker if

427				   T_i < C * T_acker

429	where the constant C between 0 and 1 provides some hysteresis and avoids
430	too frequent oscillations in the choice of the ACKER.  A suggested value
431	for C is 0.75.

433	Note that, from an implementation point of view (see Section 2.4), it is
434	more convenient to compute T_i ^(-2), so the above equation must be
435	modified accordingly.

437	2.3.1.	Initial Acker election

439	Upon reception of a data packet reporting that no acker is currently
440	selected, receivers generate a dummy NAK report which is used to elect
441	the initial acker. The NAK is sent with the usual feedback suppression
442	mechanism dictated by the transport protocol (possibly with shorter time
443	constants) to avoid feedback implosion, and the sender will select the
444	source of the first incoming NAK as the new ACKER.

446	2.3.2.	Acker dropouts

448	If the ACKER decides to disconnect from the session, it can cause the
449	session to stop. To avoid this, it is recommended that an ACKER deciding
450	to leave the session informs the sender by sending an ACK packet (or a
451	duplicate) carrying an "ACKER_LEAVING" option.	The reception of this
452	packet by the sender will in turn trigger an initial acker election
453	phase.

455	2.4.  TCP Throughput Equation

457	Any realistic equation of TCP throughput as a function of loss event
458	rate and RTT should be suitable for use in PGMCC.  Unlike other schemes
459	[2] where the throughput equation directly controls the transmit rate,
460	in PGMCC the equation is used only for acker selection purposes, and the
461	throughput values are only compared among themselves.  As a consequence,
462	we can use the following equation, derived from the one presented in [3]
463	by setting RTO = 4 * RTT (as it is common practice):

465		 M = 1/T = RTT_i * sqrt(p) * (1 + 9*p * (1 + 32*(p)^2))

467	where

469	    M = 1/T is proportional to the inverse of the throughput for the
470	    receiver under consideration;

472	    RTT is the round trip time for the receiver under consideration;

474	    p is the loss rate for the receiver under consideration, between 0
475	    and 1.0;

477	and multiplying constants are omitted.

479	The above equation is accurate on a wide range of loss rates, and also
480	covers situations where retransmission timeouts have a significant
481	impact on the throughput of the protocol.

483	Note that when p=0, the equation yields 1/T = M = 0. This does not
484	constitute a problem as we can still compare the M values computed for
485	different receivers to determine the acker.  Also note that it is easier
486	to compute M^2 instead of M, because the former does not require the use
487	of sqrt().

489	In future, different throughput equations may be substituted for this
490	equation.  The requirement is that the throughput equation be a
491	reasonable approximation of the sending rate of TCP for conformant TCP
492	congestion control.

494	The parameters p and RTT need to be measured or calculated by a PGMCC
495	implementation.  The measurement of RTT is specified in Section 2.5; the
496	measurement of p is specified in Section 2.6.

498	2.5.  RTT measurement

500	In PGMCC, the RTT is measured by the sender making use of the timestamp
501	(or equivalent information) echoed back by each receiver in feedback
502	messages. Three procedures are possible to measure the RTT, as follows.
503	In no case is it required to have clock synchronization between sender
504	and receivers.

506	2.5.1.	Explicit Timestamp

508	This first technique relies on the transmission of a timestamp TS_j with
509	each data packet j.
510	The receiver will record the most recently received timestamp, and will
511	echo it back to the source when generating an ACK or a NAK. If the
512	feedback is delayed, the time elapsed between the reception of the
513	timestamp and the generation of the feedback should be added to the
514	echoed timestamp.
515	The sender computes the RTT by subtracting the received timestamp from
516	the current value of the clock.

518	The resolution of the timestamp value should be good enough for
519	reasonable precision measurement of typical network round trip times. If
520	receivers need to apply correction for delayed feedback, it is necessary
521	that receivers know the resolution of the timestamp clock. A suggested
522	value is 1ms.

524	2.5.2.	Implicit timestamp

526	With this technique, the sender will record a timestamp TS_j for each
527	transmitted data packet j, but the timestamp will not be transmitted
528	with the packet itself.
529	The receiver will record the most recently received sequence number, and
530	will echo it back to the source when generating an ACK or a NAK.
531	The sender computes the RTT by looking up the timestamp associated with
532	the sequence number received in the feedback packet, and subtracting it
533	from the current clock value.

535	If the feedback from the receiver is delayed, as it is commonly the case
536	for NAKs, the receiver can compute, and send back to the source, a
537	correction term corresponding to the time elapsed between the reception
538	of the timestamp and the generation of the feedback. The correction term
539	will then be subtracted by the sender in order to obtain the correct
540	estimate of the RTT.

542	This RTT measurement technique is equivalent to the previous one, but it
543	saves some space in data packets as the timestamp does not need to be
544	sent explicitly. Feedback packets might become larger if the correction
545	value is transmitted explicitly; but in many cases, the sequence number
546	will already be present for other reasons (e.g. ACK packets), and
547	wherever space is a concern the sequence number and the correction term
548	can be packed in a single 32-bit word without loss of precision.

550	2.5.3.	Sequence numbers

552	This technique is the least precise, but it does not rely on the
553	presence of a high resolution clock on the nodes.
554	The sender will not compute any timestamp, and just send data packets
555	with their sequence number j.
556	The receiver will record the most recently received sequence number, and
557	will echo it back to the source when generating an ACK or a NAK.
558	The sender computes the RTT as the difference between the most recently
559	sent sequence number and the sequence number received from the ACK or
560	NAK packet.

562	Note that in this case the RTT is not measured in seconds, but in
563	"sequence numbers", which are monotonically, but not uniformly,
564	increasing with time. The two measurements are equivalent if the sender
565	transmits at a constant rate. When the data rate changes over time (as
566	it is normally happens, given that PGMCC controls the actual data rate),
567	then the "measured" RTT values grow with the actual transmit rate. This
568	can influence the correctness of the results when comparing two
569	measurement done over different and only partially overlapping time (and
570	sequence number) intervals where the transmit rate incurs a significant
571	change.

573	2.5.4.	Recommendations

575	Whenever possible, the measurement of the RTT should be carried out
576	using either explicit or implicit timestamps, and by keeping track of
577	the "correction term" (the delay between data reception and feedback
578	generation).

580	If the receiver does not have a clock with a suitable resolution, the
581	correction term might not be present (or be inaccurate). In this case
582	the timestamps received by the sender on NAK packets might be in error,
583	in the worst case, by as much as the packet interarrival time.	This
584	error will normally not be present on ACK packets, which are sent
585	immediately.  A suitable correction should be applied by the sender in
586	order to avoid systematic errors.

588	The measurement based on sequence numbers is less accurate, but also
589	less sensitive to errors due to the lack of the correction term. In
590	fact, the measurement error induced by the lack of the correction term
591	can be at most one unit.  This suggests that, when the correction term
592	is not available, measurements based on sequence numbers should be
593	favoured.  Simulations have shown that the acker selection mechanism
594	performs moderately better when the RTT measurement is based on
595	timestamps, but performance is reasonably good also with measurements
596	based on sequence numbers.

598	2.6.  Loss rate measurement

600	The loss measurement in PGMCC is entirely performed by receivers.  The
601	measurement results do not directly influence the transmit rate, but are
602	only used for comparison purposes. As a consequence, the scheme is
603	reasonably robust to different measurement techniques, as long as they
604	are not influenced too strongly by single loss events.

606	The main method suggested for loss measurement is Exponentially Weighted
607	Moving Average (EWMA), which is formally equivalent to a single-pole
608	digital low pass filter applied to a binary signal x_i, where x_i = 1 if
609	packet i is lost, x_i = 0 if packet i is successfully received.

611	The loss rate p_i upon reception or detection of loss of packet i is
612	computed as

614			 p_i = c_p * p_{i-1} + (1 - c_p ) * p_i
615	 where the constant c_p between 0 and 1 is related to the bandpass of
616	the filter. Experiments have shown good performance with c = 500/65536,
617	and computations performed with fixed point arithmetic and 16 fractional
618	digits.

620	As an alternative to EWMA, the technique used in TFRC [2] can be
621	adopted. Simulations have shown a moderate improvement in the acker
622	selection mechanism by measuring loss using the TFRC loss estimator,
623	which is however slightly more expensive to compute than the EWMA loss
624	estimator in presence of packet reordering.

626	2.7.  Timeouts

628	When a packet is transmitted, the sender schedules a timeout to prevent
629	stalls upon loss of ACKs or disconnection of the ACKER.  In TCP, which
630	has a similar problem, the timeout value is computed by accumulating
631	statistics (SRTT and RTTVAR) on RTT samples, starting from a default
632	initial value (3s) when no RTT samples are available.

634	PGMCC can use a similar scheme to compute the timeouts, remembering that
635	upon ACKER changes (which may be very frequent), the computation of SRTT
636	and RTTVAR must be restarted from the beginning, unless the sender
637	decides to keep state for at least a small number of recent ackers.

639	Because the ACKER can leave the group without notifying the sender,
640	after a number of successive timeouts the sender MUST force the election
641	of a new ACKER.  We recommend this new election to be performed after
642	two successive timeouts.

644	2.8.  Interaction with feedback suppression schemes

646	Several schemes are used by NAK-based multicast protocols to reduce the
647	amount of feedback directed toward the source and make the protocol
648	scale with large populations of receivers.  Such schemes typically rely
649	on randomly delaying NAK generation, and suppressing pending NAKs when
650	an equivalent NAK or a retransmission is heard; or, intermediate nodes
651	such as routers can implement some form of feedback aggregation and
652	filtering.

654	Such schemes might prevent NAKs from potential ACKER candidates from
655	reaching the source. This filtering might impact the speed at which
656	PGMCC selects the correct ACKER, though initial experience from
657	simulations seem to suggest that PGMCC behavior is not severely affected
658	by NAK suppression schemes.

660	2.9.  Interaction with ECN

662	PGMCC can use ECN notifications in much the same way as actual losses,
663	and use such notifications to control the throughput of the session.

665	At the receiver, ECN-marked data packets can be considered as lost
666	packets for the purpose of loss rate computation and ACK/NAK generation.
667	If the ACKER sends an ACK for ECN-marked packets, that ACK MUST report
668	that the packet being acknowledged that was ECN marked.  Similarly the
669	ACKER must indicate in the ACK packet's received packets bitmap that the
670	packet was ECN-marked, or that the packet was lost.

672	We note that to support use of the ECN nonce, the ACK packet's received
673	packets bitmap would require two bits per packet being reported.

675	3.  Procedures - Sender

677	The following pseudo-code specifies the complete behavior of the sender
678	in PGMCC.

680	initialization:
681	  T = 1 ; W = 1 ;   /* initialize window and number of tokens */
682	  RETRY = 0 ;	    /* number of consecutive timeouts so far  */
683	  < initialize p, RTT for acker to default values >
684	  ACKER = NO_ACKER; /* no acker is known		      */
685	  < initialize sequence numbers >
686	  QUEUED = 0;	    /* packets waiting to be transmitted      */

688	on transmission request:
689	  send_packet() ;

691	on timeout expiration :
692	  T = 1 ; W = 1 ;   /* initialize window and number of tokens */
693	  if (RETRY < RETRY_MAX)
694	    RETRY = RETRY + 1
695	  else
696	    ACKER = NO_ACKER ; /* old acker is not valid anymore      */
697	  send_packet() ;

699	on ACK/NAK reception from receiver I :
700	  < compute p and RTT for source of this ACK, see Sec. 2.5 and 2.6 >
701	  RETRY = 0 ;
702	  if (ACKER == NO_ACKER) { /* select current as acker is no other known */
703	    ACKER = I ;
704	    T = T + 1 ;
705	  }
706	  if (ACKER != I)
707	    < select acker according to Sec. 2.3 > ;
708	  else {
709	    
710	    if (packet_type == ACK) {
711	      < update_window according to Sec.2.2 >
712	      send_packet ;
713	      if (ack_pending)
714		update_timeout ;
715	  }
716	}

718	send_packet() {
719	  if (QUEUED > 0 && T >= 1) {
720	    < transmit one packet >
721	    T = T - 1 ;
722	    QUEUED = QUEUED - 1 ;
723	  }
724	  if (  )
725	    
726	}

728	4.  Procedures -- Receiver

730	The following pseudo-code specifies the complete behavior of the
731	receiver in PGMCC.

733	A receiver only transmits an ACK packet when it receives a data packet
734	for which the receiver is designated as the ACKER by the data packet
735	itself.  A receiver can transmit a NAK packet after it has detected that
736	a data packet is missing and a suitable delay has passed, as dictated by
737	the feedback suppression rules of the protocol in use.

739	The data packet contains acknowledgement status about the most recent 32
740	sequence numbers known to the receiver.

742	on initialization/session setup:
743	  < initialize state variables and ACK bitmap >

745	on DATA packet reception:
746	  < update p measurement according to Sec.2.6 >
747	  < record timestamp and packet reception time >
748	  if (ACKER == this_node) {
749	    < send an immediate ACK >
750	  }
751	  if (  )
752	    < schedule a timeout for NAK transmission >

754	on NAK reception:
755	  < suppress any pending NAK transmission for the sequence
756	    number indicated in the NAK >

758	on timeout:
759	  if ( < there are missing and unacknowledged packets > ) {
760	    < send a NAK for one or more of the missing packets >
761	    < mark such packets as acknowledged >
762	    if (  )
763	      < schedule a timeout for NAK transmission >
764	  }

766	5.  Security Considerations

768	PGMCC is not a transport protocol in its own right, but a congestion
769	control mechanism that is intended to be used in conjunction with a
770	transport protocol.  Therefore security primarily needs to be considered
771	in the context of a specific transport protocol and its authentication
772	mechanisms.

774	Congestion control mechanisms can potentially be exploited to create
775	denial of service.  This may occur through spoofed feedback.  Thus any
776	transport protocol that uses PGMCC should take care to ensure that
777	feedback is only accepted from the receiver of the data.  The precise
778	mechanism to achieve this will however depend on the transport protocol
779	itself.

781	In addition, congestion control mechanisms may potentially be
782	manipulated by a greedy receiver that wishes to receive more than its
783	fair share of network bandwidth.  A receiver might do this by first
784	reporting inflated loss and RTT samples, in order to get selected as the
785	ACKER, and then generating ACK at the desired rate (including possibly
786	claiming to have received packets that in fact were lost due to
787	congestion).  Possible defenses against such a receiver could be based
788	on the sender verifying the correctness of the loss and RTT samples
789	supplied by the receiver. A PGMCC sender SHOULD compare the receiver
790	reports on loss rate and RTT with the information derived directly from
791	the incoming stream of ACKs.  In case of discrepancy of the reports, a
792	PGMCC sender SHOULD mark the current acker as ineligible and initiate a
793	new acker election. The decision on how large that discrepancy should be
794	before initiating a new acker election is left to the implementation.

796	Also, the sender MAY include some form of nonce that the receiver must
797	feed back to the sender to prove receipt. However, the details of such a
798	nonce would depend on the transport protocol, and in particular on
799	whether the transport protocol is reliable or unreliable.

801	6.  Authors' Addresses

803	   Luigi Rizzo
804	   luigi@iet.unipi.it
805	   Dip. Ing. dell'Informazione,
806	   Univ. di Pisa
807	   via Diotisalvi 2, 56122 Pisa, Italy

809	   Gianluca Iannaccone
810	   gianluca.iannaccone@intel.com
811	   Intel Research
812	   15 JJ Thomson Avenue, Cambridge CB3 0FD, UK

814	   Lorenzo Vicisano
815	   lorenzo@cisco.com
816	   cisco Systems, Inc.
817	   170 West Tasman Dr.,
818	   San Jose, CA, USA, 95134

820	   Mark Handley
821	   m.handley@cs.ucl.ac.uk
822	   University College London,
823	   Gower Street, London WC1E 6BT, UK

825	7.  Acknowledgments

827	We would like to acknowledge feedback and discussions on equation-based
828	congestion control with a wide range of people, including members of the
829	Reliable Multicast Research Group, the Reliable Multicast Transport
830	Working Group, and the End-to-End Research Group.

832	[1] Bradner, S., Key words for use in RFCs to Indicate Requirement
833	Levels (IETF RFC 2119) http://www.rfc-editor.org/rfc/rfc2119.txt

835	[2] Floyd, S., Handley, M., Padhye, J., Widmer, J., "Equation-Based
836	Congestion Control for Unicast Applications", ACM SIGCOMM 2000,
837	Stockholm, Aug. 2000

839	[3] Padhye, J. and  Firoiu, V. and Towsley, D. and Kurose, J., "Modeling
840	TCP Throughput: A Simple Model and its Empirical Validation", Proc ACM
841	SIGCOMM 1998.

843	[4] Mankin, A., Romanow, A., Brander, S., Paxson, V., "IETF Criteria for
844	Evaluating Reliable Multicast Transport and Application Protocols,"
845	RFC2357, June 1998.

847	[5] Rizzo, L., "pgmcc: a TCP-friendly single-rate multicast congestion
848	control scheme", ACM SIGCOMM 2000, Stockholm, Aug.2000

850	[6] Schulzrinne, H., Casner, S., Frederick, R., Jacobson, V., "RTP: A
851	Transport Protocol for Real-Time Applications", RFC 1889, Jan 1996.

853	[7] Speakman, T., Crowcroft, J., Gemmell, J., Farinacci, D. , Lin, S.,
854	Leshchiner, D., Luby, M., Montgomery, T. , Rizzo, L., Tweedly, A.,
855	Bhaskar, N., Edmonstone, R., Sumanasekera, R., Vicisano, L., PGM
856	Reliable Transport Protocol Specification, RFC 3208, December 2001.
857	rfc3208.txt also available at ftp://ftp.rfc-editor.org/in-
858	notes/rfc3208.txt

860	8.  Full Copyright Statement

862	Copyright (C) The Internet Society (2000).  All Rights Reserved.

864	This document and translations of it may be copied and furnished to
865	others, and derivative works that comment on or otherwise explain it or
866	assist in its implementation may be prepared, copied, published and
867	distributed, in whole or in part, without restriction of any kind,
868	provided that the above copyright notice and this paragraph are included
869	on all such copies and derivative works. However, this document itself
870	may not be modified in any way, such as by removing the copyright notice
871	or references to the Internet Society or other Internet organizations,
872	except as needed for the purpose of developing Internet standards in
873	which case the procedures for copyrights defined in the Internet
874	languages other than English.

876	The limited permissions granted above are perpetual and will not be
877	revoked by the Internet Society or its successors or assigns.

879	This document and the information contained herein is provided on an "AS
880	IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK
881	FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT
882	LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT
883	INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR
884	FITNESS FOR A PARTICULAR PURPOSE."