idnits 2.17.1 

draft-ietf-rmt-track-arch-00.txt:
  ** The Abstract section seems to be numbered

-(322): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding
-(978): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding
-(980): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate.  This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** Missing expiration date.  The document expiration date should appear on
     the first and last page.

  ** The document seems to lack a 1id_guidelines paragraph about 6 months
     document validity -- however, there's a paragraph with a matching
     beginning. Boilerplate error?

  ** The document is more than 15 pages and seems to lack a Table of Contents.

  == There are 3 instances of lines with non-ascii characters in the document.

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  == The page length should not exceed 58 lines per page, but there was 1
     longer page, the longest (page 1) being 1336 lines


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.

  ** There are 324 instances of too long lines in the document, the longest
     one being 6 characters in excess of 72.

  ** There are 25 instances of lines with control characters in the document.

  ** The abstract seems to contain references ([2], [3]), which it shouldn't.
      Please replace those with straight textual mentions of the documents in
     question.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == Line 158 has weird spacing: '...to-many   and...'

  == Line 563 has weird spacing: '...mission  and...'

  == Line 757 has weird spacing: '...cket is   cons...'

  == The document seems to use 'NOT RECOMMENDED' as an RFC 2119 keyword, but
     does not include the phrase in its RFC 2119 key words list.

  == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD',
     or 'RECOMMENDED' is not an accepted usage according to RFC 2119.  Please
     use uppercase 'NOT' together with RFC 2119 keywords (if that is what you
     mean).
     
     Found 'MUST not' in this paragraph:
     
     The reliability semantics TRACK provides are defined by the binding
     between a receiver and its repair head.  When this binding is
     established, the repair head agrees to provide retransmission of missed
     packets for the receiver starting from a specific (receiver requested)
     sequence number.  At this time, the repair head MUST not have discarded
     any data packet starting from this sequence number.

  == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD',
     or 'RECOMMENDED' is not an accepted usage according to RFC 2119.  Please
     use uppercase 'NOT' together with RFC 2119 keywords (if that is what you
     mean).
     
     Found 'MUST not' in this paragraph:
     
     Subsequently, a repair head needs to discard older packets from its
     buffer from time to time. The following two factors influence when to
     discard an old packet: a) Stability - When all receivers immediately
     subordinate to the repair head have acknowledged receipt of a packet,
     that packet is   considered stable.  When the whole sub-tree of receivers
     below a repair head have received a packet, it is considered as "strictly
     stable".  TRACK provides no explicit support for this strict sense of
     stability (note this form of reliability is also referred to as
     "pessimistic reliability"). b) Sender recovery window - Each data packet
     carries two sequence numbers: one is the sequence number of the current
     data packet, and the other is the sender recommended sequence number
     where recovery should start from (smaller than the current sequence
     number). This pair of sequence numbers forms a sender-suggested recovery
     window. A repair head MUST not discard any packet before it becomes
     stable. Per binding agreement or session wide configuration, a repair
     head MAY be allowed to discard a packet when it moves outside of the
     sender recovery window.

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (July 14, 2000) is 8680 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Missing reference section? '1' on line 20 looks like a reference

  -- Missing reference section? '2' on line 858 looks like a reference

  -- Missing reference section? '3' on line 45 looks like a reference

  -- Missing reference section? '4' on line 71 looks like a reference

  -- Missing reference section? '16' on line 531 looks like a reference

  -- Missing reference section? '5' on line 633 looks like a reference

  -- Missing reference section? '6' on line 639 looks like a reference

  -- Missing reference section? '7' on line 639 looks like a reference

  -- Missing reference section? '8' on line 639 looks like a reference

  -- Missing reference section? '9' on line 885 looks like a reference

  -- Missing reference section? '10' on line 1006 looks like a reference

  -- Missing reference section? '11' on line 1017 looks like a reference

  -- Missing reference section? '13' on line 1033 looks like a reference

  -- Missing reference section? '12' on line 1141 looks like a reference


     Summary: 10 errors (**), 0 flaws (~~), 9 warnings (==), 16 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1	Reliable Multicast Transport (RMT) WG      B. Whetten
2	Internet Draft                             Talarian
3	Document: draft-ietf-rmt-track-arch-00.txt D.Chiu
4						   Sun Microsystems
5						   S.Paul
6						   Edgix
7						   Miriam Kadansky
8						   Sun Microsystems
9						   Gursel Taskale
10						   Talarian
11						   July 14, 2000

13			      TRACK ARCHITECTURE
14		A SCALEABLE REAL-TIME RELIABLE MULTICAST PROTOCOL

16	Status of this Memo

18	This document is an Internet-Draft and is in full conformance with all
19	provisions of Section 10 of RFC2026 [1].

21	Internet-Drafts are working documents of the Internet Engineering Task
22	Force (IETF), its areas, and its working groups. Note that other groups may
23	also distribute working documents as Internet-Drafts. Internet-Drafts are
24	draft documents valid for a maximum of six months and may be updated,
25	replaced, or become obsolete by other documents at any time. It is
26	inappropriate to use Internet-Drafts as reference material or to cite them
27	other than as "work in progress."

29	The list of current Internet-Drafts can be accessed at
30	http://www.ietf.org/ietf/1id-abstracts.txt
31	The list of Internet-Draft Shadow Directories can be accessed at
32	http://www.ietf.org/shadow.html.

34	1. Abstract

36	One of the protocol instantiations the RMT WG is chartered to create is a
37	TRee-based ACKnowledgement protocol (TRACK).  Rather than create a set of
38	monolithic protocol specifications, the RMT WG has chosen to break the
39	reliable multicast protocols into Building Blocks (BB) and Protocol
40	Instantiations (PI).  A Building Block is a specification of the algorithms
41	of a single component, with an abstract interface to other BBs and PIs.  A
42	PI combines a set of BBs, adds in the additional required functionality not
43	specified in any BB, and specifies the specific instantiation of the
44	protocol. For more information, see the Reliable Multicast Transport
45	Building Blocks and Reliable Multicast Design Space documents [2][3].

47	The TRACK protocol instantiation (TRACK for short) is designed to reliably
48	and efficiently send data from a single sender to large groups of
49	simultaneous recipients in real time.  The term real-time is understood in
50	the industry as minimal latency including network propagation and
51	processing delays.  TRACK PI provides functions similar to the NACK PI, and
52	adds support for a tree-based hierarchy of Repair Heads (RH), which in its
53	simplest form may consist of only the Sender acting as the Repair Head.  The
54	hierarchy increases scalability by aggregating control traffic and locally
55	retransmitting lost packets.  In addition to using negative
56	acknowledgements (NACKs) and forward error correction (FEC) for efficient
57	reporting and retransmission of lost packets, it also provides tree-based
58	ACKnowledgements (ACKs).  ACKs provide the Sender with confirmation of
59	delivery of data packets to the Receivers.  Like the NACK PI, it may also
60	take advantage of Generic Router Assist where available.

62	This document proposes a design rationale for the TRACK PI, an architecture
63	for TRACK, and a set of functional requirements TRACK has of other Building
64	Blocks.  This document is not a protocol instantiation specification.

66	2. Conventions Used in this Document

68	The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
69	"SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this
70	document are to be interpreted as described in RFC-2119 [4].

72	3. Design Rationale and Protocol Requirements

74	This section discusses many of the requirements imposed on the design of
75	the TRACK PI, as well as a design rationale which guides the aspects where
76	there is flexibility in selecting from different potential design
77	decisions.

79	3.1 Private and Public Networks

81	TRACK is designed to work in private networks, controlled networks and in
82	the public Internet.  A controlled network typically has a single
83	administrative domain, has more homogeneous network bandwidth, and is more
84	easily managed and controlled.  These networks have the fewest barriers to
85	IP multicast deployment and the most immediate need for reliable multicast
86	services.  Deployment in the Internet requires a protocol to span multiple
87	administrative domains, over vastly heterogeneous networks.  The IETF is
88	specifically chartered with producing standards for the Internet, so this
89	must be the primary target network type.  However, robust transport
90	protocols are grown, not created, and most of the short term deployment
91	experience will likely come from controlled networks.  Therefore, TRACK is
92	designed to support both.

94	3.2 Manual vs. Automatic Controls

96	Some networks can take advantage of manual or centralized tools for
97	configuring and controlling the usage of a reliable multicast group.  In
98	the public Internet the tools have to span multiple AS's where policies are
99	inconsistent.  Hence, it is preferable to design tools that are fully
100	distributed and automatic.  To address these requirements, TRACK supports
101	both manual and automatic algorithms for monitoring, management, and
102	configuration.

104	3.3 Heterogeneous Networks

106	While the majority of controlled networks are symmetrical and support many-
107	to-many multicast, in designing a protocol for the Internet, we must deal
108	with virtually all major network types.  These include asymmetrical
109	networks, satellite networks, networks where only a single node may send to
110	a multicast group, and wireless networks.  TRACK takes this into account by
111	not requiring any many-to-many multicast services. In addition, the
112	congestion control component used in TRACK will specifically deal with the
113	high bandwidth-delay product faced in many satellite networks and the high
114	link level loss rate faced by some wireless networks.  Finally, TRACK does
115	not assume that the topology used for sending control packets has any
116	congruence to the topology of the multicast address used for sending data
117	packets.

119	3.4 Use of Network Infrastructure

121	There is wide consensus that in order to scale a real-time reliable
122	multicast protocol, there must be some use made of the network
123	infrastructure (the routers and servers inside the network).  New software
124	that supports the transport layer typically would run in either the routers
125	or the servers in the network, or both.  Deployment of router software
126	(such as that in the Generic Router Assist BB) is a powerful solution, but
127	typically requires very long time cycles, is of necessity limited in
128	functionality, and requires a graceful upgrade path.  Server software (such
129	as the Repair Head control tree) is much easier to deploy, but may require
130	new hardware to be added to the network.

132	In controlled networks, particularly during the first deployment phases of
133	reliable multicast, it is reasonable to deploy servers that only support a
134	single application, or even to use selected end clients themselves to
135	perform the functions necessary for scalability.  For widely deployed
136	Internet infrastructure components, the server infrastructure is usually
137	dedicated to just the single protocol, but supports all instances of that
138	protocol running across that piece of the network.  Examples of this usage
139	model include DNS, DHCP, NNTP, and HTTP.  Therefore, the control nodes used
140	in TRACK are designed to be run both on dedicated network servers able to
141	support hundreds or thousands of simultaneous data sessions, as well as on
142	an end user computer.

144	A number of extensions to IP multicast, such as subtree multicast, NACK
145	suppression, ACK aggregation, tree configuration discovery, and higher
146	fidelity congestion control reports, have been proposed which can run in
147	the routers.  If deployed widely, these would make reliable multicast
148	protocols easier to configure and to scale more readily.  Some or all of
149	these features are being standardized as part of the Generic Router Assist
150	(GRA) component.  TRACK is designed to take advantage of GRA as it becomes
151	available, but not to require it.  Ubiquitous deployment of GRA would
152	likely reduce the number of dedicated TRACK servers needed for large scale
153	(i.e., more than 1000 Receivers) deployments, and improve the performance of
154	the protocol.

156	3.5 Targeted Application Types

158	Multicast applications can be divided into two classes, few-to-many   and
159	many-to-many.  Many-to-many applications include multi-user games, small
160	group conferencing, and computer supported collaborative work.  These
161	applications typically treat all members in a group as peers, require
162	special semantics such as total ordering of messages from multiple Senders,
163	and often have moderate scalability requirements.  Other protocols, such as
164	RMP, have been designed to support these many-to-many applications.

166	In line with the charter for RMT, TRACK focuses on one-to-many bulk data
167	distribution applications, such as multicast file transfer, electronic
168	software distribution, real time news and financial market data
169	distribution, "push" applications, audio/video/data streaming, distance
170	learning, and some types of server replication.

172	In order to meet these requirements, TRACK treats each Sender as an
173	independent entity, and provides no ordering or other shared state across
174	data sessions, although multiple data sessions can share the same control
175	infrastructure.  The protocol is designed to scale to at least many
176	thousands of simultaneous Receivers.  TRACK provides a strong, but fully
177	distributed membership protocol, which supports scaling to many thousands
178	of simultaneous Receivers while providing confirmed delivery on messages.
179	Similar to TCP, TRACK continuously streams data to receivers, performing
180	acknowledgement and retransmission of older data packets at the same time
181	that new data packets are being sent.  It also provides some special
182	support for real-time applications such as audio/video/data streaming and
183	live financial market data distribution.

185	Some real-time applications require jitter control for smooth playback.
186	This can be accomplished by using the unordered delivery option of TRACK
187	and performing jitter control in the application.  Typically, this requires
188	the application to maintain a separate buffer to smooth out the per packet
189	delay variations.

191	TRACK also supports a sender-controlled recovery window.  In each data
192	packet, the sender may indicate to all receivers that data older than a
193	certain sequence number is no longer worth recovering.  (See the section
194	on "Delivery Semantics" for more details.)  This mechanism helps the
195	transport better support applications that distribute content that ages
196	quickly, such as stock quotes.

198	3.6 IETF Mandated Criteria

200	In addition to the requirements imposed by the targeted network and
201	application types, TRACK is designed to meet all of the requirements
202	proposed by the IETF in RFC2357.

204	- Congestion Control.  TRACK includes provably safe and TCP-friendly
205	congestion control algorithms that also scale to large groups.

207	- Well-controlled, Scaleable Behavior.  TRACK includes carefully analyzed
208	algorithms that manage and smooth the control traffic and retransmissions.
209	These are key to avoiding NACK implosion, ACK implosion, and retransmission
210	implosion (the local recovery pathology).

212	- Security.  TRACK supports protection of the transport infrastructure,
213	through the use of lightweight authentication of control and data packets.

215	3.7 Graceful Evolution

217	Creating robust, universally applicable standard protocols takes a great
218	deal of time and protocol evolution.  While TRACK is being written as a
219	standard, it will have to continue to evolve as real world experience is
220	gained with the protocol, similar to how TCP has been tuned over almost 20
221	years of research and development.  TRACK addresses this through its use of
222	Building Blocks, which allow particular algorithms to be broken out into
223	separate components with well defined interfaces.  This allows evolution of
224	these components, hopefully with little or no changes required to the rest
225	of the protocol.

227	TRACK also addresses evolution through its use of session parameters.
228	TRACK is presently dependent on a number of parameters which MUST be
229	configured throughout the tree for optimal operation.  TRACK provides
230	mechanisms to automatically distribute these parameters to all members of
231	the group, and OPTIONALLY provides mechanisms to dynamically change some of
232	these parameters during group operation.

234	TRACK also provides SNMP management and monitoring tools.  Over time,
235	deployment experiences will provide input on which values work best for
236	most deployments, leading to further refinements of the standard.

238	3.8 Algorithm Selection

240	The above design criteria apply to the general architecture of the
241	protocol.  Additional criteria were used for selecting the optimal
242	algorithms for different sets of functions.  These rationales are described
243	below, along with relevant functions.

245	4. Architectural Overview

247	4.1 TRACK Entities

249	4.1.1 Node Types

251	TRACK divides the operation of the protocol into three major entities:
252	Sender, Receiver, and Repair Head.  TRACK's Repair Head corresponds to the
253	Service Node described in the Tree-Building draft. It is assumed that
254	Senders and Receivers typically run as part of an application on an end
255	host client. Repair Heads MAY be components in the network infrastructure,
256	managed by different network managers as part of different administrative
257	domains, or MAY run on an end host client, in which case they function as
258	both Receivers and Repair Heads.  Absent any automatic tree
259	configuration, it is assumed that the Infrastructure Repair Heads have
260	relatively static configurations, which consist of a list of nearby
261	possible Repair Heads.  Senders and Receivers, on the other hand, are
262	transient entities, which typically only exist for the duration of a single
263	data session. In addition to these core components, applications that use
264	TRACK are expected to interface with other services that reside in other
265	network entities, such as multicast address allocation, session
266	advertisement, network management consoles, DHCP, DNS, server level
267	multicast, and multicast key management.

269	4.1.2 Multicast Group Address

271	A multicast group address is a pair consisting of an IP multicast address
272	and a UDP port number.  It may optionally have a Time To Live (TTL) value,
273	although this value MUST only be used for providing a global scope to a
274	Data Session. Data multicast address and control multicast address are both
275	multicast group addresses.
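
As an illustration only, the definition above can be modelled as a small
value type.  This is a minimal Python sketch, not part of TRACK: the class
name, field names, and validation rules are assumptions made for the
example.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MulticastGroupAddress:
    """Hypothetical model of a multicast group address: (IP multicast address, UDP port)."""
    address: str
    port: int
    ttl: Optional[int] = None  # optional; only meaningful as a global scope for a Data Session

    def __post_init__(self):
        # The address part must be an IP multicast address.
        if not ipaddress.ip_address(self.address).is_multicast:
            raise ValueError(f"{self.address} is not an IP multicast address")
        # Assumption for this sketch: a usable UDP port is in 1..65535.
        if not 0 < self.port <= 65535:
            raise ValueError(f"invalid UDP port {self.port}")
        if self.ttl is not None and not 0 <= self.ttl <= 255:
            raise ValueError(f"invalid TTL {self.ttl}")


data_address = MulticastGroupAddress("224.1.2.3", 5000)
control_address = MulticastGroupAddress("224.1.2.4", 5001, ttl=16)
```

Both the data multicast address and the control multicast address are
instances of the same type in this sketch.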

277	4.1.3 Data Session
278	A Data Session is the unit of reliable delivery of TRACK.  It consists of a
279	sequence of sequentially numbered Data packets, which are sent by a single
280	Sender over a single Data Multicast Address.  They are delivered reliably,
281	with acknowledgements and retransmissions occurring over the Control Tree.
282	It is uniquely identified by a combination of a Session ID, sender's
283	address and port, and the multicast address and port.

285	A given Data Session is received by a set of zero or more Receivers, and a
286	set of zero or more Repair Heads.  One or more Data Sessions MAY share
287	the same Data Multicast Address (although this is not recommended).  Each
288	TRACK node can simultaneously participate in multiple Data Sessions.  A
289	receiver MUST join all the Data Multicast Addresses and Control Trees
290	corresponding to the Data Streams it wishes to receive.

292	4.1.4 Control Tree

294	A Control Tree is a hierarchical communication path used to send control
295	information from a set of Receivers, through zero or more Repair Heads
296	(RHs), to a Sender.  Information from lower nodes is aggregated as the
297	information is relayed to higher nodes closer to the sender.  Each Data
298	Session uses a Control Tree.

300	Each RH in the control tree uses a separate multicast address for
301	communicating with its children.  Optionally, these RH multicast addresses
302	may be the same as the multicast address of the Data Channel.

304	4.1.5 Session ID

306	A Session ID is a 32-bit number (to be formally defined in the Common
307	Packet Header BB) chosen either by the application that creates the session
308	or selected by TRACK.  Senders and Receivers use the Session ID to
309	distinguish Data Streams.  A Sender may specify a Session ID in the range
310	from (2^31) to (2^32)-1.  Numbers in the range from 0 to (2^31)-1 are
311	reserved.  If a sender specifies 0 as the Session ID, then TRACK randomly
312	assigns a Session ID in the range from 1 to (2^31)-1.  If a Session ID is
313	selected that is already in use on a Control Tree, the new stream will
314	fail, and will need to select a new Session ID.

316	A session is uniquely identified by its Session ID, its sender's
317	address/port, and its Data Multicast Address and port.
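
The Session ID rules above can be sketched as follows.  This is a
minimal Python illustration under stated assumptions: the function name,
the exception type, and the representation of "already in use on a
Control Tree" as a set are all hypothetical, and the formal definition is
left to the Common Packet Header BB.

```python
import random

SENDER_MIN = 2**31        # lowest Session ID a Sender may specify
SENDER_MAX = 2**32 - 1    # highest Session ID a Sender may specify
AUTO_MIN = 1              # lowest randomly assigned Session ID
AUTO_MAX = 2**31 - 1      # highest randomly assigned Session ID


class SessionIdInUse(Exception):
    """Hypothetical error: the chosen Session ID is already used on the Control Tree."""


def choose_session_id(requested, in_use, rng=random):
    """Return the Session ID for a new session, or raise if the request is invalid.

    requested: 0 asks for a random assignment; otherwise it must lie in
               the Sender-selectable range.
    in_use:    set of Session IDs already active on the Control Tree
               (an assumption of this sketch).
    """
    if requested == 0:
        session_id = rng.randint(AUTO_MIN, AUTO_MAX)
    elif SENDER_MIN <= requested <= SENDER_MAX:
        session_id = requested
    else:
        raise ValueError("Session IDs from 1 to (2^31)-1 are reserved")
    if session_id in in_use:
        # The new session fails; the caller must select another Session ID.
        raise SessionIdInUse(session_id)
    return session_id
```

A caller that receives SessionIdInUse would pick a different Session ID
and try again.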

319	4.1.6 Packet Sequence Numbers

321	A packet sequence number is a 32 bit number in the range from 1 through
322	2^32 - 1, which is used to specify the sequential order of a Data packet in
323	a Data Stream.  A sender node assigns consecutive sequence numbers to the
324	Data packets provided by the Sender application.  Zero is reserved to
325	indicate that the data session has not yet started.

327	4.1.7 Data Queue

329	A Data Queue is a buffer, maintained by a Sender or a Repair Head, for
330	transmission and retransmission of the Data packets provided by the Sender
331	application.  New Data packets are added to the data queue as they arrive
332	from the sending application, up to a specified buffer limit.  The
333	admission rate of packets to the network is controlled by flow and
334	congestion control algorithms.  Once a packet has been received by the
335	Receivers of a Data Stream, it may be deleted from the buffer.
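
A Data Queue along these lines might look like the following minimal
Python sketch.  The class and method names are hypothetical, and the
sketch assumes that "received by the Receivers" is reported to the queue
as a single sequence number up to which all packets may be deleted.  Flow
and congestion control are out of scope here.

```python
from collections import OrderedDict


class DataQueue:
    """Hypothetical buffer for transmission and retransmission of Data packets."""

    def __init__(self, limit):
        self.limit = limit              # maximum number of buffered packets
        self.packets = OrderedDict()    # sequence number -> payload
        self.next_seq = 1               # sequence numbers start at 1; 0 is reserved

    def add(self, payload):
        """Queue a new packet; return its sequence number, or None if the buffer is full."""
        if len(self.packets) >= self.limit:
            return None
        seq = self.next_seq
        self.packets[seq] = payload
        self.next_seq += 1
        return seq

    def get(self, seq):
        """Return the payload for a retransmission, or None if it was already deleted."""
        return self.packets.get(seq)

    def release_through(self, seq):
        """Delete every buffered packet with a sequence number <= seq."""
        for s in [s for s in self.packets if s <= seq]:
            del self.packets[s]
```

For example, with a limit of two packets a third add() returns None
until release_through() frees space.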

337	4.1.8 Packet Types

339	TRACK defines a set of packets, which can be implemented either on top of
340	UDP or directly on top of IP.  All TRACK packets will conform to the Common
341	Packet Headers BB.  Each TRACK packet definition consists of a fixed
342	header, zero or more option headers, followed by data or control
343	information.

345	Data is carried in Data packets.  The same packet type is used both to
346	transmit Data the first time and for retransmissions of lost packets. A bit
347	in the packet header is set when the packet is a retransmission.  Each Data
348	packet has a Session ID and a sequence number, which identify the packet
349	and allow a receiver application to reconstruct the data stream from the
350	Data packets.
351	Receivers and Repair Heads unicast periodic status packets to their
352	parents.  An ACK is sent regularly to indicate the status of the Data
353	packets which have arrived and to furnish congestion control statistics
354	about the state of data reception at the node.  An ACK requests
355	retransmission of Data packets that have not been received.  An ACK also
356	acknowledges packets that have become stable.  A NACK is an ACK that is
357	used to request immediate recovery of lost Data packets.  ACKs and NACKs
358	have the same format, but ACKs are passed all the way up the tree, while
359	NACKs are only sent as far as needed to find a node which can provide all
360	the requested retransmissions.  A child will also send an ACK in response
361	to a NullData or Heartbeat packet if it has not sent an ACK within a
362	certain time interval.

364	TRACK uses the Tree-Building draft as a reference for building its repair
365	tree.  The following is a description of TRACK's implementation of tree
366	building that is consistent with that draft.

368	When a Receiver or Repair Head wishes to establish a repair service
369	relationship, it uses a Bind packet to bind to a parent Repair Head. A
370	parent sends an Accept or Reject after it processes a Bind packet.
371	The Reject message comes with a reason code that explains the reason for
372	rejection. The reason may indicate that the parent is not connected into
373	the tree yet, so that the receiver can try again later (see open issue).
374	If the parent sends an Accept, this constitutes Joining a session.
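
The Bind exchange above can be sketched from the parent's side as
follows.  This is a minimal Python illustration only: the class, the
tuple-based return value, and the NO_CAPACITY reason code are
hypothetical; only the "parent not yet connected to the tree" reason is
taken from the text.

```python
from enum import Enum


class RejectReason(Enum):
    NOT_IN_TREE = "parent is not connected into the tree yet"   # from the text above
    NO_CAPACITY = "parent cannot serve more children"           # hypothetical reason


class ParentNode:
    """Hypothetical parent (Repair Head or Sender) that processes Bind packets."""

    def __init__(self, connected, max_children):
        self.connected = connected          # True once this node has joined the tree
        self.max_children = max_children    # assumed local capacity limit
        self.children = set()

    def handle_bind(self, child_id):
        """Return ("Accept", None) or ("Reject", reason) for a Bind from child_id."""
        if not self.connected:
            # The child may try again later.
            return ("Reject", RejectReason.NOT_IN_TREE)
        if child_id not in self.children and len(self.children) >= self.max_children:
            return ("Reject", RejectReason.NO_CAPACITY)
        self.children.add(child_id)
        return ("Accept", None)
```

An Accept adds the child to the parent's child list, which is the point
at which the child has joined the session.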

376	When a Receiver or Repair Head wishes to leave a session, it sends a Leave
377	request to its parent.  The parent replies with a LeaveConfirm packet, at
378	which time the child is allowed to leave.

380	A Repair Head or Sender periodically sends Heartbeat packets to notify its
381	child nodes that it is alive.

383	If a Sender has no data to send for a session, it periodically
384	multicasts a NullData packet on the Data Multicast Address.  NullData
385	packets inform receivers about the state of the Data Stream and the Sender.

387	If a child node is not operating normally, or a parent node restarts
388	after a failure and receives a packet from a child not in its child list,
389	then the parent node sends an Eject packet to the child node,
390	causing the child node to terminate its connection to the control tree.

392	4.2 Basic Operation of the Protocol

394	For each Data Session, TRACK provides sequenced, reliable delivery of data
395	from a single Sender to up to tens of thousands of Receivers.  A TRACK Data
396	Session consists of a network that has exactly one Sender node, zero or
397	more Receiver nodes and zero or more Repair Heads.

399	The figure below illustrates a TRACK Data Session with multiple Repair
400	Heads.

402	A Sender joins the TRACK tree and multicasts data packets on the Data
403	Multicast Address.  All of the nodes in the session subscribe to the class
404	D IP multicast address and UDP port associated with the Data Multicast
405	Address.

407	There is no assumption of congruence between the topology of the Data
408	Multicast Address and the topology of the Control Tree.

410				       -------> SD (Sender node)----->|
411				      ^^^                             |
412			 ACKs       /  |  \    Control                |
413			 and      /    |    \    Tree                 |
414			NACKs   /      |      \                       |
415			      /        |        \     (Repair         |
416			    /          |          \    Head           |
417			  /            |            \  nodes)         v
418			RH             RH            RH  <------------|
419			^^            ^^^            ^^               | Data
420		       / |           / | \           | \              | Channel
421		      /  |          /  |  \          |  \             |
422		     /   |         /   |   \         |   \            v
423		    R    R        R    R    R        R    R  <---------
424			       (Receiver Nodes)

426	A Receiver joins a Data Multicast Address to receive data.  A Receiver
427	periodically informs its parent about the packets that it has or has not
428	received by unicasting an ACK packet to the parent.  Each parent node
429	aggregates the ACKs from its child nodes and (if it is not the Sender)
430	unicasts a single aggregated ACK to its parent.   For lower latency
431	recovery in low loss networks, Receivers can also generate NACKs upon
432	detection of losses.  These have the same format as an ACK, but are only
433	passed up the tree as far as necessary in order to find a Repair Head that
434	can retransmit the packet.  The Repair Heads provide NACK suppression,
435	which provides traffic minimization benefits similar to ACK aggregation.
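
ACK aggregation at a parent can be sketched as follows.  This is a
minimal Python illustration under assumptions that this document does not
fix: each child's ACK is reduced to a pair (highest sequence number
received without gaps, set of missing sequence numbers), and the function
name is hypothetical.  The real ACK format is left to the protocol
instantiation.

```python
def aggregate_acks(child_acks):
    """Combine the latest ACK from each child into one aggregated ACK.

    child_acks: dict mapping child id -> (cumulative_seq, missing_set).
    Returns (cumulative_seq, missing_set) for the whole set of children.
    """
    if not child_acks:
        return (0, set())
    # A packet is acknowledged by the group only if every child has it,
    # so the aggregated cumulative point is the minimum over the children.
    cumulative = min(cum for cum, _ in child_acks.values())
    # A packet still needs repair if any child reports it missing.
    missing = set().union(*(miss for _, miss in child_acks.values()))
    return (cumulative, missing)


aggregated = aggregate_acks({
    "receiver-a": (10, {12}),
    "receiver-b": (7, {8, 9}),
})
```

The parent then unicasts this single aggregated result upward instead of
forwarding one ACK per child.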

437	The Sender and each Repair Head have a multicast Local Control Channel to
438	their children.  This is used for transmitting Heartbeat packets that
439	inform their child nodes that the parent node is still functioning.  This
440	channel is also used to perform local retransmission of lost data packets
441	to just these children.  TRACK will still provide correct operation even if
442	multicast addresses are reused across multiple Data Sessions or multiple
443	Local Control Channels.  It is NOT RECOMMENDED to use the same multicast
444	address for multiple Local Control Channels serving any given Data Session.

446	The communication path forms a loop from the Sender to the Receivers,
447	through the Repair Heads back to the Sender.  Data and NullData packets
448	regularly exercise the downward data direction.  Heartbeat packets exercise
449	the downward control direction.  ACKs, NACKs, and HeartbeatResponse packets
450	regularly exercise the control tree in the upward direction.  This
451	combination constantly checks that all of the nodes in the tree are still
452	functioning correctly, and initiates fault recovery when required.

454	In addition to using ACKs, NACKs, and Repair Heads for scaleable loss
455	notification and retransmission, TRACK also supports the optional use of
456	Generic Router Assist (GRA) and integrated Forward Error Correction (FEC).
457	Two of the major functions of GRA are NACK suppression and dynamically
458	scoped local retransmission.  These functions, if enabled, are
459	independently deployed between each parent and its children.  For the
460	purpose of GRA NACK functions, each parent is considered to be a Sender and
461	the children of that parent are considered as the Receivers.

463	Retransmission requests, both NACKs and ACKs, contain selective bitmaps
464	indicating which packets need to be retransmitted.  If FEC is enabled,
465	these bitmaps provide enough information to determine the number of parity
466	packets to be sent rather than sending individual retransmissions.
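
The following sketch shows one way a parent could turn the bitmaps into a
parity packet count.  It assumes an erasure code in which any one parity
packet can repair any one lost packet of the window at each Receiver; the
function name and the per-child sets of missing offsets are hypothetical
and stand in for the decoded bitmaps.

```python
def parity_packets_needed(missing_by_child):
    """Number of parity packets that repairs every child for one FEC window.

    missing_by_child maps a hypothetical child identifier to the set of
    window offsets that child's bitmap marks as missing.  Under the stated
    erasure-code assumption, the child with the most losses determines
    how many parity packets are required.
    """
    return max((len(missing) for missing in missing_by_child.values()), default=0)


window_losses = {"R1": {0, 5}, "R2": {3}, "R3": set()}
print(parity_packets_needed(window_losses))
```

Here three distinct packets were lost across the children, but two parity
packets would suffice under the stated assumption, whereas individual
retransmission would need three.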

468	4.3 Session Creation

470	Before a data session starts delivering data, the tree for the Data Session
471	needs to be created.  This process binds each Receiver to either a Repair
472	Head or the Sender, and binds the participating Repair Heads into a loop-
473	free tree structure with the Sender as the root of the tree.  This process
474	requires tree configuration knowledge, which can be provided with some
475	combination of manual and/or automatic configuration.  The actual
476	algorithms for tree configuration will be part of the Automatic Tree
477	Configuration BB, and are discussed in the next section.

479	To start a data session, a Sender communicates to the Receivers, via either
480	an external service or through the application itself, the Data Multicast
481	Address that will be used for the Data Session.  It may advertise other
482	relevant session information such as whether or not Repair Heads should be
483	used, whether manual or automatic tree configuration should be used, the
484	time at which the session will start, and other protocol constants.  It may
485	also advertise certain hints for the tree configuration algorithms and
486	metrics. In this way, the Sender enforces a set of uniform Session
487	Configuration Parameters on all members of the session.

489	After receiving this out of band communication, the Receivers join the Data
490	Multicast Address, and attempt to bind to either the Sender or a local
491	Repair Head.  The tree configuration algorithms are responsible for
492	providing the Receiver with a list of one or more nodes which it will
493	attempt to bind to.  It will attempt to bind to the first node in the list,
494	and if this fails, it will move to the next one.  A Receiver only binds to
495	a single Repair Head or Sender, at a time, for each Data Session.
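
The bind procedure described above can be sketched as a simple loop over
the candidate list.  The callable try_bind is a hypothetical stand-in for
the actual bind exchange, which is not specified in this section.

```python
def bind_to_parent(candidates, try_bind):
    """Try each candidate parent in list order.

    candidates is the ordered list supplied by tree configuration.
    try_bind is a hypothetical callable that returns True when the
    given node accepts the bind request.
    Returns the node that accepted, or None if every attempt failed.
    """
    for parent in candidates:
        if try_bind(parent):
            return parent
    return None


# Example: the first Repair Head refuses, the second accepts.
accepting = {"RH2", "Sender"}
print(bind_to_parent(["RH1", "RH2", "Sender"], lambda node: node in accepting))
```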

497	When a Repair Head has a Receiver bind to it for a given Data Session, it
498	then also binds to another Repair Head or to the Sender, depending on the
499	list given to it by the tree configuration algorithms.  The tree
500	configuration algorithms are responsible for ensuring that the tree is
501	formed without loops.

503	Once the Sender initiates tree building, it is also free to start sending
504	Data packets on the Data Multicast Address.  Repair Heads and Receivers may
505	start receiving these packets, but may not request retransmission or
506	deliver data to the application until they receive confirmation that they
507	have successfully bound to the group.

509	Some of the Session Configuration Parameters MAY be changed dynamically by
510	the Sender by advertising the changed values as part of the NullData
511	packets periodically sent through the tree.  If a given Session
512	Configuration Parameter must be the same at all nodes in order to provide
513	safe operation, it MUST NOT be dynamically changed once the Data Session
514	has started.

516	4.4 Tree Configuration

518	TRACK is designed to work either with manual configuration of the tree, or
519	with optional automatic tree configuration.  Tree configuration is
520	responsible for providing each Receiver and Repair Head with a list of one
521	or more appropriate parents to attempt to bind to.

523	The goals of automatic tree configuration are:

525	-       allow Receivers to automatically locate their best Repair Head(s), and
526	obtain the local control channel multicast address.
527	-       provide automatic configuration of the Repair Head with either Repair
528	Heads that are servers operating in the network, or with dynamically
529	selected receivers.

531	These algorithms are specified in the Tree Configuration BB [16]. In order
532	to make sure that TRACK can be standardized in a timely fashion, the
533	automatic tree configuration algorithms need to be separate from the rest
534	of the TRACK protocol, so that TRACK can be deployed even without these
535	algorithms. When these algorithms from the Tree Configuration BB are not
536	available, TRACK will use static configuration.

538	4.5 Data Transmission and Retransmission

540	Data is multicast by a Sender on the Data Multicast Address.
541	Retransmissions of data packets may be multicast by the Sender on the Data
542	Multicast Address or be multicast on a Local Control Channel by a Repair
543	Head.  In order to provide NACK suppression and to work with proactive FEC,
544	retransmissions are always multicast.  If Generic Router Assist is enabled,
545	the routers may provide NACK suppression and allow dynamically scoped
546	retransmission to just the subset of Receivers and Repair Heads that have
547	missed a packet.

549	A Repair Head joins all of the Data Multicast Addresses that any of its
550	descendants have joined.  A Repair Head is responsible for receiving and
551	buffering all data packets using the reliability semantics configured for a
552	stream.  As a simple to implement option, a Repair Head MAY also function
553	as a Receiver, and pass these data packets to an attached application.

555	For additional fault tolerance, a Receiver MAY subscribe to the multicast
556	address associated with the Local Control Channel of one or more Repair
557	Heads in addition to the multicast address of its parent.  In this case it
558	does not bind to this Repair Head or Sender, but will process
559	Retransmission packets sent to this address.  If the Receiver's Repair Head
560	fails and it transfers to another Repair Head, this minimizes the number of
561	data packets it needs to recover after binding to the new Repair Head.

563	There are two types of retransmissions: local retransmission  and
564	dynamically scoped retransmission.

566	4.5.1 Local Retransmission

568	If a Repair Head or Sender determines from its child nodes' ACKs or NACKs
569	that a Data packet was missed, it retransmits the Data packet
570	or, if FEC is enabled, an FEC parity packet.  The Repair Head or Sender
571	multicasts the Retransmission packet on its multicast Local Control
572	Channel.  In the event that a Repair Head receives a retransmission and
573	knows that its children need this repair, it re-multicasts the
574	retransmission to its children.

576	The scope of retransmission is considered part of the Control Channel's
577	multicast address, and is derived during tree configuration.

579	4.5.2 Dynamically Scoped Retransmission

581	Dynamically Scoped Retransmission may be used on a network whose routers
582	support dynamically scoped retransmissions through Generic Router Assist.
583	Dynamically Scoped Retransmissions use soft state kept in the routers to
584	constrain the Retransmission to only the children that have requested them
585	through a NACK.  Dynamically Scoped Retransmissions are known to be
586	susceptible to router topology changes.  Therefore, only the first
587	retransmission of a packet is sent via this mechanism.  Thereafter, only
588	retransmission on the Data Multicast Address or local retransmission
589	should be used.  This allows the protocol to provide connectivity even
590	during router topology changes, albeit with less efficiency.

592	4.6 Control Traffic Management

594	One of the largest challenges for scaleable reliable multicast protocols
595	has been that of controlling the potential explosion of control traffic.
596	There is a fundamental tradeoff between the latency with which losses can
597	be detected and repaired, and the amount of control traffic generated by
598	the protocol.  In conjunction with the dynamic global tree parameters,
599	TRACK provides a set of algorithms that carefully control and manage this
600	traffic, preventing control traffic explosion.

602	Despite their different names, ACKs and NACKs both function as selective
603	acknowledgements of the window of contiguous sequence numbers that have not
604	yet been fully acknowledged.  The only difference between the packet
605	headers is a single flag.

607	ACK packet frequency is controlled by setting a number of tree wide
608	parameters controlling their maximum rate of generation.  The primary
609	parameter is the ratio parameter, R, for the maximum number of ACK packets
610	to be generated per data packet sent.  The higher R is, the faster positive
611	acknowledgements will be generated all the way back to the sender.  This
612	induces more back-channel traffic.

614	ACKs MUST be enabled for any Data Session.  NACKs SHOULD be implemented as
615	part of any implementation, and MAY be enabled for any given Data Session.
616	If enabled, then on detection of a lost packet, a Receiver waits a random
617	interval before sending a NACK.  If the Receiver receives the retransmitted
618	data before the NACK timer expires, the Receiver cancels the NACK.  This
619	reduces the chance that multiple Receivers generate a NACK for the same
620	packet.

622	A Repair Head node multicasts a Data packet to its children as soon as it
623	gets a NACK request for that packet, unless it retransmitted that packet
624	previously in a configurable time period.  If it does not have the missing
625	packet, it forwards the NACK to its parent, and multicasts a control packet
626	to its children to suppress any further NACKs for that packet from them.
627	The Repair Head forwards only one NACK for a missing Data packet within a
628	specified period of time.  If more than one packet has been detected as
629	missing before the NACK is sent, the NACK will request all of the missing
630	packets.
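
A minimal sketch of the Repair Head behaviour described above follows.
The class name, the returned action strings, and the use of a single
hold_time for both the retransmission period and the NACK forwarding
period are assumptions made for this example.

```python
class RepairHeadNackHandler:
    """Decide how a Repair Head reacts to a NACK for one sequence number."""

    def __init__(self, hold_time):
        self.hold_time = hold_time      # assumed: one period for both limits
        self.buffer = {}                # seq -> buffered Data packet
        self.last_retransmit = {}       # seq -> time of last retransmission
        self.last_forward = {}          # seq -> time the NACK was last forwarded

    def on_nack(self, seq, now):
        if seq in self.buffer:
            last = self.last_retransmit.get(seq)
            if last is not None and now - last < self.hold_time:
                return "ignore"         # retransmitted recently
            self.last_retransmit[seq] = now
            return "retransmit"         # multicast the packet to the children
        last = self.last_forward.get(seq)
        if last is not None and now - last < self.hold_time:
            return "ignore"             # a NACK was already forwarded
        self.last_forward[seq] = now
        # Forward the NACK to the parent and suppress further child NACKs.
        return "forward_and_suppress"


handler = RepairHeadNackHandler(hold_time=1.0)
handler.buffer[7] = b"payload"
print(handler.on_nack(7, now=0.0), handler.on_nack(7, now=0.5))
```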

632	NACKs are particularly good for providing real-time data distribution in
633	networks with low loss rates and short to moderate RTTs.  See [5] for
634	comparisons on the tradeoffs between ACKs and NACKs for low latency
635	recovery of lost packets.

637	4.7 Integrated Forward Error Correction

639	Work [6][7][8] has shown the benefits of incorporating reactive forward
640	error correction (FEC) into reliable multicast protocols.  This feature
641	encodes data packets with FEC algorithms, but does not transmit the parity
642	packets until a loss is detected.  The parity packets are then transmitted
643	and are able to repair different lost packets at different Receivers.  This
644	is a powerful tool for providing scalability in the face of independent
645	loss.  When implemented, it is a simple matter to also provide proactive
646	FEC which automatically transmits a certain percentage of parity packets
647	along with the data.  This is particularly useful when a high minimum error
648	rate is expected, or when low latency is particularly important.  Both of
649	these are optionally supported in TRACK.

651	FEC is organized around windows of packets.  TRACK Data packets include an
652	FEC Window Offset field, which identifies the offset of a given packet
653	within an FEC window.  Combined with the FEC session configuration
654	parameters, this allows receivers to decode a combination of Data and
655	parity packets, to generate each window of Data packets.  Proactive FEC
656	packets are parity packets sent as global retransmissions at the same time
657	a window of Data packets is sent.  Reactive FEC packets are sent either
658	from a Repair Head or a Sender, in response to requests for
659	retransmissions.  If using reactive FEC, a Repair Head must first have all
660	the packets in a window before it can respond to any request for
661	retransmission.  The ACK and NACK bitmaps, combined with the information in
662	the headers of the Data packets, provide each Repair Head with enough
663	information to determine which parity packets the RH must compute and send
664	in response to requests for retransmission.

666	4.8 Flow and Congestion Control

668	Flow and congestion control algorithms act to prevent the Senders from
669	overflowing the Receivers' buffers and to force them to share the network
670	fairly and safely with other TCP and RM connections.  TRACK uses a
671	combination of a transmission window for flow control, and the dynamic rate
672	control algorithms specified in the Congestion Control (CC) BB for
673	congestion control.  These algorithms have been proven to meet all the
674	requirements for flow and congestion control, including being safe for use
675	in a general Internet environment, and provably fair with TCP.

677	The Sender application provides the minimum and maximum rate limits as part
678	of the global parameters.  A Sender will not transmit at lower than the
679	minimum rate (except possibly during short periods of time when certain
680	slow receivers are being ejected), or higher than the maximum rate.  If a
681	Receiver is not able to keep up with the minimum rate for a period of time,
682	the CC BB algorithms will cause it to leave the group. Receivers that leave
683	the group MAY attempt to rejoin the group at a later time, but SHOULD NOT
684	attempt an immediate reconnection.

686	4.9 Notification of Confirmed Delivery

688	TRACK provides a simple membership count for each session.  This is
689	done by each repair head counting/aggregating its (subtree) membership
690	count and propagating it up the tree to the sender.  The propagation
691	up the tree is piggybacked on the regular TRACK (ACK and NACK) packets.
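
The counting scheme can be sketched as follows.  The example assumes that
each Receiver counts as one member and that a Repair Head reports only the
sum of its children's counts; whether a Repair Head that also acts as a
Receiver counts itself is not stated here.  The recursion stands in for
the hop-by-hop propagation carried on ACK and NACK packets, and the tree
layout and names are hypothetical.

```python
def subtree_count(node, children):
    """Return the membership count a node would report to its parent.

    children maps a node name to the list of its child node names.
    A node with no children is treated as a Receiver and counts as 1.
    """
    kids = children.get(node, [])
    if not kids:
        return 1
    return sum(subtree_count(kid, children) for kid in kids)


tree = {
    "Sender": ["RH1", "RH2"],
    "RH1": ["R1", "R2"],
    "RH2": ["R3", "R4", "R5"],
}
print(subtree_count("Sender", tree))
```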

693	Depending on whether there are late joiners, and receiver and repair
694	head failures, this count may fluctuate over the duration of the session.

696	Whether this counting is done or not can be controlled by a session-wide
697	configuration parameter.

699	A complete list of receiver membership can only be obtained if each repair
700	head (including the sender) supports an SNMP interface that supports
701	getting membership ids.  Such SNMP support is optionally required for
702	dedicated repair servers (but not required of regular receivers).

704	4.10 Fault Detection and Recovery

706	4.10.1 Sender node failure detection

708	A Sender node that has no data to send will periodically send NullData
709	packets on the Data Multicast Address.  If a Receiver or a Repair Head
710	fails to receive Data packets or NullData packets for a session sent by the
711	Sender, that node detects a Sender failure.

713	4.10.2 Repair Head failure detection

715	Each Repair Head node sends Heartbeat packets to its child nodes on its
716	multicast Local Control Channel.  If the child nodes do not receive any
717	Heartbeats from their parent Repair Head, they detect failure of the parent.

719	4.10.3 Receiver node failure detection

721	A Receiver node sends ACKs and (optionally) NACKs for each of the active
722	sessions that it has joined.  If none of the sessions are active, then the
723	Receiver sends HeartbeatResponse packets to its parent.

725	If a Receiver's parent node does not receive an ACK, NACK or a
726	HeartbeatResponse packet within a specified time interval, the parent
727	detects the failure of the Receiver and removes the child from its child
728	list.

730	4.10.4 Repair Head discovery

732	TRACK supports an option which allows each node in the TRACK tree to
733	acquire the addresses and location of its ancestors in the control tree and
734	the addresses of its parent's siblings.  If a TRACK node's parent fails,
735	then the node can use the acquired information to join an alternate control
736	node.

738	4.10.5 Recovery

740	When a child node detects failure of its parent node, it can try to
741	reconnect to an alternate Repair Head of the TRACK tree, or it can try to
742	reconnect directly to the Sender.

744	4.11 Reliability Semantics

746	The reliability semantics TRACK provides are defined by the binding
747	between a receiver and its repair head.  When this binding is established,
748	the repair head agrees to provide retransmission of missed packets for the
749	receiver starting from a specific (receiver requested) sequence number.  At
750	this time, the repair head MUST NOT have discarded any data packet starting
751	from this sequence number.

753	Subsequently, a repair head needs to discard older packets from its buffer
754	from time to time. The following two factors influence when to discard an
755	old packet:
756	a) Stability - When all receivers immediately subordinate to the repair
757	head have acknowledged receipt of a packet, that packet is considered
758	stable.  When the whole sub-tree of receivers below a repair head have
759	received a packet, it is considered as "strictly stable".  TRACK
760	provides no explicit support for this strict sense of stability (note
761	this form of reliability is also referred to as "pessimistic
762	reliability").
763	b) Sender recovery window - Each data packet carries two sequence
764	numbers: one is the sequence number of the current data packet, and the
765	other is the sender recommended sequence number where recovery should
766	start from (smaller than the current sequence number). This pair of
767	sequence numbers forms a sender-suggested recovery window.
768	A repair head MUST NOT discard any packet before it becomes stable. Per
769	binding agreement or session wide configuration, a repair head MAY be
770	allowed to discard a packet when it moves outside of the sender
771	recovery window.

773	When a repair head's buffer is filled up and none of the packets can be
774	discarded (due to stability or recovery window requirements), newly arrived
775	packets must be discarded and recovered later.

777	A receiver SHOULD NOT try to recover packets outside of the sender
778	recovery window.

780	When a receiver loses its repair head due to network partition or
781	repair head crashing, the receiver MAY continue with the same reliability
782	service if it manages to find and re-affiliate with another repair head. If
783	the receiver fails to find an alternative repair head that can continue to
784	provide reliability service where the previous repair head left off, this
785	receiver MUST indicate failure to its application.

787	4.12 Ordering Semantics

789	TRACK offers two flavors of ordering semantics: Ordered or Unordered. One
790	of these is selected on a per session basis as part of the Session
791	Configuration Parameters.

793	Unordered service provides a reliable stream of packets, without
794	duplicates, and delivers them to the application in the order received.
795	This allows the lowest latency delivery for time sensitive applications.
796	It may also be used by applications that wish to provide their own jitter
797	control.

799	Ordered service provides TCP semantics on delivery. All packets are
800	delivered in the order sent, without duplicates.
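
A minimal sketch of Ordered delivery follows: packets are held until all
earlier sequence numbers have arrived, and duplicates are dropped.  The
class name and the in-memory buffering strategy are assumptions made for
this example.

```python
class OrderedDelivery:
    """Release payloads to the application in sequence-number order."""

    def __init__(self, first_seq):
        self.next_seq = first_seq   # next sequence number to deliver
        self.pending = {}           # out-of-order packets awaiting delivery

    def receive(self, seq, payload):
        """Accept one packet and return the payloads now deliverable."""
        if seq < self.next_seq or seq in self.pending:
            return []               # duplicate: already delivered or buffered
        self.pending[seq] = payload
        delivered = []
        while self.next_seq in self.pending:
            delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return delivered


stream = OrderedDelivery(first_seq=1)
print(stream.receive(2, "b"))   # held back until packet 1 arrives
print(stream.receive(1, "a"))
```

Unordered service would instead hand each non-duplicate packet to the
application as soon as it arrives.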

802	4.13 SNMP Support

804	The Repair Heads and the Sender are designed to interact with SNMP
805	management tools.  This allows network managers to easily monitor and
806	control the sessions being transmitted.  All TRACK nodes have SNMP MIBs
807	defined.  SNMP support is optional for Receiver nodes, but is required for
808	all other nodes.

810	4.14 Late Join Semantics

812	TRACK offers three flavors of late join support:
813	a)      No Recovery
814	A receiver binds to a repair head after the session has started and
815	agrees to the reliability service starting from the sequence number in
816	the current data packet received from the sender.
817	b)      Continuation
818	This semantic is used when a receiver has lost its repair head and
819	needs to re-affiliate.  In this case, the receiver must indicate the
820	oldest sequence number it needs to repair in order to continue the
821	reliability service it had from the previous repair head.  The binding
822	occurs if this is possible.
823	c)      No Late Join
824	For some applications, it is important that a receiver receives either
825	all data or no data (e.g. software distribution).  In this case option
826	(c) is used.

828	4.15 Application Signaling for Notification

830	TRACK provides two forms of application signaling for speedy
831	acknowledgement:
832	a) End of stream - this is done when the application has finished
833	sending all its data, and wants to finish the session.
834	b) Synch - this is done when the application comes to a point in its
835	data distribution that it wants to make sure all packets have been
836	received before proceeding further.  In this case the session is not
837	ending.

839	In both cases, the application SHOULD be able to signal this through its
840	transport API.  In turn, TRACK will carry the signal as a flag in its data
841	(or NullData) packets.  For case (a), the flag is set in the last data
842	packet of the session, and in additional NullData packets
843	carrying the last sequence number.  For case (b), the flag is set in
844	the data packet at which the application requires synch, and in additional
845	NullData packets sent prior to new data packets following the synch
846	sequence number.

848	Upon receiving "end of stream", a receiver must try to recover data packets
849	up to the indicated last sequence number and send its final ACK to its
850	repair head.  The receiver can then leave the repair head. When all the
851	packets up to the last packet become stable, the repair head can leave.

853	Upon receiving "synch", the receivers and repair heads perform the same
854	operations as in "end of stream" except they keep their binding.

856	5. Functional Specification for TRACK Requirements of Building Blocks

858	Work [2] provides a rationale for decomposing the RMT protocols into
859	Building Blocks and Protocol Instantiations.  This section provides a
860	simple specification of the functions that TRACK requires from each of the
861	Building Blocks.  It also provides some basic description of the interfaces
862	between these components.

864	Since the following overlaps with what is done in the BBs, all of section 5
865	is for discussion purposes only, and is not meant to replace what is
866	specified in the supporting BBs.  The BBs will define the actual
867	algorithms.

869	5.1 NACK-based Reliability

871	This building block defines NACK-based loss detection/notification and
872	recovery.  The major issues it addresses are implosion prevention
873	(suppression) and NACK semantics (i.e. how packets to be retransmitted
874	should be specified, both in the case of selective and FEC loss repair).

876	The NACK suppression mechanisms used by TRACK are unicast NACKs with
877	multicast confirmation and exponentially distributed timers.  These
878	suppression mechanisms need to minimize both delay and redundant
879	messages.  They may also need to have special
880	weighting to work with Congestion Feedback.

882	5.1.1 NACK BB Algorithms

884	Exponential Back Off.  When a packet is detected as lost, an exponentially
885	distributed timer is set, based on the algorithms in [9].  This timer is
886	biased based on the input congestion weighting factor.  If either a packet
887	or an explicit suppression message with the same sequence number is
888	detected before the timer goes off, the timer is cancelled.

890	NACK Generation.  When a timer goes off, the protocol instantiation is
891	notified to generate a NACK for that sequence number.  The protocol
892	instantiation may, at its discretion, group multiple NACK notifications
893	into a single NACK packet.  For TRACK, NACKs are implemented as a unicast
894	packet with a multicast confirmation response.

896	Response to a Retransmission Request.  When a Repair Head or other possible
897	retransmission agent receives the first NACK from another group member for
898	a given packet, it notifies the protocol instantiation to send either a
899	data retransmission or, if it doesn't have the packet for retransmission,
900	an optional suppression message.  It then sets an embargo timeout, tied to
901	the RTT to the furthest Receiver, during which other requests for the same
902	packet will be ignored.  The length of this embargo doubles each time that
903	a retransmission is sent.  This algorithm should also work with requests
904	for retransmissions that come in the form of ACKs, as the algorithms and
905	packet formats for both are identical, with the exception of the
906	suppression mechanisms used.
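
The embargo doubling can be illustrated as below.  The text only says that
the timeout is tied to the RTT to the furthest Receiver, so setting the
first embargo equal to that RTT is an assumption of this sketch, as is the
function name.

```python
def embargo_length(furthest_rtt, retransmissions_sent):
    """Length of the embargo after the given number of retransmissions.

    furthest_rtt is the RTT to the furthest Receiver, in seconds.
    retransmissions_sent counts retransmissions of this packet so far
    and must be at least 1.  The first embargo is assumed to equal the
    RTT, and the length doubles with each further retransmission.
    """
    if retransmissions_sent < 1:
        raise ValueError("at least one retransmission must have been sent")
    return furthest_rtt * 2 ** (retransmissions_sent - 1)


for count in (1, 2, 3):
    print(count, embargo_length(0.25, count))
```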

908	GRA Signaling.  A primary function of GRA is to do NACK
909	elimination/suppression and subcasting of repairs.  In order to do this,
910	the transport needs to signal the GRA-enabled routers to turn on the
911	appropriate algorithms.  This algorithm has to deal with issues such as
912	router topology changes.  While not dealt with in detail here, this is a
913	very subtle issue, which will have to be dealt with carefully.

915	5.1.2 NACK BB Parameters

917	Congestion Weighting.  From Congestion Control BB.  This is a weighting
918	parameter for NACK suppression timers.  The exact algorithms for this are
919	still to be determined.

921	Loss Notification.  From TRACK protocol instantiation.  Notification at a
922	Sender that a packet has been detected as lost, and the sequence number of
923	that packet.

925	Retransmission Request.  From TRACK protocol instantiation.  When a NACK or
926	ACK with a request for retransmission is received, this needs to be passed
927	to the BB for handling retransmission requests.

929	GRA Enabled.  From PI.  Is GRA enabled in the network?

931	5.2 FEC Repair BB

933	This building block is concerned with packet level FEC repair.  It
934	specifies the FEC codec selection and the FEC packet naming (indexing) for
935	both reactive FEC and proactive FEC.

937	5.2.1 FEC BB Algorithms

939	FEC Input.  Receive a window of packets (not necessarily all at once), and
940	store pointers to them for use in the FEC Create Parity algorithm.

942	FEC Create Parity.  Given a window of packets, create and return a parity
943	packet.  If a window is not yet full, first call the FEC Flush function.
944	If there are no more parity packets that can be generated for this window,
945	then return an error or else return a parity packet that has already been
946	generated.  This uses one of a set of codecs, specified through the use of
947	codepoints.  For TRACK, it is expected that the codecs will operate over
948	relatively small windows, to work with real-time applications and
949	congestion control.

951	FEC Flush.  For a window that needs a parity packet, but is not yet full,
952	FEC flush creates all-zero packets for the rest of the packets in the
953	window.  No more calls to FEC input can be made for this window after FEC
954	flush has been called.

956	FEC Decode.  Given a set of received data and/or parity packets, decode the
957	window using the specified FEC codec.

959	5.2.2 FEC BB Parameters

961	Codec Code Point Index.  What is the codec being used for encoding and
962	decoding?  This is a fixed parameter per data stream.

964	FEC Window Size.   What is the number of packets in an FEC window?  This is
965	a fixed parameter per data stream.

967	FEC Maximum Parity.  This is the maximum number of parity packets that can
968	be generated over a given window size. This is a fixed parameter per data
969	stream.

971	Data Packet Sequence Number.  This is the sequence number of a data packet.
972	This is input to FEC Input from the PI.

974	FEC Window Offset.  For a given packet, what is the offset into an FEC
975	window?  This is associated with each Data packet that uses FEC.  It is a
976	header field on each Data packet sent.  It is sequential over each packet
977	in a window, unless a Flush occurs on a partially full window.  In that
978	case, the window offset of this last packet is set to FEC Window Size - 1.
979	For parity packets, the FEC Window Offset starts at FEC Window Size, and
980	goes up to FEC Window Size + FEC Maximum Parity - 1.  This is returned from
981	FEC Input and from FEC Create Parity.
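
The offset numbering can be sketched as follows.  The example assumes
that data offsets start at 0, which is implied by the last data offset
being FEC Window Size - 1 but is not stated explicitly; the function names
are hypothetical.

```python
def data_offsets(num_packets, window_size, flushed=False):
    """FEC Window Offsets for the data packets placed in one window.

    Offsets are sequential from 0.  If the window was flushed while only
    partially full, the last packet's offset is set to window_size - 1.
    """
    offsets = list(range(num_packets))
    if flushed and 0 < num_packets < window_size:
        offsets[-1] = window_size - 1
    return offsets


def parity_offsets(window_size, max_parity):
    """FEC Window Offsets available to parity packets of one window."""
    return list(range(window_size, window_size + max_parity))


print(data_offsets(8, window_size=8))
print(data_offsets(5, window_size=8, flushed=True))
print(parity_offsets(window_size=8, max_parity=4))
```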

983	5.3 Congestion Control BB

985	TRACK uses a source-based rate regulation algorithm, with a single rate
986	provided to all the Receivers in the session.

988	The following set of algorithms and parameters is a subset of those needed
989	for a full implementation, but gives an idea of what is required.

991	5.3.1 Congestion Control BB Algorithms

993	Initialization.  A number of transport-wide parameters must be fed to each
994	of the nodes in the group, such as minimum rate, maximum rate, data segment
995	size, etc.

997	Receiver Measurements.  The Receiver must keep track of its average loss
998	rate, and RTT to the Sender.  We will call these measures "congestion
999	reports".

1001	Receiver Feedback.  The Receivers must feed these congestion reports back
1002	to the Sender, piggybacked on both NACKs and ACKs.

1004	Hierarchical Aggregation.  Restricted worst edge aggregation should be used
1005	to aggregate the congestion reports in the ACKs and/or NACKs being fed up
1006	the tree [10].  Every time that an ACK or NACK is generated, this algorithm
1007	should be called to fill in the appropriate fields.  Every time an ACK or
1008	NACK is received, this algorithm should be called to process the congestion
1009	control fields in the packet.  This algorithm must also be notified every
1010	time a new child joins or leaves at a Repair Head or Sender.

1012	Sender Rate Control.  Based on the congestion reports received, the Sender
1013	must change its sending rate.

1015	TCP Friendly Equation.  Given values for RTT, DataSize, and LossRate, this
1016	generates a target throughput rate according to a modified version of the
1017	complex TCP model given in [11].

1019	5.3.2 Congestion Control BB Parameters

1021	Initialization Parameters.  A set of different options, some of which can
1022	be permanent constants, but others are selected by either the Sender or a
1023	network manager.

1025	Lost Packet.   Every time a packet is detected as lost, the Senders must be
1026	notified of this.

1028	RTT Measurement.  Every time an RTT measurement is generated, either between
1029	Sender and Receiver(s), or between one level of the tree and another, the
1030	CC BB must be notified.

1032	Highest Allowed Sequence Number (HASN).  This is used to implement
1033	"receiver-driven" window control [13].  Each receiver can keep track of a
1034	congestion window and compute the HASN to be included in each ACK. A Repair
1035	Head aggregates the HASNs by computing the minimum value from all its
1036	children and forwards that as its own HASN up the tree.
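
The HASN aggregation at a Repair Head reduces to taking a minimum, as the
sketch below shows.  The function name and the choice to return None when
there are no children are assumptions made for this example.

```python
def aggregate_hasn(child_hasns):
    """Return the HASN a Repair Head forwards up the tree.

    child_hasns is an iterable of the HASN values most recently reported
    by each child.  The slowest child bounds the whole subtree, so the
    minimum is forwarded.  Returns None when no child has reported yet.
    """
    return min(child_hasns, default=None)


print(aggregate_hasn([120, 97, 143]))
```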

1038	5.4 Generic Router Assist BB

1040	The task of designing scaleable RM protocols can be made easier by the
1041	presence of some specific support in routers.  In some application-
1042	specific cases, the increased benefits afforded by the addition of special
1043	router support can justify the resulting additional complexity and expense.

1045	Functional components which can take advantage of router support include
1046	feedback aggregation/suppression (both for loss notification and congestion
1047	control) and constrained retransmission of repair packets.

1049	The process of designing and deploying these mechanisms inside routers can
1050	be much slower than the one required for end-host protocol mechanisms.
1051	Therefore, it would be highly advantageous to define these mechanisms in a
1052	generic way, so that multiple protocols can use them when they are
1053	available without necessarily depending on them.

1055	This component has two halves: a signaling protocol and the actual router
1056	algorithms.  The signaling protocol allows the transport protocol to
1057	request from the router the functions that it wishes to perform, and the
1058	router algorithms actually perform these functions.

1060	An important component of the signaling protocol is some level of
1061	commonality between the packet headers of multiple protocols, which allows
1062	the router to recognize and interpret the headers.  This is covered in the
1063	section on common packet headers, below.

1065	5.4.1 GRA BB Algorithms

1067	NACK Suppression.  NACKs are sent towards the parent Repair Head or Sender,
1068	with a Router Alert option on.  A GRA-enabled router detects these packets
1069	and suppresses redundant NACKs.  It then updates a soft state table so that
1070	it knows to retransmit the requested packet to the requesting children,
1071	using Dynamic Selective Retransmission. The NACK suppression algorithm
1072	needs to work with both ACKs and NACKs, in order for Dynamic Selective
1073	Retransmission to work with TRACK.  This means that GRA cannot suppress
1074	ACKs but must still use them to update its state for retransmissions.  It
1075	also means that GRA must work with ACK and NACK selective bitmaps, not just
1076	NACKs that request a single packet.

1078	Dynamic Selective Retransmission.  When a retransmission occurs, it is only
1079	forwarded to the interfaces of each router that have signaled through the
1080	use of NACKs that they need to see that packet.
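
The interaction between NACK Suppression and Dynamic Selective
Retransmission can be sketched with a small soft state table.  The
Python class below is a hypothetical illustration only: the class and
method names are invented for this example, the feedback is assumed to
have already been decoded from a selective bitmap into a list of
missing sequence numbers, and the timeout that makes the state "soft"
is omitted.

```python
from collections import defaultdict


class GraRouterState:
    """Per-router soft state for one or more sessions (sketch only)."""

    def __init__(self):
        # (session_id, seq) -> downstream interfaces that requested it
        self.pending = defaultdict(set)

    def on_feedback(self, session_id, iface, missing, is_ack=False):
        """Record requested packets; return True to forward upstream.

        ACKs are never suppressed, but still update the table.  A NACK
        is forwarded only if it requests something not already pending.
        """
        new_request = False
        for seq in missing:
            key = (session_id, seq)
            if key not in self.pending:
                new_request = True
            self.pending[key].add(iface)
        return is_ack or new_request

    def on_retransmission(self, session_id, seq):
        """Return the interfaces to forward this retransmission to."""
        return self.pending.pop((session_id, seq), set())
```

In this sketch, a second NACK for the same packet arriving on another
interface is suppressed, yet that interface is still recorded, so the
eventual retransmission is forwarded on both interfaces and on no
others.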

1082	Nearest Repair Head Hint.  The router is made aware of the nearest Repair
1083	Heads, and is able to tell a child which is the best candidate for it to
1084	use.  This must only be used as a hint to children.

1086	Fine Grained Loss Reports.  A major limitation of TFMCC is that it gets
1087	only 1-bit loss reports (i.e., a packet is lost, or it is not)
1088	from the routers.  An 8- or 16-bit report, piggybacked onto data packets,
1089	with the cumulative loss detected across all interfaces of GRA-enabled
1090	routers the data packet crossed, would allow TRACK to become much more
1091	responsive to changes in network conditions.  These reports can only be
1092	used as hints.

1094	Signaling Protocol.  The functions for GRA need to be requested by the
1095	protocol ahead of time, and then the run time packet headers need to be
1096	decipherable by the router.

1098	5.4.2 GRA BB Parameters

1100	GRA Enabled.  Is GRA enabled in any of the routers in the network?  Which
1101	functions does the deployed version of GRA support?

1103	Packet Format.  Which type of packet format is GRA to operate over?  It is
1104	likely that different protocol instantiations will require differences in
1105	the packet headers they send to the router.  This is tied to the common
1106	packet header BB, below.

1108	5.5 Automatic Tree Configuration BB

1110	TRACK takes advantage of hierarchical Repair Heads to greatly increase the
1111	theoretical scalability of the protocol.  These Repair Heads are used to
1112	form a tree with the source at the root, the Receivers at the leaves of the
1113	tree, and the Repair Heads in the middle.  The Repair Heads can either be
1114	dedicated server software for this task, or they may be application nodes
1115	that are performing dual duty.

1117	The effectiveness of these agents in assisting the delivery of data is
1118	highly dependent upon how well the logical tree they use to communicate
1119	matches the underlying routing topology.  The purpose of this building
1120	block is to construct and manage the logical tree connecting the agents.
1121	Ideally, this building block will perform these functions in a manner that
1122	adapts to changes in session membership, routing topology, and network
1123	availability.

1125	5.5.1  Auto Tree BB Algorithms

1127	These are discussed in section 3.3.  They are not yet mature enough to
1128	break down into component parts.

1130	5.5.2  Auto Tree BB Parameters

1132	These are discussed in section 3.3.  The algorithms are not yet mature
1133	enough to break down into the parameters needed.

1135	5.6  Security

1137	As specified in [12], the primary security requirement for a TRACK protocol
1138	is protection of the transport infrastructure.  This is accomplished
1139	through the use of lightweight group authentication of the control and,
1140	optionally, the data packets sent to the group.  These algorithms use IPsec
1141	and shared symmetric keys.  For TRACK, [12] recommends that there be one
1142	shared key for the Data Session and one for each Local Control Channel.
1143	These keys are distributed through a separate key manager component, which
1144	may be either centralized or distributed.  Each member of the group is
1145	responsible for contacting the key manager, establishing a pair-wise
1146	security association with the key manager, and obtaining the appropriate
1147	keys.  The TRACK protocol then provides options for piggy-backing key
1148	update messages on the Data Session and each Local Control Channel of the
1149	protocol.  These can either include a new shared group key (encrypted with
1150	the old group key) or a notification that the group key(s) are being
1151	changed and that the group members should contact the key manager to get
1152	the new key(s).  The former typically occurs on a periodic basis, while the
1153	latter may occur when a group member leaves.

1155	The exact algorithms for this BB are presently the subject of research
1156	within the IRTF Secure Multicast Group (SMuG).  Solutions for these
1157	requirements will be standardized within the IETF when ready.

1159	5.7  Common Headers BB

1161	As pointed out in the generic router support section, it is important to
1162	have some level of commonality across packet headers.  It may also be
1163	useful to have common data header formats for other reasons.  This building
1164	block consists of recommendations on fields in their packet headers that
1165	protocols should make common across themselves.  TRACK needs to implement
1166	these recommendations in the TRACK PI.

1168	5.7.1 Common Header BB Fields

1170	GRA Signaling.  The Retransmission, NACK and ACK packet headers need to
1171	provide a means for signaling their existence to GRA.  For NACK and ACK
1172	headers, the selective bitmap needs to be specified in a common way across
1173	all protocols so that the GRA component can interpret these fields and
1174	determine the sequence numbers of the packets that are being requested.
1175	For the Retransmission packets, the sequence number of the packet needs to
1176	be in a standard position so that GRA can interpret it.  For NACK, ACK,
1177	and Retransmission packets, the Session ID needs to be specified in a
1178	standard way across protocols.

1180	Data Packets.  The identification of data packets within a stream should be
1181	common across all protocols, both to aid in commonality of application
1182	semantics across protocols and to aid in GRA signaling. A Data Packet is
1183	identified by three fields: the Session ID, the Sequence Number, and the
1184	FEC Window Offset.  The Session ID may include the multicast address and/or
1185	a unique ID.  The sequence number starts at 1 and increments with each Data
1186	packet sent in the Session.  Sequence numbers are always sequentially
1187	generated, without gaps.  The FEC Window Offset specifies the offset of a
1188	Data packet into an FEC window.  For a window of W generated packets, a
1189	maximum window size of M, and a maximum parity size of P, the packets are
1190	numbered as follows.  The first W-1 packets are numbered as 0 through W-2,
1191	with W never to exceed M.  Packet W is always numbered as M-1, so that
1192	Receivers and Repair Heads can detect a partially filled window.  The P
1193	parity packets are numbered M through M+P-1.
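
The numbering rule above can be made concrete with a short Python
sketch.  The function name and return shape are chosen for this example
only; W, M, and P are as defined in the text.

```python
def fec_window_offsets(w, m, p):
    """Return (data_offsets, parity_offsets) for one FEC window.

    w -- number of data packets generated in this window (1 <= w <= m)
    m -- maximum window size
    p -- number of parity packets
    """
    if not 1 <= w <= m:
        raise ValueError("W must be between 1 and M")
    # The first W-1 data packets are numbered 0 through W-2; the last
    # data packet is always M-1, which marks the end of the window.
    data_offsets = list(range(w - 1)) + [m - 1]
    # Parity packets are numbered M through M+P-1.
    parity_offsets = list(range(m, m + p))
    return data_offsets, parity_offsets
```

With W=3, M=8, and P=2, the data packets carry offsets 0, 1, and 7 and
the parity packets carry offsets 8 and 9.  A Receiver that sees offset
7 directly after offset 1 can infer that the window was only partially
filled.  When W equals M, the data offsets run from 0 through M-1
without a gap.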

1195	IP and UDP.  It is easiest to implement protocols in the application space
1196	using UDP packets, but eventual kernel implementations will have TRACK
1197	implemented directly on top of IP.  Other protocols share this requirement,
1198	and the way that this transition is done should be specified across all
1199	protocols.

1201	6. Security Considerations

1203	7. References

1205	1) Bradner, S., "The Internet Standards Process -- Revision 3", BCP
1206	9, RFC 2026, October 1996.

1208	2) Whetten, B., et al. "Reliable Multicast Transport Building
1209	Blocks for One-to-Many Bulk-Data Transfer."  Internet Draft,
1210	draft-ietf-rmt-buildingblocks-02.txt, Work in Progress.

1212	3) Handley, M., et al.  "The Reliable Multicast Design Space for
1213	Bulk Data Transfer."  Internet Draft, draft-ietf-rmt-design-
1214	space-01.txt, Work in Progress.

1216	4) Bradner, S., "Key words for use in RFCs to Indicate Requirement
1217	Levels", BCP 14, RFC 2119, March 1997.

1219	5) Whetten, B., Taskale, G.  "Overview of the Reliable Multicast
1220	Transport Protocol II (RMTP-II)."  IEEE Networking, Special Issue
1221	on Multicast, February 2000.

1223	6) Nonnenmacher, J., Biersack, E.  "Reliable Multicast: Where
1224	to use Forward Error Correction", Proc. 5th. Workshop on
1225	Protocols for High Speed Networks, Sophia Antipolis, France, Oct.
1226	1996.

1228	7) Nonnenmacher, J., et al.  "Parity-Based Loss Recovery for
1229	Reliable Multicast Transmission", In Proc. of ACM SIGCOMM '97,
1230	Cannes, France, September 1997.

1232	8) Rizzo, L.  "Effective erasure codes for reliable computer
1233	communications protocols", DEIT Technical Report LR-970115.

1235	9) Nonnenmacher, J., Biersack, E. "Optimal Multicast Feedback",
1236	Proc. IEEE INFOCOM 1998, March 1998.

1238	10) Whetten, B., Conlan, J.  "A Rate Based Congestion Control Scheme
1239	for Reliable Multicast", GlobalCast Communications Technical
1240	White Paper, November 1998.  http://www.talarian.com/rmtp-ii

1242	11) Padhye, J., et al.  "Modeling TCP Throughput:  A Simple Model
1243	and its Empirical Validation".  University of Massachusetts
1244	Technical Report CMPSCI TR 98-008.

1246	12) Hardjorno, T., Whetten, B.  "Security Requirements for TRACK
1247	Protocols."  Work in Progress.

1249	13) Golestani, J., "Fundamental Observations on Multicast Congestion
1250	Control in the Internet", Bell Labs, Lucent Technology, paper
1251	presented at the July 1998 RMRG meeting.

1253	14) Kadansky, M., D. Chiu, J. Wesley, J. Provino, "Tree-based
1254	Reliable Multicast (TRAM)", draft-kadansky-tram-02.txt, Work in
1255	Progress.

1257	15) Whetten, B., M. Basavaiah, S. Paul, T. Montgomery, "RMTP-II
1258	Specification", draft-whetten-rmtp-ii-00.txt, April 8, 1998. Work
1259	in Progress.

1261	16) draft-ietf-rmt-bb-tree-config-00.txt

1263	8. Acknowledgments

1265	Special thanks go to the following individuals, who have
1266	contributed to the design and review of this document.

1268	Supratik Bhattacharyya, Sprint Labs

1270	Seok Koh, ETRI Korea
1271	Joseph Wesley, Sun Microsystems

1273	9. Authors' Addresses

1275	Brian Whetten
1276	Talarian Corporation
1277	333 Distel Circle
1278	Los Altos CA 94022
1279	whetten@talarian.com

1281	Dah Ming Chiu
1282	Sun Microsystems Laboratories
1283	1 Network Drive
1284	Burlington, MA 01803
1285	dahming.chiu@sun.com

1287	Sanjoy Paul
1288	Edgix Corporation
1289	130 W. 42nd Street, Suite 850
1290	New York, NY 10036
1291	sanjoy@edgix.com

1293	Miriam Kadansky
1294	Sun Microsystems Laboratories
1295	1 Network Drive
1296	Burlington, MA 01803
1297	miriam.kadansky@east.sun.com

1299	Gursel Taskale
1300	Talarian Corporation
1301	333 Distel Circle
1302	Los Altos CA 94022
1303	whetten@talarian.com

1305	Full Copyright Statement

1307	Copyright (C) The Internet Society, 2000.  All Rights Reserved. This
1308	document and translations of it may be copied and furnished to others, and
1309	derivative works that comment on or otherwise explain it or assist in its
1310	implementation may be prepared, copied, published and distributed, in whole
1311	or in part, without restriction of any kind, provided that the above
1312	copyright notice and this paragraph are included on all such copies and
1313	derivative works. However, this document itself may not be modified in any
1314	way, such as by removing the copyright notice or references to the Internet
1315	Society or other Internet organizations, except as needed for the purpose
1316	of developing Internet standards in which case the procedures for
1317	copyrights defined in the Internet Standards process must be followed, or
1318	as required to translate it into other languages.