IP Over NBMA Working Group                                   Eric Mannie
INTERNET-DRAFT                                            Marc De Preter
Expires 21st of April 1997                                     (ULB-STC)

                                                            October 1996

               Multicast Synchronization Protocol (MSP)

Status of this Memo

This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas, and
its working groups. Note that other groups may also distribute working
documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as ``work in progress.''

To learn the current status of any Internet-Draft, please check the
``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow
Directories on ds.internic.net (US East Coast), nic.nordu.net (Europe),
ftp.isi.edu (US West Coast), or munnari.oz.au (Pacific Rim).

Abstract

This document defines a Multicast Synchronization Protocol (MSP)
designed to avoid the traditional problems related to the use of
unicast (point-to-point) synchronization protocols, such as those
encountered with OSPF, P-NNI, SCSP and epidemic protocols. These
protocols imply the establishment and maintenance of a topology of
servers (tree, star, line, mesh, ...). It is not obvious to find either
the best topology for a given synchronization protocol or the best
algorithm to create this topology.
An attempt to study the influence of the spatial distribution
(topology) has been made, for instance, for the Epidemic algorithms of
Xerox PARC, and showed interesting results. Moreover, traditional
synchronization protocols notably imply a convergence time and a
traffic load which are proportional to the size of the topology. We
believe that reducing the topology to the set of members of a single
multicast group can reduce both the convergence time and the traffic.
Note that, in that case, no configuration algorithm is required.

1. Introduction

MSP allows the synchronization of a replicated database between servers
which are all members of the same server group (SG). It takes advantage
of the multicast capabilities provided by a growing number of
underlying layers. It is suitable in environments supporting multicast
(group) addresses (such as Ethernet and SMDS) or point-to-multipoint
connections (such as ATM). In this context, all servers can directly
communicate with all other servers in the same group, using the
underlying multicast capability. No particular topology (except the
underlying multicast topology itself) and no configuration algorithm
(such as the Spanning Tree) are required. No problem due to critical
links or topology partitions occurs. The protocol is very robust, as an
update generated by a server is directly received by all other servers
in the same group. Finally, MSP is a generic protocol, defined
independently of the particular database or cache to synchronize.

2. Overview

MSP is a generic protocol defined independently of the particular
database to synchronize. MSP pdus may either be self-transported or be
carried as fields of other protocols. For instance, MSP can be
supported on top of an IP multicast service, an Ethernet network or an
ATM service, or it can be a part of Classical IP, MARS or NHRP. It can
even be used to synchronize different databases in parallel, in the
same pdus.

Each server is the owner of a part of the database (e.g. the bindings
registered by its local clients) and has a unique server ID. It
maintains a copy of the complete database (replicated database); each
entry is either locally owned or learned from another server. An entry
which belongs to a particular server is tagged with a timestamp (event
time) and the server ID of its owner. This timestamp identifies an
update of the entry and is set each time the entry is updated. The
timestamp is unique in the context of the owner and is the
concatenation of a standard time and a sequence number.

All servers are directly connected by a multicast group or a mesh of
point-to-multipoint connections. Each entry update generated by a
server is sent with its timestamp and the server ID. The update is
directly received by all other servers, and each local database is
updated accordingly. Updates are packed with their timestamps in pdus,
which are logically grouped into transactions. Transactions speed up
the detection of the loss of a part of the pdus.

Pdus are not positively acknowledged (ACK) by receiving servers. If a
server detects some missing pdus, it sends a NACK to the multicast
group for the corresponding missing updates, identified by their
timestamps. In order to avoid an implosion of concurrent NACKs and to
reduce the total number of transmitted NACKs, a technique similar to
IGMP is used. A server waits for a delay randomly chosen between zero
and D milliseconds before sending a NACK. If, during this period,
another NACK is seen for the same timestamps, the NACK generation is
cancelled and the server waits for the retransmitted updates. This
scheme is based on the fact that, if a pdu is lost in the context of a
multicast group, more than one server has probably missed it. In
addition, it should be noted that such pdus are essentially small and
that the expected error rate of the underlying layer should be very low
(e.g. ATM). This suppression scheme is sketched below.
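As a non-normative illustration, the following C fragment sketches the
NACK suppression scheme; all names, types and the value of D are
assumptions of the sketch, not part of the protocol:

   #include <stdlib.h>

   #define D_MS 500                      /* assumed upper bound D (ms) */

   struct timestamp { unsigned long time, seq; };

   static int ts_leq(struct timestamp a, struct timestamp b)
   {
       return a.time < b.time || (a.time == b.time && a.seq <= b.seq);
   }

   struct nack_state {
       int    pending;                   /* a NACK is scheduled        */
       long   fire_at;                   /* absolute time to send it   */
       struct timestamp from, to;        /* missing interval           */
   };

   /* A gap was detected: wait a random delay in [0, D] milliseconds
    * before sending the NACK. */
   static void schedule_nack(struct nack_state *s, long now,
                             struct timestamp from, struct timestamp to)
   {
       s->pending = 1;
       s->fire_at = now + rand() % (D_MS + 1);
       s->from = from;
       s->to   = to;
   }

   /* Another server NACKed an interval covering ours first: cancel
    * our own NACK and wait for the retransmitted updates instead. */
   static void foreign_nack_seen(struct nack_state *s,
                                 struct timestamp from,
                                 struct timestamp to)
   {
       if (s->pending && ts_leq(from, s->from) && ts_leq(s->to, to))
           s->pending = 0;
   }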
Small HELLO pdus are generated periodically, at each hello interval,
and include the server ID and the last attributed timestamp of the
sender. HELLO pdus are needed to detect the loss of a set of complete
transactions.

When a server or a connection (re)starts, either the whole database has
been lost or only the most recent updates have not been received. The
corresponding server sends a SYNC pdu to the multicast group, either
indicating that the whole database has to be retransmitted or listing
each server ID with its last received timestamp. SYNCs are not sent
directly; the same scheme as the one used for NACKs is applied again,
which allows a server to take advantage of a resynchronization
requested by another server. Receiving servers send transactions for
the missing information, and may use the same scheme as the one used
with NACKs and SYNCs. Each server only sends its own updates, and only
if their timestamps are greater than the requested one. If some
timestamps are no longer available, the corresponding information has
been replaced by a more recent update, and only this last update has to
be retransmitted by its owner. Only more recent updates are
retransmitted. Selective SYNCs are resent if a part of the requested
updates has not been received; obsolete timestamps are advertised.

From the configuration point of view, only a single multicast address
or a list of server addresses has to be configured (e.g. obtained
through a "LECS"-like configuration server [LANE]). No particular
topology has to be built, no configuration algorithm is needed, and no
topology has to be rebuilt when a server fails.

Finally, if we only consider a topology of servers connected by
point-to-point connections, MSP acts like a traditional synchronization
protocol, as explained in the Unicast MSP chapter.

   S1   S2   S3               S1     S2
    |    |    |                 \    /\
    |    |    |                  \  /  \
   +-----------------+            S3    S4
   |                 |           /
   | Multicast Group |---S4     /
   |                 |        S5-----S6
   +-----------------+       /
    |    |    |             /
    |    |    |           S7
   S5   S6   S7

        Multicast topology vs. traditional topology

3. Server Group

MSP allows the synchronization of a replicated database between servers
which are all members of the same server group (SG). It ensures that,
within a short duration, all servers in the SG will have exactly the
same copy of the database. The scope of a server group could be
restricted to servers which are all connected to the same LIS
[Classical], LAG [NHRP], cluster [MARS], ELAN [LANE] or IASG [MPOA].

Each server group (SG) is identified by a server group ID (SGID). This
allows multiple server groups to be supported at the same time in the
same domain. This SGID only needs to be unique in this domain and
could, for instance, consist of the multicast address used to identify
the servers.

Moreover, each server in a SG is uniquely identified by a server ID
(SID). This ID could be the internetwork layer address of the server
itself, e.g. its IP address. Each pdu transmitted by a server is tagged
with its server group ID (SGID) and server ID (SID). Using a complete
internetwork layer address as SID makes it possible to quickly identify
a server and facilitates management. Both the SGID and the SID are
represented on 32 bits.
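The pdu encoding itself is left for further study (see Appendix 1). As
a purely illustrative, non-normative sketch, the SGID/SID tagging could
look as follows in C; only the 32-bit field widths are taken from the
text above:

   #include <stdint.h>

   struct msp_tag {
       uint32_t sgid;        /* server group ID, unique per domain */
       uint32_t sid;         /* server ID, e.g. the IP address     */
   };

   /* A receiver silently drops pdus of other server groups. */
   static int pdu_is_for_us(const struct msp_tag *t, uint32_t my_sgid)
   {
       return t->sgid == my_sgid;
   }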
4. Underlying Layer

MSP is designed to take advantage of the multicast capabilities
provided by the underlying layer. This version mainly focuses on the
use of unidirectional point-to-multipoint connections and of a full
multicast service such as the one supported by an Ethernet network. MSP
may be supported over the IP layer itself or over any NBMA network such
as an SMDS or ATM (UNI 3.0/3.1 or later) network.

It should be noted that, if no multicast capability at all is supported
by the underlying layer, MSP mainly acts like a traditional
point-to-point synchronization protocol (see the Unicast MSP chapter).

5. Topology

The topology is reduced to the scope of a single multicast group. No
particular algorithm or protocol is required to establish and maintain
this topology. The address of the multicast group or the list of all
members may be directly obtained through a configuration server such as
the LECS [LANE].

All servers are directly connected by a multicast group or a mesh of
point-to-multipoint connections. In the worst case, n
point-to-multipoint connections are needed when n servers have to be
synchronized. In the near future, when multipoint-to-multipoint
connections become available, the need to support n connections will
disappear.

It is important to note that these point-to-multipoint connections are
only supported by servers (never by clients) and that these servers
could be efficiently implemented over internetworking units, such as
ATM switches. In addition, the behaviour of these servers is very
static, as they do not appear and disappear continuously in a server
group. This considerably simplifies the management of the connections.

A multicast topology is very robust, as every message is directly
received by all servers in the SG and as there is no single point of
failure or forwarding point at the server level. The complexity of
establishing the topology and of dealing with dynamic topology
partitioning is left to the underlying layer, at the level of routers
or switches.

The resources required by servers for message generation are reduced,
since a message is sent once and is never repeated from server to
server. The forwarding of messages is better achieved by transport
protocols than by synchronization protocols. As a result, the time
needed by an update to reach all servers is very short and not directly
proportional to the number of servers.

More resources are required to receive messages, but this process may
be optimized by applying appropriate filtering to incoming messages and
database lookups, e.g. using dedicated hardware such as a CAM (content
addressable memory). The same kind of problem is solved in Ethernet
networks. Filtering an incoming UPDATE pdu received from a given server
is easily done by comparing the recorded last received timestamp from
that server with the smallest (first) or largest (last) timestamp of
that pdu. If the first timestamp in the pdu is greater than the
recorded timestamp, the complete pdu updates the local database. If the
last timestamp in the pdu is smaller than the recorded timestamp, the
complete pdu may be dropped (except if it is in response to a NACK).
Otherwise, all entries between the recorded timestamp and the end of
the UPDATE pdu update the local database. A sketch of this filtering
rule follows.
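A minimal, non-normative C sketch of the filtering rule (the entry
layout and the 'apply' callback are assumptions of the sketch):

   #include <stddef.h>

   struct timestamp { unsigned long time, seq; };

   static int ts_cmp(struct timestamp a, struct timestamp b)
   {
       if (a.time != b.time) return a.time < b.time ? -1 : 1;
       if (a.seq  != b.seq)  return a.seq  < b.seq  ? -1 : 1;
       return 0;
   }

   struct entry { struct timestamp ts; /* opaque payload not shown */ };

   /* Filter an incoming UPDATE pdu of n entries, sorted by timestamp,
    * against the recorded last received timestamp from its sender.
    * 'apply' installs one entry in the local database.  Pdus answering
    * a NACK are assumed to be handled before this test. */
   static void filter_update(const struct entry *e, size_t n,
                             struct timestamp recorded,
                             void (*apply)(const struct entry *))
   {
       size_t i;

       if (n == 0)
           return;
       if (ts_cmp(e[0].ts, recorded) > 0) {
           for (i = 0; i < n; i++)       /* first > recorded:         */
               apply(&e[i]);             /* the whole pdu is new      */
       } else if (ts_cmp(e[n - 1].ts, recorded) <= 0) {
           ;                             /* nothing newer: drop pdu   */
       } else {
           for (i = 0; i < n; i++)       /* apply only the entries    */
               if (ts_cmp(e[i].ts, recorded) > 0)  /* after recorded  */
                   apply(&e[i]);
       }
   }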
6. Database and Timestamp

MSP is defined independently of the particular database (cache) to
synchronize. Database entries are transparently transported; they are
defined by the particular protocols whose databases are synchronized.
The encoding of these entries will be defined in specific appendixes of
this document or in companion documents. The database entries describe,
for instance, bindings between IP addresses and ATM addresses. Unlike
other synchronization protocols, MSP does not use summaries of database
entries. MSP identifies events and ensures that the most recent event
for each entry has been received by all servers.

Each server is the owner of the part of the database which is generated
by its own clients. It maintains a copy of the complete database but is
only responsible for the entries it owns and can only transmit these
entries. For instance, each entry in a database may be tagged as either
locally owned or learned from another server.

Each event (client action) in the database is identified by a timestamp
which is unique in the context of the server that generates the event.
This timestamp is the concatenation of a standard time (e.g. GMT) and a
contiguous sequence number. This time may be local to each server; a
global time is not needed by this protocol, i.e. clocks do not need to
be synchronized. The sequence number is incremented by one at each new
event. It also allows more than one event per tick of the standard
time.

Each entry in the database is associated with a timestamp set by the
owner server at the moment when the client generates or updates the
entry. A server cannot change the timestamps of entries it does not
own. Timestamps are used by servers to identify missing information in
their database.

If a server receives an entry which is already included in its
database, it must compare the two timestamps and keep the entry with
the most recent (greater) timestamp.

A timestamp X is greater (more recent) than a timestamp Y if the
standard time of X is greater than the standard time of Y or, if they
are equal, if the sequence number of X is greater than the sequence
number of Y.

When a server starts for the first time or loses the knowledge of its
current timestamp (after a crash, for instance), it only has to
generate a new value for the standard time and to restart the sequence
numbering at zero. The same scheme is applied when the sequence number
wraps around. The standard time is never modified when a timestamp
sequence number is incremented; it is only used to ensure a unique
value during the lifetime of a server.
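These timestamp rules can be summarized by the following non-normative
C sketch (the types and the use of ULONG_MAX as wrap limit are
assumptions of the sketch):

   #include <limits.h>

   struct timestamp { unsigned long time, seq; };

   /* X is more recent than Y if its standard time is greater, or if
    * the times are equal and its sequence number is greater. */
   static int ts_more_recent(struct timestamp x, struct timestamp y)
   {
       return x.time > y.time || (x.time == y.time && x.seq > y.seq);
   }

   /* Timestamp for the next locally generated event.  'now' is any
    * local clock; clocks need not be synchronized between servers.
    * The standard time is never touched by a simple increment; a
    * fresh one is taken only when the sequence number wraps. */
   static struct timestamp ts_next(struct timestamp cur,
                                   unsigned long now)
   {
       if (cur.seq == ULONG_MAX) {      /* wrap: restart at (now, 0) */
           cur.time = now;
           cur.seq  = 0;
       } else {
           cur.seq++;
       }
       return cur;
   }

   /* After a first start or a crash, simply take (now, 0). */
   static struct timestamp ts_restart(unsigned long now)
   {
       struct timestamp t;
       t.time = now;
       t.seq  = 0;
       return t;
   }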
7. Transactions

Database entries are transmitted in variable-length UPDATE pdus. The
size of an UPDATE pdu may be limited in order to fit in the particular
MTU supported by the underlying layer. UPDATE pdus related to events
close in time are logically grouped into transactions. A transaction is
delimited by a start flag in its first UPDATE pdu and by a stop flag in
its last UPDATE pdu. The shortest transaction contains a single UPDATE
pdu with both the start and stop flags set.

   +------------+-------------------+
   | Pdu header | Entry 1...        |
   | (#seq,...) |        ...Entry n |
   +------------+-------------------+

              UPDATE pdu

Transactions are generated in response to a synchronization request
(SYNC received), in response to a NACK, or when a set of entries has to
be flooded. An empty transaction is sent to indicate that the requested
timestamps are not available any more (obsolete), following a
synchronization request or a NACK. Such a transaction is made of one
empty UPDATE pdu identifying the requested timestamps.

Transactions are not numbered, but all UPDATE pdus are numbered
sequentially across all transactions generated by the same server. All
entries in a transaction and in its subsequent UPDATE pdus are sorted
in timestamp order. No sort algorithm is needed at all; only a sorted
list per server has to be maintained (trivial, since timestamps are
generated in order). The numbering of UPDATE pdus is contiguous, so
that a server can immediately detect missing pdus. UPDATE pdu sequence
numbers should not be confused with timestamp sequence numbers. Pdu
sequence numbers are needed because timestamp sequence numbers are not
always contiguous (if a server [re-]starts or when entries are
obsoleted).

   +-----
   |
   |  +------------+-------------------+
   |  | Pdu header | Entry 1...        |  UPDATE
   |  | (#seq,...) |        ...Entry n |  pdu
   |  +------------+-------------------+
   |                .
   |                .
   |                .
   |  +------------+-------------------+
   |  | Pdu header | Entry 1...        |  UPDATE
   |  | (#seq,...) |        ...Entry n |  pdu
   |  +------------+-------------------+
   +-----

              A transaction

Transactions are useful to detect faster the loss of the last UPDATE
pdus sent by a server. This loss may be detected by a timer waiting for
the end of the current transaction, when the first UPDATE pdu of the
next transaction is received, or when the next HELLO pdu is received
from the server generating that transaction. These HELLO pdus are
generated regularly by each server to indicate its last timestamp
value.

A transaction starts either immediately, when its first UPDATE pdu is
built, or after a small random delay in order to avoid multicast
storms. During this delay, new updates may be generated by clients and
added to the transaction.

If a gap in the numbering of UPDATE pdus is detected, either a part of
a transaction has been lost or a complete transaction or set of
transactions has been lost. A NACK pdu is sent, indicating the last
received timestamp before the gap and the first received timestamp just
after the gap. The last received timestamp before the gap was received
in an UPDATE pdu of the same transaction or of a previous transaction.

If, after a timeout (NACK interval), the requested entries have not
been retransmitted (in a new transaction), a NACK pdu is retransmitted
for the same timestamps. After a maximum number of NACK
retransmissions, the corresponding server is considered as no longer
available. All its updates are physically kept in the database, but the
server memorizes the fact that the last valid received timestamp was
the last one received before the gap. When the previously unreachable
server becomes available again, it sends a HELLO, a SYNC or a
transaction, and the current server discovers that a set of timestamps
is missing. A NACK pdu consists of a list of server IDs with a list of
timestamp intervals for each of these IDs. In the simplest case, a NACK
pdu is made of a single ID with a single interval. A specific timestamp
value is used to indicate an open interval such as [x, [ (i.e. from
value x to infinity). The gap detection is sketched below.
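As a non-normative illustration, the following C fragment sketches the
gap detection on incoming UPDATE pdus; the 'after_sync' flag
anticipates the rule, given below, that the first pdu received after a
SYNC is accepted without checking for a gap. All names are assumptions
of the sketch:

   struct timestamp { unsigned long time, seq; };

   /* Per-peer receive state. */
   struct peer {
       unsigned long    last_pdu_seq;  /* last UPDATE pdu number seen */
       struct timestamp last_ts;       /* last timestamp received     */
       int              after_sync;    /* next number accepted as-is  */
   };

   /* Check the sequence number of an incoming UPDATE pdu.  Returns
    * nonzero when a gap is revealed, filling the interval to NACK:
    * from the last timestamp received before the gap to the first
    * timestamp carried just after the gap.  The caller updates
    * 'last_ts' as entries are accepted. */
   static int check_pdu_seq(struct peer *p, unsigned long pdu_seq,
                            struct timestamp first_ts,
                            struct timestamp *nack_from,
                            struct timestamp *nack_to)
   {
       int gap = !p->after_sync && pdu_seq != p->last_pdu_seq + 1;

       if (gap) {
           *nack_from = p->last_ts;    /* last ts before the gap      */
           *nack_to   = first_ts;      /* first ts just after the gap */
       }
       p->after_sync   = 0;
       p->last_pdu_seq = pdu_seq;
       return gap;
   }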
If a server crashes and loses its current UPDATE pdu sequence number,
it restarts the numbering at zero. In that case no problem occurs, as
it resynchronizes its database as described in the Synchronization
chapter. In order to be able to detect a gap in UPDATE streams, a
server keeps the last pdu sequence number received from each server.
After having received a SYNC pdu, a server must transparently set that
number to the value contained in the next UPDATE pdu received from the
corresponding server, without checking for a gap.

MSP may be implemented over a service which does not prevent pdu
misordering. In that case, a server should wait for a small timer
before deciding that a pdu is lost, in order to have a chance to
re-order the pdus. When that timer expires, a gap is detected and a
NACK is sent.

8. Synchronization

The synchronization of information is required in two cases: when an
update has to be flooded and when a server has to build or rebuild its
database. The first case is detailed in the Update Flooding chapter.
The second case is the synchronization process itself. It occurs, for
instance, when a server joins the SG for the first time without any
knowledge of the existing bindings, or when a server joins the SG while
already having a part of the bindings (e.g. when a broken underlying
connection is rebuilt).

A server having to synchronize with the rest of the group first builds
a list of all server IDs included in its database (possibly empty). For
each entry in the list, it adds the corresponding highest known
timestamp. This list is inserted in a SYNC pdu, together with the total
number of elements, the synchronizing server ID and its own highest
timestamp. This pdu is multicast to the group.

Each server receiving a SYNC pdu scans the list for its own ID. If it
finds its ID, it builds a transaction containing all entries which are
locally owned and whose timestamps are greater than the required one.
If the required timestamp is greater than or equal to its own highest
timestamp, no entries have to be sent and an empty transaction is
built, signalling that the synchronizing server is up to date. Finally,
the transaction is multicast to the group.

If a server does not find its ID in the list included in a SYNC pdu,
either it has joined the group while the synchronizing server was
unreachable, or it does not own any entry in the global database. In
the first case, it has to build and send a transaction for all entries
it owns. In the second case, it does not have to send any transaction.

Finally, each server receiving a SYNC pdu checks the synchronizing
server timestamp. If the local timestamp associated with the
synchronizing server is lower than the received one, the server has
missed a part of the synchronizing server's own database and sends a
NACK. The decision process of a server receiving a SYNC pdu is sketched
below.
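A minimal, non-normative C sketch of that decision process (all names
are assumptions of the sketch):

   #include <stddef.h>

   struct timestamp { unsigned long time, seq; };
   struct sync_elem { unsigned long sid; struct timestamp highest; };

   static int ts_more_recent(struct timestamp x, struct timestamp y)
   {
       return x.time > y.time || (x.time == y.time && x.seq > y.seq);
   }

   enum sync_answer { SEND_NOTHING, SEND_EMPTY, SEND_NEWER, SEND_ALL };

   /* Decide how to answer a received SYNC pdu.  'list'/'n' is the
    * (SID, highest timestamp) list carried by the SYNC, 'my_sid' our
    * own ID, 'my_highest' our own highest timestamp, 'own_entries'
    * tells whether we own any entry at all.  On SEND_NEWER, '*from'
    * is the timestamp above which our own entries must be sent. */
   static enum sync_answer answer_sync(const struct sync_elem *list,
                                       size_t n, unsigned long my_sid,
                                       struct timestamp my_highest,
                                       int own_entries,
                                       struct timestamp *from)
   {
       size_t i;

       for (i = 0; i < n; i++) {
           if (list[i].sid != my_sid)
               continue;
           if (!ts_more_recent(my_highest, list[i].highest))
               return SEND_EMPTY;       /* requester is up to date   */
           *from = list[i].highest;
           return SEND_NEWER;           /* send own entries > *from  */
       }
       /* Our ID is unknown to the requester: send everything we own,
        * or nothing if we own no entry at all. */
       return own_entries ? SEND_ALL : SEND_NOTHING;
   }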
The SYNC pdu is a kind of summary of the database known by the
synchronizing server. Its length is bounded by the total number of
servers in the SG. If a SYNC pdu does not reach a subset of the
servers, the synchronizing server will not receive any transaction in
response from these servers and will retransmit, after a timeout, a
SYNC pdu for these servers only.

The multicasting of each transaction improves the robustness of the
protocol and also allows other servers to learn entries before starting
their own synchronization (if still needed after this silent
listening). Transactions in response to a SYNC may be multicast on a
separate multicast address on which only the servers which have to
synchronize are listening. This last solution reduces the traffic which
is globally multicast and also allows each server to decide
independently whether or not it wants to receive the synchronization
traffic.

If all servers or a large number of servers have lost their
connectivity at the same time, the multicast scheme is very efficient.
If real multicasting is not supported and if the transmission on
point-to-multipoint connections is not desired, it is possible to use
on-demand point-to-point connections.

9. Update Flooding

Each time a client modifies its entry in a server, a new update is
generated and has to be flooded to all servers in the SG. The owner
server associates a timestamp with the update and builds a transaction
to flood that update. This update is included in an UPDATE pdu. The
server may send a transaction for that single update, or it may group a
number of updates together in one or more UPDATE pdus in the same
transaction. Transactions may be sent immediately or after a small
random delay (see the Multicast Storm chapter).

10. Hello Pdu

Small HELLO pdus are sent periodically, at each hello interval. Each
pdu includes the server ID and the last used timestamp of the sender.
This timestamp allows the detection of the loss of a part of the
updates sent by the server which has generated the HELLO pdu. In
particular, it allows the detection of the loss of a complete
transaction or of a set of complete transactions.

When a server receives a given number of HELLO pdus indicating that it
has missed a few updates (its timestamps for the sending servers are
out of date), it may decide to resynchronize its database and generate
a SYNC. Otherwise, it may send a NACK pdu for each of these servers.
This check is sketched below.
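As a non-normative illustration (the threshold of three HELLOs is an
assumption of the sketch; the protocol only says "a given number"):

   struct timestamp { unsigned long time, seq; };

   static int ts_more_recent(struct timestamp x, struct timestamp y)
   {
       return x.time > y.time || (x.time == y.time && x.seq > y.seq);
   }

   #define STALE_HELLO_LIMIT 3     /* assumed threshold */

   struct peer { struct timestamp last_ts; int stale_hellos; };

   /* Process the last used timestamp carried by a HELLO pdu: if it is
    * more recent than what was recorded, some updates from that
    * server were missed.  After a few such HELLOs, resynchronize (a
    * NACK per out-of-date server would be the alternative). */
   static int hello_requires_resync(struct peer *p,
                                    struct timestamp hello_ts)
   {
       if (ts_more_recent(hello_ts, p->last_ts))
           return ++p->stale_hellos >= STALE_HELLO_LIMIT;
       p->stale_hellos = 0;        /* up to date with that server */
       return 0;
   }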
11. Multicast Storm

In order to avoid a multicast storm of NACKs when some UPDATE pdus are
lost, a storm of SYNCs when many servers have to synchronize at the
same time, or a storm of transactions, a technique similar to IGMP may
be used. Before sending a NACK, a SYNC or the beginning of a
transaction, a server may wait for a small random delay between 0 and D
milliseconds. During this delay, the server listens to MSP pdus,
receives all transactions and updates its database. This silent
listening may result in decreased traffic and cancel some local
operations, as explained hereafter.

If a server wants to send a NACK and, during the random delay, a NACK
is seen for the same set of timestamps or a subset of them, the server
waits for the responding transactions. If, after that delay, all of its
requested timestamps have been received, the generation of the NACK is
cancelled. The previous explanations on NACK retransmission are also
applicable here.

If a server wants to send a SYNC and, during the random delay, other
compatible SYNCs have been seen, it waits for the corresponding
transactions and then decides whether its SYNC is still needed.

If a server wants to send a transaction in response to a SYNC, it may
also wait for a random delay in order to limit the number of
simultaneous transactions transmitted and/or received, and thereby
decrease the amount of resources needed.

12. Unicast MSP

MSP is a multicast or point-to-multipoint protocol but may also be used
in a unicast or point-to-point environment. In that case it acts like a
traditional synchronization protocol, except mainly that UPDATE pdus do
not need to be acknowledged one by one and that, after a failure, no
complete database summary has to be exchanged both ways each time. In
this last case, only two small SYNC pdus are exchanged, and each server
acts as a proxy for the information owned by the other servers behind
it. Random delays are no longer needed, since there is only one sender
per direction.

Of course, as we are back in the point-to-point case, an algorithm is
again needed to establish and maintain the topology. Each server
automatically knows the servers for which it must act as a proxy by
listening to the HELLO pdus and learning the position of each server. A
proxy server generates or forwards HELLO pdus for the servers it
represents.

A server wishing to synchronize sends a SYNC pdu to each server it is
connected to. These servers respond with their own updates and those of
the servers they represent. The SYNC scheme is the same as the one used
in the multicast case, except for the random delay technique.

A server knows which servers it represents by keeping track of the
connections on which HELLO pdus are received. It represents all the
servers that send HELLO pdus on all its connections other than the one
on which it has received the SYNC pdu.

When a transaction is received from a neighbour server, the receiving
server must directly repeat this update on all its connections, except
the one on which it was received (see the sketch at the end of this
chapter).

A server must also respond to each received NACK destined to itself or
to one of the servers for which it acts as a proxy. In addition, it
does not need to wait for a random delay when it generates a NACK.
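A minimal, non-normative C sketch of the unicast forwarding rule
(connection handles and the 'send' callback are assumptions of the
sketch):

   #include <stddef.h>

   struct conn;                    /* opaque connection handle */

   /* Repeat a transaction pdu received on connection 'received_on'
    * over every other connection. */
   static void forward_transaction(struct conn **conns, size_t n,
                                   size_t received_on,
                                   const void *pdu, size_t len,
                                   void (*send)(struct conn *,
                                                const void *, size_t))
   {
       size_t i;

       for (i = 0; i < n; i++)
           if (i != received_on)
               send(conns[i], pdu, len);
   }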
13. Further Study (not in this version)

In order to reduce the number of pdu formats, a SYNC pdu could be
implemented as a NACK pdu. In that case, a flag is used to indicate
that the NACK is a synchronization. Only the last received timestamp
(open interval) is given for each known server. A server which does not
see its ID in that list must retransmit all its own entries.

Transactions could be suppressed if the HELLO pdu rate is high enough
to quickly detect the loss of the last pdu transmitted by a given
server. In the current version, the reception of the last pdu of a
transaction indicates that a bundle of pdus has been transmitted and
that no further pdu must be waited for before the beginning of the next
transaction.

14. Security

When MSP is embedded in another protocol, security considerations are
mainly covered by that specific protocol. A detailed security analysis
of this protocol is for further study.

Conclusion

MSP is a generic multicast synchronization protocol which may also act
as a traditional unicast protocol. It reduces the traffic by
identifying events in the database instead of using database summaries,
and by supporting negative acknowledgments (NACKs) instead of
systematic ACKs. It is particularly suitable in environments with a low
error rate, such as ATM. It reduces the convergence time and improves
the robustness by using a multicast topology where updates are directly
received by all servers. No single point of failure exists. No
configuration algorithm or protocol is needed, and no specific problem
occurs if the topology partitions. It takes advantage of the fact that
the forwarding of information and the dynamic routing are better
achieved by well-known dedicated protocols, such as internetworking and
routing protocols, which are implemented anyway to support the normal
data transfer service.

References

[MARS]      "Support for Multicast over UNI 3.0/3.1 based ATM
            Networks", Armitage, draft-ietf-ipatm-ipmc-12.txt.

[NHRP]      "NBMA Next Hop Resolution Protocol (NHRP)", Luciani, Katz,
            Piscitello, Cole, draft-ietf-rolc-nhrp-09.txt.

[MPOA]      "Baseline Text for MPOA", draft, C. Brown, ATM Forum
            95-0824R6, February 1996.

[Classical] "Classical IP and ARP over ATM", Laubach, RFC 1577.

[SCSP]      "Server Cache Synchronization Protocol (SCSP) - NBMA",
            J. Luciani et al., draft-luciani-rolc-scsp-03.txt.

[Epidemic]  "Epidemic Algorithms for Replicated Database Maintenance",
            Demers et al., Xerox PARC.

[LANE]      "LAN Emulation over ATM Version 1.0", ATM Forum
            af-lane-0021.000, January 1995.

[LNNI]      "LAN Emulation over ATM Version 2 - LNNI Specification",
            Draft 3, ATM Forum 95-1082R3, April 1996.

[IGMP]      "Host Extensions for IP Multicasting", S. Deering, STD 5,
            RFC 1112, Stanford University, February 1989.

[OSPF]      "OSPF Version 2", Moy, RFC 1583.

[PNNI]      "PNNI Specification Version 1", Dykeman, Goguen, ATM Forum
            af-pnni-055.000, March 1996.

Acknowledgments

Thanks to all who have contributed, with particular thanks to Andy
Malis from Nexen and Ramin Najmabadi Kia from ULB.

Authors' Addresses

Eric Mannie
Brussels University (ULB)
Service Telematique et Communication
CP 230, bld du Triomphe
1050 Brussels, Belgium
phone: +32-2-650.57.17
fax:   +32-2-629.38.16
email: mannie@helios.iihe.ac.be

Marc De Preter
Brussels University (ULB)
Service Telematique et Communication
CP 230, bld du Triomphe
1050 Brussels, Belgium
phone: +32-2-650.57.17
fax:   +32-2-629.38.16
email: depreter@helios.iihe.ac.be

Appendix 1 - PDU Format

For further study.