idnits 2.17.1

draft-ietf-mmusic-confarch-03.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate.  This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** The document seems to lack a 1id_guidelines paragraph about 6 months
     document validity -- however, there's a paragraph with a matching
     beginning. Boilerplate error?

  ** The document is more than 15 pages and seems to lack a Table of Contents.

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.

  ** There are 53 instances of too long lines in the document, the longest one
     being 4 characters in excess of 72.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == Line 291 has weird spacing: '...rs need hold ...'

  == Line 293 has weird spacing: '...unicast of m...'

  == Line 294 has weird spacing: '...tween on-tree...'

  == Line 347 has weird spacing: '...or slow links...'

  == Line 595 has weird spacing: '... * With reser...'

  == (4 more instances...)

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer.
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (July 2000) is 8686 days in the past.  Is this
     intentional?

  -- Found something which looks like a code comment -- if you have code
     sections in the document, please surround them with '<CODE BEGINS>' and
     '<CODE ENDS>' lines.


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Unused Reference: '1' is defined on line 1250, but no explicit reference
     was found in the text

  == Unused Reference: '6' is defined on line 1268, but no explicit reference
     was found in the text

  == Unused Reference: '7' is defined on line 1272, but no explicit reference
     was found in the text

  == Unused Reference: '15' is defined on line 1299, but no explicit
     reference was found in the text

  == Unused Reference: '19' is defined on line 1312, but no explicit
     reference was found in the text

  == Unused Reference: '20' is defined on line 1315, but no explicit
     reference was found in the text

  == Unused Reference: '23' is defined on line 1326, but no explicit
     reference was found in the text

  == Unused Reference: '26' is defined on line 1335, but no explicit
     reference was found in the text

  == Unused Reference: '27' is defined on line 1339, but no explicit
     reference was found in the text

  -- Possible downref: Non-RFC (?) normative reference: ref. '1'

  ** Downref: Normative reference to an Informational RFC: RFC 2689 (ref. '2')

  -- Possible downref: Non-RFC (?) normative reference: ref. '3'

  ** Downref: Normative reference to an Experimental RFC: RFC 1075 (ref. '4')

  -- Possible downref: Non-RFC (?) normative reference: ref. '6'

  ** Obsolete normative reference: RFC 2362 (ref. '7') (Obsoleted by RFC
     4601, RFC 5059)

  -- Possible downref: Non-RFC (?) normative reference: ref. '8'

  -- Possible downref: Non-RFC (?) normative reference: ref. '9'

  -- Possible downref: Non-RFC (?) normative reference: ref. '10'

  -- Possible downref: Non-RFC (?) normative reference: ref. '11'

  -- Possible downref: Non-RFC (?) normative reference: ref. '12'

  -- Possible downref: Non-RFC (?) normative reference: ref. '13'

  -- Possible downref: Non-RFC (?) normative reference: ref. '14'

  -- Possible downref: Non-RFC (?) normative reference: ref. '15'

  -- Possible downref: Non-RFC (?) normative reference: ref. '16'

  -- Possible downref: Non-RFC (?) normative reference: ref. '17'

  -- Possible downref: Non-RFC (?) normative reference: ref. '18'

  -- Possible downref: Non-RFC (?) normative reference: ref. '19'

  ** Downref: Normative reference to an Historic RFC: RFC 1421 (ref. '20')

  -- Possible downref: Non-RFC (?) normative reference: ref. '21'

  -- Possible downref: Non-RFC (?) normative reference: ref. '22'

  ** Downref: Normative reference to an Historic RFC: RFC 1584 (ref. '23')

  -- Possible downref: Non-RFC (?) normative reference: ref. '24'

  ** Downref: Normative reference to an Informational RFC: RFC 1321 (ref.
     '25')

  -- Possible downref: Non-RFC (?) normative reference: ref. '26'

  -- Possible downref: Non-RFC (?) normative reference: ref. '27'

  ** Obsolete normative reference: RFC 1889 (ref. '28') (Obsoleted by RFC
     3550)

  -- Possible downref: Non-RFC (?) normative reference: ref. '29'


     Summary: 13 errors (**), 0 flaws (~~), 16 warnings (==), 24 comments
     (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	INTERNET-DRAFT                 M. Handley/J. Crowcroft/C. Bormann/J. Ott
3	Expires: January 2001                                  ACIRI/UCL/TZI/TZI
4	                                                               July 2000

6	           The Internet Multimedia Conferencing Architecture
7	                     draft-ietf-mmusic-confarch-03

9	Status of this memo

11	   This document is an Internet-Draft and is in full conformance with
12	   all provisions of Section 10 of RFC 2026.

14	   Internet-Drafts are working documents of the Internet Engineering
15	   Task Force (IETF), its areas, and its working groups.  Note that
16	   other groups may also distribute working documents as Internet-
17	   Drafts.

19	   Internet-Drafts are draft documents valid for a maximum of six months
20	   and may be updated, replaced, or obsoleted by other documents at any
21	   time.  It is inappropriate to use Internet-Drafts as reference
22	   material or to cite them other than as "work in progress."

24	   The list of current Internet-Drafts can be accessed at
25	   http://www.ietf.org/ietf/1id-abstracts.txt

27	   The list of Internet-Draft Shadow Directories can be accessed at
28	   http://www.ietf.org/shadow.html

30	   This document is a product of the Multiparty Multimedia Session
31	   Control (MMUSIC) working group of the Internet Engineering Task
32	   Force.  Comments are solicited and should be addressed to the working
33	   group's mailing list at confctrl@isi.edu and/or the authors.

35	Abstract

37	   This document provides an overview of multimedia conferencing on the
38	   Internet.  The protocols mentioned are specified elsewhere as RFCs,
39	   Internet-Drafts, or ITU recommendations.  Each of these
40	   specifications gives details of the protocol itself, how it works and
41	   what it does.  This document attempts to provide the reader with an
42	   overview of how the components fit together and of some of the
43	   assumptions made, as well as some statement of direction for those
44	   components still in a nascent stage.

52	1.  Introduction

54	   The Internet is not currently very good at carrying audio and video.
55	   This is hardly surprising as it was not designed or engineered with
56	   real-time traffic in mind, but there has recently been a great deal
57	   of interest in using the Internet for telephony services.  Part of
58	   this has come from pricing anomalies that make internet telephony
59	   somewhat artificially cheaper than traditional telephone services,
60	   but this is not the whole story.  The Internet itself is improving to
61	   better handle traffic such as audio and video, and in the medium
62	   term, the Internet should be able to provide good-quality real-time
63	   multimedia services, although such quality improvements are likely to
64	   incur additional charges.

66	   However, the real interest in using the Internet for audio and video
67	   should come from the prospect for a single ubiquitous communications
68	   network that not only allows traditional telephony services, but also
69	   video, shared collaboration tools, and through IP Multicast, multi-
70	   party conferences and multimedia sessions that scale from small group
71	   meetings through to television-sized audiences.  In principle, this
72	   may lead to a ``democratization'' of telecommunication services,
73	   where licenses to broadcast are not required to control physical
74	   access to the limited broadcast medium (although they may still be
75	   required for political reasons).

77	   It is far from clear what services will eventually emerge using such
78	   communications capabilities.  We can only say that the technical
79	   capability to have large numbers of sessions ranging in audience from
80	   hundreds to millions of participants, largely unlimited by geographic
81	   boundaries, will lead to services and social structures that do not
82	   exist today.  However, we can describe the basic technologies that
83	   are likely to bring about such changes, and in this document we
84	   attempt to provide such an overview.  We leave it to the reader to
85	   imagine the uses to which this technology will be put.

87	2.  The Technology

89	   In conjunction with computers, the term ``conferencing'' is often
90	   used in two different ways: firstly, to refer to bulletin boards and
91	   mail list style asynchronous exchanges of messages between multiple
92	   users; secondly, to refer to synchronous or so-called ``real-time''
93	   conferencing, including audio/video communication and shared tools
94	   such as whiteboards and other applications.  This document is about
95	   the architecture for this latter application, multimedia conferencing
96	   in an Internet environment.

98	   There are other infrastructures for teleconferencing in the world:
99	   POTS (Plain Old Telephone System) networks often provide voice
100	   conferencing and phone-bridges, while with ISDN, H.320 [14] can be
101	   used for small, strictly organized video-telephony conferencing.

103	   The architecture that has evolved in the Internet is far more general
104	   as well as being scalable to very large groups, and permits the open
105	   introduction of new media and new applications as they are devised.
106	   As the simplest case, it also allows two persons to communicate via
107	   audio only, so it encompasses IP telephony.

109	   The determining factors of a conferencing architecture are
110	   communication within (possibly large) groups of humans and real-time
111	   delivery of information.  In the Internet, this is supported at a
112	   number of levels.  The remainder of this section provides an overview
113	   of this support, and the rest of the document describes each aspect
114	   in more detail.

116	   In a conference, information must be distributed to all the
117	   conference participants.  Early conferencing systems used a fan-out
118	   of data streams, e.g., one connection between each pair of
119	   participants, which means that the same information must cross some
120	   networks more than once.  The Internet architecture uses the more
121	   efficient approach of multicasting the information to all
122	   participants (see section 3).
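
   As a rough illustration of the difference, the sketch below counts
   the copies a sender must emit per packet under pairwise fan-out and
   under multicast.  The function names are illustrative only; they are
   not taken from any of the referenced specifications.

```python
def pairwise_connections(participants: int) -> int:
    """Connections needed so that every pair of participants is linked."""
    return participants * (participants - 1) // 2


def fanout_copies_per_packet(participants: int) -> int:
    """Copies a sender emits per packet when it holds one connection to
    every other participant."""
    return max(participants - 1, 0)


def multicast_copies_per_packet(participants: int) -> int:
    """Copies a sender emits per packet with multicast: a single copy,
    which the network replicates where delivery paths diverge."""
    return 1 if participants > 1 else 0
```

   For a ten-party conference this gives 45 pairwise connections and 9
   copies per packet at each sender with fan-out, against a single copy
   per packet with multicast.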

124	   Multimedia conferences require real-time delivery of at least the
125	   audio and video information streams used in the conference.  In an
126	   ISDN context, fixed rate circuits are allocated for this purpose --
127	   whether their bandwidth is required at any particular instance or
128	   not.  On the other hand, the traditional Internet service model
129	   (``best effort'') cannot make the necessary quality of service
130	   available in congested networks.  New service models are being
131	   defined in the Internet together with protocols to reserve capacity
132	   or prioritize traffic in a more flexible way than that available with
133	   circuit switching (see section 4).

135	   In a datagram network, multimedia information must be transmitted in
136	   packets, some of which may be delayed more than others.  In order
137	   that audio and video streams be played out at the recipient in the
138	   correct timing, information must be transmitted that allows the
139	   recipient to reconstitute the timing.  A transport protocol with the
140	   specific functions needed for this has been defined (see section 5).
141	   The nature of the Internet reflects that of the world in that it is
142	   very heterogeneous.  Techniques exist to exploit this, and to deliver
143	   appropriate quality to different participants in the same conference
144	   according to their capabilities.

146	   Conference tools such as virtual whiteboards or shared editors are
147	   not concerned with real-time delivery of audio or video but maintain
148	   and update shared state between the participants.  Work on support of
149	   such applications in a multicast environment is in progress (section
150	   6.2).

152	   The humans participating in a conference generally need to have a
153	   specific idea of the context in which the conference is happening,
154	   which can be formalized as a conference policy.  Some conferences are
155	   essentially crowds gathered around an attraction, while others have
156	   very formal guidelines on who may take part (listen in) and who may
157	   speak at which point.  In any case, initially the participants must
158	   find each other, i.e. establish communication relationships
159	   (conference setup, section 7).  During the conference, some
160	   conference control information is exchanged to implement a conference
161	   policy or at least to inform the crowd of who is present (section 6).

163	   In addition, security measures may be required to actually enforce
164	   the conference policy, e.g. to control who is listening and to
165	   authenticate contributions as actually originating from a specific
166	   person.  In the Internet, there is little tendency to rely on the
167	   traditional ``security'' of distribution offered e.g. by the phone
168	   system.  Instead, cryptographic methods are used for encryption and
169	   authentication, which need to be supported by additional conference
170	   setup and control mechanisms (section 8).

172	        Figure 1: Internet multimedia conferencing protocol stacks

174	   |<---       Conference Management       --->|<--- Media Agents   --->|
175	   |                                           |                        |
176	   |         Conference      |    Conference   | Audio/ |    Shared     |
177	   |     Setup & Discovery   |  Course Control | Video  |  Applications |

179	   +-------------------------+------+--------+-+--------+------------+  +
180	   |         S D P           |      | Distr. |  RTP /   |  Reliable  |  |
181	   | SAP | SIP | HTTP | SMTP | RSVP | Ctrl(1)|  RTCP    |Multicast(2)|  |
182	   +-----+--+--+------+------+   +--+--------+----------+------------+--+
183	   |   UDP  |      T C P     |   |                U D P                 |
184	   +--------+----------------+---+--------------------------------------+
185	   |                        IP + IP Multicast                           |
186	   +--------------------------------------------------------------------+
187	   |                 Integrated Services Forwarding                     |
188	   +--------------------------------------------------------------------+

190	   The protocol stacks for internet multimedia conferencing are
191	   illustrated in Figure 1.  Unlike many protocol stacks, most of these
192	   protocols are not deeply layered; rather, they are used alongside
193	   each other to produce a complete conference.

195	3.  Multicast Traffic Distribution

197	   +--------------+------------------+-----------------------------------+
198	   |Protocol      | Documentation    |  Purpose                          |
199	   +--------------+------------------+-----------------------------------+
200	   |IP Multicast  |  RFC 1112, 2236  |  Host extensions for IP Multicast |
201	   |              |                  |  Multicast routing protocols:     |
202	   |DVMRP         |  RFC 1075        |  Dense-mode Intra-domain          |
203	   |PIM-SM        |  RFC 2362        |  Sparse-mode Intra-domain         |
204	   |PIM-DM        |  Internet Draft  |  Dense-mode Intra-domain          |
205	   |CBT           |  RFC 2189        |  Sparse-mode Intra-domain         |
206	   +--------------+------------------+-----------------------------------+

208	   IP multicast provides efficient many-to-many data distribution in an
209	   internet environment.  It is easy to view IP multicast as simply an
210	   optimization for data distribution, and indeed this is the case, but
211	   IP multicast can also result in a different way of thinking about
212	   application design.  To see why this might be the case, examine the
213	   IP multicast service model, as described by Van Jacobson [8]:

215	   -    Senders just send to the group

217	   -    Receivers express an interest in receiving data sent to the
218	        group

220	   -    Routers conspire to deliver data from senders to receivers

222	   With IP multicast, the group is indirectly identified by a single IP
223	   class-D multicast address.

225	   Several things are important about this service model from an
226	   architectural point of view.  Receivers do not need to know who or
227	   where the senders are to receive traffic from them.  Senders never
228	   need to know who the receivers are.  Neither senders nor receivers
229	   need care about the network topology as the network optimizes
230	   delivery.
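
   A minimal sketch of this service model, using the sockets interface
   as exposed by Python, might look as follows.  The helper names are
   hypothetical, and the two socket-opening functions are defined but
   not called here because joining a group needs a multicast-capable
   interface.  Note that the receiver names only the group, never a
   sender, and that the sender addresses only the group.

```python
import ipaddress
import socket
import struct


def is_multicast_group(address: str) -> bool:
    """True if the address lies in the IPv4 class D range 224.0.0.0/4."""
    return ipaddress.IPv4Address(address).is_multicast


def membership_request(group: str, interface: str = "0.0.0.0") -> bytes:
    """Build the ip_mreq value for IP_ADD_MEMBERSHIP: the group address
    followed by the local interface address (0.0.0.0 = system default)."""
    return socket.inet_aton(group) + socket.inet_aton(interface)


def open_receiver(group: str, port: int) -> socket.socket:
    """Express interest in a group; no sender is named or contacted."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group))
    return sock


def send_to_group(payload: bytes, group: str, port: int, ttl: int = 1) -> None:
    """Send to the group address; the sender never learns who receives."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL,
                        struct.pack("b", ttl))
        sock.sendto(payload, (group, port))
```

   A receiver would call open_receiver() with a group address and port
   and then read datagrams from the returned socket; a sender would call
   send_to_group() with the same pair.  Any address and port used for
   experimentation are the caller's choice and are not prescribed by
   this document.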

232	   An IP multicast group is scalable because information about group
233	   membership and group changes at the IP level is kept local to
234	   routers near the relevant members.  How this is performed depends on
235	   the particular multicast routing scheme in use local to the member,
236	   and although it is not a trivial task, several solutions do exist and
237	   therefore multicast routing will not be discussed in detail here.
238	   For more detailed information on multicast routing, see [7, 6, 4, 1,
239	   23].  Typically, as a group with s senders and r receivers increases
240	   in size, state in routers scales O(s) or O(1) depending on the
241	   routing scheme in use.  This state may be in on-tree routers for
242	   newer so-called sparse-mode algorithms such as PIM, or in off-tree
243	   routers for older so-called dense-mode algorithms such as DVMRP.
244	   Thus the most scalable current multicast routing algorithms require
245	   O(1) state in on-tree routers, and hence the total routing state
246	   scales O(g) in a router that is on-tree for g groups.  We can also
247	   envisage multicast routing schemes which require less than O(g)
248	   state*, but the requirement is not currently urgent, so none of these
249	   have yet been implemented.
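
   The scaling argument above can be made concrete with a small
   calculation.  The sketch below assumes, purely for illustration, that
   a routing scheme keeps either one forwarding entry per sender per
   group (the O(s) case) or one entry per group (the O(1) case); real
   protocols differ in the constant factors.

```python
def router_state_entries(groups: int, senders_per_group: int,
                         per_sender_state: bool) -> int:
    """Forwarding entries held by a router that is on-tree for `groups`
    groups.  per_sender_state=True models O(s) state per group;
    per_sender_state=False models O(1) state per group."""
    per_group = senders_per_group if per_sender_state else 1
    return groups * per_group
```

   With 1000 groups of 20 senders each, the O(s) case yields 20000
   entries and the O(1) case 1000 entries, i.e. O(g) in the number of
   groups.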

251	   The level of indirection introduced by the IP class D address
252	   denominating the group solves the distributed systems binding
253	   problem, by pushing this task down into routing; given a multicast
254	   address (and UDP port), a host can send a message to the members of a
255	   group without needing to discover who they are.  Similarly receivers
256	   can ``tune in'' to multicast data sources without needing to bother
257	   the data source itself with any form of request.

259	   IP multicast is a natural solution for multi-party conferencing
260	   because of the efficiency of the data distribution trees, with data
261	   being replicated in the network at appropriate points rather than in
262	   end-systems.  It also avoids the need to configure special-purpose
263	   servers to support the session, which require support, and which
264	   cause traffic concentration and can be a bottleneck.  For larger
265	   broadcast-style sessions, it is essential that data-replication be
266	   carried out in a way that only requires per-receiver network-state to
267	   be local to each receiver, and that data-replication occurs within
268	   the network.  Attempting to configure a tree of application-specific
269	   replication servers for such broadcasts rapidly becomes a ``multicast
270	   routing'' problem, and thus native multicast support is a more
271	   appropriate solution.

273	3.1.  Address Allocation

275	    +----------+-------------------+----------------------------------+
276	    |Protocol  | Documentation     |  Purpose                         |
277	    +----------+-------------------+----------------------------------+
278	    |MADCAP    |  Internet Draft   |  DHCP-like client protocol       |
279	    |          |                   |   for address allocation         |
280	    |AAP       |  Internet Draft   |  Intra-domain address allocation |
281	    |MASC      |  Internet Draft   |  Inter-domain address allocation |
282	    |BGMP      |  Internet Draft   |  Inter-domain multicast routing  |
283	    +----------+-------------------+----------------------------------+

285	   How does an application choose a multicast address to use?

287	   In the absence of any other information, we can bootstrap a multicast
288	   application by using well-known multicast addresses.  Routing
289	   (unicast and multicast) and group membership protocols [5] can do
290	_________________________
291	  * With IP encapsulation, not all on-tree routers need hold the
292	state for a group whose traffic they are forwarding -- traffic for
293	the group can be encapsulated (either unicast or multicast)
294	between on-tree routers nearer the edge of the network, reducing
295	some of the state burden on backbone routers.

297	   just that.  However, this is not the best way of managing
298	   applications of which there is more than one instance at any one
299	   time.

301	   For these, we need a mechanism for allocating group addresses
302	   dynamically, and a directory service which can hold these allocations
303	   together with some key (session information for example --- see
304	   later), so that users can look up the address associated with the
305	   application.  The address allocation and directory functions should
306	   be distributed to scale well.

308	   Multicast address allocation is currently an active area of research.
309	   For many years multicast address allocation has been performed using
310	   multicast session directories (see section 7.1), but as the users and
311	   uses of IP multicast increase, it is becoming clear that a more
312	   hierarchical approach is required.

314	   An architecture [10] is currently being developed based around a
315	   well-defined API that an application can use to request an address.
316	   The host then requests an address from a local address allocation
317	   server, which in turn chooses and reserves an unallocated address
318	   from a range dynamically allocated to the domain.  By allocating
319	   addresses in a hierarchical and topologically sensitive fashion, the
320	   address itself can be used in a hierarchical multicast routing
321	   protocol currently being developed (BGMP, [29]) that will help
322	   multicast routing scale more gracefully than current schemes.
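
   The role of the local allocation server can be sketched as follows.
   This is not MADCAP, AAP, or the API of [10]; the class and method
   names are hypothetical, and the address range in the usage note is
   an arbitrary example.  The sketch shows only the step in which the
   server chooses and reserves an unallocated address from the range
   currently held by its domain.

```python
import ipaddress
from typing import Optional, Set


class LocalAddressAllocator:
    """Hypothetical allocation server for a single domain."""

    def __init__(self, domain_range: str) -> None:
        self.domain_range = ipaddress.IPv4Network(domain_range)
        if not self.domain_range.is_multicast:
            raise ValueError("range must consist of multicast addresses")
        self.allocated: Set[ipaddress.IPv4Address] = set()

    def allocate(self) -> Optional[ipaddress.IPv4Address]:
        """Choose and reserve an unallocated address, or return None
        when the domain's range is exhausted."""
        for address in self.domain_range:
            if address not in self.allocated:
                self.allocated.add(address)
                return address
        return None

    def release(self, address: ipaddress.IPv4Address) -> None:
        """Return an address to the pool when its session ends."""
        self.allocated.discard(address)
```

   An application would obtain an address with, for example,
   LocalAddressAllocator("239.192.0.0/30").allocate().  In the
   architecture described above, the range itself would be obtained
   dynamically from the inter-domain level rather than configured by
   hand.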

324	4.  Internet Service Models

326	             +----------+----------------+--------------------+
327	             |Protocol  | Documentation  |  Purpose           |
328	             +----------+----------------+--------------------+
329	             |IP        |  RFC 791       |  Internet Protocol |
330	             +----------+----------------+--------------------+

332	   Traditionally the Internet has provided so-called best-effort
333	   delivery of datagram traffic from senders to receivers.  No
334	   guarantees are made regarding when or if a datagram will be delivered
335	   to a receiver; however, datagrams are normally only dropped when a
336	   router exceeds a queue size limit due to congestion.  The best-effort
337	   internet service model does not assume FIFO queuing, although many
338	   routers have implemented this.

340	   With best-effort service, if a link is not congested, queues will not
341	   build at routers, datagrams will not be discarded in routers, and
342	   delays will consist of serialization delays at each hop plus
343	   propagation delays.  With sufficiently fast link speeds,
344	   serialization delays are insignificant compared to propagation
345	   delays*.
346	_________________________
347	  * For slow links, a set of mechanisms has been defined that

348	   If a link is congested, with best-effort service, queuing delays will
349	   start to influence end-to-end delays, and packets will start to be
350	   lost as queue size limits are exceeded.  Real-time traffic does not
351	   cope terribly well with packet loss levels of more than a few
352	   percent, although it is possible to add redundancy [12] to increase
353	   the levels at which loss becomes a problem.  In the last few years a
354	   significant amount of work has also gone into providing non-best-
355	   effort services that would provide a better assurance that an
356	   acceptable quality conference will be possible.
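
   For the uncongested case described earlier in this section, the
   end-to-end delay can be estimated as the sum of serialization and
   propagation delays over each hop.  The sketch below is a worked
   calculation under stated assumptions: the default signal speed of
   2.0e8 m/s is an approximation chosen for illustration, and queuing
   delay is taken to be zero.

```python
from typing import Iterable, Tuple


def serialization_delay(packet_bytes: int, link_bits_per_second: float) -> float:
    """Seconds needed to clock one packet onto a link."""
    return packet_bytes * 8 / link_bits_per_second


def propagation_delay(distance_metres: float,
                      signal_speed: float = 2.0e8) -> float:
    """Seconds for a signal to traverse the link (assumed speed)."""
    return distance_metres / signal_speed


def uncongested_path_delay(packet_bytes: int,
                           hops: Iterable[Tuple[float, float]]) -> float:
    """Sum per-hop delays; each hop is (bits per second, metres)."""
    return sum(serialization_delay(packet_bytes, rate)
               + propagation_delay(distance)
               for rate, distance in hops)
```

   A 1500-byte packet takes 12 ms to serialize at 1 Mbit/s but only
   12 microseconds at 1 Gbit/s, whereas 1000 km of propagation costs
   about 5 ms at either speed.  This is why serialization delay becomes
   insignificant on fast links.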

358	4.1.  Non-best effort service

360	   Real-time internet traffic is defined as datagrams that are delay
361	   sensitive.  It could be argued that all datagrams are delay sensitive
362	   to some extent, but for these purposes we refer only to datagrams
363	   where exceeding an end-to-end delay bound of a few hundred
364	   milliseconds renders the datagrams useless for the purpose they were
365	   intended.  For the purposes of this definition, TCP traffic is
366	   normally not considered to be real-time traffic, although there may
367	   be exceptions to this rule.

369	   On congested links, best-effort service queuing delays will adversely
370	   affect real-time traffic.  This does not mean that best-effort
371	   service cannot support real-time traffic --- merely that congested
372	   best-effort links seriously degrade the service provided.  For such
373	   congested links, a better-than-best-effort service is desirable.

375	   To achieve this, the service model of the routers can be modified.
376	   FIFO queuing can be replaced by packet forwarding strategies that
377	   discriminate different ``flows'' of traffic.  The idea of a flow is
378	   very general.  A flow might consist of ``all marketing site web
379	   traffic'', or ``all fileserver traffic to and from teller machines''.
380	   On the other hand, a flow might consist of a particular sequence of
381	   packets from an application in a particular machine to a peer
382	   application in another particular machine set up on request, or it
383	   might consist of all packets marked with a particular Type-of-Service
384	   bit.

386	   There is really a spectrum of possibilities for non-best-effort
387	   service, something like that shown in Figure 2.

389	               Figure 2: Spectrum of internet service types

391	   best effort            assured by                guaranteed by
392	   unsignalled          type of service           per-flow reservation
393	   +-------------+-------------+-------------+-------------+
394	           prioritized by                assured by
395	           type of service          aggregate reservation

397	_________________________
398	helps minimize serialization and link access delays [2].

400	   This spectrum is intended to illustrate that between best-effort and
401	   hard per-flow guarantees lie many possibilities for non-best-effort
402	   service, including having hard guarantees based on an aggregate
403	   reservation, assurances that traffic marked with a particular type-
404	   of-service bit will not be dropped so long as it remains in profile,
405	   and simpler prioritization-based services.

407	   Towards the right-hand side of the spectrum, flows are typically
408	   identifiable in the Internet by the tuple: source machine,
409	   destination machine, source port, destination port, protocol, any of
410	   which could be ``ANY'' (wildcarded).

412	   In the multicast case, the destination is the group, and can be used
413	   to provide efficient aggregation.

415	   Flow identification is called classification and a class (which can
416	   contain one or more flows) has an associated service model applied.
417	   This can default to best effort.
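
   Classification with wildcards can be sketched as below.  The type
   and function names are hypothetical, the rule list is an example
   only, and a real router would use a far more efficient lookup than a
   linear scan.  A field set to ANY matches every value, and a packet
   that matches no rule defaults to best effort.

```python
from typing import List, NamedTuple, Optional, Tuple

ANY = None  # wildcard marker for a flow field


class FlowSpec(NamedTuple):
    """The five-field flow tuple; any field may be ANY."""
    src: Optional[str] = ANY
    dst: Optional[str] = ANY
    src_port: Optional[int] = ANY
    dst_port: Optional[int] = ANY
    protocol: Optional[str] = ANY


def matches(spec: FlowSpec, packet: FlowSpec) -> bool:
    """True if every non-wildcard field of spec equals the packet's."""
    return all(want is ANY or want == have
               for want, have in zip(spec, packet))


def classify(packet: FlowSpec,
             rules: List[Tuple[FlowSpec, str]],
             default: str = "best-effort") -> str:
    """Return the service class of the first matching rule."""
    for spec, service_class in rules:
        if matches(spec, packet):
            return service_class
    return default
```

   In the multicast case a single rule whose destination is the group
   address, with all other fields left as ANY, covers every sender to
   that group.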

419	   Through network management, we can imagine establishing classes of
420	   long-lived flows -- enterprise networks (``Intranets'') often enforce
421	   traffic policies that distinguish priorities which can be used to
422	   discriminate in favor of more important traffic in the event of
423	   overload (though in an underloaded network, the effect of such
424	   policies will be invisible, and may incur no load/work in routers).

426	   The router service model to provide such classes with different
427	   treatment can be as simple as a priority queuing system, or it can be
428	   more elaborate.

430	   Although best-effort services can support real-time traffic,
431	   classifying real-time traffic separately from non-real-time traffic
432	   and giving real-time traffic priority treatment ensures that real-
433	   time traffic sees minimum delays.  Non-real-time TCP traffic tends to
434	   be elastic in its bandwidth requirements, and will then tend to fill
435	   any remaining bandwidth.
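
   The simplest such router service model, strict priority queuing
   between two classes, can be sketched as follows.  The class name and
   the two-class split are assumptions made for illustration; the
   sketch ignores queue size limits and packet drops.

```python
from collections import deque
from typing import Deque, Optional


class PriorityForwarder:
    """Two FIFO queues; real-time traffic is always served first."""

    def __init__(self) -> None:
        self.real_time: Deque[bytes] = deque()
        self.best_effort: Deque[bytes] = deque()

    def enqueue(self, packet: bytes, real_time: bool) -> None:
        """Place the packet on the queue for its class."""
        queue = self.real_time if real_time else self.best_effort
        queue.append(packet)

    def dequeue(self) -> Optional[bytes]:
        """Next packet to forward, or None if both queues are empty."""
        if self.real_time:
            return self.real_time.popleft()
        if self.best_effort:
            return self.best_effort.popleft()
        return None
```

   Best-effort packets are forwarded only when no real-time packet is
   waiting, so elastic traffic uses whatever capacity remains.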

437	   We could imagine a future Internet with sufficient capacity to carry
438	   all of the world's telephony traffic.  Since this is a relatively
439	   modest capacity requirement, it might be simpler to establish
440	   ``POTS'' as a static class which is given some fraction of the
441	   capacity overall, and then within the backbone of the network no
442	   individual call need be given an allocation (i.e. we would no longer
443	   need the call setup/tear down that was needed in the legacy POTS
444	   which was only present due to under-provisioning of trunks, and to
445	   allow the trunk exchanges the option of call blocking).  The vision
446	   is of a network that is engineered with capacity for all of the non-
447	   best-effort average load sources to send without needing individual
448	   reservations.

450	4.2.  Reservations
451	+-----------------+----------------+---------------------------------------+
452	|Protocol         | Documentation  |  Purpose                              |
453	+-----------------+----------------+---------------------------------------+
454	|RSVP             |  RFC 2205      |  Resource ReSerVation Protocol (RSVP) |
455	|Controlled Load  |  RFC 2211      |  Network service model                |
456	| Service         |                |   selected by RSVP                    |
457	|Guaranteed       |  RFC 2212      |  Network service model                |
458	| Service         |                |   selected by RSVP                    |
459	+-----------------+----------------+---------------------------------------+

461	   For flows that may take a significant fraction of the network (i.e.
462	   are ``special'' and can't just be lumped under a static class), we
463	   need a more dynamic way of establishing these classifications.  In
464	   the short term, this applies to many multimedia calls since the
465	   Internet is largely under-provisioned at the time of writing.

467	   RSVP has been standardized for just this purpose.  It provides flow
468	   identification and classification.  Hosts and applications are
469	   modified to speak RSVP client language, and routers speak RSVP.

471	   Since most traffic requiring reservations is delivered to groups
472	   (e.g. TV), it is natural for the receiver to make the request for a
473	   reservation for a flow.  This has the added advantage that different
474	   receivers can make heterogeneous requests for capacity from the same
475	   source.  Thus RSVP can accommodate monochrome, color and HDTV
476	   receivers from a single source (see also Figure 5).

478	   Again the routers conspire to deliver the right flows to the right
479	   locations.

481	   RSVP accommodates the wildcarding noted above.

483	Admission Control

485	   If a network is provisioned such that it has excess capacity for all
486	   the real-time flows using it, a simple priority classification
487	   ensures that real-time traffic is minimally delayed.  However, if a
488	   network is insufficiently provisioned for the traffic in a real-time
489	   traffic class, then real-time traffic will be queued, and delays and
490	   packet loss will result.  Thus in an under-provisioned network,
491	   either all real-time flows will suffer, or some of them must be given
492	   priority.

494	   RSVP provides a mechanism by which an admission control request can
495	   be made, and if sufficient capacity remains in the requested traffic
496	   class, then a reservation for that capacity can be put in place.

498	   If insufficient capacity remains, the admission request will be
499	   refused, but the traffic will still be forwarded with the default
500	   service for its traffic class.  In many cases even an
501	   admission request that failed at one or more routers can still supply
502	   acceptable quality as it may have succeeded in installing a
503	   reservation in all the routers that were suffering congestion.  This
504	   is because other reservations may not be fully utilising their
505	   reserved capacity in those routers where the reservation failed.
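
   The admission decision at a single router can be sketched as
   follows.  This is an illustration only: the class and method names
   are hypothetical, and real RSVP processing involves considerably
   more state than a single counter.

```python
class TrafficClassLink:
    """Tracks reserved capacity for one traffic class on one link."""

    def __init__(self, class_capacity_kbps):
        self.capacity = class_capacity_kbps
        self.reserved = 0
        self.flows = {}

    def request(self, flow_id, rate_kbps):
        # Refuse if the class cannot hold the extra reservation; the
        # flow then simply receives the default service of its class.
        if self.reserved + rate_kbps > self.capacity:
            return False
        self.flows[flow_id] = rate_kbps
        self.reserved += rate_kbps
        return True

    def release(self, flow_id):
        self.reserved -= self.flows.pop(flow_id, 0)


link = TrafficClassLink(1000)
first = link.request("video", 800)
second = link.request("audio", 300)
link.release("video")
third = link.request("audio", 300)
print(first, second, third)
```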

507	Billing

509	   If a reservation involves setting aside resources for a flow, this
510	   will tie up resources so that other reservations may not succeed, and
511	   depending on whether the flow fills the reservation, other traffic is
512	   prevented from using the network.  Clearly some negative feedback is
513	   required in order to prevent pointless reservations from denying
514	   service to other users.  This feedback is typically in the form of
515	   billing.

517	   Billing requires that the user making the reservation is properly
518	   authenticated so that the correct user can be charged.  Billing for
519	   reservations introduces a level of complexity to the internet that
520	   has not typically been experienced with non-reserved traffic, and
521	   requires network providers to have reciprocal usage-based billing
522	   arrangements for traffic carried between them.  It also suggests the
523	   use of mechanisms whereby some fraction of the bill for a link
524	   reservation can be charged to each of the downstream multicast
525	   receivers.

527	4.3.  Differentiated Services

529	   +-------------------------+----------------+------------------------+
530	   |Protocol                 | Documentation  |  Purpose               |
531	   +-------------------------+----------------+------------------------+
532	   |Differentiated Services  |  RFC 2474      |  DS Field in IP Header |
533	   |Differentiated Services  |  RFC 2475      |  DS Architecture       |
534	   +-------------------------+----------------+------------------------+

536	   Whereas RSVP asks routers to classify packets into classes to achieve
537	   a requested quality of service, it is also possible to explicitly
538	   mark packets to indicate the type of service required.  Of course,
539	   there has to be an incentive and mechanisms to ensure that ``high-
540	   priority'' is not set by everyone in all packets, and this incentive
541	   is provided by edge-based policing and by buying profiles of higher
542	   priority service.  In this context, a profile could have many forms,
543	   but a typical profile might be a token-bucket filter specifying a
544	   mean rate and a bucket size with certain time-of-day restrictions.
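
   A token-bucket profile of this kind can be sketched as below.  The
   names and numbers are assumptions chosen for illustration, and the
   time-of-day restrictions are left out.

```python
class TokenBucket:
    """Token-bucket filter with a mean rate and a bucket size."""

    def __init__(self, rate_bytes_per_s, bucket_bytes):
        self.rate = rate_bytes_per_s
        self.depth = bucket_bytes
        self.tokens = bucket_bytes  # start with a full bucket
        self.last = 0.0

    def conforms(self, now_s, packet_bytes):
        # Refill for the elapsed time, capped at the bucket size.
        elapsed = now_s - self.last
        self.last = now_s
        self.tokens = min(self.depth, self.tokens + elapsed * self.rate)
        if packet_bytes <= self.tokens:
            self.tokens -= packet_bytes
            return True
        return False  # out of profile: drop or re-mark


bucket = TokenBucket(rate_bytes_per_s=1000, bucket_bytes=1500)
results = [
    bucket.conforms(0.0, 1500),
    bucket.conforms(0.5, 1000),
    bucket.conforms(1.5, 1000),
]
print(results)
```

   The second packet is out of profile because only 500 bytes of
   tokens have accumulated since the bucket was emptied.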

546	   This is still an active research area, but the general idea is for a
547	   customer to buy from their provider a profile for higher quality
548	   service, and the provider polices marked traffic from the site to
549	   ensure that the profile is not exceeded.  Within a provider's
550	   network, routers give preferential service to packets marked with
551	   the relevant type-of-service bit.  Where providers peer, they arrange
552	   for an aggregate higher-quality profile to be provided, and police
553	   each other's aggregate if it exceeds the profile.  In this way,
554	   policing only needs to be performed at the edges to a provider's
555	   network on the assumption that within the network there is sufficient
556	   capacity to cope with the amount of higher-quality traffic that has
557	   been sold.  The remainder of the capacity can be filled with regular
558	   best-effort traffic.

560	   One big advantage of differentiated services over reservations is
561	   that routers do not need to keep per-flow state, or look at source
562	   and destination addresses to classify the traffic, and this means
563	   that routers can be considerably simpler.  Another big advantage is
564	   that the billing arrangements for differentiated services are
565	   pairwise between providers at boundaries -- at no time does a
566	   customer need to negotiate a billing arrangement with each provider
567	   in the path.*

569	5.  Transport Protocols

571	   So-called real-time delivery of traffic requires little in the way of
572	   transport protocol.  In particular, real-time traffic that is sent
573	   over more than trivial distances is not retransmittable.

575	   With packet multimedia data there is no need for the different media
576	   comprising a conference to be carried in the same packets.  In fact
577	   it simplifies receivers if different media streams are carried in
578	   separate flows (i.e., separate transport ports and/or separate
579	   multicast groups).  This also allows the different media to be given
580	   different quality of service.  For example, under congestion, a
581	   router might preferentially drop video packets over audio packets.
582	   In addition, some sites may not wish to receive all the media flows.
583	   For example, a site with a slow access link may be able to
584	   participate in a conference using only audio and a whiteboard whereas
585	   other sites in the same conference with more capacity may also send
586	   and receive video.  This can be done because the video can be sent to
587	   a different multicast group than the audio and whiteboard.  This is
588	   a first step towards coping with heterogeneity by allowing the
589	   receivers to decide how much traffic to receive, and hence allowing a
590	   conference to scale more gracefully.

592	5.1.  Receiver Adaptation and Synchronization

594	_________________________
595	  * With  reservations  there  may  be ways to avoid this too, but
596	they're somewhat more difficult given the more specific nature  of
597	a reservation.

599	                 Figure 3: Network Jitter and Packet Audio
600	                                                               |x|
601	                                                               | | |
602	                                                               |x| |
603	                                                               | | |
604	                                Compression                    |x| v
605	                                + Packetizer                   | |
606	                                 +--------+                 +-------+
607	   Microphone                    |        | 1.5 Mbit/s link |       |
608	               +-----+ A   A   A |        |-----------------| Router|
609	   /~\------+__|     |>>> >>> >>>|        |A   A   A   A   A|       |
610	   \_/------+  |A->D |20 ms Audio|        |-----------------|       |
611	               +-----+Timeslices |        |   --------->    |       |
612	                                 |        |                 +-------+
613	                                 +--------+                    |A|
614	                                                               | | |
615	                                               Shared link:    |x| |
616	                                               Audio traffic   |A| |
617	                                               interspersed w  |A| |
618	                                               other traffic   |x| |
619	                                                               |x| |
620	                                                               |x| |
621	                                Depacketizer                   |A| |
622	                                + Timing recovery              |A| v
623	                                + Decompression                | |
624	                                 +--------+                 +-------+
625	      Speaker                    |        | 1.5 Mbit/s link |       |
626	   |\          +-----+ A   A   A |        |-----------------| Router|
627	   | +---+     |     |<<< <<< <<<|        |A       A   AA A |       |
628	   | |   |-----|D->A |20 ms Audio|        |-----------------|       |
629	   | +---+     +-----+Timeslices |        |   <---------    |       |
630	   |/                            |        |                 +-------+
631	                                 +--------+                    | |
632	                                                               |X| |
633	                                                               |X| |
634	                                                               | | |
635	                                                               |X| v
636	                                                               | |

638	   Best-effort traffic is delayed by queues in routers between the
639	   sender and the receivers.  Even reserved priority traffic may see
640	   small transient queues in routers, and so packets comprising a flow
641	   will be delayed for different times.  Such delay variance is known as
642	   jitter, and is illustrated in Figure 3.

644	   Real-time applications such as audio and video need to be able to
645	   buffer real-time data at the receiver for sufficient time to remove
646	   the jitter added by the network and recover the original timing
647	   relationships between the media data.  In order to know how long to
648	   buffer for, each packet must carry a timestamp which gives the time
649	   at the sender when the data was captured.  Note that for audio and
650	   video data timing recovery, it is not necessary to know the absolute
651	   time that the data was captured at the sender, only the time relative
652	   to the other data packets.
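
   The following sketch shows how relative timestamps are enough to
   recover timing.  The function name, the fixed playout delay and the
   sample arrival times are assumptions made for the example; a real
   receiver would adapt the delay to the jitter it observes.

```python
def playout_times(packets, clock_rate_hz, playout_delay_s):
    """Schedule packets from (media_timestamp, arrival_time_s) pairs.

    Only timestamps relative to the first packet are used.
    """
    first_ts, first_arrival = packets[0]
    schedule = []
    for ts, arrival in packets:
        offset_s = (ts - first_ts) / clock_rate_hz
        play_at = first_arrival + playout_delay_s + offset_s
        on_time = arrival <= play_at
        schedule.append((ts, play_at, on_time))
    return schedule


# 20 ms audio packets sampled at 8 kHz (160 ticks apart).
packets = [(0, 1.000), (160, 1.035), (320, 1.041), (480, 1.120)]
schedule = playout_times(packets, 8000, 0.050)
on_time = [ok for _, _, ok in schedule]
print(on_time)
```

   With a 50 ms playout delay the first three packets are played with
   their original 20 ms spacing; the last one arrives too late.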

654	                   Figure 4: Inter-media synchronization

656	   Incoming packets

658	   ----------------     +----------------+
659	   V A  V   AV    A --> |   Host         |
660	   ----------------     |   Demuxing     |
661	                        +----------------+
662	                       /                  \
663	                      /                    \
664	           A      A  /  A           V   V   \  V
665	                    v                        v
666	      +---------------+                     +---------------+
667	      | Depacketizer  |  per source         | Depacketizer  |
668	      +---------------+  delay adaptation:  +---------------+
669	           v           \   45 ms    95 ms  /          v
670	      +------------+     \               /     +------------+
671	      | format     |       \           /       | format     |
672	      | conversion |    +------------------+   | conversion |
673	      +------------+    | synchronization  |   +------------+
674	           |            |      agent       |          |
675	           |            +------------------+          |
676	           | mix           /           \              |
677	           v              /             \             v
678	        |     |          /               \         |     |
679	        +-----+         /                 \        +-----+
680	        |     |        /                   \       |     |
681	        +-----+       /                     \      +-----+
682	        |  A  |      / 95 ms           95 ms \     |  V  |
683	        +-----+     /                         \    +-----+
684	        |  A  | <--+                           +-> |  V  |
685	        +-----+          /|         +--------+     +-----+
686	        |  A  |     +---+ |         |/------\|     |  V  |
687	        +-----+>>>>>|   | |         ||      ||<<<<<+-----+
688	                    +---+ |         |\------/|
689	                         \|         +--------+

691	   As audio and video flows will receive differing jitter and possibly
692	   differing quality of service, audio and video that were grabbed at
693	   the same time at the sender may not arrive at the receiver at the
694	   same time.  At the receiver, each flow will need a playout buffer to
695	   remove network jitter.  Inter-flow synchronization can be performed
696	   by adapting these playout buffers so that samples/frames that
697	   originated at the same time are played out at the same time (see
698	   Figure 4).  This requires that the time base of different
699	   flows from the same sender can be related at the receivers, e.g. by
700	   making available the absolute times at which each of them was
701	   captured.
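
   One simple way to adapt the playout buffers, sketched here as an
   assumption rather than a prescribed algorithm, is to delay every
   flow from a sender by the largest delay any of its flows needs.
   The 45 ms and 95 ms values are those shown in Figure 4.

```python
def synchronized_delays(per_flow_delay_ms):
    """Give every flow the largest per-flow playout delay."""
    common = max(per_flow_delay_ms.values())
    return {flow: common for flow in per_flow_delay_ms}


delays = synchronized_delays({"audio": 45, "video": 95})
print(delays)
```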

703	5.2.  RTP

705	   +-------------+----------------+--------------------------------------+
706	   |Protocol     | Documentation  |  Purpose                             |
707	   +-------------+----------------+--------------------------------------+
708	   |RTP,RTCP     |  RFC 1889      |  packet format for realtime traffic  |
709	   |RTP Profile  |  RFC 1890      |  specific RTP profile for AV traffic |
710	   |RTP Payload  |  RFC 2032,     |  payload formats for specific codecs |
711	   | Formats     |   2035, etc    |                                      |
712	   +-------------+----------------+--------------------------------------+

714	   The transport protocol for real-time flows is RTP [28].  This
715	   provides a standard format packet header which gives media specific
716	   timestamp data, as well as payload format information and sequence
717	   numbering amongst other things.  RTP is normally carried using UDP.
718	   It does not provide or require any connection setup, nor does it
719	   provide any enhanced reliability over UDP.  For RTP to provide a
720	   useful media flow, there must be sufficient capacity in the relevant
721	   traffic class to accommodate the traffic.  How this capacity is
722	   ensured is independent of RTP.

724	   Every original RTP source is identified by a source identifier, and
725	   this source id is carried in every packet.  RTP allows flows from
726	   several sources to be mixed in gateways to provide a single resulting
727	   flow.  When this happens, each mixed packet contains the source IDs
728	   of all the contributing sources.

730	   RTP media timestamp units are flow specific --- they are in units
731	   that are appropriate to the media flow.  For example, 8kHz sampled
732	   PCM encoded audio has a timestamp clock rate of 8kHz.  This means
733	   that inter-flow synchronization is not possible from the RTP
734	   timestamps alone.

736	   Each RTP flow is supplemented by Real-Time Control Protocol (RTCP)
737	   packets.  There are a number of different RTCP packet types.  RTCP
738	   packets provide the relationship between the realtime clock at a
739	   sender and the RTP media timestamps so that inter-flow
740	   synchronization can be performed, and they provide textual
741	   information to identify a sender in a conference from the source id.
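
   The mapping from media timestamps to the sender's realtime clock
   can be sketched as below.  The function name and sample values are
   hypothetical, the 90 kHz video clock rate is assumed for the
   example, and timestamp wrap-around is ignored.

```python
def wallclock(rtp_ts, sr_rtp_ts, sr_wallclock_s, clock_rate_hz):
    """Map an RTP timestamp to sender wallclock time.

    sr_rtp_ts and sr_wallclock_s are the corresponding pair taken
    from the most recent RTCP report for this flow.
    """
    return sr_wallclock_s + (rtp_ts - sr_rtp_ts) / clock_rate_hz


# Audio at 8 kHz and video at 90 kHz, both captured 100 ms after
# the instant described by their respective reports.
audio_t = wallclock(16800, 16000, 100.0, 8000)
video_t = wallclock(909000, 900000, 100.0, 90000)
print(audio_t, video_t)
```

   Although the two flows use unrelated timestamp units, both packets
   map to the same sender time and so can be played out together.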

743	5.3.  Conference Membership and Reception Feedback

745	   IP multicast allows sources to send to a multicast group without
746	   being a receiver of that group.  However, for many conferencing
747	   purposes it is useful to know who is listening to the conference, and
748	   whether the media flows are reaching receivers properly.  Accurately
749	   performing both these tasks restricts the scaling of the conference.
750	   IP multicast means that no-one knows the precise membership of a
751	   multicast group at a specific time, and this information cannot be
752	   discovered, as to try to do so would cause an implosion of messages,
753	   many of which would be lost*.  Instead, RTCP provides approximate
754	   membership information through periodic multicast of session messages
755	   which, in addition to information about the recipient, also give
756	   information about the reception quality at that receiver.  RTCP
757	   session messages are restricted in rate, so that as a conference
758	   grows, the rate of session messages remains constant, and each
759	   receiver reports less often.  A member of the conference can never
760	   know exactly who is present at a particular time from RTCP reports,
761	   but does have a good approximation to the conference membership.
762	   This is analogous to what happens in a real-world meeting hall; the
763	   meeting organizers may have an attendance list, but if people are
764	   coming and going all the time, they probably do not know exactly who
765	   is in the room at any one moment.
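
   The rate restriction can be illustrated with a simplified
   calculation.  The function below is a sketch under assumed values
   (a 400 byte/s control budget, 100-byte reports, a 5 s minimum); it
   omits the randomization and sender/receiver split of the real
   algorithm.

```python
def report_interval_s(members, avg_report_bytes, rtcp_bytes_per_s,
                      min_interval_s=5.0):
    """Per-member reporting interval for a fixed total RTCP rate."""
    interval = members * avg_report_bytes / rtcp_bytes_per_s
    return max(min_interval_s, interval)


small = report_interval_s(10, 100, 400)
large = report_interval_s(1000, 100, 400)
print(small, large)
```

   With 1000 members each one reports every 250 s, so the aggregate
   control traffic stays at the same 400 bytes/s.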

767	   Reception quality information is primarily intended for debugging
768	   purposes, as debugging of IP multicast problems is a difficult task.
769	   However, it is possible to use reception quality information for rate
770	   adaptive senders, although it is not clear whether this information
771	   is sufficiently timely to be able to adapt fast enough to transient
772	   congestion.

774	5.4.  Scaling Issues and Heterogeneity

776	   The Internet is very heterogeneous, with link speeds ranging from
777	   around 10 kbit/s up to around 10 Gbit/s, and very varied levels of
778	   congestion.  How then can a single multicast source satisfy a large
779	   and heterogeneous set of receivers?

781	_________________________
782	  * Note that a conference policy that restricts  conference  mem-
783	bership can be implemented using encryption and restricted distri-
784	bution of encryption keys, of which more later.

786	    Figure 5: Receiver adaptation: multiple layers and multicast groups

788	                             /~~~\           ##### 2 Mbit/s layer
789	                             | R |           ===== 512 kbit/s layer
790	                             \___/           ----- 64 kbit/s layer
791	                            #
792	          10 Mbit/s        # 10 Mbit/s
793	          link:           #  link
794	                         #
795	   /~~~\  #######>  +---+
796	   | S |  =======>  |   |
797	   \___/  ------->  +---+
798	                       \ = 1.5 Mbit/s
799	   Source               \ = link
800	                         \ =       1.5 Mbit/s
801	                          \ =         link
802	                           \ +---+=========>/~~~\
803	                             |   |--------->| R |
804	                             +---+          \___/
805	                10 Mbit/s  / =    \
806	                    link  / =      \ 128 kbit/s
807	                         / =        \ link
808	                        / =          \      10 Mbit/s
809	                     /~~~\            +---+     link /~~~\
810	                     | R |            |   |--------->| R |
811	                     \___/            +---+          \___/
812	                                           \
813	                                            \ 10 Mbit/s
814	                                             \ link
815	                                              \
816	                                               /~~~\
817	                                               | R |
818	                                               \___/

820	   In addition to each receiver performing its own adaptation to jitter,
821	   if the sender layers [22] its video (or audio) stream, different
822	   receivers can choose to receive different amounts of traffic and
823	   hence different qualities.  To do this the sender must code the
824	   video as a base layer (the lowest quality that might be acceptable)
825	   and a number of enhancement layers, each of which adds more quality
826	   at the expense of more bandwidth.  With video, these additional
827	   layers might increase the framerate or increase the spatial
828	   resolution of the images or both.  Each layer is sent to a different
829	   multicast group, and receivers can decide individually how many
830	   layers to subscribe to.  This is illustrated in Figure 5.  Of course,
831	   if they are going to respond to congestion in this way, then we also
832	   need to arrange that the receivers in a conference behind a common
833	   bottleneck tend to respond together to prevent de-synchronized
834	   experiments by different receivers from having the net effect that
835	   too many layers are always being drawn through a common bottleneck.
836	   RLM [21] is one way that this might be achieved, although there is
837	   continuing research in this area.
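
   The layer rates and link speeds of Figure 5 can be used in a small
   sketch of the receiver's choice.  It assumes the receiver already
   knows its bottleneck capacity, which in practice it does not; a
   scheme such as RLM discovers it by experiment.

```python
# Layer rates from Figure 5, base layer first, in kbit/s.
LAYERS_KBPS = [64, 512, 2000]


def layers_to_join(bottleneck_kbps, layers=LAYERS_KBPS):
    """Number of layers whose cumulative rate fits the bottleneck."""
    total = 0
    joined = 0
    for rate in layers:
        if total + rate > bottleneck_kbps:
            break
        total += rate
        joined += 1
    return joined


choices = {link: layers_to_join(link) for link in (10000, 1500, 128)}
print(choices)
```

   This matches the figure: all three layers cross the 10 Mbit/s
   links, two cross the 1.5 Mbit/s link, and only the base layer
   crosses the 128 kbit/s link.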

839	6.  Conference Control

841	+---------+--------------------------+-------------------------------------+
842	|Protocol | Documentation            |  Purpose                            |
843	+---------+--------------------------+-------------------------------------+
844	|H.323    | ITU recommendation H.323 | Tightly coupled conference setup    |
845	|         |                          |    and control                      |
846	|H.332    | ITU recommendation H.332 | Loosely coupled extensions to H.323 |
847	+---------+--------------------------+-------------------------------------+

849	   Conferences come in many shapes and sizes, but there are only really
850	   two models for conference control: light-weight sessions and tightly
851	   coupled conferencing.  For both models, rendezvous mechanisms are
852	   needed.  Note that the conference control model is orthogonal to
853	   issues of quality of service and network resource reservation, and it
854	   is also orthogonal to the mechanism for discovering the conference.

856	   Light-weight sessions are multicast based multimedia conferences that
857	   lack explicit conference membership control and explicit conference
858	   control mechanisms.  Typically a lightweight session consists of a
859	   number of many-to-many media streams supported using RTP and RTCP
860	   using IP multicast*.  Typically, the only conference control
861	   information needed during the course of a light-weight session is
862	   that distributed in the RTCP session information, i.e. an approximate
863	   membership list with some attributes per member.

865	   Tightly coupled conferences may also be multicast based and use RTP
866	   and RTCP, but in addition they have an explicit conference membership
867	   mechanism and may have an explicit conference control mechanism that
868	   provides facilities such as floor control.

870	   The most widely used tightly coupled conference control protocols
871	   suitable for Internet use are those belonging to the ITU's H.323
872	   family [16].  However it should be noted that this is inappropriate
873	   for larger conferences where scaling problems will be introduced by
874	   the conference control mechanisms.  The Simple Conference Control
875	   Protocol (SCCP) [18] has been proposed as a more scalable distributed
876	   conference control protocol.

878	   In order to try and address large conferences, the ITU is currently
879	   standardising H.332 [17], which is essentially a small tightly
880	   coupled H.323 conference with a larger lightweight-sessions-style
881	_________________________
882	  * There is some confusion on the term session,  which  is  some-
883	times  used  for  a  conference  and  sometimes for a single media
884	stream transported by RTP.  In this document, we prefer to use the
885	less ambiguous term conference except where existing protocols use
886	the term session.

888	   conference listening in as passive participants.  It is not yet clear
889	   whether H.332 will see large scale acceptance, as its benefits over a
890	   simple lightweight session are not terribly obvious.  It seems likely
891	   that lightweight sessions combined with stream authentication (see
892	   section 8.3) might be a more appropriate solution for many potential
893	   customers.

895	6.1.  Controlling Multimedia Servers

897	       +---------+---------------+---------------------------------+
898	       |Protocol | Documentation |  Purpose                        |
899	       +---------+---------------+---------------------------------+
900	       |RTSP     |  RFC 2326     |  Remote control and AV playback |
901	       |         |               |    and recording servers        |
902	       +---------+---------------+---------------------------------+

904	   The Real Time Streaming Protocol (RTSP) provides a standard way
905	   to remote control a multimedia server.  While primarily aimed at web-
906	   based media-on-demand services, RTSP is also well suited to provide
907	   VCR-like controls for audio and video streams, and to provide
908	   playback and record functionality of RTP data streams.  A client can
909	   specify that an RTSP server plays a recorded multimedia session into
910	   an existing multicast-based conference, or can specify that the
911	   server should join the conference and record it.

913	6.2.  Protocols for Non-A/V Applications

915	   Applications other than audio and video have evolved in Internet
916	   conferencing, e.g. Imm, Wb [8], NTE [11].  Such applications can be
917	   used to substitute for meeting aids in physical conferences
918	   (whiteboards, projectors) or replace visual and auditory cues that
919	   are lost in teleconferences (e.g., a speaker list application); they
920	   also can enable new styles of joint work.

922	   Most non-A/V applications have in common that the application
923	   protocol is about establishing and updating a shared state.  Loss of
924	   information is often not acceptable, so some form of multicast
925	   reliability is required.  The applications' requirements differ: Some
926	   applications make per-participant additions to the shared state that
927	   are orthogonal to each other (e.g., whiteboards), some evolve a more
928	   closely interrelated common state (e.g., additions to a speaker list
929	   must be properly sequenced).  Some applications can make use of added
930	   bandwidth/react to congestion in an elastic way, others transport
931	   data that, although not strictly real-time, is time-critical.

933	   In the IRTF research group on Reliable Multicast [13], work is in
934	   progress on common protocol elements that can be used in such
935	   applications.  At the time of writing, some aspects of reliable
936	   multicast are not well-understood, such as the proper way to provide
937	   congestion control in a multi-sender multicast environment.  As
938	   congestion control is considered an essential element, standards
939	   track protocols are not expected before this can be solved.

941	7.  Conference Setup

943	   There are two basic forms of conference discovery mechanism.  These
944	   are session advertisement and session invitation.  Session
945	   advertisements are provided using a session directory, and inviting a
946	   user to join a session is provided using a session invitation
947	   protocol such as SIP or H.323.

949	7.1.  Session Directories

951	     +----------+------------------+----------------------------------+
952	     |Protocol  | Documentation    |  Purpose                         |
953	     +----------+------------------+----------------------------------+
954	     |SDP       |  RFC 2327        |  Session description format      |
955	     |SAP       |  Internet draft  |  Multicast session announcements |
956	     +----------+------------------+----------------------------------+

958	   The rendezvous mechanism for many light-weight sessions is a
959	   multicast based session directory.  This ``broadcasts'' session
960	   descriptions [9] to all the potential session participants.  These
961	   session descriptions provide an advertisement that the session will
962	   exist, and also provide sufficient information including multicast
963	   addresses, ports, media formats and session times so that a receiver
964	   of the session description can join the session.  The session
965	   description protocol (SDP) describes the content and format of a
966	   multimedia session, and the session announcement protocol (SAP) is
967	   used to distribute it to all potential session recipients.
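
   A session description carrying the information listed above might
   be assembled as in this sketch.  All values (addresses, port,
   times, contact) are hypothetical, and only a small subset of SDP
   fields is shown.

```python
def make_sdp(name, group, ttl, port, payload_type, start, stop):
    """Build a minimal SDP description for one audio stream."""
    lines = [
        "v=0",
        "o=- 1 1 IN IP4 192.0.2.1",          # hypothetical origin
        f"s={name}",
        "e=organizer@example.com",            # hypothetical contact
        f"c=IN IP4 {group}/{ttl}",            # multicast address/TTL
        f"t={start} {stop}",                  # session start/stop
        f"m=audio {port} RTP/AVP {payload_type}",
    ]
    return "\r\n".join(lines) + "\r\n"


sdp = make_sdp("Example seminar", "224.2.17.12", 127, 49170, 0,
               2873397496, 2873404696)
print(sdp)
```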

969	   This mechanism can also be applied to advertised tightly coupled
970	   sessions, and only requires that additional information about the
971	   mechanism to use to join the session is given.  However, as the
972	   number of sessions in the session directory grows, we expect that
973	   only larger-scale public sessions will be announced in this manner,
974	   and smaller, more private sessions will tend to use direct invitation
975	   rather than advertisement.

977	7.2.  Session Invitation

979	+----------+----------------+----------------------------------------------+
980	|Protocol  | Documentation  |  Purpose                                     |
981	+----------+----------------+----------------------------------------------+
982	|SIP       |  RFC 2543      |  initiating multimedia calls and conferences |
983	+----------+----------------+----------------------------------------------+
984	   Not all sessions are advertised, and even those that are advertised
985	   may require a mechanism to explicitly invite a user to join a
986	   session.  Such a mechanism is required regardless of whether the
987	   session is a lightweight session or a more tightly coupled session,
988	   although the invitation system must specify the mechanism to be used
989	   to join the session.

991	   As users are mobile, it is important that such an invitation
992	   mechanism is capable of locating and inviting a user in a location
993	   independent manner.  Thus user addresses need to be used as a level
994	   of indirection rather than routing a call to a specific terminal.
995	   The invitation mechanism should also provide for alternative
996	   responses, such as leaving a message or being referred to another
997	   user, should the invited user be unavailable.

999	   The Session Initiation Protocol (SIP) provides a mechanism whereby a
1000	   user can be invited to participate in a conference.  SIP does not
1001	   care whether the session is already ongoing, or is just being
1002	   created, and it doesn't care whether the conference is a small
1003	   tightly coupled session or a huge broadcast -- it merely conveys an
1004	   invitation to a user in a timely manner, inviting them to
1005	   participate, and provides enough information for them to be able to
1006	   know what sort of session to expect.  Thus although SIP can be used
1007	   to make telephone-style calls, it is by no means restricted to that
1008	   style of conference.

1010	8.  Security

1012	   There is a temptation to believe that multicast is inherently less
1013	   private than unicast communication since the traffic visits so many
1014	   more places in the network.  In fact, this is not the case except
1015	   with broadcast-and-prune multicast routing protocols [4].
1016	   However, IP multicast does make it simple for a host to anonymously
1017	   join a multicast group and receive traffic destined to that group
1018	   without the other senders' and receivers' knowledge.  If the
1019	   application requirement (conference policy) is to communicate between
1020	   some defined set of users, then strict privacy can only be enforced
1021	   through adequate end-to-end encryption.

1023	   RTP specifies a standard way to encrypt RTP and RTCP packets using
1024	   private-key (symmetric) encryption schemes such as DES [24].  It also
1025	   specifies a standard mechanism to hash plain text keys using MD5 [25]
1026	   so that the resulting bit string can be used as a DES key.  This
1027	   allows simple out-of-band mechanisms such as privacy-enhanced mail to
1028	   be used for encryption key exchange.
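
	   As a rough illustration of the key-derivation step, the sketch below
	   hashes a text key with MD5 and keeps part of the digest as a DES key.
	   The function name des_key_from_passphrase and the choice of the
	   first 8 octets of the digest are assumptions made for this example;
	   the RTP specification [28] remains the authority on the exact
	   canonical form and truncation rules.

```python
import hashlib

DES_KEY_OCTETS = 8  # a DES key occupies 8 octets (56 key bits plus parity)


def des_key_from_passphrase(passphrase: str) -> bytes:
    """Turn a plain text key into 8 octets usable as a DES key.

    Assumption for this sketch: the text is UTF-8 encoded as-is and the
    first 8 octets of the 16-octet MD5 digest are used.
    """
    digest = hashlib.md5(passphrase.encode("utf-8")).digest()
    return digest[:DES_KEY_OCTETS]


key = des_key_from_passphrase("a shared secret")
```

	   Because every participant can compute the same octets from the same
	   text, only the text itself needs to be exchanged out of band.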

1030	8.1.  Authentication and Key Distribution

1032	   +----------+----------------------------+---------------------------+
1033	   |Protocol  | Documentation              |  Purpose                  |
1034	   +----------+----------------------------+---------------------------+
1035	   |PGP       |  RFC 1991                  |  public key cryptography  |
1036	   |X.509     |  ITU recommendation X.509  |  directory authentication |
1037	   +----------+----------------------------+---------------------------+

1039	   Key distribution is closely tied to authentication.  Conference or
1040	   session directory keys can be securely distributed using public-key
1041	   cryptography on a one-to-one basis (by email, a directory service, or
1042	   by an explicit conference setup mechanism), but this is only as good
1043	   as the certification mechanism used to certify that a key given by a
1044	   user is the correct public key for that user.  Such certification
1045	   mechanisms [3] are however not specific to conferencing, and it looks
1046	   likely that certificates such as those provided by PGP will be most
1047	   widely used in the near term.

1049	   Session keys can be distributed using encrypted Session Descriptions
1050	   carried in SIP session invitations, or in encrypted session
1051	   announcements as described below.  Neither of these mechanisms
1052	   provides for changing keys during a session, as might be required in
1053	   some tightly coupled sessions, but they are probably sufficient for
1054	   many uses in the context of lightweight sessions.

1056	   Even without privacy requirements in the conference policy, strong
1057	   authentication of a user is required if making a network reservation
1058	   results in usage-based billing.

1060	8.2.  Encrypted Session Announcements

1062	     +----------+------------------+----------------------------------+
1063	     |Protocol  | Documentation    |  Purpose                         |
1064	     +----------+------------------+----------------------------------+
1065	     |SAP       |  Internet draft  |  multicast session announcements |
1066	     +----------+------------------+----------------------------------+

1068	   Session Directories can make encrypted session announcements using
1069	   private-key encryption, and carry the encryption keys to be used for
1070	   each of the conference media streams in the session.  Whilst this
1071	   does not solve the key distribution problem, it does allow a single
1072	   conference to be announced more than once to more than one key-group,
1073	   where each group holds a different session directory key, so that the
1074	   two groups can be brought together into a single conference without
1075	   having to know each other's keys.
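
	   The key-group idea can be sketched as follows.  The cipher here is a
	   toy keystream built from SHA-256 and stands in for a real private-key
	   cipher; it must not be used for actual protection.  The function
	   names, the group names, and the description format are hypothetical
	   and are not the SAP or SDP wire formats.

```python
import hashlib


def _keystream(key: bytes, length: int) -> bytes:
    """Toy keystream (SHA-256 in counter mode); a stand-in, NOT a real cipher."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def toy_encrypt(key: bytes, data: bytes) -> bytes:
    """XOR the data with the keystream derived from the key."""
    return bytes(a ^ b for a, b in zip(data, _keystream(key, len(data))))


toy_decrypt = toy_encrypt  # XOR with the same keystream undoes itself


def announce(description: bytes, directory_keys: dict) -> dict:
    """Produce one encrypted announcement per key-group for one conference."""
    return {group: toy_encrypt(key, description)
            for group, key in directory_keys.items()}


# Hypothetical description carrying the media key for the conference.
description = b"session=seminar; media-key=0123456789abcdef"
directory_keys = {"group-a": b"directory key A", "group-b": b"directory key B"}
announcements = announce(description, directory_keys)
```

	   Each group decrypts its own announcement with its own session
	   directory key and recovers the same media key, so neither group
	   learns the other's directory key.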

1077	8.3.  Secured ``Broadcasts''

1080	            Figure 6: Joining a light-weight multimedia session

1082	   User A     |                                             |
1083	   creates    |  SDP/SAP                                    |
1084	   conference |----------->                                 |
1085	              |                                             |User B
1086	              |  SDP/SAP                            IGMP    |starts
1087	              |----------->               IGMP /--<---------|session
1088	              |                 IGMP /-<------/             |directory
1089	              |-----------<---------/                       |
1090	              |                                             |
1091	              |  SDP/SAP                                    |
1092	              |-------------------------------------------->|
1093	              |                                             |
1094	   User A     |                                             |
1095	   starts     |    RTP                                      |
1096	   sending    |===========>                                 |
1097	              |    RTCP                                     |
1098	              |----------->                                 |
1099	              |                                             |
1100	              |    RTP                                      |
1101	              |===========>                                 |
1102	              |                                             |
1103	              |    RTP                                      |User B
1104	              |===========>                         IGMP    |joins
1105	              |                           IGMP /--<---------|conference
1106	              |                 IGMP /-<------/             |
1107	              |-----------<---------/                       |User's App
1108	              |                                     RTCP    |Sends RTCP
1109	              |    RTP                         /--<---------|Session
1110	              |===============================/============>|Message
1111	              |<-----------------------------/              |
1112	              |    RSVP Path Message                        |
1113	              |-------------------------------------------->|
1114	              |                                             |User's App
1115	              |    RTP                          /-----------|makes
1116	              |================================/===========>|reservation
1117	              |    RSVP RESV Message    /-----/             |
1118	              |<-----------------------/                    |
1119	              |                                             |
1120	              |    RTP                                      |Quality
1121	              |============================================>|of Service
1122	              |                                             |improves

1079	   While private-key encryption is sufficient to exclude non-members
1124	   from sending or receiving multicast conference traffic, it does mean
1125	   that all members of a session are equal.  This is normally acceptable
1126	   for multi-way conferences, but will not be acceptable for many
1127	   broadcasters who require the ability to ensure that only they can
1128	   send, perhaps in addition to ensuring that only their paid customers
1129	   can receive.  This is nicely illustrated by the multicast of the
1130	   Rolling Stones concert in 1994 which was billed as being the first
1131	   live concert on the Mbone.  In fact, this honour goes to a little
1132	   known band called Severe Tire Damage who had multicast an impromptu
1133	   concert a year previously.  To make their point, just before the
1134	   Stones were due to go on stage, Severe Tire Damage suddenly started
1135	   broadcasting one of their songs live to the same multicast group.
1136	   Clearly commercial broadcasters want to avoid occurrences like this
1137	   one.

1139	   Such secured broadcasts can be performed by digitally signing each
1140	   packet, that is, encrypting a hash of the packet with the sender's
1141	   private key from a public-private key pair.  The public key is then
1142	   given to the receivers, and they discard (and prune if possible) any
1143	   packets that are unsigned or whose signature does not verify.  The
1144	   problem with this is that even encrypting a 128 bit hash with a
1145	   public key algorithm can be relatively expensive to perform at the
1146	   high packet rates sometimes seen with video.  The use of public-key
1147	   cryptography for this purpose has not yet been standardized, but some
1148	   such mechanism will clearly be needed before the Mbone becomes an
1149	   acceptable environment for commercial broadcasters.
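
	   A minimal sketch of per-packet signing follows, assuming textbook
	   RSA over an MD5 hash.  The key pair is built from two small Mersenne
	   primes purely so that the example is self-contained; it offers no
	   real security, and the names sign() and accept() are hypothetical.
	   A real deployment would use a vetted cryptographic library and a
	   proper signature scheme.

```python
import hashlib

# Toy RSA key pair (assumption for illustration only; far too small to be
# secure, and textbook RSA without padding is not a safe signature scheme).
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                                # public modulus
E = 65537                                # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))        # private exponent (Python 3.8+)


def packet_hash(packet: bytes) -> int:
    """128 bit hash of the packet, as an integer smaller than N."""
    return int.from_bytes(hashlib.md5(packet).digest(), "big")


def sign(packet: bytes) -> int:
    """Sender: encrypt the packet hash with the private key."""
    return pow(packet_hash(packet), D, N)


def accept(packet: bytes, signature) -> bool:
    """Receiver: keep the packet only if its signature verifies."""
    if signature is None:                # unsigned packet: discard
        return False
    return pow(signature, E, N) == packet_hash(packet)


sig = sign(b"frame 1")
```

	   Note that sign() performs one private-key operation per packet,
	   which is the per-packet cost the text identifies as a concern at
	   video packet rates.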

1151	9.  Summary

1153	   This document is an attempt to gather together in one place the set
1154	   of assumptions behind the design of the Internet Multimedia
1155	   Conferencing architecture, and the services that are provided to
1156	   support it.

1158	   Figure 6 shows an example time sequence involved in setting up a
1159	   light-weight session between two sites.  In this case, site A creates
1160	   a session advertisement, and some time later starts sending a media
1161	   stream even though there may be no receiver at that time.  Some time
1162	   later, site B joins the session (the multicast routing protocol here
1163	   is PIM), and starts to receive the traffic.  At the earliest
1164	   opportunity site B also makes an RSVP reservation to ensure the flow
1165	   quality is satisfactory.  This example should be taken as
1166	   illustrative only -- there are different ways to join sessions, and
1167	   different ways to get improved quality of service.

1169	   The lightweight sessions model for Internet multimedia conferencing
1170	   may not be appropriate for all conferences, but for those sessions
1171	   that do not require tightly-coupled conference control, it provides
1172	   an elegant style of conferencing that scales from two participants to
1173	   millions of participants.  It achieves this scaling by virtue of the
1174	   way that multicast routing is receiver driven, keeping essential
1175	   information about receivers local to those receivers.  Each new
1176	   participant only adds state close to them in the network.  It also
1177	   scales by not requiring explicit conference join mechanisms; if
1178	   everyone needed to know exactly who is in the session at any time,
1179	   scaling would be severely affected.  RTCP provides membership
1180	   information that is accurate when the group is small, and
1181	   increasingly only a statistical representation of the membership as
1182	   the group grows.  Security is handled through the use of encryption
1183	   rather than through the control of data distribution.
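
	   One way to see why RTCP membership becomes statistical is to look at
	   how often each member can report when control traffic is held to a
	   fixed share of the session bandwidth.  The sketch below is a
	   simplification under stated assumptions: the 5% share and 5 second
	   minimum follow the RTP specification [28], while the function name,
	   the 100 octet average packet size, and the omission of the
	   sender/receiver split and of randomisation are choices made for this
	   example.

```python
def rtcp_report_interval(members: int,
                         session_bw_octets_per_s: float,
                         avg_packet_octets: float = 100.0,
                         rtcp_fraction: float = 0.05,
                         min_interval_s: float = 5.0) -> float:
    """Approximate seconds between RTCP reports from one member.

    All members share rtcp_fraction of the session bandwidth, so the
    interval grows linearly with the number of members.
    """
    rtcp_bw = session_bw_octets_per_s * rtcp_fraction
    return max(min_interval_s, members * avg_packet_octets / rtcp_bw)


small = rtcp_report_interval(members=4, session_bw_octets_per_s=16000)
large = rtcp_report_interval(members=100000, session_bw_octets_per_s=16000)
```

	   With four members each report arrives every few seconds, so
	   everyone's view of the membership is current; with a hundred
	   thousand members a given member reports only once every few hours,
	   so at any moment the reports heard are a sample of the group.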

1185	   For those that require tightly coupled conferences, solutions such as
1186	   H.323 are emerging as well.

1188	   Many parts of this architecture are still incomplete and remain
1189	   the subject of active research.  In particular,
1190	   differentiated services for better-than-best-effort service show
1191	   great promise to provide a more scalable alternative to individual
1192	   reservations.  Multicast routing scales well to large groups, but
1193	   scales less well to large numbers of groups; we expect this will
1194	   become the subject of significant research over the next few years.
1195	   Multicast congestion control mechanisms are still a research topic,
1196	   although in the last year several schemes have emerged that show
1197	   promise.  Layered codecs show great promise to allow conferences to
1198	   scale in the face of heterogeneity, but the join and leave mechanisms
1199	   that allow them to perform receiver-based congestion control are
1200	   still being examined.  We have several working examples of reliable-
1201	   multicast-based shared applications; the next few years should see
1202	   the start of standardization work in this area as appropriate
1203	   multicast congestion control mechanisms emerge.  Finally a complete
1204	   security architecture for conferencing would be very desirable;
1205	   currently we have many parts of the solution, but are still waiting
1206	   for an appropriate key-distribution architecture to emerge from the
1207	   security research community.

1209	   The Internet Multimedia Conferencing architecture and the Mbone have
1210	   come a long way from their early beginnings on the DARTnet testbed in
1211	   1992.  The picture is not yet finished, but it has now taken shape
1212	   sufficiently that we can see the form it will take.  Whether or not
1213	   the Internet does evolve into the single communications network that
1214	   is used for most telephone, television, and other person-to-person
1215	   communication, only time will tell.  However, we believe that it is
1216	   becoming clear that if the industry decides that this should be the
1217	   case, the Internet should be up to the task.

1219	10.  Acknowledgments

1221	   Acknowledgments are due to the End-to-End Research Group, the Int-
1222	   serv, RSVP, MMUSIC and AVT working groups of the IETF, and discussion
1223	   with colleagues at UCL.  The earliest clear exposition of some of the
1224	   ideas here was presented at ACM SIGCOMM 1994 in London by Van
1225	   Jacobson.

1227	11.  Authors' Addresses

1229	   Mark Handley
1230	   AT&T Center for Internet Research at ICSI
1231	   1947 Center St, Suite 600
1232	   Berkeley, CA 94704
1233	   Email: mjh@aciri.org

1235	   Jon Crowcroft
1236	   Department of Computer Science
1237	   University College London
1238	   Gower Street,
1239	   London WC1E 6BT, UK.
1240	   Email: j.crowcroft@cs.ucl.ac.uk

1242	   Carsten Bormann, Joerg Ott
1243	   Universitaet Bremen TZI
1244	   Postfach 330440
1245	   D-28334 Bremen, GERMANY.
1246	   Email: cabo@tzi.org, jo@tzi.org

1248	References

1250	   [1]  A. Ballardie, P. Francis, J. Crowcroft, ``An Architecture for
1251	        Scalable Inter-Domain Multicast Routing'', ACM SIGCOMM 1993, pp
1252	        85-95.

1254	   [2]  C. Bormann, ``Providing integrated services over low-bitrate
1255	        links'', RFC 2689, September 1999.

1257	   [3]  CCITT (Consultative Committee on International Telegraphy and
1258	        Telephony). ``Recommendation X.509: The Directory --
1259	        Authentication Framework.'' 1988.

1261	   [4]  S. Deering, C. Partridge, D. Waitzman, ``Distance Vector
1262	        Multicast Routing Protocol'', RFC 1075,  Nov 1988.

1264	   [5]  Steve Deering, ``Multicast Routing in Internetworks and Extended
1265	        LANs'', ACM SIGCOMM 88, August 1988, pp 55-64 and ``Host
1266	        Extensions for IP Multicasting'', RFC 1112.

1268	   [6]  S. Deering, D. Estrin, D. Farinacci, V. Jacobson, C-G. Liu, L.
1269	        Wei ``An Architecture for Wide Area Multicast Routing'' ACM
1270	        SIGCOMM 1994, pp 126-135.

1272	   [7]  Estrin, Farinacci, Helmy, Thaler, Deering, Handley, Jacobson,
1273	        Liu, Sharma, Wei, ``Protocol Independent Multicast-Sparse Mode
1274	        (PIM-SM): Protocol Specification'', RFC 2362.

1276	   [8]  S. Floyd, V. Jacobson, S. McCanne, C-G. Liu, L. Zhang, ``A
1277	        Reliable Multicast Framework for Light-weight Sessions and
1278	        Application Level Framing'' ACM SIGCOMM 1995, pp 342-356.

1280	   [9]  M. Handley, V. Jacobson, ``SDP: Session Description Protocol''
1281	        INTERNET-DRAFT, Dec 1997.

1283	   [10] M. Handley, D. Thaler, D. Estrin, ``The Internet Multicast
1284	        Address Allocation Architecture'', INTERNET-DRAFT, Dec 1997.

1286	   [11] M. Handley, J. Crowcroft, ``Network Text Editor (NTE): A
1287	        scalable shared text editor for the MBone'', ACM SIGCOMM 1997.

1289	   [12] V. Hardman, A. Sasse, M. Handley, A. Watson, ``Reliable Audio
1290	        for Use over the Internet'' Proc INET '95, Hawaii, Internet
1291	        Society, Reston, VA, 1995.

1293	   [13] IRTF Research Group on Reliable Multicast,
1294	        http://www.east.isi.edu/RMRG/

1296	   [14] ITU ``Recommendation H.320: Narrow-band visual telephone systems
1297	        and terminal equipment'', ITU, Geneva, 1997

1299	   [15] ITU ``Recommendation T.124 -- Generic Conference Control'', ITU,
1300	        Geneva.

1302	   [16] ITU ``Recommendation H.323: Visual telephone systems and
1303	        equipment for local area networks which provide a non guaranteed
1304	        quality of service'', ITU, Geneva, 1996

1306	   [17] ITU ``Recommendation H.332: H.323 Extended for Loosely-Coupled
1307	        conferences'', ITU, Geneva

1309	   [18] C. Bormann, J. Ott, C. Reichert, ``Simple Conference Control
1310	        Protocol'' INTERNET-DRAFT, June 1996.

1312	   [19] V. Jacobson, ``Congestion Avoidance and Control'', ACM SIGCOMM
1313	        1988.

1315	   [20] J. Linn, ``Privacy Enhancement for Internet Electronic Mail:
1316	        Part I: Message Encryption and Authentication Procedures'', RFC
1317	        1421, Feb 1993

1319	   [21] S. McCanne, V. Jacobson and M. Vetterli, ``Receiver-driven
1320	        Layered Multicast''. ACM SIGCOMM 1996, pp. 117-130.

1322	   [22] S. McCanne, M. Vetterli, ``Joint Source/Channel Coding for
1323	        Multicast Packet Video''. Proceedings of the IEEE International
1324	        Conference on Image Processing. October, 1995. Washington, DC.

1326	   [23] J. Moy, ``Multicast Extensions to OSPF'', RFC 1584, March 1994.

1328	   [24] National Institute of Standards and Technology (NIST), ``FIPS
1329	        Publication 46-1: Data Encryption Standard'', January 22, 1988

1331	   [25] Rivest, R., ``The MD5 Message-Digest Algorithm'', RFC 1321, MIT
1332	        Laboratory for Computer Science and RSA Data Security, Inc.,
1333	        April 1992

1335	   [26] Schooler, E., ``A Distributed Architecture for Multimedia
1336	        Conference Control'', ISI Research Report ISI/RR-91-289,
1337	        November 1991.  ftp://ftp.isi.edu/pub/hpcc-papers/mmc/mmcc.ps

1339	   [27] Schulzrinne, H., ``Personal Mobility for Multimedia Services in
1340	        the Internet'' IMDS'96, March 4-6 1996.
1341	        ftp://ftp.fokus.gmd.de/pub/step/papers/Schu9603:Personal.ps.gz

1343	   [28] H. Schulzrinne, S. Casner, R. Frederick and V. Jacobson ``RTP: A
1344	        Transport Protocol for Real-Time Applications'' RFC 1889.

1346	   [29] D. Thaler, D. Estrin, D. Meyer, ``Border Gateway Multicast
1347	        Protocol'', INTERNET-DRAFT, Oct 1997.