INTERNET-DRAFT                 M. Handley/J. Crowcroft/C. Bormann/J. Ott
Expires: January 2001                                  ACIRI/UCL/TZI/TZI
                                                               July 2000

           The Internet Multimedia Conferencing Architecture
                     draft-ietf-mmusic-confarch-03

Status of this memo

This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC 2026.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as
Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html

This document is a product of the Multiparty Multimedia Session
Control (MMUSIC) working group of the Internet Engineering Task
Force. Comments are solicited and should be addressed to the working
group's mailing list at confctrl@isi.edu and/or the authors.

Abstract

This document provides an overview of multimedia conferencing on the
Internet. The protocols mentioned are specified elsewhere as RFCs,
Internet-Drafts, or ITU recommendations. Each of these
specifications gives details of the protocol itself, how it works and
what it does. This document attempts to provide the reader with an
overview of how the components fit together and of some of the
assumptions made, as well as some statement of direction for those
components still in a nascent stage.

(Remove before publication:) This document is a product of the
Multiparty Multimedia Session Control (MMUSIC) working group of the
Internet Engineering Task Force.
Comments are solicited and should
be addressed to the working group's mailing list at confctrl@isi.edu
and/or the authors.

1. Introduction

The Internet is not currently very good at carrying audio and video.
This is hardly surprising as it was not designed or engineered with
real-time traffic in mind, but there has recently been a great deal
of interest in using the Internet for telephony services. Part of
this has come from pricing anomalies that make internet telephony
somewhat artificially cheaper than traditional telephone services,
but this is not the whole story. The Internet itself is improving to
better handle traffic such as audio and video, and in the medium
term, the internet should be able to provide good quality realtime
multimedia services, although such quality improvements are likely to
incur additional charges.

However, the real interest in using the internet for audio and video
should come from the prospect for a single ubiquitous communications
network that not only allows traditional telephony services, but also
video, shared collaboration tools, and through IP Multicast,
multi-party conferences and multimedia sessions that scale from small
group meetings through to television sized audiences. In principle,
this may lead to a ``democratization'' of telecommunication services,
where licenses to broadcast are not required to control physical
access to the limited broadcast medium (although they may still be
required for political reasons).

It is far from clear what services will eventually emerge using such
communications capabilities. We can only say that the technical
capability to have large numbers of sessions ranging in audience from
hundreds to millions of participants, largely unlimited by geographic
boundaries, will lead to services and social structures that do not
exist today.
However, we can describe the basic technologies that
are likely to bring about such changes, and in this document we
attempt to provide such an overview. We leave it to the reader to
imagine the uses to which this technology will be put.

2. The Technology

In conjunction with computers, the term ``conferencing'' is often
used in two different ways: firstly, to refer to bulletin boards and
mail list style asynchronous exchanges of messages between multiple
users; secondly, to refer to synchronous or so-called ``real-time''
conferencing, including audio/video communication and shared tools
such as whiteboards and other applications. This document is about
the architecture for this latter application, multimedia conferencing
in an Internet environment.

There are other infrastructures for teleconferencing in the world:
POTS (Plain Old Telephone System) networks often provide voice
conferencing and phone-bridges, while with ISDN, H.320 [14] can be
used for small, strictly organized video-telephony conferencing.

The architecture that has evolved in the Internet is far more general
as well as being scalable to very large groups, and permits the open
introduction of new media and new applications as they are devised.
As the simplest case, it also allows two persons to communicate via
audio only, so it encompasses IP telephony.

The determining factors of a conferencing architecture are
communication within (possibly large) groups of humans and real-time
delivery of information. In the Internet, this is supported at a
number of levels. The remainder of this section provides an overview
of this support, and the rest of the document describes each aspect
in more detail.

In a conference, information must be distributed to all the
conference participants.
Early conferencing systems used a fan-out
of data streams, e.g., one connection between each pair of
participants, which means that the same information must cross some
networks more than once. The Internet architecture uses the more
efficient approach of multicasting the information to all
participants (see section 3).

Multimedia conferences require real-time delivery of at least the
audio and video information streams used in the conference. In an
ISDN context, fixed rate circuits are allocated for this purpose --
whether their bandwidth is required at any particular instance or
not. On the other hand, the traditional Internet service model
(``best effort'') cannot make the necessary quality of service
available in congested networks. New service models are being
defined in the Internet together with protocols to reserve capacity
or prioritize traffic in a more flexible way than that available with
circuit switching (see section 4).

In a datagram network, multimedia information must be transmitted in
packets, some of which may be delayed more than others. In order
that audio and video streams be played out at the recipient in the
correct timing, information must be transmitted that allows the
recipient to reconstitute the timing. A transport protocol with the
specific functions needed for this has been defined (see section 5).

The nature of the Internet reflects that of the world in that it is
very heterogeneous. Techniques exist to exploit this, and to deliver
appropriate quality to different participants in the same conference
according to their capabilities.

Conference tools such as virtual whiteboards or shared editors are
not concerned with real-time delivery of audio or video but maintain
and update shared state between the participants. Work on support of
such applications in a multicast environment is in progress (section
6.2).
The humans participating in a conference generally need to have a
specific idea of the context in which the conference is happening,
which can be formalized as a conference policy. Some conferences are
essentially crowds gathered around an attraction, while others have
very formal guidelines on who may take part (listen in) and who may
speak at which point. In any case, initially the participants must
find each other, i.e. establish communication relationships
(conference setup, section 7). During the conference, some
conference control information is exchanged to implement a conference
policy or at least to inform the crowd of who is present (section 6).

In addition, security measures may be required to actually enforce
the conference policy, e.g. to control who is listening and to
authenticate contributions as actually originating from a specific
person. In the Internet, there is little tendency to rely on the
traditional ``security'' of distribution offered e.g. by the phone
system. Instead, cryptographic methods are used for encryption and
authentication, which need to be supported by additional conference
setup and control mechanisms (section 8).

Figure 1: Internet multimedia conferencing protocol stacks

|     <--- Conference Management --->     |<--- Media Agents ---> |
|                                         |                       |
|       Conference        |  Conference   |  Audio/  |   Shared   |
|    Setup & Discovery    | Course Control|  Video   |Applications|

+-------------------------+------+--------+-+--------+------------+  +
|          S D P          |      | Distr. |  RTP /   |  Reliable  |  |
| SAP | SIP | HTTP | SMTP | RSVP | Ctrl(1)|  RTCP    |Multicast(2)|  |
+-----+--+--+------+------+   +--+--------+----------+------------+--+
|  UDP   |     T C P      |   |                U D P                 |
+--------+----------------+---+--------------------------------------+
|                         IP + IP Multicast                          |
+--------------------------------------------------------------------+
|                   Integrated Services Forwarding                   |
+--------------------------------------------------------------------+

The protocol stacks for internet multimedia conferencing are
illustrated in Figure 1. Most of the protocols are not deeply
layered unlike many protocol stacks, but rather are used alongside
each other to produce a complete conference.

3. Multicast Traffic Distribution

+--------------+------------------+-----------------------------------+
|Protocol      | Documentation    | Purpose                           |
+--------------+------------------+-----------------------------------+
|IP Multicast  | RFC 1112, 2236   | Host extensions for IP Multicast  |
|              |                  | Multicast routing protocols:      |
|DVMRP         | RFC 1075         | Dense-mode Intra-domain           |
|PIM-SM        | RFC 2362         | Sparse-mode Intra-domain          |
|PIM-DM        | Internet Draft   | Dense-mode Intra-domain           |
|CBT           | RFC 2189         | Sparse-mode Intra-domain          |
+--------------+------------------+-----------------------------------+

IP multicast provides efficient many-to-many data distribution in an
internet environment. It is easy to view IP multicast as simply an
optimization for data distribution, and indeed this is the case, but
IP multicast can also result in a different way of thinking about
application design.
To see why this might be the case, examine the
IP multicast service model, as described by Van Jacobson [8]:

- Senders just send to the group

- Receivers express an interest in receiving data sent to the
  group

- Routers conspire to deliver data from senders to receivers

With IP multicast, the group is indirectly identified by a single IP
class-D multicast address.

Several things are important about this service model from an
architectural point of view. Receivers do not need to know who or
where the senders are to receive traffic from them. Senders never
need to know who the receivers are. Neither senders nor receivers
need care about the network topology as the network optimizes
delivery.

An IP multicast group is scalable because information about group
membership and group changes at the IP level is kept local to
routers near the relevant members. How this is performed depends on
the particular multicast routing scheme in use local to the member,
and although it is not a trivial task, several solutions do exist and
therefore multicast routing will not be discussed in detail here.
For more detailed information on multicast routing, see [7, 6, 4, 1,
23]. Typically, as a group with s senders and r receivers increases
in size, state in routers scales O(s) or O(1) depending on the
routing scheme in use. This state may be in on-tree routers for
newer so-called sparse-mode algorithms such as PIM, or in off-tree
routers for older so-called dense-mode algorithms such as DVMRP.
Thus the most scalable current multicast routing algorithms require
O(1) state in on-tree routers, and hence the total routing state
scales O(g) in a router that is on-tree for g groups. We can also
envisage multicast routing schemes which require less than O(g)
state*, but the requirement is not currently urgent, so none of these
have yet been implemented.
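As an illustration only, the service model described in this section
(senders just send to the group, receivers express an interest) could
map onto the BSD sockets interface roughly as sketched below. The
group address 239.1.2.3, the port 5004 and all function names are
assumptions made for this example and do not come from any of the
referenced specifications.

```python
import ipaddress
import socket
import struct

GROUP = "239.1.2.3"  # hypothetical group address, chosen for illustration
PORT = 5004          # hypothetical UDP port, chosen for illustration


def is_multicast_group(address: str) -> bool:
    """Return True if the address is an IPv4 class D (multicast) address."""
    return ipaddress.IPv4Address(address).is_multicast


def membership_request(group: str, interface: str = "0.0.0.0") -> bytes:
    """Build the 8-byte ip_mreq value used with IP_ADD_MEMBERSHIP."""
    return struct.pack("4s4s", socket.inet_aton(group),
                       socket.inet_aton(interface))


def open_receiver(group: str, port: int) -> socket.socket:
    """Receiver side: express an interest in the group.

    No sender is named anywhere; the join is handled by the local
    router, not by any data source.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM,
                         socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group))
    return sock


def send_to_group(payload: bytes, group: str, port: int, ttl: int = 1) -> None:
    """Sender side: just send to the group; no receiver list is kept."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM,
                       socket.IPPROTO_UDP) as sock:
        # The TTL limits how far the datagram propagates (1 = local network).
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL,
                        struct.pack("b", ttl))
        sock.sendto(payload, (group, port))
```

A receiver would call open_receiver(GROUP, PORT) and then recvfrom()
on the returned socket, while any host may call
send_to_group(b"hello", GROUP, PORT) without first joining the group.
Whether the datagram is actually delivered depends on multicast
support in the local network.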
The level of indirection introduced by the IP class D address
denominating the group solves the distributed systems binding
problem, by pushing this task down into routing; given a multicast
address (and UDP port), a host can send a message to the members of a
group without needing to discover who they are. Similarly receivers
can ``tune in'' to multicast data sources without needing to bother
the data source itself with any form of request.

IP multicast is a natural solution for multi-party conferencing
because of the efficiency of the data distribution trees, with data
being replicated in the network at appropriate points rather than in
end-systems. It also avoids the need to configure special-purpose
servers to support the session, which require support, and which
cause traffic concentration and can be a bottleneck. For larger
broadcast-style sessions, it is essential that data-replication be
carried out in a way that only requires per-receiver network-state to
be local to each receiver, and that data-replication occurs within
the network. Attempting to configure a tree of application-specific
replication servers for such broadcasts rapidly becomes a ``multicast
routing'' problem, and thus native multicast support is a more
appropriate solution.

3.1. Address Allocation

+----------+-------------------+----------------------------------+
|Protocol  | Documentation     | Purpose                          |
+----------+-------------------+----------------------------------+
|MADCAP    | Internet Draft    | DHCP-like client protocol        |
|          |                   | for address allocation           |
|AAP       | Internet Draft    | Intra-domain address allocation  |
|MASC      | Internet Draft    | Inter-domain address allocation  |
|BGMP      | Internet Draft    | Inter-domain multicast routing   |
+----------+-------------------+----------------------------------+

How does an application choose a multicast address to use?
In the absence of any other information, we can bootstrap a multicast
application by using well-known multicast addresses. Routing
(unicast and multicast) and group membership protocols [5] can do
just that. However, this is not the best way of managing
applications of which there is more than one instance at any one
time.

_________________________
* With IP encapsulation, not all on-tree routers need hold the
state for a group whose traffic they are forwarding -- traffic for
the group can be encapsulated (either unicast or multicast)
between on-tree routers nearer the edge of the network, reducing
some of the state burden on backbone routers.

For these, we need a mechanism for allocating group addresses
dynamically, and a directory service which can hold these allocations
together with some key (session information for example --- see
later), so that users can look up the address associated with the
application. The address allocation and directory functions should
be distributed to scale well.

Multicast address allocation is currently an active area of research.
For many years multicast address allocation has been performed using
multicast session directories (see section 7.1), but as the users and
uses of IP multicast increase, it is becoming clear that a more
hierarchical approach is required.

An architecture [10] is currently being developed based around a
well-defined API that an application can use to request an address.
The host then requests an address from a local address allocation
server, which in turn chooses and reserves an unallocated address
from a range dynamically allocated to the domain.
By allocating
addresses in a hierarchical and topologically sensitive fashion, the
address itself can be used in a hierarchical multicast routing
protocol currently being developed (BGMP, [29]) that will help
multicast routing scale more gracefully than current schemes.

4. Internet Service Models

+----------+----------------+--------------------+
|Protocol  | Documentation  | Purpose            |
+----------+----------------+--------------------+
|IP        | RFC 791        | Internet Protocol  |
+----------+----------------+--------------------+

Traditionally the internet has provided so-called best-effort
delivery of datagram traffic from senders to receivers. No
guarantees are made regarding when or if a datagram will be delivered
to a receiver; however, datagrams are normally only dropped when a
router exceeds a queue size limit due to congestion. The best-effort
internet service model does not assume FIFO queuing, although many
routers have implemented this.

With best-effort service, if a link is not congested, queues will not
build at routers, datagrams will not be discarded in routers, and
delays will consist of serialization delays at each hop plus
propagation delays. With sufficiently fast link speeds,
serialization delays are insignificant compared to propagation
delays*.

_________________________
* For slow links, a set of mechanisms has been defined that
helps minimize serialization and link access delays [2].

If a link is congested, with best-effort service, queuing delays will
start to influence end-to-end delays, and packets will start to be
lost as queue size limits are exceeded. Real-time traffic does not
cope terribly well with packet loss levels of more than a few
percent, although it is possible to add redundancy [12] to increase
the levels at which loss becomes a problem. In the last few years a
significant amount of work has also gone into providing
non-best-effort services that would provide a better assurance that an
acceptable quality conference will be possible.

4.1. Non-best effort service

Real-time internet traffic is defined as datagrams that are delay
sensitive. It could be argued that all datagrams are delay sensitive
to some extent, but for these purposes we refer only to datagrams
where exceeding an end-to-end delay bound of a few hundred
milliseconds renders the datagrams useless for the purpose they were
intended. For the purposes of this definition, TCP traffic is
normally not considered to be real-time traffic, although there may
be exceptions to this rule.

On congested links, best-effort service queuing delays will adversely
affect real-time traffic. This does not mean that best-effort
service cannot support real-time traffic --- merely that congested
best-effort links seriously degrade the service provided. For such
congested links, a better-than-best-effort service is desirable.

To achieve this, the service model of the routers can be modified.
FIFO queuing can be replaced by packet forwarding strategies that
discriminate different ``flows'' of traffic. The idea of a flow is
very general. A flow might consist of ``all marketing site web
traffic'', or ``all fileserver traffic to and from teller machines''.
On the other hand, a flow might consist of a particular sequence of
packets from an application in a particular machine to a peer
application in another particular machine set up on request, or it
might consist of all packets marked with a particular Type-of-Service
bit.

There is really a spectrum of possibilities for non-best-effort
service something like that shown in Figure 2.

Figure 2: Spectrum of internet service types

best effort                 assured by                  guaranteed by
unsignalled                 type of service             per-flow reservation
+-------------+-------------+-------------+-------------+
              prioritized by              assured by
              type of service             aggregate reservation

This spectrum is intended to illustrate that between best-effort, and
hard per-flow guarantees lie many possibilities for non-best-effort
service, including having hard guarantees based on an aggregate
reservation, assurances that traffic marked with a particular
type-of-service bit will not be dropped so long as it remains in
profile, and simpler prioritization-based services.

Towards the right hand side of the spectrum, flows are typically
identifiable in the Internet by the tuple: source machine,
destination machine, source port, destination port, protocol, any of
which could be ``ANY'' (wildcarded).

In the multicast case, the destination is the group, and can be used
to provide efficient aggregation.

Flow identification is called classification and a class (which can
contain one or more flows) has an associated service model applied.
This can default to best effort.

Through network management, we can imagine establishing classes of
long lived flows -- enterprise networks (``Intranets'') often enforce
traffic policies that distinguish priorities which can be used to
discriminate in favor of more important traffic in the event of
overload (though in an underloaded network, the effect of such
policies will be invisible, and may incur no load/work in routers).

The router service model to provide such classes with different
treatment can be as simple as a priority queuing system, or it can be
more elaborate.
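To make the classification step concrete, the following minimal
sketch matches packets against wildcarded five-tuples and falls back
to best effort when no rule applies. The names FlowSpec, Packet and
classify, the class label ``real-time'', and the representation of
``ANY'' as None are all assumptions made for this illustration; no
router implementation is being described.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

ANY = None  # wildcard marker; representing ANY as None is an assumption


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    sport: int
    dport: int
    proto: str


@dataclass(frozen=True)
class FlowSpec:
    """A five-tuple in which any field may be wildcarded with ANY."""
    src: Optional[str] = ANY
    dst: Optional[str] = ANY
    sport: Optional[int] = ANY
    dport: Optional[int] = ANY
    proto: Optional[str] = ANY

    def matches(self, packet: Packet) -> bool:
        pairs = (
            (self.src, packet.src),
            (self.dst, packet.dst),
            (self.sport, packet.sport),
            (self.dport, packet.dport),
            (self.proto, packet.proto),
        )
        # A field matches if it is wildcarded or equal to the packet's value.
        return all(want is ANY or want == got for want, got in pairs)


def classify(packet: Packet,
             rules: List[Tuple[FlowSpec, str]],
             default: str = "best-effort") -> str:
    """Return the class of the first matching rule, else the default."""
    for spec, service_class in rules:
        if spec.matches(packet):
            return service_class
    return default


# In the multicast case the destination is the group, so a single rule
# aggregates the traffic of every sender to that group.
RULES = [(FlowSpec(dst="239.1.2.3", proto="udp"), "real-time")]
```

With these rules, a UDP packet from any source to 239.1.2.3 is placed
in the ``real-time'' class, and everything else defaults to best
effort. A priority queuing system would then serve the ``real-time''
class ahead of the default class.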
Although best-effort services can support real-time traffic,
classifying real-time traffic separately from non-real-time traffic
and giving real-time traffic priority treatment ensures that
real-time traffic sees minimum delays. Non-real-time TCP traffic tends
to be elastic in its bandwidth requirements, and will then tend to
fill any remaining bandwidth.

We could imagine a future Internet with sufficient capacity to carry
all of the world's telephony traffic. Since this is a relatively
modest capacity requirement, it might be simpler to establish
``POTS'' as a static class which is given some fraction of the
capacity overall, and then within the backbone of the network no
individual call need be given an allocation (i.e. we would no longer
need the call setup/tear down that was needed in the legacy POTS
which was only present due to under-provisioning of trunks, and to
allow the trunk exchanges the option of call blocking). The vision
is of a network that is engineered with capacity for all of the
non-best-effort average load sources to send without needing
individual reservations.

4.2. Reservations

+-----------------+----------------+---------------------------------------+
|Protocol         | Documentation  | Purpose                               |
+-----------------+----------------+---------------------------------------+
|RSVP             | RFC 2205       | Resource ReSerVation Protocol (RSVP)  |
|Controlled Load  | RFC 2211       | Network service model                 |
|  Service        |                | selected by RSVP                      |
|Guaranteed       | RFC 2212       | Network service model                 |
|  Service        |                | selected by RSVP                      |
+-----------------+----------------+---------------------------------------+

For flows that may take a significant fraction of the network (i.e.
are ``special'' and can't just be lumped under a static class), we
need a more dynamic way of establishing these classifications.
In
the short term, this applies to many multimedia calls since the
Internet is largely under-provisioned at the time of writing.

RSVP has been standardized for just this purpose. It provides flow
identification and classification. Hosts and applications are
modified to speak RSVP client language, and routers speak RSVP.

Since most traffic requiring reservations is delivered to groups
(e.g. TV), it is natural for the receiver to make the request for a
reservation for a flow. This has the added advantage that different
receivers can make heterogeneous requests for capacity from the same
source. Thus RSVP can accommodate monochrome, color and HDTV
receivers from a single source (also see Figure 5).

Again the routers conspire to deliver the right flows to the right
locations.

RSVP accommodates the wildcarding noted above.

Admission Control

If a network is provisioned such that it has excess capacity for all
the real-time flows using it, a simple priority classification
ensures that real-time traffic is minimally delayed. However, if a
network is insufficiently provisioned for the traffic in a real-time
traffic class, then real-time traffic will be queued, and delays and
packet loss will result. Thus in an under-provisioned network,
either all real-time flows will suffer, or some of them must be given
priority.

RSVP provides a mechanism by which an admission control request can
be made, and if sufficient capacity remains in the requested traffic
class, then a reservation for that capacity can be put in place.

If insufficient capacity remains, the admission request will be
refused, but the traffic will still be forwarded with the default
service for that traffic's traffic class.
In many cases even an 501 admission request that failed at one or more routers can still supply 502 acceptable quality as it may have succeeded in installing a 503 reservation in all the routers that were suffering congestion. This 504 is because other reservations may not be fully utilising their 505 reserved capacity in those routers where the reservation failed. 507 Billing 509 If a reservation involves setting aside resources for a flow, this 510 will tie up resources so that other reservations may not succeed, and 511 depending on whether the flow fills the reservation, other traffic is 512 prevented from using the network. Clearly some negative feedback is 513 required in order to prevent pointless reservations from denying 514 service to other users. This feedback is typically in the form of 515 billing. 517 Billing requires that the user making the reservation is properly 518 authenticated so that the correct user can be charged. Billing for 519 reservations introduces a level of complexity to the internet that 520 has not typically been experienced with non-reserved traffic, and 521 requires network providers to have reciprocal usage-based billing 522 arrangements for traffic carried between them. It also suggests the 523 use of mechanisms whereby some fraction of the bill for a link 524 reservation can be charged to each of the downstream multicast 525 receivers. 527 4.3. 
Differentiated Services 529 +-------------------------+----------------+------------------------+ 530 |Protocol | Documentation | Purpose | 531 +-------------------------+----------------+------------------------+ 532 |Differentiated Services | RFC 2474 | DS Field in IP Header | 533 |Differentiated Services | RFC 2475 | DS Architecture | 534 +-------------------------+----------------+------------------------+ 536 Whereas RSVP asks routers to classify packets into classes to achieve 537 a requested quality of service, it is also possible to explicitly 538 mark packets to indicate the type of service required. Of course, 539 there has to be an incentive and mechanisms to ensure that ``high- 540 priority'' is not set by everyone in all packets, and this incentive 541 is provided by edge-based policing and by buying profiles of higher 542 priority service. In this context, a profile could have many forms, 543 but a typical profile might be a token-bucket filter specifying a 544 mean rate and a bucket size with certain time-of-day restrictions. 546 This is still an active research area, but the general idea is for a 547 customer to buy from their provider a profile for higher quality 548 service, and the provider polices marked traffic from the site to 549 ensure that the profile is not exceeded. Within a provider's 550 network, routers give preferential service to packets marked with 551 the relevant type-of-service bit. Where providers peer, they arrange 552 for an aggregate higher-quality profile to be provided, and police 553 each other's aggregate if it exceeds the profile. In this way, 554 policing only needs to be performed at the edges of a provider's 555 network on the assumption that within the network there is sufficient 556 capacity to cope with the amount of higher-quality traffic that has 557 been sold. The remainder of the capacity can be filled with regular 558 best-effort traffic.
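As an illustration of the token-bucket profile mentioned above, the following is a minimal Python sketch of the conformance check an edge policer might apply. It is not taken from RFC 2474 or RFC 2475: the class name TokenBucket, its field names, and the choice of bytes per second as the rate unit are assumptions made for this example, and time-of-day restrictions are left out.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Hypothetical token-bucket profile: a mean rate plus a burst allowance."""
    rate_bytes_per_s: float   # mean rate of the purchased profile (assumed unit)
    bucket_bytes: float       # bucket size: the largest burst admitted at once
    tokens: float = 0.0       # tokens currently available, in bytes
    last_time: float = 0.0    # time of the previous check, in seconds

    def __post_init__(self):
        # Start with a full bucket so an initial burst is within profile.
        self.tokens = self.bucket_bytes

    def conforms(self, now: float, packet_bytes: int) -> bool:
        """Return True if a packet arriving at time `now` is within profile."""
        elapsed = now - self.last_time
        self.last_time = now
        # Refill at the mean rate, never beyond the bucket size.
        self.tokens = min(self.bucket_bytes,
                          self.tokens + elapsed * self.rate_bytes_per_s)
        if packet_bytes <= self.tokens:
            self.tokens -= packet_bytes
            return True
        # Out of profile: tokens are left untouched.
        return False
```

What happens to a packet that does not conform (dropping it, or forwarding it as best-effort) is a policy decision outside this sketch.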
560 One big advantage of differentiated services over reservations is 561 that routers do not need to keep per-flow state, or look at source 562 and destination addresses to classify the traffic, and this means 563 that routers can be considerably simpler. Another big advantage is 564 that the billing arrangements for differentiated services are 565 pairwise between providers at boundaries -- at no time does a 566 customer need to negotiate a billing arrangement with each provider 567 in the path*. 569 5. Transport Protocols 571 So-called real-time delivery of traffic requires little in the way of 572 transport protocol. In particular, real-time traffic that is sent 573 over more than trivial distances is not retransmittable. 575 With packet multimedia data there is no need for the different media 576 comprising a conference to be carried in the same packets. In fact 577 it simplifies receivers if different media streams are carried in 578 separate flows (i.e., separate transport ports and/or separate 579 multicast groups). This also allows the different media to be given 580 different quality of service. For example, under congestion, a 581 router might drop video packets in preference to audio packets. 582 In addition, some sites may not wish to receive all the media flows. 583 For example, a site with a slow access link may be able to 584 participate in a conference using only audio and a whiteboard whereas 585 other sites in the same conference with more capacity may also send 586 and receive video. This can be done because the video can be sent to 587 a different multicast group than the audio and whiteboard. This is 588 a first step towards coping with heterogeneity by allowing the 589 receivers to decide how much traffic to receive, and hence allowing a 590 conference to scale more gracefully. 592 5.1.
Receiver Adaptation and Synchronization 594 _________________________ 595 * With reservations there may be ways to avoid this too, but 596 they're somewhat more difficult given the more specific nature of 597 a reservation. 599 Figure 3: Network Jitter and Packet Audio 600 |x| 601 | | | 602 |x| | 603 | | | 604 Compression |x| v 605 + Packetizer | | 606 +--------+ +-------+ 607 Microphone | | 1.5 Mbit/s link | | 608 +-----+ A A A | |-----------------| Router| 609 /~\------+__| |>>> >>> >>>| |A A A A A| | 610 \_/------+ |A->D |20 ms Audio| |-----------------| | 611 +-----+Timeslices | | ---------> | | 612 | | +-------+ 613 +--------+ |A| 614 | | | 615 Shared link: |x| | 616 Audio traffic |A| | 617 interspersed w |A| | 618 other traffic |x| | 619 |x| | 620 |x| | 621 Depacketizer |A| | 622 + Timing recovery |A| v 623 + Decompression | | 624 +--------+ +-------+ 625 Speaker | | 1.5 Mbit/s link | | 626 |\ +-----+ A A A | |-----------------| Router| 627 | +---+ | |<<< <<< <<<| |A A AA A | | 628 | | |-----|D->A |20 ms Audio| |-----------------| | 629 | +---+ +-----+Timeslices | | <--------- | | 630 |/ | | +-------+ 631 +--------+ | | 632 |X| | 633 |X| | 634 | | | 635 |X| v 636 | | 638 Best-effort traffic is delayed by queues in routers between the 639 sender and the receivers. Even reserved priority traffic may see 640 small transient queues in routers, and so packets comprising a flow 641 will be delayed for different times. Such delay variance is known as 642 jitter, and is illustrated in Figure 3. 644 Real-time applications such as audio and video need to be able to 645 buffer real-time data at the receiver for sufficient time to remove 646 the jitter added by the network and recover the original timing 647 relationships between the media data. In order to know how long to 648 buffer for, each packet must carry a timestamp which gives the time 649 at the sender when the data was captured. 
Note that for audio and 650 video data timing recovery, it is not necessary to know the absolute 651 time that the data was captured at the sender, only the time relative 652 to the other data packets. 654 Figure 4: Inter-media synchronization 656 Incoming packets 658 ---------------- +----------------+ 659 V A V AV A --> | Host | 660 ---------------- | Demuxing | 661 +----------------+ 662 / \ 663 / \ 664 A A / A V V \ V 665 v v 666 +---------------+ +---------------+ 667 | Depacketizer | per source | Depacketizer | 668 +---------------+ delay adaptation: +---------------+ 669 v \ 45 ms 95 ms / v 670 +------------+ \ / +------------+ 671 | format | \ / | format | 672 | conversion | +------------------+ | conversion | 673 +------------+ | synchronization | +------------+ 674 | | agent | | 675 | +------------------+ | 676 | mix / \ | 677 v / \ v 678 | | / \ | | 679 +-----+ / \ +-----+ 680 | | / \ | | 681 +-----+ / \ +-----+ 682 | A | / 95 ms 95 ms \ | V | 683 +-----+ / \ +-----+ 684 | A | <--+ +-> | V | 685 +-----+ /| +--------+ +-----+ 686 | A | +---+ | |/------\| | V | 687 +-----+>>>>>| | | || ||<<<<<+-----+ 688 +---+ | |\------/| 689 \| +--------+ 691 As audio and video flows will experience differing jitter and possibly 692 differing quality of service, audio and video that were captured at 693 the same time at the sender may not arrive at the receiver at the 694 same time. At the receiver, each flow will need a playout buffer to 695 remove network jitter. Inter-flow synchronization can be performed 696 by adapting these playout buffers so that samples/frames that 697 originated at the same time are played out at the same time (see 698 Figure 4). This requires that the time base of different 699 flows from the same sender can be related at the receivers, e.g. by 700 making available the absolute times at which each of them was 701 captured. 703 5.2.
RTP 705 +-------------+----------------+--------------------------------------+ 706 |Protocol | Documentation | Purpose | 707 +-------------+----------------+--------------------------------------+ 708 |RTP,RTCP | RFC 1889 | packet format for realtime traffic | 709 |RTP Profile | RFC 1890 | specific RTP profile for AV traffic | 710 |RTP Payload | RFC 2032, | payload formats for specific codecs | 711 | Formats | 2035, etc | | 712 +-------------+----------------+--------------------------------------+ 714 The transport protocol for real-time flows is RTP [28]. This 715 provides a standard format packet header which gives media specific 716 timestamp data, as well as payload format information and sequence 717 numbering amongst other things. RTP is normally carried using UDP. 718 It does not provide or require any connection setup, nor does it 719 provide any enhanced reliability over UDP. For RTP to provide a 720 useful media flow, there must be sufficient capacity in the relevant 721 traffic class to accommodate the traffic. How this capacity is 722 ensured is independent of RTP. 724 Every original RTP source is identified by a source identifier, and 725 this source id is carried in every packet. RTP allows flows from 726 several sources to be mixed in gateways to provide a single resulting 727 flow. When this happens, each mixed packet contains the source IDs 728 of all the contributing sources. 730 RTP media timestamp units are flow specific --- they are in units 731 that are appropriate to the media flow. For example, 8kHz sampled 732 PCM encoded audio has a timestamp clock rate of 8kHz. This means 733 that inter-flow synchronization is not possible from the RTP 734 timestamps alone. 736 Each RTP flow is supplemented by Real-Time Control Protocol (RTCP) 737 packets. There are a number of different RTCP packet types. 
RTCP 738 packets provide the relationship between the realtime clock at a 739 sender and the RTP media timestamps so that inter-flow 740 synchronization can be performed, and they provide textual 741 information to identify a sender in a conference from the source id. 743 5.3. Conference Membership and Reception Feedback 745 IP multicast allows sources to send to a multicast group without 746 being a receiver of that group. However, for many conferencing 747 purposes it is useful to know who is listening to the conference, and 748 whether the media flows are reaching receivers properly. Accurately 749 performing both these tasks restricts the scaling of the conference. 750 IP multicast means that no-one knows the precise membership of a 751 multicast group at a specific time, and this information cannot be 752 discovered, as to try to do so would cause an implosion of messages, 753 many of which would be lost*. Instead, RTCP provides approximate 754 membership information through periodic multicast of session messages 755 which, in addition to information about the recipient, also give 756 information about the reception quality at that receiver. RTCP 757 session messages are restricted in rate, so that as a conference 758 grows, the aggregate rate of session messages remains constant, and each 759 receiver reports less often. A member of the conference can never 760 know exactly who is present at a particular time from RTCP reports, 761 but does have a good approximation to the conference membership. This 762 is analogous to what happens in a real-world meeting hall; the 763 meeting organizers may have an attendance list, but if people are 764 coming and going all the time, they probably do not know exactly who 765 is in the room at any one moment. 767 Reception quality information is primarily intended for debugging 768 purposes, as debugging of IP multicast problems is a difficult task.
769 However, it is possible to use reception quality information for rate 770 adaptive senders, although it is not clear whether this information 771 is sufficiently timely to be able to adapt fast enough to transient 772 congestion. 774 5.4. Scaling Issues and Heterogeneity 776 The Internet is very heterogeneous, with link speeds ranging from 777 around 10 kbit/s up to around 10 Gbit/s, and very varied levels of 778 congestion. How then can a single multicast source satisfy a large 779 and heterogeneous set of receivers? 781 _________________________ 782 * Note that a conference policy that restricts conference mem- 783 bership can be implemented using encryption and restricted distri- 784 bution of encryption keys, of which more later. 786 Figure 5: Receiver adaptation: multiple layers and multicast groups 788 /~~~\ ##### 2 Mbit/s layer 789 | R | ===== 512 kbit/s layer 790 \___/ ----- 64 kbit/s layer 791 # 792 10 Mbit/s # 10 Mbit/s 793 link: # link 794 # 795 /~~~\ #######> +---+ 796 | S | =======> | | 797 \___/ -------> +---+ 798 \ = 1.5 Mbit/s 799 Source \ = link 800 \ = 1.5 Mbit/s 801 \ = link 802 \ +---+=========>/~~~\ 803 | |--------->| R | 804 +---+ \___/ 805 10 Mbit/s / = \ 806 link / = \ 128 kbit/s 807 / = \ link 808 / = \ 10 Mbit/s 809 /~~~\ +---+ link /~~~\ 810 | R | | |--------->| R | 811 \___/ +---+ \___/ 812 \ 813 \ 10 Mbit/s 814 \ link 815 \ 816 /~~~\ 817 | R | 818 \___/ 820 In addition to each receiver performing its own adaptation to jitter, 821 if the sender layers [22] its video (or audio) stream, different 822 receivers can choose to receive different amounts of traffic and 823 hence different qualities. To do this the sender must code the 824 video as a base layer (the lowest quality that might be acceptable) 825 and a number of enhancement layers, each of which adds more quality 826 at the expense of more bandwidth.
With video, these additional 827 layers might increase the framerate or increase the spatial 828 resolution of the images or both. Each layer is sent to a different 829 multicast group, and receivers can decide individually how many 830 layers to subscribe to. This is illustrated in Figure 5. Of course, 831 if they are going to respond to congestion in this way, then we also 832 need to arrange that the receivers in a conference behind a common 833 bottleneck tend to respond together to prevent de-synchronized 834 experiments by different receivers from having the net effect that 835 too many layers are always being drawn through a common bottleneck. 836 RLM [21] is one way that this might be achieved, although there is 837 continuing research in this area. 839 6. Conference Control 841 +---------+--------------------------+-------------------------------------+ 842 |Protocol | Documentation | Purpose | 843 +---------+--------------------------+-------------------------------------+ 844 |H.323 | ITU recommendation H.323 | Tightly coupled conference setup | 845 | | | and control | 846 |H.332 | ITU recommendation H.332 | Loosely coupled extensions to H.323 | 847 +---------+--------------------------+-------------------------------------+ 849 Conferences come in many shapes and sizes, but there are only really 850 two models for conference control: light-weight sessions and tightly 851 coupled conferencing. For both models, rendezvous mechanisms are 852 needed. Note that the conference control model is orthogonal to 853 issues of quality of service and network resource reservation, and it 854 is also orthogonal to the mechanism for discovering the conference. 856 Light-weight sessions are multicast based multimedia conferences that 857 lack explicit conference membership control and explicit conference 858 control mechanisms. Typically a lightweight session consists of a 859 number of many-to-many media streams supported using RTP and RTCP 860 using IP multicast*. 
Typically, the only conference control 861 information needed during the course of a light-weight session is 862 that distributed in the RTCP session information, i.e. an approximate 863 membership list with some attributes per member. 865 Tightly coupled conferences may also be multicast based and use RTP 866 and RTCP, but in addition they have an explicit conference membership 867 mechanism and may have an explicit conference control mechanism that 868 provides facilities such as floor control. 870 The most widely used tightly coupled conference control protocols 871 suitable for Internet use are those belonging to the ITU's H.323 872 family [16]. However it should be noted that this is inappropriate 873 for larger conferences where scaling problems will be introduced by 874 the conference control mechanisms. The Simple Conference Control 875 Protocol (SCCP) [18] has been proposed as a more scalable distributed 876 conference control protocol. 878 In order to try and address large conferences, the ITU is currently 879 standardising H.332 [17], which is essentially a small tightly 880 coupled H.323 conference with a larger lightweight-sessions-style 881 _________________________ 882 * There is some confusion on the term session, which is some- 883 times used for a conference and sometimes for a single media 884 stream transported by RTP. In this document, we prefer to use the 885 less ambiguous term conference except where existing protocols use 886 the term session. 888 conference listening in as passive participants. It is not yet clear 889 whether H.332 will see large scale acceptance, as its benefits over a 890 simple lightweight session are not terribly obvious. It seems likely 891 that lightweight sessions combined with stream authentication (see 892 section 8.3) might be a more appropriate solution for many potential 893 customers. 895 6.1. 
Controlling Multimedia Servers 897 +---------+---------------+---------------------------------+ 898 |Protocol | Documentation | Purpose | 899 +---------+---------------+---------------------------------+ 900 |RTSP | RFC 2326 | Remote control of AV playback | 901 | | | and recording servers | 902 +---------+---------------+---------------------------------+ 904 The Real Time Streaming Protocol (RTSP) provides a standard way 905 to control a multimedia server remotely. While primarily aimed at web- 906 based media-on-demand services, RTSP is also well suited to provide 907 VCR-like controls for audio and video streams, and to provide 908 playback and record functionality of RTP data streams. A client can 909 specify that an RTSP server plays a recorded multimedia session into 910 an existing multicast-based conference, or can specify that the 911 server should join the conference and record it. 913 6.2. Protocols for Non-A/V Applications 915 Applications other than audio and video have evolved in Internet 916 conferencing, e.g. Imm, Wb [8], NTE [11]. Such applications can be 917 used to substitute for meeting aids in physical conferences 918 (whiteboards, projectors) or replace visual and auditory cues that 919 are lost in teleconferences (e.g., a speaker list application); they 920 can also enable new styles of joint work. 922 Most non-A/V applications have in common that the application 923 protocol is about establishing and updating a shared state. Loss of 924 information is often not acceptable, so some form of multicast 925 reliability is required. The applications' requirements differ: Some 926 applications make per-participant additions to the shared state that 927 are orthogonal to each other (e.g., whiteboards), some evolve a more 928 closely interrelated common state (e.g., additions to a speaker list 929 must be properly sequenced).
Some applications can make use of added 930 bandwidth/react to congestion in an elastic way, others transport 931 data that, although not strictly real-time, is time-critical. 933 In the IRTF research group on Reliable Multicast [13], work is in 934 progress on common protocol elements that can be used in such 935 applications. At the time of writing, some aspects of reliable 936 multicast are not well-understood, such as the proper way to provide 937 congestion control in a multi-sender multicast environment. As 938 congestion control is considered an essential element, standards 939 track protocols are not expected before this can be solved. 941 7. Conference Setup 943 There are two basic forms of conference discovery mechanism. These 944 are session advertisement and session invitation. Session 945 advertisements are provided using a session directory, and inviting a 946 user to join a session is provided using a session invitation 947 protocol such as SIP or H.323. 949 7.1. Session Directories 951 +----------+------------------+----------------------------------+ 952 |Protocol | Documentation | Purpose | 953 +----------+------------------+----------------------------------+ 954 |SDP | RFC 2327 | Session description format | 955 |SAP | Internet draft | Multicast session announcements | 956 +----------+------------------+----------------------------------+ 958 The rendezvous mechanism for many light-weight sessions is a 959 multicast based session directory. This ``broadcasts'' session 960 descriptions [9] to all the potential session participants. These 961 session descriptions provide an advertisement that the session will 962 exist, and also provide sufficient information including multicast 963 addresses, ports, media formats and session times so that a receiver 964 of the session description can join the session. 
The session 965 description protocol (SDP) describes the content and format of a 966 multimedia session, and the session announcement protocol (SAP) is 967 used to distribute it to all potential session recipients. 969 This mechanism can also be applied to advertised tightly coupled 970 sessions, and only requires that additional information about the 971 mechanism to use to join the session is given. However, as the 972 number of sessions in the session directory grows, we expect that 973 only larger-scale public sessions will be announced in this manner, 974 and smaller, more private sessions will tend to use direct invitation 975 rather than advertisement. 977 7.2. Session Invitation 979 +----------+----------------+----------------------------------------------+ 980 |Protocol | Documentation | Purpose | 981 +----------+----------------+----------------------------------------------+ 982 |SIP | RFC 2543 | initiating multimedia calls and conferences | 983 +----------+----------------+----------------------------------------------+ 984 Not all sessions are advertised, and even those that are advertised 985 may require a mechanism to explicitly invite a user to join a 986 session. Such a mechanism is required regardless of whether the 987 session is a lightweight session or a more tightly coupled session, 988 although the invitation system must specify the mechanism to be used 989 to join the session. 991 As users are mobile, it is important that such an invitation 992 mechanism is capable of locating and inviting a user in a location 993 independent manner. Thus user addresses need to be used as a level 994 of indirection rather than routing a call to a specific terminal. 995 The invitation mechanism should also provide for alternative 996 responses, such as leaving a message or being referred to another 997 user, should the invited user be unavailable. 
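The level of indirection and the alternative responses described above can be sketched as follows. This is a toy Python model only, not the behaviour of SIP or H.323: the names UserRecord and route_invitation, the directory layout, and the example addresses in the test data are all hypothetical. It resolves a user address to a terminal where the user can currently be reached, and otherwise follows a referral to another user or falls back to leaving a message.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class UserRecord:
    """Hypothetical directory entry for one user address."""
    terminals: List[str] = field(default_factory=list)  # where the user is reachable now
    refer_to: Optional[str] = None                      # another user address to try instead
    accepts_messages: bool = False                      # whether a message can be left


def route_invitation(directory: Dict[str, UserRecord], user: str,
                     max_referrals: int = 3) -> Tuple[str, str]:
    """Resolve a user address to an outcome.

    Returns ("invite", terminal), ("message", user) or ("unavailable", user).
    """
    for _ in range(max_referrals + 1):
        record = directory.get(user)
        if record is None:
            return ("unavailable", user)
        if record.terminals:
            # The address is a level of indirection: the caller never
            # names the terminal, the directory supplies it.
            return ("invite", record.terminals[0])
        if record.refer_to is not None:
            user = record.refer_to   # follow the referral and try again
            continue
        if record.accepts_messages:
            return ("message", user)
        return ("unavailable", user)
    # Referral chain too long (possibly a loop): give up.
    return ("unavailable", user)
```

The referral limit is a design choice of this sketch, included so that two users who refer to each other cannot make the lookup loop forever.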
999 The Session Initiation Protocol (SIP) provides a mechanism whereby a 1000 user can be invited to participate in a conference. SIP does not 1001 care whether the session is already ongoing, or is just being 1002 created, and it doesn't care whether the conference is a small 1003 tightly coupled session or a huge broadcast -- it merely conveys an 1004 invitation to a user in a timely manner, inviting them to 1005 participate, and provides enough information for them to be able to 1006 know what sort of session to expect. Thus although SIP can be used 1007 to make telephone-style calls, it is by no means restricted to that 1008 style of conference. 1010 8. Security 1012 There is a temptation to believe that multicast is inherently less 1013 private than unicast communication since the traffic visits so many 1014 more places in the network. In fact, this is not the case except 1015 with broadcast and prune type multicast routing protocols [4]. 1016 However, IP multicast does make it simple for a host to anonymously 1017 join a multicast group and receive traffic destined to that group 1018 without the other senders' and receivers' knowledge. If the 1019 application requirement (conference policy) is to communicate between 1020 some defined set of users, then strict privacy can only be enforced 1021 in any case through adequate end-to-end encryption. 1023 RTP specifies a standard way to encrypt RTP and RTCP packets using 1024 private key encryption schemes such as DES [24]. It also specifies a 1025 standard mechanism to manipulate plain text keys using MD5 [25] so 1026 that the resulting bit string can be used as a DES key. This allows 1027 simple out-of-band mechanisms such as privacy-enhanced mail to be 1028 used for encryption key exchange. 1030 8.1. 
Authentication and Key Distribution 1032 +----------+----------------------------+---------------------------+ 1033 |Protocol | Documentation | Purpose | 1034 +----------+----------------------------+---------------------------+ 1035 |PGP | RFC 1991 | public key cryptography | 1036 |X.509 | ITU recommendation X.509 | directory authentication | 1037 +----------+----------------------------+---------------------------+ 1039 Key distribution is closely tied to authentication. Conference or 1040 session directory keys can be securely distributed using public-key 1041 cryptography on a one-to-one basis (by email, a directory service, or 1042 by an explicit conference setup mechanism), but this is only as good 1043 as the certification mechanism used to certify that a key given by a 1044 user is the correct public key for that user. Such certification 1045 mechanisms [3] are however not specific to conferencing, and it looks 1046 likely that certificates such as those provided by PGP will be most 1047 widely used in the near term. 1049 Session keys can be distributed using encrypted Session Descriptions 1050 carried in SIP session invitations, or in encrypted session 1051 announcements as described below. Neither of these mechanisms 1052 provides for changing keys during a session as might be required in 1053 some tightly coupled sessions, but they are probably sufficient for 1054 many uses in the context of lightweight sessions. 1056 Even without privacy requirements in the conference policy, strong 1057 authentication of a user is required if making a network reservation 1058 results in usage-based billing. 1060 8.2.
Encrypted Session Announcements 1062 +----------+------------------+----------------------------------+ 1063 |Protocol | Documentation | Purpose | 1064 +----------+------------------+----------------------------------+ 1065 |SAP | Internet draft | multicast session announcements | 1066 +----------+------------------+----------------------------------+ 1068 Session Directories can make encrypted session announcements using 1069 private key encryption, and carry the encryption keys to be used for 1070 each of the conference media streams in the session. Whilst this 1071 does not solve the key distribution problem, it does allow a single 1072 conference to be announced more than once to more than one key-group, 1073 where each group holds a different session directory key, so that the 1074 two groups can be brought together into a single conference without 1075 having to know each other's keys. 1077 8.3. Secured ``Broadcasts'' 1079 While private-key encryption is sufficient to exclude non-members 1080 Figure 6: Joining a light-weight multimedia session 1082 User A | | 1083 creates | SDP/SAP | 1084 conference |-----------> | 1085 | |User B 1086 | SDP/SAP IGMP |starts 1087 |-----------> IGMP /--<---------|session 1088 | IGMP /-<------/ |directory 1089 |-----------<---------/ | 1090 | | 1091 | SDP/SAP | 1092 |-------------------------------------------->| 1093 | | 1094 User A | | 1095 starts | RTP | 1096 sending |===========> | 1097 | RTCP | 1098 |-----------> | 1099 | | 1100 | RTP | 1101 |===========> | 1102 | | 1103 | RTP |User B 1104 |===========> IGMP |joins 1105 | IGMP /--<---------|conference 1106 | IGMP /-<------/ | 1107 |-----------<---------/ |User's App 1108 | RTCP |Sends RTCP 1109 | RTP /--<---------|Session 1110 |===============================/============>|Message 1111 |<-----------------------------/ | 1112 | RSVP Path Message | 1113 |-------------------------------------------->| 1114 | |User's App 1115 | RTP /-----------|makes 1116 
|================================/===========>|reservation 1117 | RSVP RESV Message /-----/ | 1118 |<-----------------------/ | 1119 | | 1120 | RTP |Quality 1121 |============================================>|of Service 1122 | |improves 1124 from sending or receiving multicast conference traffic, it does mean 1125 that all members of a session are equal. This is normally acceptable 1126 for multi-way conferences, but will not be acceptable for many 1127 broadcasters who require the ability to ensure that only they can 1128 send, perhaps in addition to ensuring that only their paid customers 1129 can receive. This is nicely illustrated by the multicast of the 1130 Rolling Stones concert in 1994 which was billed as being the first 1131 live concert on the Mbone. In fact, this honour goes to a little 1132 known band called Severe Tire Damage who had multicast an impromptu 1133 concert a year previously. To make their point, just before the 1134 Stones were due to go on stage, Severe Tire Damage suddenly started 1135 broadcasting one of their songs live to the same multicast group. 1136 Clearly commercial broadcasters want to avoid occurrences like this 1137 one. 1139 Such secured broadcasts can be performed by encrypting a hash 1140 of each packet (digitally signing it) with the sender's private key of a 1141 public-private key pair. The public key is then given to the 1142 receivers, and they discard (and prune if possible) any packets that 1143 are not correctly signed. The problem with this is that even encrypting a 128 1144 bit hash with a public key algorithm can be relatively expensive to 1145 perform at the high packet rates sometimes seen with video. The use of 1146 public-key cryptography for this purpose has not yet been 1147 standardized, but some such mechanism will clearly be needed before 1148 the Mbone becomes an acceptable environment for commercial 1149 broadcasters. 1151 9.
Summary 1153 This document is an attempt to gather together in one place the set 1154 of assumptions behind the design of the Internet Multimedia 1155 Conferencing architecture, and the services that are provided to 1156 support it. 1158 Figure 6 shows an example time sequence involved in setting up a 1159 light-weight session between two sites. In this case, site A creates 1160 a session advertisement, and some time later starts sending a media 1161 stream even though there may be no receiver at that time. Some time 1162 later, site B joins the session (the multicast routing protocol here 1163 is PIM), and starts to receive the traffic. At the earliest 1164 opportunity site B also makes an RSVP reservation to ensure the flow 1165 quality is satisfactory. This example should be taken as 1166 illustrative only -- there are different ways to join sessions, and 1167 different ways to get improved quality of service. 1169 The lightweight sessions model for Internet multimedia conferencing 1170 may not be appropriate for all conferences, but for those sessions 1171 that do not require tightly-coupled conference control, it provides 1172 an elegant style of conferencing that scales from two participants to 1173 millions of participants. It achieves this scaling by virtue of the 1174 way that multicast routing is receiver driven, keeping essential 1175 information about receivers local to those receivers. Each new 1176 participant only adds state close to them in the network. It also 1177 scales by not requiring explicit conference join mechanisms; if 1178 everyone were to need to know exactly who is in the session at any 1179 time, the scaling would be severely adversely affected. RTCP 1180 provides membership information that is accurate when the group is 1181 small, and increasingly only a statistical representation of the 1182 membership as the group grows. Security is handled through the use 1183 of encryption rather than through the control of data distribution. 
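The RTCP scaling behaviour summarized above (each member reports less often as the group grows, so that the aggregate control traffic stays roughly constant) can be illustrated with a short Python sketch. It is a simplification made for this example, not the full RTCP timer rules: the function name report_interval and its parameters are hypothetical, the 5-second floor is an assumed default, and the randomization and separate sender/receiver shares that a real implementation would apply are omitted.

```python
def report_interval(members: int,
                    avg_packet_bytes: float,
                    control_bytes_per_s: float,
                    min_interval_s: float = 5.0) -> float:
    """Seconds between session messages from one member.

    The control budget is shared equally, so the interval grows linearly
    with the number of members once it exceeds the assumed floor.
    """
    shared = members * avg_packet_bytes / control_bytes_per_s
    return max(min_interval_s, shared)


def aggregate_rate(members: int, avg_packet_bytes: float, interval_s: float) -> float:
    """Total control traffic in bytes per second for the whole group."""
    return members * avg_packet_bytes / interval_s
```

With a fixed control budget, a small group is limited by the floor and every member is heard from frequently, which is why the membership view is accurate. In a large group each member may be heard from only every few minutes or hours, which is why the view becomes statistical.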

   For those that require tightly coupled conferences, solutions such as
   H.323 are also emerging.

   Many parts of this architecture are still incomplete and remain the
   subject of active research. In particular, differentiated services
   for better-than-best-effort service show great promise as a more
   scalable alternative to individual reservations. Multicast routing
   scales well to large groups, but scales less well to large numbers
   of groups; we expect this will become the subject of significant
   research over the next few years. Multicast congestion control
   mechanisms are still a research topic, although in the last year
   several schemes have emerged that show promise. Layered codecs show
   great promise to allow conferences to scale in the face of
   heterogeneity, but the join and leave mechanisms that allow them to
   perform receiver-based congestion control are still being examined.
   We have several working examples of reliable-multicast-based shared
   applications; the next few years should see the start of
   standardization work in this area as appropriate multicast
   congestion control mechanisms emerge. Finally, a complete security
   architecture for conferencing would be very desirable; currently we
   have many parts of the solution, but are still waiting for an
   appropriate key-distribution architecture to emerge from the
   security research community.

   The Internet Multimedia Conferencing architecture and the Mbone have
   come a long way from their early beginnings on the DARTnet testbed in
   1992. The picture is not yet finished, but it has now taken shape
   sufficiently that we can see the form it will take. Whether or not
   the Internet does evolve into the single communications network that
   is used for most telephone, television, and other person-to-person
   communication, only time will tell.
   However, we believe that it is becoming clear that if the industry
   decides that this should be the case, the Internet should be up to
   the task.

10. Acknowledgments

   Acknowledgments are due to the End-to-End Research Group, the
   Int-serv, RSVP, MMUSIC and AVT working groups of the IETF, and to
   discussions with colleagues at UCL. The earliest clear exposition of
   some of the ideas here was presented at ACM SIGCOMM 1994 in London
   by Van Jacobson.

11. Authors' Addresses

   Mark Handley
   AT&T Center for Internet Research at ICSI
   1947 Center St, Suite 600
   Berkeley, CA 94704
   Email: mjh@aciri.org

   Jon Crowcroft
   Department of Computer Science
   University College London
   Gower Street,
   London WC1E 6BT, UK.
   Email: j.crowcroft@cs.ucl.ac.uk

   Carsten Bormann, Joerg Ott
   Universitaet Bremen TZI
   Postfach 330440
   D-28334 Bremen, GERMANY.
   Email: cabo@tzi.org, jo@tzi.org

References

   [1] A. Ballardie, P. Francis, J. Crowcroft, ``An Architecture for
       Scalable Inter-Domain Multicast Routing'', ACM SIGCOMM 1993, pp
       85-95.

   [2] C. Bormann, ``Providing integrated services over low-bitrate
       links,'' RFC 2689, September 1999.

   [3] CCITT (Consultative Committee on International Telegraphy and
       Telephony). ``Recommendation X.509: The Directory --
       Authentication Framework.'' 1988.

   [4] S. Deering, C. Partridge, D. Waitzman, ``Distance Vector
       Multicast Routing Protocol'', RFC 1075, Nov 1988.

   [5] Steve Deering, ``Multicast Routing in Internetworks and Extended
       LANs'', ACM SIGCOMM 88, August 1988, pp 55-64 and ``Host
       Extensions for IP Multicasting'', RFC 1112.

   [6] S. Deering, D. Estrin, D. Farinacci, V. Jacobson, C-G. Liu, L.
       Wei ``An Architecture for Wide Area Multicast Routing'' ACM
       SIGCOMM 1994, pp 126-135.

   [7] Estrin, Farinacci, Helmy, Thaler, Deering, Handley, Jacobson,
       Liu, Sharma, Wei, ``Protocol Independent Multicast-Sparse Mode
       (PIM-SM): Protocol Specification'', RFC 2362.

   [8] S. Floyd, V. Jacobson, S. McCanne, C-G. Liu, L. Zhang, ``A
       Reliable Multicast Framework for Light-weight Sessions and
       Application Level Framing'' ACM SIGCOMM 1995, pp 342-356.

   [9] M. Handley, V. Jacobson, ``SDP: Session Description Protocol''
       INTERNET-DRAFT, Dec 1997.

   [10] M. Handley, D. Thaler, D. Estrin, ``The Internet Multicast
        Address Allocation Architecture'', INTERNET-DRAFT, Dec 1997.

   [11] M. Handley, J. Crowcroft, ``Network Text Editor (NTE): A
        scalable shared text editor for the MBone'', ACM SIGCOMM 1997.

   [12] V. Hardman, A. Sasse, M. Handley, A. Watson, ``Reliable Audio
        for Use over the Internet'' Proc INET '95, Hawaii, Internet
        Society, Reston, VA, 1995.

   [13] IRTF Research Group on Reliable Multicast,
        http://www.east.isi.edu/RMRG/

   [14] ITU ``Recommendation H.320: Narrow-band visual telephone systems
        and terminal equipment'', ITU, Geneva, 1997

   [15] ITU ``Recommendation T.124 -- Generic Conference Control'', ITU,
        Geneva.

   [16] ITU ``Recommendation H.323: Visual telephone systems and
        equipment for local area networks which provide a non guaranteed
        quality of service'', ITU, Geneva, 1996

   [17] ITU ``Recommendation H.332: H.323 Extended for Loosely-Coupled
        conferences'', ITU, Geneva

   [18] C. Bormann, J. Ott, C. Reichert, ``Simple Conference Control
        Protocol'' INTERNET-DRAFT, June 1996.

   [19] V. Jacobson, ``Congestion Avoidance and Control'', ACM SIGCOMM
        1988.

   [20] J. Linn, ``Privacy Enhancement for Internet Electronic Mail:
        Part I: Message Encryption and Authentication Procedures'', RFC
        1421, Feb 1993

   [21] S. McCanne, V. Jacobson and M. Vetterli, ``Receiver-driven
        Layered Multicast''. ACM SIGCOMM 1996, pp. 117-130.

   [22] S. McCanne, M. Vetterli, ``Joint Source/Channel Coding for
        Multicast Packet Video''. Proceedings of the IEEE International
        Conference on Image Processing. October, 1995. Washington, DC.

   [23] J. Moy, ``Multicast Extensions to OSPF'', RFC 1584, March 1994.

   [24] National Institute of Standards and Technology (NIST), ``FIPS
        Publication 46-1: Data Encryption Standard'', January 22, 1988

   [25] Rivest, R., ``The MD5 Message-Digest Algorithm'', RFC 1321, MIT
        Laboratory for Computer Science and RSA Data Security, Inc.,
        April 1992

   [26] Schooler, E., A Distributed Architecture for Multimedia
        Conference Control, ISI Research Report ISI/RR-91-289, November
        1991. ftp://ftp.isi.edu/pub/hpcc-papers/mmc/mmcc.ps

   [27] Schulzrinne, H., ``Personal Mobility for Multimedia Services in
        the Internet'' IMDS'96, March 4-6 1996.
        ftp://ftp.fokus.gmd.de/pub/step/papers/Schu9603:Personal.ps.gz

   [28] H. Schulzrinne, S. Casner, R. Frederick and V. Jacobson ``RTP: A
        Transport Protocol for Real-Time Applications'' RFC 1889.

   [29] D. Thaler, D. Estrin, D. Meyer, ``Border Gateway Multicast
        Protocol'', INTERNET-DRAFT, Oct 1997.