| < draft-rosenberg-sip-conferencing-models-00.txt | draft-rosenberg-sip-conferencing-models-01.txt > | |||
|---|---|---|---|---|
| Internet Engineering Task Force SIP WG | Internet Engineering Task Force SIPPING WG | |||
| Internet Draft J.Rosenberg,H.Schulzrinne | Internet Draft J.Rosenberg,H.Schulzrinne | |||
| draft-rosenberg-sip-conferencing-models-00.txt dynamicsoft,Columbia U. | draft-rosenberg-sip-conferencing-models-01.txt dynamicsoft,Columbia U. | |||
| November 17, 2000 | July 20, 2001 | |||
| Expires: May, 2001 | Expires: February 2002 | |||
| Models for Multi Party Conferencing in SIP | Models for Multi Party Conferencing in SIP | |||
| STATUS OF THIS MEMO | STATUS OF THIS MEMO | |||
| This document is an Internet-Draft and is in full conformance with | This document is an Internet-Draft and is in full conformance with | |||
| all provisions of Section 10 of RFC2026. | all provisions of Section 10 of RFC2026. | |||
| Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
| Task Force (IETF), its areas, and its working groups. Note that | Task Force (IETF), its areas, and its working groups. Note that | |||
| other groups may also distribute working documents as Internet- | other groups may also distribute working documents as Internet- | |||
| Drafts. | Drafts. | |||
| Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
| and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
| time. It is inappropriate to use Internet- Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
| material or to cite them other than as work in progress. | material or to cite them other than as "work in progress". | |||
| The list of current Internet-Drafts can be accessed at | The list of current Internet-Drafts can be accessed at | |||
| http://www.ietf.org/ietf/1id-abstracts.txt | http://www.ietf.org/ietf/1id-abstracts.txt | |||
| The list of Internet-Draft Shadow Directories can be accessed at | To view the list of Internet-Draft Shadow Directories, see | |||
| http://www.ietf.org/shadow.html. | http://www.ietf.org/shadow.html. | |||
| Abstract | Abstract | |||
| The Session Initiation Protocol (SIP) can support multi-party | The Session Initiation Protocol (SIP) can support multi-party | |||
| conferencing in many different ways. In this draft, we define the | conferencing in many different ways. In this draft, we define the | |||
| various multi-party conferencing models, and for each, discuss how | various multi-party conferencing models, and for each, discuss how | |||
| they are used and then analyze their relative benefits and drawbacks. | they are used and then analyze their relative benefits and drawbacks. | |||
| 1 Introduction | 1 Introduction | |||
| skipping to change at page 2, line 22 ¶ | skipping to change at page 2, line 22 ¶ | |||
| o How users can join an existing conference without being | o How users can join an existing conference without being | |||
| invited | invited | |||
| o How well the model scales. | o How well the model scales. | |||
| o Which entities need to be aware of the model. | o Which entities need to be aware of the model. | |||
| o How participants learn about each other. | o How participants learn about each other. | |||
| We also identify missing pieces and recommend standard activity to | We also identify missing pieces and recommend standard activity to | |||
| fill them in. This document itself does not define any new extensions | fill them in. This document itself does not define any new extensions | |||
| of any kind. | of any kind. However, several scenarios discussed in the draft make | |||
| use of existing extensions to SIP. | ||||
| 2 End System Mixing | 2 End System Mixing | |||
| The first model we call "end system mixing". In this model, user A | The first model we call "end system mixing". In this model, user A | |||
| calls user B, and they have a conversation. At some point later, A | calls user B, and they have a conversation. At some point later, A | |||
| decides to conference in user C. To do this, A calls C, using a | decides to conference in user C. To do this, A calls C, using a | |||
| completely separate SIP call. This call uses a different Call-ID, | completely separate SIP call. This call uses a different Call-ID, | |||
| different tags, etc. There is no call set up directly between B and | different tags, etc. There is no call set up directly between B and | |||
| C. A receives media streams from both B and C, and mixes them. A | C. A receives media streams from both B and C, and mixes them. A | |||
| sends a stream containing A's and C's streams to B, and a stream | sends a stream containing A's and C's streams to B, and a stream | |||
| skipping to change at page 2, line 48 ¶ | skipping to change at page 2, line 49 ¶ | |||
| Basically, user A handles both signaling and media mixing. B and C | Basically, user A handles both signaling and media mixing. B and C | |||
| are unaware of the multi-party call, from a SIP perspective at least. | are unaware of the multi-party call, from a SIP perspective at least. | |||
| From an RTP perspective, A is a mixer, and so the RTCP reports from A | From an RTP perspective, A is a mixer, and so the RTCP reports from A | |||
| will contain SDES information that indicates the existence of an | will contain SDES information that indicates the existence of an | |||
| additional party in the media stream. | additional party in the media stream. | |||
| Note that this model has the serious drawback that the conference | Note that this model has the serious drawback that the conference | |||
| ends when the mixing UA leaves the call. | ends when the mixing UA leaves the call. | |||
| 2.1 Inviting Users to Join | OPEN ISSUE: Another problem with this approach is that | |||
| there is no specific way for A to determine when a | ||||
| Any user in the conference can invite another user to join, so long | ||||
| as they are capable of performing the required mixing and signaling | ||||
| +----------+ | +----------+ | |||
| | | | | | | |||
| -- | | | -- | | | |||
| --- | B | | --- | B | | |||
| SIP call --- | | | SIP call --- | | | |||
| --- .. | | | --- .. | | | |||
| --- .. +----------+ | --- .. +----------+ | |||
| -- ... | -- ... | |||
| ... | ... | |||
| +----------+ .. RTP | +----------+ .. RTP | |||
| skipping to change at page 3, line 32 ¶ | skipping to change at page 3, line 32 ¶ | |||
| --- . +----------+ | --- . +----------+ | |||
| -- | | | -- | | | |||
| -- | | | -- | | | |||
| SIP call -- | | | SIP call -- | | | |||
| | C | | | C | | |||
| | | | | | | |||
| +----------+ | +----------+ | |||
| Figure 1: Three Way Calling using End System Mixing | Figure 1: Three Way Calling using End System Mixing | |||
| signaling message it receives was meant just for it, or for | ||||
| the entire conference. For example, if B sends a REFER to | ||||
| A, pointing to user D, was this REFER meant for A alone, or | ||||
| for A and C? If it was meant for A and C, presumably A | ||||
| would act upon the REFER and send it to C as well. C too | ||||
| would act on the REFER. This would cause two separate | ||||
| REFER-triggered INVITEs to get routed to D. How would D | ||||
| know that both INVITEs need to be mixed together as a | ||||
| conference? What if it cannot support this capability? | ||||
| Because the three-way calling approach works only for the most basic | ||||
| case, we do not recommend it as a general solution. | ||||
| 2.1 Inviting Users to Join | ||||
| Any user in the conference can invite another user to join, so long | ||||
| as they are capable of performing the required mixing and signaling | ||||
| functions. To invite a new user to join, a user in the conference | functions. To invite a new user to join, a user in the conference | |||
| simply calls them using normal SIP procedures. The only difference is | simply calls them using normal SIP procedures. The only difference is | |||
| that the stream sent to that new user contains the streams received | that the stream sent to that new user contains the streams received | |||
| from the other parties in the call. | from the other parties in the call. | |||
| In fact, it is perfectly acceptable for complex connectivity graphs | In fact, it is acceptable for complex connectivity graphs to be | |||
| to be constructed, as a result of different users inviting other | constructed, as a result of different users inviting other users to | |||
| users to join. For example, take our case of A calling B, and then | join. For example, take our case of A calling B, and then calling C. | |||
| calling C. If, later on, C calls D, C will perform the mixing of | If, later on, C calls D, C will perform the mixing of the streams | |||
| the streams it gets from A (which actually contain media from A and | it gets from A (which actually contain media from A and B), along | |||
| B), along with its own stream, and send that to D. This results in a | with its own stream, and send that to D. This results in a | |||
| connectivity graph that looks like: | connectivity graph that looks like Figure 2. | |||
| A------B | A------B | |||
| | | | | |||
| | | | | |||
| C------D | C------D | |||
| Figure 2: Connectivity Graph | ||||
| Note, however, that there is a possibility of loops. From here, if D | Note, however, that there is a possibility of loops. From here, if D | |||
| calls B, and brings that stream into the conference, a loop is | calls B, and brings that stream into the conference, a loop is | |||
| created. This loop can be detected using the mechanisms described in | created. This loop can be detected using the mechanisms described in | |||
| the RTP specification [2]. However, we expect these conditions to be | the RTP specification [2]. However, we expect these conditions to be | |||
| extremely rare. Presumably, D knows B is in the conference already, | extremely rare. Presumably, D knows B is in the conference already, | |||
| and so would not likely call B and invite them in. | and so would not likely call B and invite them in. | |||
| A serious problem with the more complex topologies is that the | ||||
| departure of a participant might cause a partition of the conference | ||||
| into several sub-conferences which cannot easily be healed. | ||||
| 2.2 Users Joining | 2.2 Users Joining | |||
| In this model, there is not any explicit conference "identifier" that | In this model, there is not any explicit conference "identifier" that | |||
| can be used to join. This conference model, by its nature, is built | can be used to join. This conference model, by its nature, is built | |||
| around ad-hoc conferences. However, it is still possible for a user | around ad-hoc conferences. However, it is still possible for a user | |||
| to join in the following way. | to join in the following way. | |||
| Let's say a new user, E, simply calls B, unaware even that B is in a | Let's say a new user, E, simply calls B, unaware even that B is in a | |||
| conference (E might actually be aware, but the SIP messaging is no | conference (E might actually be aware, but the SIP messaging is no | |||
| different). B's softphone, recognizing that B is already in a | different). B's softphone, recognizing that B is already in a | |||
| skipping to change at page 4, line 43 ¶ | skipping to change at page 5, line 22 ¶ | |||
| later. No SIP signaling at all is needed to do this. B simply starts | later. No SIP signaling at all is needed to do this. B simply starts | |||
| sending the mixed media to E. | sending the mixed media to E. | |||
| 2.3 Scalability | 2.3 Scalability | |||
| A drawback of this model is its scalability. Viewing the conference | A drawback of this model is its scalability. Viewing the conference | |||
| from a graph perspective, if the number of edges touching a vertex | from a graph perspective, if the number of edges touching a vertex | |||
| (its degree) equals N, the user corresponding to that vertex has to | (its degree) equals N, the user corresponding to that vertex has to | |||
| perform up to N separate media stream encodings. We say "up to", as | perform up to N separate media stream encodings. We say "up to", as | |||
| it depends on the number of participants who are talking at once. If | it depends on the number of participants who are talking at once. If | |||
| only one pariticpant is talking, the non-talking "mixer" endpoints | only one participant is talking, the non-talking "mixer" endpoints | |||
| don't need to do any additional encoding. If everyone is talking, it | don't need to do any additional encoding. If everyone is talking, it | |||
| is N encodes. Since encoding is generally a complex process, a | is N encodes. Since encoding is generally a complex process, a | |||
| typical workstation these days can handle two or three simultaneous | typical workstation these days can handle two or three simultaneous | |||
| encodes using a low rate codec like G.723.1. The problem can be | encodes using a low rate codec like G.723.1. The problem can be | |||
| mitigated somewhat by distributing the mixing responsibilities | mitigated somewhat by distributing the mixing responsibilities | |||
| (making the graph deep rather than wide). However, this requires a | (making the graph deep rather than wide). However, this requires a | |||
| conscious effort of the participants regarding who is to make the | conscious effort of the participants regarding who is to make the | |||
| call to add a new user. This is unlikely to happen in practice. | call to add a new user. This is unlikely to happen in practice. | |||
| Another limitation to scalability is bandwidth. If the degree of a | Another limitation to scalability is bandwidth. If the degree of a | |||
| skipping to change at page 6, line 6 ¶ | skipping to change at page 6, line 31 ¶ | |||
| Large-scale multicast conferences are usually pre-arranged, with | Large-scale multicast conferences are usually pre-arranged, with | |||
| specific start and stop times (which is why this information exists | specific start and stop times (which is why this information exists | |||
| in SDP). Protocols such as the Session Announcement Protocol (SAP) | in SDP). Protocols such as the Session Announcement Protocol (SAP) | |||
| [4] are used to announce these conferences. However, multicast | [4] are used to announce these conferences. However, multicast | |||
| conferences do not need to be pre-arranged, so long as a mechanism | conferences do not need to be pre-arranged, so long as a mechanism | |||
| exists to dynamically obtain a multicast address. SAP itself was | exists to dynamically obtain a multicast address. SAP itself was | |||
| originally used for this purpose; this has been supplanted by the | originally used for this purpose; this has been supplanted by the | |||
| malloc architecture [5], still under development. | malloc architecture [5], still under development. | |||
| So, if there are N participants, there will be point to point SIP | So, if there are N participants, there will be point-to-point SIP | |||
| relationships with pairs of participants. Each participant sends a | relationships with pairs of participants. Each participant sends a | |||
| single media stream to the group, and receives up to N-1 streams at | single media stream to the group, and receives up to N-1 streams at | |||
| any time. Note that the number of streams that a user will receive | any time. Note that the number of streams that a user will receive | |||
| depends on who is actually sending at any given time. If the stream | depends on who is actually sending at any given time. If the stream | |||
| is audio, and silence suppression is utilized, the number of streams | is audio, and silence suppression is utilized, the number of streams | |||
| a user will receive at any given time is equal to the number of users | a user will receive at any given time is equal to the number of users | |||
| talking at any given time. Even for very large conferences, this is | talking at any given time. Even for very large conferences, this is | |||
| usually just a small number of users. | usually just a small number of users. | |||
| 3.1 Inviting Users to Join | 3.1 Inviting Users to Join | |||
| skipping to change at page 6, line 34 ¶ | skipping to change at page 7, line 12 ¶ | |||
| same and all parties use the same port numbers to receive | same and all parties use the same port numbers to receive | |||
| media data. If the session description provided by the | media data. If the session description provided by the | |||
| caller is acceptable to the callee, the callee can choose | caller is acceptable to the callee, the callee can choose | |||
| not to include a session description or MAY echo the | not to include a session description or MAY echo the | |||
| description in the response. | description in the response. | |||
| The called party then joins the multicast groups indicated in the | The called party then joins the multicast groups indicated in the | |||
| SDP, using multicast protocols such as IGMP [6]. Note that it is not | SDP, using multicast protocols such as IGMP [6]. Note that it is not | |||
| even necessary for users to send each other BYE messages when the | even necessary for users to send each other BYE messages when the | |||
| conference is over, especially for large-scale, pre-arranged | conference is over, especially for large-scale, pre-arranged | |||
| conferences that have explicit end times indicated in SDP. SDP aside, | conferences that have explicit end times indicated in SDP. | |||
| a participant can simply leave the conference at any time by leaving | ||||
| the multicast groups. No SIP signaling is needed to accomplish this. | OPEN ISSUE: Do we need to specify a SIP mechanism for | |||
| indicating that no BYE is needed? | ||||
| SDP aside, a participant can simply leave the conference at any time | ||||
| by leaving the multicast groups. No SIP signaling is needed to | ||||
| accomplish this. | ||||
| 3.2 Users Joining | 3.2 Users Joining | |||
| Users can join a conference of this type without being invited. All | Users can join a conference of this type without being invited. All | |||
| they need is the multicast addresses, ports, and codecs being used. | they need is the multicast addresses, ports, and codecs being used. | |||
| These can be obtained through any number of means, including SAP. SDP | These can be obtained through any number of means, including SAP. SDP | |||
| conference descriptions can even be obtained from web pages, for | conference descriptions can even be obtained from web pages, for | |||
| example. | example. | |||
| Once the addresses are obtained, the user simply joins the | Once the addresses are obtained, the user simply joins the | |||
| skipping to change at page 8, line 11 ¶ | skipping to change at page 8, line 44 ¶ | |||
| Dial-In conference servers closely mirror dial-in conference bridges | Dial-In conference servers closely mirror dial-in conference bridges | |||
| in the traditional PSTN. | in the traditional PSTN. | |||
| A dial-in conference server acts as a normal SIP UA. Users call it, | A dial-in conference server acts as a normal SIP UA. Users call it, | |||
| and the server maintains point to point SIP relationships with each | and the server maintains point to point SIP relationships with each | |||
| user that calls in. The server takes the media from the users who | user that calls in. The server takes the media from the users who | |||
| dial into the same conference, mixes them, and sends out the | dial into the same conference, mixes them, and sends out the | |||
| appropriate mixed stream to each participant separately. | appropriate mixed stream to each participant separately. | |||
| The model is depicted in Figure 3. Note that each UA (A,B,C,D) has a | ||||
| point to point SIP and RTP relationship with the conference server. | ||||
| Each call has a different Call-ID. Each user sends their own media to | ||||
| the server. The media delivered to user A by the server is the media | ||||
| mixed from users B,C and D. The media delivered to user B by the | ||||
| server is the media mixed from users A, C and D. The media delivered | ||||
| to user C by the server is the media mixed from users A, B and D. The | ||||
| media delivered to user D is the media mixed from users A, B and C | ||||
| +-----+ | +-----+ | |||
| | | | | | | |||
| | A | | | A | | |||
| | | | | | | |||
| +-----+ | +-----+ | |||
| | . | | . | |||
| | . | | . | |||
| | . | | . | |||
| | . | | . | |||
| | . | | . | |||
| skipping to change at page 8, line 38 ¶ | skipping to change at page 9, line 31 ¶ | |||
| | . | | . | |||
| | . | | . | |||
| | . | | . | |||
| | . | | . | |||
| +-----+ | +-----+ | |||
| | | | | | | |||
| | C | | | C | | |||
| | | | | | | |||
| +-----+ | +-----+ | |||
| Figure 2: Dial-In Conference Servers | Figure 3: Dial-In Conference Servers | |||
| The model is depicted in Figure 2. Note that each UA (A,B,C,D) has a | (this is also known as a mix-minus configuration). | |||
| point to point SIP and RTP relationship with the conference server. | ||||
| Each call has a different Call-ID. Each user sends their own media to | ||||
| the server. The media delivered to user A by the server is the media | ||||
| mixed from users B,C and D. The media delivered to user B by the | ||||
| server is the media mixed from users A, C and D. The media delivered | ||||
| to user C by the server is the media mixed from users A, B and D. The | ||||
| media delivered to user D is the media mixed from users A, B and C. | ||||
| The conference is identified by the request URI of the calls from | The conference is identified by the request URI of the calls from | |||
| each participant. This provides numerous advantages from a services | each participant. This provides numerous advantages from a services | |||
| and routing point of view [9]. For example, one conference on the | and routing point of view [9]. For example, one conference on the | |||
| server might be known as sip:conference34@servers.com. All users who | server might be known as sip:conference34@servers.com. All users who | |||
| call sip:conference34@servers.com are mixed together. | call sip:conference34@servers.com are mixed together. | |||
| Dial-In conference servers are usually associated with pre-arranged | Dial-In conference servers are usually associated with pre-arranged | |||
| conferences. However, the same model applies to ad-hoc conferences. | conferences. However, the same model applies to ad-hoc conferences. | |||
| An ad-hoc conference server creates the conference state when the | An ad-hoc conference server creates the conference state when the | |||
| skipping to change at page 10, line 13 ¶ | skipping to change at page 10, line 42 ¶ | |||
| server: | server: | |||
| INVITE sip:conference34@servers.com | INVITE sip:conference34@servers.com | |||
| From: sip:B@example.com | From: sip:B@example.com | |||
| To: sip:conference34@servers.com | To: sip:conference34@servers.com | |||
| Referred-By: sip:A@example.com | Referred-By: sip:A@example.com | |||
| Since the request URI identifies the conference, this will cause B to | Since the request URI identifies the conference, this will cause B to | |||
| get added to conference 34. | get added to conference 34. | |||
| An additional mechanism for inviting a user to join is to send REFER | ||||
| from A to the conference server, with a Refer-To containing the | ||||
| address of B. This REFER would look like: | ||||
| REFER sip:conference34@servers.com SIP/2.0 | ||||
| From: sip:A@example.com | ||||
| To: sip:conference34@servers.com | ||||
| Refer-To: sip:B@example.com | ||||
| This approach has the advantage that it doesn't require REFER support | ||||
| from B, only from the conference server. | ||||
| OPEN ISSUE: A problem with the mechanisms for adding a user | ||||
| is that they assume that the UA for user A (the one who | ||||
| adds another user to the conference) knows that it is | ||||
| indeed talking to a conference server. If the mechanisms in | ||||
| this section were applied to a UA which was not a | ||||
| conference server, the result would be the creation of | ||||
| additional call legs, but not a conference. This means that | ||||
| we require some mechanism for identifying that a URL is a | ||||
| conference URL. | ||||
| 4.2 Users Joining | 4.2 Users Joining | |||
| Users joining is easily done. The participant that wishes to join | It is easy for users to join the conference. The participant that | |||
| simply sends an INVITE to the conference server, with the conference | wishes to join simply sends an INVITE to the conference server, with | |||
| ID in the request URI. The conference ID (which is a SIP URL), can be | the conference ID in the request URI. The conference ID (which is a | |||
| learned by any number of means, including having it on a web page, | SIP URL), can be learned by any number of means, including having it | |||
| receiving it in an email, etc. | on a web page, receiving it in an email, etc. | |||
| For example, if B wishes to join sip:conference34@servers.com, B | For example, if B wishes to join sip:conference34@servers.com, B | |||
| would send the following request: | would send the following request: | |||
| INVITE sip:conference34@servers.com | INVITE sip:conference34@servers.com | |||
| From: sip:B@example.com | From: sip:B@example.com | |||
| To: sip:conference34@servers.com | To: sip:conference34@servers.com | |||
| 4.3 Scalability | 4.3 Scalability | |||
| The scalability of this model is limited by the bandwidth and | The scalability of this model is limited by the bandwidth and | |||
| processing power of the conference server. If there are N | processing power of the conference server. If there are N | |||
| participants in a conference, M of which are sending media streams, | participants in a conference, M of which are sending media streams, | |||
| the server will need to manage N signaling relationships, perform N | the server will need to manage N signaling relationships, perform M | |||
| RTP stream decodes, and N RTP stream encodes (assuming M > 0). The | RTP stream decodes, and N RTP stream encodes (assuming M > 0). The | |||
| encoding is the primary processing bottleneck, and the sending of the | encoding is the primary processing bottleneck, and the sending of the | |||
| N media streams is the primary bandwidth bottleneck. However, | N media streams is the primary bandwidth bottleneck. However, | |||
| conference servers can be built using heavy duty hardware, and have | conference servers can be built using heavy duty hardware, and have | |||
| high bandwidth access. | high bandwidth access. | |||
| Furthermore, since we are using the request URI to name the | Furthermore, since we are using the request URI to name the | |||
| conferences, we can use standard SIP techniques for distributing | conferences, we can use standard SIP techniques for distributing | |||
| conferences across servers [9]. | conferences across servers [9]. | |||
| skipping to change at page 11, line 19 ¶ | skipping to change at page 12, line 30 ¶ | |||
| 4.5 Discovering Participant Identities | 4.5 Discovering Participant Identities | |||
| The identities of other participants in the conference are NOT known | The identities of other participants in the conference are NOT known | |||
| through SIP. Rather, it is learned through RTP. The conference server | through SIP. Rather, it is learned through RTP. The conference server | |||
| is an RTP mixer. As such, it takes the RTCP SDES of the streams it | is an RTP mixer. As such, it takes the RTCP SDES of the streams it | |||
| mixes, and aggregates them into the RTCP stream sent out. This will | mixes, and aggregates them into the RTCP stream sent out. This will | |||
| allow participants to gradually (over a few seconds), learn the | allow participants to gradually (over a few seconds), learn the | |||
| identities of the other participants. | identities of the other participants. | |||
| As an implementation choice, the conference server can generate the | ||||
| RTCP SDES of its participants, rather than using those provided by | ||||
| the participants. The reason for this is authenticity. A conference | ||||
| server can use SIP authentication mechanisms to identify the | ||||
| participants in the conference. This may allow it to validate the | ||||
| RTCP SDES provided by the participants. A conference server could | ||||
| remove any false information, and regenerate the SDES using the | ||||
| correct user identity as validated through SIP. | ||||
| 5 Ad-hoc Centralized Conferences | 5 Ad-hoc Centralized Conferences | |||
| In an ad-hoc centralized conference, two users A and B start with a | In an ad-hoc centralized conference, two users A and B start with a | |||
| normal SIP call. At some point later, they decide to add a third | normal SIP call. At some point later, they decide to add a third | |||
| party. Instead of using end system mixing, they would prefer to use a | party. Instead of using end system mixing, they would prefer to use a | |||
| conference server, as defined in Section 4. | conference server, as defined in Section 4. | |||
| This model corresponds roughly to the centralized multipoint | The call flow for starting this kind of conference is shown in Figure | |||
| conference model of H.323. | 4. Initially, A calls B (1-3). At some point, B decides to add a | |||
| user, C, to the call, and begins the transition to a conference | ||||
| One of the participants takes responsibility for transitioning to a | server. The first step in this process is the discovery of a | |||
| conference server. The first step in this process is the discovery of | conference server that supports ad-hoc conferences. This can be done | |||
| a conference server that supports ad-hoc conferences. This can be | through static configuration, or through any of a number of standard | |||
| done through static configuration, or through any of a number of | service discovery protocols, such as the Service Location Protocol | |||
| standard service discovery protocols, such as the Service Location | [12]. | |||
| Protocol [12]. | ||||
| Once the server is discovered, a conference ID is chosen. This ID | Once the server is discovered, a conference ID is chosen. This ID | |||
| must be globally unique. The conference ID is then prepended to the | must be globally unique. The conference ID is then prepended to the | |||
| server, and a SIP URL for the ad-hoc conference is formed. For | server, and a SIP URL for the ad-hoc conference is formed. For | |||
| example, if the server "a.servers.com" is used, and the unique ID is | example, if the server "a.servers.com" is used, and the unique ID is | |||
| "a7hytaskp09878a", the SIP URL for this conference is | "a7hytaskp09878a", the SIP URL for this conference is | |||
| sip:a7hytaskp09878a@a.servers.com. | sip:a7hytaskp09878a@a.servers.com. | |||
| The user who is performing the transition (say, user A) then sends an | B then sends an INVITE to this URL (4). This creates the initial | |||
| INVITE to this URL. This creates the initial conference state in the | conference state in the server. The conference server accepts the | |||
| server. A then sends a REFER to the other party in the call (say B), | call (5) and B sends an ACK (6). B then sends a REFER to A (7), | |||
| referring them to sip:a7hytaskp09878a@a.servers.com. B sends an | referring them to sip:a7hytaskp09878a@a.servers.com. A accepts the | |||
| INVITE to this address, and is added to the conference. Once the 200 | referral (8) and this triggers an INVITE to this address (9). This | |||
| OK response to the REFER is sent from B to A, A hangs up to B. A and | causes A to be added to the conference. The conference server accepts | |||
| B are now in a conference using a conference server. From here, | the INVITE (10), and an ACK is generated (11). Once the NOTIFY | |||
| operation is identical to the system described in Section 4. | request (indicating successful completion of the referred call) is | |||
| sent from A to B (12), B responds with a 200 OK (13). Since B is now | ||||
| assured that A is connected through the conference server, B hangs up | ||||
| to A with a BYE (14). | ||||
| OPEN ISSUE: It's not clear that this is the best flow. An | ||||
| alternative flow is for B to REFER the conference server to | ||||
| A, using a call replacement mechanism. This is probably | ||||
| more correct, since this is not so much a transfer as a | ||||
| call leg replacement. | ||||
| Finally, B can add C to the call. This is identical to the procedures | ||||
| described in Section 4 for adding users to the conference. First, B | ||||
| generates a REFER (16) to C. The Refer-To header contains the | ||||
| conference URL, sip:a7hytaskp09878a@a.servers.com. C responds to the | ||||
| referral with a 200 OK (17). C then INVITEs itself to the conference | ||||
| (18-20). C then generates a NOTIFY informing B that the REFER has | ||||
| completed (21). | ||||
| It is also possible to transition from an end system mixed conference | It is also possible to transition from an end system mixed conference | |||
| (even one with a complex connection topology), to a centralized | (even one with a complex connection topology), to a centralized | |||
| conference server. One user takes responsibility for initiating the | conference server. Consider an end-system mixed conference with the | |||
| transition. It proceeds as described above. However, the REFER | topology of Figure 2. User A wishes to transition to a centralized | |||
| request is sent to all SIP peers adjacent to the user. In addition, | conference server in order to add another participant. The transition | |||
| when a SIP UA receives a REFER, they must not only act on it as | is shown in Figures 5 and 6. | |||
| described above, but also generate a REFER to any of their adjacent | ||||
| SIP peers. In essence, the REFER message is propagated along the | First, user A discovers a conference server, and creates a new | |||
| connection graph, starting at the root (which is the user who | A B Conference C | |||
| initiates the transition). The transition will work so long as the | Server | |||
| graph has no cycles (which is needed anyway, as discussed above), and | |(1) INVITE | | | | |||
| so long as only one user attempts to initiate the transition. If | |-------------->| | | | |||
| multiple users attempt to initiate the transition at the same time, | |(2) 200 OK | | | | |||
| the conference will break into two disjoint ad-hoc conferences, with | |<--------------| | | | |||
| membership depending on the temporal dynamics of the REFER | |(3) ACK | | | | |||
| propagation. | |-------------->| | | | |||
| | |(4) INVITE | | | ||||
| | |-------------->| | | ||||
| | |(5) 200 OK | | | ||||
| | |<--------------| | | ||||
| | |(6) ACK | | | ||||
| |(7) REFER |-------------->| | | ||||
| |<--------------| | | | ||||
| |(8) 200 OK | | | | ||||
| |-------------->| | | | ||||
| |(9) INVITE | | | | ||||
| |------------------------------>| | | ||||
| |(10) 200 OK | | | | ||||
| |<------------------------------| | | ||||
| |(11) ACK | | | | ||||
| |------------------------------>| | | ||||
| |(12) NOTIFY | | | | ||||
| |-------------->| | | | ||||
| |(13) 200 OK | | | | ||||
| |<--------------| | | | ||||
| |(14) BYE | | | | ||||
| |<--------------| | | | ||||
| |(15) 200 OK | | | | ||||
| |-------------->|(16) REFER | | | ||||
| | |------------------------------>| | ||||
| | |(17) 200 OK | | | ||||
| | |<------------------------------| | ||||
| | | |(18) INVITE | | ||||
| | | |<--------------| | ||||
| | | |(19) 200 OK | | ||||
| | | |-------------->| | ||||
| | | |(20) ACK | | ||||
| | | |<--------------| | ||||
| | |(21) NOTIFY | | | ||||
| | |<------------------------------| | ||||
| | |(22) 200 OK | | | ||||
| | |------------------------------>| | ||||
| | | | | | ||||
| Figure 4: Transitioning to ad-hoc | ||||
| |(1) INVITE | | | | | ||||
| |---------------------------------------------------------->| | ||||
| |(2) 200 OK | | | | | ||||
| |<----------------------------------------------------------| | ||||
| |(3) ACK | | | | | ||||
| |---------------------------------------------------------->| | ||||
| |(4) REFER | | | | | ||||
| |------------->| | | | | ||||
| |(5) 200 OK | | | | | ||||
| |<-------------| | | | | ||||
| |(6) REFER | | | | | ||||
| |---------------------------->| | | | ||||
| |(7) 200 OK | | | | | ||||
| |<----------------------------| | | | ||||
| | |(8) INVITE | | | | ||||
| | |------------------------------------------->| | ||||
| | |(9) 200 OK | | | | ||||
| | |<-------------------------------------------| | ||||
| | |(10) ACK | | | | ||||
| | |------------------------------------------->| | ||||
| | | |(11) INVITE | | | ||||
| | | |---------------------------->| | ||||
| | | |(12) 200 OK | | | ||||
| | | |<----------------------------| | ||||
| | | |(13) ACK | | | ||||
| | | |---------------------------->| | ||||
| | | |(14) REFER | | | ||||
| | | |------------->| | | ||||
| | | |(15) 200 OK | | | ||||
| | | |<-------------|(16) INVITE | | ||||
| | | | |------------->| | ||||
| | | | |(17) 200 OK | | ||||
| | | | |<-------------| | ||||
| | | | |(18) ACK | | ||||
| | | | |------------->| | ||||
| | | | | | | ||||
| | | | | | | ||||
| A B C D Conf. | ||||
| Server | ||||
| Figure 5: Adhoc transition from end-system mixed: part I | ||||
| conference by sending an INVITE to it (1-3). A then REFERs the two | ||||
| end systems it is connected to (B and C), to the server (4-5 and 6-7 | ||||
| respectively). This causes B to INVITE itself to the conference | ||||
| server (8-10), and C to do the same (11-13). Since C received a | ||||
| REFER from A, it "passes it on" to D by sending a REFER to it | ||||
| (14-15). This causes D to join the conference server by sending it an | ||||
| INVITE (16-18). | ||||
| Once the REFER triggered INVITEs complete, notifications start to get | ||||
| sent. Since B completed first, it will be the first to send a NOTIFY | ||||
| to A (19) followed by C (21). At this point, A can terminate its legs | ||||
| to B and C (23-24 and 25-26 respectively). Since D completed its | ||||
| REFER triggered INVITE next, it generates a NOTIFY to C (27). This | ||||
| causes C to terminate its leg with D (29). The call has now | ||||
| transitioned to a centralized server. | ||||
| OPEN ISSUE: There is no way for A to know that the entire | ||||
| conference has transitioned. Also, as above, it's not clear | ||||
| that a REFER from the conference server wouldn't be better. | ||||
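
The REFER propagation walked through above can be sketched as a traversal of the connection graph. This is a simplified model, not a SIP implementation: the function name and the adjacency-list representation are hypothetical, and the topology (A connected to B and C, C connected to D) is assumed from the walk-through. The sketch relies on the graph having no cycles.

```python
from collections import deque


def propagate_refer(adjacency, initiator):
    """Model REFER propagation from `initiator` along an acyclic graph.

    Returns (refers, joined): the (sender, receiver) REFER pairs in the
    order they are issued, and the order in which UAs INVITE themselves
    to the conference server. Assumes `adjacency` has no cycles;
    with a cycle this loop would not terminate.
    """
    refers = []
    joined = [initiator]  # the initiator INVITEs the server first
    queue = deque([(initiator, None)])
    while queue:
        ua, came_from = queue.popleft()
        for peer in adjacency[ua]:
            if peer == came_from:
                continue  # never REFER back to whoever referred us
            refers.append((ua, peer))
            joined.append(peer)  # the REFER triggers an INVITE to the server
            queue.append((peer, ua))
    return refers, joined


# Assumed topology: A is connected to B and C, and C is connected to D.
topology = {"A": ["B", "C"], "B": ["A"], "C": ["A", "D"], "D": ["C"]}
refers, joined = propagate_refer(topology, "A")
print(refers)
print(joined)
```

The sketch does not model the NOTIFY and BYE exchanges, and it shares the limitation noted in the open issue: the initiator has no way to observe when the last UA has joined.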
| Once the conference has been formed, further operation is identical | ||||
| to the dial-in conferencing model of Section 4. The only difference | ||||
| in the conferences is that the conference identifier is dynamic in | ||||
| this case, and static in Section 4. This makes it nearly impossible | ||||
| for users to join asynchronously. | ||||
| 5.1 Inviting Users to Join | 5.1 Inviting Users to Join | |||
| Once the ad-hoc conference has been created on the server, inviting | Once the ad-hoc conference has been created on the server, inviting | |||
| users proceeds as defined in Section 4.1. | users proceeds as defined in Section 4.1. | |||
| 5.2 Users Joining | 5.2 Users Joining | |||
| Once the ad-hoc conference has been created on the server, joining | Once the ad-hoc conference has been created on the server, joining | |||
| proceeds as defined in Section 4.2. | proceeds as defined in Section 4.2. | |||
| skipping to change at page 12, line 44 ¶ | skipping to change at page 17, line 5 ¶ | |||
| The scalability of this conference model is identical to that of | The scalability of this conference model is identical to that of | |||
| dial-in conference servers, as described in Section 4.3. | dial-in conference servers, as described in Section 4.3. | |||
| 5.4 Location of Service Logic | 5.4 Location of Service Logic | |||
| The logic for handling the transition process must be located in at | The logic for handling the transition process must be located in at | |||
| least one UA in the conference. All UAs that are mixers in an end | least one UA in the conference. All UAs that are mixers in an end | |||
| system mixed conference must know to propagate the REFER requests | system mixed conference must know to propagate the REFER requests | |||
| they receive during the transition. | they receive during the transition. | |||
| |(19) NOTIFY | | | | | ||||
| |<-------------| | | | | ||||
| |(20) 200 OK | | | | | ||||
| |------------->| | | | | ||||
| |(21) NOTIFY | | | | | ||||
| |<----------------------------| | | | ||||
| |(22) 200 OK | | | | | ||||
| |---------------------------->| | | | ||||
| |(23) BYE | | | | | ||||
| |------------->| | | | | ||||
| |(24) 200 OK | | | | | ||||
| |<-------------| | | | | ||||
| |(25) BYE | | | | | ||||
| |---------------------------->| | | | ||||
| |(26) 200 OK | | | | | ||||
| |<----------------------------|(27) NOTIFY | | | ||||
| | | |<-------------| | | ||||
| | | |(28) 200 OK | | | ||||
| | | |------------->| | | ||||
| | | |(29) BYE | | | ||||
| | | |------------->| | | ||||
| | | |(30) 200 OK | | | ||||
| | | |<-------------| | | ||||
| | | | | | | ||||
| | | | | | | ||||
| | | | | | | ||||
| | | | | | | ||||
| | | | | | | ||||
| | | | | | | ||||
| | | | | | | ||||
| | | | | | | ||||
| | | | | | | ||||
| | | | | | | ||||
| | | | | | | ||||
| | | | | | | ||||
| | | | | | | ||||
| | | | | | | ||||
| A B C D Conf. | ||||
| Server | ||||
| Figure 6: Adhoc transition from end-system mixed: part II | ||||
| 5.5 Discovering Participant Identities | 5.5 Discovering Participant Identities | |||
| Once the ad-hoc conference is established, conference identities are | Once the ad-hoc conference is established, conference identities are | |||
| determined through RTCP, as in the dial-in case. | ||||
| 6 Dial-Out Conferences | 6 Dial-Out Conferences | |||
| Dial-out conferences are a simple variation on dial-in conferences. | Dial-out conferences are a simple variation on dial-in conferences. | |||
| Instead of the users joining the conference by sending an INVITE to | Instead of the users joining the conference by sending an INVITE to | |||
| the server, the server chooses the users who are to be members of the | the server, the server chooses the users who are to be members of the | |||
| conference, and then sends them the INVITE. Typically dial out | conference, and then sends them the INVITE. Typically dial out | |||
| conferences are pre-arranged, with specific start times and an | conferences are pre-arranged, with specific start times and an | |||
| initial group membership list. | initial group membership list. However, there are other means for the | |||
| dial-out server to determine the list of participants, including user | ||||
| presence [13]. The model in no way limits the means by which the | ||||
| server determines the set of users. | ||||
| Once the users accept or reject the call from the dial out server, | Once the users accept or reject the call from the dial out server, | |||
| the behavior of this system is identical to the dial-in server case | the behavior of this system is identical to the dial-in server case | |||
| of Section 4. Thus, a dial-out conference server will generally need | of Section 4. Thus, a dial-out conference server will generally need | |||
| to support dial-in access for the same conference, if it wishes to | to support dial-in access for the same conference, if it wishes to | |||
| allow joining after the conference begins. | allow joining after the conference begins. | |||
| Note that, from the participants' perspective, they will learn the | Note that, from the participants' perspective, they will learn the | |||
| conference identity (the URL) from the From field in the INVITE | conference identity (the URL) from the From field in the INVITE | |||
| messages received from the server. | messages received from the server. | |||
| OPEN ISSUE: Or is the Contact more appropriate? | ||||
| 6.1 Inviting Users to Join | 6.1 Inviting Users to Join | |||
| Once the conference is established, inviting users to join is | Once the conference is established, inviting users to join is | |||
| identical to the scenario described in Section 4.1. Note that the URL | identical to the scenario described in Section 4.1. Note that the URL | |||
| to be used in the REFER is obtained from the From field of the INVITE | to be used in the REFER is obtained from the From field of the INVITE | |||
| received from the dial-out server. | received from the dial-out server. | |||
| 6.2 Users Joining | 6.2 Users Joining | |||
| Once the conference is established, joining is identical to the | Once the conference is established, joining is identical to the | |||
| skipping to change at page 14, line 14 ¶ | skipping to change at page 19, line 24 ¶ | |||
| 7 Centralized Signaling, Distributed Media | 7 Centralized Signaling, Distributed Media | |||
| In this conferencing model, there is a centralized controller, as in | In this conferencing model, there is a centralized controller, as in | |||
| the dial-in and dial-out cases. However, the centralized server | the dial-in and dial-out cases. However, the centralized server | |||
| handles signaling only. The media is still sent directly between | handles signaling only. The media is still sent directly between | |||
| participants, using either multicast or multi-unicast. Multi-unicast | participants, using either multicast or multi-unicast. Multi-unicast | |||
| is when a user sends multiple packets (one for each recipient, | is when a user sends multiple packets (one for each recipient, | |||
| addressed to that recipient). This is referred to as a "Decentralized | addressed to that recipient). This is referred to as a "Decentralized | |||
| Multipoint Conference" in H.323. Interestingly, this conference model | Multipoint Conference" in H.323. Interestingly, this conference model | |||
| is possible baseline SIP. | is possible with baseline SIP. | |||
| It works through third party call control [13]. The conference server | It works through third party call control [14]. The conference server | |||
| uses re-INVITEs to each participant when a new one joins. The re- | uses re-INVITEs to each participant when a new one joins. The re- | |||
| INVITEs add a media stream that gets sent to the new participant (and | INVITEs add a media stream that gets sent to the new participant (and | |||
| similarly in the reverse direction). | similarly in the reverse direction). | |||
| Let us assume for the moment that a conference already exists with | Let us assume for the moment that a conference already exists with | |||
| three participants. In this state, each participant is sending media | three participants. In this state, each participant is sending media | |||
| directly to each other. This is because the SDP that the conference | directly to each other. This is because the SDP that the conference | |||
| server has given to each participant contains three media lines, each | server has given to each participant contains three media lines, each | |||
| of type audio, with connection addresses and ports corresponding to | of type audio, with connection addresses and ports corresponding to | |||
| each of the three users. | each of the three users. | |||
| The call flow from here is shown in Figure 3. A new participant | The call flow from here is shown in Figure 7. In the figure, the word | |||
| joins the conference. It does so by sending an INVITE (1) to the | after the INV or SIP response code refers to the connection | |||
| server, with the conference ID in the request URI. The SDP in the | address(es) in the SDP in the message. +X means the addition of a | |||
| INVITE contains a single media stream, with an IP address and port | stream with X as the recipient address. | |||
| where it would like to receive media (D). The 200 response from the | ||||
| conference server (2) contains a single media line with an IP address | A new participant joins the conference. It does so by sending an | |||
| of 0.0.0.0 and a random port, indicating hold. | INVITE (1) to the server, with the conference ID in the request URI. | |||
| The SDP in the INVITE contains a single media stream, with an IP | ||||
| address and port where it would like to receive media (D). The 200 | ||||
| response from the conference server (2) contains a single media line | ||||
| with an IP address of 0.0.0.0 and a random port, indicating hold. | ||||
| The next step is for the server to obtain two more addresses where | The next step is for the server to obtain two more addresses where | |||
| the new participant will be receiving media (it already has one from | the new participant will be receiving media (it already has one from | |||
| the original INVITE). To do this, it sends a re-INVITE to the new | the original INVITE). To do this, it sends a re-INVITE to the new | |||
| participant (4). This reINVITE contains two additional media streams | participant (4). This re-INVITE contains two additional media streams | |||
| (for three total), all three of which are on hold. The 200 response | (for three total), all three of which are on hold. The 200 response | |||
| to the re-INVITE (5) contains two additional IP addresses and ports | to the re-INVITE (5) contains two additional IP addresses and ports | |||
| where the user is willing to receive media. | where the user is willing to receive media. | |||
| Now the server needs to inform the other parties that they should | Now the server needs to inform the other parties that they should | |||
| begin sending media to the new user. It first sends a re-INVITE to | begin sending media to the new user. It first sends a re-INVITE to | |||
| user C (7). This re-INVITE adds an additional media stream to the two | user C (7). This re-INVITE adds an additional media stream to the two | |||
| that C has already been sending. This new media stream uses one of | that C has already been sending. This new media stream uses one of | |||
| the three connection addresses and ports returned by D in message | the three connection addresses and ports returned by D in message | |||
| (5). Call this address/port D1. The other two are D2 and D3. The 200 | (5). Call this address/port D1. The other two are D2 and D3. The 200 | |||
| skipping to change at page 15, line 20 ¶ | skipping to change at page 20, line 35 ¶ | |||
| two already in use by C) using address/port D2. The response (11) | two already in use by C) using address/port D2. The response (11) | |||
| contains a new address/port to send media to B. Call this port B3. In | contains a new address/port to send media to B. Call this port B3. In | |||
| the re-INVITE to A (13), the server adds an additional media line | the re-INVITE to A (13), the server adds an additional media line | |||
| using address/port D3. The response (14) contains a new address/port | using address/port D3. The response (14) contains a new address/port | |||
| to send media to A. Call this port A3. | to send media to A. Call this port A3. | |||
| Finally, the server sends a re-INVITE (15) to the new party. This | Finally, the server sends a re-INVITE (15) to the new party. This | |||
| re-INVITE takes all three streams off hold, and updates their | re-INVITE takes all three streams off hold, and updates their | |||
| connection addresses and ports with C3, B3, and A3, respectively. The | connection addresses and ports with C3, B3, and A3, respectively. The | |||
| 200 OK response (16) returns the same ports and addresses returned in | 200 OK response (16) returns the same ports and addresses returned in | |||
| message (5) (as noted in [13], these addresses/ports MUST NOT | message (5) (as noted in [14], these addresses/ports MUST NOT | |||
| change). Now, D can send media to A, B, and C. | change). Now, D can send media to A, B, and C. | |||
| The result of these manipulations is, indeed, a full mesh of unicast | The result of these manipulations is, indeed, a full mesh of unicast | |||
| RTP streams between all participants. Unlike the case of end system | RTP streams between all participants. Unlike the case of end system | |||
| mixing, the stream sent by any participant to all of the others is | mixing, the stream sent by any participant to all of the others is | |||
| identical. Each participant needs to mix, but it mixes the media it | identical. Each participant needs to mix, but it mixes the media it | |||
| receives, and plays the result out of the speakers. This is normal behavior | receives, and plays the result out of the speakers. This is normal behavior | |||
| for multiple streams of the same type. Note that the SIP relationship | for multiple streams of the same type. Note that the SIP relationship | |||
| is still point-to-point. There are four calls at the end of Figure 3, | is still point-to-point. There are four calls at the end of Figure 7, | |||
| one from each participant to the server, each with a different Call- | one from each participant to the server, each with a different Call- | |||
| ID. | ID. | |||
| Note that hybrids are easily possible. Certain users can instead be | Note that hybrids are easily possible. Certain users can instead be | |||
| mixed (sending audio to the conference server), while others are set | mixed (sending audio to the conference server), while others are set | |||
| to send audio to each other. | to send audio to each other. | |||
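
The bookkeeping the server performs when a new participant joins can be sketched as follows. This models only the address/port labels used in the walk-through (D1, D2, D3 and C3, B3, A3); it generates no SIP or SDP, and the function name and dictionary keys are hypothetical.

```python
def join_plan(existing, newcomer):
    """Return the re-INVITE plan for `newcomer` joining `existing`.

    Each step names the participant being re-INVITEd, the newcomer
    address/port it is told to send to, and the label of the new
    address/port it answers with. Labels follow the naming in the text.
    """
    n = len(existing)
    # The newcomer ends up offering one receive address per existing user.
    newcomer_ports = [f"{newcomer}{i}" for i in range(1, n + 1)]
    plan = []
    # The text re-INVITEs the existing users in the order C, B, A.
    for port, peer in zip(newcomer_ports, reversed(existing)):
        plan.append({
            "reinvite_to": peer,
            "send_to": port,
            # Each existing user already receives n-1 streams, so its
            # newly allocated port is its n-th.
            "receive_on": f"{peer}{n}",
        })
    # The final re-INVITE to the newcomer carries the collected addresses.
    final_offer = [step["receive_on"] for step in plan]
    return plan, final_offer


plan, final_offer = join_plan(["A", "B", "C"], "D")
for step in plan:
    print(step)
print(final_offer)
```

For three existing participants the server therefore sends one re-INVITE per existing user plus the hold and final re-INVITEs to the newcomer, which is why signaling load grows with conference size even though the server handles no media.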
| | | | |(1) INV D | | | | | |(1) INV D | | |||
| | | | |-------------->| | | | | |-------------->| | |||
| | | | |(2) 200 hold | | | | | |(2) 200 hold | | |||
| | | | |<--------------| | | | | |<--------------| | |||
| | | | |(3) ACK | | | | | |(3) ACK | | |||
| | | | |-------------->| | | | | |-------------->| | |||
| | | | |(4) INV 3held | | | | | |(4) INV 3held | | |||
| | | | |<--------------| | | | | |<--------------| | |||
| | | | |(5) 200 3recv | | | | | |(5) 200 3recv | | |||
| | | | |-------------->| | | | | |-------------->| | |||
| skipping to change at page 16, line 49 ¶ | skipping to change at page 21, line 50 ¶ | |||
| | | | |<--------------| | | | | |<--------------| | |||
| | | | | | | | | | | | | |||
| | | | | | | | | | | | | |||
| | | | | | | | | | | | | |||
| | | | | | | | | | | | | |||
| | | | | | | | | | | | | |||
| | | | | | | | | | | | | |||
| A B C D Server | A B C D Server | |||
| Figure 3: Centralized Signaling, Decentralized Media | Figure 7: Centralized Signaling, Decentralized Media | |||
| 7.1 Inviting Users to Join | ||||
| Inviting users to join works identically to the dial-in conference | ||||
| bridge scenario of Section 4. | ||||
| 7.2 Users Joining | ||||
| A user joins in the same way as described in Section 4. | ||||
| 7.3 Scalability | ||||
| The scalability of this conferencing model depends on many factors. | ||||
| From a media perspective, the conference server never even touches a | ||||
| single media stream. However, for N participants, each participant | ||||
| needs to be able to receive, decode, and mix N-1 media streams. For | ||||
| users accessing the server through dial-in modems, this will severely | users accessing the server through dial-in modems, this will severely | |||
| limit the sizes of these conferences. However, the processing burden | limit the sizes of these conferences. However, the processing burden | |||
| is much less than that of the end system mixing model. This is | is much less than that of the end system mixing model. This is | |||
| because each end user needs to decode N-1 streams, but only encode 1. | because each end user needs to decode N-1 streams, but only encode 1. | |||
| Decoding is much, much cheaper than encoding, so supporting many | Decoding is much, much cheaper than encoding, so supporting many | |||
| decodes is not necessarily a problem. This is especially the case | decodes is not necessarily a problem. This is especially the case | |||
| when silence suppression is in use. In that case, streams are only | when silence suppression is in use. In that case, streams are only | |||
| sent by talking users. This means any given user only needs to decode | sent by talking users. This means any given user only needs to decode | |||
| (and receive) as many streams at a time as there are users talking. | (and receive) as many streams at a time as there are users talking. | |||
| This can vastly improve scalability of the conference. | This can vastly improve scalability of the conference. | |||
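
The per-participant load described above can be made concrete with a small calculation. This is a sketch under the stated model only: the function names are hypothetical, and the silence-suppression case assumes a listener receives exactly one stream per active talker.

```python
def per_user_load(n, talkers=None):
    """Streams one participant encodes and decodes in an n-party conference.

    Without silence suppression (talkers is None) a user decodes n-1
    streams. With it, the user decodes one stream per active talker,
    capped at n-1. Encoding is always a single stream.
    """
    if n < 1:
        raise ValueError("a conference needs at least one participant")
    decode = n - 1 if talkers is None else min(talkers, n - 1)
    return {"encode": 1, "decode": decode}


def mesh_streams(n):
    """Total unidirectional RTP streams in a full mesh of n participants."""
    return n * (n - 1)


print(per_user_load(10))
print(per_user_load(10, talkers=2))
print(mesh_streams(4))
```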
| skipping to change at page 17, line 36 ¶ | skipping to change at page 22, line 52 ¶ | |||
| generally faster. | generally faster. | |||
| 7.4 Location of Service Logic | 7.4 Location of Service Logic | |||
| Nearly all of the logic for implementing this conferencing service | Nearly all of the logic for implementing this conferencing service | |||
| lives in the server itself. | lives in the server itself. | |||
| The only requirement from the end users is that they support | The only requirement from the end users is that they support | |||
| multiple, parallel media streams of the same type, and that they be | multiple, parallel media streams of the same type, and that they be | |||
| prepared to mix those streams together. They must also support the | prepared to mix those streams together. They must also support the | |||
| third party control primitives [13], which don't require anything | third party control primitives [14], which don't require anything | |||
| beyond baseline SIP, but are not likely supported unless explicit | beyond baseline SIP, but are not likely supported unless explicit | |||
| actions are taken to do so. | actions are taken to do so. | |||
| It is this combination - no need for media processing in the server, | It is this combination - no need for media processing in the server, | |||
| combined with no need for specialized SIP processing in the end | combined with no need for specialized SIP processing in the end | |||
| systems, that makes this model attractive. | systems, that makes this model attractive. | |||
| 7.5 Discovering Participant Identities | 7.5 Discovering Participant Identities | |||
| Conference identities are discovered through RTCP. Each user will | Conference identities are discovered through RTCP. Each user will | |||
| receive N-1 RTP streams, each of which has its own RTCP channel that | receive N-1 RTP streams, each of which has its own RTCP channel that | |||
| carries the participant identification. | carries the participant identification. | |||
| 8 Summary of Models | 8 Summary of Models | |||
| Table 1 shows a summary of the differences between the various | Table 1 shows a summary of the differences between the various | |||
| models. | models. | |||
| Table 1: Summary of Models | Table 1: Summary of Models | |||
| Name signaling media inviting joining discovering scale | Name signaling media inviting joining discovering scale | |||
| End-Mixing tree tree normal normal RTCP small | End-Mixing tree tree normal normal RTCP small | |||
| invite invite | invite invite | |||
| Multicast pairs m-cast normal multicast RTCP large | Multicast pairs m-cast normal multicast RTCP large | |||
| invite join | invite join | |||
| Dial-Up star star refer normal RTCP medium | Dial-Up star star refer normal RTCP medium | |||
| invite | invite | |||
| Ad-Hoc star star refer normal RTCP medium | Ad-Hoc star star refer normal RTCP medium | |||
| invite | invite | |||
| Dial-Out star star refer normal RTCP medium | Dial-Out star star refer normal RTCP medium | |||
| invite | invite | |||
| Decentral star fullmesh refer + normal RTCP medium | Decentral star fullmesh refer + normal RTCP medium | |||
| server invite and | server invite and | |||
| messaging server msg. | messaging server msg. | |||
| 9 What's Missing - Full Mesh | 9 Security Considerations | |||
| The sections above cover a wide range of conferencing models, but not | ||||
| all of them. One model, in particular, is not supported by SIP. That | ||||
| model is the fully distributed multiparty model. | ||||
| In this conferencing model, each user has a point to point SIP | ||||
| relationship with every other user. Each user also has a point to | ||||
| point RTP relationship with every other user, as is done in the | ||||
| decentralized conference of Section 7. | ||||
| Two earlier drafts were written on the subject, but they specified | ||||
| protocols that were overly complex and still had race conditions and | ||||
| unhandled cases. The primary difficulty is that it requires every | ||||
| participant to learn the identity of every other participant. As | ||||
| participants come and go, this requires some kind of state flooding | ||||
| mechanism that causes this information to propagate, and eventually | ||||
| converge, across participants. While these kinds of distribution | ||||
| mechanisms have been done for multiparty conferences [14], fitting | ||||
| such a distribution mechanism into SIP is not trivial, especially | ||||
| with the complex requirements that were initially targeted. | ||||
| Furthermore, the distributed nature of the signaling makes | ||||
| enforcement of any kind of conference policy pretty much impossible. | ||||
| Failures can also result in unusual conditions. Specifically, it is | ||||
| fairly easy for the conference mesh to break in certain places, | ||||
| resulting in a graph where every user hears most of the other users, | ||||
| but not all. This can happen, for example, if user A is invited into | ||||
| a conference, but is rejected by one of the users already in the | ||||
| conference (because the SIP relationships are point-to-point, a new | ||||
| user needs to establish a SIP call with all existing participants). | ||||
| With large conferences, this becomes a very | ||||
| real possibility. Earlier work tried to avoid such conditions. | ||||
| We believe a solution can be found by simplifying the requirements. | ||||
| For example, we will abandon the requirement to only add a user to | ||||
| the conference if all other users agree to add them. We will also try | ||||
| to achieve gradual convergence in shared state, rather than the rapid | ||||
| convergence proposed in previous work. We will not worry about | ||||
| message efficiency or message frequency. The primary design objective | ||||
| should be KISS. | ||||
| As a baseline model, we believe that each INVITE, 200 OK response, | ||||
| and ACK simply contain a header called Members. This header is a list | ||||
| of URLs, and for each URL, there is a parameter that indicates | ||||
| whether they are in the conference right now, and when they joined, | ||||
| or whether they were previously in the conference, and when they | ||||
| left. A UA simply performs a re-INVITE as it receives new | ||||
| information. A periodic re-INVITE (a la session timer [15]) will also | ||||
| be needed to heal partitions and deal with other conditions that may | ||||
| arise. | ||||
| More work is needed to validate the model and to see what other | ||||
| capabilities are needed. | ||||
| 10 Security Considerations | ||||
| The use of a server that performs the mixing on behalf of other | The use of a server that performs the mixing on behalf of other | |||
| users, which is the case for all but one of the conference models | users, which is the case for all but one of the conference models | |||
| described here, introduces security risks. That entity must be | described here, introduces security risks. That entity must be | |||
| trusted by the others to properly mix the media - not omitting a | trusted by the others to properly mix the media - not omitting a | |||
| stream, for example. As such, it is recommended that participants in | stream, for example. As such, it is recommended that participants in | |||
| a conference authenticate the identity of the server. In the dial-in, | a conference authenticate the identity of the server. In the dial-in, | |||
| dial-out, and decentralized conferences, this will require | dial-out, and decentralized conferences, this will require | |||
| authentication of responses by participants. | authentication of responses by participants. | |||
| Mixing also eliminates the privacy possible with end-to-end media | Mixing also eliminates the privacy possible with end-to-end media | |||
| transport with mixing in the receivers. Such privacy is still | transport with mixing in the receivers. Such privacy is still | |||
| possible in the large-scale multicast conferences, but requires | possible in the large-scale multicast conferences, but requires | |||
| shared keying material for the conference. Doing this for highly | shared keying material for the conference. Doing this for highly | |||
| dynamic groups is still an open research problem. | dynamic groups is still an open research problem. | |||
| 11 Conclusion | 10 Conclusion | |||
| In this draft, we have shown how to use baseline SIP (assuming | In this draft, we have shown how to use baseline SIP (assuming | |||
| endpoints that support the mixing and/or third party call control | endpoints that support the mixing and/or third party call control | |||
| feature sets) to construct several multiparty conferencing models. | feature sets) to construct several multiparty conferencing models. | |||
| These include end system mixing, large-scale multicast conferences, | These include end system mixing, large-scale multicast conferences, | |||
| dial-in conference servers, dial-out conferences, ad-hoc centralized | dial-in conference servers, dial-out conferences, ad-hoc centralized | |||
| conferences, and centralized signaling, distributed media | conferences, and centralized signaling, distributed media | |||
| conferences. | conferences. | |||
| We note that this covers all of the multipoint conferencing models | 11 Acknowledgements | |||
| described in H.323v1 [16]. Further work is needed to see how (and if) | ||||
| to support the hierarchical conference bridges defined in H.323v2 | ||||
| [17]. | ||||
| 12 Authors Addresses | We would like to thank Mary Barnes for her comments and input. | |||
| 12 Changes since -00 | ||||
| o Added call flow examples. | ||||
| o Added open issues within text. | ||||
| o Added additional call flow for adding users to conference, by | ||||
| sending REFER to conference server with Refer-To of new | ||||
| participant. | ||||
| o Discussed conference servers generating RTCP based on | ||||
| authenticated SIP identities. | ||||
| 13 Authors Addresses | ||||
| Jonathan Rosenberg | Jonathan Rosenberg | |||
| dynamicsoft | dynamicsoft | |||
| 200 Executive Drive | 72 Eagle Rock Avenue | |||
| Suite 120 | First Floor | |||
| West Orange, NJ 07052 | East Hanover, NJ 07936 | |||
| email: jdrosen@dynamicsoft.com | email: jdrosen@dynamicsoft.com | |||
| Henning Schulzrinne | Henning Schulzrinne | |||
| Columbia University | Columbia University | |||
| M/S 0401 | M/S 0401 | |||
| 1214 Amsterdam Ave. | 1214 Amsterdam Ave. | |||
| New York, NY 10027-7003 | New York, NY 10027-7003 | |||
| email: schulzrinne@cs.columbia.edu | email: schulzrinne@cs.columbia.edu | |||
| 13 Bibliography | 14 Bibliography | |||
| [1] M. Handley, H. Schulzrinne, E. Schooler, and J. Rosenberg, "SIP: | [1] M. Handley, H. Schulzrinne, E. Schooler, and J. Rosenberg, "SIP: | |||
| session initiation protocol," Request for Comments 2543, Internet | session initiation protocol," Request for Comments 2543, Internet | |||
| Engineering Task Force, Mar. 1999. | Engineering Task Force, Mar. 1999. | |||
| [2] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson, "RTP: a | [2] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson, "RTP: a | |||
| transport protocol for real-time applications," Request for Comments | transport protocol for real-time applications," Request for Comments | |||
| 1889, Internet Engineering Task Force, Jan. 1996. | 1889, Internet Engineering Task Force, Jan. 1996. | |||
| [3] M. Handley and V. Jacobson, "SDP: session description protocol," | [3] M. Handley and V. Jacobson, "SDP: session description protocol," | |||
| skipping to change at page 21, line 25 ¶ | skipping to change at page 25, line 42 ¶ | |||
| [7] D. Waitzman, C. Partridge, and S. E. Deering, "Distance vector | [7] D. Waitzman, C. Partridge, and S. E. Deering, "Distance vector | |||
| multicast routing protocol," Request for Comments 1075, Internet | multicast routing protocol," Request for Comments 1075, Internet | |||
| Engineering Task Force, Nov. 1988. | Engineering Task Force, Nov. 1988. | |||
| [8] J. Rosenberg and H. Schulzrinne, "Timer reconsideration for | [8] J. Rosenberg and H. Schulzrinne, "Timer reconsideration for | |||
| enhanced RTP scalability," in Proceedings of the Conference on | enhanced RTP scalability," in Proceedings of the Conference on | |||
| Computer Communications (IEEE Infocom), (San Francisco, California), | Computer Communications (IEEE Infocom), (San Francisco, California), | |||
| March/April 1998. | March/April 1998. | |||
| [9] J. Rosenberg, P. Mataga, and H. Schulzrinne, "An application | [9] J. Rosenberg, P. Mataga, and H. Schulzrinne, "An application | |||
| server component architecture for sip," Internet Draft, Internet | server component architecture for SIP," Internet Draft, Internet | |||
| Engineering Task Force, Nov. 2000. Work in progress. | Engineering Task Force, Mar. 2001. Work in progress. | |||
| [10] J. Franks, P. Hallam-Baker, J. Hostetler, S. Lawrence, P. Leach, | [10] J. Franks, P. Hallam-Baker, J. Hostetler, S. Lawrence, P. Leach, | |||
| A. Luotonen, and L. Stewart, "HTTP authentication: Basic and digest | A. Luotonen, and L. Stewart, "HTTP authentication: Basic and digest | |||
| access authentication," Request for Comments 2617, Internet | access authentication," Request for Comments 2617, Internet | |||
| Engineering Task Force, June 1999. | Engineering Task Force, June 1999. | |||
| [11] R. Sparks, "SIP call control," Internet Draft, Internet | [11] R. Sparks, "SIP call control," Internet Draft, Internet | |||
| Engineering Task Force, Sept. 2000. Work in progress. | Engineering Task Force, Feb. 2001. Work in progress. | |||
| [12] E. Guttman, C. Perkins, J. Veizades, and M. Day, "Service | [12] E. Guttman, C. Perkins, J. Veizades, and M. Day, "Service | |||
| location protocol, version 2," Request for Comments 2608, Internet | location protocol, version 2," Request for Comments 2608, Internet | |||
| Engineering Task Force, June 1999. | Engineering Task Force, June 1999. | |||
| [13] J. Rosenberg, H. Schulzrinne, and J. Peterson, "Third party call | [13] J. Rosenberg et al., "SIP extensions for presence," Internet | |||
| control in SIP," Internet Draft, Internet Engineering Task Force, | Draft, Internet Engineering Task Force, Apr. 2001. Work in progress. | |||
| Mar. 2000. Work in progress. | ||||
| [14] C. Elliott, "A 'sticky' conference control protocol," | ||||
| Internetworking: Research and Experience, Vol. 5, pp. 97-119, | ||||
| 1994. | ||||
| [15] S. Donovan and J. Rosenberg, "SIP session timer," Internet | ||||
| Draft, Internet Engineering Task Force, Oct. 2000. Work in progress. | ||||
| [16] International Telecommunication Union, "Visual telephone systems | ||||
| and equipment for local area networks which provide a non-guaranteed | ||||
| quality of service," Recommendation H.323, Telecommunication | ||||
| Standardization Sector of ITU, Geneva, Switzerland, May 1996. | ||||
| [17] International Telecommunication Union, "Packet based multimedia | [14] J. Rosenberg, J. Peterson, H. Schulzrinne, and G. Camarillo, | |||
| communication systems," Recommendation H.323, Telecommunication | "Third party call control in SIP," Internet Draft, Internet | |||
| Standardization Sector of ITU, Geneva, Switzerland, Feb. 1998. | Engineering Task Force, Mar. 2001. Work in progress. | |||
| End of changes. 48 change blocks. | ||||
| 190 lines changed or deleted | 369 lines changed or added | |||
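
The baseline Members model in the -00 text above (each INVITE, 200 OK, and ACK carries a list of URLs, each marked as present or departed with a time) leaves the merge rule unspecified. The following is a minimal sketch of one possible rule, not the authors' mechanism. It assumes that the entry with the later time wins for a given URL and that participants' clocks are comparable. The names `MemberEntry`, `merge_members`, and `needs_reinvite` are hypothetical, and no wire syntax for the header is implied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemberEntry:
    """One element of a hypothetical Members list (not a defined syntax)."""
    url: str        # participant URL
    present: bool   # True: in the conference now; False: has left
    since: float    # time of the join (if present) or of the departure


def merge_members(local, received):
    """Merge two Members lists, keeping the most recent entry per URL.

    Assumption: 'later time wins'. The draft does not specify this rule.
    """
    merged = {entry.url: entry for entry in local}
    for entry in received:
        known = merged.get(entry.url)
        if known is None or entry.since > known.since:
            merged[entry.url] = entry
    # Sort by URL so that two equal views compare equal as lists.
    return sorted(merged.values(), key=lambda e: e.url)


def needs_reinvite(local, received):
    """True when the received list changed our view of the conference.

    Assumes the local list holds at most one entry per URL.
    """
    return merge_members(local, received) != sorted(local, key=lambda e: e.url)
```

Under this rule the merge does not depend on the order in which lists arrive, so repeated exchanges (including the periodic re-INVITE) would let participants drift toward the same view, in line with the gradual convergence the text aims for. A UA would send a re-INVITE only when `needs_reinvite` reports a change.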