Audio/Video Transport Extensions (avtext)
======================================================================

Chairs: Magnus Westerlund, Keith Drage

Minutes, IETF 82, Taipei

AVTEXT Status Update
--------------------

Magnus Westerlund reported on the WG status. Keith Drage was not present at the session. Jonathan Lennox commented that the updates to the client-to-mixer and mixer-to-client audio level drafts were done and would be submitted soon.

IEEE 1588/802.1AS Synchronisation for RTP Streams
-------------------------------------------------
draft-williams-avtext-avbsync-02

Aidan Williams presented. The focus of the presentation was to lay out the requirements on media playback when one would like to replace analog cabling with networks in professional PA systems. This leads to the use of RTP in this area, the shortcomings that exist, and the perceived need for extensions or improvements compared with IEEE 1733. He concluded with a proposal on how to address these issues and shortcomings.

Jonathan Lennox commented that there will be a bit of rounding error between the NTP format and a nanosecond counter. Aidan responded that this is not a significant issue. Magnus Westerlund commented that the XRBLOCK WG has already had some discussion of clock sources. Aidan promised to follow up on that topic.

RTCP SDES Item SRCNAME to Label Individual Sources
--------------------------------------------------
draft-westerlund-avtext-rtcp-sdes-srcname-00

Magnus Westerlund presented an individual proposal with the general problem statement of how to associate multiple media sources (SSRCs) within and between RTP sessions, in a way that also works when multiple original media sources use the same SDES CNAME item. Multiple cases where the existing mechanisms are sub-optimal were presented. The authors asked if the WG was interested in pursuing a generic mechanism to address the presented problem. Cullen Jennings stated that he did not understand the problem statement.
Magnus tried restating the problem using an example: an endpoint has two cameras, A and B, which are to be encoded and transmitted in an RTP session. These two video streams are also to be protected using FEC, which creates new media streams carrying the protection data. In that case the RTP session will contain four SSRCs, and the receivers need to correctly associate the pairs. Cullen followed up by asking whether there is no method to do that in signalling. Magnus agreed that there are signalling solutions for the use case where all streams to be associated are within a single RTP session, or where all the media streams are in unique RTP sessions and all use the same SSRC value. However, there is no RTP/RTCP-based mechanism that works in the case of a single session. Colin Perkins reinforced that CNAME can't be used in this case. With the increased interest in having fewer RTP sessions this becomes a real problem.

Cullen commented that he did not see a need for having the FEC in the same RTP session as the media, but several others responded that they did. Jonathan Lennox stated that SSRC grouping does work, but it requires a new offer/answer every time a new FEC-protected source joins the session, which might be an issue depending on your session. Cullen Jennings stated that he doesn't want to push all the signalling into RTP and RTCP. Magnus agreed that pushing everything into RTP/RTCP is not appropriate. However, this is not session configuration information; it is information associated with the existence of media flows that come and go, as in multi-party sessions. Magnus's view is that you don't want to renegotiate the session just because an additional participant joins. Cullen disagreed; he thinks it is necessary to renegotiate the session when new participants join, and thus he didn't see the use case. Colin Perkins commented that this depends on how you build your system, and there have been systems that haven't needed renegotiation.
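The two-camera example can be sketched in a few lines. The SSRC and item values below are purely illustrative, not taken from the draft; the point is only that a shared CNAME lumps all four streams together, while a per-source item such as the proposed SRCNAME recovers the media/FEC pairs:

```python
# Sketch of the association problem, under illustrative values.
# All four SSRCs come from one endpoint and therefore share one
# CNAME, so CNAME alone cannot pair each FEC stream with the media
# stream it protects.
from collections import defaultdict

# (ssrc, cname, srcname) as a receiver might learn them from RTCP SDES
sdes = [
    (0x1111, "user@host", "cameraA"),  # video from camera A
    (0x2222, "user@host", "cameraA"),  # FEC protecting camera A
    (0x3333, "user@host", "cameraB"),  # video from camera B
    (0x4444, "user@host", "cameraB"),  # FEC protecting camera B
]

by_cname = defaultdict(list)
by_srcname = defaultdict(list)
for ssrc, cname, srcname in sdes:
    by_cname[cname].append(ssrc)
    by_srcname[srcname].append(ssrc)

# CNAME groups all four SSRCs into one bucket ...
assert by_cname["user@host"] == [0x1111, 0x2222, 0x3333, 0x4444]
# ... while a per-source name separates the media/FEC pairs.
assert by_srcname["cameraA"] == [0x1111, 0x2222]
assert by_srcname["cameraB"] == [0x3333, 0x4444]
```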
Cullen responded that you need a lot of additional information for your system to be able to properly process the data; thus you need signalling anyway and have no reason to put it in RTP. Roni Even requested clarification on how one can avoid signalling in the simulcast case. Magnus described that in their proposal the different encoding versions are determined based on which RTP session they appear in, but by using SDES SRCNAME it would not be necessary to require the same SSRC in all the RTP sessions, and one could associate FEC or RTP retransmissions in the same RTP session the encoding version is being sent in. Roni then asked whether it is primarily for avoiding signalling in dynamic cases. Magnus agreed that this is the main issue, and that your signalling mechanism determines how significant this issue is for you. Roni followed up that it only provides the association information, nothing else.

Harald Alvestrand tried to summarize it as this being a solution needed if one doesn't signal on every change, or has more than a single source being added at a time. Magnus agreed, with the comment that it is difficult to know that you will only have one stream being added in a session at any given time. Roni commented that he understood the FEC and RTP retransmission use cases, which were clear; the others have limitations that aren't clear. Dave Oran commented that he had become more confused. He asked whether we would have had this conversation if CNAME had been defined to be hierarchical. Magnus responded probably not. Dave then responded that he has a proposal to make ... Justin Uberti asked about the SSRC-grouping semantics: what if you have multiple dimensions of grouping? Magnus responded that he thinks the proposal corresponds to one grouping semantics. Jonathan Lennox expanded on this with an example of two media sources, each of which has two encoding versions, with each version FEC-protected. Then you have eight (8) streams where you need more than source identification.
Magnus agreed on the need for multiple groupings. Colin Perkins commented that we have a long history with RTP where SSRCs can change and collide, and we can't ignore that history; we need to deal with the fact that they can change. Thus, unless we signal every change, we need some mechanism to associate streams. There have been many complaints about forcing renegotiation. Cullen asked how often collisions happen and whether that is relevant. Magnus responded that joining and leaving participants are actually more relevant. Dave Oran raised the issue that you may use different FEC from one participant to another. Magnus agreed that there are cases where you might force session-wide renegotiation; however, in a mixer that does not do media transcoding but forwarding or repacketization, one can ensure that the sent packets stay within the parameters negotiated, also for new participants. Dave Oran then commented that you can then pre-allocate a bunch of slots so that you have no state change.

Cullen stated that he could not see a system with two cameras working without some signalling to provide additional information about those two cameras. Colin Perkins responded that the point is to avoid signalling for associating them. Harald wondered whether the problem of associating the camera and the microphone is part of the problem statement or not, because it is currently not in the draft. Magnus responded that it has been excluded due to previous discussion of the relation to the CLUE WG's work. Colin Perkins commented that it is a possibility that it can be used: if one can associate media and FEC, one can associate audio and video. Harald responded that it might be the use one doesn't use it for, because that use case becomes complicated awfully fast. The question Harald sees is: are there use cases where one doesn't do signalling on change but still needs association, that are interesting enough?
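Colin's point that SSRC values can change follows from RFC 3550's collision handling: a participant that detects another source using its own SSRC must pick a new one. A minimal sketch of that rule (simplified; the full algorithm also involves sending a BYE for the old SSRC and keeping a conflict table, omitted here):

```python
import random

def resolve_own_collision(own_ssrc, incoming_ssrc, known_ssrcs):
    """Simplified RFC 3550 SSRC collision resolution: if another
    participant appears with our SSRC, choose a fresh random 32-bit
    SSRC not already in use. This is why an SSRC value cannot serve
    as a stable, long-lived source identifier."""
    if incoming_ssrc != own_ssrc:
        return own_ssrc  # no collision, keep our identifier
    new_ssrc = own_ssrc
    while new_ssrc == own_ssrc or new_ssrc in known_ssrcs:
        new_ssrc = random.getrandbits(32)  # pick a fresh identifier
    return new_ssrc
```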
Magnus concluded the topic by stating that he will try to answer Harald's question and follow up on the feedback.

RTP Media Stream Pause and Resume
---------------------------------
draft-westerlund-avtext-rtp-stream-pause-00

Magnus Westerlund presented his individual proposal. He stated that the main use case for the authors has been the mixer case presented on slide 3. Matthew Kaufman asked whether one does Voice Activity Detection on the stream one is not receiving. Magnus clarified that you likely receive the audio and just pause the video. Matthew responded that a signalling-based solution would be better, as you could then pause the audio as well. Magnus continued with the requirements and the reasons for not using TMMBR. Roni Even commented that he doesn't think those reasons are relevant in mixer topologies. A mixer would know whether a media stream is shared or not and could thus decide whether it is possible to pause it. A new session participant will only receive what the mixer sends, and those are the non-paused streams. For the semantics that is correct, but not relevant, as the mixer knows the streams it received prior to pausing. And TMMBR=0 is being used in deployed video conferencing systems. Magnus commented that this is a violation of a MUST in the RFC. Roni followed up that the mixer is doing what the receivers desire and know. Magnus followed up that the first two issues are about the certainty of the classification of the topology. Magnus skipped the proposal for discussion, only noting that there is a relation to the Media Stream Selection work.

Hadriel Kaplan asked about the use cases: what prevents usage in the point-to-point situation? Because that would be a bad idea. The reason is that there are a lot of deployed boxes that require media packets when the signalling indicates that there should be packet flows, e.g. SDP a=sendrecv. So without indicating this in the signalling the proposal is doomed. Dave Oran commented that some signalling is encrypted and can't be read.
Hadriel responded that the middleboxes he is talking about are in the signalling path. Cullen Jennings asked why this isn't done in a protocol like BFCP, which appears to be a better match. Magnus responded that this has been considered, and for some of the interactions it is very suitable and is discussed in the Media Stream Selection draft (draft-westerlund-dispatch-stream-selection); BFCP seems a good match for receiving-client-to-mixer communication, but not for the direction from the mixer to the sending client. Cullen sees no problem with having messages going from the mixer to the sending client. Roni Even commented that this would in many cases require a new control channel (BFCP) for just this functionality, while the CCM messages in RTCP are already being sent.

Stephan Wenger asked about the apparent contradiction between Hadriel's statement and the usage of TMMBR=0. Is this an industry separation where the VoIP side does things differently from the video conferencing side? Hadriel would ask how much TMMBR=0 is deployed over the Internet and among the few thousand service providers that use SIP. Is it used with mixers? That's a tiny percentage. It is no secret this happens. Roni Even commented that in H.323 this is done by setting flow control to 0, and TMMBR=0 works in a similar way. It is primarily MCUs that do this. Stephan Wenger commented that there is an aspect that might make him support this proposal: a really good video encoder that encodes the slide deck view (which has been static for the last minute) would not produce a single bit, with the exception of any keep-alives etc. Thus there is a difference between a transport channel property of TMMBR=0 and the media having been switched off. Hadriel clarified that calls don't get quickly torn down; it is a long-term decision, however configurable. Comfort noise transmission and other seldom-send patterns already exist. If one would use pause, one would need to mandate keep-alives.
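For reference, the TMMBR=0 hack debated above amounts to packing a Temporary Maximum Media Stream Bit Rate Request FCI entry (RFC 5104, section 4.2.1.1) with a zero bitrate. A minimal sketch of that field layout, with illustrative SSRC values:

```python
import struct

def tmmbr_fci(media_ssrc, mxtbr_exp, mxtbr_mantissa, overhead):
    """Pack one TMMBR FCI entry per RFC 5104 section 4.2.1.1:
    a 32-bit SSRC of the media sender, followed by a 32-bit word
    holding the 6-bit MxTBR exponent, 17-bit MxTBR mantissa and
    9-bit measured overhead."""
    assert mxtbr_exp < (1 << 6)
    assert mxtbr_mantissa < (1 << 17)
    assert overhead < (1 << 9)
    word = (mxtbr_exp << 26) | (mxtbr_mantissa << 9) | overhead
    return struct.pack("!II", media_ssrc, word)

# The "pause" hack: request a maximum bitrate of 0 bit/s
# (mantissa 0, exponent 0) for the given media sender.
pause = tmmbr_fci(0x12345678, 0, 0, 0)
assert pause == b"\x12\x34\x56\x78\x00\x00\x00\x00"
```

As the minutes note, using this as a pause signal is contested: it conflates a transport-rate limit with the media having been switched off, which is exactly the distinction Stephan Wenger drew above.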
Magnus concluded that we have some supporting the general functionality but with a preference for using TMMBR=0, and some saying to go use the signalling channel. We clearly need more mailing list discussion. Matthew Kaufman commented that if we are going to do this type of signalling, he would like to have rewind and volume up and down. Colin Perkins commented that this can be viewed as replacing an existing non-clear mechanism (TMMBR=0) with a clear mechanism, both using RTCP. Cullen commented that it also adds an ACK mechanism.