Minutes for RTCWEB WG Interim Meeting on the 8th of September

Chairs: Magnus Westerlund, Ted Hardie, Cullen Jennings

Thanks to the three minute takers: Cary Bran, Mary Barnes, and Stephan Wenger

There were about 45 people on the call. List of people on the call:
Alan Johnston, Alissa Cooper, Allyn Romanow, Andy Hutton, Atin Banerjee,
Bernard Aboba, Bert Greevenbosch, Cary Bran, Charles Eckel, Christer Holmberg,
Christian Schmidt, Colin Perkins, Cullen Jennings, Dan Burnett, Dan Romascanu,
Daryl Malas, Dan York, Enzo (?), Eric Rescorla (EKR), Ernst Horvath, Francois
Audet, Francois Daoust, Harald Alvestrand, Jim McEachern, John Elwell,
Jonathan Lennox, Justin Uberti, Keith Drage, Kevin Fleming, Krisztian Kiss,
Magnus Westerlund, Markus Isomaki, Mary Barnes, Matthew Kaufman, Michael
Lundberg, Muthu Arul, Neil Stratford, Olle Johansson, Parthasarathi Ravindran,
pm (?), Ram Mohan, Randell Jesup, Salvatore Loreto, Sebastien Cubaud, Serge
Lachapelle, Sohel Khan, Spencer Dawkins, Stefan Håkansson, Stephan Wenger,
Ted Hardie, Thomas Stach, Tim Panton, Wolfgang Beck

WG Chairs slides:
- Note Well presented and applies
- Meeting will be recorded
- Agenda presented

Use Cases (1 hr)
----------------
20 minutes: On Recording (John Elwell)

- There was a discussion about the architectural scheme for recording
  (slide 6). No major controversies here.
- Justin asked how common remote recording is compared to local recording.
  This could not be answered, but John commented that the need for remote
  recording is likely bigger in RTCWEB than in SIP, due to the lack of
  middleboxes that could do the recording.
- Ted asked what is most common today, which is to record locally and then
  upload; the reason for remote recording is the desire to do the recording
  in real time. Ted saw three alternatives: 1) a middlebox does the
  recording, 2) the recording server is an RTCWEB peer and performs analysis
  directly on the received media, 3) recording is done locally and analysis
  is done afterwards. Justin expressed a preference for 2), while Ted made
  it clear that he would prefer to avoid two streams.
- Ralph Giles proposed that the API should give access to the media streams
  locally, for local processing of the media. John Elwell asked whether that
  could really be done in real time in the end-point.
- John E made it clear that his desire is an rtc-web architecture that does
  not preclude session recording in the future.
- There is an underlying security question here: do you need to know the set
  of participants in the rtc-web session? And are you allowed to forward the
  media from one participant to another participant or set of participants
  without their explicit consent? This question is deferred to the EKR
  security discussion.
- Ted concluded that method 2) above can be supported, assuming no security
  issues. In general there seems to be nothing in the architecture, beyond
  the security aspects, that precludes the recording use cases. Text will be
  needed in the use case document.

Action Item: John Elwell to work with the use case draft owners to add the
recording use case(s) to the requirements draft.

Review and discussion of other use cases proposed
-------------------------------------------------
(Use case draft author team). Stefan Håkansson presenting.

Magnus Westerlund commented that adding more use cases results in more
functionality, which means more work to complete before the documents can be
approved.

TURN Case: Cullen presented the use case he originally proposed. There was a
question of what requirements this really imposes.
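For concreteness, a minimal sketch of how a web application already supplies
TURN servers through the peer connection configuration, as the API was
eventually standardized; the server URLs and credentials are invented
placeholders. The use case discussed here concerns browser-local
configuration in addition to this application-supplied list.

```javascript
// Sketch only: an application-supplied TURN/STUN configuration.
// All server names and credentials below are placeholders.
const pc = new RTCPeerConnection({
  iceServers: [
    { urls: 'stun:stun.example.net:3478' },
    {
      urls: 'turn:turn.example.net:3478?transport=udp',
      username: 'webrtc-user', // placeholder credential
      credential: 'secret'     // placeholder credential
    }
  ]
});
```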
Magnus asked if this primarily is an implementation question about allowing
local configuration of additional TURN servers, similar to how web proxies
work in browsers today. Justin asked if these could be detected. Cullen
thought it is similar to SOCKS proxies, but was not certain. Francois asked
whether this is really special; isn't it similar to the other parameters
that you need in, for example, a SIP call? Harald commented that he hoped
there will not be as many ways of finding a TURN server as there are of
detecting proxies.

Action Item: Needs more discussion and clarification on the mailing list.

E911 Case: Ted clarified that there are two different cases here. The first
is that you go to a 911 provider's web site and use it to communicate. The
second is a federated case, or a dial-out case, where a calling service
reaches beyond its domain to the E911 service. Ted believes the first one is
no different from a normal RTCWEB service. The second is out of scope; the
client isn't really providing location information, and if something is
available it is unassured data, which the NENA folks don't want. Cullen
commented that he doesn't buy the unassured-data reason. People will build
services that provide PSTN-like connectivity, and in those cases people will
call emergency services. We should not stick our heads in the sand, but we
also should not create requirements that no one will implement. Ted added
that the service needs to know which emergency center to use; thus it needs
LoST so it can provide the server with its location. From a service
perspective, not terrible. There are also DoS concerns about tying up
resources at the emergency service centers. Cullen thought the DoS angle was
silly, as this can already be done easily. Ted thought the issues and the
use case in RTCWEB's context should be run by ECRIT. Bernard Aboba commented
that there is a document in ECRIT about this. Stephan Wenger agreed with
Ted's arguments. He also commented that we can look at the regulatory
requirements in some cases, like Skype Out: manufacturers or service
providers are required to provide support for emergency services. If we rule
this out of scope we are sticking our heads in the sand. Francois commented
that there are large variations depending on country or jurisdiction; we are
out of our depth here. Ted repeated his earlier argument that we should take
this to ECRIT and look at it from their perspective first. Randell J brought
up that the requirement to get location data in the browser raises security
and user-permission issues. Stephan W commented that for 911 calls in the US
caller-ID blocking is overridden. Randell clarified that that is OK, but you
don't want to expose the information to other services. Stefan H responded
that this is already resolved, as there exists a location API in W3C that
gets permission from the user.

Action Item: Paul and Stephan to talk to the ECRIT chairs to identify a
subject matter expert to help with the E911 use cases.

Emergency access for disabled: Randell commented that this use case is not
only for emergencies; it will be useful in general. Bernard added that there
is not a lot more in this use case than what is in the Emergency Services
use case. The added functionality is conferencing support, to allow an
interpreter to be added into the emergency call.

Action Item: Bernard to work with the emergency use case proponents to add
the additional requirement.

CLUE Case: Magnus stated his view that the CLUE use case is not clear enough
about what it means.
Mary Barnes (CLUE WG chair) commented that their timeline is similar to
RTCWEB's and the use cases are mostly done. Thus RTCWEB shouldn't do
something that precludes the CLUE use cases. Also, in areas like
multi-stream we should not have different solutions. Christer Holmberg
commented that the CLUE use cases include the requirement to support
single-stream end-points; so although those may not get the full experience,
they will be able to communicate. Cullen stated that he doesn't understand
what is really required to support CLUE. Mary responded that one will need
to support the data for the functionality that is being passed; that is
currently being worked on in CLUE, and an interim meeting is coming up which
it would be good if some RTCWEB people could attend. Markus Isomaki asked
whether these requirements would be on the JS application rather than on the
browser that implements RTCWEB. Mary said we don't quite know that yet,
including whether RTCWEB will use SDP O/A or something else. Justin
commented that RTCWEB will also need to handle multiple sources inside SDP.
Roni Even commented that the multi-stream being talked about in CLUE is
multiple video streams, which is different from most of RTCWEB's use cases,
where there are multiple pairs of streams. Also, the negotiation should not
be a big issue, but we don't know that yet. Stephan Wenger's view is that
CLUE is mostly about call setup and control, which appears to reside in the
web application; thus the requirement runs more in the reverse direction: it
is more important for CLUE to consider that what they develop can be
implemented in JS on an RTCWEB-compliant browser. Magnus Westerlund stated
that it is very unclear what the requirements really are. Is it compatible
media planes, or that interoperable applications can be developed at all? So
what is this use case, when we don't know in detail what CLUE is? Mary
stated that, as neither group is far enough along to really be into the
details, she thinks the CLUE interim should add to its agenda a discussion
of the overlapping use cases and see what requirements are needed in each
group. It is desirable for an RTCWEB end-point to be able to participate in
a telepresence conference. Stephan Wenger commented that participation is
likely not an issue; what needs consideration is providing the richer
telepresence experience in the RTCWEB context. John Elwell agreed that an
RTCWEB client needs to be able to get the benefits of CLUE; however, what
the requirements on the API and functions are is not clear. Cullen concluded
the topic and pointed out that there is a clear action item and that future
alignment discussions will be needed.

Action Item: Mary Barnes to allocate time at the CLUE interim meeting to
discuss RTCWEB/CLUE interoperability; RTCWEB representation should attend.

Large multiparty session: Stefan presented the use case. The discussion on
the list seems to indicate that the use case is very similar to the
centralized multi-party conference that is already present. The suggestion
is to skip this use case. No one objected.

Security camera / baby monitor: Randell commented that there clearly are
security issues that need to be discussed. Justin asked if this includes
pan, tilt and zoom functions. Randell answered that it would be handy.

Remote assistance: Randell said this is a common use case with installed
applications, and it would be good to do this without external apps or
plugins. Harald commented that this might belong in W3C space; it is
primarily about being able to take the screen as input. Randell agreed, and
added that video and audio from the system are needed.
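Taking the screen as an input source later got its own W3C API; a minimal
sketch under that assumption, purely for illustration. The browser itself
prompts the user to pick a screen or window, which is one way the consent
concern can be handled.

```javascript
// Sketch only: capture the screen and send it like any other media.
async function shareScreen(pc) {
  const screenStream = await navigator.mediaDevices.getDisplayMedia({
    video: true
  });
  // Hand the captured tracks to the peer connection as ordinary media.
  for (const track of screenStream.getTracks()) {
    pc.addTrack(track, screenStream);
  }
  return screenStream;
}
```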
Jonathan Lennox asked if there is a reverse direction also, where input is
transferred. Randell answered that this is possible. The data stream might
not be an IETF issue at all, and might be a W3C-defined opaque stream from
the IETF perspective. Clearly there are some security concerns around the
control. Justin commented that this seems to imply that there is some
reliable data transport between the peers. Randell agreed. Cullen Jennings
is very interested in the use case, especially the screen sharing, to be
able to implement something like WebEx in RTCWEB; the remote control aspect
is less important. Stephan Wenger asked if the reverse direction really is
in scope: the screen sharing clearly is a common use case, but the reverse
path appears to be out of scope. Randell commented that the screen may not
be transported as video, but over some reliable or unreliable protocol, like
VNC.

General discussion on use cases: Stefan H raised the issue of how we should
work on use cases in the near future. Ted presented the WG chairs' thinking
on the issue. The WG chairs will run a series of consensus calls for the use
cases. It is important that the use cases capture the requirements they put
on functionality, so that it is clear what impact a use case has. If that
isn't present, the chairs may declare that there is no consensus. Stephan
Wenger objected that these are harsh demands: one can probably argue the use
cases, but not be able to determine the requirements. Ted responded that we
are not going to be mean, but the chairs might declare that consensus is not
determined and that additional work is required before coming to consensus.

Signalling (1 hr)
-----------------
15 minutes: Issue Overview (Matthew Kaufman)

Presentation: What needs to be standardized?
1) Between browser and web server: HTTP is already there; does anything else
   need to be standardized?
2) Media transport: needs to be standardized.
3) Between web servers: do we need to standardize signaling federation?
   There are already existing protocols, i.e., SIP, SDP O/A.
4) Within browsers: the API between the application Javascript and the
   browser. The IETF needs to provide requirements, but this is a W3C
   problem.

1) Options: leave it to the application developer; SIP; SIP-lite; not SIP,
   but it should be SDP O/A.
2) Proposal: use existing media transport protocols.
3) Options: SIP; other (e.g., XMPP Jingle); up to service providers.
   Cullen: another option - SIP might be used, but SPs could also do what
   they want. John E: suggested this doesn't have to be SIP, but if it is
   SIP it should be compliant SIP. Matthew: thinks this is premature (i.e.,
   federation). Note, this doesn't define what should be between browser and
   web server.
4) While this isn't an IETF problem, it may be influenced by the solutions
   for the others (e.g., 1).
   - How much of calling is built in?
     -- 1) Based on SIP.
     -- 2) The Javascript developer gets a PeerConnection object. Lots of
        room for innovation.
     -- 3) Intermediate choices.
   - How do address selection and NAT traversal work?
     -- 1) The PeerConnection is passed an SDP blob.
     -- 2) The PeerConnection is passed a candidate list, etc.
     -- 3) The PeerConnection has APIs for ICE, etc.
   - How does codec selection work?
     -- 1) SDP O/A
        --- Javascript can manipulate the SDP, so applications are not
            totally bound by it.
     -- 2) Javascript APIs
        --- Applications can query for capabilities; this allows for more
            complex APIs and leverages W3C APIs.
   Direct Javascript APIs allow the developer to query capabilities:
   -- can still leverage MMUSIC
   -- can also use SDP as a command mechanism (and not O/A)

Recommendations:
-- avoid solving problems that are out of scope
-- maximize flexibility for applications - turn the browser into a new
   operating system, rather than just bolting on a SIP phone

What needs to be standardized:
1) No.
2) DTLS-SRTP.
3) Leave to SPs for now; look at SIP later.
4) Don't build a SIP phone into the browser. Implement ICE natively. Don't
   build O/A into the PeerConnection object. Provide APIs for codec choice,
   etc.

Discussion: Justin agreed with many of the presumptions here; however, 4) is
thorny. Codec selection should be done with something other than SDP.
Matthew responded that we don't need to fire an SDP offer out of the
PeerConnection; call a Javascript API if you want to use O/A. This may not
be in scope for the IETF - it should be done in W3C. Justin remarked that if
one doesn't pass SDP blobs, but instead queries for an object describing
capabilities and then configures with another object, would that address the
functionality Matthew wants? Matthew confirmed yes, but some of the blobs
from MMUSIC may be reused. Justin then thought that in the trivial case one
could ask for a blob and pass that blob along, which is similar to the
PeerConnection model that can generate O/A. Matthew commented that it is as
simple as getting a JSON blob, sending it to the other end, and taking the
blob out at the other end. Cullen: what we're talking about here is whether
we can pull the codecs, etc., out of the blob. Matthew answered that we need
to come up with a minimal set of requirements, not define blobs. Cullen
responded: you've said the advertisement model allows you to do more
innovative things than O/A; he thinks SDP O/A is rich enough to control
this. Matthew responded that the API as currently proposed in W3C says you
don't get to know the capabilities until you generate an offer. Example: a
10-party conference call where everyone has selected codecs that work for
them - what do you tell an 11th party that doesn't support those codecs? The
API knows all the capabilities; thus the web server can make a more
intelligent decision - e.g., switch codecs for the call. Cullen responded
that O/A allows you to accomplish the same; the current APIs don't work, and
he doesn't see why O/A changes innovation. Matthew believes the two are
mappable: one can generate an offer and deconstruct it. This is about
building an OS. This question should be answered in W3C; he thinks either of
these is equivalent. Much of the Q&A on 4) slopped over into 1) and 3).
Francois remarked that we should be thinking about how to maximize the
probability that we are successful. It seems we should focus on the things
in our mandate - i.e., the protocol for transport. He thinks that for 3) we
should say that the federation protocol is SIP. He agrees with Matthew that
the protocol to the web server is to be done by W3C; they should figure out
what is best.

15 minutes: Offer/Answer architectural text (Cullen Jennings)
--------------------------------------------------------------
Design principles:
1) Use the SDP O/A semantics as used by SIP, rather than reinvent them.
2) It will be possible to gateway to legacy SIP devices that support ICE.
3) When a new codec is defined, the JS API doesn't change.

Matthew asked whether this defines what goes on the wire. Cullen answered
no.
Matthew agrees with 2) and 3), and thinks the client can choose 1). Cullen
suggests SIP as a federation protocol, but no SIP between the web server and
the browser. SDP blobs are not enough by themselves; he prefers SIP O/A
semantics, not just RFC 3264.

Requirements:
- able to pass SDP O/A
- indicate the context of the passed O/A
- deal with two-phase SIP - 180/200
- signal errors in the SDP
...
Summary: we need more than just SDP, and it needs to be mappable to SIP.

Discussion: Francois agrees you need more than O/A. If this were all at the
server-to-server interface, he would 100% agree that this is all that is
needed. He has concerns about the browser-server interface, where we
shouldn't go out of our comfort zone and shouldn't mandate an
implementation. We should focus on server-server; we may not need O/A on the
wire - like today, when he is calling from his SIP phone to Skype. Cullen
responded that he knows Skype isn't mappable to SIP directly. Yes, we are
talking about the federation protocol, but he does think that what comes in
and out of the API then goes through a magical protocol - and that protocol
must be simple and MUST be mappable. The three interfaces need the
information; they don't necessarily need SIP O/A on all of them. Francois
thinks this is moving in the right direction. Keith commented that the SIP
level doesn't have anything to do with media. Does server A need to know
anything about server B? Do we need feature tags? Cullen responded that the
focus has been on the information needed at the RTP level. Justin thinks
this is the right direction. We need errors, timers, etc., for glare and the
like. He is concerned about the "much more along these lines" - that's a
slippery slope; will we pull in more SIP stuff later? Harald commented that
if we can push the two-phase media commit away from the API level, that
would be good. SIP has spent a lot of effort getting this right; he would
prefer a somewhat simpler model. Cullen responded that the thing he dislikes
most about the whole SIP discussion is the slippery slope. Obviously he
doesn't have a concrete proposal yet; we need a proposal over which we can
have a detailed debate, for example on issues like the two-phase commit,
which our use cases currently have canned. He is glad to have the
discussion, but we need a document that describes the details. Harald:
Hurray! Cullen wants to get the "Design Principles" detailed on the slide
into Harald's document. In short, points 1) and 2) need to be mappable to
SIP O/A. Francois commented that softening point 1) and strengthening point
2) a little would maximize our chances of getting things right.

Discussion: Jonathan Lennox asked whether DTLS-SRTP and SAVPF are required,
or whether we are going to have SDP CAPNEG, which everyone dislikes; and if
the former, does everything else need to go through a media gateway? Cullen
assumes the direction is that we want to negotiate secure versus insecure
media; thus we need some negotiation between a limited device and the
browser. He accepts that we must interoperate with SDP and that this is not
trivial. The only thing harder is to replace SDP with something simpler that
still does what people want - every time that has been attempted it has
failed. He would like to persuade the WG to avoid going down that road, as
it is a multi-year effort, and if that happens, whatever the WG decides will
be totally irrelevant because code will have shipped long before. Kevin
Fleming asked whether it is SDP that is hard, or the problem space in which
we use SDP. Cullen responded that it's probably the latter.

Ted: we need more discussion of Matthew's and Cullen's presentations (for 10
minutes after the break).

Continued discussion!
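For orientation, a rough sketch of the browser-generated offer/answer
exchange debated above, in the shape the API was eventually standardized.
sendToPeer stands for an assumed application-defined signaling function; the
channel between browser and web server is deliberately left unspecified.

```javascript
// Sketch only: the JS application relays browser-generated SDP blobs.
async function startCall(pc, sendToPeer) {
  const offer = await pc.createOffer();          // browser builds the SDP blob
  await pc.setLocalDescription(offer);
  sendToPeer({ type: 'offer', sdp: offer.sdp }); // application-defined signaling
}

async function handleOffer(pc, message, sendToPeer) {
  await pc.setRemoteDescription({ type: 'offer', sdp: message.sdp });
  const answer = await pc.createAnswer();        // browser builds the answer
  await pc.setLocalDescription(answer);
  sendToPeer({ type: 'answer', sdp: answer.sdp });
}
```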
Ted started up the discussion after the break. There is a trade-off here:
the more we have standardized below the API, the simpler the JS code can
become. Yes, a JS library can attempt to cover such functionality; however,
such libraries are commonly frustratingly incomplete. The one place where
Ted is quite concerned is congestion control: with multiple streams in
multiple layers, trying to do control across all of them in the Javascript
portion seems quite difficult. If that is the case, how can we have the thin
part below the API and still have the congestion control in the browser? Has
anyone thought about this? Cullen commented that both the CC and the
security functions in the browser need a fairly deep understanding of what
the application intends. That is likely easier if the browser selects the
RTP parameters. Also, the more successful HTML APIs have been easy for
people to demonstrate; thus the simple things need to be really simple to
accomplish, and the more advanced things merely possible. Harald commented,
about congestion control (CC), that one issue is trust: if CC is implemented
in the JS, then we need to trust all the JS authors; if it is implemented in
the browser, we need to trust the browser authors. Eric Rescorla asked
whether, regarding trust, Harald was concerned with malice or incompetence.
Harald responded that any sufficiently advanced incompetence can't be
distinguished from malice. Eric clarified that he was interested in how
strong the guarantees need to be that appropriate congestion control actions
are performed. Harald would like the assurance from the browser that,
independently of whatever mess the JS tries to create, nothing truly bad
will happen. Randell Jesup agreed with this. Eric Rescorla followed up by
asking: if CC is implemented in the browser, how are the signaling of and
response to congestion handled if the JS application is responsible? Ted
commented that he is worried if all JS applications need this complicated
back-and-forth handling of events and response selection. Harald agreed.
Justin commented that, on the positive side, this is a differentiating
factor: if it is in the browser, it is likely we can do it correctly, but
there will be people who think they could do it better if they had the
possibility. Randell (?) commented that there is a connection between the
application's response and congestion control, but the determination is best
done in the browser. Justin agreed, but in, for example, a conference
scenario it is the application that needs to make the decision to drop a
stream. Dan York asked how we define congestion control. Eric Rescorla asked
what is required to build a very basic softphone. He thinks he understands
what is needed in the case of Cullen's model: very little JS code and a
quite slim server side. But has anyone tried out what Kaufman is proposing?
Harald commented that he hasn't seen enough details of Kaufman's proposal to
build a site; he certainly can't (?) see how to build a site fulfilling
Cullen's forward-compatibility principle. Dan York commented that they are
building something similar to Kaufman's principles. Cullen interjected that,
from his discussions with Kaufman, he believes what Dan Y is doing is a type
example of what Kaufman hates, so there might be some confusion here.
Bernard Aboba asked: but that is a JS example, not SIP in the browser? Dan
York followed up, saying that there are definitely people building real-time
stuff in JS that has commonalities.
Eric Rescorla formulated the two questions: how much code does one, as the
site programmer, have to write; and secondly, how much code does one have to
download? Cullen wondered whether the principles presented are okay. Cullen
has written several proposals, but is not willing to spend more time writing
up an offer/answer-based proposal along these principles unless there is
sufficient agreement; he wants to make progress. Parthasarathi Ravindran
(?): why don't we take SIP as the framework, not the applications, and have
the browser be a UA? Cullen responded that that is possible, and it appears
that all the demos are based on some softphone. Ted summarized that we have
gotten a lot of good material and discussion. Requirements for federation
and browser-server have clearly made progress. There will be a quick round
among the people driving the design principles, and then two-week consensus
calls for the actual principles. Justin requested that if there is anything
the other model can't do, please detail that. Ted agreed and clarified that
such input to the consensus calls is highly desirable.

Action Item: Take this up on the consensus call with the signaling design
advocates; present the results to the mailing list.

Security (1 hr)
---------------
Note: there is a new version of the security document.

Presentation Overview:
- RTCWEB functionality is too dangerous to enable by default - users must
  consent
- How do they consent intelligently?
- Objective of the discussion: work through the common cases.

Common Themes:
1) Consent issues:
   - making long-term grants secure
   - user expectations
   Potential long-term consent security features:
   a) live with it
   b) require user interaction with the browser for all calls
   c) require user interaction with the browser for new calls
   d) require the JS to be delivered over HTTPS
2) Authenticating the person you are talking to (suggested to be less
   important than consent)

Discussion: Cullen commented that EKR is phrasing consent in terms of access
to the camera, and the options capture that; but if you frame it as
controlling whom your computer is sending media to, that would lead to a
different set of options. EKR exemplified this: he goes to a poker site and
is connected to Ted (whom he doesn't know; just a random guy). How does he
avoid a faked pokerweb connecting him to some random guy? Cullen responded
that it depends on the identity provider, which is pokerweb; it is a
separate issue to distinguish the real pokerweb from the faked one. EKR has
two concerns: first, that even with TLS he can't be certain that he is
talking to the real pokerweb; second, whether he has any protection against
the real pokerweb tapping its users. Cullen thinks we will need two
fundamental authentication processes. Ted asked whether this can be
discussed in terms of assurances rather than UI: which assurance is being
provided, and whether it has been successfully provided. Cullen responded
that he wasn't saying there were two assurances - there are two groups that
can be selected. The assurance where you install an application like Skype
is different from having a conversation with X and disallowing that
thereafter. EKR responded that after a bulk authorization, the amount of
protection is quite limited; he is not sure how to improve this without
interfering with the user. There is minimal assurance about the sites you
authorize (with HTTPS). There is assurance before making a call, or
assurance when you have made a call before: e.g., a user has been on
pokerweb before and already had calls with two people. If the user is
willing to have calls only with these, that limits what the website can do
to bug the calls.
This of course has UI consequences.

Short-term consent:
- a single call to people with whom you have no prior relationship
- conflicting requirements: low impact, yet not something users just click
  through; can we do anything to help here? This is really a W3C issue.
  -- It was suggested to do like FaceTime and mirror the UI.
- Characterization: the user doesn't know whom they are calling. We don't
  have technical means to give this kind of identity.
  -- You know the domain, and you need to figure out whether it is connected
     to the people you want to call.
  -- It requires a user leap to go from an FQDN/user name to the company you
     are talking to.

API impact of short-term consent:
1) show the self picture
2) this implies some level of device access prior to permissions
EKR suggests this is out of scope.

What about the site being visited:
- should the top-level site get an opinion?

Discussion: Sohel: he is hearing problems, not solutions. EKR: the proposal
is to allow a JS API for consent.

Enforcing pseudonymity:
- if you care about this, use SRTP
- the site can enforce this
- the browser can support this
- cryptographic continuity

Verifying who you are talking to.

Summary:
- we can't completely eliminate threats from long-term sites:
  -- basic principle: trust but verify
- short-term consent is somewhat more secure:
  -- likely the user will have to give consent

Discussion: Jonathan asked whether the interface is for requesting long-term
or short-term consent. EKR responded that it will be the web sites that
decide which to use. Short-term consent happens as a side effect of a JS
call; the app can make a call without long-term permissions. The long-term
consent process is something a little more heavyweight. Jonathan asked
whether it is clear to the user what they are consenting to. EKR thinks so.
Ted raised a question about the short-term case: when you consent only for
this call, some long-running process may be present in the web application.
How do you determine the length of a call compared to the UI interactions?
Does the call end when you navigate away from the tab, when you quit the
browser, or when you close the tab? How do we manage the user's expectations
and consent? EKR has no idea what the answer is; the basic requirement is
that a user be aware of what calls, and how many, are in progress. Randell
commented that this is edging into the W3C side. Mozilla will be considering
UI that may have a chrome indication that the mic/camera are active; that is
likely part of the solution. Ted wondered whether there is a need for an
additional assurance type: an assurance that the application has access to
camera and microphone only while the user is on the tab containing the
application. Does one have one assurance that is short-term, for as long as
the tab exists, and one that is long-term? The assurance models are
different and would be hard to express to users; at the same time, having
different types of assurances would be valuable. What do others believe?
Cullen wants assurance that whenever he ends it, it truly ends. Ted stated
that we don't want to make this part of the JS. Cullen asked: our threat
model is that JS is inherently untrusted, isn't it? Randell agreed that part
of the indication belongs in the chrome, even if "end call" is part of the
UI in the application: if you hit end and don't see that reflected visibly
in the chrome UI, you know the application didn't do the right thing. That
is not the whole solution, but part of it. Ted stated that we don't appear
to have a need for different assurance classes (?). Stefan commented that we
have gone beyond ending calls; there is also recording, and as soon as the
application can record and send a file off, the user has very little
control.
Ted restated this as: is there any assurance that we would like to provide,
and is there any that we can provide, given that the number of participants
is known? EKR responded yes and yes. There is a big difference between 0 and
1 participants; the difference between 1 and N is important, but less so
than between 0 and 1. The browser should indicate both cases. Ted thought
that was a different question. E.g., "I'm sending media to Stefan, who is
sharing the media with a recording service." Can we provide any assurance,
or is it simply that once your media has left your device and arrived
somewhere, they can do anything with it? Randell responded that we can't
really give an assurance; there is always an OS-level way to grab the media
and forward it. Stefan stated that even if you have verified whom you are
talking to, the application may also record it locally and send it off.
Cullen commented that we need to differentiate between what can happen to
media once it comes out of a speaker and what a less-than-trustworthy JS
application can do with the information. Handling the first case is a bar
too far - something we can't deal with. Magnus reinforced Stefan's point
that local recording is possible, and the JS can then ship the data off the
machine. No conclusion.

Terminology Mapping (30 minutes)
--------------------------------
Mapping WebRTC constructs to RTCWeb terms (Magnus Westerlund)

Presentation Overview:
o Multi-media session versus RTP session
o RTP-related terminology
o WEBRTC API terminology (MediaStream object, MediaStream Track, Label,
  PeerConnection)

MediaStream and Label:
- a MediaStream track can be mapped to an SSRC in an RTP session
- a MediaStream track has a synchronization context that can be represented
  by a CNAME
- a MediaStream sent by a PeerConnection can be represented by a list of
  (RTP session, SSRC) tuples
- the MediaStream label has no matching construct:
  -- the SDP a=label attribute labels RTP sessions, not a set of SSRCs in
     possibly several RTP sessions
  -- the label can't be the CNAME

Discussion: Harald hasn't been able to track down a requirement for the
MediaStream label. Magnus responded that the same media source can appear in
multiple MediaStreams, which makes the CNAME dubious as a label. Harald
commented that multiple copies of the same MediaStream track are used to
send media in different directions; that means it doesn't matter whether
they have the same or different CNAMEs, because they won't end up at the
same receiving entity. Cullen commented that these are different multimedia
sessions from the RTP point of view. Harald concluded that he doesn't see
the use case. Magnus responded that he has a use case where you might want
to maintain the CNAME across multiple sessions (next slide). Justin asked
what the application is expected to do with the MediaStream label. Cullen
responded with a use case where a video device is receiving 3 streams: you
have been told in the signaling JS that you are getting Alice, Bob and
Charlie, and you need to know how to map them to the media streams. Stefan
added that an end-point can have multiple cameras. Cullen remarked that
maybe we need a label for a track, and not one for a MediaStream; media
tracks need an SSRC and a CNAME. Harald disagreed: the MediaStream
represents everything that needs to be coordinated, and you need a label to
refer to that construct. Magnus directed attention to the mixer case, where
one is likely to receive one mixed audio stream and several video streams.
Harald responded that if you are sending audio and video, you need two
things: you need to know that they belong to me, and that they need to be
synchronized, which is the CNAME. Why do you need another construct? How
else can you refer to it?
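For reference, a minimal sketch of the identifiers the API eventually
exposed, purely illustrative: each MediaStream carries a browser-assigned id
(the "label" debated here) and each track its own id, while the mapping to
RTP-level SSRCs and CNAMEs stays inside the browser.

```javascript
// Sketch only: browser-assigned stream and track identifiers.
navigator.mediaDevices.getUserMedia({ audio: true, video: true })
  .then((stream) => {
    console.log('stream id (the "label"):', stream.id);
    for (const track of stream.getTracks()) {
      console.log(track.kind, 'track id:', track.id);
    }
  });
```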
One needs to map to an identifier to figure out the actual stream. Justin
commented that you only need to map to a track. Stefan commented that the
model on the application side has been to deal with streams. Justin
responded that we have identified places where this breaks down; mapping to
a track rather than a stream provides more flexibility. Stefan thinks it
makes things more complex. Cullen gave a good example: two camera views with
Harald, both synced to one audio stream. You want to synchronize all three
and distinguish the two camera views. He is not proposing a solution, but
this seems to be a use case. EKR asked what the application model is: either
one is in Matthew's world with a JS handle to both, or in the SDP browser
world. Cullen responded that we are talking about which JS handle flows
across the interfaces. EKR wondered where this is envisioned to be signaled;
in SDP, according to Justin. EKR asked what one is trying to express with
it. Justin clarified that the application is asking for the wide-angle shot,
or the zoomed-in one. Cullen added that both are already being received; the
application wants to display the wide shot on the right-hand side and the
zoomed one on the left. Justin said that works as long as you have video
track tags and use a URL to get them. EKR was not clear on whether the
application does this locally or whether it is sent across the wire. Justin
and Stefan agreed that these are local operations, but Stefan added that you
need information from the other side. EKR commented that one has a JS
object, so why is another label needed? Cullen referred to the question of
how an incoming RTP packet gets delivered to the right JS object. Justin
remarked: by the SSRC. Colin Perkins: you would need the same CNAME for
synchronization, but we can easily define a new identifier/label if we need
one, and put it in RTCP.

Conclusion: Discussion to continue on the mailing list.

Congestion Control (10 minutes)
-------------------------------
Harald Alvestrand

Randell is in favor of this. We need to run congestion control across all
streams, not just RTP. Harald remarked that he agrees; Google is
experimenting with joint control, but the RTP identifiers aren't set up to
make this easy. Justin remarked that we have continuous traffic (likely to
fill the pipe) as well as discontinuous traffic. What kind of assurance can
you give users that their data traffic makes it through? We likely need
something to help their traffic get through. Magnus commented that it
depends on the timescale on which you want to react: if at least at frame
level, you can throttle back your video encoder to make room for the data.
Justin: if media is running without saturating the pipe and the application
dumps 300k of data, then it will be an RTT before we know whether there was
a problem or not. Randell: we likely need something like TCP slow start; the
data is clearly asynchronous and may not be a new connection. We need
something - the details are the question. Cullen commented that in the use
case where the data is much less than the video, there is no issue with the
data, as the spike from an I-frame is much larger than a single data packet.
It may be easier to solve this in some cases - e.g., at low data rates. Tim
Terriberry reminded the group that a proposed use case was to do direct file
transfer without relaying via the server. Justin responded that this argues
for a reliable data channel. Randell noted that the JS application may need
to give an indication of prioritization, e.g., the percentage of the channel
devoted to data. Justin agreed that one may need to request a reservation to
send more data. He agrees with Cullen's remark, which implies one should
avoid I-frames. Colin: there is a missing requirement to not impact the
media flow.
The issue is that what is good for the network, like slow start, is bad for
the media. Randell commented that there are cases where timely delivery is
more important than media quality; this is an application decision. Justin
agreed that QoS is something we should let applications set. Ted concluded
that draft-alvestrand-rtcweb-congestion-00 is not yet ready for WG adoption,
and wants Harald to take it forward. Harald wants someone else to take on
the task of writing the requirements for congestion; this document just
documents what Google is doing. Randell was willing to cooperate on a draft.

Conclusion: Randell will write up a first pass at the requirements (along
with Harald) and discuss it on the mailing list.
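The application-level prioritization indication asked for above is roughly
the shape of the per-sender parameters that later appeared in the
standardized API. A minimal sketch under that assumption; the sender
argument, the 500 kbit/s cap, and the 'low' priority are illustrative
choices, not anything agreed in the meeting.

```javascript
// Sketch only: deprioritize a media sender so other traffic can get through.
// 'sender' is an RTCRtpSender as returned by pc.addTrack(...).
async function yieldToDataTraffic(sender) {
  const params = sender.getParameters();
  for (const encoding of params.encodings) {
    encoding.maxBitrate = 500000; // illustrative cap, ~500 kbit/s
    encoding.priority = 'low';    // illustrative relative priority
  }
  await sender.setParameters(params);
}
```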