<?xml version="1.0" encoding="UTF-8"?>

<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!ENTITY RFC2119 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml">
<!ENTITY RFC5763 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5763.xml">
<!ENTITY RFC5764 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5764.xml">
<!ENTITY RFC4568 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.4568.xml">
<!ENTITY I-D.ietf-rtcweb-overview SYSTEM "http://xml.resource.org/public/rfc/bibxml3/reference.I-D.ietf-rtcweb-overview.xml">
<!ENTITY I-D.ietf-rtcweb-use-cases-and-requirements SYSTEM "http://xml.resource.org/public/rfc/bibxml3/reference.I-D.ietf-rtcweb-use-cases-and-requirements.xml">
<!ENTITY I-D.ietf-rtcweb-security SYSTEM "http://xml.resource.org/public/rfc/bibxml3/reference.I-D.ietf-rtcweb-security.xml">

]>

<?xml-stylesheet type='text/xsl' 
    href='http://xml.resource.org/authoring/rfc2629.xslt' ?>

<?rfc strict="yes" ?>
<?rfc toc="yes" ?>
<?rfc symrefs="yes" ?>
<?rfc sortrefs="yes"?>
<?rfc compact="yes" ?>    <!-- conserve vertical whitespace -->
<?rfc subcompact="no" ?>  <!-- but keep a blank line between list items -->


<rfc category="info" ipr='trust200902' docName="draft-ohlsson-rtcweb-sdes-support-00">

    <front>
        <title>Support of SDES in WebRTC</title>
        <author initials='O.O' surname='Ohlsson' fullname='Oscar Ohlsson'>
            <organization>Ericsson</organization>
            <address>
<postal>
                <street>Farogatan 6</street>
                <city>SE-164 80 Kista</city>
<country>Sweden</country>
</postal>
                <email>oscar.ohlsson@ericsson.com</email>
            </address>
        </author>
        <date day='22' month='January' year='2012'/>
        <workgroup>Network Working Group</workgroup>
        <abstract>
<t>
The question of which key management protocols to support has been the subject 
of lively debate in WebRTC on several occasions. This document explains the 
benefits of SDES and argues that allowing it as an alternative option has 
little impact on security.
</t>
        </abstract>
    </front>

    <middle>

<section title="Introduction">
<t>
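The question of which key management protocols to support has been the subject 
of lively debate in WebRTC <xref target="I-D.ietf-rtcweb-overview" /> on 
several occasions. The main question is the following: should applications be 
restricted to DTLS-SRTP <xref target="RFC5763" /> <xref target="RFC5764" />, or 
could SDES <xref target="RFC4568" /> be allowed as an alternative option?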
</t>
<t>
In this document we identify and address the issues that have been 
raised. We explain the benefits of SDES and argue that allowing it as an 
alternative option has little impact on security.
</t>
</section>

<section title="Benefits of Supporting SDES">
<t>
Being able to communicate from WebRTC applications to existing 
SIP/RTP endpoints is a highly desirable use case 
<xref target="I-D.kaplan-rtcweb-sip-interworking-requirements" />. The SIP 
installed base is huge, comprising millions of devices and a large number of 
applications (e.g. conferencing and voicemail). Even more importantly, nearly 
all mobile phones and landlines are reachable through SIP/RTP gateways deployed 
in service provider networks. The same can be said for other signaling 
protocols, such as XMPP or H.323. As a side note, the recent work on the DTMF 
tone API in WebRTC indicates that many members consider legacy interworking to 
be important.
</t>

<section title="Reduced Complexity of WebRTC-SIP Gateway">

<t>
Communication between the browser and a SIP/RTP endpoint will most likely require 
some form of media-plane gateway (due to the need to terminate ICE). The development 
and testing costs for such gateways are typically very high, since they need to 
handle a large number of users and often contain special-purpose hardware. It is 
therefore worthwhile to try to reduce costs by lowering the complexity and removing 
functionality that is not strictly required. This would result in lower prices, 
which in turn would lead to a higher degree of interconnectivity between WebRTC 
and existing SIP deployments.
</t>

<t>Already today there are Session Border Controllers (SBCs) that perform SRTP termination on behalf of endpoints with SDES-based keying (some SBCs also support DTLS-SRTP, but this is uncommon). If the browser also supported SDES, the WebRTC gateway could simply forward all SRTP packets to the SBC and let it decide whether or not to terminate encryption (depending on the capabilities of the receiving endpoint).</t>
</section>
<section title="Reduced Processing (Less SRTP Terminations)">
<t>
A large share of existing SIP/RTP devices support SRTP, and most of those that 
do use SDES-based keying. This is confirmed by the report from the latest 
SIPit event <xref target="SIPit" />, which stated that:
<list style='symbols'>
<t>80 percent of the tested implementations supported SRTP</t>
<t>100 percent of the SRTP implementations supported SDES</t>
<t>0 percent of the SRTP implementations supported DTLS-SRTP</t>
</list>
Although these figures may not be entirely accurate, they at least provide an 
indication of the current situation. Assuming that SDES is supported by browsers, 
a major part of all calls (80 percent if the 
above figures are correct) would not need to be encrypted/decrypted by an 
intermediate gateway. This is a substantial reduction in processing cost for 
the gateway. Another benefit is that, for those endpoints that support SDES, the 
call will be protected end-to-end at no extra cost. Protecting the same call 
with DTLS-SRTP on the browser side would require the gateway to first decrypt 
and then re-encrypt the traffic.
</t>

<t>
Note that the important question is whether the gateway needs to terminate 
SRTP at all. Processing-wise, there is probably not much difference between 
terminating an SRTP + SDES call and terminating an SRTP + DTLS-SRTP call.
</t>
</section>
<section title="Reduced Call Setup Time">
<t>
One common argument against SDES is its inability to handle early media (i.e. 
media that arrives at the SDP offerer before the SDP answer arrives). 
However, since at least one full signaling roundtrip is required to conclude ICE,
this argument is not applicable in WebRTC. In fact, media starts to flow later 
with DTLS-SRTP than with SDES since additional time is required for the DTLS 
handshake to complete.
</t>
<t>
Note that the above applies only to the first offer/answer exchange. 
If an additional media stream is added later and multiplexing is used 
(i.e. a single 5-tuple carrying all flows, established with a single run of ICE), 
then the early media problem could arise when SDES is used (but never with 
DTLS-SRTP).
</t>
</section>
</section>
<section title="Security Considerations">
<t>
At this point most readers should agree that SDES is favourable from an 
interworking point of view. It is also clear that implementing SDES in WebRTC 
is a relatively straightforward task. What remains to be considered is its 
impact on security.</t>

<t>
We distinguish between the following two types of attackers:
    <list style='hanging' hangIndent='20'>
        <t hangText="Outside Attacker">An external party attempts to intercept a call (e.g. a host located on the same WLAN as the user)</t>
        <t hangText="Inside Attacker">The web application itself (or the signaling server, in case the web server and signaling server are separated) attempts to intercept a call</t>
    </list>
</t>


 
<section title="SDES vs DTLS-SRTP in Case of Outside Attacker">
<t>
By requiring that signaling is secured using TLS, an outside attacker that 
monitors network traffic will not be able to extract the SDES keys. Therefore,  
in this scenario both SDES and DTLS-SRTP provide a sufficient level of 
protection.
</t>
<t>
The two other types of attack that have been mentioned in this context are 
extraction of log data and code injection, each of which is considered below.
</t>


<section title="Extraction of Log Data">
<t>
In this scenario the attacker manages to decrypt a previously recorded call by 
attacking the signaling server and extracting the SDES keys from the server log.
</t>
<t>
First of all, if the attacker gets as far as reading the logging data then 
eavesdropping of past calls is probably not the only problem. The effort 
required to break into the server is also related to the amount of trust the 
user assigns to the web application: well trusted sites often have well protected 
servers.
</t>
<t>
Secondly, it can be questioned how common this type of extensive logging really 
is. For example, user login via HTML forms is extremely common, yet one seldom 
hears of passwords being extracted from server logs.
</t>
<t>
Finally, SDES will primarily be used when interworking with existing SIP systems 
deployed within enterprises or service providers. These organizations have been 
using SDES for a long time and know that it is critical to protect the plaintext keys.
</t>
</section>
<section title="Script Injection">
<t>
In this scenario the attacker manages to inject his own piece of JavaScript
into the WebRTC application. The next time a user downloads the application and 
places a call, the script will execute and start eavesdropping on the conversation. 
</t>
<t>
There are three major ways in which code can be injected into a web application:
<list style='symbols'>
<t>
The page itself or one of its included JavaScript files is downloaded over 
a non-HTTPS link and is modified en route
</t>
<t>
The web application intentionally includes JavaScript supplied by the attacker
(e.g. a third-party library or advertisement) 
</t>
<t>HTML form input or URL parameters are not properly sanitized (i.e. classical 
XSS vulnerability)</t>
</list>
</t>
<t>
Modification en route can be ignored by requiring HTTPS to be used for all 
content. Whether the two other injection techniques are feasible or not largely 
depends on the application.
</t>
<t>
If script injection occurs then there are other methods to intercept a call, 
such as establishing additional PeerConnection objects or using a recording 
interface and sending the data over a WebSocket. As long as these methods are 
available it does not matter much whether the application uses SDES or DTLS-SRTP.
</t>
<t>
In general, an attacker who manages to execute even a small piece of JavaScript 
has effectively gained full control of the application (additional 
code can be included and HTML elements removed or inserted). Since this situation 
is exactly the same as that of an inside attacker, script injection will not be 
discussed further. 
</t>
</section>
</section>
<section title="SDES vs DTLS-SRTP in Case of Inside Attacker">
<t>
First of all, it can be questioned if we really want to protect 
ourselves against an inside attacker. If consent is required every time the 
application wants to record or forward media then the user experience will suffer. 
One could also imagine future applications that want to use their own codecs or 
filters (for example a voice scrambler or face detection software), something which is 
difficult to achieve without access to the underlying bitstreams.
</t>
<t>
We ignore this problem for now and simply assume that the application cannot 
access the media from within the browser. In other words, we only consider  
protection of the media during transport.
</t>
<section title="Downgrade Attack">
<t>
The major argument against SDES is that it would make it trivial for the 
application to perform interception. Let us compare what would be 
required in both cases.
</t>

<t>
Interception of SDES call:
<list style='numbers'>
  <t>Copy and store the 'a=crypto:' lines in the offer/answer SDP</t>
  <t>Force the media to pass through the TURN server by deleting all 
     ICE candidates except the relayed one</t>
  <t>Store all SRTP packets that pass through the TURN server and decrypt them 
     later on (using the keys from step 1)</t>
</list>
</t>
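<t>
To illustrate how little is involved in steps 1 and 2 above, the following 
Python sketch operates on an illustrative SDP fragment. The sample addresses, 
the key value, and the function names crypto_lines and relay_only are 
assumptions made for this example; they are not part of any WebRTC API, and 
step 3 (recording at the TURN server) is not shown.
</t>
<figure><artwork><![CDATA[
```python
# Illustrative SDP fragment; addresses and key are made up.
SAMPLE_SDP = "\r\n".join([
    "v=0",
    "m=audio 49170 RTP/SAVP 0",
    "a=crypto:1 AES_CM_128_HMAC_SHA1_80 "
    "inline:WVNfX19zZW1jdGwgKCkgewkyMjA7fQp9CnVubGVz|2^20|1:32",
    "a=candidate:1 1 UDP 2130706431 192.0.2.10 54321 typ host",
    "a=candidate:2 1 UDP 16777215 198.51.100.5 3478 typ relay "
    "raddr 192.0.2.10 rport 54321",
]) + "\r\n"


def crypto_lines(sdp):
    """Step 1: collect the 'a=crypto:' attribute lines."""
    return [line for line in sdp.splitlines()
            if line.startswith("a=crypto:")]


def relay_only(sdp):
    """Step 2: drop every ICE candidate that is not relayed."""
    kept = []
    for line in sdp.splitlines():
        if line.startswith("a=candidate:"):
            fields = line.split()
            # The candidate type is the token after "typ".
            ctype = fields[fields.index("typ") + 1]
            if ctype != "relay":
                continue
        kept.append(line)
    return "\r\n".join(kept) + "\r\n"


keys = crypto_lines(SAMPLE_SDP)
munged = relay_only(SAMPLE_SDP)
```
]]></artwork></figure>
<t>
In this sketch, keys holds the single 'a=crypto:' line and munged is the same 
SDP with only the relayed candidate left. The sketch assumes well-formed 
candidate lines that contain a "typ" token.
</t>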

<t>
Interception of DTLS-SRTP call:
<list style='numbers'>
<t>Replace the 'a=fingerprint:' lines in the offer/answer SDP with the fingerprint of a public key generated by the application</t>
<t>Force the media to go through the TURN server by deleting all ICE candidates except the relayed one</t>
<t>Modify an existing TURN server implementation so that it decrypts and re-encrypts the DTLS traffic (using the public-private key pair from step 1)</t>
</list>
</t>
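<t>
Step 1 of the DTLS-SRTP case is a similarly small text substitution. The 
Python sketch below assumes a SHA-256 fingerprint computed over the encoded 
certificate and written as colon-separated hexadecimal pairs. The byte strings 
are stand-ins for real certificates, and the function names are hypothetical. 
The modified TURN server of step 3, which is the hard part, is not shown.
</t>
<figure><artwork><![CDATA[
```python
import hashlib


def fingerprint(cert_der):
    """Format a SHA-256 digest as an 'a=fingerprint:' value."""
    digest = hashlib.sha256(cert_der).hexdigest().upper()
    pairs = [digest[i:i + 2] for i in range(0, len(digest), 2)]
    return "sha-256 " + ":".join(pairs)


def replace_fingerprint(sdp, value):
    """Step 1: swap every fingerprint line for the given value."""
    out = []
    for line in sdp.splitlines():
        if line.startswith("a=fingerprint:"):
            line = "a=fingerprint:" + value
        out.append(line)
    return "\r\n".join(out) + "\r\n"


# Stand-in bytes; real input would be a DER-encoded certificate.
user_fp = fingerprint(b"hypothetical user certificate")
app_fp = fingerprint(b"hypothetical application certificate")
offer = "v=0\r\na=fingerprint:" + user_fp + "\r\n"
munged = replace_fingerprint(offer, app_fp)
```
]]></artwork></figure>
<t>
After the substitution, munged carries the application's fingerprint in place 
of the user's, which is what allows the relay to terminate the DTLS handshake 
on both sides.
</t>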

<t>
Putting the modified TURN server into place is the hardest part of intercepting a DTLS-SRTP call. Once this is done, however, the remaining steps are fairly straightforward. This shows that neither DTLS-SRTP nor SDES provides any significant protection against an inside attacker. 
</t>

<t>
There is one benefit of DTLS-SRTP that is not directly apparent from the above description. If both users read their respective fingerprint values over the voice channel then they can detect if the conversation is being intercepted. However, it is very unlikely that the average user would bother doing this.
</t>
</section>
<section title="Difficulties with Key Continuity">
<t>
The comparison in the previous section is somewhat simplified since it does not consider DTLS-SRTP key continuity. The way this mechanism works is that the browser notifies the user whenever it receives a certificate that has not previously been seen (i.e. one that is not present in the browser cache). Since the user will receive this notification on every call to someone new and whenever a contact changes browsers, it is very likely that the user will simply ignore it.
</t>
<t>
Reuse of public keys also has privacy implications, as it enables user tracking. A user who wants to remain anonymous towards a service provider would need to generate a fresh key for each interaction. Furthermore, to prevent colluding service providers (e.g. medical clinics and insurance agencies) from linking a user's activities, separate certificates are needed for different domains. However, storing domain names together with the certificates would allow the next browser user (e.g. a family member) to see which sites the previous user visited. All of this leads to more certificates being generated, which in turn results in even more "new key" notifications.
</t>
<t>
It is also important to understand that the cached certificates are not bound to any identity (the certificates are simple containers for the public key without any additional information). This means that if just one of the cached keys is compromised any user call can be intercepted without causing the "new key" notification to be displayed. Note that the risk of this happening is directly related to the size of the cache, which grows over time.  
</t>
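<t>
The behaviour described above can be modelled with a few lines of Python. 
This is a simplified sketch, not a description of any browser's 
implementation: the class name, the method name, and the fingerprint strings 
are all hypothetical. It only shows that a cache holding bare fingerprints, 
with no identity attached, stays silent for any key it has seen before.
</t>
<figure><artwork><![CDATA[
```python
class KeyContinuityCache:
    """Hypothetical model of a key continuity cache."""

    def __init__(self):
        # Only fingerprints are stored; no identity is attached.
        self._seen = set()

    def check(self, fingerprint):
        """Return True if a "new key" notification is shown."""
        is_new = fingerprint not in self._seen
        self._seen.add(fingerprint)
        return is_new


cache = KeyContinuityCache()
first_call = cache.check("fp-alice")    # first contact with Alice
second_call = cache.check("fp-alice")   # same key, seen before
cache.check("fp-bob")                   # first contact with Bob

# An attacker holding Bob's compromised key intercepts a later
# call to Alice. The key is already cached, so nothing is shown.
intercepted = cache.check("fp-bob")
```
]]></artwork></figure>
<t>
Here first_call is True (a notification is shown), while second_call and 
intercepted are both False. The last case is the problem: the cache cannot 
tell that the key presented does not belong to the party being called.
</t>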
</section>
<section title="3rd Party Identity Assertion">
<t>
<xref target="I-D.ietf-rtcweb-security" /> suggests a way to strengthen the security of DTLS-SRTP by validating the received fingerprint via an identity provider. At the time of writing the proposal is still a bit vague (for example, it is not clear how the identity provider is selected in practice) but it definitely seems promising. Such a mechanism (including the necessary browser chrome) would make it significantly harder for the application to act as man-in-the-middle.
</t>
<t>
The question is whether the identity mechanism is optional or not, i.e. whether it will be possible for an application to use "plain" DTLS-SRTP. The answer is most likely "yes", for the following reasons:
<list style='symbols'>
<t>Many applications are already trusted by the user</t>
<t>Some applications do not want to depend on third parties</t>
<t>Some users do not have any identity provider account</t>
<t>Users may not always want to reveal their identity</t>
<t>Working out all the details of the identity mechanism will take time (and if it is not mandatory from the start there are backward compatibility issues)</t>
</list>
Note that allowing an application to be its own identity provider is effectively the same as allowing plain DTLS-SRTP (the user trusts the application), only more complicated.
</t>
</section>
</section>
</section>
<section title="Discussion and Conclusion">
<t>
We are not looking to replace DTLS-SRTP with SDES. The developer of a 20-line 
WebRTC application will continue to use the default option, which is DTLS-SRTP, 
while others who are interested in interworking will select SDES. The latter 
group will be required to use HTTPS for all content and can be informed of the 
necessary precautions (secure storage of log files, or otherwise no extensive 
logging).
</t>
<t>
The main issue that appears to concern members is the application's 
ability to downgrade security. But as we have seen, it is not significantly 
harder for the application to attack DTLS-SRTP. The main advantage of DTLS-SRTP 
is the possibility to detect when a call is being intercepted. However, doing so 
requires an effort from the user and a certain degree of technical skill. 
It can also be questioned to what extent the application should be restricted from 
accessing media, since this limits usability and innovation.
</t>
<t>
Finally, it has been suggested that additional identity mechanisms could prevent 
the application from listening in on calls. While this is certainly true, any such 
mechanism would most likely be made optional. If that is the case or if an  
application can be its own identity provider, then we are back at the situation 
where the user has to decide which sites to trust.
</t>
</section>
</middle>

<back>
	<references title='Informative References'>
            &RFC5763;
            &RFC5764;
            &RFC4568;
            &I-D.ietf-rtcweb-overview;
            &I-D.ietf-rtcweb-use-cases-and-requirements;
            &I-D.ietf-rtcweb-security;

<reference anchor="I-D.kaplan-rtcweb-sip-interworking-requirements">
<front>
<title>Requirements for Interworking WebRTC with Current SIP Deployments</title>

<author initials="H.K" surname="Kaplan" fullname="Hadriel Kaplan">
    <organization>Acme Packet</organization>
</author>

<date month="November" day="22" year="2011"/>

</front>

<seriesInfo name="Internet-Draft" value="draft-kaplan-rtcweb-sip-interworking-requirements-02"/>
<format type="TXT" target="http://www.ietf.org/internet-drafts/draft-kaplan-rtcweb-sip-interworking-requirements-02.txt"/>
</reference>

<reference anchor="SIPit" target="https://www.sipit.net/SIPit27_Summary">
  <front>
    <title>SIPit27 Summary</title>
    <author/>
    <date/>
  </front>
</reference>

        </references>
</back>

</rfc>

