idnits 2.17.1 draft-ietf-rtcweb-security-02.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year == The document seems to contain a disclaimer for pre-RFC5378 work, but was first submitted on or after 10 November 2008. The disclaimer is usually necessary only for documents that revise or obsolete older RFCs, and that take significant amounts of text from those RFCs. If you can contact all authors of the source material and they are willing to grant the BCP78 rights to the IETF Trust, you can and should remove the disclaimer. Otherwise, the disclaimer is needed and you can ignore this comment. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (March 12, 2012) is 4399 days in the past. Is this intentional? 
Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Unused Reference: 'I-D.kaufman-rtcweb-security-ui' is defined on line 839, but no explicit reference was found in the text == Outdated reference: A later version (-20) exists of draft-ietf-rtcweb-security-arch-00 == Outdated reference: A later version (-01) exists of draft-rescorla-rtcweb-generic-idp-00 -- Obsolete informational reference (is this intentional?): RFC 2818 (Obsoleted by RFC 9110) -- Obsolete informational reference (is this intentional?): RFC 4347 (Obsoleted by RFC 6347) -- Obsolete informational reference (is this intentional?): RFC 5245 (Obsoleted by RFC 8445, RFC 8839) Summary: 1 error (**), 0 flaws (~~), 5 warnings (==), 4 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 RTC-Web E. Rescorla 3 Internet-Draft RTFM, Inc. 4 Intended status: Standards Track March 12, 2012 5 Expires: September 13, 2012 7 Security Considerations for RTC-Web 8 draft-ietf-rtcweb-security-02 10 Abstract 12 The Real-Time Communications on the Web (RTC-Web) working group is 13 tasked with standardizing protocols for real-time communications 14 between Web browsers. The major use cases for RTC-Web technology are 15 real-time audio and/or video calls, Web conferencing, and direct data 16 transfer. Unlike most conventional real-time systems (e.g., SIP- 17 based soft phones) RTC-Web communications are directly controlled by 18 some Web server, which poses new security challenges. For instance, 19 a Web browser might expose a JavaScript API which allows a server to 20 place a video call. 
Unrestricted access to such an API would allow 21 any site which a user visited to "bug" a user's computer, capturing 22 any activity which passed in front of their camera. This document 23 defines the RTC-Web threat model and defines an architecture which 24 provides security within that threat model. 26 Legal 28 THIS DOCUMENT AND THE INFORMATION CONTAINED THEREIN ARE PROVIDED ON 29 AN "AS IS" BASIS AND THE CONTRIBUTOR, THE ORGANIZATION HE/SHE 30 REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE 31 IETF TRUST, AND THE INTERNET ENGINEERING TASK FORCE, DISCLAIM ALL 32 WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY 33 WARRANTY THAT THE USE OF THE INFORMATION THEREIN WILL NOT INFRINGE 34 ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS 35 FOR A PARTICULAR PURPOSE. 37 Status of this Memo 39 This Internet-Draft is submitted in full conformance with the 40 provisions of BCP 78 and BCP 79. 42 Internet-Drafts are working documents of the Internet Engineering 43 Task Force (IETF). Note that other groups may also distribute 44 working documents as Internet-Drafts. The list of current Internet- 45 Drafts is at http://datatracker.ietf.org/drafts/current/. 47 Internet-Drafts are draft documents valid for a maximum of six months 48 and may be updated, replaced, or obsoleted by other documents at any 49 time. It is inappropriate to use Internet-Drafts as reference 50 material or to cite them other than as "work in progress." 52 This Internet-Draft will expire on September 13, 2012. 54 Copyright Notice 56 Copyright (c) 2012 IETF Trust and the persons identified as the 57 document authors. All rights reserved. 59 This document is subject to BCP 78 and the IETF Trust's Legal 60 Provisions Relating to IETF Documents 61 (http://trustee.ietf.org/license-info) in effect on the date of 62 publication of this document. 
Please review these documents 63 carefully, as they describe your rights and restrictions with respect 64 to this document. Code Components extracted from this document must 65 include Simplified BSD License text as described in Section 4.e of 66 the Trust Legal Provisions and are provided without warranty as 67 described in the Simplified BSD License. 69 This document may contain material from IETF Documents or IETF 70 Contributions published or made publicly available before November 71 10, 2008. The person(s) controlling the copyright in some of this 72 material may not have granted the IETF Trust the right to allow 73 modifications of such material outside the IETF Standards Process. 74 Without obtaining an adequate license from the person(s) controlling 75 the copyright in such materials, this document may not be modified 76 outside the IETF Standards Process, and derivative works of it may 77 not be created outside the IETF Standards Process, except to format 78 it for publication as an RFC or to translate it into languages other 79 than English. 81 Table of Contents 83 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 84 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 5 85 3. The Browser Threat Model . . . . . . . . . . . . . . . . . . . 5 86 3.1. Access to Local Resources . . . . . . . . . . . . . . . . 6 87 3.2. Same Origin Policy . . . . . . . . . . . . . . . . . . . . 6 88 3.3. Bypassing SOP: CORS, WebSockets, and consent to 89 communicate . . . . . . . . . . . . . . . . . . . . . . . 7 90 4. Security for RTC-Web Applications . . . . . . . . . . . . . . 7 91 4.1. Access to Local Devices . . . . . . . . . . . . . . . . . 8 92 4.1.1. Calling Scenarios and User Expectations . . . . . . . 8 93 4.1.1.1. Dedicated Calling Services . . . . . . . . . . . . 8 94 4.1.1.2. Calling the Site You're On . . . . . . . . . . . . 9 95 4.1.1.3. Calling to an Ad Target . . . . . . . . . . . . . 9 96 4.1.2. Origin-Based Security . . . . . . 
. . . . . . . . . . 10 97 4.1.3. Security Properties of the Calling Page . . . . . . . 11 98 4.2. Communications Consent Verification . . . . . . . . . . . 12 99 4.2.1. ICE . . . . . . . . . . . . . . . . . . . . . . . . . 12 100 4.2.2. Masking . . . . . . . . . . . . . . . . . . . . . . . 13 101 4.2.3. Backward Compatibility . . . . . . . . . . . . . . . . 13 102 4.2.4. IP Location Privacy . . . . . . . . . . . . . . . . . 14 103 4.3. Communications Security . . . . . . . . . . . . . . . . . 14 104 4.3.1. Protecting Against Retrospective Compromise . . . . . 15 105 4.3.2. Protecting Against During-Call Attack . . . . . . . . 16 106 4.3.2.1. Key Continuity . . . . . . . . . . . . . . . . . . 16 107 4.3.2.2. Short Authentication Strings . . . . . . . . . . . 17 108 4.3.2.3. Third Party Identity . . . . . . . . . . . . . . . 18 109 5. Security Considerations . . . . . . . . . . . . . . . . . . . 18 110 6. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 18 111 7. References . . . . . . . . . . . . . . . . . . . . . . . . . . 19 112 7.1. Normative References . . . . . . . . . . . . . . . . . . . 19 113 7.2. Informative References . . . . . . . . . . . . . . . . . . 19 114 Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 21 116 1. Introduction 118 The Real-Time Communications on the Web (RTC-Web) working group is 119 tasked with standardizing protocols for real-time communications 120 between Web browsers. The major use cases for RTC-Web technology are 121 real-time audio and/or video calls, Web conferencing, and direct data 122 transfer. Unlike most conventional real-time systems, (e.g., SIP- 123 based[RFC3261] soft phones) RTC-Web communications are directly 124 controlled by some Web server. A simple case is shown below. 
126 +----------------+ 127 | | 128 | Web Server | 129 | | 130 +----------------+ 131 ^ ^ 132 / \ 133 HTTP / \ HTTP 134 / \ 135 / \ 136 v v 137 JS API JS API 138 +-----------+ +-----------+ 139 | | Media | | 140 | Browser |<---------->| Browser | 141 | | | | 142 +-----------+ +-----------+ 144 Figure 1: A simple RTC-Web system 146 In the system shown in Figure 1, Alice and Bob both have RTC-Web 147 enabled browsers and they visit some Web server which operates a 148 calling service. Each of their browsers exposes standardized 149 JavaScript calling APIs which are used by the Web server to set up a 150 call between Alice and Bob. While this system is topologically 151 similar to a conventional SIP-based system (with the Web server 152 acting as the signaling service and browsers acting as softphones), 153 control has moved to the central Web server; the browser simply 154 provides API points that are used by the calling service. As with 155 any Web application, the Web server can move logic between the server 156 and JavaScript in the browser, but regardless of where the code is 157 executing, it is ultimately under control of the server. 159 It should be immediately apparent that this type of system poses new 160 security challenges beyond those of a conventional VoIP system. In 161 particular, it needs to contend with malicious calling services. For 162 example, if the calling service can cause the browser to make a call 163 at any time to any callee of its choice, then this facility can be 164 used to bug a user's computer without their knowledge, simply by 165 placing a call to some recording service. More subtly, if the 166 exposed APIs allow the server to instruct the browser to send 167 arbitrary content, then they can be used to bypass firewalls or mount 168 denial of service attacks. Any successful system will need to be 169 resistant to this and other attacks. 
171 A companion document [I-D.ietf-rtcweb-security-arch] describes a 172 security architecture intended to address the issues raised in this 173 document. 175 2. Terminology 177 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 178 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 179 document are to be interpreted as described in RFC 2119 [RFC2119]. 181 3. The Browser Threat Model 183 The security requirements for RTC-Web follow directly from the 184 requirement that the browser's job is to protect the user. Huang et 185 al. [huang-w2sp] summarize the core browser security guarantee as: 187 Users can safely visit arbitrary web sites and execute scripts 188 provided by those sites. 190 It is important to realize that this includes sites hosting arbitrary 191 malicious scripts. The motivation for this requirement is simple: 192 it is trivial for attackers to divert users to sites of their choice. 193 For instance, an attacker can purchase display advertisements which 194 direct the user (either automatically or via user clicking) to their 195 site, at which point the browser will execute the attacker's scripts. 196 Thus, it is important that it be safe to view arbitrarily malicious 197 pages. Of course, browsers inevitably have bugs which cause them to 198 fall short of this goal, but any new RTC-Web functionality must be 199 designed with the intent to meet this standard. The remainder of 200 this section provides more background on the existing Web security 201 model. 203 In this model, then, the browser acts as a TRUSTED COMPUTING BASE 204 (TCB) both from the user's perspective and to some extent from the 205 server's. While HTML and JS provided by the server can cause the 206 browser to execute a variety of actions, those scripts operate in a 207 sandbox that isolates them both from the user's computer and from 208 each other, as detailed below. 
210 Conventionally, we refer to both WEB ATTACKERS, who are able to 211 induce you to visit their sites but do not control the network, and 212 NETWORK ATTACKERS, who are able to control your network. Network 213 attackers correspond to the [RFC3552] "Internet Threat Model". In 214 general, it is desirable to build a system which is secure against 215 both kinds of attackers, but realistically many sites do not run 216 HTTPS [RFC2818] and so our ability to defend against network 217 attackers is necessarily somewhat limited. Most of the rest of this 218 section is devoted to web attackers, with the assumption that 219 protection against network attackers is provided by running HTTPS. 221 3.1. Access to Local Resources 223 While the browser has access to local resources such as keying 224 material, files, the camera and the microphone, it strictly limits or 225 forbids web servers from accessing those same resources. For 226 instance, while it is possible to produce an HTML form which will 227 allow file upload, a script cannot do so without user consent and in 228 fact cannot even suggest a specific file (e.g., /etc/passwd); the 229 user must explicitly select the file and consent to its upload. 230 [Note: in many cases browsers are explicitly designed to avoid 231 dialogs with the semantics of "click here to screw yourself", as 232 extensive research shows that users are prone to consent under such 233 circumstances.] 235 Similarly, while Flash SWFs can access the camera and microphone, 236 they explicitly require that the user consent to that access. In 237 addition, some resources simply cannot be accessed from the browser 238 at all. For instance, there is no real way to run specific 239 executables directly from a script (though the user can of course be 240 induced to download executable files and run them). 242 3.2. Same Origin Policy 244 Many other resources are accessible but isolated.
For instance, 245 while scripts are allowed to make HTTP requests via the 246 XMLHttpRequest() API, those requests are not allowed to be made to any 247 server, but rather solely to the same ORIGIN from whence the script 248 came [RFC6454] (although CORS [CORS] and WebSockets [RFC6455] 249 provide an escape hatch from this restriction, as described below). 250 This SAME ORIGIN POLICY (SOP) prevents server A from mounting attacks 251 on server B via the user's browser, which protects both the user 252 (e.g., from misuse of his credentials) and the server (e.g., from DoS 253 attack). 255 More generally, SOP forces scripts from each site to run in their 256 own, isolated, sandboxes. While there are techniques to allow them 257 to interact, those interactions generally must be mutually consensual 258 (by each site) and are limited to certain channels. For instance, 259 multiple pages/browser panes from the same origin can read each 260 other's JS variables, but pages from different origins--or even 261 iframes from different origins on the same page--cannot. 263 3.3. Bypassing SOP: CORS, WebSockets, and consent to communicate 265 While SOP serves an important security function, it also makes it 266 inconvenient to write certain classes of applications. In 267 particular, mash-ups, in which a script from origin A uses resources 268 from origin B, can only be achieved via a certain amount of hackery. 269 The W3C Cross-Origin Resource Sharing (CORS) spec [CORS] is a 270 response to this demand. In CORS, when a script from origin A 271 executes what would otherwise be a forbidden cross-origin request, 272 the browser instead contacts the target server to determine whether 273 it is willing to allow cross-origin requests from A. If it is so 274 willing, the browser then allows the request. This consent 275 verification process is designed to safely allow cross-origin 276 requests.
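The consent-verification flow just described can be made concrete with a short sketch. This is an illustrative model of the CORS decision, not real browser or server code; the allow-list contents and all function names are assumptions made for the example.

```python
# Illustrative sketch of the CORS consent check (names and allow-list
# are hypothetical, not a real API). Server B decides whether to grant
# consent to a requesting origin; the browser releases the response to
# the script only if consent was granted.

ALLOWED_ORIGINS = {"https://a.example"}  # origins server B is willing to serve


def cors_response_headers(request_origin: str) -> dict:
    """Server side: echo a trusted origin back in
    Access-Control-Allow-Origin; stay silent otherwise."""
    if request_origin in ALLOWED_ORIGINS:
        return {"Access-Control-Allow-Origin": request_origin}
    return {}


def browser_allows(request_origin: str, response_headers: dict) -> bool:
    """Browser side: expose the response to the requesting script only
    if the server consented to this origin (or to any origin via "*")."""
    allowed = response_headers.get("Access-Control-Allow-Origin")
    return allowed == "*" or allowed == request_origin
```

Note that in the no-consent case a simple request may still reach the server; what the browser withholds is the script's ability to read the response (non-simple requests are additionally gated by a preflight before being sent at all).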
278 While CORS is designed to allow cross-origin HTTP requests, 279 WebSockets [RFC6455] allows cross-origin establishment of transparent 280 channels. Once a WebSockets connection has been established from a 281 script to a site, the script can exchange any traffic it likes 282 without being required to frame it as a series of HTTP request/ 283 response transactions. As with CORS, a WebSockets transaction starts 284 with a consent verification stage to avoid allowing scripts to simply 285 send arbitrary data to another origin. 287 While consent verification is conceptually simple--just do a 288 handshake before you start exchanging the real data--experience has 289 shown that designing a correct consent verification system is 290 difficult. In particular, Huang et al. [huang-w2sp] have shown 291 vulnerabilities in the existing Java and Flash consent verification 292 techniques and in a simplified version of the WebSockets handshake. 293 In particular, it is important to be wary of CROSS-PROTOCOL attacks 294 in which the attacking script generates traffic which is acceptable 295 to some non-Web protocol state machine. In order to resist this form 296 of attack, WebSockets incorporates a masking technique intended to 297 randomize the bits on the wire, thus making it more difficult to 298 generate traffic which resembles a given protocol. 300 4. Security for RTC-Web Applications 301 4.1. Access to Local Devices 303 As discussed in Section 1, allowing arbitrary sites to initiate calls 304 violates the core Web security guarantee; without some access 305 restrictions on local devices, any malicious site could simply bug a 306 user. At minimum, then, it MUST NOT be possible for arbitrary sites 307 to initiate calls to arbitrary locations without user consent. This 308 immediately raises the question, however, of what should be the scope 309 of user consent. 
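As discussed in Section 4.1.2, the natural scope for this consent is the Web origin. The sketch below shows what origin-scoped consent amounts to; the ConsentStore class and its method names are hypothetical, invented for illustration, with origins reduced to (scheme, host, port) tuples in the style of RFC 6454.

```python
# Hypothetical sketch of origin-scoped device-access consent.
# The class and method names are invented for illustration; the origin
# computation follows the (scheme, host, port) model of RFC 6454.
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin_of(url: str) -> tuple:
    """Reduce a URL to its (scheme, host, port) origin tuple."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


class ConsentStore:
    """Tracks which origins the user has granted camera/microphone access."""

    def __init__(self) -> None:
        self._granted = set()

    def grant(self, url: str) -> None:
        """Record long-term consent for the origin serving this URL."""
        self._granted.add(origin_of(url))

    def may_access_devices(self, url: str) -> bool:
        # Consent attaches to the origin, not the page: every page from a
        # granted origin inherits it, while a change of scheme, host, or
        # port does not.
        return origin_of(url) in self._granted
```

Note how coarse this is: once an origin is granted, every page from that origin (including one loaded in an attacker-induced iframe) inherits the permission, which is exactly the over-breadth that Section 4.1.2 examines.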
311 For the rest of this discussion we assume that the user is somehow 312 going to grant consent to some entity (e.g., a social networking 313 site) to initiate a call on his behalf. This consent may be limited 314 to a single call or may be a general consent. In order for the user 315 to make an intelligent decision about whether to allow a call (and 316 hence his camera and microphone input to be routed somewhere), he 317 must understand either who is requesting access, where the media is 318 going, or both. So, for instance, one might imagine that at the time 319 access to camera and microphone is requested, the user is shown a 320 dialog that says "site X has requested access to camera and 321 microphone, yes or no" (though note that this type of in-flow 322 interface violates one of the guidelines in Section 3). The user's 323 decision will of course be based on his opinion of Site X. However, 324 as discussed below, this is a complicated concept. 326 4.1.1. Calling Scenarios and User Expectations 328 While a large number of calling scenarios are possible, the 329 scenarios discussed in this section illustrate many of the 330 difficulties of identifying the relevant scope of consent. 332 4.1.1.1. Dedicated Calling Services 334 The first scenario we consider is a dedicated calling service. In 335 this case, the user has a relationship with a calling site and 336 repeatedly makes calls on it. It is likely that, rather than having 337 to give permission for each call, the user will want to give the 338 calling service long-term access to the camera and microphone. This 339 is a natural fit for a long-term consent mechanism (e.g., installing 340 an app store "application" to indicate permission for the calling 341 service.) A variant of the dedicated calling service is a gaming 342 site (e.g., a poker site) which hosts a dedicated calling service to 343 allow players to call each other.
345 With any kind of service where the user may use the same service to 346 talk to many different people, there is a question about whether the 347 user can know who they are talking to. In general, this is difficult 348 as most of the user interface is presented by the calling site. 350 However, communications security mechanisms can be used to give some 351 assurance, as described in Section 4.3.2. 353 4.1.1.2. Calling the Site You're On 355 Another simple scenario is calling the site you're actually visiting. 356 The paradigmatic case here is the "click here to talk to a 357 representative" windows that appear on many shopping sites. In this 358 case, the user's expectation is that they are calling the site 359 they're actually visiting. However, it is unlikely that they want to 360 provide a general consent to such a site; just because I want some 361 information on a car doesn't mean that I want the car manufacturer to 362 be able to activate my microphone whenever they please. Thus, this 363 suggests the need for a second consent mechanism where I only grant 364 consent for the duration of a given call. As described in 365 Section 3.1, great care must be taken in the design of this interface 366 to avoid the users just clicking through. Note also that the user 367 interface chrome must clearly display elements showing that the call 368 is continuing in order to avoid attacks where the calling site just 369 leaves it up indefinitely but shows a Web UI that implies otherwise. 371 4.1.1.3. Calling to an Ad Target 373 In both of the previous cases, the user has a direct relationship 374 (though perhaps a transient one) with the target of the call. 375 Moreover, in both cases he is actually visiting the site of the 376 person he is being asked to trust. However, this is not always so. 377 Consider the case where a user is visiting a content site which 378 hosts an advertisement with an invitation to call for more 379 information.
When the user clicks the ad, they are connected with 380 the advertiser or their agent. 382 The relationships here are far more complicated: the site the user 383 is actually visiting has no direct relationship with the advertiser; 384 they are just hosting ads from an ad network. The user has no 385 relationship with the ad network, but desires one with the 386 advertiser, at least for long enough to learn about their products. 387 At minimum, then, whatever consent dialog is shown needs to allow the 388 user to have some idea of the organization that they are actually 389 calling. 391 However, because the user also has some relationship with the hosting 392 site, it is also arguable that the hosting site should be allowed to 393 express an opinion (e.g., to be able to allow or forbid a call) since 394 a bad experience with an advertiser reflects negatively on the hosting 395 site [this idea was suggested by Adam Barth]. However, this 396 obviously presents a privacy challenge, as sites which host 397 advertisements often learn very little about whether individual users 398 clicked through to the ads, or even which ads were presented. 400 4.1.2. Origin-Based Security 402 As discussed in Section 3.2, the basic unit of Web sandboxing is the 403 origin, and so it is natural to scope consent to origin. 404 Specifically, a script from origin A MUST only be allowed to initiate 405 communications (and hence to access camera and microphone) if the 406 user has specifically authorized access for that origin. It is of 407 course technically possible to have coarser-scoped permissions, but 408 because the Web model is scoped to origin, this creates a difficult 409 mismatch. 411 Arguably, origin is not fine-grained enough. Consider the situation 412 where Alice visits a site and authorizes it to make a single call.
413 If consent is expressed solely in terms of origin, then at any future 414 visit to that site (including one induced via mash-up or ad network), 415 the site can bug Alice's computer, use the computer to place bogus 416 calls, etc. While in principle Alice could grant and then revoke the 417 privilege, in practice privileges accumulate; if we are concerned 418 about this attack, something else is needed. There are a number of 419 potential countermeasures to this sort of issue. 421 Individual Consent 422 Ask the user for permission for each call. 424 Callee-oriented Consent 425 Only allow calls to a given user. 427 Cryptographic Consent 428 Only allow calls to a given set of peer keying material or to a 429 cryptographically established identity. 431 Unfortunately, none of these approaches is satisfactory for all 432 cases. As discussed above, individual consent puts the user's 433 approval in the UI flow for every call. Not only does this quickly 434 become annoying but it can train the user to simply click "OK", at 435 which point the consent becomes useless. Thus, while it may be 436 necessary to have individual consent in some cases, this is not a 437 suitable solution for (for instance) the calling service case. Where 438 necessary, in-flow user interfaces must be carefully designed to 439 avoid the risk of the user blindly clicking through. 441 The other two options are designed to restrict calls to a given 442 target. Callee-oriented consent provided by the calling site does not 443 work well because a malicious site can claim that the user is calling 444 any user of his choice. One fix for this is to tie calls to a 445 cryptographically established identity. While not suitable for all 446 cases, this approach may be useful for some.
If we consider the 447 advertising case described in Section 4.1.1.3, it's not particularly 448 convenient to require the advertiser to instantiate an iframe on the 449 hosting site just to get permission; a more convenient approach is to 450 cryptographically tie the advertiser's certificate to the 451 communication directly. We're still tying permissions to origin 452 here, but to the media origin (and/or destination) rather than to the 453 Web origin. [I-D.ietf-rtcweb-security-arch] and 454 [I-D.rescorla-rtcweb-generic-idp] describe mechanisms which 455 facilitate this sort of consent. 457 Another case where media-level cryptographic identity makes sense is 458 when a user really does not trust the calling site. For instance, I 459 might be worried that the calling service will attempt to bug my 460 computer, but I also want to be able to conveniently call my friends. 461 If consent is tied to particular communications endpoints, then my 462 risk is limited. Naturally, it is somewhat challenging to design UI 463 primitives which express this sort of policy. 465 4.1.3. Security Properties of the Calling Page 467 Origin-based security is intended to secure against web attackers. 468 However, we must also consider the case of network attackers. 469 Consider the case where I have granted permission to a calling 470 service by an origin that has the HTTP scheme, e.g., 471 http://calling-service.example.com. If I ever use my computer on an 472 unsecured network (e.g., a hotspot or if my own home wireless network 473 is insecure), and browse any HTTP site, then an attacker can bug my 474 computer. The attack proceeds like this: 476 1. I connect to http://anything.example.org/. Note that this site 477 is unaffiliated with the calling service. 478 2. The attacker modifies my HTTP connection to inject an IFRAME (or 479 a redirect) to http://calling-service.example.com 480 3.
The attacker forges the response apparently from 481 http://calling-service.example.com/ to inject JS to initiate a 482 call to himself. 484 Note that this attack does not depend on the media being insecure. 485 Because the call is to the attacker, it is also encrypted to him. 486 Moreover, it need not be executed immediately; the attacker can 487 "infect" the origin semi-permanently (e.g., with a web worker or a 488 popunder) and thus be able to bug me long after I have left the 489 infected network. This risk is created by allowing calls at all from 490 a page fetched over HTTP. 492 Even if calls are only possible from HTTPS sites, if the site embeds 493 active content (e.g., JavaScript) that is fetched over HTTP or from 494 an untrusted site, the attacker can still compromise the call, because 495 that JavaScript is executed in the security context of the page 496 [finer-grained]. Thus, it is also dangerous to allow RTC-Web 497 functionality from HTTPS origins that embed mixed content. Note: 498 this issue is not restricted to PAGES which contain mixed content. 499 If a page from a given origin ever loads mixed content then it is 500 possible for a network attacker to infect the browser's notion of 501 that origin semi-permanently. 502 4.2. Communications Consent Verification 504 As discussed in Section 3.3, allowing web applications unrestricted 505 network access via the browser introduces the risk of using the 506 browser as an attack platform against machines which would not 507 otherwise be accessible to the malicious site, for instance because 508 they are topologically restricted (e.g., behind a firewall or NAT). 509 In order to prevent this form of attack as well as cross-protocol 510 attacks it is important to require that the target of traffic 511 explicitly consent to receiving the traffic in question. Until that 512 consent has been verified for a given endpoint, traffic other than 513 the consent handshake MUST NOT be sent to that endpoint. 515 4.2.1.
ICE 517 Verifying receiver consent requires some sort of explicit handshake, 518 but conveniently we already need one in order to do NAT hole- 519 punching. ICE [RFC5245] includes a handshake designed to verify that 520 the receiving element wishes to receive traffic from the sender. It 521 is important to remember here that the site initiating ICE is 522 presumed malicious; in order for the handshake to be secure the 523 receiving element MUST demonstrate receipt/knowledge of some value 524 not available to the site (thus preventing the site from forging 525 responses). In order to achieve this objective with ICE, the STUN 526 transaction IDs must be generated by the browser and MUST NOT be made 527 available to the initiating script, even via a diagnostic interface. 528 Verifying receiver consent also requires verifying the receiver wants 529 to receive traffic from a particular sender, and at this time; for 530 example, a malicious site may simply attempt ICE to known servers that 531 are using ICE for other sessions. ICE provides this verification as 532 well, by using the STUN credentials as a form of per-session shared 533 secret. Those credentials are known to the Web application, but 534 would need to also be known and used by the STUN-receiving element to 535 be useful. 537 There also needs to be some mechanism for the browser to verify that 538 the target of the traffic continues to wish to receive it. 539 Obviously, an ICE-based mechanism is the natural candidate here, but 540 it has been observed that because ICE keepalives are indications, they 541 will not serve this purpose, so some other mechanism is needed. 543 4.2.2. Masking 545 Once consent is verified, there still is some concern about 546 misinterpretation attacks as described by Huang et al. [huang-w2sp]. 547 As long as communication is limited to UDP, then this risk is 548 probably limited, thus masking is not required for UDP.
I.e., once 549 communications consent has been verified, it is most likely safe to 550 allow the implementation to send arbitrary UDP traffic to the chosen 551 destination, provided that the STUN keepalives continue to succeed. 552 In particular, this is true for the data channel if DTLS is used 553 because DTLS (with the anti-chosen plaintext mechanisms required by 554 TLS 1.1) does not allow the attacker to generate predictable 555 ciphertext. However, with TCP the risk of transparent proxies 556 becomes much more severe. If TCP is to be used, then WebSockets- 557 style masking MUST be employed. [Note: current thinking in the 558 RTCWEB WG is not to support TCP and to support SCTP over DTLS, thus 559 removing the need for masking.] 561 4.2.3. Backward Compatibility 563 A requirement to use ICE limits compatibility with legacy non-ICE 564 clients. It seems unsafe to completely remove the requirement for 565 some check. All proposed checks have the common feature that the 566 browser sends some message to the candidate traffic recipient and 567 refuses to send other traffic until that message has been replied to. 568 The message/reply pair must be generated in such a way that an 569 attacker who controls the Web application cannot forge them, 570 generally by having the message contain some secret value that must 571 be incorporated in the reply (e.g., echoed, hashed into, etc.). Non-ICE 572 candidates for this role (in cases where the legacy endpoint has a 573 public address) include: 575 o STUN checks without using ICE (i.e., the non-RTC-web endpoint sets 576 up a STUN responder). 577 o Use of RTCP as an implicit reachability check. 579 In the RTCP approach, the RTC-Web endpoint is allowed to send a 580 limited number of RTP packets prior to receiving consent. This 581 allows a short window of attack.
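The unforgeable message/reply pattern that Sections 4.2.1 and 4.2.3 require can be illustrated with a short, non-normative sketch. This is not the STUN wire format; the function names, field layout, and labels are invented for the example. The browser holds a random transaction ID that is never exposed to script, and traffic is permitted only once a reply echoes that ID under a MAC keyed with the per-session (ICE) password.

```python
# Non-normative sketch of the consent handshake described in the text.
# This is NOT the STUN wire format; fields and names are invented.
# The browser generates the transaction ID and never exposes it to
# the Web application, so the application cannot forge a reply.
import hashlib
import hmac
import os

TID_LEN = 12  # 96-bit transaction ID, as in STUN


def make_consent_request(ice_pwd):
    """Browser side: build a request; keep the transaction ID secret."""
    transaction_id = os.urandom(TID_LEN)
    mac = hmac.new(ice_pwd, transaction_id, hashlib.sha1).digest()
    return transaction_id, transaction_id + mac


def answer_consent_request(ice_pwd, request):
    """Receiver side: verify the sender knows the session password,
    then echo the transaction ID under a response MAC."""
    transaction_id, mac = request[:TID_LEN], request[TID_LEN:]
    expected = hmac.new(ice_pwd, transaction_id, hashlib.sha1).digest()
    if not hmac.compare_digest(mac, expected):
        return None  # not part of this session: no consent
    resp_mac = hmac.new(ice_pwd, b"resp" + transaction_id,
                        hashlib.sha1).digest()
    return transaction_id + resp_mac


def consent_verified(ice_pwd, transaction_id, response):
    """Browser side: other traffic may flow only once this is True."""
    if response is None:
        return False
    expected = hmac.new(ice_pwd, b"resp" + transaction_id,
                        hashlib.sha1).digest()
    return (response[:TID_LEN] == transaction_id and
            hmac.compare_digest(response[TID_LEN:], expected))
```

Because the Web application sees neither the transaction ID nor, on the remote side, anything it did not already know, it cannot fabricate a response that passes consent_verified; this is the property the text demands of any consent check, whether ICE proper or a legacy substitute.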
In addition, some legacy endpoints 582 do not support RTCP, so this is a much more expensive solution for 583 such endpoints, for which it would likely be easier to implement ICE. 584 For these two reasons, an RTCP-based approach does not seem to 585 address the security issue satisfactorily. 587 In the STUN approach, the RTC-Web endpoint is able to verify that the 588 recipient is running some kind of STUN endpoint, but unless the STUN 589 responder is integrated with the ICE username/password establishment 590 system, the RTC-Web endpoint cannot verify that the recipient 591 consents to this particular call. This may be an issue if existing 592 STUN servers are operated at addresses that are not able to handle 593 bandwidth-based attacks. Thus, this approach does not seem 594 satisfactory either. 596 If the systems are tightly integrated (i.e., the STUN endpoint 597 responds with responses authenticated with ICE credentials) then this 598 issue does not exist. However, such a design is very close to an 599 ICE-Lite implementation (indeed, arguably is one). An intermediate 600 approach would be to have a STUN extension that indicated that one 601 was responding to RTC-Web checks but not computing integrity checks 602 based on the ICE credentials. This would allow the use of standalone 603 STUN servers without the risk of confusing them with legacy STUN 604 servers. If a non-ICE legacy solution is needed, then this is 605 probably the best choice. 607 Once initial consent is verified, we also need to verify continuing 608 consent, in order to avoid attacks where two people briefly share an 609 IP (e.g., behind a NAT in an Internet cafe) and the attacker arranges 610 for a large, unstoppable traffic flow to the network and then 611 leaves. The appropriate technologies here are fairly similar to 612 those for initial consent, though they are perhaps weaker since the 613 threat is less severe. 615 4.2.4.
IP Location Privacy 617 Note that as soon as the callee sends their ICE candidates, the 618 caller learns the callee's IP addresses. The callee's server 619 reflexive address reveals a lot of information about the callee's 620 location. In order to avoid tracking, implementations may wish to 621 suppress the start of ICE negotiation until the callee has answered. 622 In addition, either side may wish to hide their location entirely by 623 forcing all traffic through a TURN server. 625 4.3. Communications Security 627 Finally, we consider a problem familiar from the SIP world: 628 communications security. For obvious reasons, it MUST be possible 629 for the communicating parties to establish a channel which is secure 630 against both message recovery and message modification. (See 631 [RFC5479] for more details.) This service must be provided for both 632 data and voice/video. Ideally the same security mechanisms would be 633 used for both types of content. Technology for providing this 634 service (for instance, DTLS [RFC4347] and DTLS-SRTP [RFC5763]) is 635 well understood. However, we must examine how this technology fits into the 636 RTC-Web context, where the threat model is somewhat different. 638 In general, it is important to understand that unlike a conventional 639 SIP proxy, the calling service (i.e., the Web server) controls not 640 only the channel between the communicating endpoints but also the 641 application running on the user's browser. While in principle it is 642 possible for the browser to cut the calling service out of the loop 643 and directly present trusted information (and perhaps get consent), 644 practice in modern browsers is to avoid this whenever possible. "In- 645 flow" modal dialogs which require the user to consent to specific 646 actions are particularly disfavored, as human factors research 647 indicates that unless they are made extremely invasive, users simply 648 agree to them without actually consciously giving consent.
649 [abarth-rtcweb]. Thus, nearly all the UI will necessarily be 650 rendered by the browser but under control of the calling service. 651 This likely includes the peer's identity information, which, after 652 all, is only meaningful in the context of some calling service. 654 This limitation does not mean that preventing attack by the calling 655 service is completely hopeless. However, we need to distinguish 656 between two classes of attack: 658 Retrospective compromise of calling service. 659 The calling service is non-malicious during a call but 660 subsequently is compromised and wishes to attack an older call. 662 During-call attack by calling service. 663 The calling service is compromised during the call it wishes to 664 attack. 666 Providing security against the former type of attack is practical 667 using the techniques discussed in Section 4.3.1. However, it is 668 extremely difficult to prevent a trusted but malicious calling 669 service from actively attacking a user's calls, either by mounting a 670 MITM attack or by diverting them entirely. (Note that this attack 671 applies equally to a network attacker if communications to the 672 calling service are not secured.) We discuss some potential 673 approaches and why they are likely to be impractical in 674 Section 4.3.2. 676 4.3.1. Protecting Against Retrospective Compromise 678 In a retrospective attack, the calling service was uncompromised 679 during the call, but an attacker subsequently wants to recover 680 the content of the call. We assume that the attacker has access to 681 the protected media stream as well as having full control of the 682 calling service. 684 If the calling service has access to the traffic keying material (as 685 in SDES [RFC4568]), then retrospective attack is trivial. This form 686 of attack is particularly serious in the Web context because it is 687 standard practice in Web services to run extensive logging and 688 monitoring.
Thus, it is highly likely that if the traffic key is 689 part of any HTTP request it will be logged somewhere and thus subject 690 to subsequent compromise. It is this consideration that makes an 691 automatic, public key-based key exchange mechanism imperative for 692 RTC-Web (this is a good idea for any communications security system) 693 and this mechanism SHOULD provide perfect forward secrecy (PFS). The 694 signaling channel/calling service can be used to authenticate this 695 mechanism. 697 In addition, the system MUST NOT provide any APIs either to extract 698 long-term keying material or to directly access any stored traffic 699 keys. Otherwise, an attacker who subsequently compromised the 700 calling service might be able to use those APIs to recover the 701 traffic keys and thus compromise the traffic. 703 4.3.2. Protecting Against During-Call Attack 705 Protecting against attacks during a call is a more difficult 706 proposition. Even if the calling service cannot directly access 707 keying material (as recommended in the previous section), it can 708 simply mount a man-in-the-middle attack on the connection, telling 709 Alice that she is calling Bob and Bob that he is calling Alice, while 710 in fact the calling service is acting as a calling bridge and 711 capturing all the traffic. While in theory it is possible to 712 construct techniques which protect against this form of attack, in 713 practice these techniques all require far too much user intervention 714 to be practical, given the user interface constraints described in 715 [abarth-rtcweb]. 717 4.3.2.1. Key Continuity 719 One natural approach is to use "key continuity". While a malicious 720 calling service can present any identity it chooses to the user, it 721 cannot produce a private key that maps to a given public key. Thus, 722 it is possible for the browser to note a given user's public key and 723 generate an alarm whenever that user's key changes.
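As a rough, non-normative sketch (the class and identifiers are invented for illustration; no browser exposes such an API), the key-continuity check just described amounts to a fingerprint store consulted on every call:

```python
# Illustrative sketch of key continuity, in the spirit of SSH's
# known_hosts: remember each peer's key fingerprint and alarm when
# it changes.  All identifiers here are invented for the example.
import hashlib


class KeyContinuityStore:
    def __init__(self):
        self._known = {}  # peer identity (as asserted) -> fingerprint

    @staticmethod
    def fingerprint(public_key):
        return hashlib.sha256(public_key).hexdigest()

    def check(self, peer_identity, public_key):
        """Return 'new', 'match', or 'mismatch' for this peer's key."""
        fp = self.fingerprint(public_key)
        previous = self._known.get(peer_identity)
        if previous is None:
            self._known[peer_identity] = fp  # first contact: remember
            return "new"
        return "match" if previous == fp else "mismatch"
```

Note that the store is necessarily indexed by whatever peer_identity the calling service chooses to assert, and a legitimate peer with several browsers legitimately presents several different keys; both weaknesses are taken up in the surrounding text.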
SSH [RFC4251] 724 uses a similar technique. (Note that the need to avoid explicit user 725 consent on every call precludes the browser requiring an immediate 726 manual check of the peer's key). 728 Unfortunately, this sort of key continuity mechanism is far less 729 useful in the RTC-Web context. First, much of the virtue of RTC-Web 730 (and any Web application) is that it is not bound to a particular piece 731 of client software. Thus, it will be not only possible but routine 732 for a user to use multiple browsers on different computers, which will 733 of course have different keying material (SACRED [RFC3760] 734 notwithstanding.) Thus, users will frequently be alerted to key 735 mismatches which are in fact completely legitimate, with the result 736 that they are trained to simply click through them. As it is known 737 that users routinely will click through far more dire warnings 738 [cranor-wolf], it seems extremely unlikely that any key continuity 739 mechanism will be effective rather than simply annoying. 741 Moreover, it is trivial to bypass even this kind of mechanism. 742 Recall that unlike the case of SSH, the browser never directly gets 743 the peer's identity from the user. Rather, it is provided by the 744 calling service. Even enabling a mechanism of this type would 745 require an API to allow the calling service to tell the browser "this 746 is a call to user X". All the calling service needs to do to avoid 747 triggering a key continuity warning is to tell the browser that "this 748 is a call to user Y" where Y is close to X. Even if the user actually 749 checks the other side's name (which all available evidence indicates 750 is unlikely), this would require (a) the browser to have trusted UI to 751 provide the name and (b) the user to not be fooled by similar- 752 appearing names. 754 4.3.2.2. Short Authentication Strings 756 ZRTP [RFC6189] uses a "short authentication string" (SAS) which is 757 derived from the key agreement protocol.
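The general shape of such a derivation can be sketched as follows. This is emphatically not ZRTP's actual SAS computation (see [RFC6189] for that); the label, word list, and truncation are invented to show the idea: both endpoints reduce the key agreement output to a short string that the users compare verbally.

```python
# Non-normative sketch of a short authentication string (SAS).
# This is NOT the derivation specified in RFC 6189; the label, word
# list, and truncation are invented to illustrate the general idea.
import hashlib

# A readable word list; real systems use standardized encodings
# (ZRTP, for instance, defines base32 and word-list renderings).
WORDS = ["alpha", "bravo", "charlie", "delta", "echo", "foxtrot",
         "golf", "hotel", "india", "juliet", "kilo", "lima",
         "mike", "november", "oscar", "papa"]


def short_auth_string(key_agreement_output):
    """Both endpoints derive the same short string from the shared
    secret; a MITM, holding a different secret with each side, will
    (with high probability) produce mismatched strings."""
    digest = hashlib.sha256(b"sas" + key_agreement_output).digest()
    return "-".join(WORDS[b % 16] for b in digest[:4])
```

Four words from a 16-entry list carry only 16 bits, so this toy resists a MITM only probabilistically; the larger point, as the text goes on to argue, is that even a properly sized SAS fails for human-factors reasons.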
This SAS is designed to be 758 read over the voice channel and, if confirmed by both sides, precludes 759 MITM attack. The intention is that the SAS is used once and then key 760 continuity (though via a different mechanism from that discussed above) 761 is used thereafter. 763 Unfortunately, the SAS does not offer a practical solution to the 764 problem of a compromised calling service. "Voice conversion" 765 systems, which modify voice from one speaker to make it sound like 766 another, are an active area of research. These systems are already 767 good enough to fool both automatic recognition systems 768 [farus-conversion] and humans [kain-conversion] in many cases, and 769 are of course likely to improve in the future, especially in an 770 environment where the user just wants to get on with the phone call. 771 Thus, even if the SAS is effective today, it is likely not to be so for 772 much longer. Moreover, it is possible for an attacker who controls 773 the browser to allow the SAS to succeed and then simulate call 774 failure and reconnect, trusting that the user will not notice that 775 the "no SAS" indicator has been set (which seems likely). 777 Even if the SAS were secure when used, it seems exceedingly unlikely that 778 users would actually use it. As discussed above, the browser UI 779 constraints preclude requiring the SAS exchange prior to completing 780 the call and so it must be voluntary; at most the browser will 781 provide some UI indicator that the SAS has not yet been checked. 782 However, it is well-known that when faced with optional mechanisms 783 such as fingerprints, users simply do not check them [whitten-johnny]. 784 Thus, it is highly unlikely that users will ever perform the SAS 785 exchange. 787 Once users have checked the SAS once, key continuity is required to 788 avoid them needing to check it on every call. However, this is 789 problematic for reasons indicated in Section 4.3.2.1.
In principle 790 it is of course possible to render a different UI element to indicate 791 that calls are using an unauthenticated set of keying material 792 (recall that the attacker can just present a slightly different name 793 so that the attack shows the same UI as a call to a new device or to 794 someone you haven't called before), but as a practical matter, users 795 simply ignore such indicators even in the rather more dire case of 796 mixed content warnings. 798 4.3.2.3. Third Party Identity 800 The conventional approach to providing communications identity has of 801 course been to have some third party identity system (e.g., PKI) to 802 authenticate the endpoints. Such mechanisms have proven to be too 803 cumbersome for use by typical users (and nearly too cumbersome for 804 administrators). However, a new generation of Web-based identity 805 providers (BrowserID, Federated Google Login, Facebook Connect, 806 OAuth, OpenID, WebFinger) has recently been developed and uses Web 807 technologies to provide lightweight (from the user's perspective) 808 third-party authenticated transactions. It is possible (see 809 [I-D.rescorla-rtcweb-generic-idp]) to use systems of this type to 810 authenticate RTCWEB calls, linking them to existing user notions of 811 identity (e.g., Facebook adjacencies). Calls which are authenticated 812 in this fashion are naturally resistant even to active MITM attack by 813 the calling site. 815 5. Security Considerations 817 This entire document is about security. 819 6. Acknowledgements 821 Bernard Aboba, Harald Alvestrand, Cullen Jennings, Hadriel Kaplan (S 822 4.2.1), Matthew Kaufman, Magnus Westerlund. 824 7. References 825 7.1. Normative References 827 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 828 Requirement Levels", BCP 14, RFC 2119, March 1997. 830 7.2. Informative References 832 [CORS] van Kesteren, A., "Cross-Origin Resource Sharing".
834 [I-D.ietf-rtcweb-security-arch] 835 Rescorla, E., "RTCWEB Security Architecture", 836 draft-ietf-rtcweb-security-arch-00 (work in progress), 837 January 2012. 839 [I-D.kaufman-rtcweb-security-ui] 840 Kaufman, M., "Client Security User Interface Requirements 841 for RTCWEB", draft-kaufman-rtcweb-security-ui-00 (work in 842 progress), June 2011. 844 [I-D.rescorla-rtcweb-generic-idp] 845 Rescorla, E., "RTCWEB Generic Identity Provider 846 Interface", draft-rescorla-rtcweb-generic-idp-00 (work in 847 progress), January 2012. 849 [RFC2818] Rescorla, E., "HTTP Over TLS", RFC 2818, May 2000. 851 [RFC3261] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, 852 A., Peterson, J., Sparks, R., Handley, M., and E. 853 Schooler, "SIP: Session Initiation Protocol", RFC 3261, 854 June 2002. 856 [RFC3552] Rescorla, E. and B. Korver, "Guidelines for Writing RFC 857 Text on Security Considerations", BCP 72, RFC 3552, 858 July 2003. 860 [RFC3760] Gustafson, D., Just, M., and M. Nystrom, "Securely 861 Available Credentials (SACRED) - Credential Server 862 Framework", RFC 3760, April 2004. 864 [RFC4251] Ylonen, T. and C. Lonvick, "The Secure Shell (SSH) 865 Protocol Architecture", RFC 4251, January 2006. 867 [RFC4347] Rescorla, E. and N. Modadugu, "Datagram Transport Layer 868 Security", RFC 4347, April 2006. 870 [RFC4568] Andreasen, F., Baugher, M., and D. Wing, "Session 871 Description Protocol (SDP) Security Descriptions for Media 872 Streams", RFC 4568, July 2006. 874 [RFC5245] Rosenberg, J., "Interactive Connectivity Establishment 875 (ICE): A Protocol for Network Address Translator (NAT) 876 Traversal for Offer/Answer Protocols", RFC 5245, 877 April 2010. 879 [RFC5479] Wing, D., Fries, S., Tschofenig, H., and F. Audet, 880 "Requirements and Analysis of Media Security Management 881 Protocols", RFC 5479, April 2009. 883 [RFC5763] Fischl, J., Tschofenig, H., and E. 
Rescorla, "Framework 884 for Establishing a Secure Real-time Transport Protocol 885 (SRTP) Security Context Using Datagram Transport Layer 886 Security (DTLS)", RFC 5763, May 2010. 888 [RFC6189] Zimmermann, P., Johnston, A., and J. Callas, "ZRTP: Media 889 Path Key Agreement for Unicast Secure RTP", RFC 6189, 890 April 2011. 892 [RFC6454] Barth, A., "The Web Origin Concept", RFC 6454, 893 December 2011. 895 [RFC6455] Fette, I. and A. Melnikov, "The WebSocket Protocol", 896 RFC 6455, December 2011. 898 [abarth-rtcweb] 899 Barth, A., "Prompting the user is security failure", RTC- 900 Web Workshop. 902 [cranor-wolf] 903 Sunshine, J., Egelman, S., Almuhimedi, H., Atri, N., and 904 L. Cranor, "Crying Wolf: An Empirical Study of SSL Warning 905 Effectiveness", Proceedings of the 18th USENIX Security 906 Symposium, 2009. 908 [farus-conversion] 909 Farrus, M., Erro, D., and J. Hernando, "Speaker 910 Recognition Robustness to Voice Conversion". 912 [finer-grained] 913 Barth, A. and C. Jackson, "Beware of Finer-Grained 914 Origins", W2SP, 2008. 916 [huang-w2sp] 917 Huang, L-S., Chen, E., Barth, A., Rescorla, E., and C. 918 Jackson, "Talking to Yourself for Fun and Profit", W2SP, 919 2011. 921 [kain-conversion] 922 Kain, A. and M. Macon, "Design and Evaluation of a Voice 923 Conversion Algorithm based on Spectral Envelope Mapping 924 and Residual Prediction", Proceedings of ICASSP, May 925 2001. 927 [whitten-johnny] 928 Whitten, A. and J. Tygar, "Why Johnny Can't Encrypt: A 929 Usability Evaluation of PGP 5.0", Proceedings of the 8th 930 USENIX Security Symposium, 1999. 932 Author's Address 934 Eric Rescorla 935 RTFM, Inc. 936 2064 Edgewood Drive 937 Palo Alto, CA 94303 938 USA 940 Phone: +1 650 678 2350 941 Email: ekr@rtfm.com