RTC-Web                                                      E. Rescorla
Internet-Draft                                                RTFM, Inc.
Intended status: Standards Track                           June 05, 2012
Expires: December 7, 2012

                 Security Considerations for RTC-Web
                    draft-ietf-rtcweb-security-03

Abstract

   The Real-Time Communications on the Web (RTC-Web) working group is
   tasked with standardizing protocols for real-time communications
   between Web browsers.  The major use cases for RTC-Web technology
   are real-time audio and/or video calls, Web conferencing, and direct
   data transfer.  Unlike most conventional real-time systems (e.g.,
   SIP-based soft phones), RTC-Web communications are directly
   controlled by some Web server, which poses new security challenges.
   For instance, a Web browser might expose a JavaScript API which
   allows a server to place a video call.
   Unrestricted access to such an API would allow any site which a user
   visited to "bug" a user's computer, capturing any activity which
   passed in front of their camera.  This document defines the RTC-Web
   threat model and defines an architecture which provides security
   within that threat model.

Legal

   THIS DOCUMENT AND THE INFORMATION CONTAINED THEREIN ARE PROVIDED ON
   AN "AS IS" BASIS AND THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
   REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE
   IETF TRUST, AND THE INTERNET ENGINEERING TASK FORCE, DISCLAIM ALL
   WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY
   WARRANTY THAT THE USE OF THE INFORMATION THEREIN WILL NOT INFRINGE
   ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS
   FOR A PARTICULAR PURPOSE.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on December 7, 2012.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.
   Please review these documents carefully, as they describe your
   rights and restrictions with respect to this document.  Code
   Components extracted from this document must include Simplified BSD
   License text as described in Section 4.e of the Trust Legal
   Provisions and are provided without warranty as described in the
   Simplified BSD License.

   This document may contain material from IETF Documents or IETF
   Contributions published or made publicly available before November
   10, 2008.  The person(s) controlling the copyright in some of this
   material may not have granted the IETF Trust the right to allow
   modifications of such material outside the IETF Standards Process.
   Without obtaining an adequate license from the person(s) controlling
   the copyright in such materials, this document may not be modified
   outside the IETF Standards Process, and derivative works of it may
   not be created outside the IETF Standards Process, except to format
   it for publication as an RFC or to translate it into languages other
   than English.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  The Browser Threat Model
     3.1.  Access to Local Resources
     3.2.  Same Origin Policy
     3.3.  Bypassing SOP: CORS, WebSockets, and consent to communicate
   4.  Security for RTC-Web Applications
     4.1.  Access to Local Devices
       4.1.1.  Calling Scenarios and User Expectations
         4.1.1.1.  Dedicated Calling Services
         4.1.1.2.  Calling the Site You're On
         4.1.1.3.  Calling to an Ad Target
       4.1.2.  Origin-Based Security
       4.1.3.  Security Properties of the Calling Page
     4.2.  Communications Consent Verification
       4.2.1.  ICE
       4.2.2.  Masking
       4.2.3.  Backward Compatibility
       4.2.4.  IP Location Privacy
     4.3.  Communications Security
       4.3.1.  Protecting Against Retrospective Compromise
       4.3.2.  Protecting Against During-Call Attack
         4.3.2.1.  Key Continuity
         4.3.2.2.  Short Authentication Strings
         4.3.2.3.  Third Party Identity
         4.3.2.4.  Page Access to Media
   5.  Security Considerations
   6.  Acknowledgements
   7.  References
     7.1.  Normative References
     7.2.  Informative References
   Author's Address

1.  Introduction

   The Real-Time Communications on the Web (RTC-Web) working group is
   tasked with standardizing protocols for real-time communications
   between Web browsers.  The major use cases for RTC-Web technology
   are real-time audio and/or video calls, Web conferencing, and direct
   data transfer.  Unlike most conventional real-time systems (e.g.,
   SIP-based [RFC3261] soft phones), RTC-Web communications are
   directly controlled by some Web server.  A simple case is shown
   below.
                      +----------------+
                      |                |
                      |   Web Server   |
                      |                |
                      +----------------+
                          ^        ^
                         /          \
                   HTTP /            \ HTTP
                       /              \
                      /                \
                     v                  v
                  JS API              JS API
           +-----------+            +-----------+
           |           |   Media    |           |
           |  Browser  |<---------->|  Browser  |
           |           |            |           |
           +-----------+            +-----------+

                 Figure 1: A simple RTC-Web system

   In the system shown in Figure 1, Alice and Bob both have RTC-Web
   enabled browsers and they visit some Web server which operates a
   calling service.  Each of their browsers exposes standardized
   JavaScript calling APIs (implemented as browser built-ins) which are
   used by the Web server to set up a call between Alice and Bob.
   While this system is topologically similar to a conventional SIP-
   based system (with the Web server acting as the signaling service
   and browsers acting as softphones), control has moved to the central
   Web server; the browser simply provides API points that are used by
   the calling service.  As with any Web application, the Web server
   can move logic between the server and JavaScript in the browser, but
   regardless of where the code is executing, it is ultimately under
   control of the server.

   It should be immediately apparent that this type of system poses new
   security challenges beyond those of a conventional VoIP system.  In
   particular, it needs to contend with malicious calling services.
   For example, if the calling service can cause the browser to make a
   call at any time to any callee of its choice, then this facility can
   be used to bug a user's computer without their knowledge, simply by
   placing a call to some recording service.  More subtly, if the
   exposed APIs allow the server to instruct the browser to send
   arbitrary content, then they can be used to bypass firewalls or
   mount denial of service attacks.  Any successful system will need to
   be resistant to this and other attacks.
   A companion document [I-D.ietf-rtcweb-security-arch] describes a
   security architecture intended to address the issues raised in this
   document.

2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

3.  The Browser Threat Model

   The security requirements for RTC-Web follow directly from the
   requirement that the browser's job is to protect the user.  Huang et
   al. [huang-w2sp] summarize the core browser security guarantee as:

      Users can safely visit arbitrary web sites and execute scripts
      provided by those sites.

   It is important to realize that this includes sites hosting
   arbitrary malicious scripts.  The motivation for this requirement is
   simple: it is trivial for attackers to divert users to sites of
   their choice.  For instance, an attacker can purchase display
   advertisements which direct the user (either automatically or via
   user clicking) to their site, at which point the browser will
   execute the attacker's scripts.  Thus, it is important that it be
   safe to view arbitrarily malicious pages.  Of course, browsers
   inevitably have bugs which cause them to fall short of this goal,
   but any new RTC-Web functionality must be designed with the intent
   to meet this standard.  The remainder of this section provides more
   background on the existing Web security model.

   In this model, then, the browser acts as a TRUSTED COMPUTING BASE
   (TCB) both from the user's perspective and to some extent from the
   server's.  While HTML and JS provided by the server can cause the
   browser to execute a variety of actions, those scripts operate in a
   sandbox that isolates them both from the user's computer and from
   each other, as detailed below.
   Conventionally, we distinguish between WEB ATTACKERS, who are able
   to induce you to visit their sites but do not control the network,
   and NETWORK ATTACKERS, who are able to control your network.
   Network attackers correspond to the [RFC3552] "Internet Threat
   Model".  Note that for HTTP traffic, a network attacker is also a
   Web attacker, since it can inject traffic as if it were any non-
   HTTPS Web site.  Thus, when analyzing HTTP connections, we must
   assume that traffic is going to the attacker.

3.1.  Access to Local Resources

   While the browser has access to local resources such as keying
   material, files, the camera, and the microphone, it strictly limits
   or forbids web servers from accessing those same resources.  For
   instance, while it is possible to produce an HTML form which will
   allow file upload, a script cannot do so without user consent and in
   fact cannot even suggest a specific file (e.g., /etc/passwd); the
   user must explicitly select the file and consent to its upload.
   [Note: in many cases browsers are explicitly designed to avoid
   dialogs with the semantics of "click here to screw yourself", as
   extensive research shows that users are prone to consent under such
   circumstances.]

   Similarly, while Flash SWFs can access the camera and microphone,
   they explicitly require that the user consent to that access.  In
   addition, some resources simply cannot be accessed from the browser
   at all.  For instance, there is no real way to run specific
   executables directly from a script (though the user can of course be
   induced to download executable files and run them).

3.2.  Same Origin Policy

   Many other resources are accessible but isolated.
   For instance, while scripts are allowed to make HTTP requests via
   the XMLHttpRequest() API, those requests are not allowed to be made
   to any server, but rather solely to the same ORIGIN [RFC6454] whence
   the script came (although CORS [CORS] and WebSockets [RFC6455]
   provide an escape hatch from this restriction, as described below).
   This SAME ORIGIN POLICY (SOP) prevents server A from mounting
   attacks on server B via the user's browser, which protects both the
   user (e.g., from misuse of his credentials) and the server (e.g.,
   from DoS attack).

   More generally, SOP forces scripts from each site to run in their
   own, isolated, sandboxes.  While there are techniques to allow them
   to interact, those interactions generally must be mutually
   consensual (by each site) and are limited to certain channels.  For
   instance, multiple pages/browser panes from the same origin can read
   each other's JS variables, but pages from different origins--or even
   iframes from different origins on the same page--cannot.

3.3.  Bypassing SOP: CORS, WebSockets, and consent to communicate

   While SOP serves an important security function, it also makes it
   inconvenient to write certain classes of applications.  In
   particular, mash-ups, in which a script from origin A uses resources
   from origin B, can only be achieved via a certain amount of hackery.
   The W3C Cross-Origin Resource Sharing (CORS) spec [CORS] is a
   response to this demand.  In CORS, when a script from origin A
   executes what would otherwise be a forbidden cross-origin request,
   the browser instead contacts the target server to determine whether
   it is willing to allow cross-origin requests from A.  If it is so
   willing, the browser then allows the request.  This consent
   verification process is designed to safely allow cross-origin
   requests.
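   The origin comparison underlying SOP reduces a URL to its (scheme,
   host, port) triple, as defined in [RFC6454].  The sketch below is
   purely illustrative (browsers implement this natively; Python is
   used here only to make the tuple comparison concrete):

```python
from urllib.parse import urlsplit

# Default ports are filled in so that an explicit ":80" does not
# create a spurious origin mismatch.
DEFAULT_PORTS = {"http": 80, "https": 443, "ws": 80, "wss": 443}

def origin(url):
    """Reduce a URL to its (scheme, host, port) origin tuple."""
    parts = urlsplit(url)
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)

def same_origin(a, b):
    return origin(a) == origin(b)

# Same scheme and host, explicit default port: still the same origin.
assert same_origin("http://example.com/page", "http://example.com:80/other")
# A different scheme or a different (sub)domain is a different origin.
assert not same_origin("http://example.com/", "https://example.com/")
assert not same_origin("http://example.com/", "http://www.example.com/")
```

   Note that the path plays no role: two pages on the same origin are
   in the same sandbox regardless of where on the site they live.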
   While CORS is designed to allow cross-origin HTTP requests,
   WebSockets [RFC6455] allows cross-origin establishment of
   transparent channels.  Once a WebSockets connection has been
   established from a script to a site, the script can exchange any
   traffic it likes without being required to frame it as a series of
   HTTP request/response transactions.  As with CORS, a WebSockets
   transaction starts with a consent verification stage to avoid
   allowing scripts to simply send arbitrary data to another origin.

   While consent verification is conceptually simple--just do a
   handshake before you start exchanging the real data--experience has
   shown that designing a correct consent verification system is
   difficult.  In particular, Huang et al. [huang-w2sp] have shown
   vulnerabilities in the existing Java and Flash consent verification
   techniques and in a simplified version of the WebSockets handshake.
   It is especially important to be wary of CROSS-PROTOCOL attacks in
   which the attacking script generates traffic which is acceptable to
   some non-Web protocol state machine.  In order to resist this form
   of attack, WebSockets incorporates a masking technique intended to
   randomize the bits on the wire, thus making it more difficult to
   generate traffic which resembles a given protocol.

4.  Security for RTC-Web Applications

4.1.  Access to Local Devices

   As discussed in Section 1, allowing arbitrary sites to initiate
   calls violates the core Web security guarantee; without some access
   restrictions on local devices, any malicious site could simply bug a
   user.  At minimum, then, it MUST NOT be possible for arbitrary sites
   to initiate calls to arbitrary locations without user consent.  This
   immediately raises the question, however, of what should be the
   scope of user consent.
   In order for the user to make an intelligent decision about whether
   to allow a call (and hence his camera and microphone input to be
   routed somewhere), he must understand either who is requesting
   access, where the media is going, or both.  As detailed below, there
   are two basic conceptual models:

      You are sending your media to entity A because you want to talk
      to entity A (e.g., your mother).

      Entity A (e.g., a calling service) asks to access the user's
      devices with the assurance that it will transfer the media to
      entity B (e.g., your mother).

   In either case, identity is at the heart of any consent decision.
   Moreover, identity is all that the browser can meaningfully enforce;
   if you are calling A, A can simply forward the media to C.
   Similarly, if you authorize A to place a call to B, A can call C
   instead.  In either case, all the browser is able to do is verify
   and check authorization for whoever is controlling where the media
   goes.  The target of the media can of course advertise a security/
   privacy policy, but this is not something that the browser can
   enforce.  Even so, there are a variety of different consent
   scenarios that motivate different technical consent mechanisms.  We
   discuss these mechanisms in the sections below.

   It's important to understand that consent to access local devices is
   largely orthogonal to consent to transmit various kinds of data over
   the network (see Section 4.2).  Consent for device access is largely
   a matter of protecting the user's privacy from malicious sites.  By
   contrast, consent to send network traffic is about preventing the
   user's browser from being used to attack its local network.
   Thus, we need to ensure communications consent even if the site is
   not able to access the camera and microphone at all (hence
   WebSockets's consent mechanism) and similarly we need to be
   concerned with the site accessing the user's camera and microphone
   even if the data is to be sent back to the site via conventional
   HTTP-based network mechanisms such as HTTP POST.

4.1.1.  Calling Scenarios and User Expectations

   While many calling scenarios are possible, the scenarios discussed
   in this section illustrate many of the difficulties of identifying
   the relevant scope of consent.

4.1.1.1.  Dedicated Calling Services

   The first scenario we consider is a dedicated calling service.  In
   this case, the user has a relationship with a calling site and
   repeatedly makes calls on it.  It is likely that, rather than having
   to give permission for each call, the user will want to give the
   calling service long-term access to the camera and microphone.  This
   is a natural fit for a long-term consent mechanism (e.g., installing
   an app store "application" to indicate permission for the calling
   service).  A variant of the dedicated calling service is a gaming
   site (e.g., a poker site) which hosts a dedicated calling service to
   allow players to call each other.

   With any kind of service where the user may use the same service to
   talk to many different people, there is a question about whether the
   user can know who they are talking to.  If I grant permission to
   calling service A to make calls on my behalf, then I am implicitly
   granting it permission to bug my computer whenever it wants.  This
   suggests another consent model in which a site is authorized to make
   calls but only to certain target entities (identified via media-
   plane cryptographic mechanisms as described in Section 4.3.2 and
   especially Section 4.3.2.3).
   Note that the question of consent here is related to but distinct
   from the question of peer identity: I might be willing to allow a
   calling site to initiate calls on my behalf in general but still
   have some calls via that site where I can be sure that the site is
   not listening in.

4.1.1.2.  Calling the Site You're On

   Another simple scenario is calling the site you're actually
   visiting.  The paradigmatic case here is the "click here to talk to
   a representative" windows that appear on many shopping sites.  In
   this case, the user's expectation is that they are calling the site
   they're actually visiting.  However, it is unlikely that they want
   to provide a general consent to such a site; just because I want
   some information on a car doesn't mean that I want the car
   manufacturer to be able to activate my microphone whenever they
   please.  Thus, this suggests the need for a second consent mechanism
   where I only grant consent for the duration of a given call.  As
   described in Section 3.1, great care must be taken in the design of
   this interface to avoid the users just clicking through.  Note also
   that the user interface chrome must clearly display elements showing
   that the call is continuing in order to avoid attacks where the
   calling site just leaves it up indefinitely but shows a Web UI that
   implies otherwise.

4.1.1.3.  Calling to an Ad Target

   In both of the previous cases, the user has a direct relationship
   (though perhaps a transient one) with the target of the call.
   Moreover, in both cases he is actually visiting the site of the
   person he is being asked to trust.  However, this is not always so.
   Consider the case where a user is visiting a content site which
   hosts an advertisement with an invitation to call for more
   information.  When the user clicks the ad, they are connected with
   the advertiser or their agent.
   The relationships here are far more complicated: the site the user
   is actually visiting has no direct relationship with the advertiser;
   they are just hosting ads from an ad network.  The user has no
   relationship with the ad network, but desires one with the
   advertiser, at least for long enough to learn about their products.
   At minimum, then, whatever consent dialog is shown needs to allow
   the user to have some idea of the organization that they are
   actually calling.

   However, because the user also has some relationship with the
   hosting site, it is also arguable that the hosting site should be
   allowed to express an opinion (e.g., to be able to allow or forbid a
   call), since a bad experience with an advertiser reflects negatively
   on the hosting site [this idea was suggested by Adam Barth].
   However, this obviously presents a privacy challenge, as sites which
   host advertisements in IFRAMEs often learn very little about whether
   individual users clicked through to the ads, or even which ads were
   presented.

4.1.2.  Origin-Based Security

   Now that we have seen another use case, we can start to reason about
   the security requirements.

   As discussed in Section 3.2, the basic unit of Web sandboxing is the
   origin, and so it is natural to scope consent to origin.
   Specifically, a script from origin A MUST only be allowed to
   initiate communications (and hence to access camera and microphone)
   if the user has specifically authorized access for that origin.  It
   is of course technically possible to have coarser-scoped
   permissions, but because the Web model is scoped to origin, this
   creates a difficult mismatch.

   Arguably, origin is not fine-grained enough.  Consider the situation
   where Alice visits a site and authorizes it to make a single call.
   If consent is expressed solely in terms of origin, then at any
   future visit to that site (including one induced via mash-up or ad
   network), the site can bug Alice's computer, use the computer to
   place bogus calls, etc.  While in principle Alice could grant and
   then revoke the privilege, in practice privileges accumulate; if we
   are concerned about this attack, something else is needed.  There
   are a number of potential countermeasures to this sort of issue.

   Individual Consent
      Ask the user for permission for each call.

   Callee-oriented Consent
      Only allow calls to a given user.

   Cryptographic Consent
      Only allow calls to a given set of peer keying material or to a
      cryptographically established identity.

   Unfortunately, none of these approaches is satisfactory for all
   cases.  As discussed above, individual consent puts the user's
   approval in the UI flow for every call.  Not only does this quickly
   become annoying but it can train the user to simply click "OK", at
   which point the consent becomes useless.  Thus, while it may be
   necessary to have individual consent in some cases, this is not a
   suitable solution for (for instance) the calling service case.
   Where necessary, in-flow user interfaces must be carefully designed
   to avoid the risk of the user blindly clicking through.

   The other two options are designed to restrict calls to a given
   target.  Callee-oriented consent provided by the calling site does
   not work well because a malicious site can claim that the user is
   calling any user of his choice.  One fix for this is to tie calls to
   a cryptographically established identity.  While not suitable for
   all cases, this approach may be useful for some.
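   One way to picture the cryptographic consent option is a browser-
   side permission store scoped to Web origin and, optionally, narrowed
   to a verified peer identity.  The sketch below is purely
   hypothetical -- the class and method names are invented for this
   illustration and come from no specification (Python is used only for
   concreteness):

```python
# Hypothetical sketch of origin-scoped consent, optionally narrowed to
# a cryptographically verified peer identity.  Not from any spec.
class ConsentStore:
    def __init__(self):
        self._grants = set()  # set of (origin, peer-or-None) pairs

    def grant(self, origin, peer=None):
        """Record user consent for this origin; peer=None means
        the origin may call any peer (the coarse, riskier grant)."""
        self._grants.add((origin, peer))

    def allowed(self, origin, peer):
        # A blanket grant for the origin authorizes any peer;
        # otherwise the verified peer identity must match exactly.
        return (origin, None) in self._grants or (origin, peer) in self._grants

store = ConsentStore()
store.grant("https://calling-service.example.com", peer="mom@example.org")
assert store.allowed("https://calling-service.example.com", "mom@example.org")
assert not store.allowed("https://calling-service.example.com", "attacker@example.net")
```

   With the peer-narrowed grant, a compromised or malicious calling
   service can still place calls, but only to the peers the user has
   already approved, limiting the "bug my computer" attack.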
   If we consider the advertising case described in Section 4.1.1.3,
   it's not particularly convenient to require the advertiser to
   instantiate an iframe on the hosting site just to get permission; a
   more convenient approach is to cryptographically tie the
   advertiser's certificate to the communication directly.  We're still
   tying permissions to origin here, but to the media origin (and/or
   destination) rather than to the Web origin.
   [I-D.ietf-rtcweb-security-arch] and
   [I-D.rescorla-rtcweb-generic-idp] describe mechanisms which
   facilitate this sort of consent.

   Another case where media-level cryptographic identity makes sense is
   when a user really does not trust the calling site.  For instance, I
   might be worried that the calling service will attempt to bug my
   computer, but I also want to be able to conveniently call my
   friends.  If consent is tied to particular communications endpoints,
   then my risk is limited.  Naturally, it is somewhat challenging to
   design UI primitives which express this sort of policy.  The problem
   becomes even more challenging in multi-user calling cases.

4.1.3.  Security Properties of the Calling Page

   Origin-based security is intended to secure against web attackers.
   However, we must also consider the case of network attackers.
   Consider the case where I have granted permission to a calling
   service by an origin that has the HTTP scheme, e.g.,
   http://calling-service.example.com.  If I ever use my computer on an
   unsecured network (e.g., a hotspot or if my own home wireless
   network is insecure), and browse any HTTP site, then an attacker can
   bug my computer.  The attack proceeds like this:

   1.  I connect to http://anything.example.org/.  Note that this site
       is unaffiliated with the calling service.
   2.  The attacker modifies my HTTP connection to inject an IFRAME (or
       a redirect) to http://calling-service.example.com.
   3.  The attacker forges the response apparently from
       http://calling-service.example.com/ to inject JS to initiate a
       call to himself.

   Note that this attack does not depend on the media being insecure.
   Because the call is to the attacker, it is also encrypted to him.
   Moreover, it need not be executed immediately; the attacker can
   "infect" the origin semi-permanently (e.g., with a web worker or a
   popunder) and thus be able to bug me long after I have left the
   infected network.  This risk is created by allowing calls at all
   from a page fetched over HTTP.

   Even if calls are only possible from HTTPS sites, the page is still
   not safe if it embeds active content (e.g., JavaScript) that is
   fetched over HTTP or from an untrusted site, because that JavaScript
   is executed in the security context of the page [finer-grained].
   Thus, it is also dangerous to allow RTC-Web functionality from HTTPS
   origins that embed mixed content.  Note: this issue is not
   restricted to PAGES which contain mixed content.  If a page from a
   given origin ever loads mixed content then it is possible for a
   network attacker to infect the browser's notion of that origin
   semi-permanently.

4.2.  Communications Consent Verification

   As discussed in Section 3.3, allowing web applications unrestricted
   network access via the browser introduces the risk of using the
   browser as an attack platform against machines which would not
   otherwise be accessible to the malicious site, for instance because
   they are topologically restricted (e.g., behind a firewall or NAT).
   In order to prevent this form of attack as well as cross-protocol
   attacks, it is important to require that the target of traffic
   explicitly consent to receiving the traffic in question.  Until that
   consent has been verified for a given endpoint, traffic other than
   the consent handshake MUST NOT be sent to that endpoint.

4.2.1.  ICE

   Verifying receiver consent requires some sort of explicit handshake,
   but conveniently we already need one in order to do NAT hole-
   punching.  ICE [RFC5245] includes a handshake designed to verify
   that the receiving element wishes to receive traffic from the
   sender.  It is important to remember here that the site initiating
   ICE is presumed malicious; in order for the handshake to be secure,
   the receiving element MUST demonstrate receipt/knowledge of some
   value not available to the site (thus preventing the site from
   forging responses).  In order to achieve this objective with ICE,
   the STUN transaction IDs must be generated by the browser and MUST
   NOT be made available to the initiating script, even via a
   diagnostic interface.

   Verifying receiver consent also requires verifying that the receiver
   wants to receive traffic from a particular sender, and at this time;
   for example, a malicious site may simply attempt ICE to known
   servers that are using ICE for other sessions.  ICE provides this
   verification as well, by using the STUN credentials as a form of
   per-session shared secret.  Those credentials are known to the Web
   application, but would need to also be known and used by the STUN-
   receiving element to be useful.

   There also needs to be some mechanism for the browser to verify that
   the target of the traffic continues to wish to receive it.
   Obviously, some ICE-based mechanism will work here, but it has been
   observed that because ICE keepalives are indications, they will not
   work here, so some other mechanism is needed.

   [[ OPEN ISSUE: Do we need some way of verifying the expected traffic
   rate, not just consent to receive traffic at all? ]]

4.2.2.  Masking

   Once consent is verified, there still is some concern about
   misinterpretation attacks as described by Huang et al.
   [huang-w2sp].
584 As long as communication is limited to UDP, this risk is 585 probably limited, and thus masking is not required for UDP. I.e., once 586 communications consent has been verified, it is most likely safe to 587 allow the implementation to send arbitrary UDP traffic to the chosen 588 destination, provided that the STUN keepalives continue to succeed. 589 In particular, this is true for the data channel if DTLS is used 590 because DTLS (with the anti-chosen plaintext mechanisms required by 591 TLS 1.1) does not allow the attacker to generate predictable 592 ciphertext. However, with TCP the risk of transparent proxies 593 becomes much more severe. If TCP is to be used, then WebSockets- 594 style masking MUST be employed. [Note: current thinking in the 595 RTCWEB WG is not to support TCP and to support SCTP over DTLS, thus 596 removing the need for masking.] 598 4.2.3. Backward Compatibility 600 A requirement to use ICE limits compatibility with legacy non-ICE 601 clients. It seems unsafe to completely remove the requirement for 602 some check. All proposed checks have the common feature that the 603 browser sends some message to the candidate traffic recipient and 604 refuses to send other traffic until that message has been replied to. 605 The message/reply pair must be generated in such a way that an 606 attacker who controls the Web application cannot forge them, 607 generally by having the message contain some secret value that must 608 be incorporated (e.g., echoed, hashed into, etc.). Non-ICE 609 candidates for this role (in cases where the legacy endpoint has a 610 public address) include: 612 o STUN checks without using ICE (i.e., the non-RTC-web endpoint sets 613 up a STUN responder.) 614 o Use of RTCP as an implicit reachability check. 616 In the RTCP approach, the RTC-Web endpoint is allowed to send a 617 limited number of RTP packets prior to receiving consent. This 618 allows a short window of attack.
In addition, some legacy endpoints 619 do not support RTCP, so this is a much more expensive solution for 620 such endpoints, for which it would likely be easier to implement ICE. 621 For these two reasons, an RTCP-based approach does not seem to 622 address the security issue satisfactorily. 624 In the STUN approach, the RTC-Web endpoint is able to verify that the 625 recipient is running some kind of STUN endpoint but unless the STUN 626 responder is integrated with the ICE username/password establishment 627 system, the RTC-Web endpoint cannot verify that the recipient 628 consents to this particular call. This may be an issue if existing 629 STUN servers are operated at addresses that are not able to handle 630 bandwidth-based attacks. Thus, this approach does not seem 631 satisfactory either. 633 If the systems are tightly integrated (i.e., the STUN endpoint 634 responds with responses authenticated with ICE credentials) then this 635 issue does not exist. However, such a design is very close to an 636 ICE-Lite implementation (indeed, arguably is one). An intermediate 637 approach would be to have a STUN extension that indicated that one 638 was responding to RTC-Web checks but not computing integrity checks 639 based on the ICE credentials. This would allow the use of standalone 640 STUN servers without the risk of confusing them with legacy STUN 641 servers. If a non-ICE legacy solution is needed, then this is 642 probably the best choice. 644 Once initial consent is verified, we also need to verify continuing 645 consent, in order to avoid attacks where two people briefly share an 646 IP (e.g., behind a NAT in an Internet cafe) and the attacker arranges 647 for a large, unstoppable, traffic flow to the network and then 648 leaves. The appropriate technologies here are fairly similar to 649 those for initial consent, though they are perhaps weaker since the 650 threat is less severe. 652 4.2.4.
IP Location Privacy 654 Note that as soon as the callee sends their ICE candidates, the 655 caller learns the callee's IP addresses. The callee's server 656 reflexive address reveals a lot of information about the callee's 657 location. In order to avoid tracking, implementations may wish to 658 suppress the start of ICE negotiation until the callee has answered. 659 In addition, either side may wish to hide their location entirely by 660 forcing all traffic through a TURN server. 662 4.3. Communications Security 664 Finally, we consider a problem familiar from the SIP world: 665 communications security. For obvious reasons, it MUST be possible 666 for the communicating parties to establish a channel which is secure 667 against both message recovery and message modification. (See 668 [RFC5479] for more details.) This service must be provided for both 669 data and voice/video. Ideally the same security mechanisms would be 670 used for both types of content. Technology for providing this 671 service (for instance, DTLS [RFC4347] and DTLS-SRTP [RFC5763]) is 672 well understood. However, we must examine this technology in the 673 RTC-Web context, where the threat model is somewhat different. 675 In general, it is important to understand that unlike a conventional 676 SIP proxy, the calling service (i.e., the Web server) controls not 677 only the channel between the communicating endpoints but also the 678 application running on the user's browser. While in principle it is 679 possible for the browser to cut the calling service out of the loop 680 and directly present trusted information (and perhaps get consent), 681 practice in modern browsers is to avoid this whenever possible. "In- 682 flow" modal dialogs which require the user to consent to specific 683 actions are particularly disfavored as human factors research 684 indicates that unless they are made extremely invasive, users simply 685 agree to them without actually consciously giving consent
686 [abarth-rtcweb]. Thus, nearly all the UI will necessarily be 687 rendered by the browser but under control of the calling service. 689 This likely includes the peer's identity information, which, after 690 all, is only meaningful in the context of some calling service. 692 This limitation does not mean that preventing attack by the calling 693 service is completely hopeless. However, we need to distinguish 694 between two classes of attack: 696 Retrospective compromise of calling service. 697 The calling service is non-malicious during a call but 698 subsequently is compromised and wishes to attack an older call. 700 During-call attack by calling service. 701 The calling service is compromised during the call it wishes to 702 attack. 704 Providing security against the former type of attack is practical 705 using the techniques discussed in Section 4.3.1. However, it is 706 extremely difficult to prevent a trusted but malicious calling 707 service from actively attacking a user's calls, either by mounting a 708 MITM attack or by diverting them entirely. (Note that this attack 709 applies equally to a network attacker if communications to the 710 calling service are not secured.) We discuss some potential 711 approaches and why they are likely to be impractical in 712 Section 4.3.2. 714 4.3.1. Protecting Against Retrospective Compromise 716 In a retrospective attack, the calling service was uncompromised 717 during the call, but an attacker subsequently wants to recover 718 the content of the call. We assume that the attacker has access to 719 the protected media stream as well as having full control of the 720 calling service. 722 If the calling service has access to the traffic keying material (as 723 in SDES [RFC4568]), then retrospective attack is trivial. This form 724 of attack is particularly serious in the Web context because it is 725 standard practice in Web services to run extensive logging and 726 monitoring.
Thus, it is highly likely that if the traffic key is 727 part of any HTTP request it will be logged somewhere and thus subject 728 to subsequent compromise. It is this consideration that makes an 729 automatic, public key-based key exchange mechanism imperative for 730 RTC-Web (this is a good idea for any communications security system) 731 and this mechanism SHOULD provide perfect forward secrecy (PFS). The 732 signaling channel/calling service can be used to authenticate this 733 mechanism. 735 In addition, the system MUST NOT provide any APIs to extract either 736 long-term keying material or to directly access any stored traffic 737 keys. Otherwise, an attacker who subsequently compromised the 738 calling service might be able to use those APIs to recover the 739 traffic keys and thus compromise the traffic. 741 4.3.2. Protecting Against During-Call Attack 743 Protecting against attacks during a call is a more difficult 744 proposition. Even if the calling service cannot directly access 745 keying material (as recommended in the previous section), it can 746 simply mount a man-in-the-middle attack on the connection, telling 747 Alice that she is calling Bob and Bob that he is calling Alice, while 748 in fact the calling service is acting as a calling bridge and 749 capturing all the traffic. While in theory it is possible to 750 construct techniques which protect against this form of attack, in 751 practice these techniques all require far too much user intervention 752 to be practical, given the user interface constraints described in 753 [abarth-rtcweb]. 755 4.3.2.1. Key Continuity 757 One natural approach is to use "key continuity". While a malicious 758 calling service can present any identity it chooses to the user, it 759 cannot produce a private key that maps to a given public key. Thus, 760 it is possible for the browser to note a given user's public key and 761 generate an alarm whenever that user's key changes. 
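As an illustration only (not part of any specification), such a continuity check can be sketched as a small pinning store; the class name, return values, and use of SHA-256 fingerprints are assumptions for the sketch:

```python
import hashlib

class KeyContinuityStore:
    """Remember the first public key seen for each identity and flag
    any later change, in the style of an SSH known_hosts file."""

    def __init__(self):
        self.known = {}  # identity -> hex SHA-256 fingerprint of the key

    def check(self, identity, pubkey):
        fp = hashlib.sha256(pubkey).hexdigest()
        prev = self.known.get(identity)
        if prev is None:
            self.known[identity] = fp
            return "new"          # first contact: pin this key
        return "ok" if prev == fp else "mismatch"  # "mismatch" raises the alarm
```

Note that a previously unseen identity always yields "new", never "mismatch"; this is exactly the gap a calling service exploits by presenting "user Y" in place of "user X", as discussed below.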
SSH [RFC4251] 762 uses a similar technique. (Note that the need to avoid explicit user 763 consent on every call precludes the browser requiring an immediate 764 manual check of the peer's key). 766 Unfortunately, this sort of key continuity mechanism is far less 767 useful in the RTC-Web context. First, much of the virtue of RTC-Web 768 (and any Web application) is that it is not bound to a particular piece 769 of client software. Thus, it will be not only possible but routine 770 for a user to use multiple browsers on different computers which will 771 of course have different keying material (SACRED [RFC3760] 772 notwithstanding.) Thus, users will frequently be alerted to key 773 mismatches which are in fact completely legitimate, with the result 774 that they are trained to simply click through them. As it is known 775 that users routinely will click through far more dire warnings 776 [cranor-wolf], it seems extremely unlikely that any key continuity 777 mechanism will be effective rather than simply annoying. 779 Moreover, it is trivial to bypass even this kind of mechanism. 780 Recall that unlike the case of SSH, the browser never directly gets 781 the peer's identity from the user. Rather, it is provided by the 782 calling service. Even enabling a mechanism of this type would 783 require an API to allow the calling service to tell the browser "this 784 is a call to user X". All the calling service needs to do to avoid 785 triggering a key continuity warning is to tell the browser that "this 786 is a call to user Y" where Y is close to X. Even if the user actually 787 checks the other side's name (which all available evidence indicates 788 is unlikely), this would require (a) the browser to have trusted UI to 789 provide the name and (b) the user to not be fooled by similar- 790 appearing names. 792 4.3.2.2. Short Authentication Strings 794 ZRTP [RFC6189] uses a "short authentication string" (SAS) which is 795 derived from the key agreement protocol.
This SAS is designed to be 796 read over the voice channel and if confirmed by both sides precludes 797 MITM attack. The intention is that the SAS is used once and then key 798 continuity (though a different mechanism from that discussed above) 799 is used thereafter. 801 Unfortunately, the SAS does not offer a practical solution to the 802 problem of a compromised calling service. "Voice conversion" 803 systems, which modify voice from one speaker to make it sound like 804 another, are an active area of research. These systems are already 805 good enough to fool both automatic recognition systems 806 [farus-conversion] and humans [kain-conversion] in many cases, and 807 are of course likely to improve in future, especially in an 808 environment where the user just wants to get on with the phone call. 809 Thus, even if SAS is effective today, it is likely not to be so for 810 much longer. Moreover, it is possible for an attacker who controls 811 the browser to allow the SAS to succeed and then simulate call 812 failure and reconnect, trusting that the user will not notice that 813 the "no SAS" indicator has been set (which seems likely). 815 Even were SAS secure if used, it seems exceedingly unlikely that 816 users will actually use it. As discussed above, the browser UI 817 constraints preclude requiring the SAS exchange prior to completing 818 the call and so it must be voluntary; at most the browser will 819 provide some UI indicator that the SAS has not yet been checked. 820 However, it is well-known that when faced with optional mechanisms 821 such as fingerprints, users simply do not check them [whitten-johnny]. 822 Thus, it is highly unlikely that users will ever perform the SAS 823 exchange. 825 Once users have checked the SAS once, key continuity is required to 826 avoid them needing to check it on every call. However, this is 827 problematic for reasons indicated in Section 4.3.2.1.
In principle 828 it is of course possible to render a different UI element to indicate 829 that calls are using an unauthenticated set of keying material 830 (recall that the attacker can just present a slightly different name 831 so that the attack shows the same UI as a call to a new device or to 832 someone you haven't called before) but as a practical matter, users 833 simply ignore such indicators even in the rather more dire case of 834 mixed content warnings. 836 Despite these difficulties, users should be afforded an opportunity 837 to view an SAS or fingerprint where available, as it is the only 838 mechanism for the user to directly verify the peer's identity without 839 trusting any third party identity system (assuming, of course, that 840 they trust their own software). 842 4.3.2.3. Third Party Identity 844 The conventional approach to providing communications identity has of 845 course been to have some third party identity system (e.g., PKI) to 846 authenticate the endpoints. Such mechanisms have proven to be too 847 cumbersome for use by typical users (and nearly too cumbersome for 848 administrators). However, a new generation of Web-based identity 849 providers (BrowserID, Federated Google Login, Facebook Connect, 850 OAuth, OpenID, WebFinger) has recently been developed that uses Web 851 technologies to provide lightweight (from the user's perspective) 852 third-party authenticated transactions. It is possible (see 853 [I-D.rescorla-rtcweb-generic-idp]) to use systems of this type to 854 authenticate RTCWEB calls, linking them to existing user notions of 855 identity (e.g., Facebook adjacencies). Specifically, the third-party 856 identity system is used to bind the user's identity to cryptographic 857 keying material which is then used to authenticate the calling 858 endpoints. Calls which are authenticated in this fashion are 859 naturally resistant even to active MITM attack by the calling site.
861 Note that there is one special case in which PKI-style certificates 862 do provide a practical solution: calls from end-users to large 863 sites. For instance, if you are making a call to Amazon.com, then 864 Amazon can easily get a certificate to authenticate their media 865 traffic, just as they get one to authenticate their Web traffic. 866 This does not provide additional security value in cases in which the 867 calling site and the media peer are one and the same, but might be 868 useful in cases in which third parties (e.g., ad networks or 869 retailers) arrange for calls but do not participate in them. 871 4.3.2.4. Page Access to Media 873 Verifying the identity of the far media endpoint is a necessary but 874 not sufficient condition for providing media security. In RTCWEB, 875 media flows are rendered into HTML5 MediaStreams which can be 876 manipulated by the calling site. Obviously, if the site can modify 877 or view the media, then the user is not getting the level of 878 assurance they would expect from being able to authenticate their 879 peer. In many cases, this is acceptable because the user values 880 site-based special effects over complete security from the site. 882 However, there are also cases where users wish to know that the site 883 cannot interfere. In order to facilitate that, it will be necessary 884 to provide features whereby the site can verifiably give up access to 885 the media streams. This verification must be possible both from the 886 local side and the remote side. I.e., I must be able to verify that 887 the person I am calling has engaged a secure media mode. In order to 888 achieve this it will be necessary to cryptographically bind an 889 indication of the local media access policy into the cryptographic 890 authentication procedures detailed in the previous sections. 892 5. Security Considerations 894 This entire document is about security. 896 6.
Acknowledgements 898 Bernard Aboba, Harald Alvestrand, Dan Druta, Cullen Jennings, Hadriel 899 Kaplan (S 4.2.1), Matthew Kaufman, Martin Thomson, Magnus Westerlund. 901 7. References 903 7.1. Normative References 905 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 906 Requirement Levels", BCP 14, RFC 2119, March 1997. 908 7.2. Informative References 910 [CORS] van Kesteren, A., "Cross-Origin Resource Sharing". 912 [I-D.ietf-rtcweb-security-arch] 913 Rescorla, E., "RTCWEB Security Architecture", 914 draft-ietf-rtcweb-security-arch-01 (work in progress), 915 March 2012. 917 [I-D.kaufman-rtcweb-security-ui] 918 Kaufman, M., "Client Security User Interface Requirements 919 for RTCWEB", draft-kaufman-rtcweb-security-ui-00 (work in 920 progress), June 2011. 922 [I-D.rescorla-rtcweb-generic-idp] 923 Rescorla, E., "RTCWEB Generic Identity Provider 924 Interface", draft-rescorla-rtcweb-generic-idp-01 (work in 925 progress), March 2012. 927 [RFC2818] Rescorla, E., "HTTP Over TLS", RFC 2818, May 2000. 929 [RFC3261] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, 930 A., Peterson, J., Sparks, R., Handley, M., and E. 931 Schooler, "SIP: Session Initiation Protocol", RFC 3261, 932 June 2002. 934 [RFC3552] Rescorla, E. and B. Korver, "Guidelines for Writing RFC 935 Text on Security Considerations", BCP 72, RFC 3552, 936 July 2003. 938 [RFC3760] Gustafson, D., Just, M., and M. Nystrom, "Securely 939 Available Credentials (SACRED) - Credential Server 940 Framework", RFC 3760, April 2004. 942 [RFC4251] Ylonen, T. and C. Lonvick, "The Secure Shell (SSH) 943 Protocol Architecture", RFC 4251, January 2006. 945 [RFC4347] Rescorla, E. and N. Modadugu, "Datagram Transport Layer 946 Security", RFC 4347, April 2006. 948 [RFC4568] Andreasen, F., Baugher, M., and D. Wing, "Session 949 Description Protocol (SDP) Security Descriptions for Media 950 Streams", RFC 4568, July 2006.
952 [RFC5245] Rosenberg, J., "Interactive Connectivity Establishment 953 (ICE): A Protocol for Network Address Translator (NAT) 954 Traversal for Offer/Answer Protocols", RFC 5245, 955 April 2010. 957 [RFC5479] Wing, D., Fries, S., Tschofenig, H., and F. Audet, 958 "Requirements and Analysis of Media Security Management 959 Protocols", RFC 5479, April 2009. 961 [RFC5763] Fischl, J., Tschofenig, H., and E. Rescorla, "Framework 962 for Establishing a Secure Real-time Transport Protocol 963 (SRTP) Security Context Using Datagram Transport Layer 964 Security (DTLS)", RFC 5763, May 2010. 966 [RFC6189] Zimmermann, P., Johnston, A., and J. Callas, "ZRTP: Media 967 Path Key Agreement for Unicast Secure RTP", RFC 6189, 968 April 2011. 970 [RFC6454] Barth, A., "The Web Origin Concept", RFC 6454, 971 December 2011. 973 [RFC6455] Fette, I. and A. Melnikov, "The WebSocket Protocol", 974 RFC 6455, December 2011. 976 [abarth-rtcweb] 977 Barth, A., "Prompting the user is security failure", RTC- 978 Web Workshop. 980 [cranor-wolf] 981 Sunshine, J., Egelman, S., Almuhimedi, H., Atri, N., and 982 L. Cranor, "Crying Wolf: An Empirical Study of SSL Warning 983 Effectiveness", Proceedings of the 18th USENIX Security 984 Symposium, 2009. 986 [farus-conversion] 987 Farrus, M., Erro, D., and J. Hernando, "Speaker 988 Recognition Robustness to Voice Conversion". 990 [finer-grained] 991 Barth, A. and C. Jackson, "Beware of Finer-Grained 992 Origins", W2SP, 2008. 994 [huang-w2sp] 995 Huang, L-S., Chen, E., Barth, A., Rescorla, E., and C. 996 Jackson, "Talking to Yourself for Fun and Profit", W2SP, 997 2011. 999 [kain-conversion] 1000 Kain, A. and M. Macon, "Design and Evaluation of a Voice 1001 Conversion Algorithm based on Spectral Envelope Mapping 1002 and Residual Prediction", Proceedings of ICASSP, May 1003 2001. 1005 [whitten-johnny] 1006 Whitten, A. and J.
Tygar, "Why Johnny Can't Encrypt: A 1007 Usability Evaluation of PGP 5.0", Proceedings of the 8th 1008 USENIX Security Symposium, 1999. 1010 Author's Address 1012 Eric Rescorla 1013 RTFM, Inc. 1014 2064 Edgewood Drive 1015 Palo Alto, CA 94303 1016 USA 1018 Phone: +1 650 678 2350 1019 Email: ekr@rtfm.com