[hybi] A WebSocket handshake

Adam Barth <ietf@adambarth.com> Tue, 05 October 2010 22:15 UTC

From: Adam Barth <ietf@adambarth.com>
Date: Tue, 05 Oct 2010 15:15:22 -0700
Message-ID: <AANLkTimQ5x-v+Mz_OHrNDdtVd94E+HOBWwo3_f1ktEeg@mail.gmail.com>
To: Hybi <hybi@ietf.org>
Subject: [hybi] A WebSocket handshake

Please find below a proposal for a new WebSocket handshake. The
handshake attempts to combine the benefits of the HTTP handshake with
the benefits of a TLS-based handshake. The handshake incorporates
ideas from a number of the other handshakes discussed previously,
including those from Maciej Stachowiak, Ian Hickson, and Greg Wilkins.
In addition to proposing a handshake, the document also contains a
threat model and a security analysis. Feedback appreciated.

Kind regards,
Adam

Pretty HTML version:

https://docs0.google.com/document/edit?id=1hRLcVc8FHsXOQvaulG2KmvGKepgFffcevyJn-dAEsrI&hl=en&authkey=COOWhaAD&pli=1

Not-so-pretty text version:

= A WebSocket Handshake =

Adam Barth
Eric Rescorla
October 5, 2010

== Introduction ==

This document describes a handshake for the WebSocket protocol that
resists cross-protocol attacks. The handshake sends a fixed sequence
of bytes and a random nonce from the client to the server to establish
two keys for a bidirectional encrypted tunnel, which the parties then
use for further communication. Although an eavesdropper can determine
the encryption keys, computing the keys requires knowledge of a
globally unique identifier, making it unlikely that an observer
unfamiliar with the WebSocket protocol will interpret the
encrypted bytes on the wire as anything other than random bytes.
Before explaining the handshake, we present a model of the threats
posed by exposing a new network protocol to untrusted content running
in a web browser. We then work through some simple handshake designs
to build intuition for what can go wrong in a flawed design.

== Threats ==

In this document, we evaluate the risks posed by exposing the
WebSocket protocol to untrusted web content in a standard web browser.
We make the usual assumption in web security that the user visits the
attacker’s web site.

Web browsers already expose an HTTP-based networking facility to
untrusted web content. In designing WebSockets, we are concerned with
the additional risks incurred by granting the attacker additional
network privileges. We are chiefly concerned with three scenarios:

1) The attacker uses the WebSocket protocol to attack a server that
does not support the WebSocket protocol. In this scenario, we are
concerned with protecting a wide variety of servers that implement a
wide variety of protocols.

a) We do not assume the server implements any particular protocol
exactly according to its specification. Instead, we aim for “real
world” security in which servers might have a number of common bugs.

b) We do not assume the server uses a strong authentication
scheme. In particular, we are concerned with protecting servers that
rely on connectivity alone for authentication (e.g., inside a
corporate intranet). Although using strong authentication is a best
practice, strong authentication is far from universal in deployments.

2) The attacker uses other network facilities in the browser to attack
a WebSocket server. For example, the attacker might use an HTML form
element to generate an HTTP message targeted at a WebSocket server.
In this scenario, we do not assume the WebSocket server follows the
WebSocket protocol specification in every detail. Instead, we seek to
protect WebSocket servers that contain some implementation errors. Of
course, we cannot hope to protect servers with arbitrary
implementation errors (e.g., memory safety errors), but, when given a
choice, we prefer protocols whose security is robust to sloppy
implementation. We are concerned with two kinds of attacks in this
model:

a) The attacker crafts an HTTP request that confuses the WebSocket
server into performing an undesirable mutation to its internal state.

b) The attacker crafts an HTTP request that confuses the WebSocket
server into responding with content that the browser then interprets
to the detriment of the server (e.g., allows the attacker to mount a
cross-site scripting attack against the server’s origin).

3) The attacker communicates with a WebSocket server, but the ensuing
traffic confuses a network intermediary. Without loss of generality,
we can assume that the WebSocket server colludes with the attacker to
aid him or her in confusing the intermediary. In particular, we are
especially concerned with transparent HTTP proxies in corporate
intranets because these proxies are common and confusing such a proxy
could let the attacker extract confidential information from the
corporation.

== Strawmen ==

One natural approach is to design the handshake to mimic an HTTP POST
request. Using a POST request as a template is attractive because an
attacker can already generate POST requests to many network locations
using the HTML form element. If WebSockets are less generative than
the form element, then we can argue by reduction that WebSockets do
not increase the attack surface for cross-protocol attacks. Here’s an
example WebSocket handshake templated on a POST request:

Client -> Server:
POST /path/of/attackers/choice HTTP/1.1
Host: host-of-attackers-choice.com
Sec-WebSocket-Key: <connection-key>

Server -> Client:
HTTP/1.1 200 OK
Sec-WebSocket-Accept: <connection-key>

The idea behind this protocol is that by echoing back the
connection-key, the server has agreed to establish a WebSocket
connection. Unfortunately, this handshake has serious problems. If
the attacker can host an htaccess file at any location on a target HTTP
server, the attacker can opt the server into using WebSockets. The
server will believe the first HTTP request is complete and is
expecting another HTTP request on the socket. However, the attacker
can now send (roughly) arbitrary bytes on the socket, spoofing HTTP
requests and reading back the response.

To repair this vulnerability, we replace the value of the
Sec-WebSocket-Accept response header with HMAC-SHA1(<connection-key>,
<uuid>), on the assumption that a simple configuration file will be
unable to compute an HMAC. However, this modification is insufficient.

Consider, for example, a virtual hosting environment in which the
attacker can place PHP scripts on the server. Such hosting
environments are widely available commercially, for example from
1and1.com. Now, the attacker can complete the WebSocket handshake
because the PHP script can compute the HMAC and send the appropriate
response header. The attacker has now opted into the WebSocket
protocol on behalf of the entire socket. Unfortunately,
the attacker is only empowered to speak on behalf of his own virtual
host. This privilege escalation is likely to be exploitable by
spoofing further HTTP requests in WebSocket message frames. In these
spoofed messages, the attacker can spoof the Host header and interact
with other virtual hosts reachable on the same socket.

To attempt to repair this vulnerability, we remove the attacker’s
ability to designate a PHP script on the server:

Client -> Server:
OPTIONS * HTTP/1.1
Host: host-of-attackers-choice.com
Sec-WebSocket-Key: <connection-key>

Server -> Client:
HTTP/1.1 200 OK
Sec-WebSocket-Accept: HMAC(<connection-key>, “...”)

This handshake still has problems in more sophisticated virtual
hosting scenarios, but let’s put those aside for the moment to
consider how this handshake interacts with transparent HTTP proxies.
Recall that the browser will not use the proxy version of the
handshake because the proxy is transparent.

Unfortunately, this handshake is likely to confuse a transparent
proxy. After seeing these messages exchanged, a transparent proxy
will likely believe that the next bytes emitted by the browser will be
another HTTP request. However, the browser believes it has
established a WebSocket connection and will let the attacker send
WebSocket frames to the transparent proxy. The attacker can likely
use these frames to spoof HTTP requests for intranet resources (again,
by spoofing the Host header) and read back the response, stealing
confidential information from the corporation’s intranet.

To attempt to repair this vulnerability, we add the Upgrade header to
inform the transparent proxy that the socket is switching protocols:

Client -> Server:
OPTIONS * HTTP/1.1
Host: host-of-attackers-choice.com
Connection: Upgrade
Sec-WebSocket-Key: <connection-key>
Upgrade: WebSocket

Server -> Client:
HTTP/1.1 101 Switching Protocols
Connection: Upgrade
Upgrade: WebSocket
Sec-WebSocket-Accept: HMAC(<connection-key>, “...”)

Unfortunately, the RFC 2817 HTTP upgrade mechanism is virtually unused
in practice. If you search the web for references to upgrade, you
either find links to RFC 2817 or discussion of the WebSocket protocol.
It seems entirely likely that some number of transparent proxies will
be oblivious to the HTTP upgrade mechanism. Organizations could
easily deploy such proxies and never have any operational issues with
them. For this reason, assuming that transparent proxies understand
the HTTP upgrade mechanism is dangerous. If the proxy is
oblivious to HTTP upgrade, the proxy could easily treat this handshake
the same way it would treat the previous iteration, which allows the
attacker to steal confidential information from corporate intranets.

Rather than relying upon the rarely used HTTP upgrade mechanism to
inform network intermediaries that the remainder of the socket is not
HTTP, we propose using the RFC 2817 CONNECT mechanism. This mechanism
is widely used on the Internet to tunnel TLS connections through
proxies. Proxy implementations that lack support for the CONNECT
mechanism will likely discover and repair that oversight quickly.

== Proposal ==

In this section, we present our proposal for a WebSocket handshake and
tunnel. The handshake establishes a shared “secret” between the
client and the server, which they use to encrypt subsequent traffic.
This handshake lacks a number of endpoint and extension negotiation
features of the current handshake. We expect the working group to add
these features inside the encrypted tunnel.

=== Handshake Request ===

To establish a WebSocket connection, the browser sends an RFC 2817
CONNECT request:

Client -> Server:
CONNECT 1C1BCE63-1DF8-455C-8235-08C2646A4F21.invalid:443 HTTP/1.1
Host: 1C1BCE63-1DF8-455C-8235-08C2646A4F21.invalid:443
Sec-WebSocket-Key: <connection-key1>

where <connection-key1> is a 128-bit random number encoded in base64.
This initial message has several desirable properties:

1) The attacker cannot influence any of the bytes included in the
message. Instead of using the attacker’s host name, we use an invalid
host name (per RFC 2606). Although we could use any invalid host
name, we use this host name as a globally unique identifier for the
WebSocket protocol.

2) Any intermediaries that understand this message according to its
HTTP semantics will route the request to a non-existent domain and
fail the request. In particular, they will not route the
Sec-WebSocket-Key to the attacker, making it difficult for the
attacker to perform actions based on the key.

3) Transparent proxies are likely to interpret this request as an
HTTPS connect request and assume the remainder of the socket is
unintelligible. Because the remainder of the bytes on the socket are
encrypted (see below), the attacker is unlikely to be able to trick
the transparent proxy into taking further action.

4) This message cannot be generated by a web attacker in today’s browsers.

5) A server that wishes to multiplex HTTP and WebSockets on the same
port can use the request-line to distinguish the two protocols.
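
As a rough illustration of the request described above (a sketch, not
part of the proposal), the following Python builds the CONNECT message
and shows how a server sharing a port with HTTP might recognize it by
its request-line. The function names and the CRLF line endings are
assumptions made for the example; only the host name, the header name,
and the 128-bit base64 key come from the text.

```python
import base64
import os

# GUID-based host name taken from the request shown above.
WEBSOCKET_HOST = "1C1BCE63-1DF8-455C-8235-08C2646A4F21.invalid:443"


def new_connection_key() -> str:
    """Return a fresh 128-bit random value encoded in base64."""
    return base64.b64encode(os.urandom(16)).decode("ascii")


def build_handshake_request(connection_key1: str) -> bytes:
    """Serialize the CONNECT request; only the key varies per connection.

    ASSUMPTION: standard HTTP CRLF line endings and a terminating blank line.
    """
    lines = [
        f"CONNECT {WEBSOCKET_HOST} HTTP/1.1",
        f"Host: {WEBSOCKET_HOST}",
        f"Sec-WebSocket-Key: {connection_key1}",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def is_websocket_request_line(request_line: str) -> bool:
    """Hypothetical helper: tell the WebSocket handshake apart from HTTP."""
    return request_line.strip() == f"CONNECT {WEBSOCKET_HOST} HTTP/1.1"
```

Because the host name is fixed, none of the bytes in the request are
chosen by the web content that opened the connection.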

The client can also include additional information in the first
handshake message by encrypting that information in AES-128-CTR using
the key HMAC-SHA1(<connection-key1>,
“C1BA787A-0556-49F3-B6AE-32E5376F992B”) and a counter block that is
the byte number represented in 128-bit network byte order
(big-endian). We expect browsers to use this additional information
to include additional meta-data about the connection (e.g., the origin
of the web site that created the WebSocket) rather than
application-layer messages.
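
The key and counter block described above might be computed as in the
following Python sketch. Several details are assumptions, because the
text leaves them open: that the first argument of HMAC-SHA1 is the
HMAC key and the UUID string is the message, that the key is the
decoded 16 bytes rather than its base64 text, and that the 20-byte
HMAC-SHA1 output is truncated to its first 16 bytes to fit AES-128.
The AES step itself is omitted because the Python standard library has
no AES implementation.

```python
import base64
import hashlib
import hmac

# UUID given in the text for encrypting the first handshake message.
FIRST_MESSAGE_UUID = b"C1BA787A-0556-49F3-B6AE-32E5376F992B"


def first_message_key(connection_key1_b64: str) -> bytes:
    """Derive a 16-byte AES key from <connection-key1>.

    ASSUMPTIONS: the connection key is the HMAC key, the UUID is the
    message, and the 20-byte digest is truncated to 16 bytes.
    """
    key1 = base64.b64decode(connection_key1_b64)
    digest = hmac.new(key1, FIRST_MESSAGE_UUID, hashlib.sha1).digest()
    return digest[:16]


def counter_block(byte_number: int) -> bytes:
    """Encode a byte number as a 128-bit big-endian counter block."""
    return byte_number.to_bytes(16, byteorder="big")
```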

Encrypting the additional information makes it difficult for the
attacker to predict the bytes that appear on the wire. Without the
ability to predict on-the-wire bytes, the attacker will have
difficulty crafting a network message that confuses a non-WebSocket
server or an intermediary. Effectively, the attacker is limited to
sending random traffic to a chosen server. To limit opportunities for
abuse, the browser should limit the amount of unsolicited data the
attacker can send (500 bytes?) before the server accepts the WebSocket
connection to avoid spamming unwitting servers with too much traffic.

=== Handshake Response ===

To accept the request, the server replies with the following message:

Server -> Client:
HTTP/1.1 200 OK
Sec-WebSocket-Accept: <hmac>
Sec-WebSocket-Key: <connection-key2>

where <hmac> is HMAC-SHA1(<connection-key1>,
“258EAFA5-E914-47DA-95CA-C5AB0DC85B11”) encoded in base64 and
<connection-key2> is a 128-bit random number encoded in base64. If
<connection-key2> is identical to <connection-key1>, the client aborts
the handshake. This message completes the CONNECT mechanism.
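
The response check described above can be sketched in Python as
follows. This is an illustration under stated assumptions rather than
a definitive implementation: it assumes the connection key (decoded
from base64) is the HMAC key and the UUID string is the message, and
the function names are hypothetical.

```python
import base64
import hashlib
import hmac

# UUID given in the text for the Sec-WebSocket-Accept value.
ACCEPT_UUID = b"258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def compute_accept(connection_key1_b64: str) -> str:
    """Server side: compute the base64-encoded <hmac> value.

    ASSUMPTION: the decoded connection key is the HMAC key and the
    UUID is the message.
    """
    key1 = base64.b64decode(connection_key1_b64)
    digest = hmac.new(key1, ACCEPT_UUID, hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


def client_accepts(connection_key1_b64: str,
                   accept_b64: str,
                   connection_key2_b64: str) -> bool:
    """Client side: decide whether to continue with the handshake."""
    # Abort if the server simply echoed the client's key back.
    if connection_key2_b64 == connection_key1_b64:
        return False
    expected = compute_accept(connection_key1_b64)
    # Constant-time comparison of the received and expected values.
    return hmac.compare_digest(expected.encode("ascii"),
                               accept_b64.encode("utf-8"))
```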

The entity that generated the HMAC has demonstrated understanding of
the WebSocket protocol by including the UUID in the HMAC. Because the
original network message did not designate any particular host, we can
have reasonable assurance that the entity that generated the HMAC
speaks on behalf of the entire socket (and not just on behalf of one
virtual host). Because the HMAC occurs near the beginning of the
socket (and is preceded by a fixed string), we mitigate the risk that
the replying entity is actually speaking a non-HTTP, non-WebSocket
protocol.

After sending the handshake response, the server can begin sending
information over the encrypted tunnel described in the following
section. We expect that the first message sent by the server will
contain meta-data about the connection and that subsequent messages
will contain application-layer messages.

=== Tunnel ===

The handshake establishes two keys, which the client and server use to
form an encrypted tunnel for further communication:

Client -> Server Key:
HMAC-SHA1(<connection-key1> || <connection-key2>,
“363A6078-74D2-4C0B-8CBC-1E6A36E83442”)

Server -> Client Key:
HMAC-SHA1(<connection-key1> || <connection-key2>,
“2306C3BE-0ACF-42C0-B69E-DFFE02CFA346”)

All subsequent bytes are encrypted using AES-128-CTR with the
appropriate directional key and a counter block that is the byte
number represented in 128-bit network byte order (big-endian).
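
The two directional keys might be derived as in the Python sketch
below. As before, the details the text does not pin down are
assumptions: the concatenated connection keys (decoded from base64)
serve as the HMAC key, the UUID string is the message, and each
20-byte HMAC-SHA1 output is truncated to 16 bytes for use with
AES-128. The function name is hypothetical.

```python
import base64
import hashlib
import hmac
from typing import Tuple

# UUIDs given in the text for the two directions of the tunnel.
CLIENT_TO_SERVER_UUID = b"363A6078-74D2-4C0B-8CBC-1E6A36E83442"
SERVER_TO_CLIENT_UUID = b"2306C3BE-0ACF-42C0-B69E-DFFE02CFA346"


def tunnel_keys(connection_key1_b64: str,
                connection_key2_b64: str) -> Tuple[bytes, bytes]:
    """Return (client_to_server_key, server_to_client_key).

    ASSUMPTIONS: <connection-key1> || <connection-key2> is the HMAC key,
    the UUID is the message, and each digest is truncated to 16 bytes.
    """
    secret = (base64.b64decode(connection_key1_b64)
              + base64.b64decode(connection_key2_b64))
    c2s = hmac.new(secret, CLIENT_TO_SERVER_UUID, hashlib.sha1).digest()[:16]
    s2c = hmac.new(secret, SERVER_TO_CLIENT_UUID, hashlib.sha1).digest()[:16]
    return c2s, s2c
```

Using a different UUID per direction gives each direction its own key
stream even though both keys come from the same pair of nonces.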

Encrypting the tunnel makes it difficult for an attacker to use the
browser’s HTTP network facilities to attack a poorly implemented
WebSocket server. Because the attacker is unable to learn the
<connection-key2> chosen by the server, the attacker will have
difficulty crafting an HTTP request that the WebSocket server will
decrypt to something sensible.

Encrypting the traffic from the server to the client makes it
difficult for the attacker to generate an HTTP request to an honest but
poorly implemented WebSocket server that causes its response to be
interpreted to its detriment by the browser. In particular, it is
unlikely that the server’s response will be treated as an HTML
document by the browser, preventing the attacker from leveraging the
WebSocket server to mount a cross-site scripting attack against the
server’s origin.

== Analysis ==

We analyze the risks of this protocol in the three scenarios of interest:

1) The attacker uses the WebSocket protocol to attack a server that
does not support the WebSocket protocol. There are two cases to
consider: the server is familiar with HTTP semantics or the server is
oblivious of HTTP:

a) The attacker will find it difficult to attack a server that
is familiar with HTTP semantics with this handshake because the HTTP
semantics of the handshake point to routing the request to a
non-existent network location. If the request somehow routes to the
attacker, the HTTP semantics then point to transporting opaque data
over the socket.

b) The attacker will find it difficult to attack an HTTP-oblivious
server with this handshake because the attacker can send only a fixed
message followed by seemingly random bytes. None of the bytes sent to
the server can be controlled directly by the attacker. It seems
unlikely that the attacker will be able to advance the non-WebSocket
server very far down its state machine.

2) The attacker uses other network facilities in the browser to attack
a WebSocket server. There are two cases to consider: the server
implements the WebSocket protocol correctly or the server implements
an imperfect version of the WebSocket protocol:

a) If the server correctly implements the WebSocket protocol, the
attacker will be unable to use the other network facilities of the
browser to complete the handshake with the server because the attacker
is unable to generate the first network message.

b) If the server implements an imperfect version of the WebSocket
protocol, the attacker will be unable to learn the value of either of
the directional keys for the tunnel. Without knowledge of these keys,
the attacker will find it difficult (i) to craft a message that
decrypts to something meaningful to the WebSocket server and (ii) to
trick the WebSocket server into responding with something meaningful
to the browser.

3) The attacker communicates with a WebSocket server, but the ensuing
traffic confuses a network intermediary. If the intermediary attempts
to route the request (e.g., because the intermediary is an HTTP
proxy), the handshake will fail because the request does not contain
any routing information for the target server. If the handshake
completes and the intermediary understands HTTP semantics (as widely
used), the intermediary will likely reason that the remainder of the
socket is an opaque TLS connection. In either case, the intermediary
is unlikely to take undesirable actions as a result of the WebSocket
connection.

== Conclusion ==

We believe this handshake is superior to the current handshake because
this handshake has a stronger argument for security. Because the
attacker cannot control any of the bytes sent by the browser, the
attacker will have difficulty mounting a cross-protocol attack using
this handshake.

That said, there is no guarantee that this handshake resists
cross-protocol attacks. These security properties are not very well
studied, making designing protocols that achieve these properties more
art than science. However, the handshake we propose has a number of
heuristic properties that suggest it might stand up to further
scrutiny.
