Detecting and Defeating TCP/IP Hypercookie Attacks
ETH Zurich
ietf@trammell.ch
Internet Architecture Board
Privacy and Security Program
Internet-Draft
The TCP/IP stack provides protocol features that can potentially be abused by
on-path attackers to inject metadata about a traffic flow into that traffic
flow in band. When this injected metadata is provided by an entity with
knowledge about the natural person associated with a traffic flow, it becomes
a grave threat to privacy, which we term a hypercookie.
This document defines a threat model for hypercookie injection and hypercookie
coercion attacks, catalogs protocol features that may be used to achieve them,
and provides guidance for defeating these attacks, with an analysis of
protocol features that are disabled by the proposed defeat mechanism.
The deployment of firewalls that detect and reject abuse of protocol features
can help, but the relative ease of injecting metadata for attackers on path,
and the trivial combination of metadata injection attacks, lead to a
recommendation to add cryptographic integrity protection to transport layer
headers to defend against injection attacks.
tl;dr: at least with respect to metadata injection in the current Internet
protocol stack, everything is ruined.
This document considers a specific threat model related to the pervasive
surveillance threat model defined in [RFC7624] and to correlation and
identification of users as defined in sections 5.2.1 and 5.2.2, respectively,
of [RFC6973]. The attacker has access to the access network(s) connecting a
user to the Internet, by collaborating with, coopting, or otherwise exercising
influence over the user’s access provider. It can see all inbound and outbound
traffic from the user via that network, and can modify inbound and outbound
packets to the user. The attacker would like to add metadata to the user’s
traffic flows in order to expose that metadata to networks the user
communicates with, where it will be passively observed, and it would like this
metadata to appear in layers 3 or 4, in order to be completely transparent to
the application. For purposes of this analysis, we presume this metadata is a
user identifier or partial user identifier. We propose a colloquial term for
this type of sub-application identification: “hypercookie”. This can be seen
as a third-party implementation of the metadata insertion pattern described in
[I-D.hardie-privsec-metadata-insertion].
The attacker is variably interested in avoiding detection of hypercookie
injection techniques, and is variably interested in metadata reliability, but
requires that the injected metadata not interfere with normal protocol
operation, even if the exposed metadata is not used by any far endpoint.
The hypercookie injection attack is related to another, largely equivalent
attack, hypercookie coercion. In this attack, the attacker requires the client
endpoint to expose the hypercookie itself, and uses in-band verification
techniques to determine whether the hypercookie was correctly applied,
blocking traffic which does not carry it.
This document is concerned only with identification through hypercookie
injection at the transport and network layers, as this is possible even when
the application layer is encrypted using TLS or other encryption schemes that
operate above the transport layer. Application layer hypercookie injection is
out of scope, as are identification methods using traffic fingerprinting. It
is also concerned only with TCP as defined, not as implemented and deployed;
exploitation of other behaviors in implemented TCP stacks (e.g. as outlined in
may also be used for hypercookie exposure, albeit with
further risk of connection disruption.
Further, out-of-band identification methods, e.g. linking a flow’s five- or
six-tuple with an identifier and using some other protocol to export this
linkage, are also not considered, as they are practically impossible for users
and far endpoints to detect and defeat.
The metadata injection techniques presented in this document are EMPHATICALLY
NOT RECOMMENDED for use on the Internet; this document is intended to educate
members of the Internet engineering community about the potential for abuse in
TCP as defined and deployed.
As used in this document:
“Stateless TCP firewall” refers to a middlebox that selectively
drops single malformed TCP packets. A stateless TCP firewall can defeat TCP
metadata injection techniques which rely on noncompliant formation of
single TCP packets.
“Stateful TCP firewall” refers to a middlebox that selectively drops TCP
packets not conforming to the protocol by modeling the TCP state machine on
both endpoints. A stateful TCP firewall can defeat TCP metadata injection
techniques which rely on noncompliant formation of TCP packets and/or
flows.
“Split TCP proxy” refers to a middlebox which terminates a TCP connection
on its Internet-facing side and opens a separate TCP connection on the
other side. Split TCP proxies defeat most of the TCP-specific metadata
injection techniques in this document.
The metadata injection techniques described in share some general
properties: each places data into bits in the IP or TCP header, injection of
which is insignificant to the connectivity or performance of the connection
between the endpoints. To some extent, this is a consequence of cleartext
headers in IP and TCP and of Postel’s maxim [RFC1122]. Being liberal in what
one accepts leaves space between what the sender SHOULD/MUST send and what the
receiver will silently ignore, and these techniques exploit that space.
Changing transport stacks to fail fast and hard on the receiver side, as
recommended in [I-D.thomson-postel-was-wrong], would reduce this space, but
at the possible risk of connectivity instability during the transition.
TCP HICCUPS proposes a method for cooperative discovery and
mitigation of middlebox manipulation. It uses many of the bits in the header
that could also be used for metadata injection, and as such provides a
concrete implementation of fail fast and hard, mitigating the TCP attacks
described in this document.
The deployment of middleboxes to drop malformed packets or zero fields that
may be used in hypercookie attacks may help to reduce the rate of success and
therefore the incentive to perform hypercookie injection. However, this must
be balanced against the cost of additional management complexity and the risk
of further ossification of the Internet protocol stack through even more
widespread deployment of transport-aware, stateful, packet-modifying
middleboxes.
The best defense comes from evolving the stack: widespread deployment of
transport protocol proposals that encrypt most or all of the transport layer
headers such as QUIC, or proposals to enable generalized transport layer
encapsulation and encryption such as PLUS, would effectively mitigate the TCP
attacks described in this document.
This section describes metadata injection techniques against the TCP/IP stack,
separated by whether they abuse the IPv4, IPv6, or TCP protocols.
Four attacks abuse the IPv6 header: three by injecting information into IPv6
source addresses, one abusing the IPv6 flow label.
[RFC4291] section 2.5.1 required IPv6 interface identifiers for Stateless
Address Autoconfiguration (SLAAC) to be constructed using modified EUI-64
format. This leaks the hardware address of a user’s terminal to the receiver
and all devices along the path. Such addresses are also easily recognized,
given the presence of the bytes 0xff and 0xfe at byte offsets 11 and 12
(counting from zero) of the address. Though [RFC7136] deprecates the
significance of the IPv6 interface identifier and [RFC4941] specifies a
standard method for assigning privacy addresses when using SLAAC, EUI-64
based addresses may still be in use on the Internet and as such can be
passively used as identifying information along the path.
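The recognition step described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name eui64_mac is ours, and the check assumes that any address with 0xff and 0xfe at byte offsets 11 and 12 was formed from a hardware address, which will occasionally misfire on a randomly generated interface identifier.

```python
import ipaddress


def eui64_mac(addr: str):
    """Return the embedded hardware address if addr looks like a modified
    EUI-64 SLAAC address, otherwise None."""
    packed = ipaddress.IPv6Address(addr).packed
    # The 0xff, 0xfe filler sits at byte offsets 11 and 12 (counting from 0).
    if packed[11] != 0xFF or packed[12] != 0xFE:
        return None
    iid = packed[8:]
    # Modified EUI-64 inverts the universal/local bit (0x02) of the first
    # octet, so invert it again to recover the original hardware address.
    mac = bytes([iid[0] ^ 0x02]) + iid[1:3] + iid[5:8]
    return ":".join(f"{b:02x}" for b in mac)


if __name__ == "__main__":
    print(eui64_mac("2001:db8::211:22ff:fe33:4455"))
```

A passive observer needs nothing more than the source address of a single packet to run this check, which is why the mitigation has to happen at the end host.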
When present, this technique provides 47 bits of identifying information on a
per-node basis, present on each packet from the node. Access network
providers cannot force the use of EUI-64 addressing; however, see the
DHCPv6-assigned address technique below for a related technique.
The mitigation is to disable EUI-64 based SLAAC at end hosts, replacing it
with privacy addressing and/or DHCPv6 [RFC3315]. This is current
recommended practice in any event. Both of these mitigations come with limited
additional overhead and/or network management complexity.
An attacker which runs or can influence the configuration of a DHCPv6 server
from which a node gets its address can assign a source address to that node,
the interface identifier part of which can contain identifying information.
When successful, this technique provides approximately 64 bits of identifying
information on a per-node basis, present on each packet from the node. Access
network providers can influence the use of DHCPv6 addresses, depending on
access network architecture.
The mitigation is to disable DHCPv6. In situations when a user cannot
practically do so without losing connectivity, this technique can be
identified in some cases through an analysis of the addresses assigned to
node(s) belonging to a user and determination of the persistence of the
linkage between an address or addresses and a user.
An attacker which cannot influence the configuration of a DHCPv6 server can
use network address translation to rewrite the interface identifier part of an
address to contain identifying information.
When successful, this technique provides approximately 64 bits of identifying
information on a per-node basis, present on each packet from the node.
No user-initiated mitigation is possible with the present stack. This
technique can be detected by connecting to a remote host via IPv6, which can
then analyze the addresses assigned to node(s) belonging to a user and
determine the persistence of the linkage between an address or
addresses and a user.
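The persistence analysis mentioned for both address-based techniques could look roughly like the following. This is a sketch under our own assumptions: observations are (day number, address) pairs collected by the user or a cooperating remote host, the function name and the seven-day threshold are hypothetical, and a long-lived interface identifier is only a hint, since stable identifiers can also arise from benign configuration.

```python
import ipaddress
from collections import defaultdict


def persistent_interface_ids(observations, min_span_days=7):
    """Map each interface identifier (hex) seen over at least min_span_days
    to the number of days between its first and last observation.

    observations: iterable of (day_number, ipv6_address_string) pairs.
    """
    days_seen = defaultdict(set)
    for day, addr in observations:
        # The interface identifier is the low 64 bits of the address.
        iid = ipaddress.IPv6Address(addr).packed[8:]
        days_seen[iid].add(day)
    result = {}
    for iid, days in days_seen.items():
        span = max(days) - min(days)
        if span >= min_span_days:
            result[iid.hex()] = span
    return result
```

An identifier that follows a user across prefixes for longer than the expected address lifetime is a candidate for closer inspection.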
[RFC6437] defines the IPv6 flow label, a 20-bit field in every IPv6 packet.
It is intended to replace source and destination port in equal-cost
multipath routing (ECMP) and other load distribution schemes. However, the
flow label can be freely rewritten by middleboxes on path.
This technique provides up to 20 bits of identifying information per packet,
with the caveat that applying different flow labels to different packets
within a flow may impair transport layer performance due to reordering.
No user-initiated mitigation is possible with the present stack. Header
modification detection as in TCP HICCUPS, and/or the deployment of middleboxes
that monitor and/or zero the flow label may provide detection and mitigation.
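To make the field concrete, the following sketch reads and zeroes the flow label in a raw IPv6 header. It assumes the caller already has the 40-byte fixed header as bytes; the function names are ours. IPv6 has no header checksum and the flow label is not part of the upper-layer pseudo-header, so zeroing it requires no checksum update.

```python
import struct

FLOW_LABEL_MASK = 0x000FFFFF  # low 20 bits of the first 32-bit word


def flow_label(ipv6_header: bytes) -> int:
    """Return the 20-bit flow label from a raw IPv6 header."""
    (first_word,) = struct.unpack("!I", ipv6_header[:4])
    return first_word & FLOW_LABEL_MASK


def zero_flow_label(ipv6_header: bytes) -> bytes:
    """Return a copy of the header with the flow label cleared, leaving
    the version and traffic class bits untouched."""
    (first_word,) = struct.unpack("!I", ipv6_header[:4])
    cleared = first_word & ~FLOW_LABEL_MASK & 0xFFFFFFFF
    return struct.pack("!I", cleared) + ipv6_header[4:]
```

A monitoring middlebox could log flow_label() per flow to spot labels that correlate with a subscriber rather than with a flow.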
One attack injects information into the IPv4 Identification (fragment ID)
header field.
[RFC0791] defines the Identification field in the IPv4 header, which is used
for fragmentation and fragment reassembly. While the field is only defined
when a packet is fragmented, middleboxes can freely fill identifying
information into this field. [RFC6864] section 4.1 states that the value
MUST be ignored by middleboxes, so it will tend to be preserved along the
path assuming compliant devices.
This technique provides up to 16 bits of identifying information per packet,
with a caveat that it may be difficult to implement on networks with large
amounts of fragmented IPv4 traffic.
There is no user-initiated mitigation possible with deployed IPv4 stacks.
Header modification detection as in TCP HICCUPS may provide detection and
mitigation.
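As an illustration of the field-zeroing approach discussed earlier in this document, the sketch below clears the Identification field of unfragmented ("atomic") IPv4 datagrams and recomputes the header checksum. It uses the atomic datagram test from the updated IPv4 ID specification (DF set, MF clear, fragment offset zero); the function names are ours, and a real device would also have to handle malformed or truncated headers.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum over data (zero-padded if odd)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def is_atomic(packet: bytes) -> bool:
    """True if DF is set, MF is clear, and the fragment offset is zero."""
    (flags_frag,) = struct.unpack("!H", packet[6:8])
    df = bool(flags_frag & 0x4000)
    mf = bool(flags_frag & 0x2000)
    return df and not mf and (flags_frag & 0x1FFF) == 0


def zero_atomic_id(packet: bytes) -> bytes:
    """Zero the ID field of an atomic datagram and fix the header checksum.
    Fragments and fragmentable datagrams are returned unchanged."""
    if not is_atomic(packet):
        return packet
    ihl = (packet[0] & 0x0F) * 4
    hdr = bytearray(packet[:ihl])
    hdr[4:6] = b"\x00\x00"    # Identification
    hdr[10:12] = b"\x00\x00"  # checksum field is zero while summing
    hdr[10:12] = struct.pack("!H", internet_checksum(bytes(hdr)))
    return bytes(hdr) + packet[ihl:]
```

Datagrams that are, or may become, fragmented are left alone because their ID value is needed for reassembly.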
A multitude of techniques exist to abuse TCP. These can be roughly classified
into per-packet injection, where metadata can be added to header bits in each
packet; and per-flow injection, where packets not part of the normal flow are
generated and ignored by the receiver. Per-flow injection techniques generally
provide much more space for metadata injection, and are sufficient for user
identification for access control and user tracking on a per-flow basis.
A middlebox can rewrite the initial sequence number (ISN) of flows it sees the
SYN packet for, in order to place identifying information therein.
This technique provides up to 32 bits of identifying information per flow,
with the caveat that it requires a stateful middlebox to translate all
sequence and acknowledgment numbers on subsequent packets on the flow. It also
does not work if there are other proxies which rewrite the ISN (e.g. for
security, to mitigate poor randomness in 1990s-era TCP stack ISN selection
algorithms) on the path between the middlebox and the Internet. The
identification provided by this technique also does not traverse split-TCP proxies.
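The statefulness caveat follows from the arithmetic: once the ISN is changed, every later sequence and acknowledgment number on the flow must be shifted by the same offset, modulo 2^32. The sketch below shows only that bookkeeping, the same kind of translation performed by ISN-randomizing proxies; the class and method names are ours.

```python
MOD = 1 << 32  # TCP sequence numbers wrap at 2**32


class SequenceTranslator:
    """Per-flow state needed by any device that rewrites an ISN."""

    def __init__(self, original_isn: int, rewritten_isn: int):
        # One fixed offset per direction, kept for the life of the flow.
        self.offset = (rewritten_isn - original_isn) % MOD

    def outbound_seq(self, seq: int) -> int:
        """Sequence number as it must appear beyond the middlebox."""
        return (seq + self.offset) % MOD

    def inbound_ack(self, ack: int) -> int:
        """Acknowledgment number as the original sender expects it."""
        return (ack - self.offset) % MOD
```

Losing this state, or having a second rewriting device on the path, breaks either the flow or the injected value, which is what limits the technique. Sequence numbers carried in options such as SACK would need the same treatment.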
Header modification detection as in TCP HICCUPS or the aggressive deployment
of split-TCP proxies can mitigate this attack. We note that the aggressive
deployment of split-TCP proxies in the Internet is an undesirable solution, as
it implies an acceleration and deepening of middlebox-related transport
protocol ossification.
A middlebox can rewrite the urgent pointer of TCP packets without the URG flag
set, in order to place identifying information therein. The urgent pointer is
only interpreted when the URG flag is set, according to section 3.1 of
; compliant implementations will therefore ignore the urgent
pointer when used in this manner.
This technique provides up to 16 bits of identifying information per packet.
Information exposed using this technique may not traverse TCP firewalls or
split TCP proxies. The aggressive deployment of stateless TCP firewalls that
zero the urgent pointer on all packets with the URG flag not set can mitigate
this attack, at the cost of increased operational complexity and further
middlebox-related transport protocol ossification.
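A stateless firewall rule of this kind might be sketched as follows. The function operates on a raw TCP header (at least 20 bytes), clears the urgent pointer when URG is not set, and patches the checksum incrementally using the standard ones' complement update HC' = ~(~HC + ~m + m') with m' = 0, so the payload and pseudo-header need not be re-read. The function name is ours.

```python
import struct

URG = 0x20  # URG bit in the TCP flags octet (byte 13 of the header)


def zero_stray_urgent_pointer(tcp: bytes) -> bytes:
    """Clear a nonzero urgent pointer on a segment without URG set and
    update the TCP checksum incrementally."""
    flags = tcp[13]
    checksum, urgent = struct.unpack("!HH", tcp[16:20])
    if flags & URG or urgent == 0:
        return tcp  # legitimate urgent data, or nothing to clean
    # Incremental update: remove the old field value from the checksum.
    total = (~checksum & 0xFFFF) + (~urgent & 0xFFFF)
    total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    new_checksum = ~total & 0xFFFF
    return tcp[:16] + struct.pack("!HH", new_checksum, 0) + tcp[20:]
```

Because the rule looks at one packet at a time, it needs no per-flow state, which is what makes it a stateless firewall function in the terminology above.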
A middlebox can piggyback an experimental TCP option onto a TCP packet with
enough headroom, and place identifying information in that option. This option
could even be given an IANA identifier using the ExId mechanism [RFC6994],
registered with IANA on a First-Come, First-Served basis, with an
innocuous name, in order to deflect suspicion about its use.
Assuming four bytes of option overhead (kind, length, and a 16-bit ExId), sufficient headroom between the segment size and the
path MTU, and no other TCP options on a packet, this technique can provide up
to 288 bits of identifying information per packet given limitations on TCP
options size. We note that this is an upper bound, and that the transparency
of Internet paths to unknown and experimental TCP options is not perfect,
which reduces the applicability of this technique somewhat.
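The upper bound can be checked with simple arithmetic. The sketch below assumes a 60-byte maximum TCP header (4-bit data offset counted in 32-bit words), a 20-byte base header, and an experimental option consisting of a kind octet, a length octet, and the ExId; the function name is ours. Under these assumptions the 288-bit figure corresponds to a 16-bit ExId, while a 32-bit ExId leaves 272 bits.

```python
MAX_TCP_HEADER = 60   # 15 words of 4 bytes, the largest data offset
BASE_TCP_HEADER = 20  # fixed header without options


def exid_option_capacity_bits(exid_bytes: int) -> int:
    """Bits left for data in a single experimental option that fills the
    whole option space, given the size of the ExId in bytes."""
    option_space = MAX_TCP_HEADER - BASE_TCP_HEADER  # 40 bytes
    overhead = 2 + exid_bytes                        # kind + length + ExId
    return (option_space - overhead) * 8


if __name__ == "__main__":
    print(exid_option_capacity_bits(2), exid_option_capacity_bits(4))
```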
Information exposed using this technique may not traverse TCP firewalls or
split TCP proxies. The aggressive deployment of stateless TCP firewalls that
strip experimental options not in use on a given network can mitigate this
attack. We note that some deployed TCP Fast Open [RFC7413] implementations
use an experimental option, and would be affected by this mitigation. This
mitigation also incurs the cost of increased operational complexity and
further middlebox-related transport ossification.
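Before stripping anything, a firewall has to find the experimental options. The sketch below walks the options area of a raw TCP header and reports options using the two experimental option kinds, 253 and 254. It is deliberately conservative: parsing stops at the first malformed option. The function names are ours, and deciding which ExIds are "in use on a given network" is left to local policy.

```python
EXPERIMENTAL_KINDS = {253, 254}


def parse_tcp_options(options: bytes):
    """Yield (kind, data) for each option; stop at End-of-Options or at
    the first malformed option."""
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == 0:      # End of Option List
            break
        if kind == 1:      # No-Operation, a single octet
            i += 1
            continue
        if i + 1 >= len(options):
            break          # truncated: no length octet
        length = options[i + 1]
        if length < 2 or i + length > len(options):
            break          # malformed length
        yield kind, options[i + 2:i + length]
        i += length


def experimental_options(tcp_header: bytes):
    """Return the experimental options found in a raw TCP header."""
    header_len = (tcp_header[12] >> 4) * 4
    return [(kind, data)
            for kind, data in parse_tcp_options(tcp_header[20:header_len])
            if kind in EXPERIMENTAL_KINDS]
```

When the ExId mechanism is used, the first two or four bytes of the returned data carry the ExId, which a policy could compare against an allow list.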
As with the experimental TCP option attack above, a middlebox could simply
generate a suitable bare ACK packet within a flow, but not initiated by the
sender, and place information in an experimental TCP option. The bare ACK
would be processed by the receiver and the option ignored.
This technique can provide up to 288 bits of identifying information per flow
given limitations on TCP options size. Note that multiple bare ACKs can be
used to extend the amount of information injected per flow.
Mitigations and caveats thereon are as for the experimental TCP option attack above.
A middlebox that keeps state for each TCP connection traversing it can place
out-of-window segments sharing a given 5-tuple but not initiated by the sender
on the wire. These segments should traverse any device not looking at TCP
state, and be ignored by the receiver.
This technique can provide over 11000 bits of identifying information per flow
given a 1500 byte MTU. Note that multiple out of window segments can be used
to extend the amount of information injected per flow.
Information exposed using this technique may not traverse stateful TCP
firewalls or split TCP proxies. Existing stateful TCP firewalls already
provide out-of-window segment dropping, due to their usefulness in TCP session
hijacking attacks (see for more). The aggressive
deployment of stateful TCP firewalls that drop and warn on out-of-window
segments can mitigate this attack. This mitigation incurs the cost of
increased operational complexity and further middlebox-related transport
ossification.
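The core of such a check is the standard TCP segment acceptability test applied to the firewall's model of the receiver. The sketch below assumes the firewall already tracks the receiver's next expected sequence number and window (rcv_nxt and rcv_wnd, which it must infer from observed traffic); the function names are ours.

```python
MOD = 1 << 32  # TCP sequence space


def _in_window(seq: int, rcv_nxt: int, rcv_wnd: int) -> bool:
    """True if rcv_nxt <= seq < rcv_nxt + rcv_wnd in modular arithmetic."""
    return (seq - rcv_nxt) % MOD < rcv_wnd


def segment_acceptable(seg_seq: int, seg_len: int,
                       rcv_nxt: int, rcv_wnd: int) -> bool:
    """Segment acceptability test: a segment is acceptable if its first
    or last octet falls inside the receive window."""
    if seg_len == 0:
        if rcv_wnd == 0:
            return seg_seq == rcv_nxt
        return _in_window(seg_seq, rcv_nxt, rcv_wnd)
    if rcv_wnd == 0:
        return False
    last = (seg_seq + seg_len - 1) % MOD
    return (_in_window(seg_seq, rcv_nxt, rcv_wnd)
            or _in_window(last, rcv_nxt, rcv_wnd))
```

A firewall would drop, and ideally log, any segment for which this returns False. The accuracy of the decision depends on how well the firewall's state tracks the real endpoints.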
Similar to the out-of-window segment technique above, a middlebox can place segments with bad checksums
sharing a given 5-tuple on the wire. These segments should traverse any device
not looking at TCP state, and be ignored by the receiver.
Per-flow information and mitigations along with caveats are as for out-of-window segments above.
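Detecting such segments requires only checksum verification, which a firewall can do per packet. The sketch below verifies the TCP checksum of a segment carried over IPv4, including the pseudo-header; the function names are ours, and IPv6 would need a different pseudo-header.

```python
import socket
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum over data (zero-padded if odd)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def tcp_checksum_ok(src_ip: str, dst_ip: str, segment: bytes) -> bool:
    """True if the TCP checksum in segment is valid for the given IPv4
    source and destination addresses."""
    pseudo = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
              + struct.pack("!BBH", 0, 6, len(segment)))  # zero, proto, length
    # Summing a correct segment, checksum field included, yields zero.
    return internet_checksum(pseudo + segment) == 0
```

Segments that fail this test are discarded by the receiver anyway, so dropping them at a firewall does not change what a compliant endpoint sees.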
Note that multiple techniques above may be combined on any given packet or
over the sequence of packets in any given flow in order to increase the number
of bits available and/or increase the resilience of the injected information
to mitigation.
An analysis of the hypercookie attacks listed in this document, and the
ability to combine them freely to improve hypercookie resilience and capacity,
leads to a relatively bleak outlook. Mitigating the threat at scale with the
stack as presently deployed requires impractically aggressive, altruistic
deployment of TCP-modifying firewalls.
We therefore conclude that the most practical mitigation of this threat is the
development and deployment of transport protocols that provide cryptographic
integrity protection and/or confidentiality for their headers, in order to
prevent hypercookie injection at the transport layer.
Note that these mitigations can only detect, but not prevent, hypercookie
coercion attacks: if an attacker can successfully block a client’s access to
the Internet to enforce hypercookie coercion, removal of metadata will not
restore that access, as the attack is carried out through nontechnical
relationships between the attacker and the target. We can only hope that
raising awareness and bringing transparency to the potential for hypercookie
coercion attacks makes them less likely to be successful.
This document has no actions for IANA [EDITOR’S NOTE: please remove this
section at publication.]
This document outlines vulnerabilities in the TCP/IP protocol stack as
deployed to a type of attack described in . Exploitation of
these vulnerabilities can be used to expose identifying information about
users of a network to third parties; the document discusses general and specific techniques to mitigate the impact of these exploits.
This work is supported by the European Commission under Horizon 2020 grant
agreement no. 688421 Measurement and Architecture for a Middleboxed Internet
(MAMI), and by the Swiss State Secretariat for Education, Research, and
Innovation under contract no. 15.0268. This support does not imply
endorsement.
Thanks to Ted Hardie, Joe Hildebrand, Mirja Kuehlewind, and the participants
at the PLUS BoF at IETF 96 in Berlin for the conversations leading to and
informing the publication of this document.
Middleboxes: Taxonomy and Issues
This document is intended as part of an IETF discussion about "middleboxes" - defined as any intermediary box performing functions apart from normal, standard functions of an IP router on the data path between a source host and destination host. This document establishes a catalogue or taxonomy of middleboxes, cites previous and current IETF work concerning middleboxes, and attempts to identify some preliminary conclusions. It does not, however, claim to be definitive. This memo provides information for the Internet community.
Privacy Considerations for Internet Protocols
This document offers guidance for developing privacy considerations for inclusion in protocol specifications. It aims to make designers, implementers, and users of Internet protocols aware of privacy-related design choices. It suggests that whether any individual RFC warrants a specific privacy considerations section will depend on the document's content.
Confidentiality in the Face of Pervasive Surveillance: A Threat Model and Problem Statement
Since the initial revelations of pervasive surveillance in 2013, several classes of attacks on Internet communications have been discovered. In this document, we develop a threat model that describes these attacks on Internet confidentiality. We assume an attacker that is interested in undetected, indiscriminate eavesdropping. The threat model is based on published, verified attacks.
Internet Protocol
Requirements for Internet Hosts - Communication Layers
This RFC is an official specification for the Internet community. It incorporates by reference, amends, corrects, and supplements the primary protocol standards documents relating to hosts. [STANDARDS-TRACK]
Dynamic Host Configuration Protocol for IPv6 (DHCPv6)
IP Version 6 Addressing Architecture
This specification defines the addressing architecture of the IP Version 6 (IPv6) protocol. The document includes the IPv6 addressing model, text representations of IPv6 addresses, definition of IPv6 unicast addresses, anycast addresses, and multicast addresses, and an IPv6 node's required addresses. This document obsoletes RFC 3513, "IP Version 6 Addressing Architecture". [STANDARDS-TRACK]
Privacy Extensions for Stateless Address Autoconfiguration in IPv6
Nodes use IPv6 stateless address autoconfiguration to generate addresses using a combination of locally available information and information advertised by routers. Addresses are formed by combining network prefixes with an interface identifier. On an interface that contains an embedded IEEE Identifier, the interface identifier is typically derived from it. On other interface types, the interface identifier is generated through other means, for example, via random number generation. This document describes an extension to IPv6 stateless address autoconfiguration for interfaces whose interface identifier is derived from an IEEE identifier. Use of the extension causes nodes to generate global scope addresses from interface identifiers that change over time, even in cases where the interface contains an embedded IEEE identifier. Changing the interface identifier (and the global scope addresses generated from it) over time makes it more difficult for eavesdroppers and other information collectors to identify when different addresses used in different transactions actually correspond to the same node. [STANDARDS-TRACK]
Guidelines for Writing an IANA Considerations Section in RFCs
Many protocols make use of identifiers consisting of constants and other well-known values. Even after a protocol has been defined and deployment has begun, new values may need to be assigned (e.g., for a new option type in DHCP, or a new encryption or authentication transform for IPsec). To ensure that such quantities have consistent values and interpretations across all implementations, their assignment must be administered by a central authority. For IETF protocols, that role is provided by the Internet Assigned Numbers Authority (IANA). In order for IANA to manage a given namespace prudently, it needs guidelines describing the conditions under which new values can be assigned or when modifications to existing values can be made. If IANA is expected to play a role in the management of a namespace, IANA must be given clear and concise instructions describing that role. This document discusses issues that should be considered in formulating a policy for assigning values to a namespace and provides guidelines for authors on the specific text that must be included in documents that place demands on IANA. This document obsoletes RFC 2434. This document specifies an Internet Best Current Practices for the Internet Community, and requests discussion and suggestions for improvements.
On the Implementation of the TCP Urgent Mechanism
This document analyzes how current TCP implementations process TCP urgent indications and how the behavior of some widely deployed middleboxes affects how end systems process urgent indications. This document updates the relevant specifications such that they accommodate current practice in processing TCP urgent indications, raises awareness about the reliability of TCP urgent indications in the Internet, and recommends against the use of urgent indications (but provides advice to applications that do). [STANDARDS-TRACK]
IPv6 Flow Label Specification
This document specifies the IPv6 Flow Label field and the minimum requirements for IPv6 nodes labeling flows, IPv6 nodes forwarding labeled packets, and flow state establishment methods. Even when mentioned as examples of possible uses of the flow labeling, more detailed requirements for specific use cases are out of the scope for this document. The usage of the Flow Label field enables efficient IPv6 flow classification based only on IPv6 main header fields in fixed positions. [STANDARDS-TRACK]
Updated Specification of the IPv4 ID Field
The IPv4 Identification (ID) field enables fragmentation and reassembly and, as currently specified, is required to be unique within the maximum lifetime for all datagrams with a given source address/destination address/protocol tuple. If enforced, this uniqueness requirement would limit all connections to 6.4 Mbps for typical datagram sizes. Because individual connections commonly exceed this speed, it is clear that existing systems violate the current specification. This document updates the specification of the IPv4 ID field in RFCs 791, 1122, and 2003 to more closely reflect current practice and to more closely match IPv6 so that the field's value is defined only when a datagram is actually fragmented. It also discusses the impact of these changes on how datagrams are used. [STANDARDS-TRACK]
Shared Use of Experimental TCP Options
This document describes how the experimental TCP option codepoints can concurrently support multiple TCP extensions, even within the same connection, using a new IANA TCP experiment identifier. This approach is robust to experiments that are not registered and to those that do not use this sharing mechanism. It is recommended for all new TCP options that use these codepoints.
TCP Fast Open
This document describes an experimental TCP mechanism called TCP Fast Open (TFO). TFO allows data to be carried in the SYN and SYN-ACK packets and consumed by the receiving end during the initial connection handshake, and saves up to one full round-trip time (RTT) compared to the standard TCP, which requires a three-way handshake (3WHS) to complete before data can be exchanged. However, TFO deviates from the standard TCP semantics, since the data in the SYN could be replayed to an application in some rare circumstances. Applications should not use TFO unless they can tolerate this issue, as detailed in the Applicability section.
Significance of IPv6 Interface Identifiers
The IPv6 addressing architecture includes a unicast interface identifier that is used in the creation of many IPv6 addresses. Interface identifiers are formed by a variety of methods. This document clarifies that the bits in an interface identifier have no meaning and that the entire identifier should be treated as an opaque value. In particular, RFC 4291 defines a method by which the Universal and Group bits of an IEEE link-layer address are mapped into an IPv6 unicast interface identifier. This document clarifies that those two bits are significant only in the process of deriving interface identifiers from an IEEE link-layer address, and it updates RFC 4291 accordingly.
Design considerations for Metadata Insertion
The IAB has published [RFC7624] in response to several revelations of pervasive attack on Internet communications. In this document we consider the implications of protocol designs which associate metadata with encrypted flows. In particular, we assert that designs which do so by explicit actions of the end system are preferable to designs in which middleboxes insert them.
The Harmful Consequences of Postel's Maxim
Jon Postel's famous statement in RFC 1122 of "Be liberal in what you accept, and conservative in what you send" - is a principle that has long guided the design of Internet protocols and implementations of those protocols. The posture this statement advocates might promote interoperability in the short term, but that short term advantage is outweighed by negative consequences that affect the long term maintenance of a protocol and its ecosystem.
Resilience of Deployed TCP to Blind Attacks
A Middlebox-Cooperative TCP for a non End-to-End Internet