Network Working Group                       Jeffrey Mogul, Compaq WRL
Internet-Draft                                     Fred Douglis, AT&T
Expires: 25 February 2001                Daniel Hellerstein, ERS/USDA
                                                       24 August 2000

                  HTTP Delta Clusters and Templates

                   draft-mogul-http-dcluster-00.txt

STATUS OF THIS MEMO

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

Distribution of this document is unlimited. Please send comments to the authors.

ABSTRACT

HTTP "Delta encoding," the transmission of a compact encoding of the change between instances of a Web resource instead of retransmitting the entire new value, has been shown to be of potential value. Research has shown additional benefits if deltas can be computed between instances of different resources. This document describes a compatible extension to HTTP delta encoding to support "clustering", where multiple resources (URLs) are treated as a pool, and the use of "templates", where a large set of resource instances are most naturally described as deltas from a chosen template resource.

Mogul et al.
[Page 1] Internet-Draft Delta clustering 24 August 2000 16:15

TABLE OF CONTENTS

1 Introduction
    1.1 Related research and proposals
2 Terminology
3 Delta-encoding and clustering
4 Use of templates
5 Specification
    5.1 Modified basic requirements for delta-encoded responses
    5.2 Modified header specifications
        5.2.1 A-IM
    5.3 New header specifications
        5.3.1 DCluster
        5.3.2 DTemplate
    5.4 Rules for determining base instances in a uniqueness scope
6 Security Considerations
    6.1 Spoofing attacks using the DCluster header
    6.2 Privacy attacks using the DCluster header
    6.3 Data leakage attacks using the DCluster header
7 History
    7.1 draft-mogul-http-dcluster-00.txt
8 Acknowledgements
9 References
10 Authors' addresses

1 Introduction

WARNING: THIS SPECIFICATION WILL CHANGE. DO NOT DEPLOY ANY IMPLEMENTATIONS BASED ON THIS SPECIFICATION.

The World Wide Web is a distributed system, and so often benefits from caching to reduce retrieval delays. Retrieval of a Web resource (such as a document, image, icon, or applet) over the Internet or other wide-area network usually takes enough time that the delay is over the human threshold of perception. Often, that delay is measured in seconds. Caching can often eliminate or significantly reduce retrieval delays.

Many Web resources change over time, so a practical caching approach must include a coherency mechanism, to avoid presenting stale information to the user. Originally, the Hypertext Transfer Protocol (HTTP) provided little support for caching, but under operational pressures, it quickly evolved to support a simple mechanism for maintaining cache coherency.

In HTTP/1.0 [2], the server may supply a ``last-modified'' timestamp with a response.
If a client stores this response in a cache entry, and then later wishes to re-use the response, it may transmit a request message with an ``If-modified-since'' field containing that timestamp; this is known as a conditional retrieval. Upon receiving a conditional request, the server may either reply with a full response, or, if the resource has not changed, it may send an abbreviated reply, indicating that the client's cache entry is still valid.

HTTP/1.0 also includes a means for the server to indicate, via an ``Expires'' timestamp, that a response will be valid until that time; if so, a client may use a cached copy of the response until that time, without first validating it using a conditional retrieval.

HTTP/1.1 [6] adds many new features to improve cache coherency and performance. However, it preserves the all-or-none model for responses to conditional retrievals: either the server indicates that the resource value has not changed at all, or it must transmit the entire current value.

Common sense suggests (and traces confirm), however, that even when a Web resource does change, the new instance is often substantially similar to the old one. If the difference, or ``delta'', between the two instances could be sent to the client instead of the entire new instance, a client holding a cached copy of the old instance could apply the delta to construct the new version. In a world of finite bandwidth, the reduction in response size and delay could be significant.

One can think of deltas as a way to squeeze as much benefit as possible from client and proxy caches. Rather than treating an entire response as the ``cache line,'' with deltas we can treat arbitrary pieces of a cached response as the replaceable unit, and avoid transferring pieces that have not changed.
A separate document [8] specifies a set of compatible extensions to HTTP/1.1 that allow clients and servers to use delta encoding with minimal overhead. That mechanism only supports deltas between instances of a single resource.

This document specifies further extensions to the delta encoding mechanism. These extensions allow deltas to be computed between instances of different resources. This increases the likelihood that a compact delta might be found to encode the current instance of a requested resource.

We assume that the reader is familiar with the HTTP/1.1 specification, and with the delta encoding specification.

1.1 Related research and proposals

The WebExpress project [7] appears to be the first published description of an implementation of delta encoding for HTTP (which they call ``differencing''). WebExpress is aimed specifically at wireless environments, and includes a number of orthogonal optimizations. Also, the WebExpress design does not propose changing the HTTP protocol itself, but rather uses a pair of interposed proxies to convert the HTTP message stream into an optimized form. The results reported for WebExpress differencing are impressive, but are limited to a few selected benchmarks.

The WebExpress paper also pointed out that the individual responses to different queries with the same ``URL prefix'' (that is, the prefix of the URL before the ``?'' character) are often similar enough to make delta encoding effective. Since users frequently make numerous different queries using the same URL prefix, it might be much more effective to compute deltas between different queries for a given URL prefix, rather than simply between different queries using an identical URL. Banga et al. [1] make a similar observation. A 1997 trace-based study [9] showed that this approach has significant potential for reducing bandwidth requirements.
The "clustering" mechanism described in this specification is intended to support the use of delta encoding in contexts where the delta is computed between two different URLs.

The WebExpress project [7] adopted the concept of a designated ``base object'', rather than simply relying on a prefix-matching mechanism. WebExpress included a mechanism for ``rebasing'' a client (providing it with a new base object). The "templates" mechanism described in this specification supports a very similar approach.

The approaches described above, and in this specification, operate independently of the syntax and semantics of the data being transferred (although delta encoding algorithms for images may require some specialization). They function by decomposing responses at the bit or byte level into currently-cached and need-to-be-transferred components.

One can also do this decomposition at a higher level. Douglis et al. [5] describe an "HTML macro" mechanism, in which a set of similar HTML pages is decomposed into a constant component (akin to a macro body) and a variable component (akin to macro arguments). In many cases, the variable component can be quite small; this means once the constant component is in a cache, references to similar pages require fetching only the small variable component, at a significant cost savings over transferring a monolithic response.

The main drawback to the HTML macro approach is that it requires direct involvement by the designer (or software) when generating the Web pages, including some careful attention to the decomposition of a set of similar pages. It might also require some additional language-level standardization, although this perhaps could be obviated through the use of Java-based macros. Therefore, support for HTML macros is beyond the scope of this specification.
2 Terminology

HTTP/1.1 [6] defines the following terms:

resource
    A network data object or service that can be identified by a URI, as defined in section 3.2. Resources may be available in multiple representations (e.g. multiple languages, data formats, size, resolutions) or vary in other ways.

entity
    The information transferred as the payload of a request or response. An entity consists of metainformation in the form of entity-header fields and content in the form of an entity-body, as described in section 7.

variant
    A resource may have one, or more than one, representation(s) associated with it at any given instant. Each of these representations is termed a `variant.' Use of the term `variant' does not necessarily imply that the resource is subject to content negotiation.

The specification for delta encoding [8] defined these additional terms:

instance
    The entity that would be returned in a status-200 response to a GET request, at the current time, for the selected variant of the specified resource, with the application of zero or more content-codings, but without the application of any instance manipulations or transfer-codings.

instance manipulation
    An operation on one or more instances which may result in an instance being conveyed from server to client in parts, or in more than one response message. For example, a range selection or a delta encoding. Instance manipulations are end-to-end, and often involve the use of a cache at the client.

See that specification for further discussion of those terms.

For the extensions specified in this document, we define one more term:

uniqueness scope
    The uniqueness scope of an entity tag is the set of resources across which this entity tag is unique for all time. That is, within this set of resources, if two instances share an entity tag, then the values of these instances (including their instance bodies and their instance headers) are equal.
In unmodified HTTP/1.1, the uniqueness scope of an entity tag is always a single resource. In this proposal, we provide a means to extend the uniqueness scope to include multiple resources.

3 Delta-encoding and clustering

The basic delta-encoding model assumes that deltas are computed between two instances of a specific resource; i.e., both instances are associated with a single URL. However, the WebExpress project [7] suggested that by treating a query URL (that is, a URL with an embedded ``?'') as a prefix followed by a set of parameters, one could then profitably compute deltas between resource values whose URLs have identical prefixes, but perhaps different parameters (suffixes). Our trace-based study confirmed this [10]. We believe that this might be generalized to certain other patterns of URLs (i.e., not just those using ``?'' as a separator). We use the term ``clustering'' for this approach.

For example, if a client has cached a response for a DEC stock quote (``http://quote.yahoo.com/q?s=DEC&d=f''), and then requests a quote for AT&T from the same server (``http://quote.yahoo.com/q?s=T&d=f''), the prefix for the cluster would be ``http://quote.yahoo.com/q?''.

In order to support clustering, we need a mechanism for the server to indicate to the client which URLs are eligible for clustering (since it would be highly inefficient for the client to send the entity tags of every resource in its cache on every request).

We propose a new, optional response header for this purpose, to specify a URL-prefix for other resources that ``cluster'' with the given response. The header name is ``DCluster''.

Once a cluster-eligible response is cached, when the client is about to make a subsequent request, it would match the request-URI against all of the URL-prefixes in its cache. (As specified in section 5.3.1, only cache entries received after the matching DCluster header are eligible.)
The ``If-None-Match'' field in its request could then list the entity tags for all of the matching entries. In some cases, it might be more efficient to list only a subset (such as the most recently received cache entries), to avoid excessive request header lengths.

For example, if a client makes this initial request:

    GET /foo?p=1 HTTP/1.1
    Host: bar.example.net

and receives this response:

    HTTP/1.1 200 OK
    Date: Sun, 06 Nov 1994 08:49:37 GMT
    Etag: "abc"
    DCluster: "//bar.example.net/foo?"

then when the client later makes a request for ``http://bar.example.net/foo?p=2'', it can match the stored cluster prefix in its cache, and generate this request:

    GET /foo?p=2 HTTP/1.1
    Host: bar.example.net
    If-None-Match: "abc"
    A-IM: vcdiff

As a generalization, the DCluster header field may include multiple URL-prefixes, to allow specification of a set of URIs that do not share a single common prefix.

In order to use this approach to clustering, we need to impose one important constraint. HTTP/1.1 requires so-called ``strong'' entity tags to be unique for a given URI, but does not impose any broader requirements on the uniqueness of entity tags. However, if a server sends a ``DCluster'' header, this implies that the entity tag in the response is unique not only for the Request-URI, but also for all URIs for which the string given by ``DCluster'' is a prefix. We call this set of URIs the ``uniqueness scope'' of the entity tag.

Note that a response might carry multiple ``DCluster'' header fields (or, by the basic HTTP syntax rules, one such header field with a comma-separated list of prefix strings). This means that the uniqueness scope is the union of the scopes specified by the set of prefixes, plus the original Request-URI.

Because the URI in a ``DCluster'' header field can be an absolute URI (i.e., contain a host name), a uniqueness scope can span multiple servers.
Presumably, these servers have some out-of-band means to maintain the uniqueness property.

A client making a request may have cache entries for many different resources in the uniqueness scope of the Request-URI. This is another situation where the ability of ``If-None-Match'' to carry multiple entity tags is employed. Abstractly, when the client makes a request for which it wants a delta-encoded response, it finds all of its cache entries in the same uniqueness scope, then sends the entity tags for these cache entries in an ``If-None-Match'' header.

It would not make sense to have an extremely broad uniqueness scope (i.e., one that includes large numbers of resources), because this would imply that a client that has cache entries for many of those resources would send lots of entity tags in its request for a delta. This would bloat the request message, obviating the transfer-time reduction of the delta encoding. Therefore, in actual use, the ``DCluster'' header field value should represent not the entire uniqueness scope, but a subset of the uniqueness scope that is most likely to result in small deltas.

Client implementations, however, should be prepared to prune their ``If-None-Match'' headers in case a server inadvertently (or maliciously) specifies an over-broad uniqueness scope.

Server implementations that support clustering should minimize the length of the entity tags that they generate, consistent with the other requirements for entity tags, since the effect of overlong entity tags on request-header size is potentially multiplied many times by the use of clustering.

Note that the ``DCluster'' header can be used in a potential spoofing attack. This attack, and defenses against it, are discussed in section 6.1.
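The client-side behaviour described in this section (match the request URL against stored cluster prefixes, list the matching entity tags, and prune the list) can be sketched as follows. This is a minimal illustration and not part of the specification: the `CacheEntry` type, its `prefixes` field, the function names, and the default limit of four entity tags are all assumptions made for the example. Prefixes are assumed to have been expanded to absolute form already, and each entry is assumed to carry only the prefixes it is eligible for under section 5.3.1.

```python
from dataclasses import dataclass, field


@dataclass
class CacheEntry:
    url: str          # absolute URL of the cached response
    etag: str         # entity tag exactly as received, quotes included
    received: float   # arrival time in seconds, used only for ordering
    # Absolute DCluster prefixes that apply to this entry (hypothetical field).
    prefixes: list = field(default_factory=list)


def candidate_etags(cache, request_url, limit=4):
    """Entity tags of entries that cluster with request_url, newest first."""
    matches = [
        entry for entry in cache
        if entry.url == request_url
        or any(request_url.startswith(p) for p in entry.prefixes)
    ]
    matches.sort(key=lambda entry: entry.received, reverse=True)
    # Prune to the most recent entries to bound the request header size.
    return [entry.etag for entry in matches[:limit]]


def delta_request_headers(cache, request_url, limit=4):
    """Extra request headers, or none if no usable base instance is cached."""
    etags = candidate_etags(cache, request_url, limit)
    if not etags:
        return {}
    return {"If-None-Match": ", ".join(etags), "A-IM": "vcdiff"}
```

With a cache holding the response for ``http://bar.example.net/foo?p=1'' from the example above, a request for ``http://bar.example.net/foo?p=2'' would carry the entity tag "abc", while a request outside the cluster would carry no delta-related headers.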
4 Use of templates

The model of delta encoding outlined so far requires the server to compute a delta between the current instance of the resource and some previous instance of that resource, or (if clustering is used) a previous instance of some other resource. This means that the base instance is, in effect, a moving target, since we do not want to require servers or clients to retain old instances for indefinite periods.

Douglis et al. describe an approach to dynamically-generated documents in which the document is broken down into separate static and dynamic parts [5]. The static part is a macro with unbound variables, and the dynamic part is a set of bindings between variables and specific values. In their mechanism, the client retains the static part, called a ``template'', in its cache. It repeatedly requests, as needed, a new instance of the dynamic part, and then reevaluates the template macro, with its variables bound as specified in the dynamic part, in order to generate the current instance of the entire document. Their macro language is an extension to HTML, although other languages (such as Java) might be just as suitable.

The WebExpress project [7] adopted the concept of a designated ``base object'', which is nearly identical to the template concept described here. WebExpress included a mechanism for ``rebasing'' a client (providing it with a new base object). The primary difference between the WebExpress approach and our approach is the time at which a client discovers the identity of a (possibly new) template.

We can apply a similar template-based mechanism to substantially simplify the use of delta encoding. In this approach, the server ``computes'' the delta between the current instance of a resource, and a separately-identified template resource.
(Depending on the encoding format, it might be possible to generate the delta directly, rather than generating the current instance and then computing a delta.) The client then applies the delta to the template resource, rather than to a previous instance of the requested resource.

Since this approach avoids the need to retain old instances of the dynamic resource at either the client or the server, it greatly simplifies the implementation and optimization of base instance management at both client and server. However, it requires a new mechanism to inform the client of the appropriate template resource, and its success may depend on the proper construction of the template.

To support template-based deltas, therefore, we define a new response header that the origin server uses as a ``hint'' to inform a client of the URI of the template resource. For example, if the client request is

    GET /foo.html HTTP/1.1
    Host: bar.example.net
    A-IM: vcdiff

the server might send:

    HTTP/1.1 200 OK
    Date: Sun, 06 Nov 1994 08:49:37 GMT
    Etag: "abc"
    DTemplate: "http://bar.example.net/foo.tplt"

The implication of the DTemplate header is that, on subsequent requests for http://bar.example.net/foo.html, the client should ask for a delta between http://bar.example.net/foo.tplt and the current instance. This means, of course, that the client would first have obtained and cached an instance of http://bar.example.net/foo.tplt. The client might retrieve the template either on demand (i.e., just before making the new request for foo.html), or during an otherwise idle moment, or not at all (since the use of deltas is fully optional).

The DTemplate header implies that the specified URL is within the uniqueness scope of the Request-URI (or else it would not be meaningful to ask for a delta between the template and the Request-URI).
For example, if the client requests the template:

    GET /foo.tplt HTTP/1.1
    Host: bar.example.net

and receives the response:

    HTTP/1.1 200 OK
    Date: Sun, 06 Nov 1994 08:49:47 GMT
    Etag: "pqr"

then the client can make a subsequent request for foo.html as:

    GET /foo.html HTTP/1.1
    Host: bar.example.net
    If-None-Match: "pqr"
    A-IM: vcdiff

Alternatively, the DTemplate header field can be used to specify that a specific instance of a resource (rather than any available instance) be used as a template, by including an entity tag in the header field. For example:

    HTTP/1.1 200 OK
    Date: Sun, 06 Nov 1994 08:49:37 GMT
    Etag: "abc"
    DTemplate: "http://bar.example.net/foo.tplt"/etag="pqr"

This form of the header further simplifies the instance-management problem, by eliminating any ambiguity about which instances are worth saving. It might, however, reduce the possibilities for delta encoding.

Finally, the DTemplate and DCluster headers can be combined. For example:

    HTTP/1.1 200 OK
    Date: Sun, 06 Nov 1994 08:49:37 GMT
    Etag: "abc"
    DTemplate: "http://bar.example.net/foo.tplt"
    DCluster: "//bar.example.net/foo?"

This means that for any Request-URI matching the prefix specified in the DCluster header field, the URI specified in the DTemplate field is an appropriate template.

Note that an origin server ought not necessarily send a DTemplate header field on every response; doing so could waste network bandwidth, if the recipient is not delta-capable. Instead, the server should employ heuristics to decide whether to send this header field. For example, it might be worth sending it whenever the client's request message indicates its willingness to accept a delta-encoded response, and when the If-None-Match field in the request does not already specify the entity-tag of the template resource.
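As an illustration of how a client might handle the two DTemplate forms shown in this section (the formal syntax is given in section 5.3.2), the sketch below parses a DTemplate field value and picks the entity tag to place in If-None-Match. It is only a sketch under stated assumptions: the function names are hypothetical, the cache is modelled as a plain mapping from template URI to the single entity tag held for it, and quoted strings are assumed not to contain embedded double quotes.

```python
import re

# One list element: a quoted URI, optionally followed by /etag=<entity-tag>.
_ITEM = re.compile(r'\s*"([^"]*)"(?:/etag=((?:W/)?"[^"]*"))?\s*(?:,|$)')


def parse_dtemplate(value):
    """Parse a DTemplate field value into (uri, etag_or_None) pairs."""
    items, pos = [], 0
    while pos < len(value):
        match = _ITEM.match(value, pos)
        if match is None:
            raise ValueError(f"malformed DTemplate value at offset {pos}")
        items.append((match.group(1), match.group(2)))
        pos = match.end()
    return items


def template_etag(templates, cached):
    """Pick the entity tag of a cached template instance, if one is usable.

    `cached` maps a template URI to the entity tag of the cached instance
    (an assumption of this sketch: one instance per URI).
    """
    for uri, pinned in templates:
        held = cached.get(uri)
        # A pinned entity tag restricts the choice to that exact instance.
        if held is not None and (pinned is None or pinned == held):
            return held
    return None
```

When the header pins an entity tag that the client does not hold, `template_etag` returns nothing, and the client would need to fetch the template (or forgo the delta) before making the request.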
5 Specification

The key words "MUST", "MUST NOT", "SHOULD", "SHOULD NOT", and "MAY" in this document are to be interpreted as described in RFC2119 [4].

5.1 Modified basic requirements for delta-encoded responses

The basic requirements for delta-encoded responses, specified in [8], are modified for servers that support the DCluster and/or DTemplate header fields.

A server MAY send a delta-encoded response if:

   1. The server would be able to send a 200 (OK) response for the request.

   2. The client's request includes an A-IM header field listing at least one delta-coding.

   3. The client's request includes an If-None-Match header field listing at least one valid entity tag for an instance (a "base instance") of at least one of:

      a. the Request-URI.

      b. a different URI within the uniqueness scope of the Request-URI.

      c. a URI that matches a uri-prefix in a DTemplate header field that was sent in a response for a URI within the uniqueness scope of the Request-URI.

XXX Anything else?

5.2 Modified header specifications

One of the headers defined in the specification for delta encoding [8] has a slightly different meaning when delta clustering or delta templates are used.

5.2.1 A-IM

When an A-IM request-header field includes one or more delta-coding values, the request MUST contain an If-None-Match header field, listing one or more entity tags from URIs in the uniqueness scope of an entity tag from a prior response for the request-URI.

Section 5.4 defines rules that a client uses for determining the set of base instances in the uniqueness scope of a request-URI.

5.3 New header specifications

The following headers are defined, for use as entity-headers. (Due to the terminological confusion discussed in [8], some entity-headers are more properly associated with instances than with entities.)
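Returning briefly to the three conditions of section 5.1, a server-side eligibility check might look like the sketch below. It is illustrative only. The function name and the `base_etags` parameter are hypothetical: `base_etags` stands for the set of entity tags the server still holds for instances covered by conditions 3a through 3c, and computing that set is outside the sketch. The only delta-coding recognized is ``vcdiff'', because it is the one used in this document's examples.

```python
import re

# Delta-codings this hypothetical server can produce.
DELTA_CODINGS = {"vcdiff"}


def may_send_delta(can_send_200, a_im, if_none_match, base_etags):
    """Return True when all three conditions of section 5.1 hold."""
    # Condition 1: a 200 (OK) response would otherwise be possible.
    if not can_send_200:
        return False
    # Condition 2: the A-IM field lists at least one delta-coding.
    offered = {
        token.split(";")[0].strip().lower()
        for token in (a_im or "").split(",")
    }
    if not offered & DELTA_CODINGS:
        return False
    # Condition 3: If-None-Match lists an entity tag of a usable base instance.
    listed = re.findall(r'(?:W/)?"[^"]*"', if_none_match or "")
    return any(tag in base_etags for tag in listed)
```

A request that offers ``vcdiff'' but lists only entity tags unknown to the server fails condition 3, so the server would fall back to an ordinary full response.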
5.3.1 DCluster

The DCluster entity-header field is used in a response to specify a subset of the uniqueness scope of the entity tag given in the Etag header field of the response. The uniqueness scope is the set of URIs across which this strong entity tag is guaranteed to be unique, for all time.

A uniqueness scope is specified by providing one or more prefixes for other URIs in the set.

    DCluster = "DCluster" ":" #( <"> uri-prefix <">)
    uri-prefix = scheme ":" "//" host [ ":" port ] [ abs_path ]
                 | abs_path | rel_path

If the uri-prefix is an abs_path or rel_path, the implied scheme is the scheme used in the Request-URI. (Typically, the scheme would be "http".) If the uri-prefix is an abs_path, it is interpreted relative to the origin server host name. If the uri-prefix is a rel_path, it is interpreted relative to the Request-URI.

The uniqueness scope of a strong entity tag in an ETag header field always includes the Request-URI of the corresponding request, and the union of all URIs matching one or more of the uri-prefix strings in the DCluster header field of the response. It may include other URIs not described in a DCluster header field. That is, the set of URIs for which a uri-prefix in a DCluster header field is a prefix MUST be a subset of the uniqueness scope, and MAY be a proper subset.

Generally, the DCluster header does not necessarily describe the entire uniqueness scope of an entity tag. Rather, it describes a subset of the uniqueness scope whose members are likely to differ by small deltas.

A server SHOULD NOT include a uri-prefix in a DCluster header field if the server is not likely to be able to generate deltas between the Request-URI and the URIs matching that uri-prefix.
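The interpretation rules for uri-prefix given above can be sketched as a small resolver that turns each quoted prefix into an absolute prefix string. This is an assumption-laden illustration rather than a normative algorithm: the function names are hypothetical, the handling of prefixes that begin with "//" (as in the examples of section 3, although that form does not appear in the grammar) is a guess that the Request-URI's scheme is implied, and no "." or ".." normalization is attempted for rel_path values.

```python
import re
from urllib.parse import urlsplit

_HAS_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")


def resolve_prefix(uri_prefix, request_url):
    """Expand one DCluster uri-prefix into an absolute prefix string."""
    req = urlsplit(request_url)
    if _HAS_SCHEME.match(uri_prefix):
        return uri_prefix                       # already absolute
    if uri_prefix.startswith("//"):
        # Assumption: scheme-less form from the examples takes the request scheme.
        return f"{req.scheme}:{uri_prefix}"
    if uri_prefix.startswith("/"):
        # abs_path: relative to the origin server host name.
        return f"{req.scheme}://{req.netloc}{uri_prefix}"
    # rel_path: relative to the directory part of the Request-URI path.
    base_dir = req.path[: req.path.rfind("/") + 1] or "/"
    return f"{req.scheme}://{req.netloc}{base_dir}{uri_prefix}"


def cluster_prefixes(header_value, request_url):
    """All absolute prefixes named in a DCluster field value."""
    quoted = re.findall(r'"([^"]*)"', header_value)
    return [resolve_prefix(prefix, request_url) for prefix in quoted]
```

The resolver works on strings rather than on parsed URLs so that a trailing ``?'' in a prefix such as ``/foo?'' is preserved exactly, which matters because cluster membership is decided by plain prefix comparison.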
The uniqueness scope specified by a DCluster header is valid for use by the client only for entity tags received in the same response or in subsequent responses, never for entity tags received in previous responses.

Section 5.4 defines rules that a client uses for determining the set of base instances in the uniqueness scope of a request-URI.

5.3.2 DTemplate

The DTemplate entity-header field is used in a response to specify another resource that the origin server prefers to use as the base instance for computing deltas for the Request-URI, or for other resources in the uniqueness scope specified by a DCluster header field in the response.

    DTemplate = "DTemplate" ":" #( <"> dt-uri <"> [ "/" dt-param])
    dt-uri = absoluteURI | abs_path
    dt-param = "etag" "=" entity-tag

If the dt-uri is an abs_path, it is interpreted relative to the origin server host name.

A URI specified in a DTemplate header field is, by definition, in the uniqueness scope of the Request-URI.

If a client has received a DTemplate header field within a given uniqueness scope, the client SHOULD use an instance of the specified template resource(s) as the base instance for any future delta requests for other resources in the uniqueness scope.

If the DTemplate header field includes an entity tag with a URI, then the client SHOULD use only the specified instance of the template resource as the base instance for any future delta requests for other resources in the uniqueness scope.

The URI specified by a DTemplate header is valid for use by the client only with entity tags received in the same response or in subsequent responses, never for use with entity tags received in previous responses.

5.4 Rules for determining base instances in a uniqueness scope

When a client is about to make a request for a given Request-URI, and wishes to choose entity tags for the request's If-None-Match header field, it follows a set of rules to determine which base instances (and hence, which entity tags) may be included.
These rules do not require the client to include any entity tags, and for reasons of performance, a client implementation should not necessarily include all of the legal choices.

Recall that the uniqueness scope of an entity tag is the set of resources across which this entity tag is unique for all time. In other words, if the client and server correctly agree that the Request-URI is contained in the uniqueness scope for an entity tag E for some URI X, then if the client sends this entity tag E in an If-None-Match header field, the server will know unambiguously which resource it refers to (even though X is not explicitly named in the request).

The client's view of the uniqueness scope of an entity tag might be a subset of the server's view. (It cannot be a superset, or the server would be unable to interpret the If-None-Match field.) For example, a server might not list all possible uri-prefix values in a DCluster header, for performance reasons, or the client might not support the DTemplate header.

A client probably will not have received responses for more than a small subset of the URIs in a uniqueness scope, or it might have deleted some of the instances in order to create space in its cache. A client SHOULD NOT list an entity tag in an If-None-Match header unless it has a cache entry containing at least part of the corresponding instance, since this would otherwise lead to uninterpretable delta responses.

A Request-URI is in the uniqueness scope of an entity tag E for an instance of URI X if one or more of these conditions holds:

   1. X is the Request-URI.

   2. The DCluster header field of a prior response for the Request-URI includes a prefix of X. The base instance associated with entity tag E MUST NOT have been received before the first such DCluster header field.

   3. The DCluster header field of a prior response for X includes a prefix of the Request-URI.
      The base instance associated with entity tag E MUST NOT have been received before the first such DCluster header field.

   4. X has been listed in the DTemplate header field of a prior response for the Request-URI, or of a prior response for another URI Y in the uniqueness scope of the Request-URI (by recursive application of these conditions to an instance of URI Y).

XXX Is this unambiguous?

Security considerations (see section 6.1) require that a client not always trust every DCluster header that it receives. A malicious server might send a DCluster header that could cause the client to believe that a URI is within the uniqueness scope of an entity tag when, in fact, it is not.

Therefore, a client MUST NOT use condition #3 above (DCluster of a prior response for X includes prefix of Request-URI) unless it can securely verify that a resulting delta is not spoofed. Our current belief is that spoofing can be detected by any one of the following means:

   - The delta-encoded response is accompanied by a secure message digest covering the entire current instance, generated by the origin server. This allows the client to verify that it has received the current instance of the Request-URI.

   - All of the URIs in the uniqueness scope of the Request-URI have the same "hostport" as the Request-URI; see RFC2396 [3] for the specification of this term. This ensures that, if no interception mechanism is in use, the client receives what the server wishes it to receive. (In general, malicious interception mechanisms create broader risks than the spoofing of deltas.)

   - All of the base instances associated with the entity tags listed in the client's If-None-Match header came from URIs listed in DCluster or DTemplate headers in responses for prior Request-URIs having the same "hostport" as the current Request-URI.
  This ensures that the chosen base instances came from origin servers trusted by the origin server for the current Request-URI.

Note: the spoofing detection mechanisms listed above should be reviewed by competent security experts.

6 Security Considerations

Note: This aspect of the specification is the subject of some controversy, and the details of protections against spoofing attacks in particular are likely to change. We will seek a more formal security review of this specification as part of the IETF standardization process.

6.1 Spoofing attacks using the DCluster header

We have identified a potential spoofing attack via the ``DCluster'' header. In this scenario, a malicious server (e.g., malicious.example.org) generates a response (e.g., for http://malicious.example.org/trap.html) with a ``DCluster'' header indicating that the uniqueness scope of the entity tag in the response includes another server (e.g., victim.example.com). Suppose that the response includes the entity tag "abc". Now suppose that the client makes this request:

    GET /foo.html HTTP/1.1
    host: victim.example.com
    If-None-Match: "abc"
    A-IM: vcdiff

If the victim.example.com server does actually have an instance with entity tag "abc", either for http://victim.example.com/foo.html or for a resource that really is in the same uniqueness scope, then the server will generate a delta. However, if the client applies this delta to the cached response for http://malicious.example.org/trap.html, it will end up either with garbage, or (more perniciously) with an apparently genuine result that actually contains bogus information inserted by malicious.example.org. (The response for http://malicious.example.org/trap.html might contain the bogus information concealed in HTML comments.)

Protection against this attack can be accomplished by the use of end-to-end digests on the instances, as described in another proposal [11].
(Message digests, such as those provided by ``Content-MD5'' or by Digest Authentication, are not sufficient, since none of the individual messages are tampered with in this attack.)

Note that protection against spoofing via the ``DCluster'' header does not inherently require a keyed digest. Since the delta-encoded response for http://victim.example.com/foo.html is not itself generated by malicious.example.org, an end-to-end digest included with this response by victim.example.com is sufficient to prove that the client's reconstruction of foo.html is correct. However, if message tampering is also a possibility, then the server should also provide a keyed message digest.

Another defense against such an attack is for the client to ignore a ``DCluster'' header that specifies a different server. However, this defense is only effective if servers that generate delta-encoded responses are not shared among multiple, possibly mutually untrustworthy, content providers. It also reduces the potential effectiveness of clustering, especially for large sites split across multiple servers.

Note that because the DTemplate header field also adds one or more URIs to the uniqueness scope of an entity tag, the same spoofing attack is possible using the DTemplate header, and the same defenses apply.

We recommend that if a client receives a delta-encoded response without an accompanying Digest, and if the client's view of the uniqueness scope for the Request-URI includes more than one server hostname, then the response should either be discarded, or presented to the user as potentially corrupt.

6.2 Privacy attacks using the DCluster header

Many people have drawn attention to the privacy risks associated with HTTP Cookies, which allow a site (or group of cooperating sites) to track the activity of a user.
More recently, Martin Pool has identified a similar tracking mechanism based on cache validators, especially entity tags [12]. In this attack, a site encodes user-specific information in an entity tag, and then tracks repeated requests by that user to the same resource, as the user's browser attempts to validate its cache entry using that entity tag. Although this tracks only the requests for a specific resource (URL), a site can indirectly track references to many other pages by embedding an image reference to the tracked URL on each of those pages.

Just as with Cookies, the entity-tag tracking mechanism depends upon the server's ability to induce the client to send back a specific string on subsequent requests. However, the basic entity-tag tracking mechanism only allows a site to track access to pages that it controls.

The ``DCluster'' header field specified in this document makes this tracking mechanism more powerful, by allowing one site to gain access to entity tags from many other sites. For example, suppose that the site evil.example.com knows the format used to encode client-specific information in entity tags issued by the site naive.example.com. Any client who visits http://evil.example.com/home.html and receives a

    DCluster: http://naive.example.com/

header in response might then later make a delta-capable request to evil.example.com that includes entity tags issued by naive.example.com.

It might be possible to defend against such ``hijacked'' tracking attacks by choosing a cryptographically strong encoding for the client-specific data hidden in entity tags, but this might not always be feasible. In any event, this could not hide from evil.example.com the fact that the client had at some point visited naive.example.com (which could be significant if this site provided, for example, medical information about an embarrassing disease).
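
The exposure described above can be made concrete with a small sketch. The Python below is illustrative only and is not part of this specification: the function names, the dictionary-based cache, and the same_host_only switch are all assumptions introduced for the example, and the sketch ignores the ordering requirement on when base instances were received. It models a client choosing which cached entity tags to offer in If-None-Match for a Request-URI, given the uri-prefix values from a prior DCluster header for that Request-URI.

```python
from urllib.parse import urlsplit


def hostport(uri):
    """Return (host, port) for an http URI, defaulting the port to 80."""
    parts = urlsplit(uri)
    return (parts.hostname, parts.port or 80)


def etags_to_offer(request_uri, dcluster_prefixes, cache, same_host_only):
    """Pick cached entity tags a client might list in If-None-Match.

    cache maps URI -> entity tag (hypothetical structure).
    dcluster_prefixes holds uri-prefix values taken from a prior
    response for request_uri.
    """
    offered = []
    for uri, etag in cache.items():
        in_scope = uri == request_uri or any(
            uri.startswith(prefix) for prefix in dcluster_prefixes
        )
        if not in_scope:
            continue
        # Optional restriction: skip instances from another hostport.
        if same_host_only and hostport(uri) != hostport(request_uri):
            continue
        offered.append(etag)
    return offered


# Hypothetical cache contents for the scenario in the text.
cache = {
    "http://evil.example.com/home.html": '"e1"',
    "http://naive.example.com/clinic.html": '"user42"',
}
prefixes = ["http://naive.example.com/"]
leaky = etags_to_offer("http://evil.example.com/home.html", prefixes, cache, False)
strict = etags_to_offer("http://evil.example.com/home.html", prefixes, cache, True)
```

With same_host_only disabled, the entity tag issued by naive.example.com is offered to evil.example.com; with it enabled, only the tag for the Request-URI's own hostport is offered, at the cost of losing cross-server clustering.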
Cryptographic digests of instances, as described in section 6.1 to protect against DCluster spoofing, do not help, because the malicious site in this case is the source of the requested data, and need not actually use a delta encoding to accomplish its attack.

As in section 6.1, one possible defense is for the client to ignore a ``DCluster'' header that specifies a different server, but (also as discussed in section 6.1) this is not ideal.

User agents SHOULD provide a method to allow users to disable the use of the ``DCluster'' header, preferably either in all cases, or in cross-site cases.

6.3 Data leakage attacks using the DCluster header

Suppose that a server has asserted, using a DCluster header, that resources URL1 and URL2 are in the same uniqueness scope. Also suppose that a client is allowed to access URL1, but is not allowed to access URL2. (Access may be denied due to a lack of authentication, or a server configuration setting, or some other mechanism.) Finally, suppose that the client can guess or obtain the entity tag ET2 of some instance of URL2.

If the client asks the server for the current instance of URL1 as a delta from the ET2 instance of URL2, and the server responds with such a delta, this may reveal information about the contents of URL2. (The amount of information revealed depends strongly on the delta-coding format, and probably will not be enough to recover the full contents of URL2.)

A server MUST NOT reply using a delta encoding, if the chosen base instance is not an instance of the Request-URI, unless the server can verify that the client would currently be allowed access to both the chosen base instance and the Request-URI.
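
The requirement above can be sketched as a server-side guard. The Python below is a minimal illustration under stated assumptions, not a definitive implementation: may_send_delta and the is_allowed callback are hypothetical names standing in for whatever access-control mechanism a server actually uses, and what the server sends when the guard fails (for example, a non-delta response) is left to the implementation.

```python
def may_send_delta(client, request_uri, base_uri, is_allowed):
    """Decide whether a delta against base_uri may be returned.

    is_allowed(client, uri) is a hypothetical callback that reports
    whether the client would currently be allowed access to uri.
    """
    if base_uri == request_uri:
        # The base is an instance of the Request-URI itself, so the
        # cross-resource restriction does not apply.
        return True
    # Cross-resource delta: the client must be allowed both resources.
    return is_allowed(client, request_uri) and is_allowed(client, base_uri)


# Hypothetical policy: this client may read URL1 but not URL2.
URL1 = "http://www.example.com/public/a.html"
URL2 = "http://www.example.com/private/b.html"


def example_policy(client, uri):
    return uri == URL1


blocked = may_send_delta("client-1", URL1, URL2, example_policy)
same_resource = may_send_delta("client-1", URL1, URL1, example_policy)
```

In this example the guard refuses a delta of URL1 computed from an instance of URL2, which is the case that would otherwise leak information about URL2.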
7 History

7.1 draft-mogul-http-dcluster-00.txt

This document was split off from draft-mogul-http-delta-*.txt, to avoid having the security issues affect the basic HTTP delta encoding specification, and to ensure that the design of clusters and templates was done so that they are entirely optional for implementors of basic delta encoding.

8 Acknowledgements

Andrew Birrell alerted us to the possibility of data leakage attacks using the DCluster header. Koen Holtman contributed to the drafting of this document, and especially to the security considerations and mechanisms.

9 References

NOTE TO RFC EDITOR: many of the references here might be out of date. Please verify these with the primary author of this Internet-Draft before issuing this document as an RFC.

1. Gaurav Banga, Fred Douglis, and Michael Rabinovich. Optimistic Deltas for WWW Latency Reduction. Proc. 1997 USENIX Technical Conference, Anaheim, CA, January, 1997, pp. 289-303.

2. T. Berners-Lee, R. Fielding, and H. Frystyk. Hypertext Transfer Protocol -- HTTP/1.0. RFC 1945, HTTP Working Group, May, 1996.

3. T. Berners-Lee, R. Fielding, and L. Masinter. Uniform Resource Identifiers (URI): Generic Syntax. RFC 2396, IETF, August, 1998.

4. S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. RFC 2119, Harvard University, March, 1997.

5. Fred Douglis, Antonio Haro, and Michael Rabinovich. HPP: HTML Macro-Preprocessing to Support Dynamic Document Caching. Proc. USENIX Symposium on Internet Technologies and Systems, USENIX, Monterey, CA, December, 1997, pp. 83-94.

6. Roy T. Fielding, Jim Gettys, Jeffrey C. Mogul, Henrik Frystyk Nielsen, Larry Masinter, Paul Leach, and Tim Berners-Lee. Hypertext Transfer Protocol -- HTTP/1.1. RFC 2616, HTTP Working Group, June, 1999.

7. Barron C. Housel and David B. Lindquist. WebExpress: A System for Optimizing Web Browsing in a Wireless Environment. Proc. 2nd Annual Intl. Conf.
   on Mobile Computing and Networking, ACM, Rye, New York, November, 1996, pp. 108-116. http://www.networking.ibm.com/art/artwewp.htm.

8. Jeffrey C. Mogul, Balachander Krishnamurthy, Fred Douglis, Anja Feldmann, Yaron Goland, and Arthur van Hoff. Delta encoding in HTTP. Internet-Draft draft-mogul-http-delta-06, IETF, August, 2000. This is a work in progress.

9. Jeffrey C. Mogul, Fred Douglis, Anja Feldmann, and Balachander Krishnamurthy. Potential benefits of delta encoding and data compression for HTTP. Proc. SIGCOMM '97, Cannes, France, September, 1997, pp. 181-194.

10. Jeffrey C. Mogul, Fred Douglis, Anja Feldmann, and Balachander Krishnamurthy. Potential benefits of delta encoding and data compression for HTTP. Research Report 97/4, DECWRL, July, 1997. URL http://www.research.digital.com/wrl/techreports/abstracts/97.4.html.

11. Jeffrey C. Mogul and Arthur Van Hoff. Instance Digests in HTTP. Internet-Draft draft-mogul-http-digest-02, IETF, March, 2000. This is a work in progress.

12. Martin Pool. meantime: non-consensual http user tracking using caches. http://www.linuxcare.com.au/mbp/meantime/.

10 Authors' addresses

Jeffrey C. Mogul
Western Research Laboratory
Compaq Computer Corporation
250 University Avenue
Palo Alto, California, 94305, U.S.A.
Email: mogul@pa.dec.com
Phone: 1 650 617 3304 (email preferred)

Fred Douglis
AT&T Labs - Research
180 Park Ave, Room B-137
Florham Park, NJ 07932-0971, U.S.A.
Email: douglis@research.att.com
Phone: 1 973 360-8775

Daniel M. Hellerstein
Economic Research Service, USDA
1909 Franwall Ave, Wheaton MD 20902
Email: danielh@crosslink.net or webmaster@srehttp.org
Phone: 1 202 694-5613 or 1 301 649-4728