Network Working Group J. Whitehead
Internet-Draft U.C. Santa Cruz
Intended status: Informational February 27, 2006
Expires: August 31, 2006
Design Considerations for State Identifiers in HTTP and WebDAV
draft-whitehead-http-etag-00
Status of this Memo
By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire on August 31, 2006.
Copyright Notice
Copyright (C) The Internet Society (2006).
Abstract
This document discusses design considerations for state identifiers
in the Hypertext Transfer Protocol (HTTP) and related protocols such
as WebDAV.
Editorial Note
Discussion of this draft and comments to the editors should be sent
to the ietf-http-wg@w3.org [1] mailing list, which is archived at
Whitehead Expires August 31, 2006 [Page 1]
Internet-Draft State Identifiers in HTTP/WebDAV February 2006
.
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1. Entity Identifiers in HTTP . . . . . . . . . . . . . . . . 3
1.2. Problems with Entity Identifiers as Substitute State
Identifiers . . . . . . . . . . . . . . . . . . . . . . . 4
2. Requirements for State Identifiers and Entity Identifiers . . 6
2.1. Caching Requirements . . . . . . . . . . . . . . . . . . . 6
2.2. End-to-End Integrity Check Requirements . . . . . . . . . 6
2.3. Authoring Requirements . . . . . . . . . . . . . . . . . . 6
2.4. Implementation Driven Requirements . . . . . . . . . . . . 7
3. Current Implementation Behaviors and their Implications . . . 8
4. Ambiguities in the HTTP and WebDAV Specifications . . . . . . 8
4.1. Confusion over the meaning of the Etag returned in a
PUT response . . . . . . . . . . . . . . . . . . . . . . . 8
4.2. Confusion over Semantics of Strong Etags . . . . . . . . . 9
5. Dimensions of a Solution . . . . . . . . . . . . . . . . . . . 9
5.1. Julian's suggestion . . . . . . . . . . . . . . . . . . . 9
5.2. Lisa's suggestion . . . . . . . . . . . . . . . . . . . . 10
5.3. Jim's suggestion . . . . . . . . . . . . . . . . . . . . . 10
6. References . . . . . . . . . . . . . . . . . . . . . . . . . . 11
Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 12
Intellectual Property and Copyright Statements . . . . . . . . . . 13
Whitehead Expires August 31, 2006 [Page 2]
Internet-Draft State Identifiers in HTTP/WebDAV February 2006
1. Introduction
In distributed systems such as the World Wide Web, it is useful to
assign unique identifiers to individual states of network resources.
The basic idea is that each time the state of a network resource
changes, its associated state identifier changes too. State
identifiers have the quality that they uniquely identify a particular
state of a network resource; only one identifier will ever be
associated with a given network resource state. These state
identifiers can be used to support caching and remote authoring of
network resources.
In caching, each cache locally stores a state identifier along with
its cached copy of a network resource, and uses the state identifier
to query whether the cached copy is up-to-date. If the cache's local
state identifier is different from the current state identifier of
the original network resource, it indicates the cached copy is stale,
and needs to be refreshed. If the cached and original state
identifiers are the same, it indicates the local copy is up-to-date.
In remote authoring, a remote authoring tool wishes to update the
state of a network resource. A typical authoring session involves
retrieving the current state of the network resource, some editing,
then writing the new state of the network resource back to its
original location. There are two concerns during remote authoring.
One is that another author might try to modify the same network
resource at the same time, leading to the lost update problem. State
identifiers are used to detect this problem. The authoring
application stores a local copy of the network resource's state
identifier at the beginning of the authoring session. When the
application goes to write the new resource state, it compares the
local state identifier with the current state identifier of the
network resource. If they are the same, the network resource has not
been modified, and the write can proceed without danger of overwrite.
The second concern is the remote authoring client wants to know that
the network resource has been stored in the same form it was
submitted. There are many document formats, such as XML, that can
remain semantically equivalent in the face of multiple kinds of
changes to the actual octets stored. The authoring application would
like to know if it can continue to use its local copy of the network
resource, or if it instead needs to reload its local copy from the
original.
1.1. Entity Identifiers in HTTP
Version 1.1 of the Hypertext Transfer Protocol (HTTP) [RFC2616]
supports two identifiers, the Entity Tag (Etag) and the Content-MD5
hash (MD5-hash). Etags are used in HTTP 1.1 for caching, and the Web
Whitehead Expires August 31, 2006 [Page 3]
Internet-Draft State Identifiers in HTTP/WebDAV February 2006
Distributed Authoring and Versioning (WebDAV) authoring protocol uses
Etags in conjunction with locks to avoid the lost update problem and
keep local client state in synch with the authoring server. The MD5-
hash is used to perform end-to-end message integrity checks. A key
difference between Etags and MD5-hashes is the Etag is not required
to be computable from the contents of the on-the-wire representation
of a resource, while an MD5-hash is. A consquence is that Etags are
less computationally inexpensive to produce than MD5-hashes.
Within HTTP, there are no provisions for directly interacting with
the state of a network resource. Instead, clients can retrieve
representations of a network resource using the GET method, and can
write representations using PUT. The representation retrieved using
GET can be the result of an arbitrary computational process, or can
be the result of applying a wide range of transformations to a
persistently stored resource. Similarly, a server may apply a range
of transformations to a representation submitted with PUT before
creating a persistently stored resource. A resource representation
on the wire is known as an entity, and both Etags and MD5-hashes are
unique identifiers for this entity.
In the most general case, Etags and MD5-hashes are not state
identifiers, since they uniquely identify only for the on-the-wire
representation of a resource, and do not necessarily identify the
actual resource state. Etags and MD5-hashes are entity identifiers.
For caching, this is usually acceptable, since the cache only wants
to ensure that the client-visible representation of the resource is
maintained up-to-date.
1.2. Problems with Entity Identifiers as Substitute State Identifiers
For authoring, the distinction between entity identifiers and state
identifiers is problematic. Since authoring applications modify the
state of the network resource, the information provided by entity
identifiers does not provide sufficient feedback on the progress of
authoring applications. Changes to the state of a resource must be
inferred from changes in entity identifiers.
Adding to this core difficulty are many accidental ones. HTTP does
not clearly specify the behavior of Etags and PUT. There is no
requirement that a successful PUT response return an Etag or Content-
MD5 header (it is a MAY level requirement for just one of 3 possible
response codes). While many servers do return the Etag header, this
is not universal. If no Etag is received in the PUT response,
clients must perform an additional request to retrieve it.
Unfortunately, there is no mechanism clients can use to determine if
the retrieved Etag represents the one assigned to the PUT entity they
submitted, or the Etag of an entity submitted subsequently by another
Whitehead Expires August 31, 2006 [Page 4]
Internet-Draft State Identifiers in HTTP/WebDAV February 2006
client. Since an arbitrary transformation could have taken place on
the PUT entity before it was persistently stored, even an MD5-hash
would be unreliable. This is moot, since there is no mechanism that
clients can use to force a server to return an MD5-hash.
The HTTP specification is also unclear on what Etag should be
returned in the response to a PUT. The current specification states
that for a 201 Created response, "indicating the current value of the
entity tag for the requested variant just created" (Section 10.2.2),
while there is no statement concerning use of Etag with 200 or 204
responses, the other possible responses to a successful PUT. Given
the specification ambiguity, it is conceivable that servers might
return the Etag for the submitted entity, rather than the Etag for
the current GET response for the resource.
Several HTTP servers use filesystem last modified timestamps as their
mechanism for computing Etags. This has the advantage of fast
recall; a simple system call retrieves a resource Etag, and does not
require any computation on the state of the resource itself, or
retrieval of a precomputed Etag from a database. However, since
servers can process multiple write operations within the time span of
the minimum granularity of the operating system clock, such servers
return a provisional Etag immediately, and then upgrade this to a
permanent Etag later. This requires clients to perform an additional
network request to retrieve the final version of the Etag.
Unfortunately, there is no mechanism clients can use to distinguish
between the Etag having been changed due to promotion to a permanent
Etag, or the Etag having been changed due to another authoring client
modifying the resource. As before, MD5-hashes are unreliable.
This combination of the essential different between state identifiers
and entity identifiers, and the several accidental difficulties in
specifying and implementing entity identifiers have combined to
create substantial difficulty for authoring clients using the WebDAV
protocol. These difficulties make it impossible, in the general
case, for authoring clients to have any confidence that they have
successfully written an updated resource to a remote server. Since
this the core operation supported by remote authoring clients, this
problem has broad ramifications for the adoption and use of HTTP-
based remote authoring.
In the remainder of this document we describe the requirements for
clients and servers for state and entity identifiers. When then
document several current behaviors by HTTP servers that contribute to
the difficulty of using the current entity identifiers. Following,
we note several specification ambiguities that contribute to the
problem. We then outline the characteristics of a broad solution to
the problem. Our goal is for this document to be used as a statement
Whitehead Expires August 31, 2006 [Page 5]
Internet-Draft State Identifiers in HTTP/WebDAV February 2006
of goals for a subsequent protocol specification that substantially
addresses the concerns raised herein.
2. Requirements for State Identifiers and Entity Identifiers
Three scenarios that drive requirements for state identifiers and
entity identifiers are caching, end-to-end message integrity checks,
and authoring. Additionally, implementation concerns also provide
requirements.
2.1. Caching Requirements
The following two requirements drive the existence of weak and strong
Etags. While the complete set of requirements for HTTP caches is
quite broad, the requirements below are the ones specifically related
to entity identifiers and state identifiers.
A client must be able to determine if its cached copy of the GET
response for a resource is octet-for-octet the same as the current
GET response, without having to re-retrieve the current GET response.
A client must be able to determine if its cached copy of the GET
response for a resource is semantically equivalent to the current GET
response, without having to re-retrieve the current GET response.
Two responses might be considered semantically equivalent even if not
octet-for-octet equivalent if, for example, they had minor
differences in HTML encoding, or some automatically updated value
like a hit counter was considered semantically irrelevant.
2.2. End-to-End Integrity Check Requirements
The following requirement drives the existence of the MD5-hash.
A client or server must be able to determine if an HTTP message has
been transmitted through zero or more intermediaries without
modification to the entity body.
2.3. Authoring Requirements
An authoring client must be able to determine if the state of a
resource at the beginning of an editing session remains unchanged
when the client wishes to update the state of the resource.
An authoring client must be able to determine if the server has made
changes to an entity submitted during PUT that would require the
client to reload the resource to have the correct current state.
This determination must be reliable, in the sense that the client
Whitehead Expires August 31, 2006 [Page 6]
Internet-Draft State Identifiers in HTTP/WebDAV February 2006
must be able to receive an unambiguous answer to the query, "has the
server modified the submitted entity prior to its persistent
storage?"
An authoring server must be able to modify the entity submitted using
PUT before persistently storing it. Servers frequently modify
submitted data. Examples based on current applications include
modifying XML to change XML namespace usage, change linear
whitespace, and sometimes modify the character encoding. Versioning
servers also may perform keyword expansion in the body of submitted
source code, e.g., to inject the author, version identifier, date,
etc. Calendar servers may annotate calendar event resources with
server-specific properties.
An authoring client must be able to direct the server to reject a
request to persistently store the resource if it cannot guarantee
octet-for-octet storage of the submitted entity. This requirement is
more speculative than the others, since it does not describe a
strongly expressed existing client need. Still, there are many media
types that cannot withstand any server tampering, such as the native
formats of many kinds of application software that store their
documents in a proprietary binary format. Any server tampering with
these document types would corrupt the document.
WebDAV resources have two types of state, the resource body, a
representation of which is returned by GET, and resource properties,
a representation of which is returned by PROPFIND. An authoring
client must be able to determine if changes have occurred to entries
in both kinds of state. State identifiers must not mix state types.
That is a state identifier for the resource body should be
independent of state identifiers for properties. An open question is
the necessary granularity of state identifiers for properties.
2.4. Implementation Driven Requirements
Since GET is a very common operation, it must be possible for servers
to efficiently compute any entity or state token returned by GET.
Alternately, it must be possible for servers to not return expensive-
to-compute identifiers unless specifically requested by the client.
The response to any write operation must return state identifiers and
entity identifiers associated with the permanent persisted state of
the resource following the operation. This is the only reliable
mechanism for communicating this information to the client.
Whitehead Expires August 31, 2006 [Page 7]
Internet-Draft State Identifiers in HTTP/WebDAV February 2006
3. Current Implementation Behaviors and their Implications
_To do:_
1. Document the Apache server behavior of returning a weak etag with
the PUT response then promoting this to a strong etag. Note that
this makes it impossible for clients to reliably determine the
permanent etag associated with the resource.
2. Document a server that performs significant content modification
upon PUT (a CalDAV server?)
3. Others?
4. Ambiguities in the HTTP and WebDAV Specifications
4.1. Confusion over the meaning of the Etag returned in a PUT response
It is currently unclear as to which entity is identified by an Etag
returned in the response to a PUT.
The HTTP specification [RFC2616] states (Section 10.2.2):
"A 201 response MAY contain an ETag response header field
indicating the current value of the entity tag for the requested
variant just created, see section 14.19."
Hence, for situations where a new resource is created, the meaning of
Etag is clear.
The HTTP specification also states (Section 14.19):
"The ETag response-header field provides the current value of the
entity tag for the requested variant."
Since the other success responses for a PUT request (200 and 204)
provide no specification for the meaning of Etag, a strict reading of
the specification is ambiguous, since there is no "requested variant"
here. Presumably the 201 Etag semantics are intended for 200 and 204
responses to PUT, though this is not explicitly stated in the HTTP
specification.
Another possible interpretation is that the server should return the
Etag of the entity just submitted by the client in the PUT request.
In the section below, we describe an ambiguity where the Etag
returned may be expected to vary depending on the amount of
processing the server performs on the submitted entity.
Whitehead Expires August 31, 2006 [Page 8]
Internet-Draft State Identifiers in HTTP/WebDAV February 2006
4.2. Confusion over Semantics of Strong Etags
The HTTP specification states (Section 3.11):
A "strong entity tag" MAY be shared by two entities of a resource
only if they are equivalent by octet equality.
When a client performs a PUT, there are two entities in play:
A. the entity submitted by the client in the initial PUT request
B. the entity returned by the server in subsequent GET requests
A question that arises is whether a server can return a strong Etag
if it modifies the submitted entity, A, before persistently storing
it. Supporting a yes viewpoint, we note that the submitted entity,
A, isn't an entity of the resource until the PUT operation succeeds,
because only the success of the operation associates it with the
resource. As a result, it is not reasonable to discuss equivalence
of A and B as two entities of the same resource, since A is not yet
associated with the resource.
Supporting a no viewpoint, we note that the intent of the Etag is to
act as an entity identifier. If we perform two GET operations in a
row on a resource, and we receive the same strong Etag in each
response, we expect two response entity bodies to be octet-for-octet
the same. Hence, if we submit an entity, A, and receive a strong
Etag in return along with entity B, there is an assumption that the
submitted entity has not been modified, and A is octet-for-octet the
same as B. We note that there is no language in the HTTP
specification to support this viewpoint.
5. Dimensions of a Solution
TBD.
There are currently three suggestions.
5.1. Julian's suggestion
Alternative 1: Make strong server requirement; i.e., mandate to only
return ETag if content was written octet-by-octet. Drawback: this in
not required in HTTP, thus potentially implemented differently in
existing servers. No way for a client to tell the difference. Also,
returning strong ETags although content rewriting happens may have
its use cases; it only becomes a problem if the client tries to use
the ETag as cache validator in a byte-range request (which the server
Whitehead Expires August 31, 2006 [Page 9]
Internet-Draft State Identifiers in HTTP/WebDAV February 2006
could reject).
Alternative 2: Add a new Response Header through which servers can
indicate whether they need to refetch content or not. Note that
header would not have a default, so clients can simply detect whether
they speak to "new" server. This would also be applicable to other
write methods, such as PROPPPATCH: for instance, would a PROPPATCH
affect the representation of the resource (i.e., metadata is stored
in body such as in JPEG, MP3, Office docs...), the server could
return a new ETag and indicate that the entitiy changed.
[[anchor15: jr -- For reasons of compatibility with existing
implementations, the second alternative seems to be superior to me]]
5.2. Lisa's suggestion
Thus, we RECOMMEND servers supporting ETag and PUT return the ETag
header in the PUT response, and we RECOMMEND clients receiving the
ETag in a PUT response use their local copy of the resource rather
than query the server for a redundant copy.
When a client does not receive an ETag header at all in a PUT
response, the client MUST NOT consider its local copy of the resource
to be up-to-date with the server's copy.
The Get-ETag response-header field provides the value of the entity
tag for the entity of the resource that would be provided on a
subsequent GET request.
Get-ETag = "Get-ETag" ":" entity-tag
The Get-ETag header is appropriate for use when the server can only
guarantee that it can return the entity with that tag in response to
a GET, not an entity that is byte-for-byte equivalent to the entity
the client provided.
5.3. Jim's suggestion
Introduce the notion of a resource body state identifier that
uniquely identifies the persistently recorded state of a resource.
Introduce the notion of a resource property identifier that
identifies the aggregate persistently recorded state of all dead
properties. Then, introduce six new headers, and two new properties:
Resource-State: a response header that indicates the current state
identifier for the resource after successful performance of the
requested operation. In the case of an error, it indicates the state
of the resource prior to the failed operation. This header is only
Whitehead Expires August 31, 2006 [Page 10]
Internet-Draft State Identifiers in HTTP/WebDAV February 2006
included in a response if the client specifically requests it using
the Request-Resource-State header.
Property-State: a response header that indicates the current property
state identifier for the resource after successful performance of the
requested operation. In the case of an error, it indicates the state
of the resource prior to the failed operation. This header is only
included in a response if the client specifically requests it using
the Request-Property-State header.
Request-Resource-State: a request header used to request a Resource
State header in the response. May be used with any method.
Request-Property-State: a request header used to request a Property
State header in the response. May be used with any method.
Content-Handling: a response header the MUST be returned by a
successful PUT. Broadly indicates the kind of handling performed by
the server when storing the submitted entity. Acceptable values are
"none" (entity was stored octet-for-octet), "XML" (processing that
modified the entity but did not change the semantics according to XML
rules, only applicable to XML content types), "some" (the server
performed some modification of the entity), "encoding" (the server
changed the entity's content encoding only). Allow for extensions to
this set of values.
Acceptable-Content-Handling: a request header usable in PUT requests
only. A list of tokens from the set defined with Content-Handling.
If the response Content-Handling header value is not one of the
tokens listed in this header, the request MUST fail.
Add a new value to the DAV header, "fixed-PUT" to indicate support
for these semantics.
Two new properties, one to represent the resource state identifier,
and the other to represent the aggregate property identifier.
6. References
[RFC2616] Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext
Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.
[1]
Whitehead Expires August 31, 2006 [Page 11]
Internet-Draft State Identifiers in HTTP/WebDAV February 2006
Author's Address
Jim Whitehead
UC Santa Cruz, Dept. of Computer Science
1156 High Street
Santa Cruz, CA 95064
Email: ejw@cse.ucsc.edu
Whitehead Expires August 31, 2006 [Page 12]
Internet-Draft State Identifiers in HTTP/WebDAV February 2006
Full Copyright Statement
Copyright (C) The Internet Society (2006).
This document is subject to the rights, licenses and restrictions
contained in BCP 78, and except as set forth therein, the authors
retain all their rights.
This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Intellectual Property
The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights. Information
on the procedures with respect to rights in RFC documents can be
found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any
assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use of
such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository at
http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF at
ietf-ipr@ietf.org.
Acknowledgment
Funding for the RFC Editor function is provided by the IETF
Administrative Support Activity (IASA).
Whitehead Expires August 31, 2006 [Page 13]