IETF March 1999 Proceedings

2.1.11 WWW Distributed Authoring and Versioning (webdav)

NOTE: This charter is a snapshot of the 44th IETF Meeting in Minneapolis, Minnesota. It may now be out-of-date. Last Modified: 16-Feb-99

Chair(s):

Jim Whitehead <ejw@ics.uci.edu>

Applications Area Director(s):

Keith Moore <moore@cs.utk.edu>
Patrik Faltstrom <paf@swip.net>

Applications Area Advisor:

Keith Moore <moore@cs.utk.edu>

Mailing Lists:

General Discussion: w3c-dist-auth@w3.org
To Subscribe: w3c-dist-auth-request@w3.org
In Body: Subject of subscribe
Archive: http://www.w3.org/pub/WWW/Archives/Public/w3c-dist-auth/

Description of Working Group:

This working group will define the HTTP extensions necessary to enable distributed web authoring tools to be broadly interoperable, while supporting user needs.

The HTTP protocol contains functionality which enables the editing of web content at a remote location, without direct access to the storage media via an operating system. This capability is exploited by several existing HTML distributed authoring tools, and by a growing number of mainstream applications (e.g. word processors) which allow users to write (publish) their work to an HTTP server. To date, experience from the HTML authoring tools has shown they are unable to meet their users' needs using the facilities of the HTTP protocol. The consequence of this is either postponed introduction of distributed authoring capability, or the addition of nonstandard extensions to the HTTP protocol. These extensions, developed in isolation, are not interoperable.

An ad-hoc group has analyzed the functional needs of several organizations, and has developed requirements for distributed authoring and versioning. These requirements encompass the following capabilities, which shall be considered by this working group:

IN-SCOPE:

*Locking: lock, lock status, unlock

*Name space manipulation: copy, move/rename, resource redirection (e.g. 3xx response codes)

*Containers: creation, access, modification, container-specific semantics

*Attributes: creation, access, modification, query, naming

*Notification of intent to edit: reserve, reservation status, release reservation

*Use of existing authentication schemes

*Access control

*Unprocessed source retrieval

*Informing proxies of an action's impact

*Versioning:

*Checkin/Checkout

*History graph

*Differencing

*Automatic Merging

*Naming and accessing resource versions

Further information on these requirements can be found in the document, "Requirements for Distributed Authoring and Versioning on the World Wide Web". <http://www.ics.uci.edu/~ejw/authoring/webdav-req-00.html>

While the scope of activity of this working group may seem rather broad, in fact much of the functionality under consideration is well understood, and has been previously considered. This working group will build on previous work when it is applicable. Discussion of the security issues concerning distributed authoring and versioning is essential to the creation of a protocol which implements this functionality.

Though the feature set described above bears a resemblance to the capabilities provided by a network file system, the intent of this working group is not to create a replacement distributed file system (e.g. NFS, CIFS). The WEBDAV emphasis on collaborative authoring of resources which are not necessarily stored in a file system, and which have associated metadata in the form of links and attributes, differentiates WEBDAV from a distributed file system.

Many decisions have been made to reduce the scope of effort of this working group. It is the intent of this working group to avoid the inclusion of the following functionality, unless it proves impossible to create a useful set of distributed authoring capabilities without it:

NOT IN SCOPE:

*Definition of core attribute sets, beyond those attributes necessary for the implementation of distributed authoring and versioning functionality

*Creation of new authentication schemes

*HTTP server to server communication protocols

*Distributed authoring via non-HTTP protocols (except email)

*Implementation of functionality by non-origin proxies

Eventually, it is desirable to provide access to WEBDAV capability by disconnected clients, or by clients whose only connectivity is via email. However, given the scope of developing requirements and specifications for disconnected operation, the initial target user group of fully connected clients, and the desire to work swiftly, the working group will address this issue by ensuring the protocol specification does not preclude a future body from developing an interoperability specification for disconnected operation via email.

Deliverables

The final output of this working group is expected to be three documents:

1. A scenarios document, which gives a series of short descriptions of how distributed authoring and versioning functionality can be used, typically from an end-user perspective. Ora Lassila, Nokia, currently visiting with the World Wide Web Consortium, is editor of this document.

2. A requirements document, which describes the high-level functional requirements for distributed authoring and versioning, including rationale. Judith Slein, Xerox, is editor of this document.

3. A protocol specification, which describes new HTTP methods, headers, request bodies, and response bodies, to implement the distributed authoring and versioning requirements. Del Jensen, Novell, is editor of this document.

The most recent versions of these documents are accessible via links from the WEBDAV Web page.

Goals and Milestones:

Mar 97   (Specification) Produce revised distributed authoring and versioning protocol specification. Submit as Internet Draft.

Apr 97   (Meeting, Specification, Requirements) Meet at Memphis IETF and hold working group meeting to review the protocol specification and requirements document.

Apr 97   (Scenarios) Revise scenarios document. Submit as Internet Draft.

Aug 97   (Scenarios) Create final scenarios document. Submit as Informational RFC.

Aug 97   (Requirements) Create final version of distributed authoring and versioning requirements document. Submit as Informational RFC.

Aug 97   (Specification) Produce revised distributed authoring and versioning protocol specification. Submit as Internet Draft.

Dec 97   (Specification) Complete revisions to distributed authoring and versioning specification. Submit as a Proposed Standard RFC.

Internet-Drafts:

· WebDAV Tree Operations

· WebDAV ACL Protocol

· Requirements for Access Control within Distributed Authoring and Versioning Environments on the World Wide Web

· Requirements for Advanced Collection Functionality in WebDAV

· Requirements for Event Notification Protocol

· WebDAV Advanced Collections Protocol

· WebDAV Access Control Goals

· Versioning Extensions to WebDAV

· Use of Dublin Core Metadata in WebDAV

· Goals for Web Versioning

Request For Comments:

RFC       Status   Title

RFC2291            Requirements for a Distributed Authoring and Versioning Protocol for the World Wide Web

RFC2518   PS       HTTP Extensions for Distributed Authoring -- WEBDAV

Current Meeting Report

Meeting Minutes
WEBDAV WG
Minneapolis IETF
March 17, 1999

The WEBDAV working group met at the Minneapolis IETF, on March 17, 1999, from 15:30 to 17:30. The meeting was chaired by Jim Whitehead, and Yaron Goland recorded notes. Approximately 55 people attended.

The meeting began with a brief review of the agenda (overview of DELTA-V BOF, issues from the Advanced Collections specification, creating a property registry, moving access control forward).

DELTA-V BOF PRESENTATION

Jim Amsden gave a brief presentation on the DELTA-V BOF, which was held in the previous session. The presentation gave an overview of the scope of the effort proposed in the DELTA-V charter.

Jim's presentation began with a short history of why WebDAV was created. When the WebDAV effort set out to create document management features, it found that versioning was critical and included it from the start. As WebDAV progressed, it was found that versioning was very hard and that it required its own working group. DELTA-V is that proposed working group.

The protocol that is proposed for DELTA-V will contain the following features:

Versioning
- Ability for a resource to be checked into a version controlled system where it has multiple revisions that are tracked and can have multiple successor and predecessor relationships. The server will maintain those relationships, report the revision history, and control write access to these revisions using check in/out operations.

Parallel Development
- Provides more resource availability in a multi-user environment. Multiple users can check out the same revisions of a resource, track who has those check-outs, and merge them back into each other later on as appropriate.

Configuration Management
- A means of collecting a group of consistent revisions of resources together. The protocol will support creating configurations, putting revisions in them and tracking them over time.

Jim Amsden finished by reporting that the first BOF was just completed, and seemed to be a reasonable success. There is a mailing list, ietf-dav-versioning@w3.org, which is archived.

ADVANCED COLLECTIONS PROTOCOL

After Jim Amsden's presentation, the floor was turned over to Judy Slein for discussion of issues from the Advanced Collections protocol specification.

The first issue concerned what the default behavior should be for certain methods like copy and lock when applied to references. There are two types of references in the Advanced Collections protocol specification:

Redirect
- for servers that want to provide basic referencing capabilities at minimal cost. The server never acts as a proxy (i.e., the server does not forward methods along to the target resources), but the disadvantage is that the reference is very visible to clients, and clients have to take actions (based on the returned 3xx status code) to resolve the reference.

Direct references
- cheap for clients but expensive for servers. The servers resolve these automatically and provide the illusion that the client is working directly on the referenced resource (a.k.a. target resource).

The general rule of thumb for default behavior is that when you apply a method to a redirect reference, you get a 302 response whose Location header gives you the URI of the target resource. The default behavior for a direct reference is for the server to automatically apply the method to the target resource itself. Judy stated that, ideally, these default behaviors should be the rule for all methods, but there are some cases which make this impossible. During discussions on this issue, the specification authors have developed principles to deal with situations where the default behaviors do not apply. These discussions led to the realization that there are four cases when determining the behavior of a method when references are present: redirect references, collections that contain redirect references, direct references, and collections that contain direct references.

The first rule the authors developed was to ensure that a method behaves the same way whether it is applied to a single direct reference or to a direct reference in a collection. The same logic applies for redirect references. So we really have two cases. We would like to be able to do the same thing for redirect and direct references, either apply to the target or not. But we haven't been able to do that.

At this point there was some Q&A from the attendees. One attendee asked why there are both direct and redirect references.

Some arguments from this discussion in favor of having both direct and redirect references:

- Redirect references are easier for servers to implement than direct references
- Security: the server may not want to perform an action on behalf of the client because of the security implications (and hence would either not want to implement direct references, or would limit the target to be a resource on the same server (or administrative domain) as the reference)
- Redirect references can have a target which is not an http scheme URL (e.g., ftp or gopher URLs), and it is unlikely that a server would proxy HTTP commands (some of which do not have equivalents in other protocols) to allow direct references to these URLs.
- Servers already provide a redirect capability, and creating a redirect is performed by out-of-band mechanisms. Redirect references provide a mechanism for remotely authoring, via HTTP, these redirects. File systems often contain both kinds of reference, both direct and redirect-style, and it would be useful to be able to author both kinds of reference.

Arguments raised against having both kinds of reference:

- Direct references appear to have the same set of features as redirect references, so why have both?
- If a client is redirected to the target resource (via a redirect reference) once, that is more efficient than if a server constantly forwards requests, as is the case with direct references.

Judy closed discussion by noting the issue that the specification may not necessarily need two different kinds of references. She also noted that the specification was unlikely to change so fundamentally at this point.

Judy continued her presentation. Judy observed that the specification authors have developed a set of design principles which are not orthogonal, and must be traded off for some methods. The authors have developed the following principles:

- All references should be usable by down-level clients and the default behavior should be what makes the most sense to a down-level client.
- The behavior of a method applied to a referential resource should be consistent, whether it is applied to an individual reference or a reference encountered when processing a collection.
- A server should never need to resolve a redirect reference and act as a proxy (we never violated this).
- Behavior should be consistent across all methods as far as possible.
- We want to be consistent with WebDAV and HTTP semantics for methods.

Unfortunately, these principles lead to conflicting design choices for some methods.

Applying the principles is easy for the methods: GET, HEAD, OPTIONS, PUT, POST, MKCOL, MKREF, PROPPATCH, and PROPFIND. For a redirect reference you respond with a 302. For a direct reference you apply the method to the target.

The more difficult methods are DELETE and MOVE. For these, always apply the method to the redirect reference resource itself, and also apply the method to the direct reference resource. For COPY, LOCK and UNLOCK, the method is applied to the redirect reference resource, while for direct references, the method is applied to the target. There is no consensus on the last three.
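The per-method behavior reported in this part of the discussion can be summarized in a short Python sketch. It is only an illustration of the minutes, not text from the Advanced Collections specification: the names `RefKind`, `behavior`, and the three result strings are hypothetical, and the entries for COPY, LOCK and UNLOCK reflect the position presented at the meeting, on which there was no consensus.

```python
from enum import Enum


class RefKind(Enum):
    REDIRECT = "redirect"
    DIRECT = "direct"


# Methods for which the default rule applies without difficulty.
EASY_METHODS = {"GET", "HEAD", "OPTIONS", "PUT", "POST", "MKCOL",
                "MKREF", "PROPPATCH", "PROPFIND"}
# Methods that affect collection membership; applied to the reference itself.
MEMBERSHIP_METHODS = {"DELETE", "MOVE"}
# Methods on which the minutes report no consensus.
UNSETTLED_METHODS = {"COPY", "LOCK", "UNLOCK"}


def behavior(method: str, kind: RefKind) -> str:
    """Return "302", "reference", or "target" (hypothetical labels)."""
    method = method.upper()
    if method in EASY_METHODS:
        return "302" if kind is RefKind.REDIRECT else "target"
    if method in MEMBERSHIP_METHODS:
        return "reference"
    if method in UNSETTLED_METHODS:
        return "reference" if kind is RefKind.REDIRECT else "target"
    raise ValueError(f"method not covered in the minutes: {method}")
```

For example, `behavior("DELETE", RefKind.DIRECT)` yields `"reference"`, matching the rule that DELETE acts on the reference resource rather than its target.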

Judy noted that for MOVE and DELETE, there appears to be consensus because their semantics are similar to those supported by file systems. The rationale for applying them to the reference resource, rather than its target, is that MOVE and DELETE affect the membership of collections, and it would be undesirable if MOVE and DELETE, through secondary effects, modified the membership of the target collections. There was general agreement in the room on this point.

COPY FOR REFERENCES

Judy went on to the semantics of COPY. For COPY, the expectation is that the destination of a COPY should be a new resource, and operations on that new resource do not affect the original resource. However, what is the expectation if you copy a collection with references? Is the expectation that the new collection will have copies of the references or of the targets? If you want to get 302 in all the same cases then you want to copy the references. If you want to have safe resources to play with then you want the targets to be copied.

Discussion on this topic then ensued. One thread of discussion concerned whether the behavior of COPY on individual resources should be the same as COPY on collections of resources. Some attendees noted that copying collections is a difficult and option-laden activity in operating systems, and in programming languages (e.g. LISP has five different copy operators based on various conditions). Choices in programming languages haven't been encouraging: either they make one wrong choice, which leads to people creating many different types of copy, or they choose one and tell everyone else to go away. It was noted that one source of underlying difficulty with COPY is that the term copy has lots of different meanings for computers, and for paper too.

Larry Masinter then proposed that the safest thing is to perform the least amount of work, and hence copy of a reference should always just copy the reference resource, and not the target. Of the two choices, copy the reference or copy the target, copying the reference is the least amount of work for the server. Mark Day noted that while the choices in the protocol should surprise the least number of people, it's not always possible to avoid surprising a significant part of the population. We have to be ready to make an arbitrary choice because we can't converge on the easy to use solution. Larry Masinter then proposed that the protocol specification document indicate clearly that copy is complicated, and that users will have different expectations of what copy means. There was general agreement in the room for the "do the least amount of work, but document the difficulty" approach.
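A minimal Python sketch of the "copy the reference, not the target" approach follows. It is an illustration under stated assumptions rather than anything defined by the specification: the `Document` and `Reference` classes and the `copy_collection` function are hypothetical, and a collection is modeled simply as a dictionary of named members.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Document:
    body: str


@dataclass
class Reference:
    target_uri: str  # URI of the target resource (hypothetical field name)


Member = Union[Document, Reference]


def copy_collection(members: Dict[str, Member]) -> Dict[str, Member]:
    """Copy a collection, duplicating references rather than their targets."""
    result: Dict[str, Member] = {}
    for name, member in members.items():
        if isinstance(member, Reference):
            # Least work for the server: the new member still points at the
            # original target, so the target itself is never read or copied.
            result[name] = Reference(member.target_uri)
        else:
            result[name] = Document(member.body)
    return result
```

Under this choice the new collection returns 302 (or passes through) in the same cases as the original, but changes made through a copied reference still reach the original target, which is the surprise the attendees wanted documented.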

LOCK FOR REFERENCES

Discussion then moved onto LOCK.

It was noted that returning one 302 response for each reference in a collection of redirect references would cause LOCK to fail (since it has all-or-nothing semantics). Hence, locking the reference is the desired functionality for redirect references. Direct references have their own set of problems.

One attendee noted that there are four possible choices when locking a direct reference. Either lock the target, lock the reference, lock both, or lock neither. Neither is out. Like copy you can make very reasonable cases for all, and like copy the choices seem very arbitrary. Locking the target makes a lot of sense except that you could move or delete the reference. So if the target is locked and someone moves or deletes the reference, that might be surprising to the client. Or it might not be. How often do you go and try to move or delete something that someone has locked?

One person noted that the point of a lock is to protect the contents of the persistent state of the resource. Locking the target would at least honor those semantics. However, another person noted that a lock affects both the contents and the namespace, and a lock on a reference needs to protect against both namespace operations and content modification operations.

A proposal was made that a lock on a direct reference should lock the target, but cause the reference to behave as if it were locked. That is, MOVE and DELETE on the reference would fail if the target were locked. There was some discussion on the impact this might have on the No-passthrough header. There was a proposal that if the target of a reference is locked, then operations that are performed without the No-passthrough header behave as if the reference is also locked. However, if they are supplied with No-passthrough, then they do not. The particular case is DELETE. There is also a need to define how references behave when their target is locked. Some, but not all, people in the room appeared to support this proposal.
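The proposal can be sketched in Python as follows. This is a hypothetical model of the idea discussed in the room, not agreed protocol behavior: the `Server` class, its fields, and the `no_passthrough` parameter (standing in for the No-passthrough header) are assumptions made for illustration, and only direct references and DELETE are modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Server:
    # Maps the URI of a direct reference to the URI of its target.
    references: Dict[str, str] = field(default_factory=dict)
    # URIs of resources that currently hold a lock.
    locked: Set[str] = field(default_factory=set)

    def lock(self, uri: str) -> str:
        """LOCK a URI; a direct reference passes the lock to its target."""
        locked_uri = self.references.get(uri, uri)
        self.locked.add(locked_uri)
        return locked_uri

    def can_delete(self, uri: str, no_passthrough: bool = False) -> bool:
        """Would DELETE on this URI be permitted under the proposal?"""
        if uri in self.locked:
            return False
        target = self.references.get(uri)
        # Without No-passthrough, a reference to a locked target behaves
        # as if the reference itself were locked.
        if target is not None and target in self.locked and not no_passthrough:
            return False
        return True
```

With `/blah` as a direct reference to `/foo`, locking `/blah` locks `/foo`; a plain DELETE on `/blah` is then refused, while a DELETE sent with No-passthrough is allowed.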

IMPACT OF REFERENCES ON URL RESOLUTION

The crux of this issue is, if you create a reference to a collection, are you forcing the server to create references to each member of that collection? The answer of the specification authors is, no you are not, because the server doesn't need all those additional references.

As an example, suppose there is a direct reference called BLAH to a target which is a collection called FOO, and FOO contains a member called BAR. If a client performs a GET on BLAH/BAR, the specification authors say BLAH/BAR is just an alternate URL, while Yaron Goland insists that, from an HTTP perspective, BLAH is a resource, and so is BLAH/BAR. But is it a direct reference? Are there operations which can be performed on FOO/BAR that cannot be applied to BLAH/BAR?
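The authors' view, that BLAH/BAR is simply an alternate URL for FOO/BAR, can be illustrated with a small Python sketch of path resolution. The `resolve` function and the dictionary of direct references are hypothetical, and the sketch assumes a single level of indirection (it does not follow chains of references).

```python
from typing import Dict


def resolve(path: str, direct_refs: Dict[str, str]) -> str:
    """Rewrite a path whose longest matching prefix is a direct reference."""
    segments = [s for s in path.split("/") if s]
    # Try the longest prefix first, so /BLAH/BAR is checked before /BLAH.
    for i in range(len(segments), 0, -1):
        prefix = "/" + "/".join(segments[:i])
        if prefix in direct_refs:
            rest = segments[i:]
            return "/".join([direct_refs[prefix].rstrip("/")] + rest)
    return "/" + "/".join(segments)
```

Here `resolve("/BLAH/BAR", {"/BLAH": "/FOO"})` gives `/FOO/BAR` without the server storing any reference for BLAH/BAR itself, which is why the authors argue that no additional per-member references are needed.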

Filesystems solve this problem by making both foo and blah pointer objects, and ref-counting bar. UNIX prevents this by barring hard-links to a collection.

One attendee suggested that if BLAH is a direct reference to a collection, it should only support operations on BLAH, not on URLs which are BLAH/x. However, this approach was considered, and rejected, in an authors' group teleconference because there is an expectation that references support this kind of namespace operation.

Larry Masinter noted that it is bothersome that BLAH/BAR is not a *direct* reference, or at least behaves like one. You can discover its target, so it quacks like a reference. Only operations on that name -- without no-passthrough -- behave a little differently. Judy responded that even before references came along, you could already have multiple URLs for the same resource -- this is the same.

Yaron stated that there may not be a problem here. However, this namespace redirection action is sufficiently novel that it deserves to be addressed in the specification more than it currently is.

PROPERTY REGISTRY

Jim Whitehead then began discussion on having a registry for WebDAV properties. The discussion began with a brief overview of WebDAV properties. Properties are name/value pairs where each name is a URI, which could be a URL. A nice quality of names being URLs is that if you want to define a new property for your use, then you can create a new property, assign it a URL in a namespace you own, and you can have a fresh name without running into any namespace collisions.

One attendee noted that a problem arises when concatenating namespaces and element names, because the XML community defined namespaces which don't end in separators, and hence you can't deconstruct a namespace with an element name appended. However, other participants noted that WebDAV defines rules for concatenating a namespace and an element name, and noted that the XML namespaces specification delegates to its users the definition of the namespace and element relationship.

Alex Hopmann stated that it is useful to have rules on how you take the on-the-wire names and how you expose those as a single identifier. If you are talking about a property registry where the name is a full URI, then you need a rule for how it is separated into a namespace and a property (element) name.
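The concern about separating a concatenated name can be shown with a brief Python sketch. It is only an illustration of the ambiguity raised by the attendees; it does not reproduce the concatenation rules that WebDAV defines, the functions `join_name` and `candidate_splits` are hypothetical, and the `example.com` URIs are made up.

```python
from typing import List, Tuple


def join_name(namespace: str, element: str) -> str:
    """Plain concatenation of a namespace and an element name."""
    return namespace + element


def candidate_splits(full_name: str) -> List[Tuple[str, str]]:
    """Every (namespace, element) pair that concatenates to full_name."""
    return [(full_name[:i], full_name[i:]) for i in range(1, len(full_name))]


# Two different (namespace, element) pairs produce the same full name when
# the namespace does not end in a separator.
a = join_name("http://example.com/props", "author")
b = join_name("http://example.com/prop", "sauthor")
```

Since `a` and `b` are the same string, a registry keyed on the full URI cannot recover the original pair without an agreed rule for where the namespace ends.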

Jim Whitehead stated that he would note the issue, and moved on. The value of a property is a sequence of well-formed XML. There are cases when it is desirable to register properties. The property may have wider utility than just your client or server, or people will get together to create properties that have wider use. You need to register the property name and the namespace.

One attendee asked whether it makes sense to register an entire namespace, in order to assert that this is a namespace I am going to use, and I want people to know.

Attendees also noted that it would be good to register whether the property is live or dead, and that there should be a URL for getting more (human readable) information about the properties.

Larry Masinter stated, in his view, a property registry is almost useless. What people do with properties is they build protocols out of them. Properties rarely have an independent, separable semantics from other properties. People define a suite of properties and then implement a protocol that goes along with it. The suite of properties mean something together. Registering properties without defining their relationship is like defining a registry of HTTP headers without defining their relationships. Actually... a registry of HTTP headers would be useful, so I take it back, but it is hardly enough to understand what the HTTP headers are. A registry of XML elements doesn't tell you the semantics of the constructions you build out of them. Defining a registry by defining the properties alone is insufficient.

Larry continued, stating that you need the whole schema. Dublin Core kind of goes together. That is why it is a core. If you just took author out of Dublin Core and didn't take provenance or authority, they don't really hold independently without the set.

There was some discussion over the utility of using a single property from a schema. Some participants felt that the author property from Dublin Core was a good example of a property that should be widely reusable. This discussion highlighted a fundamental difference in belief about property reuse: some participants felt that individual properties could be reused, while others felt that individual properties could not be reused, but sets of properties (schemas) which hang together could be.

Another participant asked whether we see any reason why the success or usage of this registry should be a different experience from the registries of directory attributes or MIB objects (experience with these registries has been negative). There was no answer to this.

There was a brief discussion on the proper forum for registering properties. Jim Amsden asked, if the document management community wants to get together and define properties to contain document life cycle information, what is the forum that this group would have to publish that schema of properties to share them, agree on them. There was a suggestion that they publish the schema as an Informational RFC. However, this was perceived as being too formal, and there was concern that other companies would not find this document in the RFCs listing. The web site www.webdav.org was suggested as the place to go to find these pointers. Other organizations might also provide an index.

ACCESS CONTROL

With time already expired, there was a brief discussion on how to move access control forward. Jim Whitehead noted that he has heard that the authors are not able to drive forward with the ACL draft.

There was a suggestion from the participants that, since there is at most one person at a time who is willing to push forward with this stuff, it isn't worth the interest of the IETF. However, Jim W. countered by noting that every time he meets someone implementing a WebDAV server, their first criticism is "where is the access control?" So the fact that no one is working on it here doesn't indicate that there isn't a problem.

Jim Whitehead ended by calling for volunteers to work on the access control protocol specification.

Slides

Collections
Property Registry Rationale