idnits 2.17.1 

draft-ietf-dasl-requirements-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate.  This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** Missing expiration date.  The document expiration date should appear on
     the first and last page.

  == Mismatching filename: the document gives the document name as
     'draft-dasl-requirements-01', but the file name used is
     'draft-ietf-dasl-requirements-00'

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  == The page length should not exceed 58 lines per page, but there was 1
     longer page, the longest (page 1) being 510 lines


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack a Security Considerations section.

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.

  ** The abstract seems to contain references ([WEBDAV]), which it shouldn't.
      Please replace those with straight textual mentions of the documents in
     question.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == Couldn't figure out when the document was first submitted -- there may
     comments or warnings related to the use of a disclaimer for pre-RFC5378
     work that could not be issued because of this.  Please check the Legal
     Provisions document at https://trustee.ietf.org/license-info to determine
     if you need the pre-RFC5378 disclaimer.

  -- Couldn't find a document date in the document -- date freshness check
     skipped.

  -- Found something which looks like a code comment -- if you have code
     sections in the document, please surround them with '<CODE BEGINS>' and
     '<CODE ENDS>' lines.


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Unexpected draft version: The latest known version of 
     draft-alvestrand-charset-policy is -01, but you're referring to -02.

  ** Obsolete normative reference: RFC 2068 (ref. 'HTTP') (Obsoleted by RFC
     2616)

  -- Possible downref: Normative reference to a draft: ref. 'SCENARIOS' 

  ** Obsolete normative reference: RFC 2518 (ref. 'WEBDAV') (Obsoleted by RFC
     4918)


     Summary: 8 errors (**), 0 flaws (~~), 4 warnings (==), 4 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1	INTERNET-DRAFT                                               Jim Davis
2	draft-dasl-requirements-01.txt                       Xerox Corporation
3	Feb 24, 1999                                              Saveen Reddy
4	Expires August 24, 1999                          Microsoft Corporation
5	                                                          Judith Slein
6	                                                     Xerox Corporation

8	Requirements for DAV Searching and Locating

10	Status of this Memo

12	     This document is an Internet-Draft and is in full conformance
13	     with all provisions of Section 10 of RFC2026.

15	     Internet-Drafts are working documents of the Internet Engineering
16	     Task Force (IETF), its areas, and its working groups.  Note that
17	     other groups may also distribute working documents as
18	     Internet-Drafts.

20	     Internet-Drafts are draft documents valid for a maximum of six
21	     months and may be updated, replaced, or obsoleted by other
22	     documents at any time.  It is inappropriate to use Internet-
23	     Drafts as reference material or to cite them other than as
24	     "work in progress."

26	     The list of current Internet-Drafts can be accessed at
27	     http://www.ietf.org/ietf/1id-abstracts.txt

29	     The list of Internet-Draft Shadow Directories can be accessed at
30	     http://www.ietf.org/shadow.html.

32	     This document is a product of the DAV Searching and Locating
33	     (DASL) Working Group of the IETF. Please send comments to the
34	     mailing list at:
35	       www-webdav-dasl@w3.org
36	     This list may be joined by sending a message with subject
37	     "subscribe" to:
38	       www-webdav-dasl-request@w3.org

40	     Discussions of the list are archived at:
41	       http://www.w3.org/pub/WWW/Archives/Public/www-webdav-dasl

43	Abstract

45	The Distributed Authoring and Versioning protocol [WEBDAV] defines
46	simple mechanisms to assign and retrieve values for properties. This
47	document presents requirements for a WebDAV extension to support
48	efficient searching for resources based on WebDAV properties and
49	content. These requirements are intended to be the basis for the DAV
50	Searching and Locating (DASL) protocol.

52	1. Introduction

54	Motivation for DASL

56	WebDAV and HTTP provide support for client-side search, but not server-
57	side search. The GET method defined in [HTTP] allows clients to
58	retrieve a resource's content; the PROPFIND method defined in [WEBDAV]
59	allows clients to retrieve a resource's properties. Having retrieved a
60	resource's properties and/or content, the client can compare them to
61	its search criteria to determine whether the resource is of interest.
62	Although this client-side searching is logically sufficient, and
63	requires no modifications to the server, it comes at a significant
64	cost, because it makes inefficient use of network resources. A client
65	must retrieve properties and content for each resource under
66	consideration. Furthermore, it does not take advantage of server
67	intelligence. Servers capable of searching can use sophisticated
68	mechanisms to generate results: internal caching of intermediate search
69	results, content-indexing, etc.

71	Even simple, common queries may expose these limitations. Consider the
72	query "find all text files modified during the last week." When such a
73	query is extended to a large number of clients searching against a
74	single server, the limitations become more apparent. Client-side
75	searching has difficulties scaling in these cases.

77	DASL allows for server-side searching. Server-side searching allows the
78	client to formulate a query and have the server perform the task of
79	selecting the resources that fit the criteria. This overcomes both of
80	the limitations of client-side searching described above. The benefit
81	is a searching solution that scales; the cost is that the server
82	software becomes more complex.
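
The trade-off described above can be illustrated with a small in-memory
sketch. It is not a DASL implementation: the SERVER dictionary, the
"age_days" property, and both function names are assumptions made for
illustration. Each function returns the matching resources together
with the number of property sets that would cross the network.

```python
# Hypothetical in-memory "server": maps resource URL -> properties.
SERVER = {
    "/docs/a.txt": {"type": "text/plain", "age_days": 2},
    "/docs/b.txt": {"type": "text/plain", "age_days": 30},
    "/docs/c.png": {"type": "image/png", "age_days": 1},
}

def matches(props):
    """The example query: text files modified during the last week."""
    return props["type"].startswith("text/") and props["age_days"] <= 7

def client_side_search(server):
    """Fetch every resource's properties, then filter on the client."""
    transferred = 0
    hits = []
    for url, props in server.items():
        transferred += 1  # one property set crosses the network per resource
        if matches(props):
            hits.append(url)
    return hits, transferred

def server_side_search(server):
    """The server filters; only matching records cross the network."""
    hits = [url for url, props in server.items() if matches(props)]
    return hits, len(hits)
```

Both functions find the same resource, but the client-side variant
transfers one property set per resource in scope, while the server-side
variant transfers only the matches.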

84	This document presents requirements for any protocol that might be
85	proposed for DASL. These requirements come from considerations of the
86	scenarios presented in [SCENARIOS], from the need to support the WebDAV
87	object model, the use of HTTP, and general IETF rules. We provide
88	rationale for those requirements whose justification is not obvious.
89	We assign each requirement a priority, one or two, where one is higher.
90	The significance of the number is that priority one requirements are
91	those that any protocol must define to be considered successful, whereas
92	priority two requirements are those that are desirable but not
93	necessary. There are no priority three requirements at present.

95	2. Terminology

97	scope
98	        a set of resources to be searched.
99	criteria
100	        an expression against which each resource in the search scope
101	        is evaluated.
102	result set
103	        a set of records, one for each resource for which the search
104	        criteria evaluated to True.
105	record
106	        a description of a resource. A result record is a set of
107	        properties, and possibly other descriptive information.
108	result
109	        A result is a result set, optionally augmented with other
110	        information describing the search as a whole.
111	result record definition
112	        a specification of the set of properties to be returned in the
113	        result record.
114	sort specification
115	        a specification of an ordering on the result records in the
116	        result set.
117	search modifier
118	        an instruction that governs the execution of the query but is
119	        not part of the search scope, result record definition, the
120	        search criteria, or the sort specification. An example of a
121	        search modifier is one that controls how much time the server
122	        can spend on the query before giving a response.
123	query
124	        A query is a combination of a search scope, search criteria,
125	        result record definition, sort specification, and a search
126	        modifier.
127	query grammar
128	        a set of definitions of XML elements, attributes, and
129	        constraints on their relations and values that defines a set of
130	        queries and the intended semantics.
131	schema
132	        a listing, for any given grammar and scope, of the properties
133	        and operators that may be used in a query with that grammar and
134	        scope.
135	hit highlighting
136	        a specification of the location(s) within a resource
137	        containing text that matched a content query. It allows clients
138	        to provide visual cues to a user to identify segments in a text
139	        resource that cause it to match content-based queries.
140	paged results
141	        allows a client to request that the server return a subset of
142	        the result set rather than the entire set. In subsequent calls
143	        to the server, additional results from the same query can be
144	        requested. Paged results are intended to improve the
145	        performance and manageability of search results.

147	In addition to the terms defined above, this document uses terminology
148	consistent with [HTTP] and [WEBDAV].
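
As an informal aid, the terms defined in this section can be pictured
as a small data model. The sketch below is one possible reading, not a
DASL-defined structure: the class and field names, the string syntax
used for the criteria, and the "time_limit_seconds" modifier are all
assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Query:
    scope: List[str]                   # the set of resources to be searched
    criteria: str                      # expression evaluated against each resource
    record_definition: List[str]       # properties to return in each result record
    sort_specification: List[str] = field(default_factory=list)
    search_modifiers: Dict[str, int] = field(default_factory=dict)

@dataclass
class Result:
    result_set: List[Dict[str, str]]   # one record per matching resource
    info: Dict[str, str] = field(default_factory=dict)  # data about the search as a whole

# A hypothetical query; the criteria syntax is invented for this example.
example = Query(
    scope=["/docs/"],
    criteria='getcontenttype = "text/plain"',
    record_definition=["displayname", "getlastmodified"],
    sort_specification=["getlastmodified"],
    search_modifiers={"time_limit_seconds": 10},
)
```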

150	Requirements are divided into five categories, and numbered within each
151	category. The categories are Scope, Criteria, Results, Other, and
152	Discovery.

154	3. Requirements: Scope

156	S1: It is possible to specify at least one resource in the scope (P1).
157	It is possible to specify a set of distinct, unrelated resources in the
158	scope (P2).
159	        As this is the first requirement in the document, we explain
160	        the notation. S1 means this is requirement one in the Scope
161	        section, P1 means that the requirement to have at least one
162	        resource in scope is essential, and P2 means that allowing more
163	        than one is nice but not required.

165	        Rationale: Supporting multiple resources in scope could be
166	        difficult to define, because distinct resources may have
167	        different sets of metadata, support different operators, or
168	        have different access rights.

170	S2: It is possible to specify a WebDAV collection as a scope (P1).

172	S3: It is possible to specify other types of resources in a scope (P2).
173	        Rationale: A client might wish to determine whether a given
174	        resource was of interest without transferring it.

176	S4: When the scope is a collection, it is possible to specify the depth
177	(P1).
178	        Users often intend to scope their searches either to the
179	        immediate children of a container or to extend the search
180	        recursively to all of the container's descendants. Furthermore,
181	        depth control is needed to prevent servers from performing
182	        unnecessary work.
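
The effect of a depth limit on a collection scope can be sketched as
follows. The TREE dictionary and the members() function are
hypothetical, and the convention that depth=None means "unlimited" is
an assumption of this example rather than anything DASL specifies.

```python
# Hypothetical collection tree: each collection URL maps to its immediate children.
TREE = {
    "/docs/": ["/docs/a.txt", "/docs/sub/"],
    "/docs/sub/": ["/docs/sub/b.txt", "/docs/sub/deep/"],
    "/docs/sub/deep/": ["/docs/sub/deep/c.txt"],
}

def members(collection, depth):
    """Return members of `collection` down to `depth` levels.

    depth=1 means immediate children only; depth=None means unlimited.
    """
    found = []
    for child in TREE.get(collection, []):
        found.append(child)
        # Descend only into sub-collections, and only while depth remains.
        if child in TREE and (depth is None or depth > 1):
            next_depth = None if depth is None else depth - 1
            found.extend(members(child, next_depth))
    return found
```

With depth=1 the server examines two resources; with unlimited depth it
examines all five, which is the extra work the depth control lets a
client avoid.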

184	4. Requirements: Criteria

186	Criteria generalities

188	C1: It is possible to search properties in a query (P1). It is possible
189	to search both DAV-defined and application-defined properties in a
190	query (P1).

192	        Further requirements for properties are below.

194	C2: It is possible to search content in a query (P1).
195	        Note that at this writing, unlike property searches, there is
196	        no single widely accepted semantics for content-based queries.
197	        Further requirements for content criteria are below.

199	C3: It is possible to search both properties and content in a single
200	query.

202	C4: It is possible to combine criteria with Boolean operators (i.e.
203	and, or, not) (P1).

205	Criteria for properties

207	C5: It is possible to include undefined properties in a query without
208	error (P1).
209	        Rationale: This arises from the property model of DAV. Unlike
210	        the more familiar relational model, DAV does not define tables
211	        or schema for resources, hence there is no guarantee that all
212	        properties will be defined for all resources. Moreover, DAV
213	        allows a client to store arbitrary properties on arbitrary
214	        resources. Therefore DASL must support queries that use
215	        properties that are not defined on all resources in the scope.
216	        If such a query failed, there would be no way to locate the
217	        desired resources.

219	C5.1: It is possible to test whether a property is defined (P1).
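
A minimal sketch of C5 and C5.1 follows. Treating a comparison against
an undefined property as "does not match" is only one possible
semantics, chosen here for illustration; the requirement itself says
only that such a query must not fail. All names are hypothetical.

```python
# Hypothetical resources with differing property sets, as DAV permits.
RESOURCES = {
    "/a": {"author": "Davis", "status": "draft"},
    "/b": {"author": "Reddy"},  # no "status" property defined
}

def is_defined(props, name):
    """C5.1: test whether a property is defined on a resource."""
    return name in props

def equals(props, name, constant):
    """Compare a property to a constant; an undefined property fails to match."""
    return is_defined(props, name) and props[name] == constant

def search(resources, predicate):
    """Return the sorted URLs of resources whose properties satisfy `predicate`."""
    return sorted(url for url, props in resources.items() if predicate(props))
```

For example, search(RESOURCES, lambda p: equals(p, "status", "draft"))
selects only "/a" and raises no error for "/b", which lacks the
property.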

221	C6.1: It is possible to compare a property value to a constant
222	value (P1).

224	C6.2.1: It is possible to compare property values to other properties
225	of the same resource (P2).

227	C6.2.2: It is possible to compare property values to other properties
228	of other resources (P2).

230	        Note that this may involve a "join". We do not expect the first
231	        version of the DASL protocol to meet this requirement.

233	C6.3: It is possible to compare property values to results of
234	expressions (P2).

236	C6.4: It is possible to match property values with string-ending
237	wildcards (P1). It is possible to match property values with pattern
238	matching operators similar to the SQL "like" operator or regular
239	expressions (P2).

241	        The minimum is necessary to enable DASL to locate resources by
242	        content type, e.g. to locate all image files by comparison with
243	        "image/*". More powerful comparisons are useful when strings
244	        encode structured data such as times or lists. Note that these
245	        are constraints on what the protocol must define, not on what
246	        servers must necessarily implement.
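
The minimum (P1) form of C6.4 can be sketched as a prefix test. The
function name is hypothetical, and the choice of "*" as the wildcard
character simply follows the "image/*" example above.

```python
def matches_trailing_wildcard(value, pattern):
    """Match `value` against `pattern`, where only a final '*' is special."""
    if pattern.endswith("*"):
        # Everything before the '*' must be a prefix of the value.
        return value.startswith(pattern[:-1])
    return value == pattern
```

Under this sketch, "image/png" matches "image/*" and "text/plain" does
not.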

248	C6.5: It is possible to compare property values taking into account
249	their structure (P2).

251	        Explanation: Some WebDAV properties are defined to contain
252	        strings (e.g. DAV:getcontenttype), but others contain
253	        structured values (e.g., DAV:resourcetype, DAV:lockdiscovery).
254	        Support for structured value criteria is needed, for example,
255	        to locate resources locked in a certain manner by a certain
256	        principal. The working group consensus is that this feature,
257	        while undeniably very useful, is so difficult to define that it
258	        is better for DASL to proceed than attempt to define it. Also,
259	        there is much activity in the W3C to define an XML query
260	        language, and it was felt better to wait for this to complete
261	        than to define a competing standard.

263	C7.1: The protocol defines an equality operator (P1).

265	C7.2: The protocol defines relative operators (P1).

267	C8: The protocol defines means to specify case sensitivity (P1).

269	        Note this does not say that all DASL servers must support both
270	        case-sensitive and case-insensitive comparisons, but only that
271	        the protocol must be able to express a client's preference, and
272	        define behavior in the case where the server cannot support
273	        that preference.
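
One way to picture C8 is a comparison that takes the client's
preference as a parameter. In this sketch the server signals an
unsupported preference by raising an error; that behavior, the
exception name, and the "server_supports" parameter are assumptions,
since this document requires only that the protocol define some
behavior for that case.

```python
class UnsupportedComparison(Exception):
    """Raised when the server cannot honour the requested case sensitivity."""

def compare(a, b, case_sensitive, server_supports=(True, False)):
    """Equality test honouring the client's case-sensitivity preference.

    `server_supports` lists the case_sensitive values this hypothetical
    server is able to handle.
    """
    if case_sensitive not in server_supports:
        raise UnsupportedComparison(f"case_sensitive={case_sensitive} not supported")
    if case_sensitive:
        return a == b
    return a.casefold() == b.casefold()
```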

275	C9: The protocol supports language-specific definitions for string
276	comparison and sorting (P1).

278	        Different cultures define different rules for string
279	        comparison, e.g. for collating sequence and for significance of
280	        diacritics. Cross-language comparison is out of scope for DASL,
281	        but comparisons within the same language must be done with the
282	        appropriate semantics.

284	Criteria for content searches

286	C10: It is possible to search content of any text media type (P1). The
287	definition of "searching content" for DASL means locating sequences of
288	characters in the contents of the resource.

290	        DASL defines no requirements for searching for structure within
291	        text media types (e.g. for finding character strings only
292	        within certain HTML tags.) This functionality is too
293	        complicated to specify at the present time.

295	        DASL defines no requirements for searching other media types
296	        that might contain text (e.g. subtypes of application).
297	        Searching non-text media types (e.g. images, audio) is out of
298	        scope for DASL.

300	C11.1: It is possible to search for words that are within a specified
301	number of words (or, for some languages, characters) of each
302	other (P1).

304	        This is often called 'near' search. It is used to locate
305	        concepts that can be expressed in more than one way using the
306	        same set of words, e.g. one might locate both "the President's
307	        impeachment" and "the impeachment of the President".
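
A 'near' search over words can be sketched as below. The tokenizer is
an assumption: it splits on runs of ASCII letters and ignores case,
which is adequate for the English example above but not for languages
where distance is better measured in characters.

```python
import re

def near(text, word_a, word_b, max_distance):
    """True if word_a and word_b occur within max_distance words of each other."""
    words = [w.lower() for w in re.findall(r"[A-Za-z]+", text)]
    pos_a = [i for i, w in enumerate(words) if w == word_a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == word_b.lower()]
    # Compare every occurrence of one word with every occurrence of the other.
    return any(abs(i - j) <= max_distance for i in pos_a for j in pos_b)
```

With a distance of three words, both phrasings in the example match a
search for "president" near "impeachment".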

309	C11.2: It is possible to search for words that occur within the same
310	grammatical context, e.g. same phrase, sentence, or paragraph (P2).

312	        This is sometimes called 'in' search.

314	C12.1: It is possible for a client to control whether a content search
315	does or does not use a stemming comparison (P2).

317	C12.2: It is possible for a client to request comparisons using
318	phonetic similarity (e.g. soundex) (P2).

320	C12.3: It is possible for the client to request keyword expansion
321	(thesaurus expansion) (P2).

323	C13: It is possible for a client to conduct a relevance search (P2). In
324	such a search, the query consists of a set of words (perhaps an entire
325	resource), and the result is a list of resources whose contents most
326	closely resemble the query, sorted in decreasing order of resemblance.

328	5. Requirements: Results

330	R1: It is possible to specify a sorting for the result set (P1).

332	R2: It is possible to specify a set of properties to be returned in the
333	result records, distinct from the properties in criteria (P1).

335	        For example, a query might ask for "the authors of those
336	        documents under 10K in size". In this case, the criterion
337	        relates only to the size, but the desired result record
338	        contains only the author.
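
The example can be sketched in a few lines. The DOCS list, the
select() function, and the reading of "10K" as 10 * 1024 bytes are
assumptions made for illustration. The criterion reads only the size,
while the result records carry only the author.

```python
# Hypothetical property sets; "size" is in bytes.
DOCS = [
    {"name": "memo.txt", "author": "Slein", "size": 4_096},
    {"name": "spec.doc", "author": "Davis", "size": 250_000},
    {"name": "note.txt", "author": "Reddy", "size": 900},
]

def select(docs, criterion, properties, sort_by=None):
    """Filter by `criterion`, optionally sort, then keep only `properties`."""
    hits = [d for d in docs if criterion(d)]
    if sort_by is not None:
        hits.sort(key=lambda d: d[sort_by])
    return [{p: d[p] for p in properties} for d in hits]

# "The authors of those documents under 10K in size", sorted by author (R1).
small_authors = select(DOCS, lambda d: d["size"] < 10 * 1024, ["author"], sort_by="author")
```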

340	R3: It is possible for a client to request limits on the resources
341	consumed in creating or transmitting the result set (P1).

343	        Some queries can potentially return very large result sets.
344	        Clients that are good citizens will voluntarily limit the size
345	        of such results. In addition, some servers may charge money for
346	        queries.

348	R3.1: It is possible for a client to limit the number of records in the
349	result set (P1).

351	        This is the most meaningful unit of resource consumption to the
352	        client.

354	R4: It is possible for the server to return fewer result records than
355	match the criteria (P1).

357	        "Client proposes, server disposes".

359	R5: It is possible for a client to request paged results (P1).

361	        Paged retrieval is necessary if result sets are very large and
362	        if clients must also present a responsive interface to a user.
363	        Note that this requirement is silent about whether a server
364	        implements paged results by storing results from a query or
365	        recalculating them as needed.
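
Requirements R3.1, R4, and R5 interact, and a short sketch may help.
It assumes the server holds the full result set in memory, which this
document deliberately leaves open; the function name, the "server_cap"
parameter, and the use of a start index are likewise assumptions.
Limits are assumed to be positive.

```python
def page(result_set, start, client_limit, server_cap=100):
    """Return one page of results plus the index at which the next page starts.

    The server returns at most min(client_limit, server_cap) records,
    so it may return fewer than the client asked for (R3.1, R4).
    next_start is None when no further results remain.
    """
    size = min(client_limit, server_cap)
    chunk = result_set[start:start + size]
    next_start = start + len(chunk)
    if next_start >= len(result_set):
        next_start = None
    return chunk, next_start
```

A client would call page() repeatedly, passing the returned next_start
back in, until it receives None.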

367	6. Requirements: Other

369	O1: It is possible to support multiple query grammars (P1).

371	        Rationale: A particular query grammar may not expose all the
372	        useful searching functionality of a server. Clients should be
373	        allowed to query a server using any grammar that takes
374	        advantage of those special server capabilities. This
375	        requirement also allows DASL to define an initial limited query
376	        grammar which meets all the mandatory requirements without
377	        needing to address all the desirable, but non-mandatory
378	        requirements.

380	O2: It is possible to extend the basic grammar defined by DASL (P1).

382	O3: It is possible for the server to redirect a query (P1).

384	        This is useful when a server is not able to search a given
385	        scope, but can refer the client to another server which is able
386	        to search the scope.

388	O4: It is possible for the client to request hit highlighting (P2).

390	7. Requirements: Discovery

392	D1: It is possible for a client to discover the set of query grammars
393	supported by a server (P1).

395	        Without this, it is not very useful for servers to support
396	        multiple grammars.

398	D2: It is possible for a client to discover the schema supported by a
399	server for a particular grammar with a particular scope (P1).

401	        Note that the schema may differ depending on the scope. Query
402	        schema discovery allows a client to use optional properties and
403	        operators supported by a server.
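
Discovery under D1 and D2 can be pictured as a lookup keyed first by
grammar and then by scope. The SUPPORTED structure, the grammar name
"basic", and the operator names below are invented for illustration;
this document does not define how discovery data is represented.

```python
# Hypothetical discovery data a server might expose; names are illustrative only.
SUPPORTED = {
    "basic": {                      # grammar name
        "/docs/": {                 # scope
            "properties": {"displayname", "getcontenttype"},
            "operators": {"eq", "lt", "gt"},
        },
    },
}

def grammars(supported):
    """D1: the set of query grammars the server supports."""
    return set(supported)

def schema(supported, grammar, scope):
    """D2: the properties and operators usable with a grammar and scope, or None."""
    return supported.get(grammar, {}).get(scope)

def usable(supported, grammar, scope, prop, operator):
    """Check a (property, operator) pair against the discovered schema."""
    s = schema(supported, grammar, scope)
    return s is not None and prop in s["properties"] and operator in s["operators"]
```

A client could use such a check to avoid sending a query that relies on
an operator the server does not offer for the chosen scope.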

405	D3: It is possible for a client to determine information about the
406	properties within a scope (P2).

408	        This information can enable a user interface to help a user to
409	        construct a valid query, for example by providing meaningful
410	        names for properties, constraints on values, hints about data
411	        type, and so on, or information about expected performance, for
412	        example whether a property is indexed (and hence more quickly
413	        searched).

415	8. External Requirements

417	DASL must describe how to perform searches on internationalized content
418	and properties. This is in keeping with IETF policy.

420	Information intended for user comprehension must conform to the IETF
421	Character Set Policy [CHAR].

423	The WebDAV working group is currently addressing the standardization of
424	mechanisms for authors to submit variants and versions of resources, or
425	for means of exposing access control. DASL should provide mechanisms
426	that can query for variants, versions, and access control but cannot
427	do so until they are defined. Likewise, DASL may contribute
428	requirements to access control (e.g. control over querying).

430	9. Related Work

432	Z39.50: "Information Retrieval (Z39.50): Application Service Definition
433	and Protocol Specification".
434	http://lcweb.loc.gov/z3950/agency/

436	Z39.50 Profile for Simple Distributed Search and Ranked Retrieval
437	http://lcweb.loc.gov/z3950/agency/profiles/zdsr.html

439	The STARTS Protocol
440	http://www-db.stanford.edu/~gravano/starts.html

442	The Harvest Information Discovery and Access System
443	http://mordor.transarc.com/afs/transarc.com/public/trg/Harvest/

445	10. References

447	[CHAR]      H.T. Alvestrand, "IETF Policy on Character Sets and
448	            Languages", June 1997, internet-draft, work-in-progress,
449	            draft-alvestrand-charset-policy-02.txt.

451	[HTTP]      R. Fielding, J. Gettys, J. C. Mogul, H. Frystyk, and
452	            T. Berners-Lee, "Hypertext Transfer Protocol -- HTTP/1.1",
453	            RFC 2068, U.C. Irvine, DEC, MIT/LCS, January 1997.

455	[SCENARIOS] R. Henderson et al., "Scenarios for DAV Searching and
456	            Locating", work in progress,
457	            draft-henderson-dasl-scenarios-00.html, September 18, 1998
458	            (Expires Mar 23, 1999).

460	[WEBDAV]    Y. Y. Goland, E. J. Whitehead, Jr., A. Faizi, S. R. Carter,
461	            D. Jensen, "Extensions for Distributed Authoring and
462	            Versioning on the World Wide Web", IETF Proposed Standard,
463	            RFC 2518

465	11. Authors' Addresses

467	        Jim Davis
468	        Xerox Corporation
469	        3333 Coyote Hill Road
470	        Palo Alto, CA 94304
471	        Email: jdavis@parc.xerox.com

473	        Saveen Reddy
474	        Microsoft Corporation
475	        One Microsoft Way
476	        Redmond WA, 9085-6933
477	        Email: saveenr@microsoft.com

479	        Judith Slein
480	        Xerox Corporation
481	        800 Phillips Road 105-50C
482	        Webster, NY 14580
483	        Email: slein@wrc.xerox.com