idnits 2.17.1

draft-ietf-dasl-requirements-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate.  This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** Missing expiration date.  The document expiration date should appear on
     the first and last page.

  == Mismatching filename: the document gives the document name as
     'draft-dasl-requirements-01', but the file name used is
     'draft-ietf-dasl-requirements-00'

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  == The page length should not exceed 58 lines per page, but there was 1
     longer page, the longest (page 1) being 510 lines

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack a Security Considerations section.

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.

  ** The abstract seems to contain references ([WEBDAV]), which it
     shouldn't.  Please replace those with straight textual mentions of the
     documents in question.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == Couldn't figure out when the document was first submitted -- there may
     comments or warnings related to the use of a disclaimer for pre-RFC5378
     work that could not be issued because of this.
     Please check the Legal Provisions document at
     https://trustee.ietf.org/license-info to determine if you need the
     pre-RFC5378 disclaimer.

  -- Couldn't find a document date in the document -- date freshness check
     skipped.

  -- Found something which looks like a code comment -- if you have code
     sections in the document, please surround them with '<CODE BEGINS>' and
     '<CODE ENDS>' lines.

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Unexpected draft version: The latest known version of
     draft-alvestrand-charset-policy is -01, but you're referring to -02.

  ** Obsolete normative reference: RFC 2068 (ref. 'HTTP') (Obsoleted by RFC
     2616)

  -- Possible downref: Normative reference to a draft: ref. 'SCENARIOS'

  ** Obsolete normative reference: RFC 2518 (ref. 'WEBDAV') (Obsoleted by RFC
     4918)

     Summary: 8 errors (**), 0 flaws (~~), 4 warnings (==), 4 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

INTERNET-DRAFT                                                Jim Davis
draft-dasl-requirements-01.txt                        Xerox Corporation
Feb 24, 1999                                               Saveen Reddy
Expires August 24, 1999                           Microsoft Corporation
                                                           Judith Slein
                                                      Xerox Corporation

             Requirements for DAV Searching and Locating

Status of this Memo

This document is an Internet-Draft and is in full conformance
with all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as
Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other
documents at any time.
It is inappropriate to use Internet-Drafts as reference material or to
cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

This document is a product of the DAV Searching and Locating
(DASL) Working Group of the IETF. Please send comments to the
mailing list at:
     www-webdav-dasl@w3.org
This list may be joined by sending a message with subject
"subscribe" to:
     www-webdav-dasl-request@w3.org

Discussions on the list are archived at:
     http://www.w3.org/pub/WWW/Archives/Public/www-webdav-dasl

Abstract

The Distributed Authoring and Versioning protocol [WEBDAV] defines
simple mechanisms to assign and retrieve values for properties. This
document presents requirements for a WebDAV extension to support
efficient searching for resources based on WebDAV properties and
content. These requirements are intended to be the basis for the DAV
Searching and Locating (DASL) protocol.

1. Introduction

Motivation for DASL

WebDAV and HTTP provide support for client-side search, but not
server-side search. The GET method defined in [HTTP] allows clients to
retrieve a resource's content; the PROPFIND method defined in [WEBDAV]
allows clients to retrieve a resource's properties. Having retrieved a
resource's properties and/or content, the client can compare them to
its search criteria to determine whether the resource is of interest.
Although this client-side searching is logically sufficient, and
requires no modifications to the server, it comes at a significant
cost, because it makes inefficient use of network resources. A client
must retrieve properties and content for each resource under
consideration. Furthermore, it does not take advantage of server
intelligence.
Servers capable of searching can use sophisticated mechanisms to
generate results: internal caching of intermediate search results,
content-indexing, etc.

Even simple, common queries may expose these limitations. Consider the
query "find all text files modified during the last week." When such a
query is extended to a large number of clients searching against a
single server, the limitations become more apparent. Client-side
searching has difficulties scaling in these cases.

DASL allows for server-side searching. Server-side searching allows the
client to formulate a query and have the server perform the task of
selecting the resources that fit the criteria. This overcomes both of
the limitations of client-side searching described above. The benefit
is a searching solution that scales; the cost is that the server
software becomes more complex.

This document presents requirements for any protocol that might be
proposed for DASL. These requirements come from considerations of the
scenarios presented in [SCENARIOS], from the need to support the WebDAV
object model, the use of HTTP, and general IETF rules. We provide
rationale for those requirements whose justification is not obvious.
We assign each requirement a priority, one or two, where one is higher.
The significance of the number is that priority one requirements are
those that any protocol must define to be considered successful,
whereas priority two requirements are those that are desirable but not
necessary. There are no priority three requirements at present.

2. Terminology

scope
     a set of resources to be searched.
criteria
     an expression against which each resource in the search scope
     is evaluated.
result set
     a set of records, one for each resource for which the search
     criteria evaluated to True.
record
     a description of a resource.
     A result record is a set of properties, and possibly other
     descriptive information.
result
     A result is a result set, optionally augmented with other
     information describing the search as a whole.
result record definition
     a specification of the set of properties to be returned in the
     result record.
sort specification
     a specification of an ordering on the result records in the
     result set.
search modifier
     an instruction that governs the execution of the query but is
     not part of the search scope, result record definition, the
     search criteria, or the sort specification. An example of a
     search modifier is one that controls how much time the server
     can spend on the query before giving a response.
query
     A query is a combination of a search scope, search criteria,
     result record definition, sort specification, and a search
     modifier.
query grammar
     a set of definitions of XML elements, attributes, and
     constraints on their relations and values that defines a set of
     queries and the intended semantics.
schema
     a listing, for any given grammar and scope, of the properties
     and operators that may be used in a query with that grammar and
     scope.
hit highlighting
     a specification of the location(s) within a resource
     containing text that matched a content query. It allows clients
     to provide visual cues to a user to identify the segments of a
     text resource that cause it to match content-based queries.
paged results
     a facility that allows a client to request that the server
     return a subset of the result set rather than the entire set.
     In subsequent calls to the server, additional results from the
     same query can be requested. Paged results are intended to
     improve the performance and manageability of search results.

In addition to the terms defined above, this document uses terminology
consistent with [HTTP] and [WEBDAV].
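
As an illustration only, the terms defined above can be modeled in a
short Python sketch. Nothing here is DASL protocol syntax: the Query
class, the run_query function, the in-memory store, and the property
names "size" and "author" are all hypothetical, chosen to show how a
scope, criteria, a result record definition, a sort specification, and
a search modifier might combine to produce a result set.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical stand-in for a resource's properties: name -> value.
Properties = Dict[str, object]


@dataclass
class Query:
    scope: List[str]                        # URIs of the resources to search
    criteria: Callable[[Properties], bool]  # evaluated against each resource
    record_definition: List[str]            # properties returned in each record
    sort_property: Optional[str] = None     # sort specification (one property)
    max_records: Optional[int] = None       # search modifier: cap on result size


def run_query(store: Dict[str, Properties], query: Query) -> List[Properties]:
    """Build the result set: one record per in-scope resource that matches."""
    matches = [
        (uri, store[uri])
        for uri in query.scope
        if uri in store and query.criteria(store[uri])
    ]
    if query.sort_property is not None:
        name = query.sort_property
        # Resources lacking the sort property are placed last.
        matches.sort(key=lambda m: (name not in m[1], m[1].get(name)))
    if query.max_records is not None:
        matches = matches[: query.max_records]
    return [
        {"href": uri, **{p: props.get(p) for p in query.record_definition}}
        for uri, props in matches
    ]


# Assumed sample data; sizes are in bytes.
store = {
    "/docs/a.txt": {"size": 5000, "author": "alice"},
    "/docs/b.txt": {"size": 20000, "author": "bob"},
    "/docs/c.txt": {"size": 100, "author": "carol"},
}
query = Query(
    scope=list(store),
    criteria=lambda props: props.get("size", 0) < 10240,
    record_definition=["author"],
    sort_property="size",
    max_records=1,
)
records = run_query(store, query)
```

In this sketch the criteria test only the size, while each record
carries only the author plus the resource's URI, so the properties
searched and the properties returned are independent of each other.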

Requirements are divided into five categories, and numbered within each
category. The categories are Scope, Criteria, Results, Other, and
Discovery.

3. Requirements: Scope

S1: It is possible to specify at least one resource in the scope (P1).
It is possible to specify a set of distinct, unrelated resources in the
scope (P2).
     As this is the first requirement in the document, we explain
     the notation. S1 means this is requirement one in the Scope
     section, P1 means that the requirement to have at least one
     resource in scope is essential, and P2 means that allowing more
     than one is nice but not required.

     Rationale: Supporting multiple resources in scope could be
     difficult to define, because distinct resources may have
     different sets of metadata, support different operators, or
     have different access rights.

S2: It is possible to specify a WebDAV collection as a scope (P1).

S3: It is possible to specify other types of resources in a scope (P2).
     Rationale: A client might wish to determine whether a given
     resource was of interest without transferring it.

S4: When the scope is a collection, it is possible to specify the depth
(P1).
     Users often intend either to limit a search to the immediate
     children of a container or to extend it recursively to all of
     the container's descendants. Furthermore, depth control is
     needed to prevent servers from performing unnecessary work.

4. Requirements: Criteria

Criteria generalities

C1: It is possible to search properties in a query (P1). It is possible
to search both DAV-defined and application-defined properties in a
query (P1).

     Further requirements for properties are below.

C2: It is possible to search content in a query (P1).
     Note that at this writing, unlike property searches, there is
     no single widely accepted semantics for content-based queries.
     Further requirements for content criteria are below.

C3: It is possible to search both properties and content in a single
query.

C4: It is possible to combine criteria with Boolean operators (i.e.
and, or, not) (P1).

Criteria for properties

C5: It is possible to include undefined properties in a query without
error (P1).
     Rationale: This arises from the property model of DAV. Unlike
     the more familiar relational model, DAV does not define tables
     or schema for resources, hence there is no guarantee that all
     properties will be defined for all resources. Moreover, DAV
     allows a client to store arbitrary properties on arbitrary
     resources. Therefore DASL must support queries that use
     properties that are not defined on all resources in the scope.
     If such a query failed, there would be no way to locate the
     desired resources.

C5.1: It is possible to test whether a property is defined (P1).

C6.1: It is possible to compare a property value to a constant
value (P1).

C6.2.1: It is possible to compare property values to other properties
of the same resource (P2).

C6.2.2: It is possible to compare property values to other properties
of other resources (P2).

     Note that this may involve a "join". We do not expect the first
     version of the DASL protocol to meet this requirement.

C6.3: It is possible to compare property values to results of
expressions (P2).

C6.4: It is possible to match property values with string-ending
wildcards (P1). It is possible to match property values with pattern
matching operators similar to the SQL "like" operator or regular
expressions (P2).

     The minimum is necessary to enable DASL to locate resources by
     content type, e.g. to locate all image files by comparison with
     "image/*". More powerful comparisons are useful when strings
     encode structured data such as times or lists.
     Note that these are constraints on what the protocol must
     define, not on what servers must necessarily implement.

C6.5: It is possible to compare property values taking into account
their structure (P2).

     Explanation: Some WebDAV properties are defined to contain
     strings (e.g. DAV:getcontenttype), but others contain
     structured values (e.g. DAV:resourcetype, DAV:lockdiscovery).
     Support for structured value criteria is needed, for example,
     to locate resources locked in a certain manner by a certain
     principal. The working group consensus is that this feature,
     while undeniably very useful, is so difficult to define that it
     is better for DASL to proceed without it than to attempt to
     define it. Also, there is much activity in the W3C to define an
     XML query language, and it was felt better to wait for this to
     complete than to define a competing standard.

C7.1: The protocol defines an equality operator (P1).

C7.2: The protocol defines relative operators (P1).

C8: The protocol defines means to specify case sensitivity (P1).

     Note this does not say that all DASL servers must support both
     case-sensitive and case-insensitive comparisons, but only that
     the protocol must be able to express a client's preference, and
     define behavior in the case where the server cannot support
     that preference.

C9: The protocol supports language-specific definitions for string
comparison and sorting (P1).

     Different cultures define different rules for string
     comparison, e.g. for collating sequence and for significance of
     diacritics. Cross-language comparison is out of scope for DASL,
     but comparisons within the same language must be done with the
     appropriate semantics.

Requirements: Criteria for content searches

C10: It is possible to search content of any text media type (P1).
The definition of "searching content" for DASL means locating sequences
of characters in the contents of the resource.

     DASL defines no requirements for searching for structure within
     text media types (e.g. for finding character strings only
     within certain HTML tags). This functionality is too
     complicated to specify at the present time.

     DASL defines no requirements for searching other media types
     that might contain text (e.g. subtypes of application).
     Searching non-text media types (e.g. images, audio) is out of
     scope for DASL.

C11.1: It is possible to search for words that are within a specified
number of words (or, for some languages, characters) of each
other (P1).

     This is often called 'near' search. It is used to locate
     concepts that can be expressed in more than one way using the
     same set of words, e.g. one might locate both "the President's
     impeachment" and "the impeachment of the President".

C11.2: It is possible to search for words that occur within the same
grammatical context, e.g. same phrase, sentence, or paragraph (P2).

     This is sometimes called 'in' search.

C12.1: It is possible for a client to control whether content searches
use a stemming comparison (P2).

C12.2: It is possible for a client to request comparisons using
phonetic similarity (e.g. soundex) (P2).

C12.3: It is possible for the client to request keyword expansion
(thesaurus expansion) (P2).

C13: It is possible for a client to conduct a relevance search (P2). In
such a search, the query consists of a set of words (perhaps an entire
resource), and the result is a list of resources whose contents most
closely resemble the query, sorted in decreasing order of resemblance.

5. Requirements: Results

R1: It is possible to specify a sorting for the result set (P1).

R2: It is possible to specify a set of properties to be returned in the
result records, distinct from the properties in criteria (P1).

     For example, a query might ask for "the authors of those
     documents under 10K in size". In this case, the criterion
     relates only to the size, but the desired result record
     contains only the author.

R3: It is possible for a client to request limits on the resources
consumed in creating or transmitting the result set (P1).

     Some queries can potentially return very large result sets.
     Clients that are good citizens will voluntarily limit the size
     of such results. In addition, some servers may charge money for
     queries.

R3.1: It is possible for a client to limit the number of records in the
result set (P1).

     This is the most meaningful unit of resource consumption to the
     client.

R4: It is possible for the server to return fewer result records than
match the criteria (P1).

     "Client proposes, server disposes".

R5: It is possible for a client to request paged results (P1).

     Paged retrieval is necessary if result sets are very large and
     if clients must also present a responsive interface to a user.
     Note that this requirement is silent about whether a server
     implements paged results by storing results from a query or
     recalculating them as needed.

6. Requirements: Other

O1: It is possible to support multiple query grammars (P1).

     Rationale: A particular query grammar may not expose all the
     useful searching functionality of a server. Clients should be
     allowed to query a server using any grammar that takes
     advantage of those special server capabilities. This
     requirement also allows DASL to define an initial limited query
     grammar which meets all the mandatory requirements without
     needing to address all the desirable, but non-mandatory
     requirements.
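
     One way to picture requirement O1 is a server that keeps a
     table of grammars and routes each query to the matching
     handler. The Python sketch below is an assumption-laden
     illustration, not a DASL mechanism: the SearchServer class,
     its method names, the UnsupportedGrammar exception, and the
     grammar names "basic" and "fulltext" are all hypothetical.

```python
from typing import Callable, Dict, List

# A grammar handler takes the raw query text and returns matching URIs.
GrammarHandler = Callable[[str], List[str]]


class UnsupportedGrammar(Exception):
    """Raised when a client names a grammar the server does not implement."""


class SearchServer:
    def __init__(self) -> None:
        self._grammars: Dict[str, GrammarHandler] = {}

    def register(self, name: str, handler: GrammarHandler) -> None:
        """Add (or replace) the handler for one query grammar."""
        self._grammars[name] = handler

    def supported_grammars(self) -> List[str]:
        """List grammar names, e.g. so a client can pick one it understands."""
        return sorted(self._grammars)

    def search(self, grammar: str, query_text: str) -> List[str]:
        """Route the query to the handler registered for the named grammar."""
        try:
            handler = self._grammars[grammar]
        except KeyError:
            raise UnsupportedGrammar(grammar) from None
        return handler(query_text)


# Assumed toy handlers: each simply returns a fixed list of URIs.
server = SearchServer()
server.register("basic", lambda text: ["/docs/a.txt"])
server.register("fulltext", lambda text: ["/docs/a.txt", "/docs/b.txt"])
```

     A client that names a grammar the server never registered gets
     an explicit error rather than a silently empty result, which
     keeps "no matches" distinguishable from "grammar not
     understood".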

O2: It is possible to extend the basic grammar defined by DASL (P1).

O3: It is possible for the server to redirect a query (P1).

     This is useful when a server is not able to search a given
     scope, but can refer the client to another server which is able
     to search the scope.

O4: It is possible for the client to request hit highlighting (P2).

7. Requirements: Discovery

D1: It is possible for a client to discover the set of query grammars
supported by a server (P1).

     Without this, it is not very useful for servers to support
     multiple grammars.

D2: It is possible for a client to discover the schema supported by a
server for a particular grammar with a particular scope (P1).

     Note that the schema may differ depending on the scope. Query
     schema discovery allows a client to use optional properties and
     operators supported by a server.

D3: It is possible for a client to determine information about the
properties within a scope (P2).

     This information can enable a user interface to help a user to
     construct a valid query, for example by providing meaningful
     names for properties, constraints on values, hints about data
     type, and so on, or information about expected performance, for
     example whether a property is indexed (and hence more quickly
     searched).

8. External Requirements

DASL must describe how to perform searches on internationalized content
and properties. This is in keeping with IETF policy.

Information intended for user comprehension must conform to the IETF
Character Set Policy [CHAR].

The WebDAV working group is currently addressing the standardization of
mechanisms for authors to submit variants and versions of resources,
and of means of exposing access control. DASL should provide mechanisms
that can query for variants, versions, and access control but cannot
do so until they are defined.
Likewise, DASL may contribute requirements to access control (e.g.
control over querying).

9. Related Work

Z39.50: "Information Retrieval (Z39.50): Application Service Definition
and Protocol Specification".
http://lcweb.loc.gov/z3950/agency/

Z39.50 Profile for Simple Distributed Search and Ranked Retrieval
http://lcweb.loc.gov/z3950/agency/profiles/zdsr.html

The STARTS Protocol
http://www-db.stanford.edu/~gravano/starts.html

The Harvest Information Discovery and Access System
http://mordor.transarc.com/afs/transarc.com/public/trg/Harvest/

10. References

[CHAR]      H.T. Alvestrand, "IETF Policy on Character Sets and
            Languages", June 1997, internet-draft, work-in-progress,
            draft-alvestrand-charset-policy-02.txt.

[HTTP]      R. Fielding, J. Gettys, J. C. Mogul, H. Frystyk, and
            T. Berners-Lee, "Hypertext Transfer Protocol -- HTTP/1.1",
            RFC 2068, U.C. Irvine, DEC, MIT/LCS, January 1997.

[SCENARIOS] R. Henderson et al., "Scenarios for DAV Searching and
            Locating", work in progress,
            draft-henderson-dasl-scenarios-00.html, September 18, 1998
            (expires Mar 23, 1999).

[WEBDAV]    Y. Y. Goland, E. J. Whitehead, Jr., A. Faizi, S. R. Carter,
            D. Jensen, "Extensions for Distributed Authoring and
            Versioning on the World Wide Web", IETF Proposed Standard,
            RFC 2518.

11. Authors' Addresses

Jim Davis
Xerox Corporation
3333 Coyote Hill Road
Palo Alto, CA 94304
Email: jdavis@parc.xerox.com

Saveen Reddy
Microsoft Corporation
One Microsoft Way
Redmond WA, 9085-6933
Email: saveenr@microsoft.com

Judith Slein
Xerox Corporation
800 Phillips Road 105-50C
Webster, NY 14580
Email: slein@wrc.xerox.com