idnits 2.17.1 draft-hammer-discovery-03.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** The document seems to lack a License Notice according IETF Trust Provisions of 28 Dec 2009, Section 6.b.ii or Provisions of 12 Sep 2009 Section 6.b -- however, there's a paragraph with a matching beginning. Boilerplate error? (You're using the IETF Trust Provisions' Section 6.b License Notice from 12 Feb 2009 rather than one of the newer Notices. See https://trustee.ietf.org/license-info/.) Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- No issues found here. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year == Line 467 has weird spacing: '... query frag...' -- The document date (March 23, 2009) is 5511 days in the past. Is this intentional? 
Checking references for intended status: Informational ---------------------------------------------------------------------------- == Missing Reference: '-' is mentioned on line 945, but not defined == Missing Reference: 'TM' is mentioned on line 1126, but not defined == Unused Reference: 'RFC2818' is defined on line 1107, but no explicit reference was found in the text == Outdated reference: A later version (-10) exists of draft-nottingham-http-link-header-03 == Outdated reference: A later version (-05) exists of draft-nottingham-site-meta-01 ** Obsolete normative reference: RFC 2616 (Obsoleted by RFC 7230, RFC 7231, RFC 7232, RFC 7233, RFC 7234, RFC 7235) ** Obsolete normative reference: RFC 2818 (Obsoleted by RFC 9110) == Outdated reference: A later version (-28) exists of draft-bryan-metalink-05 Summary: 3 errors (**), 0 flaws (~~), 8 warnings (==), 1 comment (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group E. Hammer-Lahav 3 Internet-Draft Yahoo! 4 Intended status: Informational March 23, 2009 5 Expires: September 24, 2009 7 Link-based Resource Descriptor Discovery 8 draft-hammer-discovery-03 10 Status of this Memo 12 This Internet-Draft is submitted to IETF in full conformance with the 13 provisions of BCP 78 and BCP 79. 15 Internet-Drafts are working documents of the Internet Engineering 16 Task Force (IETF), its areas, and its working groups. Note that 17 other groups may also distribute working documents as Internet- 18 Drafts. 20 Internet-Drafts are draft documents valid for a maximum of six months 21 and may be updated, replaced, or obsoleted by other documents at any 22 time. It is inappropriate to use Internet-Drafts as reference 23 material or to cite them other than as "work in progress." 25 The list of current Internet-Drafts can be accessed at 26 http://www.ietf.org/ietf/1id-abstracts.txt. 
28 The list of Internet-Draft Shadow Directories can be accessed at 29 http://www.ietf.org/shadow.html. 31 This Internet-Draft will expire on September 24, 2009. 33 Copyright Notice 35 Copyright (c) 2009 IETF Trust and the persons identified as the 36 document authors. All rights reserved. 38 This document is subject to BCP 78 and the IETF Trust's Legal 39 Provisions Relating to IETF Documents in effect on the date of 40 publication of this document (http://trustee.ietf.org/license-info). 41 Please review these documents carefully, as they describe your rights 42 and restrictions with respect to this document. 44 Abstract 46 This memo describes LRDD (pronounced 'lard'), a process for obtaining 47 information about a resource identified by a URI. The 'information 48 about a resource', a resource descriptor, provides machine-readable 49 information that aims to increase interoperability and enhance the 50 interaction with the resource. This memo only defines the process 51 for locating and obtaining the descriptor, but leaves the descriptor 52 format and its interpretation out of scope. 54 Table of Contents 56 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 57 2. Notational Conventions . . . . . . . . . . . . . . . . . . . . 4 58 3. The describedby Link Relation . . . . . . . . . . . . . . . . 4 59 4. Identifying Descriptor Location . . . . . . . . . . . . . . . 5 60 4.1. Method Selection . . . . . . . . . . . . . . . . . . . . . 5 61 4.2. The <link> Element . . . . . . . . . . . . . . . . . . . . 6 62 4.3. The HTTP Link Header . . . . . . . . . . . . . . . . . . . 7 63 4.4. The Host Metadata Document . . . . . . . . . . . . . . . . 8 64 5. Obtaining Resource Descriptor . . . . . . . . . . . . . . . . 9 65 6. The Link-Pattern host-meta Field . . . . . . . . . . . . . . . 9 66 6.1. Template Syntax . . . . . . . . . . . . . . . . . . . . . 10 67 7. Security Considerations . . . . . . . . . . . . . . . . . . . 11 68 8. IANA Considerations . . . . . . . . . 
. . . . . . . . . . . . 11 69 8.1. The Link-Pattern host-meta Field . . . . . . . . . . . . . 11 70 8.2. The describedby Relation Type . . . . . . . . . . . . . . 12 71 Appendix A. Descriptor Discovery vs. Service Discovery . . . . . 12 72 Appendix B. Methods Suitability Analysis . . . . . . . . . . . . 13 73 Appendix B.1. Requirements . . . . . . . . . . . . . . . . . . . . 13 74 Appendix B.2. Analysis . . . . . . . . . . . . . . . . . . . . . . 15 75 Appendix C. Acknowledgments . . . . . . . . . . . . . . . . . . 22 76 Appendix D. Document History . . . . . . . . . . . . . . . . . . 22 77 9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 24 78 9.1. Normative References . . . . . . . . . . . . . . . . . . . 24 79 9.2. Informative References . . . . . . . . . . . . . . . . . . 25 80 Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 25 82 1. Introduction 84 This memo defines a process for locating descriptors for resources 85 identified with URIs. Resource descriptors are documents (usually 86 based on well-known serialization languages such as XML, RDF, and 87 JSON) which provide machine-readable information about resources 88 (resource metadata) for the purpose of promoting interoperability and 89 assisting in interactions with unknown resources that support known 90 interfaces. 92 While many methods provide the ability to link a resource to its 93 metadata, none of these methods fully address the requirements of a 94 uniform and easily implementable process. These requirements include 95 the ability for resources to self-declare the location of their 96 descriptors, the ability to access descriptors directly without 97 interacting with the resource, and support for a wide range of platforms 98 and scales of deployment. They must also be fully compliant with 99 existing web protocols, and support extensibility. These 100 requirements, and the analysis used as the basis for this memo, are 101 explained in detail in Appendix B. 
103 For example, a web page about an upcoming meeting can provide in its 104 descriptor document the location of the meeting organizer's free/busy 105 information to potentially negotiate a different time. A social 106 network profile page descriptor can identify the location of the 107 user's address book as well as accounts on other sites. A web 108 service implementing an API with optional components can advertise 109 which of these are supported. 111 This memo describes the first step in the discovery process in which 112 the resource descriptor document is located and retrieved. Other 113 steps, which are outside the scope of this memo, include parsing the 114 descriptor document based on its format (such as POWDER [POWDER], XRD 115 [XRD], and Metalink [I-D.bryan-metalink]) and utilizing it based on 116 the application. 118 Discovery can be performed before, after, or without obtaining a 119 representation of the resource. Performing discovery ahead of 120 accessing a representation allows the client not to rely on 121 assumptions about the properties of the resource. Performing 122 discovery after a representation has been obtained enables further 123 interaction with it. 125 Given the wide range of 'information about a resource', no single 126 descriptor format can adequately accommodate such scope. However, 127 there is great value in making the process of locating the descriptor 128 uniform across formats. While HTTP is the most common protocol used 129 in association with discovery and is explicitly specified in this 130 memo, other protocols MAY be used. 132 Please discuss this draft on the www-talk@w3.org [1] mailing list. 134 2. Notational Conventions 136 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 137 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 138 document are to be interpreted as described in [RFC2119]. 140 This document uses the Augmented Backus-Naur Form (ABNF) notation of 141 [RFC2616]. 
Additionally, the following rules are included from 142 [RFC3986]: reserved and unreserved, and from 143 [I-D.nottingham-http-link-header]: link-param. 145 3. The describedby Link Relation 147 The methods described in this memo express the location of the 148 resource descriptor as a link relation, utilizing the link framework 149 defined by [I-D.nottingham-http-link-header]. The association of a 150 descriptor document with the resource it describes is declared using 151 the "describedby" link relation type. 153 The "describedby" link relation is defined in [POWDER] and registered 154 as: 156 The relationship A "describedby" B asserts that resource B 157 provides a description of resource A. There are no constraints on 158 the format or representation of either A or B, neither are there 159 any further constraints on either resource. 161 Since a single resource can have many descriptors, the "describedby" 162 link relation has a one-to-many structure (the question whether a 163 single descriptor can describe multiple resources is outside the 164 scope of this memo). In the case of multiple "describedby" links 165 obtained from a single method, selecting which link to use is 166 application-specific. 168 To promote interoperability, applications referencing this memo 169 SHOULD clearly define the application-specific criteria used to 170 select between "describedby" links. This MAY be done by: 172 o Supporting a single descriptor format, or defining an order of 173 precedence for multiple descriptor formats. Applications MAY 174 require the presence of the link "type" attribute with the mime- 175 type of the required format. 177 o Using the "describedby" relation type together with another 178 application-specific relation type in the same link. The 179 application-specific relation type can be registered or an 180 extension. 182 o Specifying additional link attributes using link-extensions. 
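The first option in the list above (an order of precedence across descriptor formats) can be sketched as follows. This is only an illustration under stated assumptions: the function name select_describedby, the dictionary shape used for a link ("rel", "type", "href"), and the tie-break on the "href" value are all hypothetical and are not defined by this memo. The tie-break is there so that the result never depends on the order in which the links were obtained.

```python
def select_describedby(links, preferred_types):
    """Pick one "describedby" link using a format order of precedence.

    links: iterable of dicts with "rel", "href" and optional "type" keys
           (a hypothetical in-memory shape, not defined by the memo).
    preferred_types: media types, most preferred first.
    Returns the chosen link dict, or None when nothing matches.
    """
    # A "rel" value may carry several space-separated relation types,
    # for example 'describedby copyright'.
    candidates = [
        link for link in links
        if "describedby" in link.get("rel", "").lower().split()
    ]
    for media_type in preferred_types:
        matches = [l for l in candidates if l.get("type") == media_type]
        if matches:
            # Deterministic tie-break so the input order is irrelevant.
            return min(matches, key=lambda l: l["href"])
    return None


links = [
    {"rel": "describedby", "type": "application/xrd+xml",
     "href": "http://example.com/r/1.xrd"},
    {"rel": "describedby copyright", "type": "application/powder+xml",
     "href": "http://example.com/r/1.powder"},
]
chosen = select_describedby(links, ["application/powder+xml",
                                    "application/xrd+xml"])
print(chosen["href"])
```

An application that instead relies on a second relation type or on a link-extension would filter the candidates on that attribute before applying any precedence.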
184 Link selection MUST NOT depend on the order in which multiple links 185 are obtained from a single method. Applications MUST NOT impose 186 constraints on the usage of the "describedby" relation type as it is 187 likely to be used by other applications in association with the same 188 resource. 190 4. Identifying Descriptor Location 192 The descriptor location (URI) is a function of the resource URI. 193 This section defines three methods which together satisfy the 194 requirements defined in Appendix B. While each method on its own 195 satisfies the requirements partially, together they provide enough 196 flexibility for most use cases. Each of the following three methods 197 is performed by using the resource URI to identify its descriptor 198 URI. 200 In many cases, a request for one URI leads to requesting other URIs, 201 as is the case with HTTP redirections. Because the decision whether 202 to use such URIs is application-specific, discovery is constrained to 203 a single URI identifying the resource. Any other resource URIs 204 received MUST be considered as a separate and discrete input into the 205 discovery function. If a resource URI obtained during the 206 performance of these methods is found to be more relevant to the 207 application, the discovery process MUST be restarted with the new 208 resource URI as its input. 210 For example, an HTTP HEAD request for URI A returns a redirect (307) 211 response with a set of "describedby" links, and identifies the 212 temporary location of the representation at URI B. An HTTP HEAD 213 request for URI B returns a successful (200) response with its own 214 set of "describedby" links. An application MAY choose to define a 215 process in which the two sets of links are obtained, prioritized, and 216 utilized, however, it MUST do so by explicitly instructing the client 217 to perform discovery multiple times, as each is considered separate 218 and distinct discovery. 220 4.1. 
Method Selection 222 Each method presents a different set of requirements. The criteria 223 used to determine which methods a server SHOULD support and a client 224 SHOULD attempt are based on a combination of factors: 226 o The ability to offer and obtain a representation of the resource 227 by dereferencing its URI. 229 o The availability of a representation supporting markup 230 compatible with [I-D.nottingham-http-link-header]. 232 o The availability of an HTTP representation of the resource and the 233 ability to provide and access link information in its response 234 header. 236 The methods are listed in descending order of the restrictiveness of their 237 requirements, from the most specialized to the 238 most generic. This ordering, however, does not imply the order in 239 which multiple applicable methods should be attempted. Because 240 different methods are more appropriate in different circumstances, it 241 is up to each application to define how they should be used together. 243 To promote interoperability, applications referencing this memo MUST 244 clearly define the relationship between the three methods as either: 246 o equal, all methods MUST produce the same set of resource 247 descriptors and clients MAY attempt any method according to 248 their capabilities, or 250 o with an application-specific order of precedence, where methods 251 MUST be attempted in a specific order. 253 4.2. The <link> Element 255 The <link> element method is limited to resources with an available 256 markup representation that supports typed-relations using the <link> 257 element, such as HTML [W3C.REC-html401-19991224], XHTML 258 [W3C.REC-xhtml1-20020801], and Atom [RFC4287]. Other markup formats 259 are permitted as long as the semantics of their <link> elements are 260 fully compatible with the link framework defined in 261 [I-D.nottingham-http-link-header]. This method requires the 262 retrieval of a resource representation. 
While HTTP is the most 263 common transport for such documents, this method is transport 264 independent. 266 For example: 268 271 A client trying to obtain the location of the resource's descriptor 272 using this method SHALL: 274 1. Retrieve a representation of the resource using the applicable 275 transport for that resource URI. If the markup document is 276 obtained using HTTP, it MUST only be used by the client if the 277 document is a valid representation of the resource identified by 278 the HTTP request URI, typically in a response with a successful 279 (2xx) or redirection (3xx) status code. If no such valid 280 representation of the request URI is found, the method fails. 282 2. Parse the document as defined by its format specification and 283 look for <link> elements with a "rel" attribute value containing 284 the "describedby" relation. The client MUST obey the document 285 markup schema and ignore any invalid <link> elements (such as <link> 286 elements outside the <head> section of an HTML document). This 287 is done to prevent unintentional markup from other parts of the 288 document from being used for discovery purposes, which can have a vast 289 impact on usability and security. 291 3. Narrow down the selection if more than one "describedby" link is 292 found, following the application-specific criteria. The 293 descriptor location is obtained from the value of the "href" 294 attribute in the selected <link> element. 296 <link> elements MAY include other relation types together with 297 "describedby" in a single "rel" attribute (for example 298 'rel="describedby copyright"'). Clients MUST properly process 299 such multiple-relation "rel" attributes as defined by the format 300 specification. 302 4.3. The HTTP Link Header 304 The HTTP Link header method is limited to resources for which an HTTP 305 GET or HEAD request returns a 2xx, 3xx, or 4xx HTTP response 306 [RFC2616]. 
This method uses the Link header defined in 307 [I-D.nottingham-http-link-header] and requires the retrieval of a 308 resource representation header. 310 For example: 312 Link: ; rel="describedby"; 313 type="application/powder+xml" 315 A client trying to obtain the location of the resource's descriptor 316 using this method SHALL: 318 1. Make an HTTP (or HTTPS as required) GET or HEAD request to the 319 resource URI to obtain a valid response header. If the HTTP 320 response carries a status code other than successful (2xx), 321 redirection (3xx), or client error (4xx), the method fails. 323 2. Parse the HTTP response header and look for Link headers with a 324 "rel" parameter value containing the "describedby" relation. 326 3. Narrow down the selection if more than one "describedby" link is 327 found, following the application-specific criteria. The 328 descriptor location is obtained from the "<>" enclosed URI- 329 reference in the selected Link header. 331 Link headers MAY include other relation types together with 332 "describedby" in a single "rel" parameter (for example 333 'rel="describedby copyright"'). Clients MUST properly process 334 such multiple-relation "rel" parameters as defined by 335 [I-D.nottingham-http-link-header]. 337 4.4. The Host Metadata Document 339 The host metadata document method is available for any resource 340 identified by a URI whose authority supports the host-meta document 341 defined in [I-D.nottingham-site-meta]. This method does not require 342 obtaining any representation of the resource, and operates solely 343 using the resource URI. 345 The link relation between the resource URI and the descriptor URI is 346 obtained by using a template contained in the host-meta document. 
By 347 applying the host-wide template to an individual resource URI, a 348 resource-specific link is produced which can be used to indicate the 349 location of the descriptor document for that resource, bypassing the 350 need to access or provide a representation for it. 352 For example (line breaks are for formatting only, and are not allowed 353 in the document): 355 Link-Pattern: <{uri};about>; rel="describedby"; 356 type="application/powder+xml" 358 A client trying to obtain the location of the resource's descriptor 359 using this method SHALL: 361 1. Retrieve the host-meta document for the URI's authority as defined by 362 [I-D.nottingham-site-meta] section 4. If the request fails to 363 retrieve a valid host-meta document, the method fails. 365 2. Parse the host-meta document and look for Link-Pattern fields with a 366 "rel" parameter value containing the "describedby" relation. 368 3. Narrow down the selection if more than one "describedby" link is 369 found, following the application-specific criteria. The 370 descriptor location is constructed by applying the template 371 obtained from the selected Link-Pattern field to the resource URI 372 as described by Section 6.1. 374 Link-Pattern MAY include other relation types together with 375 "describedby" in a single "rel" parameter (for example 376 'rel="describedby copyright"'). Clients MUST properly process 377 such multiple-relation "rel" parameters as defined by Section 6. 379 5. Obtaining Resource Descriptor 381 Once the desired descriptor URI has been obtained, the descriptor 382 document is retrieved. If the descriptor URI scheme is "http" or 383 "https", the document is obtained via an HTTP (or HTTPS as required) 384 GET request to the identified URI. The client MUST obey HTTP 385 redirections (3xx), and the descriptor document is considered valid 386 only if retrieved with a successful HTTP response status (2xx). 388 6. 
The Link-Pattern host-meta Field 390 The Link host-meta field [I-D.nottingham-site-meta] conveys a link 391 relation between all resource URIs under the host-meta authority and 392 a common target URI. However, there are cases in which relations of 393 different resources with the same authority do not share the same 394 target URI, but do follow a common pattern in how the target URI is 395 constructed. 397 For example, a news site with multiple authors can provide 398 information about each article's author by appending a suffix (such 399 as ";by") to the URI of each article. Each article has a unique 400 author, but all share the same pattern of where that information is 401 located. The same information can be provided using an HTTP Link 402 header or HTML <link> element, but in a less efficient manner when a 403 single pattern can provide the same information: 405 Link-Pattern: <{uri};by>; rel="author" 407 The Link-Pattern host-meta field uses a slightly modified syntax of 408 the HTTP Link header [I-D.nottingham-http-link-header] to convey 409 relations whose context is individual resources with the same 410 authority as the host-meta document, and whose target is constructed 411 by applying a template to the context URI. The field is not specific 412 to any relation type and MAY be used to express any relations 413 supported by the Link header [I-D.nottingham-http-link-header]. 415 The Link-Pattern host-meta field differs from the HTTP Link header in 416 the following respects: 418 o The "<>" enclosed token is not a valid URI, but instead contains a 419 template as defined in Section 6.1. 421 o Its context URI is defined as the individual resource URI used as 422 input to the template. 424 o If the resulting target URI expressed by the template is relative, 425 its base URI is the root resource of the authority. 
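As a rough illustration of how a client might take such a field value apart, the sketch below splits a single pattern-value into its "<>" enclosed template and its parameters. It is a simplified reading, not a full parser: the names parse_pattern_value and has_relation are hypothetical, only one pattern-value is handled (not a comma-separated list), and quoted parameter values containing ";" or escaped quotes are not supported.

```python
import re

# "<template>" followed by the rest of the value.
_PATTERN_VALUE = re.compile(r'^\s*<([^>]*)>(.*)$', re.DOTALL)
# ;name="quoted value"  or  ;name=token
_PARAM = re.compile(r';\s*([^=;\s]+)\s*=\s*(?:"([^"]*)"|([^;\s]*))')


def parse_pattern_value(value):
    """Split one Link-Pattern value into (template, params).

    Simplification: assumes a single pattern-value and no ";" inside
    quoted parameter values.
    """
    match = _PATTERN_VALUE.match(value)
    if match is None:
        raise ValueError("Link-Pattern value lacks a <template>")
    template, rest = match.group(1), match.group(2)
    params = {}
    for name, quoted, bare in _PARAM.findall(rest):
        # Parameter names are compared case-insensitively here.
        params[name.lower()] = quoted if quoted else bare
    return template, params


def has_relation(params, relation):
    """True if the space-separated "rel" value contains the relation."""
    return relation.lower() in params.get("rel", "").lower().split()


template, params = parse_pattern_value(
    '<{uri};about>; rel="describedby"; type="application/powder+xml"'
)
print(template, params, has_relation(params, "describedby"))
```

The template returned here is not yet a URI; it still has to be applied to the resource URI before it can be dereferenced.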
427 Link-Pattern = "Link-Pattern" ":" #pattern-value 429 pattern-value = "<" template ">" *( ";" link-param ) 431 template = *( uri-char | "{" [ "%" ] var-name "}" ) 433 uri-char = ( reserved | unreserved ) 435 var-name = "scheme" | "authority" | "path" 436 | "query" | "fragment" | "userinfo" 437 | "host" | "port" | "uri" 439 [[ should this spec define a filter/map parameter that will allow 440 applying link patterns to subsets of the host-meta scope? This can 441 use a regular expression match or something similar to robots.txt. 442 If the spec will end up not directly supporting this feature, I will 443 add a note suggesting that such a feature could be defined elsewhere 444 as an extension. ]] 446 6.1. Template Syntax 448 The template syntax provides a simple format for URI transformation. 449 A template is a string containing brace-enclosed ("{}") variable 450 names marking the parts of the string that are to be substituted by 451 the variable values. A template is transformed into a URI by 452 substituting the variables with their calculated value. If a 453 variable name is prefixed by "%", any character in the variable value 454 other than unreserved MUST be percent-encoded per [RFC3986]. 456 To construct a URI using a template, the input URI is parsed into its 457 URI components and each component value assigned to a variable name. 458 The template variable substitution is based on the URI vocabulary 459 defined by [RFC3986] section 3 and includes: "scheme", "authority", 460 "path", "query", "fragment", "userinfo", "host", and "port". In 461 addition, it defines the "uri" variable as the entire input URI 462 excluding the fragment component and the "#" fragment separator. 
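The substitution described above can be sketched in a few lines. This is a minimal illustration, assuming Python's urllib.parse is an acceptable stand-in for the [RFC3986] component split; the function names uri_variables and expand are hypothetical, and absent components (for example a missing port) are treated as empty strings, which the memo does not specify.

```python
import re
from urllib.parse import quote, urlsplit

# "{name}" or "{%name}"; the optional "%" requests percent-encoding.
_VAR = re.compile(r"\{(%?)([a-z]+)\}")


def uri_variables(uri):
    """Map the template variable names to components of the input URI."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "authority": parts.netloc,
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
        # Text before the last "@" in the authority, if any.
        "userinfo": parts.netloc.rpartition("@")[0],
        "host": parts.hostname or "",
        "port": "" if parts.port is None else str(parts.port),
        # The whole input URI without the fragment and its "#".
        "uri": uri.split("#", 1)[0],
    }


def expand(template, uri):
    """Apply a Link-Pattern template to a resource URI."""
    values = uri_variables(uri)

    def substitute(match):
        encode, name = match.group(1), match.group(2)
        if name not in values:
            raise ValueError("unknown template variable: " + name)
        value = values[name]
        # safe="" leaves only the unreserved characters unencoded.
        return quote(value, safe="") if encode else value

    return _VAR.sub(substitute, template)


print(expand("http://example.org?q={%uri}",
             "http://example.com/r/1?f=xml#top"))
```

Passing safe="" to quote() matters here: the default would leave "/" unencoded, which would not satisfy the rule that every character other than unreserved is percent-encoded.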
464 foo://william@example.com:8080/over/there?name=ferret#nose 465 \_/ \______________________/\_________/ \_________/ \__/ 466 | | | | | 467 scheme authority path query fragment 469 foo://william@example.com:8080/over/there?name=ferret#nose 470 \_____/ \_________/ \__/ 471 | | | 472 userinfo host port 474 foo://william@example.com:8080/over/there?name=ferret#nose 475 \___________________________________________________/ 476 | 477 uri 479 For example, given the input URI "http://example.com/r/1?f=xml#top", 480 each of the following templates will produce the associated output 481 URI: 483 http://example.org?q={%uri} --> 484 http://example.org?q=http%3A%2F%2Fexample.com%2Fr%2F1%3Ff%3Dxml 486 http://meta.{host}:8080{path}?{query} --> 487 http://meta.example.com:8080/r/1?f=xml 489 https://{authority}/v1{path}#{fragment} --> 490 https://example.com/v1/r/1#top 492 7. Security Considerations 494 The methods used to perform discovery are not secure, private or 495 integrity-guaranteed, and due caution should be exercised when using 496 them. Applications that perform discovery should consider the attack 497 vectors opened by automatically following, trusting, or otherwise 498 using links gathered from <link> elements, HTTP Link headers, or 499 host-meta documents. 501 8. IANA Considerations 503 8.1. The Link-Pattern host-meta Field 505 This specification registers the Link-Pattern host-meta field in the 506 host-meta Field Registry [I-D.nottingham-site-meta]. 508 Field Name: Link-Pattern 510 Change controller: IETF 512 Specification document(s): [[ this document ]] 514 Related information: [I-D.nottingham-http-link-header] 516 8.2. The describedby Relation Type 518 [[ this section will be removed if the "describedby" relation type is 519 registered by the time it is published ]] 521 This specification registers the "describedby" relation type in the 522 Link Relation Type Registry [I-D.nottingham-http-link-header]. 
524 o Relation Name: describedby 526 o Description: The relationship A "describedby" B asserts that 527 resource B provides a description of resource A. There are no 528 constraints on the format or representation of either A or B, 529 neither are there any further constraints on either resource. 531 o Documentation: [POWDER] 533 Appendix A. Descriptor Discovery vs. Service Discovery 535 Descriptor discovery provides a process for obtaining information 536 about a resource identified with a URI. It allows servers to 537 describe their resources in a machine-readable format, enabling 538 automatic interoperability by user-agents and resource consuming 539 applications. Discovery enables applications to utilize a wide range 540 of web services and resources across multiple providers without the 541 need to know about their capabilities in advance, reducing the need 542 for manual configuration and resource-specific software. 544 When discussing discovery, it is important to differentiate between 545 descriptor discovery and service discovery. Both types attempt to 546 associate capabilities with resources, but they approach it from 547 opposite ends. 549 Service discovery centers on identifying the location of qualified 550 resources, typically finding an endpoint capable of certain protocols 551 and capabilities. In contrast, descriptor discovery begins with a 552 resource, trying to find which capabilities it supports. 554 A simple way to distinguish between the two types of discovery is to 555 define the questions they are each trying to answer: 557 Descriptor-Discovery: Given a resource, what are its attributes: 558 capabilities, characteristics, and relationships to other 559 resources? 561 Service-Discovery: Given a set of attributes, which available 562 resources match the desired set and what is their location? 
564 While this memo deals exclusively with descriptor discovery, it is 565 important to note that the two discovery types are closely related 566 and are usually used in tandem. In fact, a typical use case will 567 switch between service discovery and descriptor discovery multiple 568 times in a single workflow, and can start with either one. 570 One reason for this dependency between the two discovery types is 571 that resource descriptors usually contain not only a list of 572 capabilities, but also relationships to other resources. Since those 573 relationships are usually typed, the process in which an application 574 chooses which links to use is in fact service discovery. 576 Applications use descriptor discovery to obtain the list of links, 577 and service discovery to choose the relevant links. In another 578 common example, the application uses service discovery to find a 579 resource with a given capability, then uses descriptor discovery to 580 find out what other capabilities it supports. 582 Appendix B. Methods Suitability Analysis 584 Due to the wide range of use cases requiring resource descriptors, 585 and the desire to reuse as much as possible, no single solution has 586 been found to sufficiently cover the requirements for linking between 587 the resource URI and the descriptor URI. The following analysis 588 attempts to list all the methods proposed for addressing descriptor 589 discovery. It is included here to provide background information as 590 to why certain methods have been selected while others were rejected from 591 the discovery process. It has been updated to match the terms used 592 in this memo and its structure. 594 Appendix B.1. Requirements 596 Getting from a resource URI to its descriptor document can be 597 implemented in many ways. The problem is that none of the current 598 methods address all of the requirements presented by the common use 599 cases. 
The requirements are simple, but the more we try to address, 600 the less elegant and accessible the process becomes. While working 601 on the now defunct XRDS-Simple specification [XRDS-Simple] and 602 talking to companies and individuals about it, the following 603 requirements emerged for any proposed process: 605 Self Declaration: 607 Allow resources to declare the availability of descriptor 608 information and its location. When a resource is accessed, it 609 needs to have a way to communicate to the client that it 610 supports the discovery protocol and to indicate the location 611 of such descriptor. 613 This is useful when the client is able to interact, or is already 614 interacting, with the resource but can enhance its interaction 615 with additional information. For example, accessing a blog 616 page is enhanced if it was generated from an Atom feed or Atom 617 entry and that feed supports Atom authoring. 619 Direct Descriptor Access: 621 Enable direct retrieval of the resource descriptor without 622 interacting with the resource itself. Before a resource is 623 accessed, the client should have a way to obtain the resource 624 descriptor without accessing the resource. This is important 625 for two reasons. 627 First, accessing an unknown resource may have undesirable 628 consequences. After all, the information contained in the 629 descriptor is supposed to inform the client how to interact 630 with the resource. The second is efficiency - removing the 631 need to first obtain the resource in order to get its 632 descriptor (reducing HTTP round-trips, network bandwidth, and 633 application latency). 635 Web Architecture Compliant: 637 Work with well-established web infrastructure. This may sound 638 obvious but it is in fact the most complex requirement. 639 Deploying new extensions to the HTTP protocol is a complicated 640 endeavor. 
Besides getting applications to support a new header, 641 method, or content negotiation, existing caches and proxies 642 must be enhanced to properly handle these requests, and they 643 must not fail to perform their normal duties without such 644 enhancements. 646 For example, a new content negotiation method may cause an 647 existing cache to serve the wrong data to a non-discovery 648 client due to its inability to distinguish the metadata request 649 from the resource representation request. 651 Scale and Technology Agnostic: 653 Support large and small web providers regardless of the size of 654 operations and deployment. Any solution must work for a small 655 hosted web site as well as the world's largest search engine. It 656 must be flexible enough to allow developers with restricted 657 access to the full HTTP protocol (such as limited access to 658 request or response headers) to be able to both provide and 659 consume resource descriptors. Any solution should also support 660 caching as much as possible and allow reuse of source code and 661 data. 663 Extensible: 665 Accommodate future enhancements and unknown descriptor formats. 666 It should support the existing set of descriptor formats such 667 as XRD and POWDER, as well as new descriptor relationships that 668 might emerge in the future. In addition, the solution should 669 not depend on the descriptor format itself and work equally 670 well with any document format - it should aim to keep the road 671 and destination separate. 673 Appendix B.2. Analysis 675 The following is a list of proposed and implemented methods trying to 676 address descriptor discovery. Each method is reviewed for its 677 compliance with the requirements identified previously. The [-], 678 [+], or [+-] symbols next to each requirement indicate how well the 679 method complies with the requirement. 681 Appendix B.2.1. 
HTTP Response Header 683 When a resource representation is retrieved using an HTTP GET 684 request, the server includes in the response a header pointing to the 685 location of the descriptor document. For example, POWDER uses the 686 "Link" response header to create an association between the resource 687 and its descriptor. XRDS [XRDS] (based on the Yadis protocol 688 [Yadis]) uses a similar approach, but since the Link header was not 689 available when Yadis was first drafted, it defines a custom header 690 X-XRDS-Location which serves a similar but less generic purpose. 692 [+] Self Declaration - using the Link header, any resource can point 693 to its descriptor documents. 695 [-] Direct Descriptor Access - the header is only accessible when 696 requesting the resource itself via an HTTP GET request. While 697 HTTP GET is meant to be a safe operation, it is still possible for 698 some resources to have side effects. 700 [+] Web Architecture Compliant - uses the Link header which is an 701 IETF Internet Standard [[ currently a standard-track draft ]], and 702 is consistent with HTTP protocol design. 704 [-] Scale and Technology Agnostic - since discovery accounts for a 705 small percent of resource requests, the extra Link header is 706 wasteful. For some hosted servers, access to HTTP headers is 707 limited and will prevent implementation. 709 [+] Extensible - the Link header provides built-in extensibility by 710 allowing new link relations, mime-types, and other extensions. 712 Minimum roundtrips to retrieve the resource descriptor: 2 714 Appendix B.2.2. HTTP Response Header Via HEAD 716 Same as the HTTP Response Header method but used with an HTTP HEAD 717 request. The idea of using the HEAD method is to solve the wasteful 718 overhead of including the Link header in every reply. By limiting 719 the appearance of the Link header only to HEAD responses, typical GET 720 requests are not encumbered by the extra bytes.
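With either header-based method (GET or HEAD), the client still has to pull the descriptor location out of the Link header value it receives. The following is a minimal sketch of that step, not part of this memo: the helper name find_links, the sample header value, and the example.com URIs are assumptions made for illustration, and the parsing is deliberately simplified (it does not handle "<" or ";" inside quoted parameter values).

```python
import re


def find_links(header_value, rel="describedby"):
    """Return target URIs from a Link header value whose rel includes `rel`.

    Hypothetical helper for illustration only; a simplified parser that
    assumes no '<' or ';' characters appear inside quoted parameters.
    """
    targets = []
    # Each link-value looks like: <URI>; param=value; param="value"
    for match in re.finditer(r'<([^>]*)>([^<]*)', header_value):
        uri, params = match.group(1), match.group(2)
        rel_match = re.search(
            r';\s*rel\s*=\s*(?:"([^"]*)"|([^\s;,"]+))', params, re.IGNORECASE
        )
        if rel_match is None:
            continue
        # A rel value may hold several space-separated relation types.
        rels = (rel_match.group(1) or rel_match.group(2) or "").lower().split()
        if rel.lower() in rels:
            targets.append(uri)
    return targets


# Illustrative header value; the URIs are made up for this sketch.
sample = (
    '<http://example.com/resource;about>; rel="describedby"; '
    'type="application/xrd+xml", '
    '<http://example.com/style.css>; rel="stylesheet"'
)
descriptor_uris = find_links(sample)
```

A client would then issue a second request to each returned URI, which is why both header-based methods need at least two round trips.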
722 [+] Self Declaration - Same as the HTTP Response Header method. 724 [-] Direct Descriptor Access - Same as the HTTP Response Header 725 method. 727 [-] Web Architecture Compliant - HTTP HEAD should return the exact 728 same response as HTTP GET with the sole exception that the 729 response body is omitted. By adding headers only to the HEAD 730 response, this solution violates the HTTP protocol and might not 731 work properly with proxies as they can return the header of the 732 cached GET request. 734 [+] Scale and Technology Agnostic - solves the wasted bandwidth 735 associated with the HTTP Response Header method, but still suffers 736 from the limitation imposed by requiring access to HTTP headers. 738 [+] Extensible - Same as the HTTP Response Header method. 740 Minimum roundtrips to retrieve the resource descriptor: 2 742 Appendix B.2.3. HTTP Content Negotiation 744 Using the HTTP Accept request header or Transparent Content 745 Negotiation as defined in [RFC2295], the client informs the server it 746 is interested in the descriptor and not the resource itself, to which 747 the server responds with the descriptor document or its location. In 748 Yadis, the client sends an HTTP GET (or HEAD) request to the resource 749 URI with an Accept header and content-type application/xrds+xml. 750 This informs the server of the client's discovery interest, which in 751 turn may reply with the descriptor document itself, redirect to it, 752 or return its location via the X-XRDS-Location response header. 754 [-] Self Declaration - does not address as it focuses on the client 755 declaring its intentions. 757 [+] Direct Descriptor Access - provides a simple method for directly 758 requesting the descriptor document. 760 [-] Web Architecture Compliant - while it can be argued that the 761 descriptor can be considered another representation of the 762 resource, it is very much external to it. 
Using the Accept header 763 to request a separate resource (as opposed to a different 764 representation of the same resource) violates web architecture. 765 It also prevents using the discovery content-type as a valid 766 (self-standing) web resource having its own descriptor. 768 [-] Scale and Technology Agnostic - requires access to HTTP request 769 and response headers, as well as the registration of multiple 770 handlers for the same resource URI based on the Accept header. In 771 addition, improper use or implementation of the Vary header in 772 conjunction with the Accept header will cause caches to serve the 773 descriptor document instead of the resource itself - a great 774 concern to large providers with frequently visited front-pages. 776 [-] Extensible - applies an implicit relation type to the descriptor 777 mime-type, limiting descriptor formats to a single purpose. It 778 also prevents existing mime-types from being used as 779 descriptor formats. 781 Minimum roundtrips to retrieve the resource descriptor: 1 783 Appendix B.2.4. HTTP Header Negotiation 785 Similar to the HTTP Content Negotiation method, this solution uses a 786 custom HTTP request header to inform the server of the client's 787 discovery intentions. The server responds by serving the same 788 resource representation (via an HTTP GET or HEAD request) with the 789 relevant Link headers. It attempts to solve the HTTP Response Header 790 waste issue by allowing the client to explicitly request the 791 inclusion of Link headers. One such header can be called 792 "Request-links" to inform the server the client would like it to include 793 certain Link headers of a given "rel" type in its reply. 795 [+] Self Declaration - same as HTTP Response Header with the option 796 of selective inclusion. 798 [-] Direct Descriptor Access - does not address.
800 [-] Web Architecture Compliant - HTTP does not include any mechanism 801 for header negotiation and any custom solution will break existing 802 caches. 804 [+-] Scale and Technology Agnostic - Requires advanced access to HTTP 805 headers on both the client and server sides, but solves the 806 bandwidth waste issue of the HTTP Response Header method. 808 [+] Extensible - builds on top of Link header extensibility. 810 Minimum roundtrips to retrieve the resource descriptor: 2 812 Appendix B.2.5. LINK Element 814 Embeds the location of the descriptor document within the resource 815 representation by leveraging the HTML header LINK element (as 816 opposed to the HTTP header). Applies to HTML resource 817 representations or similar markup-based formats with support for 818 "Link"-like elements such as Atom. POWDER uses the LINK element in 819 this manner, while XRDS uses the HTML META element with an 820 "http-equiv" attribute equal to X-XRDS-Location (to create an embedded 821 version of the X-XRDS-Location custom header). 823 [+] Self Declaration - similar to HTTP Response Header method but 824 limited to HTML resources. 826 [-] Direct Descriptor Access - the method requires fetching the 827 entire resource representation in order to obtain the descriptor 828 location. In addition, it requires changing the resource HTML 829 representation which makes discovery an intrusive process. 831 [+] Web Architecture Compliant - uses the LINK element as 832 designed. 834 [+] Scale and Technology Agnostic - while this solution requires 835 direct retrieval of the resource and manipulation of its content, 836 it is extremely accessible on many platforms. 838 [-] Extensible - extensibility is restricted to HTML representations 839 or similar markup formats with support for a similar element. 841 Minimum roundtrips to retrieve the resource descriptor: 2 843 Appendix B.2.6.
HTTP OPTIONS Method 845 The HTTP OPTIONS method is used to interact with the HTTP server with 846 regard to its capabilities and communication-related information 847 about its resources. The OPTIONS method, together with an optional 848 request header, can be used to request both the descriptor location 849 and descriptor content itself. 851 [-] Self Declaration - does not address. 853 [+] Direct Descriptor Access - provides a clean mechanism for 854 requesting descriptor information about a resource without 855 interacting with it. 857 [+] Web Architecture Compliant - uses an existing HTTP feature. 859 [-] Scale and Technology Agnostic - requires client and server 860 access to the OPTIONS HTTP method. Also does not support caching 861 which makes this solution inefficient. 863 [+] Extensible - built into the OPTIONS method. 865 Minimum roundtrips to retrieve the resource descriptor: 1 867 Appendix B.2.7. WebDAV PROPFIND Method 869 Similar to the HTTP OPTIONS method, the WebDAV PROPFIND method 870 defined in [RFC4918] can be used to request resource-specific 871 properties, one of which can hold the location of the descriptor 872 document. PROPFIND, unlike OPTIONS, cannot return the descriptor 873 itself, unless it is returned in the required PROPFIND schema (a 874 multi-status XML element). Other alternatives include URIQA [URIQA], 875 an HTTP extension which defines a method called MGET, and ARK 876 (Archival Resource Key) [ARK] - a method similar to PROPFIND that 877 allows the retrieval of resource attributes using keys (which 878 describe the resource). 880 [-] Self Declaration - does not address. 882 [+-] Direct Descriptor Access - does not require interaction with 883 the resource, but does require at least two requests to get the 884 descriptor (get location, get document). 886 [+] Web Architecture Compliant - uses an HTTP extension with less 887 support than core HTTP, but still based on published standards.
889 [-] Scale and Technology Agnostic - same as the HTTP OPTIONS Method. 891 [+-] Extensible - uses extensible protocols but at the same time 892 depends on solutions that have already gone beyond the standard 893 HTTP protocol, which makes further extensions more complex and 894 unsupported. 896 Minimum roundtrips to retrieve the resource descriptor: 2 898 Appendix B.2.8. Custom HTTP Method 900 Similar to the HTTP OPTIONS Method, a new method can be defined (such 901 as DISCOVER) to return (or redirect to) the descriptor document. The 902 new method can allow caching. 904 [-] Self Declaration - does not address. 906 [+] Direct Descriptor Access - same as the HTTP OPTIONS Method. 908 [-] Web Architecture Compliant - depends heavily on extending every 909 platform to support the extension. Unlikely to be supported by 910 existing proxy services and caches. 912 [-] Scale and Technology Agnostic - same as HTTP OPTIONS Method with 913 the additional burden on smaller sites requiring access to the new 914 protocol. 916 [+] Extensible - new protocol that can extend as needed. 918 Minimum roundtrips to retrieve the resource descriptor: 1 920 Appendix B.2.9. Static Resource URI Transformation 922 Instead of using HTTP facilities to access the descriptor location, 923 this method defines a template to transform any resource URI to the 924 descriptor document URI. This can be done by adding a prefix or 925 suffix to the resource URI, which turns it into a new resource URI. 926 The new URI points to the descriptor document. For example, to fetch 927 the descriptor document for http://example.com/resource, the client 928 makes an HTTP GET request to http://example.com/resource;about using 929 a static template that adds the ";about" suffix. 931 [-] Self Declaration - does not address. 933 [+] Direct Descriptor Access - creates a unique URI for the 934 descriptor document. 
936 [+-] Web Architecture Compliant - uses basic HTTP facilities but 937 intrudes on the domain authority namespace as it defines a static 938 template for URI transformation that is not likely to be 939 compatible with many existing URI naming conventions. 941 [+-] Scale and Technology Agnostic - depending on the static mapping 942 chosen. Some hosted environments will have a problem gaining 943 access to the mapped URI based on the URI format chosen. 945 [-] Extensible - provides a very specific and limited method to map 946 between resources and their descriptor, since each relation type 947 must mint its own static template. 949 Minimum roundtrips to retrieve the resource descriptor: 1 951 Appendix B.2.10. Dynamic Resource URI Transformation 953 Same as the Static Resource URI Transformation method but with the 954 ability for each domain authority to specify its own discovery 955 transformation template. This can be done by placing a configuration 956 file at a known location (such as robots.txt) which contains the 957 template needed to perform the URL mapping. The client first obtains 958 the configuration document (which may be cached using normal HTTP 959 facilities), parses it, then uses that information to transform the 960 resource URI and access the descriptor document. 962 [+-] Self Declaration - does not address individual resources, but 963 allows entire domains to declare their support (and how to use 964 it). 966 [+-] Direct Descriptor Access - once the mapping template has been 967 obtained, descriptors can be accessed directly. 969 [+-] Web Architecture Compliant - uses an existing known-location 970 design pattern (such as robots.txt) and standard HTTP facilities. 971 The use of a known-location is not ideal and is considered a 972 violation of web architecture but, if it serves as the last of its 973 kind, can be tolerated.
An alternative to the known-location 974 approach can be using DNS to store either the location of the 975 mapping or the map template itself, but DNS adds a layer of 976 complexity not always available. 978 [+-] Scale and Technology Agnostic - works well at the URI authority 979 level (domain) but is inefficient at the URI path level (resource 980 path) and harder to implement when different paths within the same 981 domain need to use different templates. With the decreasing cost 982 of custom domains and sub-domains hosting, this will not be an 983 issue for most services, but it does require sharing configuration 984 at the domain/sub-domain level. 986 [+-] Extensible - can be, depending on the schema used to format the 987 known-location configuration document. 989 Minimum roundtrips to retrieve the resource descriptor: initially 2, 990 1 after caching 992 Appendix C. Acknowledgments 994 With the exception of the host-meta template extension, very little 995 of this memo is original work. Many communities and individuals have 996 been working on solving discovery for many years and this work is a 997 direct result of their hard and dedicated efforts. 999 Inspiration for this memo derived from previous work on a descriptor 1000 format called XRDS-Simple, which in turn derived from another 1001 descriptor format, XRDS. Previous discovery workflows include Yadis 1002 which is currently used by the OpenID community. While suffering 1003 from significant shortcomings, Yadis was a breakthrough approach to 1004 performing discovery using extremely restricted hosting environments, 1005 and this memo has strived to preserve as much of that spirit as 1006 possible. 
1008 The use of Link elements and headers and the introduction of the 1009 "describedby" relation type in this memo is a direct result of the 1010 dedicated work and contribution of Phil Archer to the W3C POWDER 1011 specification and Jonathan Rees to the W3C review of Uniform Access 1012 to Information About. The host-meta approach was first proposed by 1013 Mark Nottingham as an alternative to attaching links directly to 1014 resource representations. 1016 The author wishes to thank the OASIS XRI community for their 1017 support, encouragement, and enthusiasm for this work. Special thanks 1018 go to Lisa Dusseault, Joseph Holsten, Mark Nottingham, John Panzer, 1019 Drummond Reed, and Jonathan Rees for their invaluable feedback. 1021 The author takes all responsibility for errors and omissions. 1023 Appendix D. Document History 1025 [[ to be removed by the RFC editor before publication as an RFC ]] 1027 -03 1028 o Added protocol name LRDD (pronounced 'lard'). 1030 o Fixed Link-Pattern examples to include missing semicolons. 1032 -02 1034 o Changed focus from an HTTP-based process to Link-based process. 1036 o Completely revised and restructured document for better clarity. 1038 o Realigned the methods to produce consistent results and changed 1039 the way redirections and client-errors are handled. 1041 o Updated to use newer version of site-meta, now called host-meta, 1042 including a new plaintext-based format to replace the previous XML 1043 format. 1045 o Renamed Link-Template to Link-Pattern to avoid future conflict 1046 with a previously proposed Link-Template HTTP header. 1048 o Removed support for the "scheme" Link-Template parameter. 1050 o Replaced restrictions with interoperability recommendations. 1052 o Added IANA considerations per new host-meta registry requirements. 1054 -01 1056 o Rename 'resource discovery' to 'descriptor discovery'. 1058 o Added informative reference to Metalink.
1060 o Clarified that the resource descriptor URI can use any URI scheme, 1061 not just "http" or "https". 1063 o Removed comment regarding redirects when using LINK Elements. 1065 o Clarified that HTTPS must be used with "https" URIs for both Link 1066 headers and host-meta retrieval. 1068 o Removed DNS verification step for host-meta with schemes other 1069 than "http" and "https". Replaced with a general discussion of 1070 authority and a security consideration comment. 1072 o Organized host-meta section into another sub-section level. 1074 o Enlarged the template vocabulary from a single "uri" variable to 1075 include smaller URI components. 1077 o Added informative reference to RFC 2295 in analysis appendix. 1079 -00 1081 o Initial draft. 1083 9. References 1085 9.1. Normative References 1087 [I-D.nottingham-http-link-header] 1088 Nottingham, M., "Link Relations and HTTP Header Linking", 1089 draft-nottingham-http-link-header-03 (work in progress), 1090 November 2008. 1092 [I-D.nottingham-site-meta] 1093 Nottingham, M. and E. Hammer-Lahav, "Host Metadata for the 1094 Web", draft-nottingham-site-meta-01 (work in progress), 1095 February 2009. 1097 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1098 Requirement Levels", BCP 14, RFC 2119, March 1997. 1100 [RFC2295] Holtman, K. and A. Mutz, "Transparent Content Negotiation 1101 in HTTP", RFC 2295, March 1998. 1103 [RFC2616] Fielding, R., Gettys, J., Mogul, J., Frystyk, H., 1104 Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext 1105 Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999. 1107 [RFC2818] Rescorla, E., "HTTP Over TLS", RFC 2818, May 2000. 1109 [RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform 1110 Resource Identifier (URI): Generic Syntax", STD 66, 1111 RFC 3986, January 2005. 1113 [RFC4287] Nottingham, M., Ed. and R. Sayre, Ed., "The Atom 1114 Syndication Format", RFC 4287, December 2005.
1116 [RFC4918] Dusseault, L., "HTTP Extensions for Web Distributed 1117 Authoring and Versioning (WebDAV)", RFC 4918, June 2007. 1119 [W3C.REC-html401-19991224] 1120 Raggett, D., Jacobs, I., and A. Hors, "HTML 4.01 1121 Specification", World Wide Web Consortium 1122 Recommendation REC-html401-19991224, December 1999, 1123 . 1125 [W3C.REC-xhtml1-20020801] 1126 Pemberton, S., "XHTML[TM] 1.0 The Extensible HyperText 1127 Markup Language (Second Edition)", World Wide Web 1128 Consortium Recommendation REC-xhtml1-20020801, 1129 August 2002, 1130 . 1132 9.2. Informative References 1134 [ARK] Kunze, J. and R. Rodgers, "The ARK Identifier Scheme", 1135 . 1137 [I-D.bryan-metalink] 1138 Bryan, A., "The Metalink Download Description Format", 1139 draft-bryan-metalink-05 (work in progress), January 2009. 1141 [POWDER] Archer, P., Ed., Smith, K., Ed., and A. Perego, Ed., 1142 "POWDER: Protocol for Web Description Resources", 1143 . 1145 [URIQA] Nokia, "The URI Query Agent Model", 1146 . 1148 [XRD] Hammer-Lahav, E., Ed., "XRD 1.0 [[ replace with new XRD 1149 specification reference ]]". 1151 [XRDS] Wachob, G., Reed, D., Chasen, L., Tan, W., and S. 1152 Churchill, "Extensible Resource Identifier (XRI) 1153 Resolution V2.0", . 1156 [XRDS-Simple] 1157 Hammer-Lahav, E., "XRDS-Simple 1.0", 1158 . 1160 [Yadis] Miller, J., "Yadis Specification 1.0", 1161 . 1163 URIs 1165 [1] 1167 Author's Address 1169 Eran Hammer-Lahav 1170 Yahoo! 1172 Email: eran@hueniverse.com 1173 URI: http://hueniverse.com