idnits 2.17.1 draft-hammer-discovery-02.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- No issues found here. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year == Line 468 has weird spacing: '... query frag...' -- The document date (February 12, 2009) is 5552 days in the past. Is this intentional? Checking references for intended status: Informational ---------------------------------------------------------------------------- == Missing Reference: '-' is mentioned on line 946, but not defined == Missing Reference: 'TM' is mentioned on line 1120, but not defined == Unused Reference: 'RFC2818' is defined on line 1101, but no explicit reference was found in the text == Outdated reference: A later version (-10) exists of draft-nottingham-http-link-header-03 == Outdated reference: A later version (-05) exists of draft-nottingham-site-meta-01 ** Obsolete normative reference: RFC 2616 (Obsoleted by RFC 7230, RFC 7231, RFC 7232, RFC 7233, RFC 7234, RFC 7235) ** Obsolete normative reference: RFC 2818 (Obsoleted by RFC 9110) == Outdated reference: A later version (-28) exists of draft-bryan-metalink-05 Summary: 2 errors (**), 0 flaws (~~), 8 warnings (==), 1 comment (--). Run idnits with the --verbose option for more detailed information about the items above. 
-------------------------------------------------------------------------------- 2 Network Working Group E. Hammer-Lahav 3 Internet-Draft Yahoo! 4 Intended status: Informational February 12, 2009 5 Expires: August 16, 2009 7 Link-based Resource Descriptor Discovery 8 draft-hammer-discovery-02 10 Status of this Memo 12 This Internet-Draft is submitted to IETF in full conformance with the 13 provisions of BCP 78 and BCP 79. 15 Internet-Drafts are working documents of the Internet Engineering 16 Task Force (IETF), its areas, and its working groups. Note that 17 other groups may also distribute working documents as Internet- 18 Drafts. 20 Internet-Drafts are draft documents valid for a maximum of six months 21 and may be updated, replaced, or obsoleted by other documents at any 22 time. It is inappropriate to use Internet-Drafts as reference 23 material or to cite them other than as "work in progress." 25 The list of current Internet-Drafts can be accessed at 26 http://www.ietf.org/ietf/1id-abstracts.txt. 28 The list of Internet-Draft Shadow Directories can be accessed at 29 http://www.ietf.org/shadow.html. 31 This Internet-Draft will expire on August 16, 2009. 33 Copyright Notice 35 Copyright (c) 2009 IETF Trust and the persons identified as the 36 document authors. All rights reserved. 38 This document is subject to BCP 78 and the IETF Trust's Legal 39 Provisions Relating to IETF Documents 40 (http://trustee.ietf.org/license-info) in effect on the date of 41 publication of this document. Please review these documents 42 carefully, as they describe your rights and restrictions with respect 43 to this document. 45 Abstract 47 This memo describes a process for obtaining information about a 48 resource identified by a URI. The 'information about a resource', a 49 resource descriptor, provides machine-readable information that aims 50 to increase interoperability and enhance the interaction with the 51 resource. 
This memo only defines the process for locating and 52 obtaining the descriptor, but leaves the descriptor format and its 53 interpretation out of scope. 55 Table of Contents 57 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 58 2. Notational Conventions . . . . . . . . . . . . . . . . . . . . 4 59 3. The describedby Link Relation . . . . . . . . . . . . . . . . 4 60 4. Identifying Descriptor Location . . . . . . . . . . . . . . . 5 61 4.1. Method Selection . . . . . . . . . . . . . . . . . . . . . 5 62 4.2. The <link> Element . . . . . . . . . . . . . . . . . . . . 6 63 4.3. The HTTP Link Header . . . . . . . . . . . . . . . . . . . 7 64 4.4. The Host Metadata Document . . . . . . . . . . . . . . . . 8 65 5. Obtaining Resource Descriptor . . . . . . . . . . . . . . . . 9 66 6. The Link-Pattern host-meta Field . . . . . . . . . . . . . . . 9 67 6.1. Template Syntax . . . . . . . . . . . . . . . . . . . . . 10 68 7. Security Considerations . . . . . . . . . . . . . . . . . . . 11 69 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 11 70 8.1. The Link-Pattern host-meta Field . . . . . . . . . . . . . 11 71 8.2. The describedby Relation Type . . . . . . . . . . . . . . 12 72 Appendix A. Descriptor Discovery vs. Service Discovery . . . . . 12 73 Appendix B. Methods Suitability Analysis . . . . . . . . . . . . 13 74 Appendix B.1. Requirements . . . . . . . . . . . . . . . . . . . . 13 75 Appendix B.2. Analysis . . . . . . . . . . . . . . . . . . . . . . 15 76 Appendix C. Acknowledgments . . . . . . . . . . . . . . . . . . 22 77 Appendix D. Document History . . . . . . . . . . . . . . . . . . 22 78 9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 24 79 9.1. Normative References . . . . . . . . . . . . . . . . . . . 24 80 9.2. Informative References . . . . . . . . . . . . . . . . . . 25 81 Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 25 83 1.
Introduction 85 This memo defines a process for locating descriptors for resources 86 identified with URIs. Resource descriptors are documents (usually 87 based on well known serialization languages such as XML, RDF, and 88 JSON) which provide machine-readable information about resources 89 (resource metadata) for the purpose of promoting interoperability and 90 assisting in interacting with unknown resources that support known 91 interfaces. 93 While many methods provide the ability to link a resource to its 94 metadata, none of these methods fully address the requirements of a 95 uniform and easily implementable process. These requirements include 96 the ability for resources to self-declare the location of their 97 descriptors, the ability to access descriptors directly without 98 interacting with the resource, and support for a wide range of platforms 99 and scales of deployment. They must also be fully compliant with 100 existing web protocols, and support extensibility. These 101 requirements, and the analysis used as the basis for this memo, are 102 explained in detail in Appendix B. 104 For example, a web page about an upcoming meeting can provide in its 105 descriptor document the location of the meeting organizer's free/busy 106 information to potentially negotiate a different time. A social 107 network profile page descriptor can identify the location of the 108 user's address book as well as accounts on other sites. A web 109 service implementing an API with optional components can advertise 110 which of these are supported. 112 This memo describes the first step in the discovery process in which 113 the resource descriptor document is located and retrieved. Other 114 steps, which are outside the scope of this memo, include parsing the 115 descriptor document based on its format (such as POWDER [POWDER], XRD 116 [XRD], and Metalink [I-D.bryan-metalink]) and utilizing it based on 117 the application.
119 Discovery can be performed before, after, or without obtaining a 120 representation of the resource. Performing discovery ahead of 121 accessing a representation allows the client to avoid relying on 122 assumptions about the properties of the resource. Performing 123 discovery after a representation has been obtained enables further 124 interaction with it. 126 Given the wide range of 'information about a resource', no single 127 descriptor format can adequately accommodate such scope. However, 128 there is great value in making the process of locating the descriptor 129 uniform across formats. While HTTP is the most common protocol used 130 in association with discovery and is explicitly specified in this 131 memo, other protocols MAY be used. 133 Please discuss this draft on the www-talk@w3.org [1] mailing list. 135 2. Notational Conventions 137 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 138 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 139 document are to be interpreted as described in [RFC2119]. 141 This document uses the Augmented Backus-Naur Form (ABNF) notation of 142 [RFC2616]. Additionally, the following rules are included from 143 [RFC3986]: reserved and unreserved, and from 144 [I-D.nottingham-http-link-header]: link-param. 146 3. The describedby Link Relation 148 The methods described in this memo express the location of the 149 resource descriptor as a link relation, utilizing the link framework 150 defined by [I-D.nottingham-http-link-header]. The association of a 151 descriptor document with the resource it describes is declared using 152 the "describedby" link relation type. 154 The "describedby" link relation is defined in [POWDER] and registered 155 as: 157 The relationship A "describedby" B asserts that resource B 158 provides a description of resource A. There are no constraints on 159 the format or representation of either A or B, neither are there 160 any further constraints on either resource.
162 Since a single resource can have many descriptors, the "describedby" 163 link relation has a one-to-many structure (the question whether a 164 single descriptor can describe multiple resources is outside the 165 scope of this memo). In the case of multiple "describedby" links 166 obtained from a single method, selecting which link to use is 167 application-specific. 169 To promote interoperability, applications referencing this memo 170 SHOULD clearly define the application-specific criteria used to 171 select between "describedby" links. This MAY be done by: 173 o Supporting a single descriptor format, or defining an order of 174 precedence for multiple descriptor formats. Applications MAY 175 require the presence of the link "type" attribute with the mime- 176 type of the required format. 178 o Using the "describedby" relation type together with another 179 application-specific relation type in the same link. The 180 application-specific relation type can be registered or an 181 extension. 183 o Specifying additional link attributes using link-extensions. 185 Link selection MUST NOT depend on the order in which multiple links 186 are obtained from a single method. Applications MUST NOT impose 187 constraints on the usage of the "describedby" relation type as it is 188 likely to be used by other applications in association with the same 189 resource. 191 4. Identifying Descriptor Location 193 The descriptor location (URI) is a function of the resource URI. 194 This section defines three methods which together satisfy the 195 requirements defined in Appendix B. While each method on its own 196 satisfies the requirements partially, together they provide enough 197 flexibility for most use cases. Each of the following three methods 198 is performed by using the resource URI to identify its descriptor 199 URI. 201 In many cases, a request for one URI leads to requesting other URIs, 202 as is the case with HTTP redirections. 
Because the decision whether 203 to use such URIs is application-specific, discovery is constrained to 204 a single URI identifying the resource. Any other resource URIs 205 received MUST be considered as a separate and discrete input into the 206 discovery function. If a resource URI obtained during the 207 performance of these methods is found to be more relevant to the 208 application, the discovery process MUST be restarted with the new 209 resource URI as its input. 211 For example, an HTTP HEAD request for URI A returns a redirect (307) 212 response with a set of "describedby" links, and identifies the 213 temporary location of the representation at URI B. An HTTP HEAD 214 request for URI B returns a successful (200) response with its own 215 set of "describedby" links. An application MAY choose to define a 216 process in which the two sets of links are obtained, prioritized, and 217 utilized; however, it MUST do so by explicitly instructing the client 218 to perform discovery multiple times, as each is considered a separate 219 and distinct discovery. 221 4.1. Method Selection 223 Each method presents a different set of requirements. The criteria 224 used to determine which methods a server SHOULD support and a client 225 SHOULD attempt are based on a combination of factors: 227 o The ability to offer and obtain a representation of the resource 228 by dereferencing its URI. 230 o The availability of a representation supporting markup 231 compatible with [I-D.nottingham-http-link-header]. 233 o The availability of an HTTP representation of the resource and the 234 ability to provide and access link information in its response 235 header. 237 The methods are listed based on the restrictiveness of their 238 requirements in descending order, from the most specialized to the 239 most generic. This ordering, however, does not imply the order in 240 which multiple applicable methods should be attempted.
Because 241 different methods are more appropriate in different circumstances, it 242 is up to each application to define how they should be used together. 244 To promote interoperability, applications referencing this memo MUST 245 clearly define the relationship between the three methods as either: 247 o equal, where all methods MUST produce the same set of resource 248 descriptors and clients MAY attempt any method according to 249 their capabilities, or 251 o with an application-specific order of precedence, where methods 252 MUST be attempted in a specific order. 254 4.2. The <link> Element 256 The <link> element method is limited to resources with an available 257 markup representation that supports typed-relations using the 258 <link> element, such as HTML [W3C.REC-html401-19991224], XHTML 259 [W3C.REC-xhtml1-20020801], and Atom [RFC4287]. Other markup formats 260 are permitted as long as the semantics of their <link> elements are 261 fully compatible with the link framework defined in 262 [I-D.nottingham-http-link-header]. This method requires the 263 retrieval of a resource representation. While HTTP is the most 264 common transport for such documents, this method is transport 265 independent. 267 For example: 269 272 A client trying to obtain the location of the resource's descriptor 273 using this method SHALL: 275 1. Retrieve a representation of the resource using the applicable 276 transport for that resource URI. If the markup document is 277 obtained using HTTP, it MUST only be used by the client if the 278 document is a valid representation of the resource identified by 279 the HTTP request URI, typically in a response with a successful 280 (2xx) or redirection (3xx) status code. If no such valid 281 representation of the request URI is found, the method fails. 283 2. Parse the document as defined by its format specification and 284 look for <link> elements with a "rel" attribute value containing 285 the "describedby" relation.
The client MUST obey the document 286 markup schema and ignore any invalid <link> elements (such as 287 <link> elements outside the <head> section of an HTML document). This 288 is done to prevent unintentional markup from other parts of the 289 document from being used for discovery purposes, which can have a significant 290 impact on usability and security. 292 3. Narrow down the selection if more than one "describedby" link is 293 found, following the application-specific criteria. The 294 descriptor location is obtained from the value of the "href" 295 attribute in the selected <link> element. 297 <link> elements MAY include other relation types together with 298 "describedby" in a single "rel" attribute (for example 299 'rel="describedby copyright"'). Clients MUST properly process 300 such multiple-relation "rel" attributes as defined by the format 301 specification. 303 4.3. The HTTP Link Header 305 The HTTP Link header method is limited to resources for which an HTTP 306 GET or HEAD request returns a 2xx, 3xx, or 4xx HTTP response 307 [RFC2616]. This method uses the Link header defined in 308 [I-D.nottingham-http-link-header] and requires the retrieval of a 309 resource representation header. 311 For example: 313 Link: ; rel="describedby"; 314 type="application/powder+xml" 316 A client trying to obtain the location of the resource's descriptor 317 using this method SHALL: 319 1. Make an HTTP (or HTTPS as required) GET or HEAD request to the 320 resource URI to obtain a valid response header. If the HTTP 321 response carries a status code other than successful (2xx), 322 redirection (3xx), or client error (4xx), the method fails. 324 2. Parse the HTTP response header and look for Link headers with a 325 "rel" parameter value containing the "describedby" relation. 327 3. Narrow down the selection if more than one "describedby" link is 328 found, following the application-specific criteria. The 329 descriptor location is obtained from the "<>" enclosed URI- 330 reference in the selected Link header.
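The three client steps of the HTTP Link header method can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a complete parser for [I-D.nottingham-http-link-header]: the function names (status_allows_discovery, describedby_links), the regular expressions, and the example URI http://example.com/r/1;about are hypothetical and are not defined by this memo. The sketch handles only simple token and quoted-string parameters, and leaves the application-specific selection in step 3 to the caller.

```python
import re
from urllib.parse import urljoin

# One link-value: "<target>" followed by zero or more ";name=value" parameters.
# Simplified on purpose; this is an assumption, not the normative grammar.
_LINK = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+\s*(?:=\s*(?:"[^"]*"|[^\s;,"]+))?)*)')
_PARAM = re.compile(r';\s*([\w-]+)\s*(?:=\s*(?:"([^"]*)"|([^\s;,"]+)))?')


def status_allows_discovery(status):
    """Step 1: the method fails unless the status is 2xx, 3xx, or 4xx."""
    return 200 <= status < 500


def describedby_links(link_header_values, resource_uri):
    """Step 2: collect (target URI, parameters) for every "describedby" link.

    link_header_values is a list of raw Link header field values taken from
    the response to a GET or HEAD request for resource_uri.
    """
    found = []
    for value in link_header_values:
        for link in _LINK.finditer(value):
            params = {}
            for p in _PARAM.finditer(link.group(2)):
                quoted, token = p.group(2), p.group(3)
                text = quoted if quoted is not None else (token or "")
                # Keep the first occurrence of a repeated parameter.
                params.setdefault(p.group(1).lower(), text)
            # "rel" may carry several space-separated relation types.
            relations = params.get("rel", "").lower().split()
            if "describedby" in relations:
                # Relative targets are resolved against the resource URI.
                found.append((urljoin(resource_uri, link.group(1)), params))
    return found


if __name__ == "__main__":
    header = ('<http://example.com/r/1;about>; rel="describedby copyright"; '
              'type="application/powder+xml"')
    for target, params in describedby_links([header], "http://example.com/r/1"):
        print(target, params.get("type"))
```

The function returns every matching link rather than the first one, because link selection must not depend on the order in which links are obtained; the caller applies its own criteria (for example, filtering on the "type" parameter).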
332 Link headers MAY include other relation types together with 333 "describedby" in a single "rel" parameter (for example 334 'rel="describedby copyright"'). Clients MUST properly process 335 such multiple-relation "rel" parameters as defined by 336 [I-D.nottingham-http-link-header]. 338 4.4. The Host Metadata Document 340 The host metadata document method is available for any resource 341 identified by a URI whose authority supports the host-meta document 342 defined in [I-D.nottingham-site-meta]. This method does not require 343 obtaining any representation of the resource, and operates solely 344 using the resource URI. 346 The link relation between the resource URI and the descriptor URI is 347 obtained by using a template contained in the host-meta document. By 348 applying the host-wide template to an individual resource URI, a 349 resource-specific link is produced which can be used to indicate the 350 location of the descriptor document for that resource, bypassing the 351 need to access or provide a representation for it. 353 For example (line breaks are for formatting only, and are not allowed 354 in the document): 356 Link-Pattern: <{uri};about>; rel="describedby"; 357 type="application/powder+xml" 359 A client trying to obtain the location of the resource's descriptor 360 using this method SHALL: 362 1. Retrieve the host-meta document for the resource URI's authority as defined by 363 [I-D.nottingham-site-meta] section 4. If the request fails to 364 retrieve a valid host-meta document, the method fails. 366 2. Parse the host-meta document and look for Link-Pattern fields with a 367 "rel" parameter value containing the "describedby" relation. 369 3. Narrow down the selection if more than one "describedby" link is 370 found, following the application-specific criteria. The 371 descriptor location is constructed by applying the template 372 obtained from the selected Link-Pattern field to the resource URI 373 as described by Section 6.1.
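The template application in step 3 can be sketched in Python as follows. This is a minimal illustration of the substitution rules in Section 6.1, under stated assumptions: the function names (expand_template, descriptor_uri) are hypothetical, IPv6 address literals in the authority are not handled, and unknown variable names are simply left untouched rather than rejected. Relative results are resolved against the root resource of the authority, as Section 6 specifies for Link-Pattern targets.

```python
import re
from urllib.parse import quote, urljoin, urlsplit

# "{name}" or "{%name}" for the variable names allowed by the var-name rule.
_VAR = re.compile(
    r"\{(%?)(scheme|authority|path|query|fragment|userinfo|host|port|uri)\}")


def expand_template(template, input_uri):
    """Substitute template variables with components of input_uri."""
    parts = urlsplit(input_uri)
    authority = parts.netloc
    # userinfo is everything before the last "@" (empty if there is none).
    userinfo, _, hostport = authority.rpartition("@")
    # Assumption: no IPv6 literal, so the first ":" separates host and port.
    host, _, port = hostport.partition(":")
    values = {
        "scheme": parts.scheme,
        "authority": authority,
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
        "userinfo": userinfo,
        "host": host,
        "port": port,
        # "uri" is the whole input URI without "#" and the fragment.
        "uri": input_uri.split("#", 1)[0],
    }

    def substitute(match):
        value = values[match.group(2)]
        # A "%" prefix percent-encodes everything except unreserved characters.
        return quote(value, safe="") if match.group(1) else value

    return _VAR.sub(substitute, template)


def descriptor_uri(template, resource_uri):
    """Expand the template and resolve a relative result against the root."""
    target = expand_template(template, resource_uri)
    if urlsplit(target).scheme:
        return target  # already absolute
    parts = urlsplit(resource_uri)
    return urljoin(f"{parts.scheme}://{parts.netloc}/", target)


if __name__ == "__main__":
    source = "http://example.com/r/1?f=xml#top"
    for template in ("http://example.org?q={%uri}",
                     "http://meta.{host}:8080{path}?{query}",
                     "https://{authority}/v1{path}#{fragment}"):
        print(template, "-->", expand_template(template, source))
```

The three templates in the usage loop are the ones listed in Section 6.1 for the input URI "http://example.com/r/1?f=xml#top", so the printed results can be compared directly against the outputs given there.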
375 Link-Pattern MAY include other relation types together with 376 "describedby" in a single "rel" parameter (for example 377 'rel="describedby copyright"'). Clients MUST properly process 378 such multiple-relation "rel" parameters as defined by Section 6. 380 5. Obtaining Resource Descriptor 382 Once the desired descriptor URI has been obtained, the descriptor 383 document is retrieved. If the descriptor URI scheme is "http" or 384 "https", the document is obtained via an HTTP (or HTTPS as required) 385 GET request to the identified URI. The client MUST obey HTTP 386 redirections (3xx), and the descriptor document is considered valid 387 only if retrieved with a successful HTTP response status (2xx). 389 6. The Link-Pattern host-meta Field 391 The Link host-meta field [I-D.nottingham-site-meta] conveys a link 392 relation between all resource URIs under the host-meta authority and 393 a common target URI. However, there are cases in which relations of 394 different resources with the same authority do not share the same 395 target URI, but do follow a common pattern in how the target URI is 396 constructed. 398 For example, a news site with multiple authors can provide 399 information about each article's author by appending a suffix (such 400 as ";by") to the URI of each article. Each article has a unique 401 author, but all share the same pattern of where that information is 402 located. The same information can be provided using an HTTP Link 403 header or HTML <link> element, but in a less efficient manner when a 404 single pattern can provide the same information: 406 Link-Pattern: <{uri};by>; rel="author" 408 The Link-Pattern host-meta field uses a slightly modified syntax of 409 the HTTP Link header [I-D.nottingham-http-link-header] to convey 410 relations whose context is individual resources with the same 411 authority as the host-meta document, and whose target is constructed 412 by applying a template to the context URI.
The field is not specific 413 to any relation type and MAY be used to express any relations 414 supported by the Link header [I-D.nottingham-http-link-header]. 416 The Link-Pattern host-meta field differs from the HTTP Link header in 417 the following respects: 419 o The "<>" enclosed token is not a valid URI, but instead contains a 420 template as defined in Section 6.1. 422 o Its context URI is defined as the individual resource URI used as 423 input to the template. 425 o If the resulting target URI expressed by the template is relative, 426 its base URI is the root resource of the authority. 428 Link-Pattern = "Link-Pattern" ":" #pattern-value 430 pattern-value = "<" template ">" *( ";" link-param ) 432 template = *( uri-char | "{" [ "%" ] var-name "}" ) 434 uri-char = ( reserved | unreserved ) 436 var-name = "scheme" | "authority" | "path" 437 | "query" | "fragment" | "userinfo" 438 | "host" | "port" | "uri" 440 [[ should this spec define a filter/map parameter that will allow 441 applying link patterns to subsets of the host-meta scope? This can 442 use a regular expression match or something similar to robots.txt. 443 If the spec will end up not directly supporting this feature, I will 444 add a note suggesting that such a feature could be defined elsewhere 445 as an extension. ]] 447 6.1. Template Syntax 449 The template syntax provides a simple format for URI transformation. 450 A template is a string containing brace-enclosed ("{}") variable 451 names marking the parts of the string that are to be substituted by 452 the variable values. A template is transformed into a URI by 453 substituting the variables with their calculated value. If a 454 variable name is prefixed by "%", any character in the variable value 455 other than unreserved MUST be percent-encoded per [RFC3986]. 457 To construct a URI using a template, the input URI is parsed into its 458 URI components and each component value assigned to a variable name. 
459 The template variable substitution is based on the URI vocabulary 460 defined by [RFC3986] section 3 and includes: "scheme", "authority", 461 "path", "query", "fragment", "userinfo", "host", and "port". In 462 addition, it defines the "uri" variable as the entire input URI 463 excluding the fragment component and the "#" fragment separator. 465 foo://william@example.com:8080/over/there?name=ferret#nose 466 \_/ \______________________/\_________/ \_________/ \__/ 467 | | | | | 468 scheme authority path query fragment 470 foo://william@example.com:8080/over/there?name=ferret#nose 471 \_____/ \_________/ \__/ 472 | | | 473 userinfo host port 475 foo://william@example.com:8080/over/there?name=ferret#nose 476 \___________________________________________________/ 477 | 478 uri 480 For example, given the input URI "http://example.com/r/1?f=xml#top", 481 each of the following templates will produce the associated output 482 URI: 484 http://example.org?q={%uri} --> 485 http://example.org?q=http%3A%2F%2Fexample.com%2Fr%2F1%3Ff%3Dxml 487 http://meta.{host}:8080{path}?{query} --> 488 http://meta.example.com:8080/r/1?f=xml 490 https://{authority}/v1{path}#{fragment} --> 491 https://example.com/v1/r/1#top 493 7. Security Considerations 495 The methods used to perform discovery are not secure, private or 496 integrity-guaranteed, and due caution should be exercised when using 497 them. Applications that perform discovery should consider the attack 498 vectors opened by automatically following, trusting, or otherwise 499 using links gathered from <link> elements, HTTP Link headers, or 500 host-meta documents. 502 8. IANA Considerations 504 8.1. The Link-Pattern host-meta Field 506 This specification registers the Link-Pattern host-meta field in the 507 host-meta Field Registry [I-D.nottingham-site-meta]. 509 Field Name: Link-Pattern 511 Change controller: IETF 513 Specification document(s): [[ this document ]] 515 Related information: [I-D.nottingham-http-link-header] 517 8.2.
The describedby Relation Type 519 [[ this section will be removed if the "describedby" relation type is 520 registered by the time it is published ]] 522 This specification registers the "describedby" relation type in the 523 Link Relation Type Registry [I-D.nottingham-http-link-header]. 525 o Relation Name: describedby 527 o Description: The relationship A "describedby" B asserts that 528 resource B provides a description of resource A. There are no 529 constraints on the format or representation of either A or B, 530 neither are there any further constraints on either resource. 532 o Documentation: [POWDER] 534 Appendix A. Descriptor Discovery vs. Service Discovery 536 Descriptor discovery provides a process for obtaining information 537 about a resource identified with a URI. It allows servers to 538 describe their resources in a machine-readable format, enabling 539 automatic interoperability by user-agents and resource consuming 540 applications. Discovery enables applications to utilize a wide range 541 of web services and resources across multiple providers without the 542 need to know about their capabilities in advance, reducing the need 543 for manual configuration and resource-specific software. 545 When discussing discovery, it is important to differentiate between 546 descriptor discovery and service discovery. Both types attempt to 547 associate capabilities with resources, but they approach it from 548 opposite ends. 550 Service discovery centers on identifying the location of qualified 551 resources, typically finding an endpoint capable of certain protocols 552 and capabilities. In contrast, descriptor discovery begins with a 553 resource, trying to find which capabilities it supports.
555 A simple way to distinguish between the two types of discovery is to 556 define the questions they are each trying to answer: 558 Descriptor-Discovery: Given a resource, what are its attributes: 559 capabilities, characteristics, and relationships to other 560 resources? 562 Service-Discovery: Given a set of attributes, which available 563 resources match the desired set and what is their location? 565 While this memo deals exclusively with descriptor discovery, it is 566 important to note that the two discovery types are closely related 567 and are usually used in tandem. In fact, a typical use case will 568 switch between service discovery and descriptor discovery multiple 569 times in a single workflow, and can start with either one. 571 One reason for this dependency between the two discovery types is 572 that resource descriptors usually contain not only a list of 573 capabilities, but also relationships to other resources. Since those 574 relationships are usually typed, the process in which an application 575 chooses which links to use is in fact service discovery. 577 Applications use descriptor discovery to obtain the list of links, 578 and service discovery to choose the relevant links. In another 579 common example, the application uses service discovery to find a 580 resource with a given capability, then uses descriptor discovery to 581 find out what other capabilities it supports. 583 Appendix B. Methods Suitability Analysis 585 Due to the wide range of use cases requiring resource descriptors, 586 and the desire to reuse as much as possible, no single solution has 587 been found to sufficiently cover the requirements for linking between 588 the resource URI and the descriptor URI. The following analysis 589 attempts to list all the methods proposed for addressing descriptor 590 discovery. It is included here to provide background information as 591 to why certain methods have been selected while others were rejected from 592 the discovery process.
It has been updated to match the terms used 593 in this memo and its structure. 595 Appendix B.1. Requirements 597 Getting from a resource URI to its descriptor document can be 598 implemented in many ways. The problem is that none of the current 599 methods address all of the requirements presented by the common use 600 cases. The requirements are simple, but the more we try to address, 601 the less elegant and accessible the process becomes. While working 602 on the now defunct XRDS-Simple specification [XRDS-Simple] and 603 talking to companies and individuals about it, the following 604 requirements emerged for any proposed process: 606 Self Declaration: 608 Allow resources to declare the availability of descriptor 609 information and its location. When a resource is accessed, it 610 needs to have a way to communicate to the client that it 611 supports the discovery protocol and to indicate the location 612 of such descriptor. 614 This is useful when the client is able to interact, or is already 615 interacting, with the resource but can enhance its interaction 616 with additional information. For example, accessing a blog 617 page is enhanced if it was generated from an Atom feed or Atom 618 entry and that feed supports Atom authoring. 620 Direct Descriptor Access: 622 Enable direct retrieval of the resource descriptor without 623 interacting with the resource itself. Before a resource is 624 accessed, the client should have a way to obtain the resource 625 descriptor without accessing the resource. This is important 626 for two reasons. 628 First, accessing an unknown resource may have undesirable 629 consequences. After all, the information contained in the 630 descriptor is supposed to inform the client how to interact 631 with the resource. The second is efficiency - removing the 632 need to first obtain the resource in order to get its 633 descriptor (reducing HTTP round-trips, network bandwidth, and 634 application latency).
636 Web Architecture Compliant: 638 Work with well-established web infrastructure. This may sound 639 obvious but it is in fact the most complex requirement. 640 Deploying new extensions to the HTTP protocol is a complicated 641 endeavor. Besides getting applications to support a new header, 642 method, or content negotiation, existing caches and proxies 643 must be enhanced to properly handle these requests, and they 644 must not fail to perform their normal duties without such 645 enhancements. 647 For example, a new content negotiation method may cause an 648 existing cache to serve the wrong data to a non-discovery 649 client due to its inability to distinguish the metadata request 650 from the resource representation request. 652 Scale and Technology Agnostic: 654 Support large and small web providers regardless of the size of 655 operations and deployment. Any solution must work for a small 656 hosted web site as well as the world's largest search engine. It 657 must be flexible enough to allow developers with restricted 658 access to the full HTTP protocol (such as limited access to 659 request or response headers) to be able to both provide and 660 consume resource descriptors. Any solution should also support 661 caching as much as possible and allow reuse of source code and 662 data. 664 Extensible: 666 Accommodate future enhancements and unknown descriptor formats. 667 It should support the existing set of descriptor formats such 668 as XRD and POWDER, as well as new descriptor relationships that 669 might emerge in the future. In addition, the solution should 670 not depend on the descriptor format itself and should work equally 671 well with any document format - it should aim to keep the road 672 and destination separate. 674 Appendix B.2. Analysis 676 The following is a list of proposed and implemented methods trying to 677 address descriptor discovery. Each method is reviewed for its 678 compliance with the requirements identified previously.
The [-], 679 [+], or [+-] symbols next to each requirement indicate how well the 680 method complies with the requirement. 682 Appendix B.2.1. HTTP Response Header 684 When a resource representation is retrieved using an HTTP GET 685 request, the server includes in the response a header pointing to the 686 location of the descriptor document. For example, POWDER uses the 687 "Link" response header to create an association between the resource 688 and its descriptor. XRDS [XRDS] (based on the Yadis protocol 689 [Yadis]) uses a similar approach, but since the Link header was not 690 available when Yadis was first drafted, it defines a custom header 691 X-XRDS-Location which serves a similar but less generic purpose. 693 [+] Self Declaration - using the Link header, any resource can point 694 to its descriptor documents. 696 [-] Direct Descriptor Access - the header is only accessible when 697 requesting the resource itself via an HTTP GET request. While 698 HTTP GET is meant to be a safe operation, it is still possible for 699 some resources to have side-effects. 701 [+] Web Architecture Compliant - uses the Link header which is an 702 IETF Internet Standard [[ currently a standard-track draft ]], and 703 is consistent with HTTP protocol design. 705 [-] Scale and Technology Agnostic - since discovery accounts for a 706 small percentage of resource requests, the extra Link header is 707 wasteful. For some hosted servers, access to HTTP headers is 708 limited and will prevent implementation. 710 [+] Extensible - the Link header provides built-in extensibility by 711 allowing new link relations, mime-types, and other extensions. 713 Minimum roundtrips to retrieve the resource descriptor: 2 715 Appendix B.2.2. HTTP Response Header Via HEAD 717 Same as the HTTP Response Header method but used with an HTTP HEAD 718 request. The idea of using the HEAD method is to solve the wasteful 719 overhead of including the Link header in every reply.
By limiting 720 the appearance of the Link header only to HEAD responses, typical GET 721 requests are not encumbered by the extra bytes. 723 [+] Self Declaration - Same as the HTTP Response Header method. 725 [-] Direct Descriptor Access - Same as the HTTP Response Header 726 method. 728 [-] Web Architecture Compliant - HTTP HEAD should return the exact 729 same response as HTTP GET with the sole exception that the 730 response body is omitted. By adding headers only to the HEAD 731 response, this solution violates the HTTP protocol and might not 732 work properly with proxies as they can return the header of the 733 cached GET request. 735 [+] Scale and Technology Agnostic - solves the wasted bandwidth 736 associated with the HTTP Response Header method, but still suffers 737 from the limitation imposed by requiring access to HTTP headers. 739 [+] Extensible - Same as the HTTP Response Header method. 741 Minimum roundtrips to retrieve the resource descriptor: 2 743 Appendix B.2.3. HTTP Content Negotiation 745 Using the HTTP Accept request header or Transparent Content 746 Negotiation as defined in [RFC2295], the client informs the server it 747 is interested in the descriptor and not the resource itself, to which 748 the server responds with the descriptor document or its location. In 749 Yadis, the client sends an HTTP GET (or HEAD) request to the resource 750 URI with an Accept header and content-type application/xrds+xml. 751 This informs the server of the client's discovery interest, which in 752 turn may reply with the descriptor document itself, redirect to it, 753 or return its location via the X-XRDS-Location response header. 755 [-] Self Declaration - does not address as it focuses on the client 756 declaring its intentions. 758 [+] Direct Descriptor Access - provides a simple method for directly 759 requesting the descriptor document. 
761 [-] Web Architecture Compliant - while it can be argued that the 762 descriptor can be considered another representation of the 763 resource, it is very much external to it. Using the Accept header 764 to request a separate resource (as opposed to a different 765 representation of the same resource) violates web architecture. 766 It also prevents using the discovery content-type as a valid 767 (self-standing) web resource having its own descriptor. 769 [-] Scale and Technology Agnostic - requires access to HTTP request 770 and response headers, as well as the registration of multiple 771 handlers for the same resource URI based on the Accept header. In 772 addition, improper use or implementation of the Vary header in 773 conjunction with the Accept header will cause caches to serve the 774 descriptor document instead of the resource itself - a great 775 concern to large providers with frequently visited front-pages. 777 [-] Extensible - applies an implicit relation type to the descriptor 778 mime-type, limiting descriptor formats to a single purpose. It 779 also prevents existing mime-types from being used as a 780 descriptor format. 782 Minimum roundtrips to retrieve the resource descriptor: 1 784 Appendix B.2.4. HTTP Header Negotiation 786 Similar to the HTTP Content Negotiation method, this solution uses a 787 custom HTTP request header to inform the server of the client's 788 discovery intentions. The server responds by serving the same 789 resource representation (via an HTTP GET or HEAD request) with the 790 relevant Link headers. It attempts to solve the HTTP Response Header 791 waste issue by allowing the client to explicitly request the 792 inclusion of Link headers. One such header can be called "Request-links" 793 to inform the server the client would like it to include 794 certain Link headers of a given "rel" type in its reply. 796 [+] Self Declaration - same as HTTP Response Header with the option 797 of selective inclusion.
799 [-] Direct Descriptor Access - does not address. 801 [-] Web Architecture Compliant - HTTP does not include any mechanism 802 for header negotiation and any custom solution will break existing 803 caches. 805 [+-] Scale and Technology Agnostic - Requires advanced access to HTTP 806 headers on both the client and server sides, but solves the 807 bandwidth waste issue of the HTTP Response Header method. 809 [+] Extensible - builds on top of Link header extensibility. 811 Minimum roundtrips to retrieve the resource descriptor: 2 813 Appendix B.2.5. <LINK> Element 815 Embeds the location of the descriptor document within the resource 816 representation by leveraging the HTML header <LINK> element (as 817 opposed to the HTTP header). Applies to HTML resource 818 representations or similar markup-based formats with support for 819 "Link"-like elements such as Atom. POWDER uses the <LINK> element in 820 this manner, while XRDS uses the HTML <META> element with an "http-equiv" 821 attribute equal to X-XRDS-Location (to create an embedded 822 version of the X-XRDS-Location custom header). 824 [+] Self Declaration - similar to HTTP Response Header method but 825 limited to HTML resources. 827 [-] Direct Descriptor Access - the method requires fetching the 828 entire resource representation in order to obtain the descriptor 829 location. In addition, it requires changing the resource HTML 830 representation which makes discovery an intrusive process. 832 [+] Web Architecture Compliant - uses the <LINK> element as 833 designed. 835 [+] Scale and Technology Agnostic - while this solution requires 836 direct retrieval of the resource and manipulation of its content, 837 it is extremely accessible on many platforms. 839 [-] Extensible - extensibility is restricted to HTML representations 840 or similar markup formats with support for a similar element. 842 Minimum roundtrips to retrieve the resource descriptor: 2 844 Appendix B.2.6.
HTTP OPTIONS Method 846 The HTTP OPTIONS method is used to interact with the HTTP server with 847 regard to its capabilities and communication-related information 848 about its resources. The OPTIONS method, together with an optional 849 request header, can be used to request both the descriptor location 850 and descriptor content itself. 852 [-] Self Declaration - does not address. 854 [+] Direct Descriptor Access - provides a clean mechanism for 855 requesting descriptor information about a resource without 856 interacting with it. 858 [+] Web Architecture Compliant - uses an existing HTTP feature. 860 [-] Scale and Technology Agnostic - requires client and server 861 access to the OPTIONS HTTP method. Also does not support caching 862 which makes this solution inefficient. 864 [+] Extensible - built into the OPTIONS method. 866 Minimum roundtrips to retrieve the resource descriptor: 1 868 Appendix B.2.7. WebDAV PROPFIND Method 870 Similar to the HTTP OPTIONS method, the WebDAV PROPFIND method 871 defined in [RFC4918] can be used to request resource-specific 872 properties, one of which can hold the location of the descriptor 873 document. PROPFIND, unlike OPTIONS, cannot return the descriptor 874 itself, unless it is returned in the required PROPFIND schema (a 875 multi-status XML element). Other alternatives include URIQA [URIQA], 876 an HTTP extension which defines a method called MGET, and ARK 877 (Archival Resource Key) [ARK] - a method similar to PROPFIND that 878 allows the retrieval of resource attributes using keys (which 879 describe the resource). 881 [-] Self Declaration - does not address. 883 [+-] Direct Descriptor Access - does not require interaction with 884 the resource, but does require at least two requests to get the 885 descriptor (get location, get document). 887 [+] Web Architecture Compliant - uses an HTTP extension with less 888 support than core HTTP, but still based on published standards.
890 [-] Scale and Technology Agnostic - same as the HTTP OPTIONS Method. 892 [+-] Extensible - uses extensible protocols but at the same time 893 depends on solutions that have already gone beyond the standard 894 HTTP protocol, which makes further extensions more complex and 895 unsupported. 897 Minimum roundtrips to retrieve the resource descriptor: 2 899 Appendix B.2.8. Custom HTTP Method 901 Similar to the HTTP OPTIONS Method, a new method can be defined (such 902 as DISCOVER) to return (or redirect to) the descriptor document. The 903 new method can allow caching. 905 [-] Self Declaration - does not address. 907 [+] Direct Descriptor Access - same as the HTTP OPTIONS Method. 909 [-] Web Architecture Compliant - depends heavily on extending every 910 platform to support the extension. Unlikely to be supported by 911 existing proxy services and caches. 913 [-] Scale and Technology Agnostic - same as HTTP OPTIONS Method with 914 the additional burden on smaller sites requiring access to the new 915 protocol. 917 [+] Extensible - new protocol that can extend as needed. 919 Minimum roundtrips to retrieve the resource descriptor: 1 921 Appendix B.2.9. Static Resource URI Transformation 923 Instead of using HTTP facilities to access the descriptor location, 924 this method defines a template to transform any resource URI to the 925 descriptor document URI. This can be done by adding a prefix or 926 suffix to the resource URI, which turns it into a new resource URI. 927 The new URI points to the descriptor document. For example, to fetch 928 the descriptor document for http://example.com/resource, the client 929 makes an HTTP GET request to http://example.com/resource;about using 930 a static template that adds the ";about" suffix. 932 [-] Self Declaration - does not address. 934 [+] Direct Descriptor Access - creates a unique URI for the 935 descriptor document. 
937 [+-] Web Architecture Compliant - uses basic HTTP facilities but 938 intrudes on the domain authority namespace as it defines a static 939 template for URI transformation that is not likely to be 940 compatible with many existing URI naming conventions. 942 [+-] Scale and Technology Agnostic - depending on the static mapping 943 chosen. Some hosted environments will have a problem gaining 944 access to the mapped URI based on the URI format chosen. 946 [-] Extensible - provides a very specific and limited method to map 947 between resources and their descriptor, since each relation type 948 must mint its own static template. 950 Minimum roundtrips to retrieve the resource descriptor: 1 952 Appendix B.2.10. Dynamic Resource URI Transformation 954 Same as the Static Resource URI Transformation method but with the 955 ability for each domain authority to specify its own discovery 956 transformation template. This can be done by placing a configuration 957 file at a known location (such as robots.txt) which contains the 958 template needed to perform the URL mapping. The client first obtains 959 the configuration document (which may be cached using normal HTTP 960 facilities), parses it, then uses that information to transform the 961 resource URI and access the descriptor document. 963 [+-] Self Declaration - does not address individual resources, but 964 allows entire domains to declare their support (and how to use 965 it). 967 [+-] Direct Descriptor Access - once the mapping template has been 968 obtained, descriptors can be accessed directly. 970 [+-] Web Architecture Compliant - uses an existing known-location 971 design pattern (such as robots.txt) and standard HTTP facilities. 972 The use of a known-location is not ideal and is considered a 973 violation of web architecture but if it serves as the last of its 974 kind, can be tolerated.
An alternative to the known-location 975 approach can be using DNS to store either the location of the 976 mapping or the map template itself, but DNS adds a layer of 977 complexity not always available. 979 [+-] Scale and Technology Agnostic - works well at the URI authority 980 level (domain) but is inefficient at the URI path level (resource 981 path) and harder to implement when different paths within the same 982 domain need to use different templates. With the decreasing cost 983 of custom domain and sub-domain hosting, this will not be an 984 issue for most services, but it does require sharing configuration 985 at the domain/sub-domain level. 987 [+-] Extensible - can be, depending on the schema used to format the 988 known-location configuration document. 990 Minimum roundtrips to retrieve the resource descriptor: initially 2, 991 1 after caching 993 Appendix C. Acknowledgments 995 With the exception of the host-meta template extension, very little 996 of this memo is original work. Many communities and individuals have 997 been working on solving discovery for many years and this work is a 998 direct result of their hard and dedicated efforts. 1000 Inspiration for this memo derived from previous work on a descriptor 1001 format called XRDS-Simple, which in turn derived from another 1002 descriptor format, XRDS. Previous discovery workflows include Yadis 1003 which is currently used by the OpenID community. While suffering 1004 from significant shortcomings, Yadis was a breakthrough approach to 1005 performing discovery using extremely restricted hosting environments, 1006 and this memo has strived to preserve as much of that spirit as 1007 possible.
1009 The use of Link elements and headers and the introduction of the 1010 "describedby" relation type in this memo is a direct result of the 1011 dedicated work and contribution of Phil Archer to the W3C POWDER 1012 specification and Jonathan Rees to the W3C review of Uniform Access 1013 to Information About. The host-meta approach was first proposed by 1014 Mark Nottingham as an alternative to attaching links directly to 1015 resource representations. 1017 The author wishes to thank the OASIS XRI community for their 1018 support, encouragement, and enthusiasm for this work. Special thanks 1019 go to Lisa Dusseault, Joseph Holsten, Mark Nottingham, John Panzer, 1020 Drummond Reed, and Jonathan Rees for their invaluable feedback. 1022 The author takes all responsibility for errors and omissions. 1024 Appendix D. Document History 1026 [[ to be removed by the RFC editor before publication as an RFC ]] 1028 -02 1029 o Changed focus from an HTTP-based process to Link-based process. 1031 o Completely revised and restructured document for better clarity. 1033 o Realigned the methods to produce consistent results and changed 1034 the way redirections and client-errors are handled. 1036 o Updated to use newer version of site-meta, now called host-meta, 1037 including a new plaintext-based format to replace the previous XML 1038 format. 1040 o Renamed Link-Template to Link-Pattern to avoid future conflict 1041 with a previously proposed Link-Template HTTP header. 1043 o Removed support for the "scheme" Link-Template parameter. 1045 o Replaced restrictions with interoperability recommendations. 1047 o Added IANA considerations per new host-meta registry requirements. 1049 -01 1051 o Renamed 'resource discovery' to 'descriptor discovery'. 1053 o Added informative reference to Metalink. 1055 o Clarified that the resource descriptor URI can use any URI scheme, 1056 not just "http" or "https". 1058 o Removed comment regarding redirects when using <LINK> Elements.
1060 o Clarified that HTTPS must be used with "https" URIs for both Link 1061 headers and host-meta retrieval. 1063 o Removed DNS verification step for host-meta with schemes other 1064 than "http" and "https". Replaced with a general discussion of 1065 authority and a security consideration comment. 1067 o Organized host-meta section into another sub-section level. 1069 o Enlarged the template vocabulary from a single "uri" variable to 1070 include smaller URI components. 1072 o Added informative reference to RFC 2295 in analysis appendix. 1074 -00 1075 o Initial draft. 1077 9. References 1079 9.1. Normative References 1081 [I-D.nottingham-http-link-header] 1082 Nottingham, M., "Link Relations and HTTP Header Linking", 1083 draft-nottingham-http-link-header-03 (work in progress), 1084 November 2008. 1086 [I-D.nottingham-site-meta] 1087 Nottingham, M. and E. Hammer-Lahav, "Host Metadata for the 1088 Web", draft-nottingham-site-meta-01 (work in progress), 1089 February 2009. 1091 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1092 Requirement Levels", BCP 14, RFC 2119, March 1997. 1094 [RFC2295] Holtman, K. and A. Mutz, "Transparent Content Negotiation 1095 in HTTP", RFC 2295, March 1998. 1097 [RFC2616] Fielding, R., Gettys, J., Mogul, J., Frystyk, H., 1098 Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext 1099 Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999. 1101 [RFC2818] Rescorla, E., "HTTP Over TLS", RFC 2818, May 2000. 1103 [RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform 1104 Resource Identifier (URI): Generic Syntax", STD 66, 1105 RFC 3986, January 2005. 1107 [RFC4287] Nottingham, M., Ed. and R. Sayre, Ed., "The Atom 1108 Syndication Format", RFC 4287, December 2005. 1110 [RFC4918] Dusseault, L., "HTTP Extensions for Web Distributed 1111 Authoring and Versioning (WebDAV)", RFC 4918, June 2007. 1113 [W3C.REC-html401-19991224] 1114 Raggett, D., Jacobs, I., and A.
Hors, "HTML 4.01 1115 Specification", World Wide Web Consortium 1116 Recommendation REC-html401-19991224, December 1999, 1117 . 1119 [W3C.REC-xhtml1-20020801] 1120 Pemberton, S., "XHTML[TM] 1.0 The Extensible HyperText 1121 Markup Language (Second Edition)", World Wide Web 1122 Consortium Recommendation REC-xhtml1-20020801, 1123 August 2002, 1124 . 1126 9.2. Informative References 1128 [ARK] Kunze, J. and R. Rodgers, "The ARK Identifier Scheme", 1129 . 1131 [I-D.bryan-metalink] 1132 Bryan, A., "The Metalink Download Description Format", 1133 draft-bryan-metalink-05 (work in progress), January 2009. 1135 [POWDER] Archer, P., Ed., Smith, K., Ed., and A. Perego, Ed., 1136 "POWDER: Protocol for Web Description Resources", 1137 . 1139 [URIQA] Nokia, "The URI Query Agent Model", 1140 . 1142 [XRD] Hammer-Lahav, E., Ed., "XRD 1.0 [[ replace with new XRD 1143 specification reference ]]". 1145 [XRDS] Wachob, G., Reed, D., Chasen, L., Tan, W., and S. 1146 Churchill, "Extensible Resource Identifier (XRI) 1147 Resolution V2.0", . 1150 [XRDS-Simple] 1151 Hammer-Lahav, E., "XRDS-Simple 1.0", 1152 . 1154 [Yadis] Miller, J., "Yadis Specification 1.0", 1155 . 1157 URIs 1159 [1] 1161 Author's Address 1163 Eran Hammer-Lahav 1164 Yahoo! 1166 Email: eran@hueniverse.com 1167 URI: http://hueniverse.com