idnits 2.17.1

draft-nottingham-site-meta-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer.
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (February 10, 2009) is 5553 days in the past.  Is this
     intentional?

  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  == Outdated reference: A later version (-10) exists of
     draft-nottingham-http-link-header-03

  ** Obsolete normative reference: RFC 5226 (Obsoleted by RFC 8126)

  -- Obsolete informational reference (is this intentional?): RFC 2616
     (Obsoleted by RFC 7230, RFC 7231, RFC 7232, RFC 7233, RFC 7234, RFC 7235)

     Summary: 1 error (**), 0 flaws (~~), 2 warnings (==), 3 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.
--------------------------------------------------------------------------------

Network Working Group                                      M. Nottingham
Internet-Draft                                           E. Hammer-Lahav
Intended status: Informational                         February 10, 2009
Expires: August 14, 2009


                       Host Metadata for the Web
                     draft-nottingham-site-meta-01

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on August 14, 2009.

Copyright Notice

   Copyright (c) 2009 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.

Abstract

   This memo describes a method for locating host-specific metadata for
   the Web.

Table of Contents

   1.  Introduction
   2.  Notational Conventions
   3.  The host-meta File Format
     3.1.  The Link host-meta Field
   4.  Discovering host-meta Files
   5.  Minting New meta-fields
   6.  Security Considerations
   7.  IANA Considerations
     7.1.  application/host-meta Media Type Registration
     7.2.  The host-meta Field Registry
       7.2.1.  Registration Template
       7.2.2.  The Link host-meta field
   8.  References
     8.1.  Normative References
     8.2.  Informative References
   Appendix A.  Acknowledgements
   Appendix B.  Frequently Asked Questions
     B.1.  Is this mechanism appropriate for all kinds of metadata?
     B.2.  Why not use OPTIONS * with content negotiation to discover
           different types of metadata directly?
     B.3.  Why not use a META tag or microformat in the root resource?
     B.4.  Why not use response headers on the root resource, and have
           clients use HEAD?
     B.5.  Why scope metadata to an authority?
     B.6.  Why /host-meta?
     B.7.  Aren't you concerned about pre-empting an authority's URI
           namespace?
     B.8.  Why use link relations instead of media types to identify
           kinds of metadata?
     B.9.  What impact does this have on existing mechanisms, such as
           P3P and robots.txt?
     B.10. Why not (insert existing similar mechanism here)?
   Appendix C.  Document History
   Authors' Addresses

1.  Introduction

   It is increasingly common for Web-based protocols to require the
   discovery of policy or metadata before making a request.  For
   example, the Robots Exclusion Protocol specifies a way for automated
   processes to obtain permission to access resources; likewise, the
   Platform for Privacy Preferences [W3C.REC-P3P-20020416] tells
   user-agents how to discover privacy policy beforehand.

   While there are several ways to access per-resource metadata (e.g.,
   HTTP headers, WebDAV's PROPFIND [RFC4918]), the overhead associated
   with them often precludes their use in these scenarios.

   When this happens, it is common to designate a "well-known location"
   for such metadata, so that it can be easily located.  However, this
   approach has the drawback of risking collisions, both with other such
   designated "well-known locations" and with pre-existing resources.

   To address this, this memo proposes a single (and hopefully last)
   "well-known location", /host-meta, which acts as a directory to the
   interesting metadata about a particular authority.  Future mechanisms
   that require authority-wide metadata can easily include an entry in
   the host-meta resource, thereby making their metadata cheaply
   available (indeed, because it can be cached, the more mechanisms that
   use it, the more efficient it becomes) without impinging on others'
   URI space.

   Note that the metadata provided by a host-meta resource is explicitly
   scoped to apply to the entire authority (in the URI [RFC3986] sense)
   associated with it (using the process described in Section 4); it
   does not apply to a subset, nor does it apply to other authorities
   (e.g., using another port, or a different hostname in the same
   domain).
   However, individual mechanisms (e.g., a relation type in
   the Link field) MAY reduce or expand this scope.  This should only be
   done after careful consideration of the consequences upon security,
   administration, interoperability and network load.

   Please discuss this draft on the www-talk@w3.org [1] mailing list.

2.  Notational Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

   This document uses the Augmented Backus-Naur Form (ABNF) notation of
   [RFC5234], and explicitly includes the following rules from it: CRLF
   (CR LF), OCTET (any 8-bit sequence of data), DIGIT, ALPHA, and WSP
   (white space).

3.  The host-meta File Format

   The host-meta file format is an extremely simple textual language
   that allows an authority to convey metadata about itself and its
   resources.

   Its syntax is similar to that of HTTP header-fields [RFC2616], but
   has a few differences:

   o  White space is permissible both before and after the block of
      fields, and
   o  fields MUST NOT be folded across multiple lines.

   Furthermore, this format's use diverges from HTTP header-fields in a
   number of ways:

   o  The fields are transferred as the message body, not as headers,
      and
   o  rather than being related to a message, the fields in host-meta
      pertain to the entire associated authority (see Section 4), and
   o  the permissible field-names are constrained by the host-meta field
      registry.  This specification defines one such field, Link.

   host-meta     = *( WSP / CRLF )
                   *( meta-field CRLF )
                   *( WSP / CRLF )
   meta-field    = field-name ":" [ field-value ]
   field-name    = 1*tchar
   field-value   = *( field-content / WSP )
   field-content =
   tchar         = "!" / "#" / "$" / "%" / "&" / "'" / "*"
                 / "+" / "-" / "."
                 / "^" / "_" / "`" / "|" / "~"
                 / DIGIT / ALPHA

   For example,

   Link: ; rel="robots"
   Link: ; rel="privacy"; type="application/p3p.xml"
   Link: ; rel="http://example.com/rel"

   As with HTTP headers, field-names are not case-sensitive,
   unrecognised field-names SHOULD be silently ignored when parsing this
   format, and ordering of fields SHOULD NOT be considered significant
   unless specified otherwise.  Additionally, although the syntax does
   not explicitly allow empty lines between fields, parsers SHOULD
   silently discard them (i.e., be permissive in what they accept).

   Field content is constrained by the specification indicated by its
   associated field-name.

3.1.  The Link host-meta Field

   The "Link" host-meta field uses the syntax of the Link HTTP
   header-field [I-D.nottingham-http-link-header] to convey links whose
   context is the entire authority, rather than a single resource.  For
   example,

   Link: </terms>; rel="license"

   indicates that the URI "/terms" refers to a license for all resources
   associated with the authority.

   The Link host-meta field differs from the Link header in the
   following respects:

   o  Its context is defined as all resources that share its authority,
      by default (although this MAY be overridden by a representation
      obtained from the indicated resource), and
   o  When the link URI is relative, its base URI is the root resource
      of the authority.  For instance, in the example above, if the
      authority is "example.com", the full link URI would be
      "http://example.com/terms".

4.  Discovering host-meta Files

   The metadata for a given authority can be discovered by dereferencing
   the path /host-meta on the same authority.
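
   As a non-normative illustration, the following Python sketch shows
   how a client might derive the host-meta location from any URI on an
   authority, and read a retrieved body using the rules of Section 3.
   The function names host_meta_url and parse_host_meta are
   hypothetical and are not defined by this memo.  The parser is a
   simplification: it tolerates bare LF line endings, and it rejects
   any non-blank line that is not a well-formed meta-field.

```python
import string
from urllib.parse import urlsplit, urlunsplit

# The tchar characters permitted in a field-name (ABNF in Section 3).
TCHAR = set("!#$%&'*+-.^_`|~" + string.digits + string.ascii_letters)


def host_meta_url(uri):
    """Return the /host-meta location on the same authority as `uri`."""
    parts = urlsplit(uri)
    if not parts.scheme or not parts.netloc:
        raise ValueError("URI has no authority: %r" % uri)
    # Keep the scheme and authority (host and any port); drop the rest.
    return urlunsplit((parts.scheme, parts.netloc, "/host-meta", "", ""))


def parse_host_meta(body):
    """Parse a host-meta body into a list of (field-name, field-value).

    Field-names are lowercased because they are not case-sensitive.
    Assumption of this sketch: a malformed line raises ValueError.
    """
    fields = []
    for line in body.splitlines():
        if not line.strip():
            continue  # white space and empty lines are discarded
        name, colon, value = line.partition(":")
        if not colon or not name or not set(name) <= TCHAR:
            raise ValueError("not a host-meta field: %r" % line)
        fields.append((name.lower(), value.strip()))
    return fields
```
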
   For example, for an HTTP
   URI [RFC2616], the following request would obtain metadata for the
   authority "www.example.com:80":

   GET /host-meta HTTP/1.1
   Host: www.example.com

   The semantics of the protocol used for access to the resource apply.
   Therefore, if the resource indicates the client should try a
   different request (in HTTP, the 301, 302, 303 or 307 response status
   code), the client SHOULD attempt to do so; note that this implies
   that the host-meta file for one authority MAY be retrieved from a
   different authority.  Likewise, if the resource is not available or
   does not exist (in HTTP, the 404 or 410 status code), the client
   SHOULD infer that metadata is not available via this mechanism.

   If a representation is successfully obtained, but is not in the
   format described above, clients SHOULD infer that the authority is
   using this URI for other purposes, and not process it as a host-meta
   file.

   To aid in this process, authorities using this mechanism SHOULD
   correctly label host-meta responses with the "application/host-meta"
   internet media type.

5.  Minting New meta-fields

   Applications that wish to mint new meta-fields for use in the
   host-meta format MUST register them in the host-meta field registry,
   following the procedures in Section 7.2.  Field-names MUST conform to
   the field-name ABNF in Section 3, and field-value syntax MUST be
   well-defined (e.g., using ABNF, or a reference to the syntax of an
   existing header field-value).  Field-values SHOULD use the ISO-8859-1
   character encoding.  If a field-value applies to a scope other than
   the entire authority, that scope MUST be well-defined.

6.  Security Considerations

   The metadata returned by the /host-meta resource is presumed to be
   under the control of the appropriate authority and representative of
   all resources contained by it.
   If this resource is compromised or
   otherwise under the control of another party, it may represent a risk
   to the security of the server and data served by it, depending on
   what mechanisms use /host-meta.

   Scoping metadata to a single authority is the default in host-meta.
   Thus "http://example.com/", "https://example.com" and
   "http://www.example.com/" all have different host-meta files with
   separate and non-overlapping scopes of applicability.  Applications
   that change the scope of metadata without careful consideration can
   incur security risks.

7.  IANA Considerations

7.1.  application/host-meta Media Type Registration

   The host-meta format can be identified with the following media type:

   MIME media type name:  application
   MIME subtype name:  host-meta
   Mandatory parameters:  None.
   Optional parameters:  None.
   Encoding considerations:  field-values may specify any encoding for
      their contents, although it is expected that most will use
      ISO-8859-1 or a subset thereof (for both historic and
      interoperability purposes).
   Security considerations:  As defined in this specification.
      [[update upon publication]]
   Interoperability considerations:  There are no known interoperability
      issues.
   Published specification:  This specification.  [[update upon
      publication]]
   Applications which use this media type:  No known applications
      currently use this media type.

   Additional information:

   Magic number(s):
   File extension:  None.
   Fragment identifiers:  None.
   Base URI:  None.
   Macintosh File Type code:  TEXT
   Person and email address to contact for further information:  Mark
      Nottingham
   Intended usage:  COMMON
   Author/Change controller:  This specification's author(s).  [[update
      upon publication]]

7.2.  The host-meta Field Registry

   This document establishes the host-meta field registry as the
   namespace of field-names for use in meta-fields.
   Although some meta-fields
   may be similar to message headers, both syntactically and
   semantically, the host-meta field registry is separate from the
   message header field registry [RFC3864].  See Section 5 for details
   and requirements for registered meta-fields.

   meta-fields may be registered on the advice of a Designated Expert
   (appointed by the IESG or their delegate), with a Specification
   Required (using terminology from [RFC5226]).

   Registration requests consist of the completed registration template
   (Section 7.2.1), typically published in an RFC or Open Standard (in
   the sense described by [RFC2026], section 7).  However, to allow for
   the allocation of values prior to publication, the Designated Expert
   may approve registration once they are satisfied that an RFC (or
   other Open Standard) will be published.

   Upon receiving a registration request (usually via IANA), the
   Designated Expert should request review and comment from the
   apps-discuss mailing list (or a successor designated by the APPS Area
   Directors).  Before a period of 30 days has passed, the Designated
   Expert will either approve or deny the registration request,
   communicating this decision both to the review list and to IANA.
   Denials should include an explanation and, if applicable, suggestions
   as to how to make the request successful.

7.2.1.  Registration Template

   Field name:  The name requested for the new meta-field.  This MUST
      conform to the host-meta field specification details noted in
      Section 3.
   Change controller:  For RFCs, state "IETF".  For other open
      standards, give the name of the publishing body (e.g., ANSI, ISO,
      ITU, W3C, etc.).  A postal address, home page URI, telephone and
      fax numbers may also be included.
   Specification document(s):  Reference to document that specifies the
      field, preferably including a URI that can be used to retrieve a
      copy of the document.
      An indication of the relevant sections may
      also be included, but is not required.
   Related information:  Optionally, citations to additional documents
      containing further relevant information.

7.2.2.  The Link host-meta field

   This specification registers one host-meta field.

   Field name:  Link
   Change controller:  IETF
   Specification document(s):  [[this document]]
   Related information:  [I-D.nottingham-http-link-header]

8.  References

8.1.  Normative References

   [I-D.nottingham-http-link-header]
              Nottingham, M., "Link Relations and HTTP Header Linking",
              draft-nottingham-http-link-header-03 (work in progress),
              November 2008.

   [RFC2026]  Bradner, S., "The Internet Standards Process -- Revision
              3", BCP 9, RFC 2026, October 1996.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3986]  Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
              Resource Identifier (URI): Generic Syntax", STD 66,
              RFC 3986, January 2005.

   [RFC5226]  Narten, T. and H. Alvestrand, "Guidelines for Writing an
              IANA Considerations Section in RFCs", BCP 26, RFC 5226,
              May 2008.

   [RFC5234]  Crocker, D. and P. Overell, "Augmented BNF for Syntax
              Specifications: ABNF", STD 68, RFC 5234, January 2008.

8.2.  Informative References

   [RFC2616]  Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
              Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext
              Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.

   [RFC3864]  Klyne, G., Nottingham, M., and J. Mogul, "Registration
              Procedures for Message Header Fields", BCP 90, RFC 3864,
              September 2004.

   [RFC4918]  Dusseault, L., "HTTP Extensions for Web Distributed
              Authoring and Versioning (WebDAV)", RFC 4918, June 2007.

   [W3C.REC-P3P-20020416]
              Marchiori, M., "The Platform for Privacy Preferences 1.0
              (P3P1.0) Specification", W3C REC REC-P3P-20020416,
              April 2002.

URIs

   [1]

Appendix A.  Acknowledgements

   We would like to acknowledge the contributions of everyone who
   provided feedback and use cases for this draft; in particular, Phil
   Archer, Dirk Balfanz, Tim Bray, Paul Hoffman, Barry Leiba, Ashok
   Malhotra, Breno de Medeiros, and John Panzer.  The authors take all
   responsibility for errors and omissions.

Appendix B.  Frequently Asked Questions

B.1.  Is this mechanism appropriate for all kinds of metadata?

   No.  The primary use cases are described in the introduction; when
   it's necessary to discover metadata or policy before a resource is
   accessed, and/or it's necessary to describe metadata for a whole
   authority (or large portions of it), host-meta is appropriate.  In
   other cases (e.g., fine-grained metadata that doesn't need to be
   known ahead of time), other mechanisms are more appropriate.

B.2.  Why not use OPTIONS * with content negotiation to discover
      different types of metadata directly?

   Two reasons: a) OPTIONS is not cacheable -- a severe problem for
   scaling -- and b) it is not well-supported in browsers, and difficult
   to configure in servers.

B.3.  Why not use a META tag or microformat in the root resource?

   This constrains the format of an authority's root resource to be
   HTML or similar.  While extremely common, it isn't universal
   (e.g., mobile sites, machine-to-machine communication, etc.).  Also,
   some root resources are very large, which would place additional
   overhead on clients and intervening networks.

B.4.  Why not use response headers on the root resource, and have
      clients use HEAD?

   The headers on a root resource pertain to that resource, not the
   whole site.
   While it is possible to mint new message headers that
   apply to the whole site, such a header would need to be sent on every
   response for the root resource, whether it was useful or not, with
   the potential for substantially increasing the size of those
   responses (which are often popular, and not very cacheable).

B.5.  Why scope metadata to an authority?

   The alternative is to allow scoping to be dynamic and determined
   locally, but this has its own issues, which usually come down to a)
   an unreasonable number of requests to determine authoritative
   metadata, and b) increased complexity, with a higher likelihood of
   implementation and interoperability (or even security) problems.
   Besides, many mechanisms on the Web already presume a single
   authority scope (e.g., robots.txt, P3P, cookies, javascript
   security), and the effort and cost required to mint a new URI
   authority is small and shrinking.

B.6.  Why /host-meta?

   It's short, descriptive and, according to search indices, not widely
   used.

B.7.  Aren't you concerned about pre-empting an authority's URI
      namespace?

   Yes, but it's unfortunately a necessary (and already present) evil;
   this proposal tries to minimise future abuses.

B.8.  Why use link relations instead of media types to identify kinds of
      metadata?

   A link relation declares the intent and use of the link (or inline
   content, when present); a media type defines the format and
   processing model for those bits.

B.9.  What impact does this have on existing mechanisms, such as P3P and
      robots.txt?

   None, until they choose to use this mechanism.

B.10.  Why not (insert existing similar mechanism here)?

   We are aware that there are several existing proposals with similar
   functionality.  In our estimation, none have gained sufficient
   traction.  This may be because they were perceived to be too complex,
   or tied too closely to one use case.

Appendix C.  Document History

   [[RFC Editor: please remove this section before publication.]]

   o  -01
      *  Changed "site-meta" to "host-meta" after feedback.
      *  Changed from XML to text-based header-like format.
      *  Removed capability for generic inline content.
      *  Added registry for host-meta fields.
      *  Clarified scope of metadata application.
      *  Added security consideration about HTTP vs. HTTPS, expanding
         scope.

Authors' Addresses

   Mark Nottingham

   Email: mnot@mnot.net
   URI:   http://www.mnot.net/


   Eran Hammer-Lahav

   Email: eran@hueniverse.com
   URI:   http://hueniverse.com/