Internet Draft                                           John Pritchard
draft-pritchard-http-links-00              Columbia U Computer Science
Expires June 1996                                     21 November 1996


              Efficient HyperLink Maintenance for HTTP

Status of this Memo

This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas, and
its working groups. Note that other groups may also distribute working
documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as ``work in progress.''

To learn the current status of any Internet-Draft, please check the
``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow
Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
ftp.isi.edu (US West Coast).

Distribution of this document is unlimited. Please send comments to
John Pritchard at

Abstract

Hyperlink maintenance allows robots and servers to cooperate in
propagating the effects of daily changes in the millions of resource
locations in the wwweb.
Here, we propose developing the definitions of the LINK and UNLINK
methods, which have been defined for HTTP since RFC 1945 but remain
largely unimplemented and unused. We believe that the only reason these
methods have not been employed is that they remain too loosely defined
and implicitly too inefficient. A new syntax and semantics simplify
implementation and improve utility.

Author's address

   John Pritchard
   315 W 82nd Street, #4
   New York, NY 10024

Contents

   1. Introduction
   2. Link Terminology
   3. Implementation Terminology
   4. Current HTTP Link Management Protocol
   5. Some linking practices
   6. Proposed Facility
   7. Methods
      1. LINK
      2. UNLINK
      3. UNLINKR
      4. LINKMOD
   8. Implementation
   9. Idempotency
   10. Security Considerations
   11. Syntax
   12. References

1. Introduction

The HTTP protocol has recognized the importance of link management
since HTTP/1.0 RFC 1945 [1]. However, the methods defined in HTTP/1.0
are limited and remain largely unimplemented. The existing link concept
is defined irrespective of direction, i.e., reference or resource, and
so leaves too much semantically implied. The revised methods define
simple and efficient syntax and semantics for a complete hyperlink
management protocol within HTTP.

Dangling links are an increasing problem on a large and growing wwweb.
Messages like the following are common:

    The URL which you entered, ... , was not found on this server.
    You may have entered it incorrectly, or it may no longer exist.
    If you arrived here by clicking on a link in another page,
    please tell that page's owner/administrator that the link no
    longer exists.

This one resulted from a URL stored in a popular search engine. A
solution is readily available in defining HTTP's LINK and UNLINK
methods with syntax and semantics that effectively and efficiently
provide for hyperlink maintenance.

Hyperlink maintenance implies communication, processing and storage
costs. The proposed methods cut processing through their syntax, by not
defining semantics that imply searching by call receivers. The proposed
methods' semantics also match storage requirements to the HTML LINK tag
concept. Storage space is not required on behalf of robots for
implementation.

The protocol detailed here is currently being implemented in an
HTTP/1.1 compliant, commercial wwweb server and agent platform under
the extensions provisions of that specification. This protocol has been
realized as the result of that effort.

2. Link Terminology

In this context we refer exclusively to links that are Uniform Resource
Locators, see URL [4] and [2]. URLs are Uniform Resource Identifiers,
URIs [1], pointing to particular resources without variation per user
identity, class or input, or other particularly perishable or localized
circumstances.

A link has two end points, one in an HTML anchor or otherwise a URL
reference, and the other in the HTTP service providing access to a
resource via a reference. The source end of a link is the client or
anchor end, sometimes the tail, and the target end of a link is the
resource end, sometimes the head.

    source: anchor, reference, tail

    target: resource, head, server, named anchor

Usage of source and target includes direct reference to documents,
reference locators (URLs), or the services (hosts) at the respective
ends of a link.

For discussing efficiency, we describe a shorter URI as coarser, and a
longer one as finer. The comparison could be made for URIs into the
same sub-wwweb, for example

    http://www.target.com/some/long/path/    A

    http://www.target.com/some/path/         B

B is coarser than A.
If a coarser URI replaces a finer one, the implication of clobbered
namespaces arises as well as a greater potential need for link
modifications. Remember that handling URLs, or particular resource
locators, implies that for each link there's an unlink.

3. Implementation Terminology

In agreement with the HTTP specification documents and RFC 1123 [1], we
employ must, shall or required to indicate implementation syntax or
semantics that are not optional for software conforming to this
specification, should for recommended features and may for optional
features.

Please note that this draft does not constitute a modification of any
standard, RFC, or draft document but a proposal for review by the HTTP
Working Group and the internet administration and development
community.

4. Current HTTP Link Management Protocol

The LINK and UNLINK methods are described in HTTP/1.1 [2] draft seven,
sections 19.6.1.2 and 19.6.1.3, respectively. In short, the link and
unlink request lines include method names and a request URI.

The specification [2] states (section 5.3)

    The LINK method establishes one or more Link relationships
    between the existing resource identified by the Request-URI
    and other existing resources.

    The UNLINK method removes one or more Link relationships
    from the existing resource identified by the Request-URI.
    These relationships may have been established using the
    LINK method or by any other method supporting the Link
    header. The removal of a link to a resource does not imply
    that the resource ceases to exist or becomes inaccessible
    for future references.

Without providing both the source and target of a link for LINKing or
UNLINKing, the processing requirements for implementation of the
current methods imply looking up the other end of the link.
Link source or unlink target information is required in request
headers, or on the request line, to allow a valuable optimization --
eliminating excess searching or indexing.

5. Some linking practices

Hyperlink maintenance methods are required for wwweb organization and
must be interoperable across wwweb servers and robots in order to be
effective. Robots and wanderers maintain catalogs of URI references and
hypertext. Currently, unlink maintenance of these catalogs is largely
manual. The Robot Exclusion Standard (RES), or "/robots.txt" [6], is
currently considering a new facility for informing robots of changes to
a server's sub-web, but it doesn't address the server-to-server case
that most links fall into. The passive existence of a link directive
instrument on a server would require every server to get the linking
directives from every other server and apply them heuristically to try
to weed out broken links. This is untenable for broad use, given its
communication and processing requirements and the complexity of
implementation. RES is useful for directing searches on subwebs by
robots and is fairly widely employed by search engines and other
robots.

The URN [7] proposal is another idea that is sometimes mentioned but
really isn't relevant. It creates a hierarchical global namespace for
resources, and is designed for resources with extensive lifetimes, not
the ordinary class of information. Named linking would be extremely
useful for putting hyperlinks into this document for reference
material. With a particular URN namespace, the reader would potentially
find the closest copy, perhaps a local copy of an RFC or Internet-Draft
document, rather than simply use the link provided to the USA East
Coast repository provided here. But even URN may not be appropriate for
drafts with six month lifetimes.

WWWeb meta information and versioning are important in this context as
the proposed link maintenance extensions could benefit from mutual
implementation in a wwweb server's object management system in
conjunction with "Version management with meta-level links via
HTTP/1.1" [3]. Content level links (see "Link" content header in
HTTP/1.0 [2] and LINK entity in HTML 2.0 [8]) provide a default storage
mechanism for link maintenance information.

6. Proposed Facility

Required semantics are very limited. Only support for the LINK call,
and clean disposal of other calls, is required of implementing systems.

This simple, lightweight form doesn't require storage overhead on
robots, crawlers, etc.

The cost of employing this automation is lower than might first be
imagined, as link changes with coarser effects are rarer than link
changes with finer effects. Unlinks potentially occur for each link,
without matching coarse URIs into fine URLs.

If the wwweb server maintains a table of LINKs for the target document,
it can issue UNLINKs to delete or revise others' information when the
location changes or is deleted. So the average cost in simple network
calls and table size is linear in the number of links. The ratio of
unlink calls generated to link calls received depends entirely on the
characteristics of the server site.

The table for a particular doc.html would store link source info, or
reverse links. The UNLINK call is made to the host in the source end of
the link, with the source and target links so that it can handle the
request with minimal overhead. The LINK call is made to the host
serving the target when the reference locator is used in a link-source
document.
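
The reverse-link table described above can be sketched in a few lines
of Python. This is an illustration under stated assumptions, not the
implementation referred to in this draft: the class name LinkTable, its
method names, and the hosts www.source.example and robot.example are
all hypothetical. The sketch only computes which sources are owed a
notification when a target moves or is deleted; sending the calls is
left out.

```python
from collections import defaultdict


class LinkTable:
    """Reverse-link table kept by a target server (hypothetical sketch)."""

    def __init__(self):
        # target URL -> set of source URLs that registered a link to it
        self._sources = defaultdict(set)

    def link(self, source_url, target_url):
        """Record that source_url holds a reference to target_url.

        A set is used, so registering the same pair twice stores one row.
        """
        self._sources[target_url].add(source_url)

    def sources_of(self, target_url):
        """Return the known sources for target_url, sorted for stability."""
        return sorted(self._sources.get(target_url, ()))

    def retarget(self, old_target, new_target=None):
        """Drop old_target and list the notifications owed to its sources.

        Each entry is (source_url, old_target, new_target); new_target is
        None when the resource was deleted rather than moved.
        """
        sources = self._sources.pop(old_target, set())
        if new_target is not None:
            self._sources[new_target] |= sources
        return [(s, old_target, new_target) for s in sorted(sources)]


# Example with assumed URLs: two sources link to doc.html, then it moves.
table = LinkTable()
doc = "http://www.target.com/doc.html"
table.link("http://www.source.example/index.html", doc)
table.link("http://www.source.example/index.html", doc)  # repeat: no new row
table.link("http://robot.example/catalog", doc)

moved = table.retarget(doc, "http://www.target.com/new/doc.html")
for source, old, new in moved:
    print("notify", source, ":", old, "->", new)
```

One notification is produced per stored source, which matches the
linear cost in calls and table size noted above.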

Although HTML [8] defines LINK entities, in practice one doesn't want
the wwweb server to download its link set with each HTML document -- if
for no other reason than minimizing general bandwidth consumption.

7. Methods

   1. LINK

      Linking provides for subsequent link modifications from the
      target to the source. Links change at their target side, so the
      link establishment between two HTTP implementing systems needs to
      allow the target side to tell the source side when a link URL has
      changed.

      The LINKMOD option tells the target end of the link that LINKMOD
      calls should be made to the source end.

      The target maintains a table of source links associated with
      particular resources so that if their URIs change the target can
      notify the source.

         LINK Source-URL Target-URL

         LINK Source-URL Target-URL LINKMOD

      Request
         The source tells the target that a URL to the target has
         been stored at the source.

      Reply
         The target will accept LINK calls with 200 Ok unless the
         Target-URL is invalid. In this case it will respond with a
         417 Invalid target URI. If the LINKMOD option is requested
         but not enabled, the 207 No Linkmod reply will be generated.

   2. UNLINK

      UNLINK removes previous LINK information. A source tells a target
      that the previous source referenced in a prior LINK call no
      longer exists or has moved.

         UNLINK Source-URL Target-URL

         UNLINK Source-URL Target-URL Repl-Source-URL

      Request
         The source notifies the target that the source link has
         changed. Optionally, the source may specify a replacement
         source URL.

      Reply
         The target replies with 200 Ok unless the source has
         specified invalid source or target URLs. In the case of
         erroneous source or target URIs, the target replies with one
         of 416 Invalid source URI or 417 Invalid target URI. The
         invalid target may indicate only that UNLINKR has not been
         supported by the target or source system. The invalid source
         reply occurs when there is no such source link information
         known to the target.

   3. UNLINKR

      This method allows the target to inform the source that a link
      has changed. It specifies that the first argument refers to a
      source link that the receiver stores and the second argument
      refers to a target link from that source. It would be redundant
      with the UNLINK method if the semantics of UNLINK included
      determining whether the recipient of the call is the source or
      the target.

      For UNLINK, the receiver is the target end, and with UNLINKR, the
      receiver is the source end.

         UNLINKR Source-URL Target-URL

         UNLINKR Source-URL Target-URL Repl-Target-URL

      Request
         The target notifies the source that the Target-URL
         referenced from location Source-URL is no longer valid. The
         target optionally provides the source with a replacement
         target URL.

      Reply
         The source replies with 200 Ok unless the target has
         specified invalid target or source URLs. In the case of
         erroneous target or source URIs, the source replies with one
         of 417 Invalid target URI or 416 Invalid source URI. The
         invalid source may indicate only that UNLINK has not been
         supported by the source or target system. The invalid target
         reply occurs when there is no such target link information
         known to the source.

   4. LINKMOD

      A LINKMOD call could notify robots that a page has been updated.
      This would require that LINK be extended with an optional request
      for LINKMOD calls.

      LINKMOD would be accepted by robots and crawlers in addition to
      UNLINK. The source will react according to its need for this
      information.

         LINKMOD Source-URL Target-URL

      Request
         The target informs the source that the Target-URI has
         been modified.

      Reply
         The source replies with 200 Ok unless the target has
         specified invalid target or source URLs. In the case of
         erroneous target or source URIs, the source replies with one
         of 417 Invalid target URI or 416 Invalid source URI. The
         invalid source may indicate only that UNLINK has not been
         supported by the source or target system. The invalid target
         reply occurs when there is no such target link information
         known to the source.

8. Implementation

We can divide all classes of HTTP-implementing software into two
categories for specifying implementation requirements. The first is the
class of systems that maintain no link references (no HTML or URL
catalogs) in their internal data. These have no implementation
requirements.

The second is systems that maintain link references in HTML or URL
catalog data. These include wwweb servers and search engines.

The implementation must include LINK and may implement UNLINK, UNLINKR
and LINKMOD. If it is only implementing LINK, it must reply with an Ok
status code to any UNLINK, UNLINKR and LINKMOD calls it receives.

9. Idempotency

All of these methods are idempotent. Successive identical calls have
the same effect as a single call. However, this requires that LINK be
implemented so as not to replicate identical data. Please refer to RFCs
1738 [4] and 1808 [5] and HTTP/1.1 [2] Section 3.2.3 "URI Comparison"
for information on determining when a LINK request should be discarded
to preserve idempotency.

10. Security Considerations

Calls to the UNLINK and UNLINKR methods should be either manually
reviewed, or automated and restricted to trusted or authenticated
hosts.

At least robot-level spamming would be segmented into the LINKMOD
domain until people used UNLINK or the variation based on replicating
pages, i.e., UNLINK .

11. Syntax

The syntax employs an induction operator, "=" (parser), and a deduction
operator ":" (compiler). Literals are double quoted. Alternatives
follow "|". Where noted in ";" line comments, a syntactic variable may
be defined in HTTP/1.1 [2]. Two linebreaks terminate a clause; any
amount of whitespace is identical to a single token separator.

    Method  = "LINK"
            | "UNLINK"
            | "UNLINKR"
            | "LINKMOD"

    Request = Link-Request-Line
            | Unlink-Request-Line
            | UnlinkR-Request-Line
            | LinkMod-Request-Line
              *( general-header )        ; HTTP/1.1 07 4.5
              CRLF

    Link-Request-Line
            = "LINK" Source-URL Target-URL
            | "LINK" Source-URL Target-URL "LINKMOD"

    Unlink-Request-Line
            = "UNLINK" Source-URL Target-URL
            | "UNLINK" Source-URL Target-URL Repl-Source-URL

    UnlinkR-Request-Line
            = "UNLINKR" Source-URL Target-URL
            | "UNLINKR" Source-URL Target-URL Repl-Target-URL

    LinkMod-Request-Line
            = "LINKMOD" Source-URL Target-URL

    Source-URL      : URL                ; RFC 1738 Resource Locator

    Target-URL      : URL

    Repl-Target-URL
                    : URL                ; Suggested Link Replacement

    Repl-Source-URL
                    : URL                ; Suggested Link Replacement

    Response        = Status-Line        ; As HTTP/1.1

    Status-Code     = "200"              ; Ok
                    | "207"              ; No Linkmod
                    | "400"              ; Bad Request
                    | "404"              ; Not found
                    | "416"              ; Invalid source URI
                    | "417"              ; Invalid target URI
                    | "500"              ; Internal Server Error

12. References

   1. Hypertext Transfer Protocol -- HTTP/1.0
      rfc1945
      T. Berners-Lee, R. Fielding, H. Frystyk
      May 1996

   2. Hypertext Transfer Protocol -- HTTP/1.1
      draft-ietf-http-v11-spec-07
      R. Fielding, J. Gettys, J. C. Mogul, H. Frystyk, T. Berners-Lee
      August 1996

   3. Version management with meta-level links via HTTP/1.1
      draft-ota-http-version-00
      K. Ota, K. Takahashi, K. Sekiya
      November 1996

   4. Uniform Resource Locators (URL)
      rfc1738
      T. Berners-Lee, L. Masinter, M. McCahill
      December 1994

   5. Relative Uniform Resource Locators
      rfc1808
      R. Fielding
      June 1995

   6. Robot Exclusion Standard
      norobots.html
      Martijn Koster

   7. A Framework for the Assignment and Resolution of Uniform Resource
      Names
      draft-daigle-urnframework-00
      Leslie L. Daigle
      June 1996

   8. Hypertext Markup Language - 2.0
      draft-ietf-html-spec-06
      T. Berners-Lee, D. Connolly
      September 1995
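
As an illustration of the request-line grammar in Section 11, the
following Python sketch splits a request line into its method and URL
arguments. It is a minimal sketch under stated assumptions: the
function name parse_request_line, the dictionary keys, and the example
hosts are hypothetical, headers are ignored, and URLs are not validated
against RFC 1738.

```python
METHODS = {"LINK", "UNLINK", "UNLINKR", "LINKMOD"}


def parse_request_line(line):
    """Parse one request line of the Section 11 grammar into a dict.

    Raises ValueError for any line the grammar does not allow.
    """
    # Any amount of whitespace counts as a single token separator.
    tokens = line.split()
    if len(tokens) < 3 or tokens[0] not in METHODS:
        raise ValueError("bad request line: %r" % line)

    method, source, target = tokens[:3]
    extra = tokens[3:]
    request = {"method": method, "source": source, "target": target}

    if not extra:
        return request
    if len(extra) > 1:
        raise ValueError("too many arguments: %r" % line)

    # The optional fourth token depends on the method.
    if method == "LINK" and extra[0] == "LINKMOD":
        request["linkmod"] = True
    elif method == "UNLINK":
        request["repl_source"] = extra[0]
    elif method == "UNLINKR":
        request["repl_target"] = extra[0]
    else:
        raise ValueError("unexpected argument: %r" % line)
    return request


# Example with assumed URLs.
req = parse_request_line(
    "UNLINKR http://www.source.example/index.html "
    "http://www.target.com/some/long/path/ http://www.target.com/some/path/"
)
print(req["method"], req["repl_target"])
```

A receiver that implements only LINK could use the parsed method name
to accept LINK and answer the other three calls with an Ok status, as
Section 8 requires.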