idnits 2.17.1 

draft-pritchard-http-links-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in
     this document.

     Expected boilerplate is as follows today (2024-04-19) according to
     https://trustee.ietf.org/license-info :

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.a:
        This Internet-Draft is submitted in full conformance with the provisions
        of BCP 78 and BCP 79.

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2:
        Copyright (c) 2024 IETF Trust and the persons identified as the document
        authors.  All rights reserved.

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3:
        This document is subject to BCP 78 and the IETF Trust's Legal Provisions
        Relating to IETF Documents
        (https://trustee.ietf.org/license-info) in effect on the date of
        publication of this document.  Please review these documents
        carefully, as they describe your rights and restrictions with
        respect to this document.  Code Components extracted from this
        document must include Simplified BSD License text as described in
        Section 4.e of the Trust Legal Provisions and are provided
        without warranty as described in the Simplified BSD License.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** Missing expiration date.  The document expiration date should appear on
     the first and last page.

  ** The document seems to lack a 1id_guidelines paragraph about 6 months
     document validity -- however, there's a paragraph with a matching
     beginning. Boilerplate error?

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     current Internet-Drafts. 

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     Shadow Directories. 

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  == The page length should not exceed 58 lines per page, but there was 1
     longer page, the longest (page 1) being 507 lines


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an Introduction section.
     (A line matching the expected section header was found, but with an
    unexpected indentation:
     '  1. Introduction' )

  ** The document seems to lack a Security Considerations section.
     (A line matching the expected section header was found, but with an
    unexpected indentation:
     ' 10. Security Considerations' )

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** There are 114 instances of too long lines in the document, the longest
     one being 4 characters in excess of 72.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (21 November 1996) is 10011 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Missing reference section? '1' on line 156 looks like a reference

  -- Missing reference section? '4' on line 397 looks like a reference

  -- Missing reference section? '2' on line 415 looks like a reference

  -- Missing reference section? '6' on line 200 looks like a reference

  -- Missing reference section? '7' on line 258 looks like a reference

  -- Missing reference section? '3' on line 227 looks like a reference

  -- Missing reference section? '8' on line 228 looks like a reference

  -- Missing reference section? '5' on line 397 looks like a reference


     Summary: 9 errors (**), 0 flaws (~~), 2 warnings (==), 10 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1	Internet Draft                                       John Pritchard
2	<draft-pritchard-http-links-00>         Columbia U Computer Science

4	Expires June 1996                                  21 November 1996

6	                  Efficient HyperLink Maintenance for HTTP

8	Status of this Memo

10	This document is an Internet-Draft. Internet-Drafts are working documents of
11	the Internet Engineering Task Force (IETF), its areas, and its working
12	groups. Note that other groups may also distribute working documents as
13	Internet-Drafts.

15	Internet-Drafts are draft documents valid for a maximum of six months and
16	may be updated, replaced, or obsoleted by other documents at any time. It is
17	inappropriate to use Internet-Drafts as reference material or to cite them
18	other than as ``work in progress.''

20	To learn the current status of any Internet-Draft, please check the
21	``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow
22	Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe), munnari.oz.au
23	(Pacific Rim), ds.internic.net (US East Coast), or ftp.isi.edu (US West
24	Coast).

26	Distribution of this document is unlimited. Please send comments to John
27	Pritchard at <jdp@cs.columbia.edu>

29	Abstract

31	Hyperlink maintenance allows robots and servers to cooperate in propagating
32	the effects of daily changes in the millions of resource locations in the
33	wwweb. Here, we propose developing the definitions of the LINK and UNLINK
34	methods defined for HTTP since RFC 1945 and which remain largely
35	unimplemented and unused. We believe that the only reason these methods have
36	not been employed is that they remain too loosely defined and implicitly too
37	inefficient. A new syntax and semantics simplify implementation and improve
38	utility.

40	Author's address

42	John Pritchard
43	315 W 82nd Street, #4
44	New York, NY 10024

46	<jdp@cs.columbia.edu>

48	Contents

50	  1. Introduction

52	  2. Link Terminology

54	  3. Implementation Terminology

56	  4. Current HTTP Link Management Protocol

58	  5. Some linking practices

60	  6. Proposed Facility

62	  7. Methods

64	       1. LINK

66	       2. UNLINK

68	       3. UNLINKR

70	       4. LINKMOD

72	  8. Implementation

74	  9. Idempotency

76	 10. Security Considerations

78	 11. Syntax

80	 12. References

82	  1. Introduction

84	     The HTTP protocol has recognized the importance of link management
85	     since HTTP/1.0 RFC 1945 [1]. However, the methods defined in HTTP/1.0
86	     are limited and remain largely unimplemented. The existing link concept
87	     is defined irrespective of direction, i.e., reference or resource, and so
88	     leaves too much semantically implied. The revised methods define simple
89	     and efficient syntax and semantics for a complete hyperlink management
90	     protocol within HTTP.

92	     Dangling links are a bigger and bigger problem on a large and growing
93	     wwweb. Messages like the following are common:

95	       The URL which you entered, ... , was not found on this server.
96	       You may have entered it incorrectly, or it may no longer exist.
97	       If you arrived here by clicking on a link in another page,
98	       please tell that page's owner/administrator that the link no
99	       longer exists.

101	     This one resulted from a URL stored in a popular search engine. A
102	     solution is readily available in defining HTTP's LINK and UNLINK
103	     methods with syntax and semantics that effectively and efficiently
104	     provide for hyperlink maintenance.

106	     Hyperlink maintenance implies communication, processing and storage
107	     costs. The proposed methods cut processing with syntax by not defining
108	     semantics that imply searching on behalf of call receivers. The
109	     proposed methods' semantics also match storage requirements to the HTML
110	     LINK tag concept. Storage space is not required on behalf of robots for
111	     implementation.

113	     The protocol detailed here is currently being implemented in an
114	     HTTP/1.1 compliant, commercial wwweb server and agent platform under
115	     the extensions provisions of that specification. This protocol has been
116	     realized as the result of that effort.

118	  2. Link Terminology

120	     In this context we refer exclusively to links that are Uniform Resource
121	     Locators, see URL [4] and [2]. URLs are Uniform Resource Identifiers,
122	     URIs [1], pointing to particular resources without variation per user
123	     identity, class or input, or other particularly perishable or localized
124	     circumstances.

126	     A link has two end points, one in an HTML anchor or otherwise a URL
127	     reference, and the other in the HTTP service providing access to a
128	     resource via a reference. The source end of a link is the client or
129	     anchor end, sometimes the tail, and the target end of a link is the
130	     resource end, sometimes the head.

132	       source: anchor, reference, tail

134	       target: resource, head, server, named anchor

136	     Usage for source and target include direct reference to documents, or
137	     reference locators (URLs), or the services (hosts) at the respective
138	     ends of a link.

140	     For discussing efficiency, we describe a shorter URI as coarser, and a
141	     longer one finer. The comparison could be made for URIs into the same
142	     sub-wwweb, for example

144	        http://www.target.com/some/long/path/     A

146	        http://www.target.com/some/path/          B

148	     B is coarser than A. If a coarser URI replaces a finer one, the
149	     implication of clobbered namespaces arises as well as a greater
150	     potential need for link modifications. Remember that handling URLs, or
151	     particular resource locators, implies that for each link there's an
152	     unlink.
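
As a rough illustration of the coarser/finer comparison, the following
Python sketch treats the URI with the shorter path as the coarser one.
The function name is_coarser and the choice to measure path length in
characters are assumptions made for this example, not part of the
proposal.

```python
from urllib.parse import urlsplit


def is_coarser(a, b):
    """Return True if URI a is coarser (has a shorter path) than URI b.

    Hypothetical helper: the comparison is only meaningful for URIs
    into the same sub-web, so different hosts are rejected.
    """
    pa, pb = urlsplit(a), urlsplit(b)
    if pa.netloc.lower() != pb.netloc.lower():
        raise ValueError("URIs are not in the same sub-web")
    return len(pa.path) < len(pb.path)
```

With the two example URIs above, is_coarser(B, A) holds and
is_coarser(A, B) does not.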

154	  3. Implementation Terminology

156	     In agreement with the HTTP specification documents and RFC 1123 [1], we
157	     employ must, shall or required to indicate implementation syntax or
158	     semantics that are not optional for software conforming to this
159	     specification, should for recommended features and may for optional
160	     features.

162	     Please note that this draft does not constitute a modification of any
163	     standard, RFC, or draft document but a proposal for review by the HTTP
164	     Working Group and the Internet administration and development
165	     community.

167	  4. Current HTTP Link Management Protocol

169	     The LINK and UNLINK methods are described in HTTP/1.1 [2] draft seven,
170	     sections 19.6.1.2 and 19.6.1.3, respectively. In short, the link and
171	     unlink request lines include method names and a request URI.

173	     The specification [2] states (section 5.3)

175	          The LINK method establishes one or more Link relationships
176	          between the existing resource identified by the Request-URI
177	          and other existing resources.

179	          The UNLINK method removes one or more Link relationships
180	          from the existing resource identified by the Request-URI.
181	          These relationships may have been established using the
182	          LINK method or by any other method supporting the Link
183	          header. The removal of a link to a resource does not imply
184	          that the resource ceases to exist or becomes inaccessible
185	          for future references.

187	     Without providing both the source and target of a link for LINKing or
188	     UNLINKing, the processing requirements for implementation of the
189	     current methods imply looking up the other end of the link. Link source
190	     or unlink target information is required in request headers, or on the
191	     request line to allow a valuable optimization -- eliminating excess
192	     searching or indexing.

194	  5. Some linking practices

196	     Hyperlink maintenance methods are required for wwweb organization and
197	     must be interoperable across wwweb servers and robots in order to be
198	     effective. Robots and wanderers maintain catalogs of URI references and
199	     hypertext. Currently, unlink maintenance of these catalogs is largely
200	     manual. The Robot Exclusion Standard or "/robots.txt" [6] is currently
201	     considering a new facility for informing robots of changes to a
202	     server's sub-web, but doesn't address the server-to-server case that
203	     most links fall into. The passive existence of a link directive
204	     instrument on a server would require every server to get the linking
205	     directives from every other server and apply them heuristically to try
206	     to weed out broken links. This is untenable for broad use because of
207	     its communication and processing requirements and the complexity of
208	     implementation. RES is useful for directing searches on subwebs by
209	     robots and is fairly widely employed by search engines and other
210	     robots.

212	     The URN [7] proposal is another idea that is sometimes mentioned but
213	     really isn't relevant. It creates a hierarchical global namespace for
214	     resources, and is designed for resources with extensive lifetimes, and
215	     not the ordinary class of information. Named linking would be extremely
216	     useful for putting hyperlinks into this document for reference
217	     material. With a particular URN namespace, the reader would potentially
218	     find the closest copy, perhaps a local copy of an RFC or Internet-Draft
219	     document, rather than simply use the link provided to the USA East
220	     Coast repository provided here. But even URN may not be appropriate for
221	     drafts with six month lifetimes.

223	     WWWeb meta information and versioning are important in this context as
224	     the proposed link maintenance extensions could benefit from mutual
225	     implementation in a wwweb server's object management system in
226	     conjunction with "Version management with meta-level links via
227	     HTTP/1.1" [3]. Content level links (see "Link" content header in
228	     HTTP/1.0 [2] and LINK entity in HTML 2.0 [8]) provide a default storage
229	     mechanism for link maintenance information.

231	  6. Proposed Facility

233	     Required semantics are very limited. Only support for the LINK call,
234	     and clean disposal of other calls, is required by implementing systems.

236	     This simple, lightweight form doesn't require storage overhead on
237	     robots, crawlers, etc.

239	     The cost of employing this automation is lower than might first be
240	     imagined as link changes with coarser effects are rarer than link
241	     changes with finer effects. Unlinks potentially occur for each link,
242	     without matching coarse URIs into fine URLs.

244	     If the wwweb server maintains a table of LINKs for the target document,
245	     it can issue UNLINKs to delete or revise others' information when the
246	     location changes or is deleted. So the average cost in simple network
247	     calls and table size is linear in the number of links. The ratio of
248	     unlink calls generated to link calls received depends entirely on the
249	     server site characteristics.

251	     The table for a particular doc.html would store link source info, or
252	     reverse links. The UNLINK call is made to the host in the source end of
253	     the link, with the source and target links so that it can handle the
254	     request with minimal overhead. The LINK call is made to the host
255	     serving the target when the reference locator is used in a link-source
256	     document.
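
The reverse-link table described above might be sketched as follows.
This is a minimal illustration, not the author's implementation: the
class name ReverseLinkTable and its method names are hypothetical, and
the notices use UNLINKR, the target-to-source form given in Section 7.

```python
from collections import defaultdict


class ReverseLinkTable:
    """Target-side table: for each target URL, the sources that LINKed to it."""

    def __init__(self):
        # target URL -> set of source URLs (the "reverse links")
        self._sources = defaultdict(set)

    def record_link(self, source_url, target_url):
        # A set keeps repeated identical LINK calls from growing the table.
        self._sources[target_url].add(source_url)

    def notices_for_move(self, target_url, new_target_url=None):
        """Return one UNLINKR request line per stored source.

        The work done is linear in the number of links recorded
        for the target that moved or was deleted.
        """
        lines = []
        for source_url in sorted(self._sources.pop(target_url, set())):
            parts = ["UNLINKR", source_url, target_url]
            if new_target_url is not None:
                parts.append(new_target_url)
                # Carry the reverse link over to the new location.
                self._sources[new_target_url].add(source_url)
            lines.append(" ".join(parts))
        return lines
```

Each returned line would be sent to the host named in its Source-URL;
transport and error handling are left out of the sketch.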

258	     Although HTML [7] defines LINK entities, in practice one doesn't want
259	     the wwweb server to download its link set with each HTML document -- if
260	     for no other reason than minimizing general bandwidth consumption.

262	  7. Methods

264	       1. LINK

266	          Linking provides for subsequent link modifications from the target
267	          to the source. Links change at their target side, so the link
268	          establishment between two HTTP implementing systems needs to allow
269	          the target side to tell the source side when a link URL has
270	          changed.

272	          The LINKMOD option tells the target end of the link that LINKMOD
273	          calls should be made to the source end.

275	          The target maintains a table of source links associated with
276	          particular resources so that if their URIs change the target can
277	          notify the source.

279	            LINK Source-URL Target-URL

281	            LINK Source-URL Target-URL LINKMOD

283	               Request
284	                    The source tells the target that a URL to the target has
285	               been stored at the source.

287	               Reply
288	                    The target will accept LINK calls with 200 Ok unless the
289	               Target-URL is invalid. In this case it will respond with a
290	               417 Invalid target URI. If the LINKMOD option is requested
291	               but not enabled, the 207 No Linkmod reply will be generated.
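
The reply rules for LINK can be sketched as a small decision function.
The function name link_reply, the served_urls collection, and the two
boolean parameters are assumptions made for illustration; whether the
link is still recorded when 207 is returned is not stated here and is
left out.

```python
def link_reply(target_url, served_urls, linkmod_requested=False,
               linkmod_enabled=False):
    """Pick the status line a target returns for a LINK call.

    served_urls is a hypothetical collection of the URLs this
    target considers valid.
    """
    if target_url not in served_urls:
        return "417 Invalid target URI"
    if linkmod_requested and not linkmod_enabled:
        return "207 No Linkmod"
    return "200 Ok"
```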

293	       2. UNLINK

295	          UNLINK removes previous LINK information. A source tells a target
296	          that the previous source referenced in a prior LINK call no longer
297	          exists or has moved.

299	            UNLINK Source-URL Target-URL

301	            UNLINK Source-URL Target-URL Repl-Source-URL

303	               Request
304	                    The source notifies the target that the source link has
305	               changed. Optionally, the source may specify a replacement
306	               source URL.

308	               Reply
309	                    The target replies with 200 Ok unless the source has
310	               specified invalid source or target URLs. In the case of
311	               erroneous source or target URIs, the target replies with one
312	               of 416 Invalid source URI or 417 Invalid target URI. The
313	               invalid target may indicate only that UNLINKR has not been
314	               supported by the target or source system. The invalid source
315	               reply occurs when there is no such source link information
316	               known to the target.
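
A target-side UNLINK handler might look roughly like this. The name
handle_unlink and the dictionary layout (target URL mapped to a set of
source URLs) are assumptions for the sketch; treating the replacement
source as a new stored link is one plausible reading of the text.

```python
def handle_unlink(table, source_url, target_url, repl_source_url=None):
    """Apply an UNLINK call at the target and return the status line.

    table maps each target URL to the set of source URLs
    previously recorded by LINK calls.
    """
    if target_url not in table:
        return "417 Invalid target URI"
    sources = table[target_url]
    if source_url not in sources:
        # No such source link information is known to the target.
        return "416 Invalid source URI"
    sources.discard(source_url)
    if repl_source_url is not None:
        # Assumption: the replacement source takes over the old entry.
        sources.add(repl_source_url)
    return "200 Ok"
```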

318	       3. UNLINKR

320	          This method allows the target to inform the source that a link has
321	          changed. It specifies that the first argument refers to a source
322	          link that it stores and the second argument refers to a target
323	          link from that source. It is redundant on the semantics of the
324	          UNLINK method if the semantics of the UNLINK method included
325	          determining whether the recipient of the call is the source or the
326	          target.

328	          For UNLINK, the receiver is the target end, and with UNLINKR, the
329	          receiver is the source end.

331	            UNLINKR Source-URL Target-URL

333	            UNLINKR Source-URL Target-URL Repl-Target-URL

335	               Request
336	                    The target notifies the source that the Target-URL
337	               referenced from location Source-URL is no longer valid. The
338	               target optionally provides the source with a replacement
339	               target URL.

341	               Reply
342	                    The source replies with 200 Ok unless the target has
343	               specified invalid target or source URLs. In the case of
344	               erroneous target or source URIs, the source replies with one
345	               of 417 Invalid target URI or 416 Invalid source URI. The
346	               invalid source may indicate only that UNLINK has not been
347	               supported by the source or target system. The invalid target
348	               reply occurs when there is no such target link information
349	               known to the source.
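
On the source side, an UNLINKR call could be applied to a catalog of
stored references as sketched below. The name handle_unlinkr and the
catalog layout are hypothetical. Status numbering follows the
Status-Code list in Section 11 (416 for the source, 417 for the
target).

```python
def handle_unlinkr(catalog, source_url, target_url, repl_target_url=None):
    """Apply an UNLINKR call at the source and return the status line.

    catalog maps each source document URL to the list of
    target URLs that document references.
    """
    if source_url not in catalog:
        return "416 Invalid source URI"
    targets = catalog[source_url]
    if target_url not in targets:
        # No such target link information is known to the source.
        return "417 Invalid target URI"
    if repl_target_url is None:
        # No replacement offered: drop the dangling reference.
        catalog[source_url] = [t for t in targets if t != target_url]
    else:
        # Rewrite the reference in place, preserving order.
        catalog[source_url] = [
            repl_target_url if t == target_url else t for t in targets
        ]
    return "200 Ok"
```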

351	       4. LINKMOD

353	          A LINKMOD call could notify robots that a page has been updated.
354	          This would require that LINK be extended with an optional request for
355	          LINKMOD calls.

357	          LINKMOD would be accepted by robots and crawlers in addition to
358	          UNLINK. The source will react according to its need for this
359	          information.

361	            LINKMOD Source-URL Target-URL

363	               Request
364	                    The target informs the source that the Target-URI has
365	               been modified.

367	               Reply
368	                    The source replies with 200 Ok unless the target has
369	               specified invalid target or source URLs. In the case of
370	               erroneous target or source URIs, the source replies with one
371	               of 417 Invalid target URI or 416 Invalid source URI. The
372	               invalid source may indicate only that UNLINK has not been
373	               supported by the source or target system. The invalid target
374	               reply occurs when there is no such target link information
375	               known to the source.

377	  8. Implementation

379	     We can divide all classes of HTTP-implementing software into two
380	     categories for specifying implementation requirements. The first is the
381	     class of systems that maintain no link references (no HTML or URL
382	     catalogs) in their internal data. These have no implementation
383	     requirements.

385	     The second is systems that maintain link references in HTML or URL
386	     catalog data. These include wwweb servers and search engines.

388	     The implementation must include LINK and may implement UNLINK, UNLINKR
389	     and LINKMOD. If it is only implementing LINK, it must reply with an Ok
390	     status code to any UNLINK, UNLINKR and LINKMOD calls it receives.
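
A LINK-only implementation, as permitted above, could dispatch calls
along these lines. The function name minimal_dispatch and the
link_store set are assumptions; answering an unrecognized method with
400 is also an assumption, since the text only covers the four link
methods.

```python
OPTIONAL_METHODS = {"UNLINK", "UNLINKR", "LINKMOD"}


def minimal_dispatch(method, source_url, target_url, link_store):
    """Handle a call in a system that implements only LINK.

    link_store is a hypothetical set of (source, target) pairs.
    """
    if method == "LINK":
        link_store.add((source_url, target_url))
        return "200 Ok"
    if method in OPTIONAL_METHODS:
        # Cleanly dispose of the call: acknowledge it, change nothing.
        return "200 Ok"
    return "400 Bad Request"
```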

392	  9. Idempotency

394	     All of these methods are idempotent. Successive identical calls have
395	     the same effect as a single call. However, this requires that LINK be
396	     implemented so that it does not store duplicate data. See RFCs 1738
397	     [4] and 1808 [5] and HTTP/1.1 [2] Section 3.2.3 "URI Comparison" for
398	     information on determining when a LINK request should be discarded to
399	     preserve idempotency.
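
One way to detect a duplicate LINK is to reduce both URLs to a
comparison key before storing them. The sketch below covers only part
of the HTTP/1.1 comparison rules (case-insensitive scheme and host,
default port 80, empty path equal to "/") and does not handle
percent-encoding equivalence. The names link_key and record_link are
hypothetical.

```python
from urllib.parse import urlsplit


def link_key(url):
    """Reduce a URL to a tuple that is equal for equivalent URLs."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    # An omitted port on an http URL is the same as port 80.
    port = parts.port or (80 if scheme == "http" else None)
    # An empty path is equivalent to "/".
    path = parts.path or "/"
    return (scheme, host, port, path, parts.query)


def record_link(store, source_url, target_url):
    """Store a link once; return False when the call is a duplicate."""
    key = (link_key(source_url), link_key(target_url))
    if key in store:
        return False
    store.add(key)
    return True
```

A second LINK call that differs only in host case or an explicit
default port leaves the store unchanged.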

401	 10. Security Considerations

403	     The UNLINK and UNLINKR methods' calls should be manually reviewed or
404	     automated and secured for trusted or authenticated hosts.

406	     At least robot-level spamming would be segmented into LINKMOD domain
407	     until people used UNLINK <target> <target> or the variation based on
408	     replicating pages, i.e., UNLINK <target> <copy of target>.

410	 11. Syntax

412	     The syntax employs an induction operator, "=" (parser), and a deduction
413	     operator ":" (compiler). Literals are double quoted. Alternatives
414	     follow "|". Where noted in ";" line comments, a syntactic variable may
415	     be defined in HTTP/1.1 [2]. Two linebreaks terminate a clause; any
416	     amount of whitespace is equivalent to a single token separator.

418	             Method        = "LINK"
419	                           | "UNLINK"
420	                           | "UNLINKR"
421	                           | "LINKMOD"

423	             Request       = Link-Request-Line
424	                           | Unlink-Request-Line
425	                           | UnlinkR-Request-Line
426	                           | LinkMod-Request-Line
427	                           *( general-header  )      ; HTTP/1.1 07 4.5
428	                           CRLF

430	             Link-Request-Line
431	               = "LINK" Source-URL Target-URL
432	               | "LINK" Source-URL Target-URL "LINKMOD"

434	             Unlink-Request-Line
435	               = "UNLINK" Source-URL Target-URL
436	               | "UNLINK" Source-URL Target-URL Repl-Source-URL

438	             UnlinkR-Request-Line
439	               = "UNLINKR" Source-URL Target-URL
440	               | "UNLINKR" Source-URL Target-URL Repl-Target-URL

442	             LinkMod-Request-Line
443	               = "LINKMOD" Source-URL Target-URL

445	             Source-URL    : URL     ; RFC 1738 Resource Locator

447	             Target-URL    : URL

449	             Repl-Target-URL
450	               : URL                 ; Suggested Link Replacement

452	             Repl-Source-URL
453	               : URL                 ; Suggested Link Replacement

455	             Response      = Status-Line  ; As HTTP/1.1

457	             Status-Code   = "200"   ; Ok
458	                           | "207"   ; No Linkmod
459	                           | "400"   ; Bad Request
460	                           | "404"   ; Not found
461	                           | "416"   ; Invalid source URI
462	                           | "417"   ; Invalid target URI
463	                           | "500"   ; Internal Server Error
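
The request-line grammar above is small enough to parse by splitting on
whitespace. The following sketch follows the grammar as written here
(no HTTP version token on the line); the function name
parse_request_line and the dictionary keys it returns are assumptions
for illustration, and URL validity is not checked.

```python
METHODS = {"LINK", "UNLINK", "UNLINKR", "LINKMOD"}


def parse_request_line(line):
    """Parse one link-management request line into a dictionary."""
    tokens = line.split()  # any amount of whitespace separates tokens
    if not tokens:
        raise ValueError("empty request line")
    method, args = tokens[0], tokens[1:]
    if method not in METHODS:
        raise ValueError("unknown method: " + method)
    if len(args) not in (2, 3):
        raise ValueError("expected a Source-URL and a Target-URL")
    parsed = {"method": method, "source": args[0], "target": args[1]}
    if len(args) == 3:
        if method == "LINK":
            # The only optional third token for LINK is the literal LINKMOD.
            if args[2] != "LINKMOD":
                raise ValueError("unexpected LINK option: " + args[2])
            parsed["linkmod"] = True
        elif method == "UNLINK":
            parsed["repl_source"] = args[2]
        elif method == "UNLINKR":
            parsed["repl_target"] = args[2]
        else:
            raise ValueError("LINKMOD takes exactly two URLs")
    return parsed
```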

465	 12. References

467	       1. Hypertext Transfer Protocol -- HTTP/1.0
468	          rfc1945
469	          T. Berners-Lee, R. Fielding, H. Frystyk
470	          May 1996

472	       2. Hypertext Transfer Protocol -- HTTP/1.1
473	          draft-ietf-http-v11-spec-07
474	          R. Fielding, J. Gettys, J. C. Mogul, H. Frystyk, T. Berners-Lee
475	          August 1996

477	       3. Version management with meta-level links via HTTP/1.1
478	          draft-ota-http-version-00
479	          K. Ota, K. Takahashi, K. Sekiya
480	          November 1996

482	       4. Uniform Resource Locators (URL)
483	          rfc1738
484	          T. Berners-Lee, L. Masinter, M. McCahill
485	          December 1994

487	       5. Relative Uniform Resource Locators
488	          rfc1808
489	          R. Fielding
490	          June 1995

492	       6. Robot Exclusion Standard
493	          norobots.html
494	          Martijn Koster

496	       7. A Framework for the Assignment and Resolution of Uniform Resource
497	          Names
498	          draft-daigle-urnframework-00
499	          Leslie L. Daigle
500	          June 1996

502	       8. Hypertext Markup Language - 2.0
503	          draft-ietf-html-spec-06
504	          T. Berners-Lee, D. Connolly
505	          September 1995