INTERNET DRAFT                                    F. Giudici, A. Sappia
Category: Informational                      University of Genoa, Italy
February 22, 1997                               Expires August 22, 1997


             An Extension to the Web Robots Control Method
                     for supporting Mobile Agents


Status of this Memo

   This document is an Internet-Draft. Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups. Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as ``work in progress''.

   To learn the current status of any Internet-Draft, please check the
   ``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow
   Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
   munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
   ftp.isi.edu (US West Coast).

1. Abstract

   The Web Robots Control Standard [1] is a method for administrators of
   sites on the World Wide Web to give instructions to visiting Web
   Robots. This document describes an extension that supports Robots
   based on Mobile Agents, independently of the technology used to
   implement them.

2. Introduction

   Web Robots are Web client programs that automatically traverse the
   World Wide Web by retrieving a document and recursively retrieving
   all documents that are referenced.
   Robots are used for maintenance, indexing, and search purposes.

   ``Classic'' Robots perform their job from the host from which they
   have been launched; recent technologies make it possible to write
   Robots that physically move through the network and operate within
   the website that hosts the data being processed.

   Mobile Robots can save bandwidth and computational power, and they
   also make personalized search Robots possible. A more detailed
   discussion of the pros and cons of Mobile Robots is beyond the scope
   of this document.

   Mobile Agents [5] are a computational paradigm in which programs can
   ``migrate'' from host to host, preserving their current state. Among
   other things, this technology allows the implementation of Mobile
   Robots.

   To migrate through the Internet, Mobile Agents have to transfer both
   their code and their internal data structures over the network. For
   this purpose, they need a communication protocol.

   To receive and execute a Mobile Agent, a host must be equipped with a
   suitable daemon that listens on a port for incoming requests.

   Given the protocol name, the host name, and the port number on which
   the daemon is listening, the address of a Mobile Agent destination
   can be written in the form of a URL [2] as follows:

      <protocol>://<host>:<port>

   For instance, considering the Agent Transfer Protocol (ATP) [3] and
   given a fictional site www.fict.org, a valid address for dispatching
   a Mobile Agent could be

      atp://www.fict.org:434

3. Specification

   A method [1] is currently used to control how Robots can access a WWW
   site. Simply put, the method states that a special document, named
   /robots.txt and with MIME type text/plain, should be available at
   the root of the website.
   Referring to the previous example, the URL of this document would be

      http://www.fict.org/robots.txt

   /robots.txt contains a list of records that describe in detail which
   subtrees of the website a given Robot may explore and which it may
   not. Each record has the following format:

      <field> ":" <value>

   A typical example follows:

      User-agent: webcrawler
      Allow: /
      Disallow: /reserved

   The method specification allows extensions to this structure, so new
   records can be added simply by defining new field names.

3.1. The Mobile-agent-server record

   To control the dispatching of Mobile Robots, a new record type is
   defined with the following form (the formal syntax is described in
   the next section):

      Mobile-agent-server: <path> <URL>

   Each such record associates a path on the website with the URL of a
   host that accepts Mobile Robots for exploring that path, or with the
   keyword "none" if no such host is available.

   More than one Mobile-agent-server line can be used; in this case,
   lines appearing later in the file override earlier ones. Multiple
   lines make it possible to assign different subtrees to different
   Mobile Agent capable hosts, or to no host at all. In the following
   example, the website root (/) is not assigned to any host, while
   /dir1 and /dir1/dir2 are assigned to different targets:

      Mobile-agent-server: / none
      Mobile-agent-server: /dir1 atp://www.fict.org:544
      Mobile-agent-server: /dir1/dir2 atp://www.fict.org:543

   This mechanism is independent of the protocol and the programming
   language used to implement the Mobile Robot.

3.2. Formal Syntax

   This is a BNF-like description of the Mobile-agent-server record
   line, using the conventions of RFC 822 [4], except that "|" is used
   to designate alternatives.
   Briefly, literals are quoted with "", parentheses "(" and ")" are
   used to group elements, optional elements are enclosed in
   [brackets], and elements may be preceded with <n>* to designate n or
   more repetitions of the following element; n defaults to 0.

   The Mobile Robot extension defines a new record line as follows:

      mobileagentrec = "Mobile-agent-server:" *space abs_path
                       space ( simplified_url | "none" )

      abs_path       = "/" [ path ]

      simplified_url = scheme "://" net_loc
      scheme         = 1*( alpha | digit | "+" | "-" | "." )
      net_loc        = *( pchar | ";" | "?" )

      space          = 1*( SP | HT )

   The simplified URL is a subcase of a URL as defined in RFC 1808 and
   designates only a protocol, a network location, and a port number.

   The syntax for "path" and the other symbols is defined in RFC 1808
   and reproduced here for convenience:

      path       = fsegment *( "/" segment )
      fsegment   = 1*pchar
      segment    = *pchar

      pchar      = uchar | ":" | "@" | "&" | "="
      uchar      = unreserved | escape
      unreserved = alpha | digit | safe | extra

      escape     = "%" hex hex
      hex        = digit | "A" | "B" | "C" | "D" | "E" | "F" |
                   "a" | "b" | "c" | "d" | "e" | "f"

      alpha      = lowalpha | hialpha
      lowalpha   = "a" | "b" | "c" | "d" | "e" | "f" | "g" |
                   "h" | "i" | "j" | "k" | "l" | "m" | "n" |
                   "o" | "p" | "q" | "r" | "s" | "t" | "u" |
                   "v" | "w" | "x" | "y" | "z"
      hialpha    = "A" | "B" | "C" | "D" | "E" | "F" | "G" |
                   "H" | "I" | "J" | "K" | "L" | "M" | "N" |
                   "O" | "P" | "Q" | "R" | "S" | "T" | "U" |
                   "V" | "W" | "X" | "Y" | "Z"

      digit      = "0" | "1" | "2" | "3" | "4" | "5" | "6" |
                   "7" | "8" | "9"

      safe       = "$" | "-" | "_" | "." | "+"
      extra      = "!" | "*" | "'" | "(" | ")" | ","

4. Examples

   This section contains an example of how an extended /robots.txt may
   be used.
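   The lookup that this example performs by hand can be mechanized. The
   following Python fragment is only an illustrative sketch, not part of
   this specification: the names parse_records, resolve, and
   split_target are hypothetical, the regular expression is a
   simplification of the grammar in Section 3.2 (escapes are not
   validated), and a record is assumed to match a document when its
   path is a string prefix of the document's path, as with the Allow
   and Disallow lines of /robots.txt. Following Section 3.1, the last
   matching record wins.

```python
import re
from typing import List, Optional, Tuple
from urllib.parse import urlsplit

# Simplified pattern for one record line (an approximation of the
# Section 3.2 grammar, not a full implementation): an absolute path,
# at least one space or tab, then "none" or scheme "://" net_loc.
RECORD = re.compile(
    r"^Mobile-agent-server:[ \t]*(?P<path>/\S*)[ \t]+"
    r"(?P<target>none|[A-Za-z0-9+.\-]+://\S*)[ \t]*$"
)

# (path prefix, agent server URL or None when the target is "none")
Record = Tuple[str, Optional[str]]


def parse_records(lines: List[str]) -> List[Record]:
    """Collect Mobile-agent-server records in file order.

    Lines with other fields (User-agent, Allow, ...) are skipped.
    """
    records: List[Record] = []
    for line in lines:
        match = RECORD.match(line.strip())
        if match:
            target = match.group("target")
            records.append((match.group("path"),
                            None if target == "none" else target))
    return records


def resolve(records: List[Record], path: str) -> Optional[str]:
    """Return the agent server for `path`, or None if unavailable.

    Every record whose path is a prefix of `path` matches; because
    records are scanned in file order, the last match wins.
    """
    server: Optional[str] = None
    for prefix, target in records:
        if path.startswith(prefix):
            server = target
    return server


def split_target(url: str) -> Tuple[str, Optional[str], Optional[int]]:
    """Split a simplified URL such as atp://www.fict.org:434 into
    (protocol, host, port) using the standard library URL parser."""
    parts = urlsplit(url)
    return parts.scheme, parts.hostname, parts.port


if __name__ == "__main__":
    # The directives used in the example of this section.
    robots_txt = """\
Mobile-agent-server: / atp://www.fict.org:8001
Mobile-agent-server: /home/ none
Mobile-agent-server: /home/user1/ atp://user1.fict.org:854
Mobile-agent-server: /home/user2/ atp://user2.fict.org:831
"""
    records = parse_records(robots_txt.splitlines())
    for path in ["/index.html", "/home/", "/home/user2/index.html",
                 "/home/user3/index.html"]:
        print(path, resolve(records, path) or "not available")
    print(split_target("atp://user2.fict.org:831"))
```

   Applied to the directives of the example, resolve() reproduces the
   hosts in its matrix; for instance, /home/user3/index.html is not
   available because the /home/ line is the last record whose path
   matches it.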
   Let us suppose that a fictional site has the following URLs:

      http://www.fict.org/
      http://www.fict.org/index.html
      http://www.fict.org/services/
      http://www.fict.org/services/index.html
      http://www.fict.org/robots.txt
      http://www.fict.org/home/
      http://www.fict.org/home/user1/
      http://www.fict.org/home/user1/index.html
      http://www.fict.org/home/user2/
      http://www.fict.org/home/user2/index.html
      http://www.fict.org/home/user3/
      http://www.fict.org/home/user3/index.html

   Suppose also that user1.fict.org and user2.fict.org are two hosts
   equipped to receive Mobile Agents, for example by means of ATP.

   The /robots.txt contains the following Mobile Agent directives:

      Mobile-agent-server: / atp://www.fict.org:8001
      Mobile-agent-server: /home/ none
      Mobile-agent-server: /home/user1/ atp://user1.fict.org:854
      Mobile-agent-server: /home/user2/ atp://user2.fict.org:831

   The following matrix shows whether Mobile Agents are supported for
   indexing a given document and, if so, on which host:

      URL                                        HOST

      http://www.fict.org/index.html             atp://www.fict.org:8001
      http://www.fict.org/services/              atp://www.fict.org:8001
      http://www.fict.org/services/index.html    atp://www.fict.org:8001
      http://www.fict.org/robots.txt             atp://www.fict.org:8001
      http://www.fict.org/home/                  not available
      http://www.fict.org/home/user1/            atp://user1.fict.org:854
      http://www.fict.org/home/user1/index.html  atp://user1.fict.org:854
      http://www.fict.org/home/user2/            atp://user2.fict.org:831
      http://www.fict.org/home/user2/index.html  atp://user2.fict.org:831
      http://www.fict.org/home/user3/            not available
      http://www.fict.org/home/user3/index.html  not available

5. Security considerations

   The Mobile-agent-server record can expose the existence of resources
   not otherwise linked to on the site, which may help people guess
   URLs.
   If the exposed resource is the URL of a document, it introduces no
   risks beyond those already implied by the standard mechanism.

   If the exposed resource is the URL of a site that can host Mobile
   Agents, security problems must be dealt with at the site itself by
   means of a suitable security model, which should allow incoming
   Robots to perform only the operations needed to explore the assigned
   website subtrees. However, this issue depends on the specific
   technology used to implement the Mobile Robots and is not discussed
   here.

   The considerations about impersonation and encryption stated in the
   standard specification also apply here.

6. References

   [1] Koster, M., "A Standard for Robot Exclusion",
       http://info.webcrawler.com/mak/projects/robot/norobots.html,
       June 1994.

   [2] Berners-Lee, T., Masinter, L., and M. McCahill, "Uniform
       Resource Locators (URL)", RFC 1738, CERN, Xerox PARC, University
       of Minnesota, December 1994.

   [3] Lange, D. B., "Agent Transfer Protocol - ATP/0.1 Draft", IBM
       Tokyo Research Laboratory,
       http://www.trl.ibm.co.jp/aglets/atp/atp.htm, July 1996.

   [4] Crocker, D., "Standard for the Format of ARPA Internet Text
       Messages", STD 11, RFC 822, UDEL, August 1982.

   [5] Chang, D. T., and D. B. Lange, "Mobile Agents: A New Paradigm
       for Distributed Object Computing on the WWW", IBM Tokyo Research
       Laboratory, OOPSLA'96 Workshop "Toward the integration of WWW
       and Distributed Object Technology",
       http://www.trl.ibm.co.jp/aglets/atp/ma.html.

7. Authors' Addresses

   Fabrizio Giudici, fritz@dibe.unige.it, phone: +39-10-3532192
   Andrea Sappia, sappia@dibe.unige.it, phone: +39-10-3532192

   Electronic Systems and Networking Group
   Department of Biophysical and Electronic Engineering
   University of Genoa
   Via Opera Pia 11/a, 16145 - Genoa, ITALY

Expires August 22, 1997