INTERNET DRAFT          EXPIRES AUGUST 1998          INTERNET DRAFT

American University in Bulgaria
Peter Lazarov Lakov

The Keyword Protocol (KP)

Status of This Memo

This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its
areas, and its working groups. Note that other groups may also
distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other
documents at any time. It is inappropriate to use
Internet-Drafts as reference material or to cite them other than
as "work in progress."

To learn the current status of any Internet-Draft, please check
the "1id-abstracts.txt" listing contained in the Internet-Drafts
Shadow Directories on ftp.is.co.za (Africa),
ftp.nordu.net (Europe), munnari.oz.au (Pacific Rim),
ds.internic.net (US East Coast), or ftp.isi.edu (US West Coast).

Distribution of this document is unlimited.

i) Summary
ii) Introduction
iii) Overview of the job done by the KP server
iv) Communication with the server
     iv.i) Communication with remote computers (wanderers, robots)
     iv.ii) Communication with local computers (users updating the
            information about their pages)
v) Actions taken for each command
vi) Messages

i) Summary

This document provides a proposal for a new Internet protocol,
which should improve the relevance of results of search requests
for information in the WWW. It contains a description of the
basic model and the minimum set of commands necessary to run
properly.
It is neither meant to be, nor should it be regarded as, a
definitive version, but rather as a suggestion of a new way of
looking at the relations between the search engines on one hand
and hosts running WEB servers on the other.

ii) Introduction

The purpose of this protocol is to improve the results of search
requests sent to search engines like Alta-Vista, Lycos, Infoseek,
etc.

Whenever a user browsing the WEB sends a search request to any of
the search engines, there is a high probability that the result
will contain many irrelevant entries. This happens because of the
way WEB wanderers, robots and spiders fill in and update their
databases. Currently their algorithm for finding information is
to visit different sites, browse each file and extract the
keywords which will be used in subsequent queries. The keywords
extracted depend on the algorithm of the robot, but generally
they will be either too few or too many. If a robot parses only
the title of a file, then many keywords useful for finding the
file will not be available, because it is not possible to include
them all in the title. If, on the other hand, the whole file is
parsed, many keywords which have nothing to do with the topic of
the file will be included, so the file will appear in search
results for completely different topics. In either case, the
owner of the file (or the WEB server supervisor), the person most
interested in ensuring that the information reaches its target
audience, has no power to assist this process. His/her actions
are reduced to the very passive role of only providing access to
the data.

The HTML tags "contents" and "keywords" can only partially
alleviate this problem, because many of the HTML files linked in
a document already parsed by a robot do not contain those tags.
In this case the robot must decide whether keywords should be
extracted from those files or whether they should be disregarded.
A search engine may extract keywords from a file which was not
intended to provide any, in which case it will fill its database
with worthless data. Similarly, it may neglect files with some
important data. In either case, the owner of the files has very
little power to communicate the precise information to the
robots.

The KP protocol will give the WEB server supervisor and each
single user an opportunity for very close control of the
information they provide for public access. Each user will be
able to edit the exact keywords necessary to describe his/her
files. The cornerstone of the suggestion is that all descriptions
of the files will be handled by a central and well-known server,
which will both increase accuracy and decrease the time necessary
to browse a WEB server.

iii) Overview of the job done by the KP server.

The KP server operates on the client-server paradigm through a
reliable TCP/IP byte stream using the ASCII character set. The
server listens on a well-known port, and when a client requests a
connection to that port, the server accepts the connection. Once
the connection is established, the client starts sending commands
to the server, which performs the action and returns a response
and (when applicable) data. The response of the server may have
either a predictable or an unpredictable number of lines. In the
case of a response of unpredictable length, the last line
contains only a full stop. Commands and replies are terminated by
a new line character (more on the command syntax in section iv).
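
The termination rule for responses of unpredictable length can be
illustrated with a minimal client-side sketch. It assumes that lines
end in CRLF and that the terminating line holds exactly one full stop;
the function name read_data_block and the sample record text are
hypothetical, since this document does not define the layout of a
record on the wire.

```python
import io


def read_data_block(stream):
    """Collect lines until a line holding only a full stop.

    Returns the lines without their line terminators. Raises EOFError
    if the stream ends before the terminating full stop is seen.
    """
    lines = []
    for raw in stream:
        line = raw.rstrip("\r\n")
        if line == ".":
            return lines
        lines.append(line)
    raise EOFError("stream ended before the terminating full stop")


# Stand-in for a socket file object; the record text is a placeholder.
demo = io.StringIO("first record\r\nsecond record\r\n.\r\n")
records = read_data_block(demo)
print(records)
```

With a real connection, the same function could be given the file
object returned by socket.makefile() in text mode.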
From the point of view of the server, there are two types of
objects which can contact the server: users and robots. (From now
on, until the end of the document, I shall use "robot" to mean a
program written only to contact the KP server and update the
databases of search engines. Do not confuse it with the previously
mentioned robots, WEB wanderers and crawlers.) The two differ in
two ways. First, users need to supply a password, while a robot
does not. In fact, a robot does not present any identification
whatsoever, so any person without a user permission could log in
as a robot. There is no harm in this, because the information is
meant for public use anyway. Second, users have permission to
edit some of the information, while robots have read-only
permissions.

The KP server needs to handle a single copy of each of the
following files with the following proposed fields:

     File:   DATA
     Fields: , , ,

     File:   PASSWORD
     Fields: ,

     File:   PATCH
     Fields: ,

     File:   USED_PATCH
     Fields:

and many of the following tables:

     File:   P_1, P_2, P_3, ..., P_N
     Fields: , , ,

where the meaning of the fields for each table is as follows:

DATA
               a unique record identifier (same for the
               other tables)
               the name of the user to whom the file belongs
               the actual name of the file
               a string with keywords, separated by commas

PASSWORD
               same as in table DATA
               the password for that user, used at login time.

PATCH
               the name of the previous to last patch file
               the name of the last patch file

USED_PATCH
               the names of the patch files already sent
               to robots.

P_1, P_2, ..., P_N
               same as in DATA
               action to be taken when the patch file is
               merged into the database.
               N is for new (add a new record with the
               same field values as those in the current
               table), D is for delete (delete the records
               that match the record of the current table)
               same as in DATA
               same as in DATA

File DATA contains information concerning all available files and
their respective keywords. When a robot contacts the KP server for
the first time, it should first download the file DATA using the
GETALL command (which sends back all the records of file DATA).
Then the robot can send the command NEXTPATCH repeatedly,
fetching each named patch with GETPATCH, until it has recorded
all the changes made to file DATA. The rule for generating a new
patch file is simple: whenever a robot visits the last patch
file, create a new patch file and use it to store all changes
thereafter.

Changes are made only by users (see above), and only with the
commands ADDFILE and DELETEFILE. Whenever one of these two
commands is used, the action taken is stored in the last patch
file not yet visited by robots. Each user can change only the
files which DATA lists under his/her username.

iv) Communication with the server.

Each command should be terminated with CRLF characters. The space left
blank between the commands and the parameters should be considered as
white space. CRLF characters and white space are not shown explicitly
in the description of the commands, so that the descriptions do not
become cluttered.

iv.i) Communication with remote computers (wanderers, robots.)

GETALL         the server sends to the client all the
               records with the files and keywords.
               ACTION is N for all the records.

GETPATCH       the server sends to the client only
               the records from the specified patch file.

NEXTPATCH      the server sends to the client only the
               name of the next patch file. No records
               from the patch are actually transferred.
               If the parameter is empty, then the return
               value is the first patch of the whole
               database.
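
The robot-side update cycle (GETALL once, then NEXTPATCH and GETPATCH
until no patch is left) can be sketched as follows. This is only an
illustration under stated assumptions: a patch record is modelled as a
tuple (record_id, action, filename, keywords), which is a hypothetical
layout; the callables next_patch and get_patch stand in for the
NEXTPATCH and GETPATCH exchanges; and the absence of a further patch is
signalled by None, a convention this document does not specify.

```python
def apply_patch(local, patch_records):
    """Merge patch records into the local copy, keyed by record id."""
    for record_id, action, filename, keywords in patch_records:
        if action == "N":
            # New record; an existing entry with the same id is replaced.
            local[record_id] = (filename, keywords)
        elif action == "D":
            # Delete; a record that is already absent is ignored.
            local.pop(record_id, None)
        else:
            raise ValueError(f"unknown action {action!r}")
    return local


def sync(local, last_patch, next_patch, get_patch):
    """Follow the patch chain; return the name of the last patch applied."""
    while True:
        name = next_patch(last_patch)
        if name is None:  # assumed signal for "no further patch"
            return last_patch
        apply_patch(local, get_patch(name))
        last_patch = name


# In-memory stand-ins for the server side (hypothetical sample data).
chain = {"": "P_1", "P_1": "P_2"}
patches = {
    "P_1": [(1, "N", "index.html", "home,welcome"),
            (2, "N", "cv.html", "resume")],
    "P_2": [(2, "D", "cv.html", "")],
}
local = {}
last = sync(local, "", chain.get, patches.__getitem__)
print(last, local)
```

A robot would store the returned patch name and pass it to NEXTPATCH
on its next visit, so that only newer patches are transferred.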
iv.ii) Communication with local computers (those updating the files.)

USER           the client sends to the server the
               username of the person who wants to
               update the database. Username robot is
               reserved for robots, WEB wanderers and
               staff.

PASS           the client sends to the server the
               password of the user.

ADDFILE        the client sends to the server
               a line containing a filename (possibly
               the URL) and the keywords which should
               get into the search engines' databases for
               that file. In case there is already an
               entry in the server's database for that
               file, the keywords should be replaced with
               the new ones.

DELETEFILE     the server deletes the entry for
               this file from its database. A user would
               typically want to do this operation if
               the file is deleted or moved to a new
               location. If the last patch file has been
               sent to at least one robot/wanderer (or
               there are no patch files yet), the server
               should create a new patch file and add
               the entry to it.

LISTLIKE       the server sends to the client a
               list of files matching the specified
               condition. If the condition is empty, the
               server sends all the files.

LISTMINE       the server sends to the client only the
               files belonging to the user currently
               logged in.

EXACT          switch exact string comparison ON/OFF.
               When exact mode is ON, a string is equal
               to another only when they have the same
               sequence of characters. When exact mode is
               OFF, a string is equal to another when it
               is a sub-string of the second. All
               comparisons are case-sensitive. When exact
               mode is OFF,

HELP           the server sends a short help message to
               the client about the command specified.
               If no command is specified, the server
               sends the list of all the commands.

QUIT           request that the connection with the
               server be terminated.

v) Actions taken for each command.

USER

1) Check if username is "robot". If yes, then this is a robot.
Let it in without asking for a password and accept only the commands
for robots. If it sends other commands, then send message 205.

2) If the name is not "robot", check for it in the table
PASSWORD. If the name is not found, send message 210. Else
send message 101.

PASS

1) Check whether the user has already logged in. If yes, send
message 204.
2) If the user hasn't logged in yet, check the password sent
against the one stored in file PASSWORD for that user. If
different, send 207. Else send 102.

ADDFILE

1) Add the record to table DATA.
2) Send message 103.
3) Check if there is already a patch file.
   * If no patch file exists yet,
      * add a record to table PATCH with fields: DATA, P_1.
      * Create patch file P_1 and add the record to it.
   * If a patch file exists,
      * locate the last patch.
      * If it has been sent to robots,
         * add a record to table PATCH with: the last patch,
           the new patch.
         * Create the new patch file and add the record to it.
      * If it hasn't been sent to robots,
         * add the record to the last patch file.
4) Send confirmation message 103.

DELETEFILE

1) Locate the file and check that it belongs to the user. If the
file is not present in the database, send message 208. If the
file does not belong to the user, send message 209. In either of
the two cases, go to step 6).
2) Delete the record from table DATA.
3) Send message 104.
4) Follow the same steps as for ADDFILE step 3.
5) Send confirmation message.
6) End of DELETEFILE command.

LISTLIKE

1) Send the user the files matching the pattern.

LISTMINE

1) Send the user only the files belonging to him/her.

EXACT

1) Change the comparison mode.
2) Send the user the message 105 or 106.

HELP

1) Send the user help for the command. If no command is given,
send a list of all available commands.

GETALL

1) Send all records of file DATA.

GETPATCH

1) Send all records of the requested patch file.

NEXTPATCH

2) Locate the next patch file in the PATCH table.
3) Send message 107.
4) Send the name of the next patch file.

vi) Messages.

101 +OK  Enter password.
102 +OK  Welcome to KP version 1.0.
103 +OK  Your file has been added to the database.
104 +OK  The file has been deleted from the database.
105 +OK  Exact mode is ON.
106 +OK  Exact mode is OFF.
107 +OK  The next patch file is:
201 -ERR Unknown command.
202 -ERR Command USER expected.
203 -ERR Command PASS expected.
204 -ERR You have already logged in.
205 -ERR Command not allowed for your class.
206 -ERR No patch file with this name.
207 -ERR Password incorrect. Try again.
208 -ERR File ID not found.
209 -ERR You have no write permission for this file.
210 -ERR User unknown.

Author's Contact Information

Peter Lakov
lakov@wizcom.bg
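
As an illustration of the messages listed in section vi, the following
minimal sketch splits a reply line into its parts. It assumes that the
server sends each message exactly as listed there, that is, a numeric
code, a status of +OK or -ERR, and free text, separated by white space.
This document does not state whether the numeric code is transmitted,
so that layout is an assumption, and the names Reply and parse_reply
are hypothetical.

```python
from typing import NamedTuple


class Reply(NamedTuple):
    code: int   # numeric message code, e.g. 207
    ok: bool    # True for +OK, False for -ERR
    text: str   # human-readable remainder of the line


def parse_reply(line):
    """Split a reply line of the assumed form 'code status text'."""
    parts = line.rstrip("\r\n").split(None, 2)
    if len(parts) < 2 or parts[1] not in ("+OK", "-ERR"):
        raise ValueError(f"malformed reply: {line!r}")
    text = parts[2] if len(parts) == 3 else ""
    return Reply(int(parts[0]), parts[1] == "+OK", text)


reply = parse_reply("207 -ERR Password incorrect. Try again.\r\n")
print(reply)
```

A client could branch on the ok field first and consult the code only
when it needs to distinguish between individual errors.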