idnits 2.17.1 draft-rfced-info-faith-01.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in this document. Expected boilerplate is as follows today (2024-04-18) according to https://trustee.ietf.org/license-info : IETF Trust Legal Provisions of 28-dec-2009, Section 6.a: This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2: Copyright (c) 2024 IETF Trust and the persons identified as the document authors. All rights reserved. IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3: This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** Missing expiration date. The document expiration date should appear on the first and last page. ** The document seems to lack a 1id_guidelines paragraph about Internet-Drafts being working documents. ** The document seems to lack a 1id_guidelines paragraph about 6 months document validity -- however, there's a paragraph with a matching beginning. Boilerplate error? ** The document seems to lack a 1id_guidelines paragraph about the list of current Internet-Drafts. 
** The document seems to lack a 1id_guidelines paragraph about the list of Shadow Directories. ** The document is more than 15 pages and seems to lack a Table of Contents. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The document seems to lack separate sections for Informative/Normative References. All references will be assumed normative when checking for downward references. ** There are 17 instances of too long lines in the document, the longest one being 5 characters in excess of 72. ** The document seems to lack a both a reference to RFC 2119 and the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. RFC 2119 keyword, line 84: '... 1.3.2 of [RFC1122] of using the capitalized words MUST, REQUIRED,...' RFC 2119 keyword, line 85: '... SHOULD, RECOMMENDED, MAY, and OPT...' RFC 2119 keyword, line 158: '... Command lines MUST NOT exceed 1024 ...' RFC 2119 keyword, line 211: '... All string parameters MUST conform to...' RFC 2119 keyword, line 215: '...fic text, then there MAY or MAY NOT be...' (23 more instances...) Miscellaneous warnings: ---------------------------------------------------------------------------- == Line 906 has weird spacing: '...gy word mat...' -- The exact meaning of the all-uppercase expression 'MAY NOT' is not defined in RFC 2119. If it is intended as a requirements expression, it should be rewritten using one of the combinations defined in RFC 2119; otherwise it should not be all-uppercase. == The expression 'MAY NOT', while looking like RFC 2119 requirements text, is not defined in RFC 2119, and should not be used. Consider using 'MUST NOT' instead (if that is what you mean). 
Found 'MAY NOT' in this paragraph: If no parameters are present, and the server implementation pro-vides no implementation-specific text, then there MAY or MAY NOT be a space after the response code. -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (26 July 1997) is 9763 days in the past. Is this intentional? Checking references for intended status: Informational ---------------------------------------------------------------------------- == Missing Reference: 'WORD-NET' is mentioned on line 59, but not defined == Missing Reference: 'RFC1122' is mentioned on line 84, but not defined == Missing Reference: 'RFC1854' is mentioned on line 747, but not defined ** Obsolete undefined reference: RFC 1854 (Obsoleted by RFC 2197) == Unused Reference: 'RFC977' is defined on line 1056, but no explicit reference was found in the text == Unused Reference: 'RFC1985' is defined on line 1064, but no explicit reference was found in the text == Unused Reference: 'RFC2068' is defined on line 1071, but no explicit reference was found in the text == Unused Reference: 'WORDNET' is defined on line 1083, but no explicit reference was found in the text ** Obsolete normative reference: RFC 821 (Obsoleted by RFC 2821) ** Obsolete normative reference: RFC 822 (Obsoleted by RFC 2822) ** Obsolete normative reference: RFC 977 (Obsoleted by RFC 3977) ** Obsolete normative reference: RFC 1738 (Obsoleted by RFC 4248, RFC 4266) ** Obsolete normative reference: RFC 1854 (ref. 
'RFC1985') (Obsoleted by RFC 2197) ** Obsolete normative reference: RFC 2068 (Obsoleted by RFC 2616) Summary: 18 errors (**), 0 flaws (~~), 9 warnings (==), 3 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 INTERNET-DRAFT EXPIRES JANUARY 1998 INTERNET-DRAFT 3 Network Working Group R. Faith 4 INTERNET-DRAFT U. North Carolina, Chapel Hill 5 Category: Informational B. Martin 6 Miranda Productions 7 26 July 1997 9 A Dictionary Server Protocol 10 12 Status of this Memo 14 This document is an Internet-Draft. Internet-Drafts are working 15 documents of the Internet Engineering Task Force (IETF), its areas, 16 and its working groups. Note that other groups may also distribute 17 working documents as Internet-Drafts. 19 Internet-Drafts are draft documents valid for a maximum of six 20 months and may be updated, replaced, or obsoleted by other docu- 21 ments at any time. It is inappropriate to use Internet- Drafts as 22 reference material or to cite them other than as ``work in 23 progress.'' 25 To learn the current status of any Internet-Draft, please check the 26 ``1id-abstracts.txt'' listing contained in the Internet- Drafts 27 Shadow Directories on ftp.is.co.za (Africa), nic.nordu.net 28 (Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East 29 Coast), or ftp.isi.edu (US West Coast). 31 Authors' Note 33 [[This document has not yet been submitted or accepted as an offi- 34 cial RFC. Two independent server implementations have been com- 35 pleted, one at dict://dict.miranda.org:2628 and the other at 36 dict://proteus.cs.unc.edu:2628. This note should be deleted when 37 this memo is assigned an RFC number.]] 39 Abstract 41 The Dictionary Server Protocol (DICT) is a TCP transaction based 42 query/response protocol that allows a client to access dictionary 43 definitions from a set of natural language dictionary databases. 45 1. 
Introduction 47 For many years, the Internet community has relied on the "webster" 48 protocol for access to natural language definitions. The webster 49 protocol supports access to a single dictionary and (optionally) to 50 a single thesaurus. In recent years, the number of publicly avail- 51 able webster servers on the Internet has dramatically decreased. 53 Fortunately, several freely-distributable dictionaries and lexicons 54 have recently become available on the Internet. However, these 55 freely-distributable databases are not accessible via a uniform 56 interface, and are not accessible from a single site. They are 57 often small and incomplete individually, but would collectively 58 provide an interesting and useful database of English words. Exam- 59 ples include the Jargon file [JARGON], the WordNet database [WORD- 60 NET], MICRA's version of the 1913 Webster's Revised Unabridged Dic- 61 tionary [WEB1913], and the Free Online Dictionary of Computing 62 [FOLDOC]. Translating and non-English dictionaries are also becom- 63 ing available (for example, the FOLDOC dictionary is being trans- 64 lated into Spanish). 66 The webster protocol is not suitable for providing access to a 67 large number of separate dictionary databases, and extensions to 68 the current webster protocol were not felt to be a clean solution 69 to the dictionary database problem. 71 The DICT protocol is designed to provide access to multiple 72 databases. Word definitions can be requested, the word index can 73 be searched (using an easily extended set of algorithms), informa- 74 tion about the server can be provided (e.g., which index search 75 strategies are supported, or which databases are available), and 76 information about a database can be provided (e.g., copyright, 77 citation, or distribution information). Further, the DICT protocol 78 has hooks that can be used to restrict access to some or all of the 79 databases. 81 1.1. 
Requirements 83 In this document, we adopt the convention discussed in Section 84 1.3.2 of [RFC1122] of using the capitalized words MUST, REQUIRED, 85 SHOULD, RECOMMENDED, MAY, and OPTIONAL to define the significance 86 of each particular requirement specified in this document. 88 In brief: "MUST" (or "REQUIRED") means that the item is an absolute 89 requirement of the specification; "SHOULD" (or "RECOMMENDED") means 90 there may exist valid reasons for ignoring this item, but the full 91 implications should be understood before doing so; and "MAY" (or 92 "OPTIONAL") means that this item is optional, and may be omitted 93 without careful consideration. 95 2. Protocol Overview 97 2.1. Link Level 99 The DICT protocol assumes a reliable data stream such as provided 100 by TCP. When TCP is used, a DICT server listens on port 2628 (typ- 101 ically, webster servers listened on port 2627). 103 This server is only an interface between programs and the dictio- 104 nary databases. It does not perform any user interaction or pre- 105 sentation-level functions. 107 2.2. Lexical Tokens 109 Commands and replies are composed of characters from the ISO-8859-1 110 character set [ISO]. More specifically, using the grammar conven- 111 tions from [RFC822]: 113 ; ( Octal, Decimal.) 114 CHAR = ; ( 0-177, 0.-127.) 115 CTL = ; ( 177, 127.) 117 CR = ; ( 15, 13.) 118 LF = ; ( 12, 10.) 119 SPACE = ; ( 40, 32.) 120 HTAB = ; ( 11, 9.) 121 <"> = ; ( 42, 34.) 122 <'> = ; ( 47, 39.) 123 CRLF = CR LF 124 WS = 1*(SPACE / HTAB) 126 dqstring = <"> *(dqtext/quoted-pair) <"> 127 dqtext = , "\", and CTLs> 128 sqstring = <'> *(dqtext/quoted-pair) <'> 129 sqtext = , "\", and CTLs> 130 quoted-pair = "\" CHAR 132 atom = 1*, <">, and "\"> 133 string = * 134 word = * 135 description = * 136 text = * 138 2.3. Commands 140 Commands consist of a command word followed by zero or more parame- 141 ters. 
Commands with parameters must separate the parameters from 142 each other and from the command by one or more space or tab charac- 143 ters. Command lines must be complete with all required parameters, 144 and may not contain more than one command. 146 Each command line must be terminated by a CRLF. 148 The grammar for commands is: 150 command = cmd-word * 151 cmd-word = atom 152 cmd-param = database / strategy / word 153 database = atom 154 strategy = atom 156 Commands are not case sensitive. 158 Command lines MUST NOT exceed 1024 characters in length, counting 159 all characters including spaces, separators, punctuation, and the 160 trailing CRLF. There is no provision for the continuation of com- 161 mand lines. 163 2.4. Responses 165 Responses are of two kinds, status and textual. 167 2.4.1. Status Responses 169 Status responses indicate the server's response to the last command 170 received from the client. 172 Status response lines begin with a 3 digit numeric code which is 173 sufficient to distinguish all responses. Some of these may herald 174 the subsequent transmission of text. 176 The first digit of the response broadly indicates the success, 177 failure, or progress of the previous command (based generally on 178 [RFC640,RFC821]): 180 1yz - Positive Preliminary reply 181 2yz - Positive Completion reply 182 3yz - Positive Intermediate reply (not used by DICT) 183 4yz - Transient Negative Completion reply 184 5yz - Permanent Negative Completion reply 186 The next digit in the code indicates the response category: 188 x0z - Syntax 189 x1z - Information (e.g., help) 190 x2z - Connections 191 x3z - Authentication 192 x4z - Unspecified as yet 193 x5z - DICT System (These replies indicate the status of the 194 receiver mail system vis-a-vis the requested transfer 195 or other DICT system action.) 
196 x8z - Nonstandard (private implementation) extensions 198 The exact response codes that should be expected from each command 199 are detailed in the description of that command. 201 Certain status responses contain parameters such as numbers and 202 strings. The number and type of such parameters is fixed for each 203 response code to simplify interpretation of the response. Other 204 status responses do not require specific text identifiers. Parame- 205 ter requirements are detailed in the description of relevant com- 206 mands. Except for specifically detailed parameters, the text fol- 207 lowing response codes is server-dependent. 209 Parameters are separated from the numeric response code and from 210 each other by a single space. All numeric parameters are decimal, 211 and may have leading zeros. All string parameters MUST conform to 212 the "atom" or "dqstring" grammar productions. 214 If no parameters are present, and the server implementation pro- 215 vides no implementation-specific text, then there MAY or MAY NOT be 216 a space after the response code. 218 Response codes not specified in this standard may be used for any 219 installation-specific additional commands also not specified. 220 These should be chosen to fit the pattern of x8z specified above. 221 The use of unspecified response codes for standard commands is pro- 222 hibited. 224 2.4.2. General Status Responses 226 In response to every command, the following general status 227 responses are possible: 229 500 Syntax error, command not recognized 230 501 Syntax error, illegal parameters 231 502 Command not implemented 232 503 Command parameter not implemented 233 420 Server temporarily unavailable 234 421 Server shutting down at operator request 236 2.4.3. Text Responses 238 Before text is sent a numeric status response line, using a 1yz 239 code, will be sent indicating text will follow. Text is sent as a 240 series of successive lines of textual matter, each terminated with 241 a CRLF. 
A single line containing only a period (decimal code 46, 242 ".") is sent to indicate the end of the text (i.e., the server will 243 send a CRLF at the end of the last line of text, a period, and 244 another CRLF). 246 If a line of original text contained a period as the first charac- 247 ter of the line, that first period is doubled by the DICT server. 248 Therefore, the client must examine the first character of each line 249 received. Those that begin with two periods must have those two 250 periods collapsed into one period. Those that contain only a sin- 251 gle period followed by a CRLF indicate the end of the text 252 response. 254 Following a text response, a 2yz response code will be sent. 256 Text lines MUST NOT exceed 1024 characters in length, counting all 257 characters including spaces, separators, punctuation, the extra 258 initial period (if needed), and the trailing CRLF. 260 It is recommended that text use the US-ASCII [ASCII] or ISO-8859-1 261 [ISO] character sets, although it is currently beyond the scope of 262 this standard to specify encoding for text. In the future, after 263 significant experience with large databases in various languages 264 has been gained, and after evaluating the need for character set 265 and other encodings (e.g., compressed or BASE64 encoding), standard 266 extensions to this protocol should be proposed. In the mean time, 267 private extensions should be used to explore the parameter space to 268 determine how best to implement these extensions. 270 3. Command and Response Details 272 Below, each DICT command and appropriate responses are detailed. 273 Each command is shown in upper case for clarity, but the DICT 274 server is case-insensitive. 276 Except for the AUTH command, every command described in this sec- 277 tion MUST be implemented by all DICT servers. 279 3.1. 
Initial Connection 281 When a client initially connects to a DICT server, a code 220 is 282 sent if the client's IP is allowed to connect: 284 220 text capabilities msg-id 286 The code 220 is a banner, usually containing host name and DICT 287 server version information. 289 The second-to-last sequence of characters in the banner is the 290 optional capabilities string, which will allow servers to declare 291 support for extensions to the DICT protocol. The capabilities 292 string is defined below: 294 capabilities = ["<" msg-atom *("." msg-atom) ">"] 295 msg-atom = 1*", ".", and "\"> 298 Individual capabilities are described by a single msg-atom. For 299 example, the string might be used to describe a server 300 that supports extensions which allow HTML or compressed output. 301 Capability names beginning with "x" or "X" are reserved for experi- 302 mental extensions, and SHOULD NOT be defined in any future DICT 303 protocol specification. 305 The last sequence of characters in the banner is a msg-id, similar 306 to the format specified in [RFC822]. The simplified description is 307 given below: 309 msg-id = "<" spec ">" ; Unique message id 310 spec = local-part "@" domain 311 local-part = msg-atom *("." msg-atom) 312 domain = msg-atom *("." msg-atom) 314 Note that, in contrast to [RFC822], spaces and quoted pairs are not 315 allowed in the msg-id. This restriction makes the msg-id much eas- 316 ier for the client to locate and parse but does not significantly 317 decrease any security benefits, since the msg-id may be arbitrarily 318 long (as bounded by the response length limits set forth elsewhere 319 in this document). 321 Note also that the open and close brackets are part of the msg-id 322 and should be included in the string that is used to compute the 323 MD5 checksum. 325 This message id will be used by the client when formulating the 326 authentication string used in the AUTH command. 
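As an illustration only (it is not part of the protocol), the following Python sketch shows how a client might locate the msg-id at the end of a 220 banner and derive the authentication-string used by the AUTH command, which this document describes as the MD5 checksum of the msg-id concatenated with the shared secret, computed as in APOP [RFC1939]. The helper names, the regular expression (a simplification of the msg-id grammar), the sample banner, and the use of ISO-8859-1 to encode the shared secret are all assumptions made for this example.

```python
import hashlib
import re

# Simplified pattern (an assumption, looser than the msg-id grammar):
# a bracketed token containing "@" and no spaces, brackets, or
# backslashes, located at the very end of the banner line.
MSG_ID_RE = re.compile(r"<[^<>\s\\]+@[^<>\s\\]+>\s*$")


def extract_msg_id(banner: str) -> str:
    """Return the trailing msg-id, brackets included, from a 220 banner."""
    if not banner.startswith("220"):
        raise ValueError("not a 220 banner")
    match = MSG_ID_RE.search(banner)
    if match is None:
        raise ValueError("no msg-id found in banner")
    return match.group(0).strip()


def auth_string(msg_id: str, shared_secret: str) -> str:
    """MD5 hex digest of the msg-id followed by the shared secret.

    The brackets stay part of the msg-id, as the text requires.
    Encoding the secret as ISO-8859-1 is an assumption of this sketch.
    """
    data = (msg_id + shared_secret).encode("latin-1")
    return hashlib.md5(data).hexdigest()


if __name__ == "__main__":
    # Hypothetical banner; "<xdemo>" stands in for a capabilities string.
    banner = "220 dict.example.org sketch-server <xdemo> <1234.5678@dict.example.org>"
    msg_id = extract_msg_id(banner)
    print("AUTH", "username", auth_string(msg_id, "a long shared passphrase"))
```

Because the optional capabilities string contains no "@", the pattern skips it and matches only the final bracketed token. A stricter client could instead implement the msg-atom grammar exactly.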
328 If the client's IP is not allowed to connect, then a code 530 is 329 sent instead: 331 530 Access denied 333 Transient failure responses are also possible: 335 420 Server temporarily unavailable 336 421 Server shutting down at operator request 338 For example, response code 420 should be used if the server cannot 339 currently fork a server process (or cannot currently obtain other 340 resources required to proceed with a usable connection), but 341 expects to be able to fork or obtain these resources in the near 342 future. 344 Response code 421 should be used when the server has been shut down 345 at operator request, or when conditions indicate that the ability 346 to service more requests in the near future will be impossible. 347 This may be used to allow a graceful operator-mediated temporary 348 shutdown of a server, or to indicate that a well known server has 349 been permanently removed from service (in which case, the text mes- 350 sage might provide more information). 352 3.2. The DEFINE Command 354 DEFINE database word 356 3.2.1. Description 358 This command will look up the specified word in the specified 359 database. All DICT servers MUST implement this command. 361 If the database name is specified with an exclamation point (deci- 362 mal code 33, "!"), then all of the databases will be searched until 363 a match is found, and all matches in that database will be dis- 364 played. If the database name is specified with a star (decimal 365 code 42, "*"), then all of the matches in all available databases 366 will be displayed. In both of these special cases, the databases 367 will be searched in the same order as that printed by the "SHOW DB" 368 command. 370 If the word was not found, then status code 552 is sent. 372 If the word was found, then status code 150 is sent, indicating 373 that one or more definitions follow. 375 For each definition, status code 151 is sent, followed by the tex- 376 tual body of the definition. 
The first three space-delimited 377 parameters following status code 151 give the word retrieved, the 378 name of the database (which is the same as the first column of the 379 SHOW DB command), and a short description for the database (which 380 is the same as the second column of the SHOW DB command). The 381 short name is suitable for printing as: 383 From name: 385 before the definition is printed. This provides source information 386 for the user. 388 The textual body of each definition is terminated with a CRLF 389 period CRLF sequence. 391 After all of the definitions have been sent, status code 250 is 392 sent. This command can provide optional timing information (which 393 is server dependent and is not intended to be parsable by the 394 client). This additional information is useful when debugging and 395 tuning the server. 397 3.2.2. Responses 399 550 Invalid database, use "SHOW DB" for list of databases 400 552 No match 401 150 n definitions retrieved - definitions follow 402 151 word database name - text follows 403 250 ok (optional timing information here) 405 Response codes 150 and 151 require special parameters as part of 406 their text. The client can use these parameters to display infor- 407 mation on the user's terminal. 409 For code 150, parameter 1 indicates the number of definitions 410 retrieved. 412 For code 151, parameter 1 is the word retrieved, parameter 2 is the 413 database name (the first name as shown by "SHOW DB") from which the 414 definition has been retrieved, and parameter 3 is the short 415 database description (the second column of the "SHOW DB" command). 417 3.2.3. A Note on Virtual Databases 419 The ability to search all of the provided databases using a single 420 command is given using the special "*" and "!" databases. 422 However, sometimes, a client may want to search over some but not 423 all of the databases that a particular server provides. 
One alter- 424 native is for the client to use the SHOW DB command to obtain a 425 list of databases and descriptions, and then (perhaps with the help 426 of a human), select a subset of these databases for an interactive 427 search. Once this selection has been done once, the results can be 428 saved, for example, in a client configuration file. 430 Another alternative is for the server to provide "virtual" 431 databases which merge several of the regular databases into one. 432 For example, a virtual database may be provided which includes all 433 of the translating dictionaries, but which does not include regular 434 dictionaries or thesauri. The special "*" and "!" databases can be 435 considered as names of virtual databases which provide access to 436 all of the databases. If a server implements virtual databases, 437 then the special "*" and "!" databases should probably exclude 438 other virtual databases (since they merely provide information 439 duplicated in other databases). If virtual databases are sup- 440 ported, they should be listed as a regular database with the SHOW 441 DB command (although, since "*" and "!" are required, they need not 442 be listed). 444 Virtual databases are an implementation-specific detail which has 445 absolutely no impact on the DICT protocol. The DICT protocol views 446 virtual and non-virtual databases the same way. 448 We mention virtual databases here, however, because they solve a 449 problem of database selection which could also have been solved by 450 changes in the protocol. For example, each dictionary could be 451 assigned attributes, and the protocol could be extended to specify 452 searches over databases with certain attributes. However, this 453 needlessly complicates the parsing and analysis that must be per- 454 formed by the implementation. 
Further, unless the classification 455 system is extremely general, there is a risk that it would restrict 456 the types of databases that can be used with the DICT protocol 457 (although the protocol has been designed with human-language 458 databases in mind, it is applicable to any read-only database 459 application, especially those with a single semi-unique alphanu- 460 meric key and textual data). 462 3.3. The MATCH Command 464 MATCH database strategy word 466 3.3.1. Description 468 This command searches an index for the dictionary, and reports 469 words which were found using a particular strategy. Not all 470 strategies are useful for all dictionaries, and some dictionaries 471 may support additional search strategies (e.g., reverse lookup). 472 All DICT servers MUST implement the MATCH command, and MUST support 473 the "exact" and "prefix" strategies. These are easy to implement 474 and are generally the most useful. Other strategies are server 475 dependent. 477 The "exact" strategy matches a word exactly, although different 478 servers may treat non-alphanumeric data differently. We have found 479 that a case-insensitive comparison which ignores non-alphanumeric 480 characters and which folds whitespace is useful for English-lan- 481 guage dictionaries. Other comparisons may be more appropriate for 482 other languages or when using extended character sets. 484 The "prefix" strategy is similar to "exact", except that it only 485 compares the first part of the word. 487 Different servers may implement these algorithms differently. The 488 requirement is that strategies with the names "exact" and "prefix" 489 exist so that a simple client can use them. 491 Other strategies that might be considered by a server implementor 492 are matches based on substring, suffix, regular expressions, 493 soundex [KNUTH73], and Levenshtein [PZ85] algorithms. These last 494 two are especially useful for correcting spelling errors. 
Other 495 useful strategies perform some sort of "reverse" lookup (i.e., by 496 searching definitions to find the word that the query suggests). 498 If the database name is specified with an exclamation point (deci- 499 mal code 33, "!"), then all of the databases will be searched until 500 a match is found, and all matches in that database will be dis- 501 played. If the database name is specified with a star (decimal 502 code 42, "*"), then all of the matches in all available databases 503 will be displayed. In both of these special cases, the databases 504 will be searched in the same order as that printed by the "SHOW DB" 505 command. 507 If the strategy is specified using a period (decimal code 46, "."), 508 then the word will be matched using a server-dependent default 509 strategy, which should be the best strategy available for interac- 510 tive spell checking. This is usually a derivative of the Leven- 511 shtein algorithm [PZ85]. 513 If no matches are found in any of the searched databases, then sta- 514 tus code 552 will be returned. 516 Otherwise, status code 152 will be returned followed by a list of 517 matched words, one per line, in the form: 519 database word 521 This makes the responses directly useful in a DEFINE command. 523 The textual body of the match list is terminated with a CRLF period 524 CRLF sequence. 526 Following the list, status code 250 is sent, which may include 527 server-specific timing and statistical information, as discussed in 528 the section on the DEFINE command. 530 3.3.2. Responses 532 550 Invalid database, use "SHOW DB" for list of databases 533 551 Invalid strategy, use "SHOW STRAT" for a list of strategies 534 552 No match 535 152 n matches found - text follows 536 250 ok (optional timing information here) 538 Response code 152 requires a special parameter as part of its text. 539 Parameter 1 must be the number of matches retrieved. 541 3.4. The SHOW Command 543 3.4.1. 
SHOW DB 545 SHOW DB 546 SHOW DATABASES 548 3.4.1.1. Description 550 Displays the list of currently accessible databases, one per line, 551 in the form: 553 database description 555 The textual body of the database list is terminated with a CRLF 556 period CRLF sequence. All DICT servers MUST implement this com- 557 mand. 559 Note that some databases may be restricted due to client domain or 560 lack of user authentication (see the AUTH command). Information 561 about these databases is not available until authentication is per- 562 formed. Until that time, the client will interact with the server 563 as if the additional databases did not exist. 565 3.4.1.2. Responses 567 110 n databases present - text follows 568 554 No databases present 570 Response code 110 requires a special parameter. Parameter 1 must 571 be the number of databases available to the user. 573 3.4.2. SHOW STRAT 575 SHOW STRAT 576 SHOW STRATEGIES 578 3.4.2.1. Description 580 Displays the list of currently supported search strategies, one per 581 line, in the form: 583 strategy description 585 The textual body of the strategy list is terminated with a CRLF 586 period CRLF sequence. All DICT servers MUST implement this 587 command. 589 3.4.2.2. Responses 591 111 n strategies available - text follows 592 555 No strategies available 594 Response code 111 requires a special parameter. Parameter 1 must 595 be the number of strategies available. 597 3.4.3. SHOW INFO 599 SHOW INFO database 601 3.4.3.1. Description 603 Displays the source, copyright, and licensing information about the 604 specified database. The information is free-form text and is suit- 605 able for display to the user in the same manner as a definition. 606 The textual body of the information is terminated with a CRLF 607 period CRLF sequence. All DICT servers MUST implement this com- 608 mand. 610 3.4.3.2. 
Responses 612 550 Invalid database, use "SHOW DB" for list of databases 613 112 database information follows 615 These response codes require no special parameters. 617 3.4.4. SHOW SERVER 619 SHOW SERVER 621 3.4.4.1. Description 623 Displays local server information written by the local administra- 624 tor. This could include information about local databases or 625 strategies, or administrative information such as who to contact 626 for access to databases requiring authentication. All DICT servers 627 MUST implement this command. 629 3.4.4.2. Responses 631 114 server information follows 633 This response code requires no special parameters. 635 3.5. The CLIENT Command 637 CLIENT text 639 3.5.1. Description 641 This command allows the client to provide information about itself 642 for possible logging and statistical purposes. All clients SHOULD 643 send this command after connecting to the server. All DICT servers 644 MUST implement this command (note, though, that the server doesn't 645 have to do anything with the information provided by the client). 647 3.5.2. Responses 649 250 ok (optional timing information here) 651 This response code requires no special parameters. 653 3.6. The STATUS Command 655 STATUS 657 3.6.1. Description 659 Display some server-specific timing or debugging information. This 660 information may be useful in debugging or tuning a DICT server. 661 All DICT servers MUST implement this command (note, though, that 662 the text part of the response is not specified and may be omitted). 664 3.6.2. Responses 666 210 (optional timing and statistical information here) 668 This response code requires no special parameters. 670 3.7. The HELP Command 672 HELP 674 3.7.1. Description 676 Provides a short summary of commands that are understood by this 677 implementation of the DICT server. The help text will be presented 678 as a textual response, terminated by a single period on a line by 679 itself. All DICT servers MUST implement this command. 
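The help text, like every other textual response in this protocol, ends with a single period on a line by itself, and any line of original text that began with a period arrives with that period doubled (see the section on Text Responses). The Python sketch below is one possible way for a client to read such a response. The function name and the use of byte strings are assumptions of this example, not requirements of the protocol.

```python
from typing import Iterable, List


def read_text_response(lines: Iterable[bytes]) -> List[bytes]:
    """Collect text lines up to, but not including, the lone period.

    `lines` yields raw lines as received, each normally ending in CRLF.
    A line that begins with two periods has them collapsed into one.
    """
    body: List[bytes] = []
    for raw in lines:
        # Drop the trailing CRLF, if present.
        line = raw[:-2] if raw.endswith(b"\r\n") else raw
        if line == b".":
            return body  # end of the text response
        if line.startswith(b".."):
            line = line[1:]  # undo the server's period doubling
        body.append(line)
    raise ValueError("text response ended without a terminating period")
```

When `lines` is an iterator over the connection, the function stops consuming input at the terminating period, so the status response that follows the text can be read by the caller next.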

3.7.2. Responses

       113 help text follows

   This response code requires no special parameters.

3.8. The QUIT Command

       QUIT

3.8.1. Description

   This command is used by the client to cleanly exit the server.  All
   DICT servers MUST implement this command.

3.8.2. Responses

       221 Closing Connection

   This response code requires no special parameters.

3.9. The AUTH Command

       AUTH username authentication-string

3.9.1. Description

   The client can authenticate itself to the server using a username
   and password.  The authentication-string will be computed as in the
   APOP protocol discussed in [RFC1939].  Briefly, the
   authentication-string is the MD5 checksum of the concatenation of
   the msg-id (obtained from the initial banner) and the "shared
   secret" that is stored in the server and client configuration
   files.  Since the user does not have to type this shared secret
   when accessing the server, the shared secret can be an arbitrarily
   long passphrase.  Because the MD5 checksum is cheap to compute, the
   shared secret should be significantly longer than a usual password.

   Authentication may make more dictionary databases available for the
   current session.  For example, there may be some publicly
   distributable databases available to all users, and other private
   databases available only to authenticated users.  Or, a server may
   require authentication from all users to minimize resource
   utilization on the server machine.

   Authentication is an optional server capability.  The AUTH command
   MAY be implemented by a DICT server.

3.9.2. Responses

       230 Authentication successful
       531 Access denied, use "SHOW INFO" for server information

   These response codes require no special parameters.

4. Command Pipelining

   All DICT servers MUST be able to accept multiple commands in a
   single TCP send operation.
   Using a single TCP send operation for multiple commands can improve
   DICT performance significantly, especially in the face of high
   latency network links.

   The possible implementation problems for a DICT server which would
   prevent command pipelining are similar to the problems that prevent
   pipelining in an SMTP server.  These problems are discussed in
   detail in [RFC1854], which should be consulted by all DICT server
   implementors.

   The main implication is that a DICT server implementation MUST NOT
   flush or otherwise lose the contents of the TCP input buffer under
   any circumstances whatsoever.

   A DICT client may pipeline several commands and must check the
   responses to each command individually.  If the server has shut
   down, it is possible that some or all of the commands will not be
   processed.  For example, a simple DICT client may pipeline a
   CLIENT, DEFINE, and QUIT command sequence as it is connecting to
   the server.  If the server is shut down, the initial response code
   sent by the server may be 420 (temporarily unavailable) instead of
   220 (banner).  In this case, the definition cannot be retrieved,
   and the client should report an error or retry the command.  If the
   server is working, it may be able to send back the banner,
   definition, and termination message in a single TCP send operation.

5. URL Specification

   The DICT URL scheme is used to refer to definitions or word lists
   available using the DICT protocol:

       dict://:@:/d:::
       dict://:@:/m::::

   The "/d" syntax specifies the DEFINE command (see section 3.2),
   whereas the "/m" specifies the MATCH command (section 3.3).

   Some or all of ":@", ":", "", "", and "" may be omitted.

   "" will usually be omitted, but when included, it specifies the
   nth definition or match of a word.  A method for extracting exactly
   this information from the server is not available using the DICT
   protocol.
   However, a client using the URL specification could obtain all of
   the definitions or matches, and then select the one that is
   specified.

   If ":@" is omitted, no authentication is done.
   If ":" is omitted, the default port (2628) SHOULD be used.
   If "" is omitted, "!" SHOULD be used (see section 3.2.1).
   If "" is omitted, "." SHOULD be used (see section 3.3.1).

   Trailing colons may be omitted.  For example, the following URLs
   might specify definitions or matches:

       dict://dict.org/d:shortcake:
       dict://dict.org/d:shortcake:*
       dict://dict.org/d:shortcake:wordnet:
       dict://dict.org/d:shortcake:wordnet:1
       dict://dict.org/d:abcdefgh
       dict://dict.org/d:sun
       dict://dict.org/d:sun::1

       dict://dict.org/m:sun
       dict://dict.org/m:sun::soundex
       dict://dict.org/m:sun:wordnet::1
       dict://dict.org/m:sun::soundex:1
       dict://dict.org/m:sun:::

   See [RFC1738] for the specification of Uniform Resource Locators.

6. Extensions

   This protocol was designed so that flat text databases can be used
   with a server after a minimum of analysis and formatting.  Our
   experience is that merely constructing an index for a database may
   be sufficient to make it useful with a DICT server.  The ability to
   serve preformatted text is especially important since
   freely-available databases are often distributed as flat text files
   without any semantic mark-up information (and often contain "ASCII
   art" which precludes the automation of even simple formatting).

   However, given a database with sufficient mark-up information, it
   may be possible to generate output in a variety of different
   formats (e.g., simple HTML or more sophisticated SGML).  The
   specification of formatting is beyond the scope of this document.
   The requirements for negotiation of format (including character set
   and other encodings) are complex and should be examined over time
   as more experience is gained.
   We suggest that the use of different formats, as well as other
   server features, be explored as extensions to the protocol.

6.1. Experimental Command Syntax

   Single-letter commands are reserved for debugging and testing,
   SHOULD NOT be defined in any future DICT protocol specification,
   and MUST NOT be used by any client software.

   Commands beginning with the letter "X" are reserved for
   experimental extensions, and SHOULD NOT be defined in any future
   DICT protocol specification.  Authors of client software should
   understand that these commands are not part of the DICT protocol
   and may not be available on all DICT servers.

6.2. Experimental Commands and Pipelining

   Experimental commands should be designed so that a client can
   pipeline the experimental commands without knowing if a server
   supports the commands (e.g., instead of using feature negotiation).
   If the server does not support the commands, then a response code
   of 500 will be given, notifying the client that the extension is
   not supported.  Of course, depending on the complexity of the
   extensions added, feature negotiation may be necessary.  To help
   minimize negotiation time, server-supported features may be
   announced in the banner (code 220) using the optional capabilities
   parameter.

7. Summary of Response Codes

   Below is a summary of response codes.  A star (*) in the first
   column indicates the response has defined arguments that must be
   provided.

   * 110 n databases present - text follows
   * 111 n strategies available - text follows
     112 database information follows
     113 help text follows
     114 server information follows
   * 150 n definitions retrieved - definitions follow
   * 151 word database name - text follows
   * 152 n matches found - text follows
     210 (optional timing and statistical information here)
   * 220 text msg-id
     221 Closing Connection
     230 Authentication successful
     250 ok (optional timing information here)
     420 Server temporarily unavailable
     421 Server shutting down at operator request
     500 Syntax error, command not recognized
     501 Syntax error, illegal parameters
     502 Command not implemented
     503 Command parameter not implemented
     530 Access denied
     531 Access denied, use "SHOW INFO" for server information
     550 Invalid database, use "SHOW DB" for list of databases
     551 Invalid strategy, use "SHOW STRAT" for a list of strategies
     552 No match
     554 No databases present
     555 No strategies available

8. Sample Conversations

   These are samples of the conversations that might be expected with
   a typical DICT server.  The notation "C:" indicates commands sent
   by the client, and "S:" indicates responses sent by the server.
   Blank lines are included for clarity and do not indicate actual
   newlines in the transaction.

8.1. Sample 1 - opening connection, HELP, DEFINE, and QUIT commands

   C: [ client initiates connection ]

   S: 220 dict.org dictd (version 0.9) <27831.860032493@dict.org>

   C: HELP

   S: 113 Help text follows
   S: DEFINE database word            look up word in database
   S: MATCH database strategy word    match word in database using strategy
   S: [ more server-dependent help text ]
   S: .
   S: 250 Command complete

   C: DEFINE ! penguin

   S: 150 1 definitions found: list follows
   S: 151 "penguin" wn "WordNet 1.5" : definition text follows
   S: penguin
   S: 1. n: short-legged flightless birds of cold southern esp. Antarctic
   S: regions having webbed feet and wings modified as flippers
   S: .
   S: 250 Command complete

   C: DEFINE * shortcake

   S: 150 2 definitions found: list follows
   S: 151 "shortcake" wn "WordNet 1.5" : text follows
   S: shortcake
   S: 1. n: very short biscuit spread with sweetened fruit and usu.
   S: whipped cream
   S: .
   S: 151 "Shortcake" web1913 "Webster's Dictionary (1913)" : text follows
   S: Shortcake
   S: \Short"cake`\, n.
   S: An unsweetened breakfast cake shortened with butter or lard,
   S: rolled thin, and baked.
   S: .
   S: 250 Command complete

   C: DEFINE abcdefgh

   S: 552 No match

   C: quit

   S: 221 Closing connection

8.2. Sample 2 - SHOW commands, MATCH command

   C: SHOW DB

   S: 110 3 databases present: list follows
   S: wn "WordNet 1.5"
   S: foldoc "Free On-Line Dictionary of Computing"
   S: jargon "Hacker Jargon File"
   S: .
   S: 250 Command complete

   C: SHOW STRAT

   S: 111 5 strategies present: list follows
   S: exact "Match words exactly"
   S: prefix "Match word prefixes"
   S: substring "Match substrings anywhere in word"
   S: regex "Match using regular expressions"
   S: reverse "Match words given definition keywords"
   S: .
   S: 250 Command complete

   C: MATCH foldoc regex "s.si"

   S: 152 7 matches found: list follows
   S: foldoc Fast SCSI
   S: foldoc SCSI
   S: foldoc SCSI-1
   S: foldoc SCSI-2
   S: foldoc SCSI-3
   S: foldoc Ultra-SCSI
   S: foldoc Wide SCSI
   S: .
   S: 250 Command complete

   C: MATCH wn substring "abcdefgh"

   S: 552 No match

8.3. Sample 3 - Server downtime

   C: [ client initiates connection ]

   S: 420 Server temporarily unavailable

   C: [ client initiates connection ]

   S: 421 Server shutting down at operator request

8.4. Sample 4 - Authentication

   C: [ client initiates connection ]

   S: 220 dict.org dictd (version 0.9) <27831.860032493@dict.org>

   C: SHOW DB

   S: 110 1 database present: list follows
   S: free "Free database"
   S: .
   S: 250 Command complete

   C: AUTH joesmith authentication-string

   S: 230 Authentication successful

   C: SHOW DB

   S: 110 2 databases present: list follows
   S: free "Free database"
   S: licensed "Local licensed database"
   S: .
   S: 250 Command complete

9. Security Considerations

   This RFC raises no security issues.

10. References

   [ASCII] US-ASCII. Coded Character Set - 7-Bit American Standard
        Code for Information Interchange. Standard ANSI X3.4-1986,
        ANSI, 1986.

   [FOLDOC] Howe, Denis, ed. The Free On-Line Dictionary of
        Computing,

   [ISO] ISO-8859. International Standard -- Information Processing --
        8-bit Single-Byte Coded Graphic Character Sets -- Part 1:
        Latin alphabet No. 1, ISO 8859-1:1987.

   [JARGON] The on-line hacker Jargon File, version 4.0.0, 25 JUL
        1996,

   [KNUTH73] Knuth, Donald E. "The Art of Computer Programming",
        Volume 3: Sorting and Searching (Addison-Wesley Publishing
        Co., 1973, pages 391 and 392). Knuth notes that the soundex
        method was originally described by Margaret K. Odell and
        Robert C. Russell [US Patents 1261167 (1918) and 1435663
        (1922)].

   [PZ85] Pollock, Joseph J. and Zamora, Antonio, "Automatic spelling
        correction in scientific and scholarly text," CACM, 27(4):
        Apr. 1985, 358-368.

   [RFC640] Postel, J., "Revised FTP Reply Codes", RFC-640, June,
        1975.

   [RFC821] Postel, J., "Simple Mail Transfer Protocol", RFC-821,
        USC/Information Sciences Institute, August, 1982.

   [RFC822] Crocker, D., "Standard for the Format of ARPA Internet
        Text Messages", RFC-822, Department of Electrical Engineering,
        University of Delaware, August, 1982.

   [RFC977] Kantor, B., Lapsley, P., "Network News Transfer Protocol:
        A Proposed Standard for the Stream-Based Transmission of
        News", RFC-977, U.C. San Diego, U.C. Berkeley, February, 1986.

   [RFC1738] Berners-Lee, T., Masinter, L., and M. McCahill, "Uniform
        Resource Locators (URL)", RFC-1738, CERN, Xerox PARC,
        University of Minnesota, December 1994.

   [RFC1854] Freed, N., and Cargille, A., "SMTP Service Extension for
        Command Pipelining", RFC-1854, Innosoft International, Inc.,
        and Network Working Group, October 1995.

   [RFC1939] Myers, J., Rose, M., "Post Office Protocol - Version 3",
        RFC-1939, Carnegie Mellon/Dover Beach Consulting, May, 1996.

   [RFC2068] Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
        Berners-Lee, T., "Hypertext Transfer Protocol -- HTTP/1.1",
        RFC-2068, U.C. Irvine, DEC, MIT/LCS, January, 1997.

   [WEB1913] Webster's Revised Unabridged Dictionary (G & C. Merriam
        Co., 1913, edited by Noah Porter). Online version prepared by
        MICRA, Inc., Plainfield, N.J. and edited by Patrick Cassidy
        . For further information, see
        , and

   [WORDNET] Miller, G.A. (1990), ed. WordNet: An On-Line Lexical
        Database. International Journal of Lexicography. Volume 3,
        Number 4.

11. Acknowledgements

   Thanks to Arnt Gulbrandsen and Nicolai Langfeldt for many helpful
   discussions.  Thanks to Bennet Yee, Doug Hoffman, Kevin Martin, and
   Jay Kominek for extensive testing and feedback on the initial
   implementations of the DICT server.  Thanks to Zhong Shao for
   advice and support.

   Thanks to Brian Kantor, Phil Lapsley, and Jon Postel for writing
   exemplary RFCs which were consulted during the preparation of this
   document.

12. Authors' Addresses

   Rickard E. Faith
   EMail: faith@cs.unc.edu (or faith@acm.org)

   Bret Martin
   EMail: bamartin@miranda.org

   The majority of this work was completed while Bret Martin was
   a student at Yale University.
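
   As an illustration of the authentication-string computation
   described in section 3.9.1, the following Python sketch hashes the
   msg-id followed by the shared secret.  It is a sketch under stated
   assumptions rather than a definitive implementation: the function
   name auth_string is hypothetical, the msg-id is assumed to be used
   exactly as it appears in the banner (angle brackets included), the
   inputs are assumed to be ASCII, and the checksum is assumed to be
   sent as lower-case hexadecimal digits as in APOP [RFC1939].  The
   values in the usage lines are the APOP example from [RFC1939], not
   data from a DICT server.

```python
import hashlib


def auth_string(msg_id: str, shared_secret: str) -> str:
    """Return the MD5 checksum of msg-id + shared secret as hex digits.

    Assumption: both inputs are ASCII, and msg_id is passed exactly as
    it appeared in the server banner.
    """
    data = (msg_id + shared_secret).encode("ascii")
    return hashlib.md5(data).hexdigest()


# Example values borrowed from the APOP description in RFC 1939.
digest = auth_string("<1896.697170952@dbc.mtview.ca.us>", "tanstaaf")
# The command line a client would send for the hypothetical user "mrose".
command = "AUTH mrose " + digest
```

   Because the msg-id changes with every connection, the resulting
   string differs per session even though the shared secret does not.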