idnits 2.17.1 

draft-rfced-info-faith-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in
     this document.

     Expected boilerplate is as follows today (2024-04-18) according to
     https://trustee.ietf.org/license-info :

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.a:
        This Internet-Draft is submitted in full conformance with the provisions
        of BCP 78 and BCP 79.

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2:
        Copyright (c) 2024 IETF Trust and the persons identified as the document
        authors.  All rights reserved.

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3:
        This document is subject to BCP 78 and the IETF Trust's Legal Provisions
        Relating to IETF Documents
        (https://trustee.ietf.org/license-info) in effect on the date of
        publication of this document.  Please review these documents
        carefully, as they describe your rights and restrictions with
        respect to this document.  Code Components extracted from this
        document must include Simplified BSD License text as described in
        Section 4.e of the Trust Legal Provisions and are provided
        without warranty as described in the Simplified BSD License.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** Missing expiration date.  The document expiration date should appear on
     the first and last page.

  ** The document seems to lack a 1id_guidelines paragraph about
     Internet-Drafts being working documents. 

  ** The document seems to lack a 1id_guidelines paragraph about 6 months
     document validity -- however, there's a paragraph with a matching
     beginning. Boilerplate error?

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     current Internet-Drafts. 

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     Shadow Directories. 

  ** The document is more than 15 pages and seems to lack a Table of Contents.


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.

  ** There are 17 instances of too long lines in the document, the longest
     one being 5 characters in excess of 72.

  ** The document seems to lack a both a reference to RFC 2119 and the
     recommended RFC 2119 boilerplate, even if it appears to use RFC 2119
     keywords. 

     RFC 2119 keyword, line 84: '...     1.3.2 of [RFC1122] of using the capitalized words MUST, REQUIRED,...'
     RFC 2119 keyword, line 85: '...     SHOULD, RECOMMENDED, MAY, and OPT...'
     RFC 2119 keyword, line 158: '...   Command lines MUST NOT exceed 1024 ...'
     RFC 2119 keyword, line 211: '... All string parameters MUST conform to...'
     RFC 2119 keyword, line 215: '...fic text, then there MAY or MAY NOT be...'
     (23 more instances...)


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == Line 906 has weird spacing: '...gy word    mat...'

  -- The exact meaning of the all-uppercase expression 'MAY NOT' is not
     defined in RFC 2119.  If it is intended as a requirements expression, it
     should be rewritten using one of the combinations defined in RFC 2119;
     otherwise it should not be all-uppercase.

  == The expression 'MAY NOT', while looking like RFC 2119 requirements text,
     is not defined in RFC 2119, and should not be used.  Consider using 'MUST
     NOT' instead (if that is what you mean).
     
     Found 'MAY NOT' in this paragraph:
     
     If no parameters are present, and the server implementation
     pro-vides no implementation-specific text, then there MAY or MAY NOT be a
     space after the response code.

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (26 July 1997) is 9763 days in the past.  Is this
     intentional?


  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  == Missing Reference: 'WORD-NET' is mentioned on line 59, but not defined

  == Missing Reference: 'RFC1122' is mentioned on line 84, but not defined

  == Missing Reference: 'RFC1854' is mentioned on line 747, but not defined

  ** Obsolete undefined reference: RFC 1854 (Obsoleted by RFC 2197)

  == Unused Reference: 'RFC977' is defined on line 1056, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC1985' is defined on line 1064, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC2068' is defined on line 1071, but no explicit
     reference was found in the text

  == Unused Reference: 'WORDNET' is defined on line 1083, but no explicit
     reference was found in the text

  ** Obsolete normative reference: RFC  821 (Obsoleted by RFC 2821)

  ** Obsolete normative reference: RFC  822 (Obsoleted by RFC 2822)

  ** Obsolete normative reference: RFC  977 (Obsoleted by RFC 3977)

  ** Obsolete normative reference: RFC 1738 (Obsoleted by RFC 4248, RFC 4266)

  ** Obsolete normative reference: RFC 1854 (ref. 'RFC1985') (Obsoleted by
     RFC 2197)

  ** Obsolete normative reference: RFC 2068 (Obsoleted by RFC 2616)


     Summary: 18 errors (**), 0 flaws (~~), 9 warnings (==), 3 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	INTERNET-DRAFT            EXPIRES JANUARY 1998            INTERNET-DRAFT
3	Network Working Group                                           R. Faith
4	INTERNET-DRAFT                            U. North Carolina, Chapel Hill
5	Category: Informational                                        B. Martin
6	                                                     Miranda Productions
7	                                                            26 July 1997

9	                      A Dictionary Server Protocol
10	                    <draft-rfced-info-faith-01.txt>

12	Status of this Memo

14	     This document is an Internet-Draft.  Internet-Drafts are working
15	     documents of the Internet Engineering Task Force (IETF), its areas,
16	     and its working groups.  Note that other groups may also distribute
17	     working documents as Internet-Drafts.

19	     Internet-Drafts are draft documents valid for a maximum of six
20	     months and may be updated, replaced, or obsoleted by other docu-
21	     ments at any time.  It is inappropriate to use Internet-Drafts as
22	     reference material or to cite them other than as ``work in
23	     progress.''

25	     To learn the current status of any Internet-Draft, please check the
26	     ``1id-abstracts.txt'' listing contained in the Internet-Drafts
27	     Shadow Directories on ftp.is.co.za (Africa), nic.nordu.net
28	     (Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East
29	     Coast), or ftp.isi.edu (US West Coast).

31	Authors' Note

33	     [[This document has not yet been submitted or accepted as an offi-
34	     cial RFC.  Two independent server implementations have been com-
35	     pleted, one at dict://dict.miranda.org:2628 and the other at
36	     dict://proteus.cs.unc.edu:2628.  This note should be deleted when
37	     this memo is assigned an RFC number.]]

39	Abstract

41	     The Dictionary Server Protocol (DICT) is a TCP transaction-based
42	     query/response protocol that allows a client to access dictionary
43	     definitions from a set of natural language dictionary databases.

45	1.  Introduction

47	     For many years, the Internet community has relied on the "webster"
48	     protocol for access to natural language definitions.  The webster
49	     protocol supports access to a single dictionary and (optionally) to
50	     a single thesaurus.  In recent years, the number of publicly avail-
51	     able webster servers on the Internet has dramatically decreased.

53	     Fortunately, several freely-distributable dictionaries and lexicons
54	     have recently become available on the Internet.  However, these
55	     freely-distributable databases are not accessible via a uniform
56	     interface, and are not accessible from a single site.  They are
57	     often small and incomplete individually, but would collectively
58	     provide an interesting and useful database of English words.  Exam-
59	     ples include the Jargon file [JARGON], the WordNet database
60	     [WORDNET], MICRA's version of the 1913 Webster's Revised Unabridged
61	     Dictionary [WEB1913], and the Free Online Dictionary of Computing
62	     [FOLDOC].  Translating and non-English dictionaries are also becom-
63	     ing available (for example, the FOLDOC dictionary is being trans-
64	     lated into Spanish).

66	     The webster protocol is not suitable for providing access to a
67	     large number of separate dictionary databases, and extensions to
68	     the current webster protocol were not felt to be a clean solution
69	     to the dictionary database problem.

71	     The DICT protocol is designed to provide access to multiple
72	     databases.  Word definitions can be requested, the word index can
73	     be searched (using an easily extended set of algorithms), informa-
74	     tion about the server can be provided (e.g., which index search
75	     strategies are supported, or which databases are available), and
76	     information about a database can be provided (e.g., copyright,
77	     citation, or distribution information).  Further, the DICT protocol
78	     has hooks that can be used to restrict access to some or all of the
79	     databases.

81	1.1.  Requirements

83	     In this document, we adopt the convention discussed in Section
84	     1.3.2 of [RFC1122] of using the capitalized words MUST, REQUIRED,
85	     SHOULD, RECOMMENDED, MAY, and OPTIONAL to define the significance
86	     of each particular requirement specified in this document.

88	     In brief: "MUST" (or "REQUIRED") means that the item is an absolute
89	     requirement of the specification; "SHOULD" (or "RECOMMENDED") means
90	     there may exist valid reasons for ignoring this item, but the full
91	     implications should be understood before doing so; and "MAY" (or
92	     "OPTIONAL") means that this item is optional, and may be omitted
93	     without careful consideration.

95	2.  Protocol Overview

97	2.1.  Link Level

99	     The DICT protocol assumes a reliable data stream such as that
100	     provided by TCP.  When TCP is used, a DICT server listens on port
101	     2628 (typically, webster servers listened on port 2627).

103	     This server is only an interface between programs and the dictio-
104	     nary databases.  It does not perform any user interaction or pre-
105	     sentation-level functions.

107	2.2.  Lexical Tokens

109	     Commands and replies are composed of characters from the ISO-8859-1
110	     character set [ISO].  More specifically, using the grammar conven-
111	     tions from [RFC822]:

113	                                                      ; (  Octal, Decimal.)
114	          CHAR        =  <any ASCII character>        ; (  0-177,  0.-127.)
115	          CTL         =  <any ASCII control           ; (  0- 37,  0.- 31.)
116	                          character and DEL>          ; (    177,     127.)
117	          CR          =  <ASCII CR, carriage return>  ; (     15,      13.)
118	          LF          =  <ASCII LF, linefeed>         ; (     12,      10.)
119	          SPACE       =  <ASCII SP, space>            ; (     40,      32.)
120	          HTAB        =  <ASCII HT, horizontal-tab>   ; (     11,       9.)
121	          <">         =  <ASCII quote mark>           ; (     42,      34.)
122	          <'>         =  <ASCII single quote mark>    ; (     47,      39.)
123	          CRLF        =  CR LF
124	          WS          =  1*(SPACE / HTAB)

126	          dqstring    =  <"> *(dqtext/quoted-pair) <">
127	          dqtext      =  <any CHAR except <">, "\", and CTLs>
128	          sqstring    =  <'> *(sqtext/quoted-pair) <'>
129	          sqtext      =  <any CHAR except <'>, "\", and CTLs>
130	          quoted-pair =  "\" CHAR

132	          atom        =  1*<any CHAR except SPACE, CTLs, <'>, <">, and "\">
133	          string      =  *<dqstring / sqstring / quoted-pair>
134	          word        =  *<atom / string>
135	          description =  *<word / WS>
136	          text        =  *<word / WS>

138	2.3.  Commands

140	     Commands consist of a command word followed by zero or more parame-
141	     ters.  Commands with parameters must separate the parameters from
142	     each other and from the command by one or more space or tab charac-
143	     ters.  Command lines must be complete with all required parameters,
144	     and may not contain more than one command.

146	     Each command line must be terminated by a CRLF.

148	     The grammar for commands is:

150	          command     = cmd-word *<WS cmd-param>
151	          cmd-word    = atom
152	          cmd-param   = database / strategy / word
153	          database    = atom
154	          strategy    = atom

156	     Commands are not case sensitive.

158	     Command lines MUST NOT exceed 1024 characters in length, counting
159	     all characters including spaces, separators, punctuation, and the
160	     trailing CRLF.  There is no provision for the continuation of com-
161	     mand lines.
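
     As an illustration only, the following Python sketch composes a
     command line under the rules above.  The helper names quote_param
     and build_command are hypothetical and are not part of the
     protocol.  The sketch assumes that a parameter which is not a
     valid atom is sent as a dqstring, and it rejects control
     characters outright rather than trying to escape them.

```python
MAX_LINE = 1024  # limit from this section, counted with the trailing CRLF


def quote_param(param: str) -> str:
    """Return param unchanged if it is an atom, otherwise as a dqstring."""
    if any(ord(ch) < 32 or ord(ch) == 127 for ch in param):
        raise ValueError("control characters are not handled by this sketch")
    if param and not any(ch in " '\"\\" for ch in param):
        return param  # already a valid atom
    # Escape backslash and double quote as quoted-pairs inside the dqstring.
    escaped = param.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def build_command(cmd_word: str, *params: str) -> bytes:
    """Join the command word and parameters with spaces and append CRLF."""
    line = " ".join([cmd_word] + [quote_param(p) for p in params]) + "\r\n"
    data = line.encode("latin-1")  # ISO-8859-1, as in Section 2.2
    if len(data) > MAX_LINE:
        raise ValueError("command line exceeds 1024 characters")
    return data
```

     For example, build_command("DEFINE", "*", "ice cream") yields the
     line DEFINE * "ice cream" followed by CRLF.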

163	2.4.  Responses

165	     Responses are of two kinds, status and textual.

167	2.4.1.  Status Responses

169	     Status responses indicate the server's response to the last command
170	     received from the client.

172	     Status response lines begin with a 3-digit numeric code which is
173	     sufficient to distinguish all responses.  Some of these may herald
174	     the subsequent transmission of text.

176	     The first digit of the response broadly indicates the success,
177	     failure, or progress of the previous command (based generally on
178	     [RFC640,RFC821]):

180	          1yz - Positive Preliminary reply
181	          2yz - Positive Completion reply
182	          3yz - Positive Intermediate reply (not used by DICT)
183	          4yz - Transient Negative Completion reply
184	          5yz - Permanent Negative Completion reply

186	     The next digit in the code indicates the response category:

188	          x0z - Syntax
189	          x1z - Information (e.g., help)
190	          x2z - Connections
191	          x3z - Authentication
192	          x4z - Unspecified as yet
193	          x5z - DICT System (These replies indicate the status of the
194	                DICT server vis-a-vis the requested DICT system
195	                action.)
196	          x8z - Nonstandard (private implementation) extensions

198	     The exact response codes that should be expected from each command
199	     are detailed in the description of that command.
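
     A client might interpret the first two digits of a response code
     with a small lookup such as the Python sketch below.  The function
     name classify and the label strings are assumptions made for this
     example; only the digit meanings come from the tables above.

```python
REPLY_CLASS = {
    "1": "positive preliminary",
    "2": "positive completion",
    "3": "positive intermediate",  # not used by DICT
    "4": "transient negative completion",
    "5": "permanent negative completion",
}

CATEGORY = {
    "0": "syntax",
    "1": "information",
    "2": "connections",
    "3": "authentication",
    "4": "unspecified",
    "5": "DICT system",
    "8": "nonstandard extension",
}


def classify(code: str):
    """Return (reply class, category) for a 3-digit response code."""
    if len(code) != 3 or not (code.isascii() and code.isdigit()):
        raise ValueError("response code must be three decimal digits")
    return (REPLY_CLASS.get(code[0], "unknown"),
            CATEGORY.get(code[1], "unknown"))
```

     With these tables, classify("552") gives a permanent negative
     completion reply in the DICT system category.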

201	     Certain status responses contain parameters such as numbers and
202	     strings.  The number and type of such parameters is fixed for each
203	     response code to simplify interpretation of the response.  Other
204	     status responses do not require specific text identifiers.  Parame-
205	     ter requirements are detailed in the description of relevant com-
206	     mands.  Except for specifically detailed parameters, the text fol-
207	     lowing response codes is server-dependent.

209	     Parameters are separated from the numeric response code and from
210	     each other by a single space.  All numeric parameters are decimal,
211	     and may have leading zeros.  All string parameters MUST conform to
212	     the "atom" or "dqstring" grammar productions.
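
     The following Python sketch shows one way a client could split a
     status line into its code, its fixed parameters, and the trailing
     server-dependent text.  The function name parse_status is
     hypothetical, and the caller is assumed to know how many
     parameters (nparams) the response code carries.  The sketch
     tolerates more than one separating space, which is more lenient
     than the rule above.

```python
def parse_status(line: str, nparams: int = 0):
    """Split a status line into (code, params, trailing_text)."""
    line = line.rstrip("\r\n")
    code, rest = line[:3], line[3:]
    if len(code) != 3 or not code.isdigit():
        raise ValueError("status line must begin with a 3-digit code")
    params = []
    pos = 0
    for _ in range(nparams):
        while pos < len(rest) and rest[pos] == " ":
            pos += 1  # skip the separator before the parameter
        if pos >= len(rest):
            raise ValueError("missing parameter")
        if rest[pos] == '"':
            # dqstring: read up to the closing quote, honoring quoted-pairs
            pos += 1
            chars = []
            while pos < len(rest) and rest[pos] != '"':
                if rest[pos] == "\\" and pos + 1 < len(rest):
                    pos += 1  # take the escaped character literally
                chars.append(rest[pos])
                pos += 1
            if pos >= len(rest):
                raise ValueError("unterminated dqstring")
            pos += 1  # skip the closing quote
            params.append("".join(chars))
        else:
            # atom: read up to the next space
            start = pos
            while pos < len(rest) and rest[pos] != " ":
                pos += 1
            params.append(rest[start:pos])
    return code, params, rest[pos:].strip()
```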

214	     If no parameters are present, and the server implementation pro-
215	     vides no implementation-specific text, then a space after the
216	     response code is OPTIONAL.

218	     Response codes not specified in this standard may be used for any
219	     installation-specific additional commands also not specified.
220	     These should be chosen to fit the pattern of x8z specified above.
221	     The use of unspecified response codes for standard commands is pro-
222	     hibited.

224	2.4.2.  General Status Responses

226	     In response to every command, the following general status
227	     responses are possible:

229	          500 Syntax error, command not recognized
230	          501 Syntax error, illegal parameters
231	          502 Command not implemented
232	          503 Command parameter not implemented
233	          420 Server temporarily unavailable
234	          421 Server shutting down at operator request

236	2.4.3.  Text Responses

238	     Before text is sent, a numeric status response line using a 1yz
239	     code is sent to indicate that text will follow.  Text is sent as a
240	     series of successive lines of textual matter, each terminated with
241	     a CRLF.  A single line containing only a period (decimal code 46,
242	     ".") is sent to indicate the end of the text (i.e., the server will
243	     send a CRLF at the end of the last line of text, a period, and
244	     another CRLF).

246	     If a line of original text contained a period as the first charac-
247	     ter of the line, that first period is doubled by the DICT server.
248	     Therefore, the client must examine the first character of each line
249	     received.  Those that begin with two periods must have those two
250	     periods collapsed into one period.  Those that contain only a sin-
251	     gle period followed by a CRLF indicate the end of the text
252	     response.
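
     The period-doubling rule can be sketched in Python as follows.
     The function names stuff_text and read_text_response are
     hypothetical; the sketch assumes the lines have already been read
     from the connection as strings.

```python
def stuff_text(lines):
    """Server side: double the leading period of any line that has one."""
    return ["." + line if line.startswith(".") else line for line in lines]


def read_text_response(lines):
    """Client side: collect lines up to the lone period, undoing doubling."""
    body = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == ".":
            return body  # end of the text response
        if line.startswith(".."):
            line = line[1:]  # collapse the doubled period
        body.append(line)
    raise ValueError("text response ended without a terminating period")
```

     Because the server doubles every leading period, a line that
     consists of a single period can only be the terminator.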

254	     Following a text response, a 2yz response code will be sent.

256	     Text lines MUST NOT exceed 1024 characters in length, counting all
257	     characters including spaces, separators, punctuation, the extra
258	     initial period (if needed), and the trailing CRLF.

260	     It is recommended that text use the US-ASCII [ASCII] or ISO-8859-1
261	     [ISO] character sets, although it is currently beyond the scope of
262	     this standard to specify encoding for text.  In the future, after
263	     significant experience with large databases in various languages
264	     has been gained, and after evaluating the need for character set
265	     and other encodings (e.g., compressed or BASE64 encoding), standard
266	     extensions to this protocol should be proposed.  In the meantime,
267	     private extensions should be used to explore the parameter space to
268	     determine how best to implement these extensions.

270	3.  Command and Response Details

272	     Below, each DICT command and appropriate responses are detailed.
273	     Each command is shown in upper case for clarity, but the DICT
274	     server is case-insensitive.

276	     Except for the AUTH command, every command described in this sec-
277	     tion MUST be implemented by all DICT servers.

279	3.1.  Initial Connection

281	     When a client initially connects to a DICT server, a code 220 is
282	     sent if the client's IP address is allowed to connect:

284	          220 text capabilities msg-id

286	     The code 220 is a banner, usually containing host name and DICT
287	     server version information.

289	     The second-to-last sequence of characters in the banner is the
290	     optional capabilities string, which will allow servers to declare
291	     support for extensions to the DICT protocol.  The capabilities
292	     string is defined below:

294	          capabilities =  ["<" msg-atom *("." msg-atom) ">"]
295	          msg-atom     =  1*<any CHAR except SPACE, CTLs,
296	                             "<", ">", ".", and "\">

298	     Individual capabilities are described by a single msg-atom.  For
299	     example, the string <html.gzip> might be used to describe a server
300	     that supports extensions which allow HTML or compressed output.
301	     Capability names beginning with "x" or "X" are reserved for experi-
302	     mental extensions, and SHOULD NOT be defined in any future DICT
303	     protocol specification.

305	     The last sequence of characters in the banner is a msg-id, similar
306	     to the format specified in [RFC822].  The simplified description is
307	     given below:

309	          msg-id       =  "<" spec ">"            ; Unique message id
310	          spec         =  local-part "@" domain
311	          local-part   =  msg-atom *("." msg-atom)
312	          domain       =  msg-atom *("." msg-atom)

314	     Note that, in contrast to [RFC822], spaces and quoted pairs are not
315	     allowed in the msg-id.  This restriction makes the msg-id much eas-
316	     ier for the client to locate and parse but does not significantly
317	     decrease any security benefits, since the msg-id may be arbitrarily
318	     long (as bounded by the response length limits set forth elsewhere
319	     in this document).

321	     Note also that the open and close brackets are part of the msg-id
322	     and should be included in the string that is used to compute the
323	     MD5 checksum.

325	     This message id will be used by the client when formulating the
326	     authentication string used in the AUTH command.
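
     As a sketch only, a client could locate the capabilities string
     and the msg-id at the end of the banner with a regular expression
     built from the grammar above.  The name parse_banner and the host
     name in the usage note are assumptions made for this example.  The
     brackets are kept in the returned msg-id, since they are part of
     it.

```python
import re

# msg-atom: any CHAR except SPACE, CTLs, "<", ">", "." and "\".
_ATOM = r"[^\x00-\x20\x7f<>.\\]+"
_DOTTED = _ATOM + r"(?:\." + _ATOM + r")*"
_BANNER_TAIL = re.compile(
    r"(?:(?P<caps><" + _DOTTED + r">)\s+)?"
    r"(?P<msgid><" + _DOTTED + r">)\s*$"
)


def parse_banner(line: str):
    """Return (list of capabilities, msg-id) from a 220 banner line."""
    m = _BANNER_TAIL.search(line.rstrip("\r\n"))
    if m is None or "@" not in m.group("msgid"):
        raise ValueError("banner does not end with a msg-id")
    caps = m.group("caps")
    cap_list = caps[1:-1].split(".") if caps else []
    return cap_list, m.group("msgid")
```

     For a banner ending in "<html.gzip> <42.7@dict.example.org>", the
     sketch returns the capabilities html and gzip together with the
     msg-id <42.7@dict.example.org>.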

328	     If the client's IP address is not allowed to connect, then a code
329	     530 is sent instead:

331	          530 Access denied

333	     Transient failure responses are also possible:

335	          420 Server temporarily unavailable
336	          421 Server shutting down at operator request

338	     For example, response code 420 should be used if the server cannot
339	     currently fork a server process (or cannot currently obtain other
340	     resources required to proceed with a usable connection), but
341	     expects to be able to fork or obtain these resources in the near
342	     future.

344	     Response code 421 should be used when the server has been shut down
345	     at operator request, or when conditions indicate that it will be
346	     impossible to service more requests in the near future.
347	     This may be used to allow a graceful operator-mediated temporary
348	     shutdown of a server, or to indicate that a well known server has
349	     been permanently removed from service (in which case, the text mes-
350	     sage might provide more information).

352	3.2.  The DEFINE Command

354	     DEFINE database word

356	3.2.1.  Description

358	     This command will look up the specified word in the specified
359	     database.  All DICT servers MUST implement this command.

361	     If the database name is specified with an exclamation point (deci-
362	     mal code 33, "!"), then all of the databases will be searched until
363	     a match is found, and all matches in that database will be dis-
364	     played.  If the database name is specified with a star (decimal
365	     code 42, "*"), then all of the matches in all available databases
366	     will be displayed.  In both of these special cases, the databases
367	     will be searched in the same order as that printed by the "SHOW DB"
368	     command.
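
     The difference between the "!" and "*" databases can be sketched
     in Python as below.  This is not how any particular server is
     implemented: the function name define, the in-memory data layout,
     and the plain dictionary lookup are all assumptions made to keep
     the example short.

```python
def define(databases, db_name, word):
    """Return a list of (database, definition) pairs.

    databases is a list of (name, {word: [definitions]}) pairs in the
    order that SHOW DB would print them.
    """
    names = [name for name, _ in databases]
    if db_name not in ("*", "!") and db_name not in names:
        raise LookupError("550 Invalid database")
    results = []
    for name, entries in databases:
        if db_name not in ("*", "!") and name != db_name:
            continue
        found = entries.get(word, [])
        results.extend((name, text) for text in found)
        if db_name == "!" and found:
            break  # "!" stops at the first database with a match
    return results  # an empty list corresponds to 552 No match
```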

370	     If the word was not found, then status code 552 is sent.

372	     If the word was found, then status code 150 is sent, indicating
373	     that one or more definitions follow.

375	     For each definition, status code 151 is sent, followed by the tex-
376	     tual body of the definition.  The first three space-delimited
377	     parameters following status code 151 give the word retrieved, the
378	     name of the database (which is the same as the first column of the
379	     SHOW DB command), and a short description for the database (which
380	     is the same as the second column of the SHOW DB command).  The
381	     short name is suitable for printing as:

383	          From name:

385	     before the definition is printed.  This provides source information
386	     for the user.

388	     The textual body of each definition is terminated with a CRLF
389	     period CRLF sequence.

391	     After all of the definitions have been sent, status code 250 is
392	     sent.  This command can provide optional timing information (which
393	     is server dependent and is not intended to be parsable by the
394	     client).  This additional information is useful when debugging and
395	     tuning the server.

397	3.2.2.  Responses

399	          550 Invalid database, use "SHOW DB" for list of databases
400	          552 No match
401	          150 n definitions retrieved - definitions follow
402	          151 word database name - text follows
403	          250 ok (optional timing information here)

405	     Response codes 150 and 151 require special parameters as part of
406	     their text.  The client can use these parameters to display infor-
407	     mation on the user's terminal.

409	     For code 150, parameter 1 indicates the number of definitions
410	     retrieved.

412	     For code 151, parameter 1 is the word retrieved, parameter 2 is the
413	     database name (the first name as shown by "SHOW DB") from which the
414	     definition has been retrieved, and parameter 3 is the short
415	     database description (the second column of the "SHOW DB" command).
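
     Putting the 150, 151, and 250 responses together, a client could
     read a complete DEFINE reply as in the Python sketch below.  The
     function name parse_define_response is hypothetical, and the
     header text of each 151 line is returned unparsed.

```python
def parse_define_response(lines):
    """Return a list of (header_text, body_lines), one per 151 block."""
    it = iter(line.rstrip("\r\n") for line in lines)
    first = next(it)
    if first.startswith("552"):
        return []  # no match
    if not first.startswith("150"):
        raise ValueError("unexpected status: " + first)
    expected = int(first.split()[1])  # parameter 1 of code 150
    definitions = []
    for line in it:
        if line.startswith("250"):
            break  # all definitions have been sent
        if not line.startswith("151"):
            raise ValueError("expected 151 or 250, got: " + line)
        body = []
        for text in it:
            if text == ".":
                break  # end of this definition
            body.append(text[1:] if text.startswith("..") else text)
        definitions.append((line[4:], body))
    if len(definitions) != expected:
        raise ValueError("definition count does not match code 150")
    return definitions
```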

417	3.2.3.  A Note on Virtual Databases

419	     The ability to search all of the provided databases using a single
420	     command is given using the special "*" and "!" databases.

422	     However, sometimes, a client may want to search over some but not
423	     all of the databases that a particular server provides.  One alter-
424	     native is for the client to use the SHOW DB command to obtain a
425	     list of databases and descriptions, and then (perhaps with the help
426	     of a human) select a subset of these databases for an iterative
427	     search.  Once this selection has been made, the results can be
428	     saved, for example, in a client configuration file.

430	     Another alternative is for the server to provide "virtual"
431	     databases which merge several of the regular databases into one.
432	     For example, a virtual database may be provided which includes all
433	     of the translating dictionaries, but which does not include regular
434	     dictionaries or thesauri.  The special "*" and "!" databases can be
435	     considered as names of virtual databases which provide access to
436	     all of the databases.  If a server implements virtual databases,
437	     then the special "*" and "!" databases should probably exclude
438	     other virtual databases (since they merely provide information
439	     duplicated in other databases).  If virtual databases are sup-
440	     ported, they should be listed as a regular database with the SHOW
441	     DB command (although, since "*" and "!" are required, they need not
442	     be listed).

444	     Virtual databases are an implementation-specific detail which has
445	     absolutely no impact on the DICT protocol.  The DICT protocol views
446	     virtual and non-virtual databases the same way.

448	     We mention virtual databases here, however, because they solve a
449	     problem of database selection which could also have been solved by
450	     changes in the protocol.  For example, each dictionary could be
451	     assigned attributes, and the protocol could be extended to specify
452	     searches over databases with certain attributes.  However, this
453	     needlessly complicates the parsing and analysis that must be per-
454	     formed by the implementation.  Further, unless the classification
455	     system is extremely general, there is a risk that it would restrict
456	     the types of databases that can be used with the DICT protocol
457	     (although the protocol has been designed with human-language
458	     databases in mind, it is applicable to any read-only database
459	     application, especially those with a single semi-unique alphanu-
460	     meric key and textual data).

462	3.3.  The MATCH Command

464	     MATCH database strategy word

466	3.3.1.  Description

468	     This command searches an index for the dictionary, and reports
469	     words which were found using a particular strategy.  Not all
470	     strategies are useful for all dictionaries, and some dictionaries
471	     may support additional search strategies (e.g., reverse lookup).
472	     All DICT servers MUST implement the MATCH command, and MUST support
473	     the "exact" and "prefix" strategies.  These are easy to implement
474	     and are generally the most useful.  Other strategies are server
475	     dependent.

477	     The "exact" strategy matches a word exactly, although different
478	     servers may treat non-alphanumeric data differently.  We have found
479	     that a case-insensitive comparison which ignores non-alphanumeric
480	     characters and which folds whitespace is useful for English-lan-
481	     guage dictionaries.  Other comparisons may be more appropriate for
482	     other languages or when using extended character sets.

484	     The "prefix" strategy is similar to "exact", except that it only
485	     compares the first part of the word.

487	     Different servers may implement these algorithms differently.  The
488	     requirement is that strategies with the names "exact" and "prefix"
489	     exist so that a simple client can use them.
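
     One possible reading of the comparison described above (case-
     insensitive, ignoring non-alphanumeric characters, folding
     whitespace) is sketched below in Python.  The names normalize,
     match_exact, and match_prefix are hypothetical, and other servers
     may well normalize differently.

```python
def normalize(word: str) -> str:
    """Lower-case, drop non-alphanumerics, and fold runs of whitespace."""
    kept = "".join(ch for ch in word.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def match_exact(index, word):
    """Return the index entries whose normalized form equals the query."""
    key = normalize(word)
    return [entry for entry in index if normalize(entry) == key]


def match_prefix(index, word):
    """Return the index entries whose normalized form starts with the query."""
    key = normalize(word)
    return [entry for entry in index if normalize(entry).startswith(key)]
```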

491	     Other strategies that might be considered by a server implementor
492	     are matches based on substring, suffix, regular expressions,
493	     soundex [KNUTH73], and Levenshtein [PZ85] algorithms.  These last
494	     two are especially useful for correcting spelling errors.  Other
495	     useful strategies perform some sort of "reverse" lookup (i.e., by
496	     searching definitions to find the word that the query suggests).
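
     For readers unfamiliar with edit distance, the Python sketch below
     computes the classic dynamic-programming Levenshtein distance and
     uses it to select near matches.  It is not necessarily the variant
     described in [PZ85]; the function names and the max_distance
     threshold are assumptions made for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the number of single-character edits turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution
        prev = cur
    return prev[-1]


def match_lev(index, word, max_distance=1):
    """Return the index entries within max_distance edits of word."""
    return [entry for entry in index
            if levenshtein(entry, word) <= max_distance]
```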

498	     If the database name is specified with an exclamation point (deci-
499	     mal code 33, "!"), then all of the databases will be searched until
500	     a match is found, and all matches in that database will be dis-
501	     played.  If the database name is specified with a star (decimal
502	     code 42, "*"), then all of the matches in all available databases
503	     will be displayed.  In both of these special cases, the databases
504	     will be searched in the same order as that printed by the "SHOW DB"
505	     command.

507	     If the strategy is specified using a period (decimal code 46, "."),
508	     then the word will be matched using a server-dependent default
509	     strategy, which should be the best strategy available for interac-
510	     tive spell checking.  This is usually a derivative of the Leven-
511	     shtein algorithm [PZ85].

513	     If no matches are found in any of the searched databases, then sta-
514	     tus code 552 will be returned.

516	     Otherwise, status code 152 will be returned followed by a list of
517	     matched words, one per line, in the form:

519	          database word

521	     This makes the responses directly useful in a DEFINE command.

523	     The textual body of the match list is terminated with a CRLF period
524	     CRLF sequence.

526	     Following the list, status code 250 is sent, which may include
527	     server-specific timing and statistical information, as discussed in
528	     the section on the DEFINE command.
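
	     A client-side reading of the MATCH reply described above might
	     look like the sketch below.  It assumes the response lines have
	     already been read and stripped of their CRLF, that each match line
	     is a database name followed by a space and the word, and that a
	     server may wrap the word in double quotes.  The function name is
	     hypothetical.

```python
from typing import Iterable, List, Tuple


def parse_match_response(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse a MATCH reply into (database, word) pairs.

    Raises LookupError on 552 (no match) and ValueError on anything
    that does not follow the 152 ... "." ... 250 shape.
    """
    it = iter(lines)
    status = next(it)
    code = status[:3]
    if code == "552":
        raise LookupError(status)
    if code != "152":
        raise ValueError(f"unexpected status: {status!r}")

    pairs: List[Tuple[str, str]] = []
    for line in it:
        if line == ".":              # a lone period ends the textual body
            break
        database, _, word = line.partition(" ")
        pairs.append((database, word.strip().strip('"')))

    final = next(it, "")
    if not final.startswith("250"):
        raise ValueError(f"expected 250 after match list, got {final!r}")
    return pairs
```

	     Each returned pair can be passed back as the database and word
	     arguments of a DEFINE command.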

530	3.3.2.  Responses

532	          550 Invalid database, use "SHOW DB" for list of databases
533	          551 Invalid strategy, use "SHOW STRAT" for a list of strategies
534	          552 No match
535	          152 n matches found - text follows
536	          250 ok (optional timing information here)

538	     Response code 152 requires a special parameter as part of its text.
539	     Parameter 1 must be the number of matches retrieved.
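
	     As a small illustration of the required parameter, a client could
	     extract the match count from a 152 status line as sketched here.
	     The function name is hypothetical, and the only assumption about
	     the line is that the count is the first space-separated field
	     after the code.

```python
def match_count(status_line: str) -> int:
    """Return parameter 1 (the number of matches) from a 152 status line."""
    code, _, rest = status_line.partition(" ")
    if code != "152":
        raise ValueError(f"not a 152 response: {status_line!r}")
    fields = rest.split()
    try:
        return int(fields[0])
    except (IndexError, ValueError):
        raise ValueError(
            f"152 without a numeric first parameter: {status_line!r}"
        ) from None
```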

541	3.4.  The SHOW Command

543	3.4.1.  SHOW DB

545	     SHOW DB
546	     SHOW DATABASES

548	3.4.1.1.  Description

550	     Displays the list of currently accessible databases, one per line,
551	     in the form:

553	          database description

555	     The textual body of the database list is terminated with a CRLF
556	     period CRLF sequence.  All DICT servers MUST implement this com-
557	     mand.

559	     Note that some databases may be restricted due to client domain or
560	     lack of user authentication (see the AUTH command).  Information
561	     about these databases is not available until authentication is per-
562	     formed.  Until that time, the client will interact with the server
563	     as if the additional databases did not exist.
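
	     The database list can be turned into a name-to-description
	     mapping with a few lines of code.  The sketch below assumes that
	     descriptions are wrapped in double quotes, as in the sample
	     conversations in this document, and uses shell-style splitting
	     to remove the quotes; an unquoted description containing a stray
	     quote character would need different handling.  The function
	     name is hypothetical.

```python
import shlex
from typing import Dict, Iterable


def parse_show_db(lines: Iterable[str]) -> Dict[str, str]:
    """Parse a SHOW DB reply into {database: description}."""
    it = iter(lines)
    status = next(it)
    if status.startswith("554"):     # no databases present
        return {}
    if not status.startswith("110"):
        raise ValueError(f"unexpected status: {status!r}")

    databases: Dict[str, str] = {}
    for line in it:
        if line == ".":              # end of the textual body
            break
        fields = shlex.split(line)   # 'wn "WordNet 1.5"' -> ['wn', 'WordNet 1.5']
        databases[fields[0]] = " ".join(fields[1:])
    return databases
```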

565	3.4.1.2.  Responses

567	          110 n databases present - text follows
568	          554 No databases present

570	     Response code 110 requires a special parameter.  Parameter 1 must
571	     be the number of databases available to the user.

573	3.4.2.  SHOW STRAT

575	     SHOW STRAT
576	     SHOW STRATEGIES

578	3.4.2.1.  Description

580	     Displays the list of currently supported search strategies, one per
581	     line, in the form:

583	          strategy description

585	     The textual body of the strategy list is terminated with a CRLF
586	     period CRLF sequence.  All DICT servers MUST implement this
587	     command.

589	3.4.2.2.  Responses

591	          111 n strategies available - text follows
592	          555 No strategies available

594	     Response code 111 requires a special parameter.  Parameter 1 must
595	     be the number of strategies available.

597	3.4.3.  SHOW INFO

599	     SHOW INFO database

601	3.4.3.1.  Description

603	     Displays the source, copyright, and licensing information about the
604	     specified database.  The information is free-form text and is suit-
605	     able for display to the user in the same manner as a definition.
606	     The textual body of the information is terminated with a CRLF
607	     period CRLF sequence.  All DICT servers MUST implement this com-
608	     mand.

610	3.4.3.2.  Responses

612	          550 Invalid database, use "SHOW DB" for list of databases
613	          112 database information follows

615	     These response codes require no special parameters.

617	3.4.4.  SHOW SERVER

619	     SHOW SERVER

621	3.4.4.1.  Description

623	     Displays local server information written by the local administra-
624	     tor.  This could include information about local databases or
625	     strategies, or administrative information such as who to contact
626	     for access to databases requiring authentication.  All DICT servers
627	     MUST implement this command.

629	3.4.4.2.  Responses

631	          114 server information follows

633	     This response code requires no special parameters.

635	3.5.  The CLIENT Command

637	     CLIENT text

639	3.5.1.  Description

641	     This command allows the client to provide information about itself
642	     for possible logging and statistical purposes.  All clients SHOULD
643	     send this command after connecting to the server.  All DICT servers
644	     MUST implement this command (note, though, that the server doesn't
645	     have to do anything with the information provided by the client).

647	3.5.2.  Responses

649	          250 ok (optional timing information here)

651	     This response code requires no special parameters.

653	3.6.  The STATUS Command

655	     STATUS

657	3.6.1.  Description

659	     Displays some server-specific timing or debugging information.  This
660	     information may be useful in debugging or tuning a DICT server.
661	     All DICT servers MUST implement this command (note, though, that
662	     the text part of the response is not specified and may be omitted).

664	3.6.2.  Responses

666	          210 (optional timing and statistical information here)

668	     This response code requires no special parameters.

670	3.7.  The HELP Command

672	     HELP

674	3.7.1.  Description

676	     Provides a short summary of commands that are understood by this
677	     implementation of the DICT server.  The help text will be presented
678	     as a textual response, terminated by a single period on a line by
679	     itself.  All DICT servers MUST implement this command.

681	3.7.2.  Responses

683	          113 help text follows

685	     This response code requires no special parameters.

687	3.8.  The QUIT Command

689	     QUIT

691	3.8.1.  Description

693	     This command is used by the client to cleanly exit the server.  All
694	     DICT servers MUST implement this command.

696	3.8.2.  Responses

698	          221 Closing Connection

700	     This response code requires no special parameters.

702	3.9.  The AUTH Command

704	     AUTH username authentication-string

706	3.9.1.  Description

708	     The client can authenticate itself to the server using a username
709	     and password.  The authentication-string will be computed as in the
710	     APOP protocol discussed in [RFC1939].  Briefly, the authentication-
711	     string is the MD5 checksum of the concatenation of the msg-id
712	     (obtained from the initial banner) and the "shared secret" that is
713	     stored in the server and client configuration files.  Since the
714	     user does not have to type this shared secret when accessing the
715	     server, the shared secret can be an arbitrarily long passphrase.
716	     Because of the computational ease of computing the MD5 checksum,
717	     the shared secret should be significantly longer than a usual pass-
718	     word.
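
	     The computation of the authentication-string can be sketched as
	     follows.  The sketch assumes, as in APOP [RFC1939], that the
	     msg-id is used with its angle brackets, that the result is sent
	     as lowercase hexadecimal, and that the msg-id is the last
	     bracketed token of the 220 banner.  The function names and the
	     UTF-8 encoding of the secret are assumptions of this example.

```python
import hashlib
import re


def msg_id_from_banner(banner: str) -> str:
    """Extract the trailing <...> msg-id from a 220 banner line."""
    found = re.search(r"(<[^<>\s]+>)\s*$", banner)
    if not banner.startswith("220") or found is None:
        raise ValueError(f"no msg-id in banner: {banner!r}")
    return found.group(1)


def auth_string(msg_id: str, shared_secret: str) -> str:
    """MD5 of msg-id followed by the shared secret, as hex digits."""
    data = (msg_id + shared_secret).encode("utf-8")
    return hashlib.md5(data).hexdigest()


def auth_command(username: str, banner: str, shared_secret: str) -> str:
    """Build the complete AUTH command line, including CRLF."""
    digest = auth_string(msg_id_from_banner(banner), shared_secret)
    return f"AUTH {username} {digest}\r\n"
```

	     Because the msg-id changes with every connection, the digest is
	     different for each session even though the shared secret stays
	     the same.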

720	     Authentication may make more dictionary databases available for the
721	     current session.  For example, there may be some publicly dis-
722	     tributable databases available to all users, and other private
723	     databases available only to authenticated users.  Or, a server may
724	     require authentication from all users to minimize resource utiliza-
725	     tion on the server machine.

727	     Authentication is an optional server capability.  The AUTH command
728	     MAY be implemented by a DICT server.

730	3.9.2.  Responses

732	          230 Authentication successful
733	          531 Access denied, use "SHOW INFO" for server information

735	     These response codes require no special parameters.

737	4.  Command Pipelining

739	     All DICT servers MUST be able to accept multiple commands in a sin-
740	     gle TCP send operation.  Using a single TCP send operation for mul-
741	     tiple commands can improve DICT performance significantly, espe-
742	     cially in the face of high-latency network links.

744	     The possible implementation problems for a DICT server which would
745	     prevent command pipelining are similar to the problems that prevent
746	     pipelining in an SMTP server.  These problems are discussed in
747	     detail in [RFC1854], which should be consulted by all DICT server
748	     implementors.

750	     The main implication is that a DICT server implementation MUST NOT
751	     flush or otherwise lose the contents of the TCP input buffer under
752	     any circumstances whatsoever.

754	     A DICT client may pipeline several commands and must check the
755	     responses to each command individually.  If the server has shut
756	     down, it is possible that not all of the commands will be pro-
757	     cessed.  For example, a simple DICT client may pipeline a CLIENT,
758	     DEFINE, and QUIT command sequence as it is connecting to the
759	     server.  If the server is shut down, the initial response code sent
760	     by the server may be 420 (temporarily unavailable) instead of 220
761	     (banner).  In this case, the definition cannot be retrieved, and
762	     the client should report an error or retry the command.  If the
763	     server is working, it may be able to send back the banner, defini-
764	     tion, and termination message in a single TCP send operation.
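
	     The CLIENT, DEFINE, and QUIT example above could be written as in
	     the following sketch.  It is an illustration under several
	     assumptions: the function names and the client text are
	     hypothetical, the traffic is treated as UTF-8, the word contains
	     no spaces, and the server closes the connection after QUIT so
	     that reading to end-of-stream terminates.

```python
import socket
from typing import List

DICT_PORT = 2628  # default port given in the URL section of this document


def build_pipeline(word: str, client_text: str = "example-client") -> bytes:
    """Encode CLIENT, DEFINE, and QUIT as one buffer for a single send."""
    commands = [f"CLIENT {client_text}", f"DEFINE ! {word}", "QUIT"]
    return "".join(command + "\r\n" for command in commands).encode("utf-8")


def banner_ok(first_line: str) -> bool:
    """True for a 220 banner, False for 420 or 421 (server unavailable)."""
    code = first_line[:3]
    if code == "220":
        return True
    if code in ("420", "421"):
        return False
    raise ValueError(f"unexpected initial response: {first_line!r}")


def define_pipelined(host: str, word: str, port: int = DICT_PORT) -> List[str]:
    """Send all three commands at once and return every response line."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_pipeline(word))   # one send for all commands
        with sock.makefile("r", encoding="utf-8", newline="\r\n") as reader:
            lines = [line.rstrip("\r\n") for line in reader]
    if not lines or not banner_ok(lines[0]):
        raise ConnectionError("server unavailable; report an error or retry")
    return lines
```

	     The caller still has to walk the returned lines and check the
	     response code for each of the three commands individually.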

766	5.  URL Specification

768	     The DICT URL scheme is used to refer to definitions or word lists
769	     available using the DICT protocol:

771	     dict://<user>:<passphrase>@<host>:<port>/d:<word>:<database>:<n>
772	     dict://<user>:<passphrase>@<host>:<port>/m:<word>:<database>:<strat>:<n>

774	     The "/d" syntax specifies the DEFINE command (see section 3.2),
775	     whereas the "/m" specifies the MATCH command (section 3.3).

777	     Some or all of "<user>:<passphrase>@", ":<port>", "<database>",
778	     "<strat>", and "<n>" may be omitted.

780	     "<n>" will usually be omitted, but when included, it specifies the
781	     nth definition or match of a word.  A method for extracting exactly
782	     this information from the server is not available using the DICT
783	     protocol.  However, a client using the URL specification could
784	     obtain all of the definitions or matches, and then select the one
785	     that is specified.

787	     If "<user>:<passphrase>@" is omitted, no authentication is done.
788	     If ":<port>" is omitted, the default port (2628) SHOULD be used.
789	     If "<database>" is omitted, "!" SHOULD be used (see section 3.2.1).
790	     If "<strat>" is omitted, "." SHOULD be used (see section 3.3.1).

792	     Trailing colons may be omitted.  For example, the following URLs
793	     might specify definitions or matches:

795	          dict://dict.org/d:shortcake:
796	          dict://dict.org/d:shortcake:*
797	          dict://dict.org/d:shortcake:wordnet:
798	          dict://dict.org/d:shortcake:wordnet:1
799	          dict://dict.org/d:abcdefgh
800	          dict://dict.org/d:sun
801	          dict://dict.org/d:sun::1

803	          dict://dict.org/m:sun
804	          dict://dict.org/m:sun::soundex
805	          dict://dict.org/m:sun:wordnet::1
806	          dict://dict.org/m:sun::soundex:1
807	          dict://dict.org/m:sun:::

809	     See [RFC1738] for the specification of Uniform Resource Locators.
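
	     The URL forms and defaults above can be exercised with a small
	     parser.  The sketch below is one possible reading of the syntax:
	     the DictUrl class and parse_dict_url function are hypothetical
	     names, and percent-decoding of the fields is an assumption based
	     on general URL practice rather than on this section.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import unquote, urlsplit


@dataclass
class DictUrl:
    command: str                      # "d" (DEFINE) or "m" (MATCH)
    word: str
    database: str = "!"               # default when <database> is omitted
    strategy: Optional[str] = None    # only set for "m"; default "."
    n: Optional[int] = None           # nth definition or match, if given
    host: str = ""
    port: int = 2628                  # default port
    user: Optional[str] = None
    passphrase: Optional[str] = None


def parse_dict_url(url: str) -> DictUrl:
    parts = urlsplit(url)
    if parts.scheme != "dict" or not parts.hostname:
        raise ValueError(f"not a dict URL: {url!r}")

    fields = parts.path.lstrip("/").split(":")
    command = fields[0]
    if command not in ("d", "m"):
        raise ValueError(f"unknown command {command!r}")
    width = 4 if command == "d" else 5
    if len(fields) > width:
        raise ValueError(f"too many fields in {url!r}")
    fields += [""] * (width - len(fields))    # trailing colons may be omitted
    fields = [unquote(field) for field in fields]

    word = fields[1]
    if not word:
        raise ValueError("missing word")
    if command == "d":
        strategy, n_text = None, fields[3]
    else:
        strategy, n_text = fields[3] or ".", fields[4]

    return DictUrl(
        command=command,
        word=word,
        database=fields[2] or "!",
        strategy=strategy,
        n=int(n_text) if n_text else None,
        host=parts.hostname,
        port=parts.port or 2628,
        user=parts.username,
        passphrase=parts.password,
    )
```

	     For instance, parsing "dict://dict.org/m:sun::soundex" with this
	     sketch yields database "!", strategy "soundex", and no <n>.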

811	6.  Extensions

813	     This protocol was designed so that flat text databases can be used
814	     with a server after a minimum of analysis and formatting.  Our
815	     experience is that merely constructing an index for a database may
816	     be sufficient to make it useful with a DICT server.  The ability to
817	     serve preformatted text is especially important since freely-avail-
818	     able databases are often distributed as flat text files without any
819	     semantic mark-up information (and often contain "ASCII art" which
820	     precludes the automation of even simple formatting).

822	     However, given a database with sufficient mark-up information, it
823	     may be possible to generate output in a variety of different for-
824	     mats (e.g., simple HTML or more sophisticated SGML).  The specifi-
825	     cation of formatting is beyond the scope of this document.  The
826	     requirements for negotiation of format (including character set and
827	     other encodings) are complex and should be examined over time as
828	     more experience is gained.  We suggest that the use of different
829	     formats, as well as other server features, be explored as exten-
830	     sions to the protocol.

832	6.1.  Experimental Command Syntax

834	     Single-letter commands are reserved for debugging and testing,
835	     SHOULD NOT be defined in any future DICT protocol specification,
836	     and MUST NOT be used by any client software.

838	     Commands beginning with the letter "X" are reserved for experimen-
839	     tal extensions, and SHOULD NOT be defined in any future DICT proto-
840	     col specification.  Authors of client software should understand
841	     that these commands are not part of the DICT protocol and may not
842	     be available on all DICT servers.

844	6.2.  Experimental Commands and Pipelining

846	     Experimental commands should be designed so that a client can
847	     pipeline the experimental commands without knowing if a server sup-
848	     ports the commands (e.g., instead of using feature negotiation).
849	     If the server does not support the commands, then a response code
850	     of 500 will be given, notifying the client that the extension is
851	     not supported.  Of course, depending on the complexity of the
852	     extensions added, feature negotiation may be necessary.  To help
853	     minimize negotiation time, server-supported features may be
854	     announced in the banner (code 220) using the optional capabilities
855	     parameter.
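
	     On the client side, this design means an experimental command can
	     simply be sent along with the rest of a pipelined batch and its
	     response inspected afterwards.  The sketch below assumes the
	     client has already paired each command with the status line it
	     received; the function name and the "XFOO" command used in the
	     usage note are hypothetical.

```python
from typing import Iterable, List, Tuple


def unsupported_extensions(exchanges: Iterable[Tuple[str, str]]) -> List[str]:
    """Return the experimental ("X...") commands answered with code 500.

    `exchanges` holds (command line, status line) pairs taken from a
    pipelined batch.
    """
    rejected: List[str] = []
    for command, status in exchanges:
        is_experimental = command.upper().startswith("X")
        if is_experimental and status[:3] == "500":
            rejected.append(command)
    return rejected
```

	     A batch such as [("XFOO 1", "500 Syntax error, command not
	     recognized"), ("DEFINE ! sun", "150 1 definitions retrieved")]
	     would report only "XFOO 1" as unsupported, and the client can
	     continue without the extension.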

857	7.  Summary of Response Codes
858	     Below is a summary of response codes.  A star (*) in the first col-
859	     umn indicates the response has defined arguments that must be pro-
860	     vided.

862	          * 110 n databases present - text follows
863	          * 111 n strategies available - text follows
864	            112 database information follows
865	            113 help text follows
866	            114 server information follows
867	          * 150 n definitions retrieved - definitions follow
868	          * 151 word database name - text follows
869	          * 152 n matches found - text follows
870	            210 (optional timing and statistical information here)
871	          * 220 text msg-id
872	            221 Closing Connection
873	            230 Authentication successful
874	            250 ok (optional timing information here)
875	            420 Server temporarily unavailable
876	            421 Server shutting down at operator request
877	            500 Syntax error, command not recognized
878	            501 Syntax error, illegal parameters
879	            502 Command not implemented
880	            503 Command parameter not implemented
881	            530 Access denied
882	            531 Access denied, use "SHOW INFO" for server information
883	            550 Invalid database, use "SHOW DB" for list of databases
884	            551 Invalid strategy, use "SHOW STRAT" for a list of strategies
885	            552 No match
886	            554 No databases present
887	            555 No strategies available

889	8.  Sample Conversations

891	     These are samples of the conversations that might be expected with
892	     a typical DICT server.  The notation "C:" indicates commands sent by
893	     the client, and "S:" indicates responses sent by the server.  Blank
894	     lines are included for clarity and do not indicate actual newlines
895	     in the transaction.

897	8.1.  Sample 1 - opening connection, HELP, DEFINE, and QUIT commands
898	     C: [ client initiates connection ]

900	     S: 220 dict.org dictd (version 0.9) <27831.860032493@dict.org>

902	     C: HELP

904	     S: 113 Help text follows
905	     S: DEFINE database word            look up word in database
906	     S: MATCH database strategy word    match word in database using strategy
907	     S: [ more server-dependent help text ]
908	     S: .
909	     S: 250 Command complete

911	     C: DEFINE ! penguin

913	     S: 150 1 definitions found: list follows
914	     S: 151 "penguin" wn "WordNet 1.5" : definition text follows
915	     S: penguin
916	     S:   1. n: short-legged flightless birds of cold southern esp. Antarctic
917	     S:      regions having webbed feet and wings modified as flippers
918	     S: .
919	     S: 250 Command complete

921	     C: DEFINE * shortcake

923	     S: 150 2 definitions found: list follows
924	     S: 151 "shortcake" wn "WordNet 1.5" : text follows
925	     S: shortcake
926	     S:   1. n: very short biscuit spread with sweetened fruit and usu.
927	     S:      whipped cream
928	     S: .
929	     S: 151 "Shortcake" web1913 "Webster's Dictionary (1913)" : text follows
930	     S: Shortcake
931	     S:    \Short"cake`\, n.
932	     S:    An unsweetened breakfast cake shortened with butter or lard,
933	     S:    rolled thin, and baked.
934	     S: .
935	     S: 250 Command complete
936	     C: DEFINE * abcdefgh

938	     S: 552 No match

940	     C: quit

942	     S: 221 Closing connection

944	8.2.  Sample 2 - SHOW commands, MATCH command

946	     C: SHOW DB

948	     S: 110 3 databases present: list follows
949	     S: wn "WordNet 1.5"
950	     S: foldoc "Free On-Line Dictionary of Computing"
951	     S: jargon "Hacker Jargon File"
952	     S: .
953	     S: 250 Command complete

955	     C: SHOW STRAT

957	     S: 111 5 strategies present: list follows
958	     S: exact "Match words exactly"
959	     S: prefix "Match word prefixes"
960	     S: substring "Match substrings anywhere in word"
961	     S: regex "Match using regular expressions"
962	     S: reverse "Match words given definition keywords"
963	     S: .
964	     S: 250 Command complete
965	     C: MATCH foldoc regex "s.si"

967	     S: 152 7 matches found: list follows
968	     S: foldoc Fast SCSI
969	     S: foldoc SCSI
970	     S: foldoc SCSI-1
971	     S: foldoc SCSI-2
972	     S: foldoc SCSI-3
973	     S: foldoc Ultra-SCSI
974	     S: foldoc Wide SCSI
975	     S: .
976	     S: 250 Command complete

978	     C: MATCH wn substring "abcdefgh"

980	     S: 552 No match

982	8.3.  Sample 3 - Server downtime

984	     C: [ client initiates connection ]

986	     S: 420 Server temporarily unavailable

988	     C: [ client initiates connection ]

990	     S: 421 Server shutting down at operator request

992	8.4.  Sample 4 - Authentication

994	     C: [ client initiates connection ]

996	     S: 220 dict.org dictd (version 0.9) <27831.860032493@dict.org>
997	     C: SHOW DB

999	     S: 110 1 database present: list follows
1000	     S: free "Free database"
1001	     S: .
1002	     S: 250 Command complete

1004	     C: AUTH joesmith authentication-string

1006	     S: 230 Authentication successful

1008	     C: SHOW DB

1010	     S: 110 2 databases present: list follows
1011	     S: free "Free database"
1012	     S: licensed "Local licensed database"
1013	     S: .
1014	     S: 250 Command complete

1016	9.  Security Considerations

1018	     This RFC raises no security issues.

1020	10.  References

1022	     [ASCII] US-ASCII. Coded Character Set - 7-Bit American Standard
1023	          Code for Information Interchange. Standard ANSI X3.4-1986,
1024	          ANSI, 1986.

1026	     [FOLDOC] Howe, Denis, ed.  The Free On-Line Dictionary of Comput-
1027	          ing, <URL:http://wombat.doc.ic.ac.uk/>

1029	     [ISO] ISO-8859. International Standard -- Information Processing --
1030	          8-bit Single-Byte Coded Graphic Character Sets -- Part 1:
1031	          Latin alphabet No. 1, ISO 8859-1:1987.

1033	     [JARGON] The on-line hacker Jargon File, version 4.0.0, 25 JUL
1034	          1996, <URL:http://www.ccil.org/jargon/>

1036	     [KNUTH73] Knuth, Donald E. "The Art of Computer Programming", Vol-
1037	          ume 3: Sorting and Searching (Addison-Wesley Publishing Co.,
1038	          1973, pages 391 and 392). Knuth notes that the soundex method
1039	          was originally described by Margaret K. Odell and Robert C.
1040	          Russell [US Patents 1261167 (1918) and 1435663 (1922)].

1042	     [PZ85] Pollock, Joseph J. and Zamora, Antonio, "Automatic spelling
1043	          correction in scientific and scholarly text," CACM, 27(4):
1044	          Apr. 1985, 358-368.

1046	     [RFC640] Postel, J., "Revised FTP Reply Codes", RFC-640, June,
1047	          1975.

1049	     [RFC821] Postel, J., "Simple Mail Transfer Protocol", RFC-821,
1050	          USC/Information Sciences Institute, August, 1982.

1052	     [RFC822] Crocker, D., "Standard for the Format of ARPA Internet
1053	          Text Messages", RFC-822, Department of Electrical Engineering,
1054	          University of Delaware, August, 1982.

1056	     [RFC977] Kantor, B., Lapsley, P., "Network News Transfer Protocol:
1057	          A Proposed Standard for the Stream-Based Transmission of
1058	          News", RFC-977, U.C. San Diego, U.C. Berkeley, February, 1986.

1060	     [RFC1738] Berners-Lee, T., Masinter, L., and M. McCahill, "Uniform
1061	          Resource Locators (URL)", RFC-1738, CERN, Xerox PARC, Univer-
1062	          sity of Minnesota, December 1994.

1064	     [RFC1854] Freed, N., and Cargille, A., "SMTP Service Extension for
1065	          Command Pipelining", RFC-1854, Innosoft International, Inc.,
1066	          and Network Working Group, October 1995.

1068	     [RFC1939] Myers, J., Rose, M., "Post Office Protocol - Version 3",
1069	          RFC-1939, Carnegie Mellon/Dover Beach Consulting, May, 1996.

1071	     [RFC2068] Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Bern-
1072	          ers-Lee, T., "Hypertext Transfer Protocol -- HTTP/1.1",
1073	          RFC-2068, U.C. Irvine, DEC, MIT/LCS, January, 1997.

1075	     [WEB1913] Webster's Revised Unabridged Dictionary (G & C. Merriam
1076	          Co., 1913, edited by Noah Porter).  Online version prepared by
1077	          MICRA, Inc., Plainfield, N.J. and edited by Patrick Cassidy
1078	          <cassidy@micra.com>.  For further information, see
1079	          <URL:ftp://uiarchive.cso.uiuc.edu/pub/etext/gutenberg/etext96/pgw*>,
1080	          and
1081	          <URL:http://humanities.uchicago.edu/forms_unrest/webster.form.html>

1083	     [WORDNET] Miller, G.A. (1990), ed. WordNet: An On-Line Lexical
1084	          Database. International Journal of Lexicography. Volume 3,
1085	          Number 4.  <URL:http://www.cogsci.princeton.edu/~wn/>

1087	11.  Acknowledgements

1089	     Thanks to Arnt Gulbrandsen and Nicolai Langfeldt for many helpful
1090	     discussions.  Thanks to Bennet Yee, Doug Hoffman, Kevin Martin, and
1091	     Jay Kominek for extensive testing and feedback on the initial
1092	     implementations of the DICT server.  Thanks to Zhong Shao for
1093	     advice and support.

1095	     Thanks to Brian Kantor, Phil Lapsley, and Jon Postel for writing
1096	     exemplary RFCs which were consulted during the preparation of this
1097	     document.

1099	12.  Author's Addresses

1101	          Rickard E. Faith
1102	          EMail: faith@cs.unc.edu (or faith@acm.org)

1104	          Bret Martin
1105	          EMail: bamartin@miranda.org

1107	          The majority of this work was completed while Bret Martin was
1108	          a student at Yale University.