1	INTERNET-DRAFT                                           D.R.T. Robinson
2	<draft-robinson-www-interface-01.txt>            University of Cambridge
3	Expires 15 August 1996                                  15 February 1996

5	              The WWW Common Gateway Interface Version 1.1

7	Status of this memo

9	   This document is an Internet-Draft. Internet-Drafts are working
10	   documents of the Internet Engineering Task Force (IETF), its areas
11	   and its working groups. Note that other groups may also distribute
12	   working documents as Internet-Drafts.

14	   Internet-Drafts are draft documents valid for a maximum of six months
15	   and may be updated, replaced or obsoleted by other documents at any
16	   time. It is inappropriate to use Internet-Drafts as reference
17	   material or to cite them other than as `work in progress'.

19	   To learn the current status of any Internet-Draft, please check the
20	   `1id-abstracts.txt' listing contained in the Internet-Drafts Shadow
21	   Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
22	   munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
23	   ftp.isi.edu (US West Coast).

25	   Distribution of this document is unlimited. Please send comments to
26	   the author; general discussion about CGI should take place on the
27	   <www-talk@w3.org> mailing list.

29	Abstract

31	   The Common Gateway Interface (CGI) is a simple interface for running
32	   external programs, software or gateways under an information server
33	   in a platform-independent manner. Currently, the supported
34	   information servers are HTTP servers.

36	   The interface has been in use by the World-Wide Web since 1993. This
37	   specification defines the interface known as `CGI/1.1', and its use
38	   on the Unix(R) and AmigaDOS(tm) systems.

40	1. Introduction

42	1.1. Purpose

44	   Together the HTTP [3] server and the CGI script are responsible for
45	   servicing a client request by sending back responses. The client
46	   request comprises a Universal Resource Identifier (URI) [1], a
47	   request method and various ancillary information about the request
48	   provided by the transport mechanism.

50	   The CGI defines the abstract parameters, known as environment
51	   variables, which describe the client's request. Together with a
52	   concrete programmer interface this specifies a platform-independent
53	   interface between the script and the HTTP server.

55	1.2. Requirements

57	   This specification uses the same words as RFC 1123 [5] to define the
58	   significance of each particular requirement. These are:

60	   must

62	      This word or the adjective `required' means that the item is an
63	      absolute requirement of the specification.

65	   should

67	      This word or the adjective `recommended' means that there may
68	      exist valid reasons in particular circumstances to ignore this
69	      item, but the full implications should be understood and the case
70	      carefully weighed before choosing a different course.

72	   may

74	      This word or the adjective `optional' means that this item is
75	      truly optional. One vendor may choose to include the item because
76	      a particular marketplace requires it or because it enhances the
77	      product, for example; another vendor may omit the same item.

79	   An implementation is not compliant if it fails to satisfy one or more
80	   of the `must' requirements for the protocols it implements. An
81	   implementation that satisfies all of the `must' and all of the
82	   `should' requirements for its features is said to be `unconditionally
83	   compliant'; one that satisfies all of the `must' requirements but not
84	   all of the `should' requirements for its features is said to be
85	   `conditionally compliant'.

87	1.3. Specifications

89	   Not all of the functions and features of the CGI are defined in the
90	   main part of this specification. The following phrases are used to
91	   describe the features which are not specified:

93	   system defined

95	      The feature may differ between systems, but must be the same for
96	      different implementations using the same system. A system will
97	      usually identify a class of operating-systems. Some systems are
98	      defined in section 12 of this document. New systems may be defined
99	      by new specifications without revision of this document.

101	   implementation defined

103	      The behaviour of the feature may vary from implementation to
104	      implementation, but a particular implementation must document its
105	      behaviour.

107	1.4. Terminology

109	   This specification uses many terms defined in the HTTP/1.0
110	   specification [3]; however, the following terms are used here in a
111	   sense which may not accord with their definitions in that document,
112	   or with their common meaning.

114	   environment variable

116	      A named parameter that carries information from the server to the
117	      script. It is not necessarily a variable in the operating-system's
118	      environment, although that is the most common implementation.

120	   script

122	      The software which is invoked by the server via this interface. It
123	      need not be a standalone program, but could be a
124	      dynamically-loaded or shared library, or even a subroutine in the
125	      server.

127	   server

129	      The application program which invokes the script in order to
130	      service requests.

132	2. Notational Conventions and Generic Grammar

134	2.1. Augmented BNF

136	   All of the mechanisms specified in this document are described in
137	   both prose and an augmented Backus-Naur Form (BNF) similar to that
138	   used by RFC 822 [6]. This augmented BNF contains the following
139	   constructs:

141	   name = definition

143	      The name of a rule is simply the name itself; it is separated from
144	      the definition by the equal character ("="). Whitespace is only
145	      significant in that continuation lines of a definition are
146	      indented.

148	   "literal"

150	      Quotation marks (") surround literal text, except for a literal
151	      quotation mark, which is surrounded by angle-brackets ("<" and
152	      ">").  Unless stated otherwise, the text is case-sensitive.

154	   rule1 | rule2

156	      Alternative rules are separated by a vertical bar ("|").

158	   (rule1 rule2 rule3)

160	      Elements enclosed in parentheses are treated as a single element.

162	   *rule

164	      A rule preceded by an asterisk ("*") may have zero or more
165	      occurrences. A rule preceded by an integer followed by an asterisk
166	      must occur at least the specified number of times.

168	   [rule]

170	      An element enclosed in square brackets ("[" and "]") is optional.

172	2.2. Basic Rules

174	   The following rules are used throughout this specification to
175	   describe basic parsing constructs.

177	      alpha         = lowalpha | hialpha
178	      lowalpha      = "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h"
179	                    | "i" | "j" | "k" | "l" | "m" | "n" | "o" | "p"
180	                    | "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x"
181	                    | "y" | "z"
182	      hialpha       = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H"
183	                    | "I" | "J" | "K" | "L" | "M" | "N" | "O" | "P"
184	                    | "Q" | "R" | "S" | "T" | "U" | "V" | "W" | "X"
185	                    | "Y" | "Z"
186	      digit         = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7"
187	                    | "8" | "9"
188	      OCTET         = <any 8-bit byte>
189	      CHAR          = <any character>
190	      CTL           = <any control character>
191	      SP            = <space character>
192	      NL            = <newline>
193	      LWSP          = SP | NL | <horizontal-tab>
194	      tspecial      = "(" | ")" | "@" | "," | ";" | ":" | "\" | <">
195	                    | "/" | "[" | "]" | "?" | SP
196	      token         = 1*<any CHAR except CTLs or tspecials>
197	      quoted-string = ( <"> *qdtext <"> ) | ( "<" *qatext ">")
198	      qdtext        = <any CHAR except <"> and CTLs but including LWSP>
199	      qatext        = <any CHAR except "<", ">" and CTLs but
200	                      including LWSP>

202	   Note that newline (NL) need not be a single character, but can be a
203	   character sequence.

205	3. URL Encoding

207	   Some variables and constructs used here are described as being
208	   `URL-encoded'. This encoding is described in section 2.2 of RFC 1738
209	   [4]. In a URL encoded string an escape sequence consists of a percent
210	   character ("%") followed by two hexadecimal digits, where the two
211	   hexadecimal digits form an octet. An escape sequence represents the
212	   graphic character which has the octet as its code within the US-ASCII
213	   [11] coded character set, if it exists. If no such graphic character
214	   exists, then the escape sequence represents the octet value itself.

216	   Note that some unsafe characters may have different semantics if they
217	   are encoded. The definition of which characters are unsafe depends on
218	   the context.
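
   The decoding rule above can be sketched as follows. This is only an
   illustration, not part of the interface: the function name
   `url_decode' is hypothetical, the input is assumed to be US-ASCII,
   and the result is returned as raw octets so that escapes with no
   graphic character are preserved. A "+" is deliberately left
   untouched, since the rule above covers "%" escapes only.

```python
import re

# One escape sequence: "%" followed by two hexadecimal digits.
_ESCAPE = re.compile(rb"%([0-9A-Fa-f]{2})")

def url_decode(encoded: str) -> bytes:
    """Replace each %XX escape with the octet it names.

    Characters that are not part of an escape sequence pass through
    unchanged, including a stray "%" that is not followed by two
    hexadecimal digits.
    """
    raw = encoded.encode("ascii")
    return _ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), raw)
```

   For example, url_decode("a%20b%2Fc") yields the octets of "a b/c".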

220	4. The Script URI

222	   A `Script URI' can be defined; this describes the resource identified
223	   by the environment variables. Often, this URI will be the same as the
224	   URI requested by the client (the `Client URI'); however, it need not
225	   be. Instead, it could be a URI invented by the server, and so it can
226	   only be used in the context of the server and its CGI interface.

228	   The script URI has the syntax of generic-RL as defined in section 2.1
229	   of RFC 1808 [7], with the exception that object parameters and
230	   fragment identifiers are not permitted:

232	      <scheme>://<host>:<port>/<path>?<query>

234	   The various components of the script URI are defined by some of the
235	   environment variables (see below):

237	      script-uri = protocol "://" SERVER_NAME ":" SERVER_PORT enc-script
238	                   enc-path-info "?" QUERY_STRING

240	   where `protocol' is found from SERVER_PROTOCOL, `enc-script' is a
241	   URL-encoded version of SCRIPT_NAME and `enc-path-info' is a
242	   URL-encoded version of PATH_INFO.
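
   As a sketch of how the rule above composes, the following
   reassembles a script URI from a mapping of environment variables.
   The function name `script_uri' and the use of a Python dict are
   assumptions made for illustration. Lower-casing `protocol' to form
   the scheme is also an assumption, and the "?" is always emitted
   because the rule above always includes it.

```python
from urllib.parse import quote

def script_uri(env: dict) -> str:
    """Build the script URI from CGI environment variables."""
    # `protocol' is the part of SERVER_PROTOCOL before the "/".
    # Assumption: URL schemes are conventionally written in lower case.
    protocol = env["SERVER_PROTOCOL"].split("/", 1)[0].lower()
    # URL-encode the two path components, keeping "/" as a separator.
    enc_script = quote(env.get("SCRIPT_NAME", ""), safe="/")
    enc_path_info = quote(env.get("PATH_INFO", ""), safe="/")
    # QUERY_STRING is already URL-encoded, so it is appended as-is.
    query = env.get("QUERY_STRING", "")
    return (
        f"{protocol}://{env['SERVER_NAME']}:{env['SERVER_PORT']}"
        f"{enc_script}{enc_path_info}?{query}"
    )
```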

244	5. Environment variables

246	   Environment variables are used to pass data about the request from
247	   the server to the script. They are accessed by the script in a system
248	   defined manner. In all cases, a missing environment variable is
249	   equivalent to a zero-length (NULL) value, and vice versa. The
250	   representation of the characters in the environment variables is
251	   system defined.

253	   Case is not significant in the names, in that there cannot be two
254	   different variables whose names differ in case only. Here they are
255	   shown using a canonical representation of capitals plus underscore
256	   ("_"). The actual representation of the names is system defined; for
257	   a particular system the representation may be defined differently to
258	   this.

260	   The variables are:

262	      AUTH_TYPE
263	      CONTENT_LENGTH
264	      CONTENT_TYPE
265	      GATEWAY_INTERFACE
266	      HTTP_*
267	      PATH_INFO
268	      PATH_TRANSLATED
269	      QUERY_STRING
270	      REMOTE_ADDR
271	      REMOTE_HOST
272	      REMOTE_IDENT
273	      REMOTE_USER
274	      REQUEST_METHOD
275	      SCRIPT_NAME
276	      SERVER_NAME
277	      SERVER_PORT
278	      SERVER_PROTOCOL
279	      SERVER_SOFTWARE

281	   AUTH_TYPE

283	      This variable is specific to requests made with HTTP.

285	      If the script URI would require access authentication for external
286	      access, then this variable is found from the `auth-scheme' token
287	      in the request, otherwise NULL.

289	         AUTH_TYPE   = "" | auth-scheme
290	         auth-scheme = "Basic" | token

292	      HTTP access authentication schemes are described in section 11 of
293	      the HTTP/1.0 specification [3]. The auth-scheme is not
294	      case-sensitive.

296	   CONTENT_LENGTH

298	      The size of the entity attached to the request, if any, in decimal
299	      number of octets. If no data is attached, then NULL. The syntax is
300	      the same as the HTTP Content-Length header (section 10, HTTP/1.0
301	      specification [3]).

303	         CONTENT_LENGTH = "" | 1*digit

305	   CONTENT_TYPE

307	      The Internet Media Type [9] of the attached entity. The syntax is
308	      the same as the HTTP Content-Type header.

310	         CONTENT_TYPE = "" | media-type
311	         media-type   = type "/" subtype *( ";" parameter)
312	         type         = token
313	         subtype      = token
314	         parameter    = attribute "=" value
315	         attribute    = token
316	         value        = token | quoted-string

318	      The type, subtype and parameter attribute names are not
319	      case-sensitive. Parameter values may be case sensitive.  Media
320	      types and their use in HTTP are described in section 3.6 of the
321	      HTTP/1.0 specification [3]. Example:

323	         application/x-www-form-urlencoded

325	      There is no default value for this variable. If and only if it is
326	      unset, then the script may attempt to determine the media type
327	      from the data received. If the type remains unknown, then
328	      application/octet-stream should be assumed.

330	   GATEWAY_INTERFACE

332	      The version of the CGI specification to which this server
333	      complies.  Syntax:

335	         GATEWAY_INTERFACE =  "CGI" "/" 1*digit "." 1*digit

337	      Note that the major and minor numbers are treated as separate
338	      integers and that each may be incremented higher than a single
339	      digit.  Thus CGI/2.4 is a lower version than CGI/2.13 which in
340	      turn is lower than CGI/12.3. Leading zeros must be ignored by
341	      scripts and should never be generated by servers.

343	      This document defines the 1.1 version of the CGI interface.
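
      The comparison rule above can be sketched by parsing the two
      numbers as separate integers and comparing them as a pair. The
      function name `parse_gateway_interface' is hypothetical, and a
      script is assumed to reject values that do not match the syntax.

```python
import re
from typing import Tuple

_GATEWAY_INTERFACE = re.compile(r"CGI/([0-9]+)\.([0-9]+)")

def parse_gateway_interface(value: str) -> Tuple[int, int]:
    """Return (major, minor) as integers.

    int() discards leading zeros, so "CGI/01.01" and "CGI/1.1"
    parse to the same pair.
    """
    match = _GATEWAY_INTERFACE.fullmatch(value)
    if match is None:
        raise ValueError(f"not a CGI version string: {value!r}")
    return int(match.group(1)), int(match.group(2))
```

      Tuples compare element by element, so the pair for CGI/2.4
      orders below the pair for CGI/2.13, which orders below the pair
      for CGI/12.3.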

345	   HTTP_*

347	      These variables are specific to requests made with HTTP.
348	      Interpretation of these variables may depend on the value of
349	      SERVER_PROTOCOL.

351	      Environment variables with names beginning with "HTTP_" contain
352	      header data read from the client, if the protocol used was HTTP.
353	      The HTTP header name is converted to upper case, has all
354	      occurrences of "-" replaced with "_" and has "HTTP_" prepended to
355	      give the environment variable name. The header data may be
356	      presented as sent by the client, or may be rewritten in ways which
357	      do not change its semantics. If multiple headers with the same
358	      field-name are received then they must be rewritten as a single
359	      header having the same semantics. Similarly, a header that is
360	      received on more than one line must be merged onto a single line.
361	      The server must, if necessary, change the representation of the
362	      data (for example, the character set) to be appropriate for a CGI
363	      environment variable.

365	      The server is not required to create environment variables for all
366	      the headers that it receives. In particular, it may remove any
367	      headers carrying authentication information, such as
368	      "Authorization"; it may remove headers whose value is available to
369	      the script via other variables, such as "Content-Length" and
370	      "Content-Type".
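
      A minimal sketch of the name conversion and merging described
      above, as a server might perform it. The function names and the
      list-of-pairs input are hypothetical. Joining repeated fields
      with ", " is an assumption that holds only for headers whose
      values are comma-separated lists, and withholding the three
      headers named in `_WITHHELD' is one permitted choice, not a
      requirement.

```python
import re

# Headers this sketch chooses not to pass on to the script.
_WITHHELD = {"authorization", "content-length", "content-type"}

def header_to_variable(field_name):
    """Upper-case the name, replace "-" with "_" and prepend "HTTP_"."""
    return "HTTP_" + field_name.upper().replace("-", "_")

def headers_to_environment(headers):
    """Build HTTP_* variables from (field-name, field-value) pairs."""
    env = {}
    for name, value in headers:
        if name.lower() in _WITHHELD:
            continue
        # Merge a header received on several lines onto a single line.
        value = re.sub(r"\r?\n[ \t]+", " ", value).strip()
        key = header_to_variable(name)
        # Assumption: repeated fields combine as a comma-separated list.
        env[key] = f"{env[key]}, {value}" if key in env else value
    return env
```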

372	   PATH_INFO

374	      A path to be interpreted by the CGI script. It identifies the
375	      resource or sub-resource to be returned by the CGI script. The
376	      syntax and semantics are similar to a decoded HTTP URL `hpath'
377	      token (defined in RFC 1738 [4]), with the exception that a
378	      PATH_INFO of "/" represents a single void path segment. Otherwise,
379	      the leading "/" character is not part of the path.

381	         PATH_INFO = "" | "/" path
382	         path      = segment *( "/" segment )
383	         segment   = *pchar
384	         pchar     = <any CHAR except "/">

386	      The PATH_INFO string is the trailing part of the <path> component
387	      of the script URI that follows the SCRIPT_NAME part of the path.

389	   PATH_TRANSLATED

391	      The OS path to the file that the server would attempt to access
392	      were the client to request the absolute URL containing the path
393	      PATH_INFO, i.e. for a request of

395	         protocol "://" SERVER_NAME ":" SERVER_PORT enc-path-info

397	      where `enc-path-info' is a URL-encoded version of PATH_INFO. If
398	      PATH_INFO is NULL then PATH_TRANSLATED is set to NULL.

400	         PATH_TRANSLATED = *CHAR

402	      PATH_TRANSLATED need not be supported by the server. The server
403	      may choose to set PATH_TRANSLATED to NULL for reasons of security;
404	      because the path would not be interpretable by a CGI script, for
405	      example when the object it represents is internal to the server
406	      and not visible in the file-system; or for any other reason.

408	      The algorithm the server uses to derive PATH_TRANSLATED is
409	      obviously implementation defined; CGI scripts which use this
410	      variable may suffer limited portability.

412	   QUERY_STRING

414	      A URL-encoded search string; the <query> part of the script URI.

416	         QUERY_STRING = query-string
417	         query-string = *qchar
418	         qchar        = unreserved | escape | reserved
419	         unreserved   = alpha | digit | safe | extra
420	         reserved     = ";" | "/" | "?" | ":" | "@" | "&" | "="
421	         safe         = "$" | "-" | "_" | "." | "+"
422	         extra        = "!" | "*" | "'" | "(" | ")" | ","
423	         escape       = "%" hex hex
424	         hex          = digit | "A" | "B" | "C" | "D" | "E" | "F" | "a"
425	                      | "b" | "c" | "d" | "e" | "f"

427	      The URL syntax for a search string is described in RFC 1738 [4].

429	   REMOTE_ADDR

431	      The IP address of the agent sending the request to the server. Not
432	      necessarily that of the client.

434	         REMOTE_ADDR = hostnumber
435	         hostnumber  = digits "." digits "." digits "." digits
436	         digits      = 1*digit

438	   REMOTE_HOST

440	      The fully qualified domain name of the agent sending the request
441	      to the server, if available, otherwise NULL. Not necessarily that
442	      of the client. Fully qualified domain names take the form as
443	      described in section 3.5 of RFC 1034 [8] and section 2.1 of RFC
444	      1123 [5]; a sequence of domain labels separated by ".", each
445	      domain label starting and ending with an alphanumerical character
446	      and possibly also containing "-" characters. The rightmost domain
447	      label will never start with a digit. Domain names are not case
448	      sensitive.

450	         REMOTE_HOST   = "" | hostname
451	         hostname      = *( domainlabel ".") toplabel
452	         domainlabel   = alphadigit [ *alphahypdigit alphadigit ]
453	         toplabel      = alpha [ *alphahypdigit alphadigit ]
454	         alphahypdigit = alphadigit | "-"
455	         alphadigit    = alpha | digit
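
      The grammar above translates fairly directly into a regular
      expression. The sketch below is one such translation; the names
      are hypothetical, and it checks syntax only (it does not resolve
      the name).

```python
import re

# domainlabel = alphadigit [ *alphahypdigit alphadigit ]
_DOMAINLABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?"
# toplabel = alpha [ *alphahypdigit alphadigit ]
_TOPLABEL = r"[A-Za-z](?:[A-Za-z0-9-]*[A-Za-z0-9])?"
# hostname = *( domainlabel "." ) toplabel
_HOSTNAME = re.compile(rf"(?:{_DOMAINLABEL}\.)*{_TOPLABEL}")

def is_remote_host(value: str) -> bool:
    """True if value is "" or matches the hostname rule."""
    return value == "" or _HOSTNAME.fullmatch(value) is not None
```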

457	   REMOTE_IDENT

459	      The identity information reported about the connection by an RFC
460	      931 [10] request to the remote agent, if available. The server may
461	      choose not to support this feature, or not to request the data for
462	      efficiency reasons.

464	         REMOTE_IDENT = *CHAR

466	      The data returned is not appropriate for use as authentication
467	      information.

469	   REMOTE_USER

471	      This variable is specific to requests made with HTTP.

473	      If AUTH_TYPE is "Basic", then the user-ID sent by the client. If
474	      AUTH_TYPE is NULL, then NULL, otherwise undefined.

476	         REMOTE_USER = "" | userid | *OCTET
477	         userid      = token

479	   REQUEST_METHOD

481	      This variable is specific to requests made with HTTP.

483	      The method with which the request was made, as described in
484	      section 5.1.1 of the HTTP/1.0 specification [3].

486	         REQUEST_METHOD   = http-method
487	         http-method      = "GET" | "HEAD" | "POST" | extension-method
488	         extension-method = token

490	      The method is case sensitive.

492	   SCRIPT_NAME

494	      A URL path that could identify the CGI script (rather than the
495	      particular CGI output). The syntax and semantics are identical to
496	      a decoded HTTP URL `hpath' token [4].

498	         SCRIPT_NAME = "" | "/" [ path ]

500	      The leading "/" is not part of the path. It is optional if the
501	      path is NULL.

503	      The SCRIPT_NAME string is some leading part of the <path>
504	      component of the script URI derived in some implementation defined
505	      manner.

507	   SERVER_NAME

509	      The name for this server, as used in the <host> part of the script
510	      URI. Thus either a fully qualified domain name, or an IP address.

512	         SERVER_NAME = hostname | hostnumber

514	   SERVER_PORT

516	      The port on which this request was received, as used in the <port>
517	      part of the script URI.

519	         SERVER_PORT = 1*digit

521	   SERVER_PROTOCOL

523	      The name and revision of the information protocol this request
524	      came in with.

526	         SERVER_PROTOCOL   = HTTP-Version | extension-version
527	         HTTP-Version      = "HTTP" "/" 1*digit "." 1*digit
528	         extension-version = protocol "/" 1*digit "." 1*digit
529	         protocol          = 1*( alpha | digit | "+" | "-" | "." )

531	      `protocol' is a version of the <scheme> part of the script URI,
532	      and is not case sensitive. By convention, `protocol' is in upper
533	      case.

535	   SERVER_SOFTWARE

537	      The name and version of the information server software answering
538	      the request (and running the gateway).

540	         SERVER_SOFTWARE = *CHAR

542	6. Invoking the script

544	   The script is invoked in a system defined manner. Unless specified
545	   otherwise, this will be by treating the file containing the script as
546	   an executable, and running it as a child process of the server.

548	7. The CGI script command line

550	   Some systems support a method for supplying an array of strings to the
551	   CGI script. This is only used in the case of an `indexed' query. This
552	   is identified by a "GET" or "HEAD" HTTP request with a URL search
553	   string not containing any unencoded "=" characters. For such a
554	   request, the server should parse the search string into words, using
555	   the rule:

557	      search-string = search-word *( "+" search-word )
558	      search-word   = 1*schar
559	      schar         = xunreserved | escape | xreserved
560	      xunreserved   = alpha | digit | xsafe | extra
561	      xsafe         = "$" | "-" | "_" | "."
562	      xreserved     = ";" | "/" | "?" | ":" | "@" | "&"

564	   After parsing, each word is URL-decoded, optionally encoded in a
565	   system defined manner and then the argument list is set to the list
566	   of words.

568	   If the server cannot create any part of the argument list, then the
569	   server should generate no command line information. For example, the
570	   number of arguments may be greater than operating system or server
571	   limitations, or one of the words may not be representable as an
572	   argument.
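
   The parsing steps of this section can be sketched as below. The
   function name `command_line_words' is hypothetical. Decoding the
   words as UTF-8 stands in for the system defined encoding step and
   is an assumption, as is returning an empty list whenever the search
   string does not match the rule.

```python
import re
from urllib.parse import unquote

# search-word = 1*schar, with schar as defined in the rule above.
_SEARCH_WORD = re.compile(r"(?:[A-Za-z0-9$\-_.!*'(),;/?:@&]|%[0-9A-Fa-f]{2})+")

def command_line_words(request_method: str, query_string: str) -> list:
    """Return the argument list for an indexed query, else []."""
    if request_method not in ("GET", "HEAD"):
        return []
    # An unencoded "=" means this is not an indexed query.
    if not query_string or "=" in query_string:
        return []
    words = query_string.split("+")
    # If any word cannot be represented, generate no arguments at all.
    if not all(_SEARCH_WORD.fullmatch(word) for word in words):
        return []
    # URL-decode each word after parsing, never before.
    return [unquote(word) for word in words]
```

   Decoding after splitting matters: an encoded "%2B" or "%3D" inside
   a word then survives as a literal "+" or "=" in the argument.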

574	8. Data input to the CGI script

576	   As there may be a data entity attached to the request, there must be
577	   a system defined method for the script to read this data. Unless
578	   defined otherwise, this will be via the `standard input' file
579	   descriptor.

581	   There will be at least CONTENT_LENGTH bytes available for the script
582	   to read. The script is not obliged to read the data, but it must not
583	   attempt to read more than CONTENT_LENGTH bytes, even if more data is
584	   available.
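
   A script-side sketch of this rule follows. The function name
   `read_request_body' is hypothetical, and the script is assumed to
   run on a system where the data arrives on standard input. The
   stream parameter exists only so the example can be exercised
   without a server.

```python
import io
import os
import sys

def read_request_body(environ=None, stream=None) -> bytes:
    """Read at most CONTENT_LENGTH bytes of the attached entity."""
    if environ is None:
        environ = os.environ
    if stream is None:
        stream = sys.stdin.buffer
    # A missing variable is equivalent to a zero-length value.
    length_text = environ.get("CONTENT_LENGTH", "")
    if length_text == "":
        return b""
    if not (length_text.isascii() and length_text.isdigit()):
        raise ValueError(f"bad CONTENT_LENGTH: {length_text!r}")
    # Never ask for more than CONTENT_LENGTH, even if more is available.
    return stream.read(int(length_text))

# Usage with an in-memory stream standing in for standard input:
sample = read_request_body({"CONTENT_LENGTH": "5"}, io.BytesIO(b"hello, extra"))
```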

586	   For non-parsed header (NPH) scripts (see below), the server should
587	   attempt to ensure that the script input comes directly from the
588	   client, with minimal buffering. For all scripts the data will be as
589	   supplied by the client.

591	9. Data output from the CGI script

593	   There must be a system defined method for the script to send data
594	   back to the server or client; a script will always return some data.
595	   Unless defined otherwise, this will be via the `standard output' file
596	   descriptor.

598	   There are two forms of output that the script can give; non-parsed
599	   header (NPH) output, and parsed header output. A server is only
600	   required to support the latter; distinguishing between the two types
601	   of output (or scripts) is implementation defined.

603	9.1. Non-Parsed Header Output

605	   The script must return a complete HTTP response message, as described
606	   in section 6 of the HTTP/1.0 specification [3]. Note that this allows
607	   an HTTP/0.9 response to an HTTP/1.0 request.

609	   The server should attempt to ensure that the script output is sent
610	   directly to the client, with minimal buffering.

612	9.2. Parsed Header Output

614	   The script returns a CGI response message.

616	      CGI-Response = *( CGI-Header | HTTP-Header ) NL [ Entity-Body ]
617	      CGI-Header   = Content-Type
618	                   | Location
619	                   | Status
620	                   | extension-header

622	   The response comprises headers and a body, separated by a blank line.
623	   The headers are either CGI headers to be interpreted by the server,
624	   or HTTP headers to be included in the response returned to the client
625	   if the request protocol is HTTP. At least one CGI-Header must be
626	   supplied, but no CGI header can be repeated with the same field-name.

628	   If a body is supplied, then a Content-Type header is required,
629	   otherwise the script must send a Location or Status header. If a
630	   Location header is returned, then no HTTP-Headers may be supplied.
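
   From the script's side, a parsed header response that carries a
   body might be assembled as in the sketch below. The function name
   `build_cgi_response' is hypothetical, and a single line feed is
   assumed to be the system's NL.

```python
def build_cgi_response(body: bytes, content_type: str, status: str = "") -> bytes:
    """Assemble headers, a blank line and the entity body."""
    # A body is supplied, so a Content-Type header is required.
    lines = [f"Content-Type: {content_type}"]
    if status:
        lines.append(f"Status: {status}")
    # Each header ends with NL; one further NL separates the body.
    head = "".join(line + "\n" for line in lines) + "\n"
    return head.encode("ascii") + body
```

   A script would write the returned bytes to its standard output.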

632	   The CGI headers have the generic syntax:

634	      generic-header = field-name ":" [ field-value ] NL
635	      field-name     = 1*<any CHAR, excluding CTLs, SP and ":">
636	      field-value    = *( field-content | LWSP )
637	      field-content  = *( token | tspecial | quoted-string )

639	   The field-name is not case sensitive; a NULL field value is
640	   equivalent to the header not being sent.
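
   On the server side, splitting a parsed header response into its
   headers and body could look like the following sketch. The function
   name `parse_cgi_response' is hypothetical. NL is assumed to be
   either LF or CR LF, header text is assumed to be ISO-8859-1, and
   joining repeated HTTP headers with ", " is an assumption.

```python
# Field names of the CGI headers, which must not be repeated.
_CGI_HEADERS = {"content-type", "location", "status"}

def parse_cgi_response(output: bytes):
    """Return (headers, body); header names are lower-cased."""
    headers = {}
    rest = output
    while True:
        line, _, rest = rest.partition(b"\n")
        line = line.rstrip(b"\r")
        if line == b"":
            break  # blank line (or end of output) ends the headers
        name, colon, value = line.decode("latin-1").partition(":")
        if not colon:
            raise ValueError(f"malformed header line: {line!r}")
        key, value = name.lower(), value.strip()
        if value == "":
            continue  # a NULL field value means the header was not sent
        if key in headers:
            if key in _CGI_HEADERS:
                raise ValueError(f"repeated CGI header: {name}")
            value = headers[key] + ", " + value
        headers[key] = value
    return headers, rest
```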

642	   Content-Type

644	      The Internet Media Type [9] of the entity body, which is to be
645	      sent unmodified to the client.

647	         Content-Type = "Content-Type" ":" media-type NL

649	   Location

651	      This is used to specify to the server that the script is returning
652	      a reference to a document rather than an actual document.

654	         Location         = "Location" ":"
655	                            ( fragment-URI | rel-URL-abs-path ) NL
656	         fragment-URI     = URI [ "#" fragmentid ]
657	         URI              = scheme ":" *qchar
658	         fragmentid       = *qchar
659	         rel-URL-abs-path = "/" [ hpath ] [ "?" query-string ]
660	         hpath            = fpsegment *( "/" psegment )
661	         fpsegment        = 1*hchar
662	         psegment         = *hchar
663	         hchar            = alpha | digit | safe | extra
664	                          | ":" | "@" | "&" | "="

666	      The location value is either an absolute URI with optional
667	      fragment, as defined in RFC 1630 [1], or an absolute path and
668	      optional query-string. If an absolute URI is returned by the
669	      script, then the server will generate a redirect HTTP response
670	      message, and if no entity body is supplied by the script, then the
671	      server will produce one. If the Location value is a path, then the
672	      server will generate the response that it would have produced in
673	      response to a request containing the URL

675	         protocol "://" SERVER_NAME ":" SERVER_PORT rel-URL-abs-path

677	      The Location header may only be sent if the REQUEST_METHOD is HEAD
678	      or GET.
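
	      The two cases for a Location value might be distinguished as in
	      the following Python sketch. The function name resolve_location
	      and its parameters are hypothetical, and the test used to
	      recognise a scheme (letters, digits, "+", "-" and ".") is an
	      assumption rather than something defined in this section.

```python
def resolve_location(value, request_method, server_name, server_port,
                     protocol="http"):
    """Classify a Location value returned by a script.

    Returns ("redirect", uri) for an absolute URI, or ("internal", url)
    for an absolute path, where url is the URL whose response the
    server would produce itself.
    """
    if request_method not in ("GET", "HEAD"):
        raise ValueError("Location is only allowed for GET or HEAD")
    if value.startswith("/"):
        # rel-URL-abs-path: rebuild the URL from the server's own identity.
        url = "%s://%s:%s%s" % (protocol, server_name, server_port, value)
        return "internal", url
    scheme, colon, _ = value.partition(":")
    if colon and scheme and all(c.isalnum() or c in "+-." for c in scheme):
        # Absolute URI, optionally with a fragment: answer with a redirect.
        return "redirect", value
    raise ValueError("neither an absolute URI nor an absolute path")
```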

680	   Status

682	      The Status header is used to indicate to the server what status
683	      code the server should use in the response message. It should not
684	      be sent if the script returns a Location header.

686	         Status        = "Status" ":" 3digit SP reason-phrase NL
687	         reason-phrase = *<CHAR, excluding CTLs, NL>

689	      The valid status codes are listed in section 6.1.1 of the HTTP/1.0
690	      specification [3]. If the script does not return a Status header,
691	      then "200 OK" should be assumed.
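
	      A server could parse the Status value, and apply the default,
	      roughly as in the Python sketch below. The name parse_status is
	      hypothetical, and the sketch does not check the code against
	      the list of valid status codes.

```python
import re

# Three digits, one space, then a reason phrase free of control characters.
_STATUS_RE = re.compile(r"([0-9]{3}) ([^\x00-\x1f\x7f]*)")


def parse_status(value=None):
    """Return (code, reason-phrase) for a Status field value.

    When the script sent no Status header (value is None), "200 OK"
    is assumed.
    """
    if value is None:
        return 200, "OK"
    match = _STATUS_RE.fullmatch(value)
    if match is None:
        raise ValueError("malformed Status value: %r" % value)
    return int(match.group(1)), match.group(2)
```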

693	   HTTP headers

695	      The script may return any other headers defined by the HTTP/1.0
696	      specification [3]. The server must translate the header data from
697	      the CGI header syntax to the HTTP header syntax if these differ.
698	      For example, the character sequence for newline (such as Unix's
699	      ASCII LF) used by CGI scripts may not be the same as that used by
700	      HTTP (ASCII CR followed by LF). The server must also resolve any
701	      conflicts between headers returned by the script and headers that
702	      it would otherwise send itself.
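
	      The newline translation could look like the following Python
	      sketch, which writes already-parsed headers using the HTTP line
	      ending. The name to_http_header_block is hypothetical, and the
	      policy for resolving conflicts with the server's own headers is
	      deliberately left out, as this section does not prescribe one.

```python
def to_http_header_block(headers):
    """Serialise (field-name, value) pairs using HTTP's CR LF newline.

    The returned bytes end with the blank line that separates the
    headers from the entity body.
    """
    lines = "".join("%s: %s\r\n" % (name, value) for name, value in headers)
    return (lines + "\r\n").encode("ascii")
```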

704	10. Requirements for servers

706	   Servers must support the standard mechanism (described below) which
707	   allows the script author to determine what URL to use in documents
708	   which reference the script; specifically, what URL to use in order to
709	   achieve particular settings of the environment variables. This
710	   mechanism is as follows:

712	   The value for SCRIPT_NAME is governed by the server configuration and
713	   the location of the script in the OS file-system. Given this, any
714	   access to the partial URL

716	      SCRIPT_NAME extra-path ? query-information

718	   where extra-path is either NULL or begins with a "/" and satisfies
719	   any other server requirements, will cause the CGI script to be
720	   executed with PATH_INFO set to the decoded extra-path, and
721	   QUERY_STRING set to query-information (not decoded).

723	   Servers may reject with error 404 any requests that would result in
724	   an encoded "/" being decoded into PATH_INFO or SCRIPT_NAME, as this
725	   might represent a loss of information to the script.
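
	   The mapping described above might be sketched in Python as
	   follows. The function name split_script_url is hypothetical, the
	   SCRIPT_NAME is assumed to be known already from the server
	   configuration, and LookupError stands in for whatever the server
	   uses to produce a 404 response.

```python
from urllib.parse import unquote


def split_script_url(path_and_query, script_name):
    """Derive SCRIPT_NAME, PATH_INFO and QUERY_STRING from a partial URL."""
    path, _, query = path_and_query.partition("?")
    if not path.startswith(script_name):
        raise LookupError("URL does not refer to this script")
    extra = path[len(script_name):]
    if extra and not extra.startswith("/"):
        raise LookupError("extra-path must be NULL or begin with '/'")
    if "%2f" in extra.lower():
        # Decoding would turn an encoded "/" into a path separator.
        raise LookupError("encoded '/' in extra-path")
    return {
        "SCRIPT_NAME": script_name,
        "PATH_INFO": unquote(extra),   # decoded
        "QUERY_STRING": query,         # passed on without decoding
    }
```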

727	   Although the server and the CGI script need not be consistent in
728	   their handling of URL paths (client URLs and the PATH_INFO data,
729	   respectively), server authors may wish to impose consistency. A
730	   server implementation should therefore document its behaviour as
731	   follows:

733	      o define any restrictions on allowed characters, in particular
734	        whether ASCII NULL is permitted;

736	      o define any restrictions on allowed path segments, in particular
737	        whether non-terminal NULL segments are permitted;

739	      o define the behaviour for "." or ".." path segments; i.e. whether
740	        they are prohibited, treated as ordinary path segments or
741	        interpreted in accordance with the relative URL specification
742	        [7];

744	      o define any limits of the implementation, including limits on
745	        path or search string lengths, and limits on the volume of
746	        headers the server will parse.

748	   Servers may generate the script URI in any way from the client URI,
749	   or from any other data (but the behaviour should be documented).

751	11. Recommendations for scripts

753	   Scripts should reject unexpected methods (such as DELETE) with
754	   error 405 Method Not Allowed. If the script does not intend to
755	   process the PATH_INFO data, then it should reject the request with
756	   404 Not Found if PATH_INFO is not NULL.

758	   If the output of a form is being processed, check that CONTENT_TYPE
759	   is "application/x-www-form-urlencoded" [2].
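
	   These checks could be written in a script roughly as in the Python
	   sketch below. The function name check_request, its parameters and
	   the choice of 400 Bad Request for an unexpected CONTENT_TYPE are
	   assumptions; the text above only says that the type should be
	   checked.

```python
def check_request(environ, allowed_methods=("GET", "HEAD", "POST"),
                  uses_path_info=False):
    """Return None if the request is acceptable, else (code, reason)."""
    method = environ.get("REQUEST_METHOD", "")
    if method not in allowed_methods:
        return 405, "Method Not Allowed"
    if not uses_path_info and environ.get("PATH_INFO"):
        return 404, "Not Found"
    if method == "POST":
        # Ignore any media type parameters when comparing.
        ctype = environ.get("CONTENT_TYPE", "").split(";")[0].strip().lower()
        if ctype != "application/x-www-form-urlencoded":
            return 400, "Bad Request"  # assumed response for this case
    return None
```

	   A script would typically call this with os.environ and, on a
	   non-None result, emit a Status header carrying the returned code
	   and reason phrase.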

761	   If parsing PATH_INFO, PATH_TRANSLATED or SCRIPT_NAME then be careful
762	   of void path segments ("//") and special path segments ("." and
763	   ".."). They should either be removed from the path before use in OS
764	   system calls, or the request should be rejected with 404 Not Found.
765	   It is very unlikely that any other use could be made of these.
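
	   One way to detect such segments is sketched below in Python. The
	   function name suspect_segments is hypothetical, and the sketch
	   assumes that a single trailing "/" is acceptable and is not
	   counted as a void segment.

```python
def suspect_segments(path):
    """Return the void, "." and ".." segments found in path.

    path is expected to be empty or to begin with "/", as PATH_INFO does.
    """
    if path and not path.startswith("/"):
        raise ValueError("path must be empty or begin with '/'")
    segments = path.split("/")[1:]   # drop the empty string before the first "/"
    if segments and segments[-1] == "":
        segments = segments[:-1]     # a trailing "/" is tolerated here
    return [s for s in segments if s in ("", ".", "..")]
```

	   A script could reject the request with 404 Not Found whenever the
	   returned list is not empty.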

767	   As it is impossible for the script to determine the client URI that
768	   initiated this request without knowledge of the specific server in
769	   use, the script should not return text/html documents containing
770	   relative URL links without including a <BASE> tag in the document.

772	   When returning headers, the script should try to send the CGI headers
773	   as soon as possible, and preferably before any HTTP headers. This may
774	   help reduce the server's memory requirements.

776	12. System specifications

778	12.1. AmigaDOS

780	   Environment variables

782	      These are accessed by the DOS library routine GetVar. The flags
783	      argument should be 0. Case is ignored, but upper case is
784	      recommended for compatibility with case-sensitive systems.

786	   The current working directory

788	      The current working directory for the script is set to the
789	      directory containing the script.

791	   Character set

793	      The US-ASCII character set is used for the definition of
794	      environment variables and headers; the newline (NL) sequence is CR
795	      LF.

797	12.2. Unix

799	   For Unix compatible operating systems, the following are defined:

801	   Environment variables

803	      These are accessed by the C library routine getenv.

805	   The command line

807	      This is accessed using the argc and argv arguments to main().
808	      The words have any characters which are `active' in the Bourne
809	      shell escaped with a backslash.

811	   The current working directory

813	      The current working directory for the script is set to the
814	      directory containing the script.

816	   Character set

818	      The US-ASCII character set is used for the definition of
819	      environment variables and headers; the newline (NL) sequence is
820	      LF; servers should also accept CR LF as a newline.

822	13. Security Considerations

824	13.1. Safe Methods

826	   As discussed in the security considerations of the HTTP specification
827	   [3], the convention has been established that the GET and HEAD
828	   methods should be `safe'; they should cause no side-effects and only
829	   have the significance of resource retrieval.

831	13.2. HTTP headers containing sensitive information

833	   Some HTTP headers may carry sensitive information which the server
834	   should not pass on to the script unless explicitly configured to do
835	   so. For example, if the server protects the script using the Basic
836	   authentication scheme, then the client will send an Authorization
837	   header containing a username and password. If the server, rather than
838	   the script, validates this information then it should not pass on the
839	   password via the HTTP_AUTHORIZATION environment variable.
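
	   A server might withhold the header by default along the lines of
	   the following Python sketch. The function name
	   http_headers_to_environ and the pass_authorization switch are
	   hypothetical, and a repeated header simply overwrites the earlier
	   value here, which a real server would need to handle properly.

```python
def http_headers_to_environ(headers, pass_authorization=False):
    """Map request headers to HTTP_* environment variables.

    The Authorization header is left out unless the server has been
    explicitly configured (pass_authorization=True) to pass it on.
    """
    environ = {}
    for name, value in headers:
        key = "HTTP_" + name.upper().replace("-", "_")
        if key == "HTTP_AUTHORIZATION" and not pass_authorization:
            continue  # do not expose the client's credentials
        environ[key] = value
    return environ
```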

841	13.3. Script interference with the server

843	   The most common implementation of CGI invokes the script as a child
844	   process using the same user and group as the server process. Care
845	   should therefore be taken to ensure that the script cannot interfere
846	   with the server process, its configuration or documents.

848	   If the script is executed by calling a function linked in to the
849	   server software (either at compile-time or run-time) then precautions
850	   should be taken to protect the core memory of the server, or to
851	   ensure that untrusted code cannot be executed.

853	14. Acknowledgements

855	   This work is based on the original CGI interface that arose out of
856	   discussions on the www-talk mailing list. In particular, Rob McCool,
857	   John Franks, Ari Luotonen, George Phillips and Tony Sanders deserve
858	   special recognition for their efforts in defining and implementing
859	   the early versions of this interface.

861	   This document has also greatly benefited from the comments and
862	   suggestions made by Chris Adie, Dave Kristol and Mike Meyer.

864	15. References

866	   [1]  Berners-Lee, T., `Universal Resource Identifiers in WWW: A
867	        Unifying Syntax for the Expression of Names and Addresses of
868	        Objects on the Network as used in the World-Wide Web', RFC 1630,
869	        CERN, June 1994.

871	   [2]  Berners-Lee, T. and Connolly, D., `Hypertext Markup Language -
872	        2.0', RFC 1866, MIT/W3C, November 1995.

874	   [3]  Berners-Lee, T., Fielding, R. T. and Frystyk Nielsen, H.,
875	        `Hypertext Transfer Protocol -- HTTP/1.0', Work in progress
876	        (draft-ietf-http-v10-spec-04.txt), MIT/LCS, UC Irvine, October
877	        1995.

879	   [4]  Berners-Lee, T., Masinter, L. and McCahill, M., Editors,
880	        `Uniform Resource Locators (URL)', RFC 1738, CERN, Xerox
881	        Corporation, University of Minnesota, December 1994.

883	   [5]  Braden, R., Editor, `Requirements for Internet Hosts --
884	        Application and Support', STD 3, RFC 1123, IETF, October 1989.

886	   [6]  Crocker, D.H., `Standard for the Format of ARPA Internet Text
887	        Messages', STD 11, RFC 822, University of Delaware, August 1982.

889	   [7]  Fielding, R., `Relative Uniform Resource Locators', RFC 1808, UC
890	        Irvine, June 1995.

892	   [8]  Mockapetris, P., `Domain Names - Concepts and Facilities', STD
893	        13, RFC 1034, ISI, November 1987.

895	   [9]  Postel, J., `Media Type Registration Procedure', RFC 1590, ISI,
896	        March 1994.

898	   [10] StJohns, M., `Authentication Server', RFC 931, TPSC, January
899	        1985.

901	   [11] `Coded Character Set -- 7-bit American Standard Code for
902	        Information Interchange', ANSI X3.4-1986.

904	16. Author's Address

906	      David Robinson
907	      Institute of Astronomy
908	      University of Cambridge
909	      Madingley Road
910	      Cambridge CB3 0HA
911	      UK

913	      Tel: +44 (1223) 337528
914	      Fax: +44 (1223) 337523
915	      EMail: drtr@ast.cam.ac.uk