INTERNET-DRAFT                                           D.R.T. Robinson
                                                 University of Cambridge
Expires 15 August 1996                                  15 February 1996

              The WWW Common Gateway Interface Version 1.1

Status of this memo

   This document is an Internet-Draft. Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas
   and its working groups. Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as `work in progress'.

   To learn the current status of any Internet-Draft, please check the
   `1id-abstracts.txt' listing contained in the Internet-Drafts Shadow
   Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
   munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
   ftp.isi.edu (US West Coast).

   Distribution of this document is unlimited. Please send comments to
   the author; general discussion about CGI should take place on the
   mailing list.

Abstract

   The Common Gateway Interface (CGI) is a simple interface for running
   external programs, software or gateways under an information server
   in a platform-independent manner. Currently, the supported
   information servers are HTTP servers.

   The interface has been in use by the World-Wide Web since 1993. This
   specification defines the interface known as `CGI/1.1', and its use
   on the Unix(R) and AmigaDOS(tm) systems.

1. Introduction

1.1. Purpose

   Together the HTTP [3] server and the CGI script are responsible for
   servicing a client request by sending back responses. The client
   request comprises a Universal Resource Identifier (URI) [1], a
   request method and various ancillary information about the request
   provided by the transport mechanism.

   The CGI defines the abstract parameters, known as environment
   variables, which describe the client's request. Together with a
   concrete programmer interface this specifies a platform-independent
   interface between the script and the HTTP server.

1.2. Requirements

   This specification uses the same words as RFC 1123 [5] to define the
   significance of each particular requirement. These are:

   must

      This word or the adjective `required' means that the item is an
      absolute requirement of the specification.

   should

      This word or the adjective `recommended' means that there may
      exist valid reasons in particular circumstances to ignore this
      item, but the full implications should be understood and the case
      carefully weighed before choosing a different course.

   may

      This word or the adjective `optional' means that this item is
      truly optional. One vendor may choose to include the item because
      a particular marketplace requires it or because it enhances the
      product, for example; another vendor may omit the same item.

   An implementation is not compliant if it fails to satisfy one or more
   of the `must' requirements for the protocols it implements.
   An implementation that satisfies all of the `must' and all of the
   `should' requirements for its features is said to be `unconditionally
   compliant'; one that satisfies all of the `must' requirements but not
   all of the `should' requirements for its features is said to be
   `conditionally compliant'.

1.3. Specifications

   Not all of the functions and features of the CGI are defined in the
   main part of this specification. The following phrases are used to
   describe the features which are not specified:

   system defined

      The feature may differ between systems, but must be the same for
      different implementations using the same system. A system will
      usually identify a class of operating-systems. Some systems are
      defined in section 12 of this document. New systems may be defined
      by new specifications without revision of this document.

   implementation defined

      The behaviour of the feature may vary from implementation to
      implementation, but a particular implementation must document its
      behaviour.

1.4. Terminology

   This specification uses many terms defined in the HTTP/1.0
   specification [3]; however, the following terms are used here in a
   sense which may not accord with their definitions in that document,
   or with their common meaning.

   environment variable

      A named parameter that carries information from the server to the
      script. It is not necessarily a variable in the operating-system's
      environment, although that is the most common implementation.

   script

      The software which is invoked by the server via this interface. It
      need not be a standalone program, but could be a
      dynamically-loaded or shared library, or even a subroutine in the
      server.

   server

      The application program which invokes the script in order to
      service requests.

2. Notational Conventions and Generic Grammar

2.1. Augmented BNF

   All of the mechanisms specified in this document are described in
   both prose and an augmented Backus-Naur Form (BNF) similar to that
   used by RFC 822 [6]. This augmented BNF contains the following
   constructs:

   name = definition

      The name of a rule is simply the name itself; it is separated from
      the definition by the equal character ("="). Whitespace is only
      significant in that continuation lines of a definition are
      indented.

   "literal"

      Quotation marks (") surround literal text, except for a literal
      quotation mark, which is surrounded by angle-brackets ("<" and
      ">"). Unless stated otherwise, the text is case-sensitive.

   rule1 | rule2

      Alternative rules are separated by a vertical bar ("|").

   (rule1 rule2 rule3)

      Elements enclosed in parentheses are treated as a single element.

   *rule

      A rule preceded by an asterisk ("*") may have zero or more
      occurrences. A rule preceded by an integer followed by an asterisk
      must occur at least the specified number of times.

   [rule]

      An element enclosed in square brackets ("[" and "]") is optional.

2.2. Basic Rules

   The following rules are used throughout this specification to
   describe basic parsing constructs.

      alpha         = lowalpha | hialpha
      lowalpha      = "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h"
                    | "i" | "j" | "k" | "l" | "m" | "n" | "o" | "p"
                    | "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x"
                    | "y" | "z"
      hialpha       = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H"
                    | "I" | "J" | "K" | "L" | "M" | "N" | "O" | "P"
                    | "Q" | "R" | "S" | "T" | "U" | "V" | "W" | "X"
                    | "Y" | "Z"
      digit         = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7"
                    | "8" | "9"
      OCTET         = <any 8-bit byte>
      CHAR          = <any character>
      CTL           = <any control character>
      SP            = <space character>
      NL            = <newline>
      LWSP          = SP | NL | <horizontal tab>
      tspecial      = "(" | ")" | "@" | "," | ";" | ":" | "\" | <">
                    | "/" | "[" | "]" | "?" | SP
      token         = 1*<any CHAR except CTLs or tspecials>
      quoted-string = ( <"> *qdtext <"> ) | ( "<" *qatext ">")
      qdtext        = <any CHAR except <"> and CTLs but including LWSP>
      qatext        = <any CHAR except "<", ">" and CTLs but
                      including LWSP>

   Note that newline (NL) need not be a single character, but can be a
   character sequence.

3. URL Encoding

   Some variables and constructs used here are described as being
   `URL-encoded'. This encoding is described in section 2.2 of RFC 1738
   [4]. In a URL-encoded string an escape sequence consists of a percent
   character ("%") followed by two hexadecimal digits, where the two
   hexadecimal digits form an octet. An escape sequence represents the
   graphic character which has the octet as its code within the US-ASCII
   [11] coded character set, if it exists. If no such graphic character
   exists, then the escape sequence represents the octet value itself.

   Note that some unsafe characters may have different semantics if they
   are encoded. The definition of which characters are unsafe depends on
   the context.

4. The Script URI

   A `Script URI' can be defined; this describes the resource identified
   by the environment variables. Often, this URI will be the same as the
   URI requested by the client (the `Client URI'); however, it need not
   be. Instead, it could be a URI invented by the server, and so it can
   only be used in the context of the server and its CGI interface.

   The script URI has the syntax of generic-RL as defined in section 2.1
   of RFC 1808 [7], with the exception that object parameters and
   fragment identifiers are not permitted:

      <scheme>://<host>:<port>/<path>?<query>

   The various components of the script URI are defined by some of the
   environment variables (see below);

      script-uri = protocol "://" SERVER_NAME ":" SERVER_PORT enc-script
                   enc-path-info "?" QUERY_STRING

   where `protocol' is found from SERVER_PROTOCOL, `enc-script' is a
   URL-encoded version of SCRIPT_NAME and `enc-path-info' is a
   URL-encoded version of PATH_INFO.

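   As an illustration only, the following Python sketch shows one way
   of decoding a URL-encoded string (section 3) and of assembling the
   script URI (section 4) from the environment variables. The function
   names, the set of characters left unescaped when encoding, the
   lower-casing of the protocol name and the sample values are all
   assumptions made for this example; none of them is defined by this
   specification.

```python
import re

# Characters left unescaped when encoding. This set is an assumption of
# the sketch: alpha, digit, a few marks, and "/" so that path separators
# survive encoding.
_KEEP = re.compile(rb"[A-Za-z0-9$\-_.+!*'(),/]")


def url_decode(s: str) -> bytes:
    """Replace each %XX escape sequence with the octet it denotes."""
    return re.sub(
        rb"%([0-9A-Fa-f]{2})",
        lambda m: bytes([int(m.group(1), 16)]),
        s.encode("latin-1"),
    )


def url_encode(data: bytes) -> str:
    """Escape every octet not matched by _KEEP as %XX."""
    return "".join(
        chr(b) if _KEEP.fullmatch(bytes([b])) else "%%%02X" % b
        for b in data
    )


def script_uri(env: dict) -> str:
    """Assemble the script URI following the script-uri rule."""
    # `protocol' is taken from SERVER_PROTOCOL; lower-casing it is an
    # assumption (the specification says it is not case sensitive).
    protocol = env["SERVER_PROTOCOL"].split("/")[0].lower()
    enc_script = url_encode(env.get("SCRIPT_NAME", "").encode("latin-1"))
    enc_path_info = url_encode(env.get("PATH_INFO", "").encode("latin-1"))
    return "%s://%s:%s%s%s?%s" % (
        protocol,
        env["SERVER_NAME"],
        env["SERVER_PORT"],
        enc_script,
        enc_path_info,
        env.get("QUERY_STRING", ""),
    )
```

   Decoding returns octets rather than text because an escape sequence
   with no corresponding US-ASCII graphic character stands for the octet
   value itself.
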
5. Environment variables

   Environment variables are used to pass data about the request from
   the server to the script. They are accessed by the script in a system
   defined manner. In all cases, a missing environment variable is
   equivalent to a zero-length (NULL) value, and vice versa. The
   representation of the characters in the environment variables is
   system defined.

   Case is not significant in the names, in that there cannot be two
   different variables whose names differ in case only. Here they are
   shown using a canonical representation of capitals plus underscore
   ("_"). The actual representation of the names is system defined; for
   a particular system the representation may be defined differently to
   this.

   The variables are:

      AUTH_TYPE
      CONTENT_LENGTH
      CONTENT_TYPE
      GATEWAY_INTERFACE
      HTTP_*
      PATH_INFO
      PATH_TRANSLATED
      QUERY_STRING
      REMOTE_ADDR
      REMOTE_HOST
      REMOTE_IDENT
      REMOTE_USER
      REQUEST_METHOD
      SCRIPT_NAME
      SERVER_NAME
      SERVER_PORT
      SERVER_PROTOCOL
      SERVER_SOFTWARE

   AUTH_TYPE

      This variable is specific to requests made with HTTP.

      If the script URI would require access authentication for external
      access, then this variable is found from the `auth-scheme' token
      in the request, otherwise NULL.

         AUTH_TYPE   = "" | auth-scheme
         auth-scheme = "Basic" | token

      HTTP access authentication schemes are described in section 11 of
      the HTTP/1.0 specification [3]. The auth-scheme is not
      case-sensitive.

   CONTENT_LENGTH

      The size of the entity attached to the request, if any, in decimal
      number of octets. If no data is attached, then NULL. The syntax is
      the same as the HTTP Content-Length header (section 10, HTTP/1.0
      specification [3]).

         CONTENT_LENGTH = "" | [ 1*digit ]

   CONTENT_TYPE

      The Internet Media Type [9] of the attached entity. The syntax is
      the same as the HTTP Content-Type header.

         CONTENT_TYPE = "" | media-type
         media-type   = type "/" subtype *( ";" parameter)
         type         = token
         subtype      = token
         parameter    = attribute "=" value
         attribute    = token
         value        = token | quoted-string

      The type, subtype and parameter attribute names are not
      case-sensitive. Parameter values may be case sensitive. Media
      types and their use in HTTP are described in section 3.6 of the
      HTTP/1.0 specification [3]. Example:

         application/x-www-form-urlencoded

      There is no default value for this variable. If and only if it is
      unset, then the script may attempt to determine the media type
      from the data received. If the type remains unknown, then
      application/octet-stream should be assumed.

   GATEWAY_INTERFACE

      The version of the CGI specification to which this server
      complies. Syntax:

         GATEWAY_INTERFACE = "CGI" "/" 1*digit "." 1*digit

      Note that the major and minor numbers are treated as separate
      integers and that each may be incremented higher than a single
      digit. Thus CGI/2.4 is a lower version than CGI/2.13 which in
      turn is lower than CGI/12.3. Leading zeros must be ignored by
      scripts and should never be generated by servers.

      This document defines the 1.1 version of the CGI interface.

   HTTP_*

      These variables are specific to requests made with HTTP.
      Interpretation of these variables may depend on the value of
      SERVER_PROTOCOL.

      Environment variables with names beginning with "HTTP_" contain
      header data read from the client, if the protocol used was HTTP.
      The HTTP header name is converted to upper case, has all
      occurrences of "-" replaced with "_" and has "HTTP_" prepended to
      give the environment variable name. The header data may be
      presented as sent by the client, or may be rewritten in ways which
      do not change its semantics.
      If multiple headers with the same field-name are received then
      they must be rewritten as a single header having the same
      semantics. Similarly, a header that is received on more than one
      line must be merged onto a single line. The server must, if
      necessary, change the representation of the data (for example,
      the character set) to be appropriate for a CGI environment
      variable.

      The server is not required to create environment variables for all
      the headers that it receives. In particular, it may remove any
      headers carrying authentication information, such as
      "Authorization"; it may remove headers whose value is available to
      the script via other variables, such as "Content-Length" and
      "Content-Type".

   PATH_INFO

      A path to be interpreted by the CGI script. It identifies the
      resource or sub-resource to be returned by the CGI script. The
      syntax and semantics are similar to a decoded HTTP URL `hpath'
      token (defined in RFC 1738 [4]), with the exception that a
      PATH_INFO of "/" represents a single void path segment. Otherwise,
      the leading "/" character is not part of the path.

         PATH_INFO = "" | "/" path
         path      = segment *( "/" segment )
         segment   = *pchar
         pchar     = <any CHAR except "/">

      The PATH_INFO string is the trailing part of the <path> component
      of the script URI that follows the SCRIPT_NAME part of the path.

   PATH_TRANSLATED

      The OS path to the file that the server would attempt to access
      were the client to request the absolute URL containing the path
      PATH_INFO, i.e. for a request of

         protocol "://" SERVER_NAME ":" SERVER_PORT enc-path-info

      where `enc-path-info' is a URL-encoded version of PATH_INFO. If
      PATH_INFO is NULL then PATH_TRANSLATED is set to NULL.

         PATH_TRANSLATED = *CHAR

      PATH_TRANSLATED need not be supported by the server.
      The server may choose to set PATH_TRANSLATED to NULL for reasons
      of security, or because the path would not be interpretable by a
      CGI script (for example, because the object it represents is
      internal to the server and not visible in the file-system), or
      for any other reason.

      The algorithm the server uses to derive PATH_TRANSLATED is
      implementation defined; CGI scripts which use this variable may
      suffer limited portability.

   QUERY_STRING

      A URL-encoded search string; the <query> part of the script URI.

         QUERY_STRING = query-string
         query-string = *qchar
         qchar        = unreserved | escape | reserved
         unreserved   = alpha | digit | safe | extra
         reserved     = ";" | "/" | "?" | ":" | "@" | "&" | "="
         safe         = "$" | "-" | "_" | "." | "+"
         extra        = "!" | "*" | "'" | "(" | ")" | ","
         escape       = "%" hex hex
         hex          = digit | "A" | "B" | "C" | "D" | "E" | "F" | "a"
                      | "b" | "c" | "d" | "e" | "f"

      The URL syntax for a search string is described in RFC 1738 [4].

   REMOTE_ADDR

      The IP address of the agent sending the request to the server. Not
      necessarily that of the client.

         REMOTE_ADDR = hostnumber
         hostnumber  = digits "." digits "." digits "." digits
         digits      = 1*digit

   REMOTE_HOST

      The fully qualified domain name of the agent sending the request
      to the server, if available, otherwise NULL. Not necessarily that
      of the client. Fully qualified domain names take the form as
      described in section 3.5 of RFC 1034 [8] and section 2.1 of RFC
      1123 [5]; a sequence of domain labels separated by ".", each
      domain label starting and ending with an alphanumerical character
      and possibly also containing "-" characters. The rightmost domain
      label will never start with a digit. Domain names are not case
      sensitive.

         REMOTE_HOST   = "" | hostname
         hostname      = *( domainlabel ".") toplabel
         domainlabel   = alphadigit [ *alphahypdigit alphadigit ]
         toplabel      = alpha [ *alphahypdigit alphadigit ]
         alphahypdigit = alphadigit | "-"
         alphadigit    = alpha | digit

   REMOTE_IDENT

      The identity information reported about the connection by an RFC
      931 [10] request to the remote agent, if available. The server may
      choose not to support this feature, or not to request the data for
      efficiency reasons.

         REMOTE_IDENT = *CHAR

      The data returned is not appropriate for use as authentication
      information.

   REMOTE_USER

      This variable is specific to requests made with HTTP.

      If AUTH_TYPE is "Basic", then the user-ID sent by the client. If
      AUTH_TYPE is NULL, then NULL, otherwise undefined.

         REMOTE_USER = "" | userid | *OCTET
         userid      = token

   REQUEST_METHOD

      This variable is specific to requests made with HTTP.

      The method with which the request was made, as described in
      section 5.1.1 of the HTTP/1.0 specification [3].

         REQUEST_METHOD   = http-method
         http-method      = "GET" | "HEAD" | "POST" | extension-method
         extension-method = token

      The method is case sensitive.

   SCRIPT_NAME

      A URL path that could identify the CGI script (rather than the
      particular CGI output). The syntax and semantics are identical to
      a decoded HTTP URL `hpath' token [4].

         SCRIPT_NAME = "" | "/" [ path ]

      The leading "/" is not part of the path. It is optional if the
      path is NULL.

      The SCRIPT_NAME string is some leading part of the <path>
      component of the script URI derived in some implementation defined
      manner.

   SERVER_NAME

      The name for this server, as used in the <host> part of the script
      URI. Thus either a fully qualified domain name, or an IP address.

         SERVER_NAME = hostname | hostnumber

   SERVER_PORT

      The port on which this request was received, as used in the <port>
      part of the script URI.

         SERVER_PORT = 1*digit

   SERVER_PROTOCOL

      The name and revision of the information protocol this request
      came in with.

         SERVER_PROTOCOL   = HTTP-Version | extension-version
         HTTP-Version      = "HTTP" "/" 1*digit "." 1*digit
         extension-version = protocol "/" 1*digit "." 1*digit
         protocol          = 1*( alpha | digit | "+" | "-" | "." )

      `protocol' is a version of the <scheme> part of the script URI,
      and is not case sensitive. By convention, `protocol' is in upper
      case.

   SERVER_SOFTWARE

      The name and version of the information server software answering
      the request (and running the gateway).

         SERVER_SOFTWARE = *CHAR

6. Invoking the script

   The script is invoked in a system defined manner. Unless specified
   otherwise, this will be by treating the file containing the script as
   an executable, and running it as a child process of the server.

7. The CGI script command line

   Some systems support a method for supplying an array of strings to
   the CGI script. This is only used in the case of an `indexed' query.
   This is identified by a "GET" or "HEAD" HTTP request with a URL
   search string not containing any unencoded "=" characters. For such a
   request, the server should parse the search string into words, using
   the rule:

      search-string = search-word *( "+" search-word )
      search-word   = 1*schar
      schar         = xunreserved | escape | xreserved
      xunreserved   = alpha | digit | xsafe | extra
      xsafe         = "$" | "-" | "_" | "."
      xreserved     = ";" | "/" | "?" | ":" | "@" | "&"

   After parsing, each word is URL-decoded, optionally encoded in a
   system defined manner and then the argument list is set to the list
   of words.

   If the server cannot create any part of the argument list, then the
   server should generate no command line information.
   For example, the number of arguments may be greater than operating
   system or server limitations, or one of the words may not be
   representable as an argument.

8. Data input to the CGI script

   As there may be a data entity attached to the request, there must be
   a system defined method for the script to read this data. Unless
   defined otherwise, this will be via the `standard input' file
   descriptor.

   There will be at least CONTENT_LENGTH bytes available for the script
   to read. The script is not obliged to read the data, but it must not
   attempt to read more than CONTENT_LENGTH bytes, even if more data is
   available.

   For non-parsed header (NPH) scripts (see below), the server should
   attempt to ensure that the script input comes directly from the
   client, with minimal buffering. For all scripts the data will be as
   supplied by the client.

9. Data output from the CGI script

   There must be a system defined method for the script to send data
   back to the server or client; a script will always return some data.
   Unless defined otherwise, this will be via the `standard output' file
   descriptor.

   There are two forms of output that the script can give; non-parsed
   header (NPH) output, and parsed header output. A server is only
   required to support the latter; distinguishing between the two types
   of output (or scripts) is implementation defined.

9.1. Non-Parsed Header Output

   The script must return a complete HTTP response message, as described
   in Section 6 of the HTTP specification [3]. Note that this allows an
   HTTP/0.9 response to an HTTP/1.0 request.

   The server should attempt to ensure that the script output is sent
   directly to the client, with minimal buffering.

9.2. Parsed Header Output

   The script returns a CGI response message.

      CGI-Response = *( CGI-Header | HTTP-Header ) NL [ Entity-Body ]
      CGI-Header   = Content-Type
                   | Location
                   | Status
                   | extension-header

   The response comprises headers and a body, separated by a blank line.
   The headers are either CGI headers to be interpreted by the server,
   or HTTP headers to be included in the response returned to the client
   if the request protocol is HTTP. At least one CGI-Header must be
   supplied, but no CGI header can be repeated with the same field-name.

   If a body is supplied, then a Content-Type header is required,
   otherwise the script must send a Location or Status header. If a
   Location header is returned, then no HTTP-Headers may be supplied.

   The CGI headers have the generic syntax:

      generic-header = field-name ":" [ field-value ] NL
      field-name     = 1*<any CHAR, excluding CTLs, SP and ":">
      field-value    = *( field-content | LWSP )
      field-content  = *( token | tspecial | quoted-string )

   The field-name is not case sensitive; a NULL field value is
   equivalent to the header not being sent.

   Content-Type

      The Internet Media Type [9] of the entity body, which is to be
      sent unmodified to the client.

         Content-Type = "Content-Type" ":" media-type NL

   Location

      This is used to specify to the server that the script is returning
      a reference to a document rather than an actual document.

         Location         = "Location" ":"
                            ( fragment-URI | rel-URL-abs-path ) NL
         fragment-URI     = URI [ "#" fragmentid ]
         URI              = scheme ":" *qchar
         fragmentid       = *qchar
         rel-URL-abs-path = "/" [ hpath ] [ "?" query-string ]
         hpath            = fpsegment *( "/" psegment )
         fpsegment        = 1*hchar
         psegment         = *hchar
         hchar            = alpha | digit | safe | extra
                          | ":" | "@" | "&" | "="

      The location value is either an absolute URI with optional
      fragment, as defined in RFC 1630 [1], or an absolute path and
      optional query-string.
      If an absolute URI is returned by the script, then the server
      will generate a redirect HTTP response message, and if no entity
      body is supplied by the script, then the server will produce one.
      If the Location value is a path, then the server will generate
      the response that it would have produced in response to a request
      containing the URL

         protocol "://" SERVER_NAME ":" SERVER_PORT rel-URL-abs-path

      The Location header may only be sent if the REQUEST_METHOD is HEAD
      or GET.

   Status

      The Status header is used to indicate to the server what status
      code it will use in the response message. It should not be sent if
      the script returns a Location header.

         Status        = "Status" ":" 3digit SP reason-phrase NL
         reason-phrase = *<CHAR, excluding CTLs, NL>

      The valid status codes are listed in section 6.1.1 of the HTTP/1.0
      specification [3]. If the script does not return a Status header,
      then "200 OK" should be assumed.

   HTTP headers

      The script may return any other headers defined by the HTTP/1.0
      specification [3]. The server must translate the header data from
      the CGI header syntax to the HTTP header syntax if these differ.
      For example, the character sequence for newline (such as Unix's
      ASCII NL) used by CGI scripts may not be the same as that used by
      HTTP (ASCII CR followed by LF). The server must also resolve any
      conflicts between headers returned by the script and headers that
      it would otherwise send itself.

10. Requirements for servers

   Servers must support the standard mechanism (described below) which
   allows the script author to determine what URL to use in documents
   which reference the script. Specifically, what URL to use in order to
   achieve particular settings of the environment variables. This
   mechanism is as follows:

   The value for SCRIPT_NAME is governed by the server configuration and
   the location of the script in the OS file-system.
   Given this, any access to the partial URL

      SCRIPT_NAME extra-path ? query-information

   where extra-path is either NULL or begins with a "/" and satisfies
   any other server requirements, will cause the CGI script to be
   executed with PATH_INFO set to the decoded extra-path, and
   QUERY_STRING set to query-information (not decoded).

   Servers may reject with error 404 any requests that would result in
   an encoded "/" being decoded into PATH_INFO or SCRIPT_NAME, as this
   might represent a loss of information to the script.

   Although the server and the CGI script need not be consistent in
   their handling of URL paths (client URLs and the PATH_INFO data,
   respectively), server authors may wish to impose consistency. So the
   server implementation should define its behaviour for the following
   cases:

      o  define any restrictions on allowed characters, in particular
         whether ASCII NULL is permitted;

      o  define any restrictions on allowed path segments, in particular
         whether non-terminal NULL segments are permitted;

      o  define the behaviour for "." or ".." path segments; i.e. whether
         they are prohibited, treated as ordinary path segments or
         interpreted in accordance with the relative URL specification
         [7];

      o  define any limits of the implementation, including limits on
         path or search string lengths, and limits on the volume of
         headers the server will parse.

   Servers may generate the script URI in any way from the client URI,
   or from any other data (but the behaviour should be documented).

11. Recommendations for scripts

   Scripts should reject unexpected methods (such as DELETE) with
   error 405 Method Not Allowed. If the script does not intend to
   process the PATH_INFO data, then it should reject the request with
   404 Not Found if PATH_INFO is not NULL.

   If the output of a form is being processed, check that CONTENT_TYPE
   is "application/x-www-form-urlencoded" [2].

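   The recommendations above can be sketched as follows. This Python
   fragment is a non-authoritative illustration: the function names
   `handle' and `_reply', the set of accepted methods, the use of 400
   Bad Request for an unexpected CONTENT_TYPE, and the response bodies
   are assumptions of the example. It produces parsed header output
   (section 9.2) using LF as the newline, as on Unix systems.

```python
def _reply(status: str, body: str) -> str:
    """Build parsed header output: CGI headers, blank line, body."""
    return "Status: %s\nContent-Type: text/plain\n\n%s" % (status, body)


def handle(env: dict, stdin=None) -> str:
    """Return the CGI response for one request described by env."""
    method = env.get("REQUEST_METHOD", "")
    # Reject methods this hypothetical script does not expect.
    if method not in ("GET", "HEAD", "POST"):
        return _reply("405 Method Not Allowed", "unexpected method\n")

    # This script does not process PATH_INFO, so any value is an error.
    if env.get("PATH_INFO", ""):
        return _reply("404 Not Found", "no such resource\n")

    if method == "POST":
        # Ignore media type parameters; type and subtype are not
        # case-sensitive.
        ctype = env.get("CONTENT_TYPE", "").split(";")[0].strip().lower()
        if ctype != "application/x-www-form-urlencoded":
            return _reply("400 Bad Request", "form data expected\n")
        # Never read more than CONTENT_LENGTH bytes (section 8).
        length = int(env.get("CONTENT_LENGTH") or 0)
        data = stdin.read(length) if stdin is not None else b""
        return _reply("200 OK", "received %d bytes\n" % len(data))

    return _reply("200 OK", "hello\n")
```

   A script built around this sketch would typically pass a copy of its
   environment and a binary view of standard input to handle() and write
   the returned text to standard output.
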

   If parsing PATH_INFO, PATH_TRANSLATED or SCRIPT_NAME then be careful
   of void path segments ("//") and special path segments ("." and
   ".."). They should either be removed from the path before use in OS
   system calls, or the request should be rejected with 404 Not Found.
   It is very unlikely that any other use could be made of these.

   As it is impossible for the script to determine the client URI that
   initiated this request without knowledge of the specific server in
   use, the script should not return text/html documents containing
   relative URL links without including a <BASE> tag in the document.

   When returning headers, the script should try to send the CGI headers
   as soon as possible, and preferably before any HTTP headers. This may
   help reduce the server's memory requirements.

12. System specifications

12.1. AmigaDOS

   Environment variables

      These are accessed by the DOS library routine GetVar. The flags
      argument should be 0. Case is ignored, but upper case is
      recommended for compatibility with case-sensitive systems.

   The current working directory

      The current working directory for the script is set to the
      directory containing the script.

   Character set

      The US-ASCII character set is used for the definition of
      environment variables and headers; the newline (NL) sequence is CR
      LF.

12.2. Unix

   For Unix compatible operating systems, the following are defined:

   Environment variables

      These are accessed by the C library routine getenv.

   The command line

      This is accessed using the argc and argv arguments to main().
      Any characters in the words which are `active' in the Bourne
      shell are escaped with a backslash.

   The current working directory

      The current working directory for the script is set to the
      directory containing the script.

   Character set

      The US-ASCII character set is used for the definition of
      environment variables and headers; the newline (NL) sequence is
      LF; servers should also accept CR LF as a newline.

13. Security Considerations

13.1. Safe Methods

   As discussed in the security considerations of the HTTP specification
   [3], the convention has been established that the GET and HEAD
   methods should be `safe'; they should cause no side-effects and only
   have the significance of resource retrieval.

13.2. HTTP headers containing sensitive information

   Some HTTP headers may carry sensitive information which the server
   should not pass on to the script unless explicitly configured to do
   so. For example, if the server protects the script using the Basic
   authentication scheme, then the client will send an Authorization
   header containing a username and password. If the server, rather than
   the script, validates this information then it should not pass on the
   password via the HTTP_AUTHORIZATION environment variable.

13.3. Script interference with the server

   The most common implementation of CGI invokes the script as a child
   process using the same user and group as the server process. It
   should therefore be ensured that the script cannot interfere with the
   server process, its configuration or documents.

   If the script is executed by calling a function linked in to the
   server software (either at compile-time or run-time) then precautions
   should be taken to protect the core memory of the server, or to
   ensure that untrusted code cannot be executed.

14. Acknowledgements

   This work is based on the original CGI interface that arose out of
   discussions on the www-talk mailing list. In particular, Rob McCool,
   John Franks, Ari Luotonen, George Phillips and Tony Sanders deserve
   special recognition for their efforts in defining and implementing
   the early versions of this interface.

   This document has also greatly benefited from the comments and
   suggestions made by Chris Adie, Dave Kristol and Mike Meyer.

15. References

   [1]  Berners-Lee, T., `Universal Resource Identifiers in WWW: A
        Unifying Syntax for the Expression of Names and Addresses of
        Objects on the Network as used in the World-Wide Web', RFC 1630,
        CERN, June 1994.

   [2]  Berners-Lee, T. and Connolly, D., `Hypertext Markup Language -
        2.0', RFC 1866, MIT/W3C, November 1995.

   [3]  Berners-Lee, T., Fielding, R. T. and Frystyk Nielsen, H.,
        `Hypertext Transfer Protocol -- HTTP/1.0', Work in progress
        (draft-ietf-http-v10-spec-04.txt), MIT/LCS, UC Irvine, October
        1995.

   [4]  Berners-Lee, T., Masinter, L. and McCahill, M., Editors,
        `Uniform Resource Locators (URL)', RFC 1738, CERN, Xerox
        Corporation, University of Minnesota, December 1994.

   [5]  Braden, R., Editor, `Requirements for Internet Hosts --
        Application and Support', STD 3, RFC 1123, IETF, October 1989.

   [6]  Crocker, D.H., `Standard for the Format of ARPA Internet Text
        Messages', STD 11, RFC 822, University of Delaware, August 1982.

   [7]  Fielding, R., `Relative Uniform Resource Locators', RFC 1808, UC
        Irvine, June 1995.

   [8]  Mockapetris, P., `Domain Names - Concepts and Facilities', STD
        13, RFC 1034, ISI, November 1987.

   [9]  Postel, J., `Media Type Registration Procedure', RFC 1590, ISI,
        March 1994.

   [10] StJohns, M., `Authentication Server', RFC 931, TPSC, January
        1985.

   [11] `Coded Character Set -- 7-bit American Standard Code for
        Information Interchange', ANSI X3.4-1986.

16. Author's Address

   David Robinson
   Institute of Astronomy
   University of Cambridge
   Madingley Road
   Cambridge CB3 0HA
   UK

   Tel: +44 (1223) 337528
   Fax: +44 (1223) 337523
   EMail: drtr@ast.cam.ac.uk