INTERNET DRAFT  EXPIRES AUGUST 1998                        Ken A L Coar
                                                       The Apache Group
                                                        D.R.T. Robinson
                                                                    ESI
                                                      12 February, 1998


                    The WWW Common Gateway Interface
                              Version 1.2


Status of this Memo

   This document is an Internet-Draft.  Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas
   and its working groups.  Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as 'work in progress.'

   To learn the current status of any Internet-Draft, please check the
   '1id-abstracts.txt' listing contained in one of the following
   Internet-Drafts Shadow Directories:

      * Africa: ftp.is.co.za
      * Europe: nic.nordu.net
      * Pacific Rim: munnari.oz.au
      * U.S. East Coast: ds.internic.net
      * U.S. West Coast: ftp.isi.edu

   Distribution of this document is unlimited.  Please send comments to
   the mailing list; general discussion about CGI
   should take place on the mailing list.

Abstract

   The Common Gateway Interface (CGI) is a simple interface for running
   external programs, software or gateways under an information server
   in a platform-independent manner.  Currently, the supported
   information servers are HTTP servers.

   The interface has been in use by the World-Wide Web since 1993.  This
   specification defines the interface known as 'CGI/1.2', which is an
   extension of the 'CGI/1.1' interface developed and documented at the
   U.S. National Centre for Supercomputing Applications [NCSA-CGI].
   This document also defines the use of the CGI/1.2 interface on the
   Unix(R) and AmigaDOS(tm) systems.

Table of Contents

   1  Introduction
      1.1  Purpose
      1.2  Requirements
      1.3  Specifications
      1.4  Terminology
   2  Notational Conventions and Generic Grammar
      2.1  Augmented BNF
      2.2  Basic Rules
   3  Protocol Parameters
      3.1  URL Encoding
      3.2  The Script URI
   4  Environment Variables
   5  Invoking the Script
   6  The CGI Script Command Line
   7  Data Input to the CGI Script
   8  Data Output from the CGI Script
      8.1  Non-Parsed Header Output
      8.2  Parsed Header Output
   9  Requirements for Servers
   10 Recommendations for Scripts
   11 System Specifications
      11.1  AmigaDOS
      11.2  Unix
   12 Security Considerations
      12.1  Safe Methods
      12.2  HTTP Header Fields Containing Sensitive Information
      12.3  Script Interference with the Server
   13 Acknowledgments
   14 References
   15 Authors' Addresses

1. Introduction

1.1. Purpose

   Together the HTTP [3],[8] server and the CGI script are responsible
   for servicing a client request by sending back responses.  The
   client request comprises a Universal Resource Identifier (URI) [1],
   a request method and various ancillary information about the
   request provided by the transport mechanism.

   The CGI defines the abstract parameters, known as environment
   variables, which describe the client's request.  Together with a
   concrete programmer interface this specifies a platform-independent
   interface between the script and the HTTP server.

1.2. Requirements

   This specification uses the same words as RFC 1123 [5] to define the
   significance of each particular requirement.  These are:

   MUST
      This word or the adjective 'required' means that the item is an
      absolute requirement of the specification.

   SHOULD
      This word or the adjective 'recommended' means that there may
      exist valid reasons in particular circumstances to ignore this
      item, but the full implications should be understood and the
      case carefully weighed before choosing a different course.

   MAY
      This word or the adjective 'optional' means that this item is
      truly optional.  One vendor may choose to include the item
      because a particular marketplace requires it or because it
      enhances the product, for example; another vendor may omit the
      same item.

   An implementation is not compliant if it fails to satisfy one or more
   of the 'must' requirements for the protocols it implements.  An
   implementation that satisfies all of the 'must' and all of the
   'should' requirements for its features is said to be 'unconditionally
   compliant'; one that satisfies all of the 'must' requirements but not
   all of the 'should' requirements for its features is said to be
   'conditionally compliant'.

1.3. Specifications

   Not all of the functions and features of the CGI are defined in the
   main part of this specification.  The following phrases are used to
   describe the features which are not specified:

   system defined
      The feature may differ between systems, but must be the same for
      different implementations using the same system.  A system will
      usually identify a class of operating systems.  Some systems are
      defined in section 11 of this document.  New systems may be
      defined by new specifications without revision of this document.

   implementation defined
      The behaviour of the feature may vary from implementation to
      implementation, but a particular implementation must document
      its behaviour.

1.4. Terminology

   This specification uses many terms defined in the HTTP/1.1
   specification [8]; however, the following terms are used here in a
   sense which may not accord with their definitions in that document,
   or with their common meaning.

   environment variable
      A named parameter that carries information from the server to
      the script.  It is not necessarily a variable in the operating
      system's environment, although that is the most common
      implementation.

   script
      The software which is invoked by the server via this interface.
      It need not be a standalone program, but could be a
      dynamically-loaded or shared library, or even a subroutine in
      the server.

   server
      The application program which invokes the script in order to
      service requests.

2. Notational Conventions and Generic Grammar

2.1. Augmented BNF

   All of the mechanisms specified in this document are described in
   both prose and an augmented Backus-Naur Form (BNF) similar to that
   used by RFC 822 [6].  This augmented BNF contains the following
   constructs:

   name = definition
      The name of a rule is simply the name itself, and is separated
      from the definition by the equal character ("=").
      Whitespace is only significant in that continuation lines of a
      definition are indented.

   "literal"
      Quotation marks (") surround literal text, except for a literal
      quotation mark, which is surrounded by angle-brackets ("<" and
      ">").  Unless stated otherwise, the text is case-sensitive.

   rule1 | rule2
      Alternative rules are separated by a vertical bar ("|").

   (rule1 rule2 rule3)
      Elements enclosed in parentheses are treated as a single
      element.

   *rule
      A rule preceded by an asterisk ("*") may have zero or more
      occurrences.  A rule preceded by an integer followed by an
      asterisk must occur at least the specified number of times.

   [rule]
      An element enclosed in square brackets ("[" and "]") is
      optional.

2.2. Basic Rules

   The following rules are used throughout this specification to
   describe basic parsing constructs.

      alpha         = lowalpha | hialpha
      lowalpha      = "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h"
                    | "i" | "j" | "k" | "l" | "m" | "n" | "o" | "p"
                    | "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x"
                    | "y" | "z"
      hialpha       = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H"
                    | "I" | "J" | "K" | "L" | "M" | "N" | "O" | "P"
                    | "Q" | "R" | "S" | "T" | "U" | "V" | "W" | "X"
                    | "Y" | "Z"
      digit         = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7"
                    | "8" | "9"
      OCTET         = <any 8-bit sequence of data>
      CHAR          = <any US-ASCII character (octets 0 - 127)>
      CTL           = <any US-ASCII control character
                      (octets 0 - 31) and DEL (127)>
      SP            = <US-ASCII SP, space (32)>
      HT            = <US-ASCII HT, horizontal-tab (9)>
      NL            = <newline>
      LWSP          = SP | HT | NL
      tspecial      = "(" | ")" | "@" | "," | ";" | ":" | "\" | <">
                    | "/" | "[" | "]" | "?" | "<" | ">" | "{" | "}"
                    | SP | HT
      token         = 1*<any CHAR except CTLs or tspecials>
      quoted-string = ( <"> *qdtext <"> ) | ( "<" *qatext ">")
      qdtext        = <any CHAR except <"> and CTLs but including LWSP>
      qatext        = <any CHAR except "<", ">" and CTLs but
                      including LWSP>

   Note that newline (NL) need not be a single character, but can be a
   character sequence.

3. Protocol Parameters

3.1. URL Encoding

   Some variables and constructs used here are described as being
   'URL-encoded'.
   This encoding is described in section 2.2 of RFC 1738 [4].  In a
   URL-encoded string an escape sequence consists of a percent
   character ("%") followed by two hexadecimal digits, where the two
   hexadecimal digits form an octet.  An escape sequence represents the
   graphic character which has the octet as its code within the
   US-ASCII [12] coded character set, if it exists.  If no such graphic
   character exists, then the escape sequence represents the octet
   value itself.

   Note that some unsafe characters may have different semantics if
   they are encoded.  The definition of which characters are unsafe
   depends on the context.

3.2. The Script URI

   A 'Script URI' can be defined; this describes the resource
   identified by the environment variables.  Often, this URI will be
   the same as the URI requested by the client (the 'Client URI');
   however, it need not be.  Instead, it could be a URI invented by the
   server, and so it can only be used in the context of the server and
   its CGI interface.

   The script URI has the syntax of generic-RL as defined in section
   2.1 of RFC 1808 [7], with the exception that object parameters and
   fragment identifiers are not permitted:

      <protocol>://<host>:<port>/<path>?<query>

   The various components of the script URI are defined by some of the
   environment variables (see below):

      script-uri = protocol "://" SERVER_NAME ":" SERVER_PORT enc-script
                   enc-path-info "?" QUERY_STRING

   where 'protocol' is found from SERVER_PROTOCOL, 'enc-script' is a
   URL-encoded version of SCRIPT_NAME and 'enc-path-info' is a
   URL-encoded version of PATH_INFO.

4. Environment Variables

   Environment variables are used to pass data about the request from
   the server to the script.  They are accessed by the script in a
   system defined manner.  In all cases, a missing environment variable
   is equivalent to a zero-length (NULL) value, and vice versa.
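   As a non-normative illustration, the script URI of section 3.2 can
   be reassembled from the variables listed below.  The Python sketch
   assumes the variables arrive in a mapping such as os.environ and
   treats a missing variable as NULL, following the rule above.  It
   lower-cases the protocol name, which is permissible because
   'protocol' is not case sensitive.  The helper names cgi_var and
   script_uri are hypothetical.

```python
import os
from urllib.parse import quote

# Hypothetical helpers illustrating section 3.2; not a normative algorithm.


def cgi_var(environ, name):
    """Return a CGI variable; a missing variable reads as "" (NULL)."""
    return environ.get(name, "")


def script_uri(environ=os.environ):
    """Reassemble the script URI from the CGI variables."""
    # 'protocol' is the part of SERVER_PROTOCOL before "/", e.g. "HTTP".
    # It is not case sensitive, so it is lower-cased for use in a URI.
    protocol = cgi_var(environ, "SERVER_PROTOCOL").split("/", 1)[0].lower()
    # SCRIPT_NAME and PATH_INFO hold decoded paths: URL-encode them again,
    # keeping the "/" separators.
    enc_script = quote(cgi_var(environ, "SCRIPT_NAME"), safe="/")
    enc_path_info = quote(cgi_var(environ, "PATH_INFO"), safe="/")
    # QUERY_STRING is already URL-encoded and is appended unchanged; the
    # "?" is always emitted, following the script-uri rule literally.
    return "%s://%s:%s%s%s?%s" % (
        protocol,
        cgi_var(environ, "SERVER_NAME"),
        cgi_var(environ, "SERVER_PORT"),
        enc_script,
        enc_path_info,
        cgi_var(environ, "QUERY_STRING"),
    )
```

   As an example, take SERVER_PROTOCOL "HTTP/1.0", SERVER_NAME
   "www.example.com", SERVER_PORT "80", SCRIPT_NAME "/cgi-bin/find",
   PATH_INFO "/a b" and QUERY_STRING "x=1".  For these values the
   sketch yields "http://www.example.com:80/cgi-bin/find/a%20b?x=1".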

   The representation of the characters in the environment variables
   is system defined.

   Case is not significant in the names, in that there cannot be two
   different variables whose names differ in case only.  Here they are
   shown using a canonical representation of capitals plus underscore
   ("_").  The actual representation of the names is system defined;
   for a particular system the representation may be defined
   differently from this.

   The variables are:

      AUTH_TYPE
      CONTENT_LENGTH
      CONTENT_TYPE
      GATEWAY_INTERFACE
      HTTP_*
      PATH_INFO
      PATH_TRANSLATED
      QUERY_STRING
      REMOTE_ADDR
      REMOTE_HOST
      REMOTE_IDENT
      REMOTE_USER
      REQUEST_METHOD
      SCRIPT_NAME
      SERVER_NAME
      SERVER_PORT
      SERVER_PROTOCOL
      SERVER_SOFTWARE

   AUTH_TYPE
      This variable is specific to requests made with HTTP.

      If the script URI would require access authentication for
      external access, then this variable is found from the
      'auth-scheme' token in the request, otherwise NULL.

         AUTH_TYPE   = "" | auth-scheme
         auth-scheme = "Basic" | token

      HTTP access authentication schemes are described in section 11
      of the HTTP/1.1 specification [8].  The auth-scheme is not
      case-sensitive.

   CONTENT_LENGTH
      The size of the entity attached to the request, if any, in
      decimal number of octets.  If no data is attached, then NULL.
      The syntax is the same as the HTTP Content-Length header field
      (section 14.14, HTTP/1.1 specification [8]).

         CONTENT_LENGTH = "" | 1*digit

   CONTENT_TYPE
      The Internet Media Type [9] of the attached entity.  The syntax
      is the same as the HTTP Content-Type header field.

         CONTENT_TYPE = "" | media-type
         media-type   = type "/" subtype *( ";" parameter)
         type         = token
         subtype      = token
         parameter    = attribute "=" value
         attribute    = token
         value        = token | quoted-string

      The type, subtype and parameter attribute names are not
      case-sensitive.
      Parameter values may be case sensitive.  Media types and their
      use in HTTP are described in section 3.7 of the HTTP/1.1
      specification [8].  Example:

         application/x-www-form-urlencoded

      There is no default value for this variable.  If and only if it
      is unset, then the script may attempt to determine the media
      type from the data received.  If the type remains unknown, then
      application/octet-stream should be assumed.

   GATEWAY_INTERFACE
      The version of the CGI specification to which this server
      complies.  Syntax:

         GATEWAY_INTERFACE = "CGI" "/" 1*digit "." 1*digit

      Note that the major and minor numbers are treated as separate
      integers and hence each may be incremented higher than a single
      digit.  Thus CGI/2.4 is a lower version than CGI/2.13 which in
      turn is lower than CGI/12.3.  Leading zeros must be ignored by
      scripts and should never be generated by servers.

      This document defines the 1.2 version of the CGI interface.

   HTTP_*
      These variables are specific to requests made with HTTP.
      Interpretation of these variables may depend on the value of
      SERVER_PROTOCOL.

      Environment variables with names beginning with "HTTP_" contain
      header data read from the client, if the protocol used was HTTP.
      The HTTP header field name is converted to upper case, has all
      occurrences of "-" replaced with "_" and has "HTTP_" prepended to
      give the environment variable name.  The header data may be
      presented as sent by the client, or may be rewritten in ways
      which do not change its semantics.  If multiple header fields
      with the same field-name are received then they must be rewritten
      as a single header field having the same semantics.  Similarly, a
      header field that is received on more than one line must be
      merged onto a single line.
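      The Python sketch below shows one way a server might apply the
      naming, merging and unfolding rules above.  The function names
      are hypothetical.  Joining repeated fields with ", " is an
      assumption: it preserves semantics only for list-valued fields,
      as HTTP/1.1 [8] allows, so other fields may need different
      handling.

```python
import re

# Hypothetical server-side helpers illustrating the HTTP_* rules.


def http_env_name(field_name):
    """Map an HTTP header field-name to its HTTP_* variable name."""
    return "HTTP_" + field_name.upper().replace("-", "_")


def headers_to_env(headers):
    """Build HTTP_* variables from (field-name, field-value) pairs."""
    env = {}
    for name, value in headers:
        # Unfold a field received on several lines: a line break followed
        # by spaces or tabs becomes a single space.
        value = re.sub(r"\r?\n[ \t]+", " ", value)
        key = http_env_name(name)
        if key in env:
            # Assumption: the field is list-valued, so joining with ", "
            # keeps its semantics (the HTTP/1.1 rule for repeated fields).
            env[key] = env[key] + ", " + value
        else:
            env[key] = value
    return env
```

      Because field-names are upper-cased, "Accept" and "accept" both
      map to HTTP_ACCEPT and are therefore merged.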

      The server must, if necessary, change the representation of the
      data (for example, the character set) to be appropriate for a CGI
      environment variable.

      The server is not required to create environment variables for
      all the header fields that it receives.  In particular, it may
      remove any header fields carrying authentication information,
      such as "Authorization"; it may remove header fields whose value
      is available to the script via other variables, such as
      "Content-Length" and "Content-Type".

   PATH_INFO
      A path to be interpreted by the CGI script.  It identifies the
      resource or sub-resource to be returned by the CGI script.  The
      syntax and semantics are similar to a decoded HTTP URL 'hpath'
      token (defined in RFC 1738 [4]), with the exception that a
      PATH_INFO of "/" represents a single void path segment.
      Otherwise, the leading "/" character is not part of the path.

         PATH_INFO = "" | ( "/" path )
         path      = segment *( "/" segment )
         segment   = *pchar
         pchar     = <any CHAR except "/">

      The PATH_INFO string is the trailing part of the <path>
      component of the script URI that follows the SCRIPT_NAME part of
      the path.

   PATH_TRANSLATED
      The OS path to the file that the server would attempt to access
      if the client were to request the absolute URL containing the
      path PATH_INFO, i.e., for a request of

         protocol "://" SERVER_NAME ":" SERVER_PORT enc-path-info

      where 'enc-path-info' is a URL-encoded version of PATH_INFO.  If
      PATH_INFO is NULL then PATH_TRANSLATED is set to NULL.

         PATH_TRANSLATED = *CHAR

      PATH_TRANSLATED need not be supported by the server.  The server
      may choose to set PATH_TRANSLATED to NULL for reasons of
      security, or because the path would not be interpretable by a
      CGI script (for example, because the object it represented was
      internal to the server and not visible in the file-system), or
      for any other reason.
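      The PATH_INFO grammar and the NULL rule for PATH_TRANSLATED are
      illustrated by the Python sketch below.  The derivation of
      PATH_TRANSLATED is implementation defined, so translate_path is
      purely hypothetical.  It assumes a server that maps paths beneath
      a single POSIX document root.  It also sets PATH_TRANSLATED to
      NULL whenever a ".." segment appears, which is one of the
      security reasons the text allows.

```python
import posixpath


def path_info_segments(path_info):
    """Split PATH_INFO into its segments per the PATH_INFO grammar."""
    if path_info == "":
        return []  # NULL PATH_INFO: no path at all
    if not path_info.startswith("/"):
        raise ValueError("PATH_INFO must be empty or begin with '/'")
    # The leading "/" is not part of the path, so "/" yields one void
    # segment [""] and "/a/b" yields ["a", "b"].
    return path_info[1:].split("/")


def translate_path(path_info, document_root):
    """Hypothetical PATH_TRANSLATED derivation (implementation defined)."""
    segments = path_info_segments(path_info)
    if not segments:
        return ""  # NULL PATH_INFO gives a NULL PATH_TRANSLATED
    if ".." in segments:
        return ""  # refuse, e.g. for reasons of security
    # Assumption: PATH_INFO maps beneath one POSIX document root.
    return posixpath.join(document_root, *segments)
```

      With a document root of /var/www, for instance, a PATH_INFO of
      "/docs/a.html" translates to "/var/www/docs/a.html".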

      The algorithm the server uses to derive PATH_TRANSLATED is
      implementation defined; CGI scripts which use this variable may
      therefore have limited portability.

   QUERY_STRING
      A URL-encoded search string; the <query> part of the script URI.

         QUERY_STRING = query-string
         query-string = *qchar
         qchar        = unreserved | escape | reserved
         unreserved   = alpha | digit | safe | extra
         reserved     = ";" | "/" | "?" | ":" | "@" | "&" | "="
         safe         = "$" | "-" | "_" | "." | "+"
         extra        = "!" | "*" | "'" | "(" | ")" | ","
         escape       = "%" hex hex
         hex          = digit | "A" | "B" | "C" | "D" | "E" | "F" | "a"
                      | "b" | "c" | "d" | "e" | "f"

      The URL syntax for a search string is described in RFC 1738 [4].

   REMOTE_ADDR
      The IP address of the agent sending the request to the server.
      This is not necessarily that of the client.

         REMOTE_ADDR = hostnumber
         hostnumber  = digits "." digits "." digits "." digits
         digits      = 1*digit

   REMOTE_HOST
      The fully qualified domain name of the agent sending the request
      to the server, if available, otherwise NULL.  This is not
      necessarily that of the client.  Fully qualified domain names
      take the form described in section 3.5 of RFC 1034 [10] and
      section 2.1 of RFC 1123 [5]: a sequence of domain labels
      separated by ".", each domain label starting and ending with an
      alphanumeric character and possibly also containing "-"
      characters.  The rightmost domain label will never start with a
      digit.  Domain names are not case sensitive.

         REMOTE_HOST   = "" | hostname
         hostname      = *( domainlabel ".") toplabel
         domainlabel   = alphadigit [ *alphahypdigit alphadigit ]
         toplabel      = alpha [ *alphahypdigit alphadigit ]
         alphahypdigit = alphadigit | "-"
         alphadigit    = alpha | digit

   REMOTE_IDENT
      The identity information reported about the connection by an
      RFC 1413 [11] request to the remote agent, if available.
      The server may choose not to support this feature, or not to
      request the data for efficiency reasons.

         REMOTE_IDENT = *CHAR

      The data returned is not appropriate for use as authentication
      information.

   REMOTE_USER
      This variable is specific to requests made with HTTP.

      If AUTH_TYPE is "Basic", then the user-ID sent by the client.
      If AUTH_TYPE is NULL, then NULL, otherwise undefined.

         REMOTE_USER = "" | userid | *OCTET
         userid      = token

   REQUEST_METHOD
      This variable is specific to requests made with HTTP.

      The method with which the request was made, as described in
      section 5.1.1 of the HTTP/1.0 specification [3] and section
      5.1.1 of the HTTP/1.1 specification [8].

         REQUEST_METHOD   = http-method
         http-method      = "GET" | "HEAD" | "POST" | "PUT" | "DELETE"
                          | extension-method
         extension-method = token

      The method is case sensitive.  Note that of the new methods
      defined by the HTTP/1.1 specification [8], OPTIONS and TRACE
      are not appropriate for the CGI/1.2 environment.

   SCRIPT_NAME
      A URL path that could identify the CGI script (rather than the
      particular CGI output).  The syntax and semantics are identical
      to a decoded HTTP URL 'hpath' token [4].

         SCRIPT_NAME = "" | ( "/" [ path ] )

      The leading "/" is not part of the path.  It is optional if the
      path is NULL.

      The SCRIPT_NAME string is some leading part of the <path>
      component of the script URI derived in some implementation
      defined manner.

   SERVER_NAME
      The name for this server, as used in the <host> part of the
      script URI.  Thus either a fully qualified domain name, or an IP
      address.

         SERVER_NAME = hostname | hostnumber

   SERVER_PORT
      The port on which this request was received, as used in the
      <port> part of the script URI.

         SERVER_PORT = 1*digit

   SERVER_PROTOCOL
      The name and revision of the information protocol this request
      came in with.
      This is not necessarily the same as the protocol version used by
      the server in its response.

         SERVER_PROTOCOL   = HTTP-Version | extension-version
         HTTP-Version      = "HTTP" "/" 1*digit "." 1*digit
         extension-version = protocol "/" 1*digit "." 1*digit
         protocol          = 1*( alpha | digit | "+" | "-" | "." )

      'protocol' is a version of the <protocol> part of the script URI,
      and is not case sensitive.  By convention, 'protocol' is in upper
      case.

   SERVER_SOFTWARE
      The name and version of the information server software
      answering the request (and running the gateway).

         SERVER_SOFTWARE = *CHAR

5. Invoking the Script

   The script is invoked in a system defined manner.  Unless specified
   otherwise, this will be by treating the file containing the script
   as an executable program, and running it as a child process of the
   server.

6. The CGI Script Command Line

   Some systems support a method for supplying an array of strings to
   the CGI script.  This is only used in the case of an 'indexed'
   query, which is identified by a "GET" or "HEAD" HTTP request with a
   URL search string not containing any unencoded "=" characters.  For
   such a request, the server should parse the search string into
   words, using the rules:

      search-string = search-word *( "+" search-word )
      search-word   = 1*schar
      schar         = xunreserved | escape | xreserved
      xunreserved   = alpha | digit | xsafe | extra
      xsafe         = "$" | "-" | "_" | "."
      xreserved     = ";" | "/" | "?" | ":" | "@" | "&"

   After parsing, each word is URL-decoded, optionally encoded in a
   system defined manner, and then the argument list is set to the list
   of words.

   If the server cannot create any part of the argument list, then the
   server should generate no command line information.  For example,
   the number of arguments may be greater than operating system or
   server limitations, or one of the words may not be representable as
   an argument.

7. Data Input to the CGI Script

   As there may be a data entity attached to the request, there must be
   a system defined method for the script to read this data.  Unless
   defined otherwise, this will be via the 'standard input' file
   descriptor.

   There will be at least CONTENT_LENGTH bytes available for the script
   to read.  The script is not obliged to read the data, but it must not
   attempt to read more than CONTENT_LENGTH bytes, even if more data is
   available.

   For non-parsed header (NPH) scripts (see below), the server should
   attempt to ensure that the script input comes directly from the
   client, with minimal buffering.  For all scripts the data will be as
   supplied by the client.

8. Data Output from the CGI Script

   There must be a system defined method for the script to send data
   back to the server or client; a script will always return some data.
   Unless defined otherwise, this will be via the 'standard output' file
   descriptor.

   There are two forms of output that the script can give: non-parsed
   header (NPH) output, and parsed header output.  A server is only
   required to support the latter; distinguishing between the two types
   of output (or scripts) is implementation defined.

8.1. Non-Parsed Header Output

   The script must return a complete HTTP response message, as described
   in Section 6 of the HTTP specifications [3],[8].  The script should
   use the SERVER_PROTOCOL variable to determine the appropriate format
   for a response.  Note that this allows an HTTP/0.9 response to an
   HTTP/1.0 request, for example.

   The server should attempt to ensure that the script output is sent
   directly to the client, with minimal buffering.

8.2. Parsed Header Output

   The script returns a CGI response message.
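   The input rule of section 7 and the parsed-header form specified
   here can be combined into a small script.  The Python sketch assumes
   the Unix conventions: the request entity arrives on standard input,
   output goes to standard output, and NL is a single LF.  The helper
   names are hypothetical, and main() is intended to be called as the
   script's entry point.

```python
import os
import sys

# Hypothetical helpers for a parsed-header CGI script on Unix.


def read_entity(environ, stream):
    """Read the request entity, never more than CONTENT_LENGTH octets."""
    length = environ.get("CONTENT_LENGTH", "")
    if length == "":
        return b""  # NULL CONTENT_LENGTH: no entity attached
    return stream.read(int(length))


def cgi_response(body, media_type="text/plain"):
    """A CGI response: header fields, an empty line, then the body."""
    # A body is supplied, so a Content-Type CGI header field is required.
    # NL is assumed to be a single LF, as on Unix.
    header = "Content-Type: %s\n" % media_type
    return header.encode("ascii") + b"\n" + body


def main():
    data = read_entity(os.environ, sys.stdin.buffer)
    body = b"Received %d octets\n" % len(data)
    sys.stdout.buffer.write(cgi_response(body))
    sys.stdout.buffer.flush()
```

   No Status header field is sent, so the server should assume
   "200 OK".  Converting the LF newlines to CR LF for HTTP is left to
   the server.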

      CGI-Response = *( CGI-Header | HTTP-Header ) NL [ Entity-Body ]
      CGI-Header   = Content-type
                   | Location
                   | Status
                   | Script-Control
                   | extension-header

   The response comprises a header and a body, separated by a blank
   line.  The header fields are either CGI header fields to be
   interpreted by the server, or HTTP header fields to be included in
   the response returned to the client if the request protocol is HTTP.
   At least one CGI-Header must be supplied, but no CGI header field
   can be repeated with the same field-name.  If a body is supplied,
   then a Content-type header field is required; otherwise the script
   must send a Location or Status header field.  If a Location header
   field is returned, then no HTTP-Headers may be supplied.

   The CGI header fields have the generic syntax:

      generic-header = field-name ":" [ field-value ] NL
      field-name     = 1*<any CHAR, excluding CTLs, SP and ":">
      field-value    = *( field-content | LWSP )
      field-content  = *( token | tspecial | quoted-string )

   The field-name is not case sensitive; a NULL field value is
   equivalent to the header field not being sent.

   Content-Type
      The Internet Media Type [9] of the entity body, which is to be
      sent unmodified to the client.

         Content-Type = "Content-Type" ":" media-type NL

   Location
      This is used to specify to the server that the script is
      returning a reference to a document rather than an actual
      document.

         Location         = "Location" ":"
                            ( fragment-URI | rel-URL-abs-path ) NL
         fragment-URI     = URI [ "#" fragmentid ]
         URI              = scheme ":" *qchar
         fragmentid       = *qchar
         rel-URL-abs-path = "/" [ hpath ] [ "?" query-string ]
         hpath            = fpsegment *( "/" psegment )
         fpsegment        = 1*hchar
         psegment         = *hchar
         hchar            = alpha | digit | safe | extra
                          | ":" | "@" | "&" | "="

      The location value is either an absolute URI with optional
      fragment, as defined in RFC 1630 [1], or an absolute path and
      optional query-string.
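      Before handling a Location value, a server has to tell its two
      forms apart.  The Python sketch below is rough and makes several
      assumptions.  classify_location is a hypothetical name.  The
      scheme pattern follows the scheme syntax of RFC 1738 [4]
      (letters, digits, "+", "-" and "."), and upper case is accepted
      as well.  No further validation of the value is attempted.

```python
import re

# Scheme as in RFC 1738: letters, digits, "+", "-" and ".", then ":".
# Upper case is accepted for robustness.
_SCHEME = re.compile(r"[A-Za-z0-9+.-]+:")


def classify_location(value):
    """Hypothetical check: return "path" or "uri" for a Location value."""
    if value.startswith("//"):
        # After the leading "/", hpath must begin with a non-empty
        # fpsegment, so "//..." matches neither form.
        raise ValueError("not a valid Location value: %r" % value)
    if value.startswith("/"):
        return "path"  # rel-URL-abs-path: handled locally by the server
    if _SCHEME.match(value):
        return "uri"   # absolute URI: answered with a 302 redirect
    raise ValueError("not a valid Location value: %r" % value)
```

      For example, "/cgi-bin/other?x=1" is classified as a path and
      "http://www.example.com/" as a URI.  A relative reference such
      as "other.html" is rejected.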
      If an absolute URI is returned by the script, then the server
      will generate a '302 redirect' HTTP response message, and if no
      entity body is supplied by the script, then the server will
      produce one. If the Location value is a path, then the server
      will generate the response that it would have produced in
      response to a request containing the URL

         protocol "://" SERVER_NAME ":" SERVER_PORT rel-URL-abs-path

      The Location header field may only be sent if the REQUEST_METHOD
      is HEAD or GET.

   Status
      The Status header field is used to indicate to the server what
      status code the server must use in the response message. It
      should not be sent if the script returns a Location header
      field.

         Status = "Status" ":" digit digit digit SP reason-phrase NL
         reason-phrase = *

      The valid status codes are listed in Section 6.1.1 of the
      HTTP/1.0 specification [3]. If the SERVER_PROTOCOL is "HTTP/1.1",
      then the status codes defined in the HTTP/1.1 specification [8]
      may also be used. If the script does not return a Status header
      field, then "200 OK" should be assumed.

      If a script is being used to handle a particular error or
      condition encountered by the server, such as a 404 Not Found
      error, the script should use the Status CGI header field to
      propagate the error condition back to the client. In the 404
      case, for example, it should include "Status: 404 Not Found" in
      the header data returned to the server.

   Script-Control
      The Script-Control header field is used to inform the server of
      special requirements the script may have.
         Script-Control      = "Script-Control" ":"
                               1#control-directive NL
         control-directive   = "no-abort"
                             | extension-directive
         extension-directive = *

      The meanings of the different script control directives are:

      no-abort
         The presence of this directive informs the server that it
         MUST NOT abort the script, which will manage its own
         termination. This is useful when a script's activity includes
         performing an operation which might result in data corruption
         if prematurely interrupted.

      If the script does not return a Script-Control header field, then
      the server is free to manage the script as it deems appropriate
      (e.g., killing the CGI process if the request is aborted by the
      client, or if the script neglects to respond within an arbitrary
      time interval selected by the server).

   HTTP header fields
      The script may return any other header fields defined by the
      specification for the SERVER_PROTOCOL (HTTP/1.0 [3] or HTTP/1.1
      [8]). The server must translate the header data from the CGI
      header field syntax to the HTTP header field syntax if these
      differ. For example, the newline sequence used by CGI scripts
      (such as ASCII LF on Unix) may not be the same as that used by
      HTTP (ASCII CR followed by LF). The server must also resolve any
      conflicts between header fields returned by the script and header
      fields that it would otherwise send itself.

9. Requirements for Servers

   Servers must support the standard mechanism (described below) which
   allows the script author to determine what URL to use in documents
   which reference the script; specifically, what URL to use in order
   to achieve particular settings of the environment variables. This
   mechanism is as follows:

   The value for SCRIPT_NAME is governed by the server configuration and
   the location of the script in the OS file-system.
   Given this, any access to the partial URL

      SCRIPT_NAME extra-path "?" query-information

   where extra-path is either NULL or begins with a "/" and satisfies
   any other server requirements, will cause the CGI script to be
   executed with PATH_INFO set to the decoded extra-path, and
   QUERY_STRING set to query-information (not decoded).

   Servers may reject with error 404 any requests that would result in
   an encoded "/" being decoded into PATH_INFO or SCRIPT_NAME, as this
   might represent a loss of information to the script.

   Although the server and the CGI script need not be consistent in
   their handling of URL paths (client URLs and the PATH_INFO data,
   respectively), server authors may wish to impose consistency. The
   server implementation should therefore:

   1. define any restrictions on allowed characters, in particular
      whether ASCII NUL is permitted;
   2. define any restrictions on allowed path segments, in particular
      whether non-terminal NULL segments are permitted;
   3. define the behaviour for "." or ".." path segments, i.e., whether
      they are prohibited, treated as ordinary path segments or
      interpreted in accordance with the relative URL specification [7];
   4. define any limits of the implementation, including limits on path
      or search string lengths, and limits on the volume of header data
      the server will parse.

   Servers may generate the script URI in any way from the client URI,
   or from any other data, but the behaviour should be documented.

10. Recommendations for Scripts

   Scripts should reject unexpected methods (such as DELETE) with error
   405 Method Not Allowed. If the script does not intend to process the
   PATH_INFO data, then it should reject the request with 404 Not Found
   if PATH_INFO is not NULL.
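   A minimal Python sketch of the two recommendations above follows.
   The names ALLOWED_METHODS, screen_request and error_response are
   hypothetical, and the set of accepted methods is an assumption that
   each script would choose for itself. The error is propagated through
   the Status CGI header field, the CGI header fields are sent before
   the HTTP Allow header field, and the 405 response carries Allow
   because the HTTP/1.1 specification [8] requires it for that status.

```python
import os
import sys

# Hypothetical: the methods this particular script accepts, kept in a
# fixed order so the Allow header field is deterministic.
ALLOWED_METHODS = ("GET", "HEAD", "POST")


def screen_request(environ, uses_path_info=False):
    """Return None if the request may proceed, otherwise the value to
    send in a Status CGI header field."""
    if environ.get("REQUEST_METHOD", "") not in ALLOWED_METHODS:
        return "405 Method Not Allowed"
    # A script that does not process PATH_INFO rejects any non-NULL value.
    if not uses_path_info and environ.get("PATH_INFO", ""):
        return "404 Not Found"
    return None


def error_response(status):
    """Build a parsed-header response carrying the error in the Status
    field, using LF as the newline (NL) sequence, as on Unix."""
    # CGI header fields first, then any HTTP header fields.
    fields = ["Status: " + status, "Content-Type: text/plain"]
    if status.startswith("405"):
        # HTTP/1.1 requires a 405 response to list the allowed methods.
        fields.append("Allow: " + ", ".join(ALLOWED_METHODS))
    return ("\n".join(fields) + "\n\n" + status + "\n").encode("ascii")


if __name__ == "__main__":
    problem = screen_request(os.environ)
    if problem is not None:
        sys.stdout.buffer.write(error_response(problem))
        sys.stdout.buffer.flush()
    # Otherwise the script would go on to handle the request normally.
```

   A script that does use PATH_INFO would call screen_request with
   uses_path_info=True, leaving only the method check in force.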
   If the output of a form is being processed, check that CONTENT_TYPE
   is "application/x-www-form-urlencoded" [2].

   If parsing PATH_INFO, PATH_TRANSLATED or SCRIPT_NAME, be careful of
   empty path segments ("//") and special path segments ("." and "..").
   They should either be removed from the path before use in OS system
   calls, or the request should be rejected with 404 Not Found. It is
   very unlikely that any other use could be made of these.

   As it is impossible for the script to determine the client URI that
   initiated this request without knowledge of the specific server in
   use, the script should not return text/html documents containing
   relative URL links without including a <BASE> tag in the document.

   When returning header fields, the script should try to send the CGI
   header fields as soon as possible, and preferably before any HTTP
   header fields. This may help reduce the server's memory requirements.

11. System Specifications

11.1. AmigaDOS

   Environment variables
      These are accessed by the DOS library routine GetVar. The flags
      argument should be 0. Case is ignored, but upper case is
      recommended for compatibility with case-sensitive systems.

   The current working directory
      The current working directory for the script is set to the
      directory containing the script.

   Character set
      The US-ASCII character set is used for the definition of
      environment variables and header fields; the newline (NL)
      sequence is CR LF.

11.2. Unix

   For Unix compatible operating systems, the following are defined:

   Environment variables
      These are accessed by the C library routine getenv.

   The command line
      This is accessed using the argc and argv arguments to main().
      Any characters in the words which are 'active' in the Bourne
      shell are escaped with a backslash.
   The current working directory
      The current working directory for the script is set to the
      directory containing the script.

   Character set
      The US-ASCII character set is used for the definition of
      environment variables and header fields; the newline (NL)
      sequence is LF; servers should also accept CR LF as a newline.

12. Security Considerations

12.1. Safe Methods

   As discussed in the security considerations of the HTTP
   specifications [3], [8], the convention has been established that the
   GET and HEAD methods should be 'safe'; they should cause no
   side-effects and only have the significance of resource retrieval.

12.2. HTTP Header Fields Containing Sensitive Information

   Some HTTP header fields may carry sensitive information which the
   server should not pass on to the script unless explicitly configured
   to do so. For example, if the server protects the script using the
   Basic authentication scheme, then the client will send an
   Authorization header field containing a username and password. If
   the server, rather than the script, validates this information, then
   the password should not be passed on to the script via the
   HTTP_AUTHORIZATION environment variable.

12.3. Script Interference with the Server

   The most common implementation of CGI invokes the script as a child
   process using the same user and group as the server process. Care
   should therefore be taken to ensure that the script cannot interfere
   with the server process, its configuration or documents.

   If the script is executed by calling a function linked into the
   server software (either at compile-time or run-time), then
   precautions should be taken to protect the core memory of the
   server, or to ensure that untrusted code cannot be executed.

13. Acknowledgements

   This work is based on a draft published in 1997 by David R.
   Robinson, which in turn was based on the original CGI interface that
   arose out of discussions on the www-talk mailing list. In particular,
   Rob McCool, John Franks, Ari Luotonen, George Phillips and Tony
   Sanders deserve special recognition for their efforts in defining
   and implementing the early versions of this interface.

   This document has also greatly benefited from the comments and
   suggestions made by Chris Adie, Dave Kristol, and Mike Meyer.

14. References

   [1]  Berners-Lee, T., 'Universal Resource Identifiers in WWW: A
        Unifying Syntax for the Expression of Names and Addresses of
        Objects on the Network as used in the World-Wide Web', RFC
        1630, CERN, June 1994.

   [2]  Berners-Lee, T. and Connolly, D., 'Hypertext Markup Language -
        2.0', RFC 1866, MIT/W3C, November 1995.

   [3]  Berners-Lee, T., Fielding, R. T. and Frystyk, H., 'Hypertext
        Transfer Protocol -- HTTP/1.0', RFC 1945, MIT/LCS, UC Irvine,
        May 1996.

   [4]  Berners-Lee, T., Masinter, L. and McCahill, M., Editors,
        'Uniform Resource Locators (URL)', RFC 1738, CERN, Xerox
        Corporation, University of Minnesota, December 1994.

   [5]  Braden, R., Editor, 'Requirements for Internet Hosts --
        Application and Support', STD 3, RFC 1123, IETF, October 1989.

   [6]  Crocker, D.H., 'Standard for the Format of ARPA Internet Text
        Messages', STD 11, RFC 822, University of Delaware, August
        1982.

   [7]  Fielding, R., 'Relative Uniform Resource Locators', RFC 1808,
        UC Irvine, June 1995.

   [8]  Fielding, R., Gettys, J., Mogul, J., Frystyk, H. and
        Berners-Lee, T., 'Hypertext Transfer Protocol -- HTTP/1.1', RFC
        2068, UC Irvine, DEC, MIT/LCS, January 1997.

   [9]  Freed, N. and Borenstein, N., 'Multipurpose Internet Mail
        Extensions (MIME) Part Two: Media Types', RFC 2046, Innosoft,
        First Virtual, November 1996.
   [10] Mockapetris, P., 'Domain Names - Concepts and Facilities', STD
        13, RFC 1034, ISI, November 1987.

   [11] St. Johns, M., 'Identification Protocol', RFC 1413, US
        Department of Defense, February 1993.

   [12] 'Coded Character Set -- 7-bit American Standard Code for
        Information Interchange', ANSI X3.4-1986.

15. Authors' Addresses

   Ken A L Coar
   MeepZor Consulting
   26B Bay Ridge Drive
   Nashua, NH 03062
   U.S.A.

   Tel: +1 (603) 891.2243
   Fax: not available
   Email: Ken.Coar@Golux.Com

   David Robinson
   Electronic Share Information Ltd
   Mount Pleasant House
   2 Mount Pleasant
   Huntingdon Road
   Cambridge CB3 0RN
   UK

   Tel: +44 (1223) 566926
   Fax: +44 (1223) 506288
   Email: drtr@esi.co.uk

INTERNET DRAFT          EXPIRES AUGUST 1998          INTERNET DRAFT