idnits 2.17.1

draft-coar-cgi-v11-03.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate. This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** The document seems to lack a 1id_guidelines paragraph about 6 months
     document validity -- however, there's a paragraph with a matching
     beginning. Boilerplate error?

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section. (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack separate sections for Informative/Normative
     References. All references will be assumed normative when checking for
     downward references.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The document seems to lack the recommended RFC 2119 boilerplate, even if
     it appears to use RFC 2119 keywords.

     (The document does seem to have the reference to RFC 2119 which the
     ID-Checklist requires).

  == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD',
     or 'RECOMMENDED' is not an accepted usage according to RFC 2119. Please
     use uppercase 'NOT' together with RFC 2119 keywords (if that is what you
     mean).

     Found 'MUST not' in this paragraph:

     The script MUST not provide any other header fields. For an HTTP client
     request, the server MUST generate a 302 `Found' HTTP response message.
  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008. If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment. If not, you may need to add the pre-RFC5378 disclaimer.
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (16 April 2003) is 7674 days in the past. Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Looks like a reference, but probably isn't: 'Entity-Body' on line 1021

  == Unused Reference: '22' is defined on line 1554, but no explicit
     reference was found in the text

  ** Downref: Normative reference to an Informational RFC: RFC 1630 (ref. '1')

  ** Downref: Normative reference to an Informational RFC: RFC 1945 (ref. '2')

  ** Obsolete normative reference: RFC 2396 (ref. '3') (Obsoleted by RFC 3986)

  ** Obsolete normative reference: RFC 822 (ref. '6') (Obsoleted by RFC 2822)

  ** Obsolete normative reference: RFC 2246 (ref. '7') (Obsoleted by RFC 4346)

  ** Obsolete normative reference: RFC 2616 (ref. '8') (Obsoleted by RFC 7230,
     RFC 7231, RFC 7232, RFC 7233, RFC 7234, RFC 7235)

  ** Obsolete normative reference: RFC 2617 (ref. '9') (Obsoleted by RFC 7235,
     RFC 7615, RFC 7616, RFC 7617)

  ** Obsolete normative reference: RFC 2732 (ref. '11') (Obsoleted by RFC 3986)

  ** Obsolete normative reference: RFC 2373 (ref. '12') (Obsoleted by RFC 3513)

  ** Obsolete normative reference: RFC 2388 (ref. '13') (Obsoleted by RFC 7578)

  -- Possible downref: Non-RFC (?) normative reference: ref. '15'

  ** Obsolete normative reference: RFC 2818 (ref. '16') (Obsoleted by RFC 9110)

  -- Possible downref: Non-RFC (?) normative reference: ref. '18'

  -- Possible downref: Non-RFC (?) normative reference: ref. '19'

  -- Possible downref: Non-RFC (?) normative reference: ref. '20'

  -- Possible downref: Non-RFC (?) normative reference: ref. '21'

  -- Possible downref: Non-RFC (?) normative reference: ref. '22'

     Summary: 15 errors (**), 0 flaws (~~), 4 warnings (==), 9 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

2  INTERNET-DRAFT                                            David Robinson
3  draft-coar-cgi-v11-03.txt                     Apache Software Foundation
4  Expires 15 October 2003                                    Ken A.L. Coar
5                                                           IBM Corporation
6                                                             16 April 2003

8            The Common Gateway Interface (CGI) Version 1.1

10  Status of this Memo

12     This document is an Internet-Draft and is in full conformance with
13     all provisions of Section 10 of RFC2026.

15     Internet-Drafts are working documents of the Internet Engineering
16     Task Force (IETF), its areas, and its working groups. Note that other
17     groups may also distribute working documents as Internet-Drafts.

19     Internet-Drafts are draft documents valid for a maximum of six months
20     and may be updated, replaced, or obsoleted by other documents at any
21     time. It is inappropriate to use Internet-Drafts as reference
22     material or to cite them other than as `work in progress'.

24     The list of current Internet-Drafts can be accessed at
25     http://www.ietf.org/ietf/1id-abstracts.txt.

27     The list of Internet-Draft Shadow Directories can be accessed at
28     http://www.ietf.org/shadow.html.

30     Distribution of this document is unlimited. Please send comments to
31     the authors, or via the CGI-WG mailing list; see the project Web page
32     at .

34  Abstract

36     The Common Gateway Interface (CGI) is a simple interface for running
37     external programs, software or gateways under an information server
38     in a platform-independent manner. Currently, the supported
39     information servers are HTTP servers.

41     The interface has been in use by the World-Wide Web since 1993. This
42     specification defines the `current practice' parameters of the
43     `CGI/1.1' interface developed and documented at the U.S. National
44     Centre for Supercomputing Applications. This document also defines
45     the use of the CGI/1.1 interface on UNIX(R) and other, similar
46     systems.

48  Contents

50     1 Introduction
51      1.1 Purpose
52      1.2 Requirements
53      1.3 Specifications
54      1.4 Terminology

56     2 Notational Conventions and Generic Grammar
57      2.1 Augmented BNF
58      2.2 Basic Rules
59      2.3 URL Encoding

61     3 Invoking the Script
62      3.1 Server Responsibilities
63      3.2 Script Selection
64      3.3 The Script-URI
65      3.4 Execution

67     4 The CGI Request
68      4.1 Request Meta-Variables
69       4.1.1 AUTH_TYPE
70       4.1.2 CONTENT_LENGTH
71       4.1.3 CONTENT_TYPE
72       4.1.4 GATEWAY_INTERFACE
73       4.1.5 PATH_INFO
74       4.1.6 PATH_TRANSLATED
75       4.1.7 QUERY_STRING
76       4.1.8 REMOTE_ADDR
77       4.1.9 REMOTE_HOST
78       4.1.10 REMOTE_IDENT
79       4.1.11 REMOTE_USER
80       4.1.12 REQUEST_METHOD
81       4.1.13 SCRIPT_NAME
82       4.1.14 SERVER_NAME
83       4.1.15 SERVER_PORT
84       4.1.16 SERVER_PROTOCOL
85       4.1.17 SERVER_SOFTWARE
86       4.1.18 Protocol-Specific Meta-Variables
87      4.2 Request Message-Body
88      4.3 Request Methods
89       4.3.1 GET
90       4.3.2 POST
91       4.3.3 HEAD
92       4.3.4 Protocol-Specific Methods
93      4.4 The Script Command Line

95     5 NPH Scripts
96      5.1 Identification
97      5.2 NPH Response

99     6 CGI Response
100      6.1 Response Handling
101      6.2 Response Types
102       6.2.1 Document Response
103       6.2.2 Local Redirect Response
104       6.2.3 Client Redirect Response
105       6.2.4 Client Redirect Response with Document
106      6.3 Response Header Fields
107       6.3.1 Content-Type
108       6.3.2 Location
109       6.3.3 Status
110       6.3.4 Protocol-Specific Header Fields
111       6.3.5 Extension Header Fields
112      6.4 Response Message Body

114     7 System Specifications
115      7.1 AmigaDOS
116      7.2 UNIX
117      7.3 EBCDIC/POSIX

119     8 Implementation
120      8.1 Recommendations for Servers
121      8.2 Recommendations for Scripts

123     9 Security Considerations
124      9.1 Safe Methods
125      9.2 HTTP Headers Containing Sensitive Information
126      9.3 Data Privacy
127      9.4 TLS Connection Endpoint
128      9.5 Server/Script Authentication
129      9.6 Script Interference with the Server
130      9.7 Data Length and Buffering Considerations
131      9.8 Stateless Processing
132      9.9 Non-parsed Header Output

134     10 Acknowledgements

136     11 References

138     12 Authors' Addresses

140  1 Introduction

142  1.1 Purpose

144     The Common Gateway Interface (CGI) [21] allows an HTTP [2], [8]
145     server and a CGI script to share responsibility for servicing client
146     requests by sending back responses. The client request comprises a
147     Universal Resource Identifier (URI) [1], a request method and various
148     ancillary information about the request provided by the transport
149     mechanism.

151     The CGI defines the abstract parameters, known as meta-variables,
152     which describe the client's request. Together with a concrete
153     programmer interface this specifies a platform-independent interface
154     between the script and the HTTP server.

156     The server is responsible for managing connection, data transfer,
157     transport and network issues related to the request, whilst the CGI
158     script handles the application issues, such as data access and
159     document processing.

161  1.2 Requirements

163     The key words `MUST', `MUST NOT', `REQUIRED', `SHALL', `SHALL NOT',
164     `SHOULD', `SHOULD NOT', `RECOMMENDED', `MAY' and `OPTIONAL' in this
165     document are to be interpreted as described in RFC 2119 [5].

167     An implementation is not compliant if it fails to satisfy one or more
168     of the `must' requirements for the protocols it implements. An
169     implementation that satisfies all of the `must' and all of the
170     `should' requirements for its features is said to be `unconditionally
171     compliant'; one that satisfies all of the `must' requirements but not
172     all of the `should' requirements for its features is said to be
173     `conditionally compliant'.

175  1.3 Specifications

177     Not all of the functions and features of the CGI are defined in the
178     main part of this specification. The following phrases are used to
179     describe the features which are not specified:

181     system defined
182        The feature may differ between systems, but must be the same for
183        different implementations using the same system. A system will
184        usually identify a class of operating-systems. Some systems are
185        defined in section 7 of this document. New systems may be defined
186        by new specifications without revision of this document.

188     implementation defined
189        The behaviour of the feature may vary from implementation to
190        implementation, but a particular implementation must document its
191        behaviour.

193  1.4 Terminology

194     This specification uses many terms defined in the HTTP/1.1
195     specification [8]; however, the following terms are used here in a
196     sense which may not accord with their definitions in that document,
197     or with their common meaning.

199     meta-variable
200        A named parameter that carries information from the server to the
201        script. It is not necessarily a variable in the operating-system's
202        environment, although that is the most common implementation.

204     script
205        The software which is invoked by the server via this interface. It
206        need not be a standalone program, but could be a dynamically-
207        loaded or shared library, or even a subroutine in the server. It
208        might be a set of statements interpreted at run-time, as the term
209        `script' is frequently understood, but that is not a requirement
210        and within the context of this specification the term has the
211        broader definition stated.

213     server
214        The application program which invokes the script in order to
215        service requests from the client.

217  2 Notational Conventions and Generic Grammar

219  2.1 Augmented BNF

221     All of the mechanisms specified in this document are described in
222     both prose and an augmented Backus-Naur Form (BNF) similar to that
223     used by RFC 822 [6]. This augmented BNF contains the following
224     constructs:

226     name = definition
227        The name of a rule and its definition separated by the equal
228        character ("="). Whitespace is only significant in that
229        continuation lines of a definition are indented.

231     "literal"
232        Quotation marks (") surround literal text, except for a literal
233        quotation mark, which is surrounded by angle-brackets ("<" and
234        ">"). Unless stated otherwise, the text is case-sensitive.

236     rule1 | rule2
237        Alternative rules are separated by a vertical bar ("|").

239     (rule1 rule2 rule3)
240        Elements enclosed in parentheses are treated as a single element.

242     *rule
243        A rule preceded by an asterisk ("*") may have zero or more
244        occurrences. A rule preceded by an integer followed by an asterisk
245        must occur at least the specified number of times.

247     [rule]
248        An element enclosed in square brackets ("[" and "]") is optional.

250  2.2 Basic Rules

252     The following rules are used throughout this specification to
253     describe basic parsing constructs.

255        alpha         = lowalpha | hialpha
256        lowalpha      = "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" |
257                        "i" | "j" | "k" | "l" | "m" | "n" | "o" | "p" |
258                        "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" |
259                        "y" | "z"
260        hialpha       = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" |
261                        "I" | "J" | "K" | "L" | "M" | "N" | "O" | "P" |
262                        "Q" | "R" | "S" | "T" | "U" | "V" | "W" | "X" |
263                        "Y" | "Z"
264        digit         = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" |
265                        "8" | "9"
266        OCTET         =
267        CHAR          = alpha | digit | separator | "!" | "#" | "$" |
268                        "%" | "&" | "'" | "*" | "+" | "-" | "." | "`" |
269                        "^" | "_" | "{" | "|" | "}" | "~" | CTL
270        CTL           =
271        SP            =
272        HT            =
273        NL            =
274        LWSP          = SP | HT | NL
275        separator     = "(" | ")" | "<" | ">" | "@" | "," | ";" | ":" |
276                        "\" | <"> | "/" | "[" | "]" | "?" | "=" | "{" |
277                        "}" | SP | HT
278        token         = 1*
279        quoted-string = <"> *qdtext <">
280        qdtext        = and CTLs but including LWSP>
281        TEXT          =

283     Note that newline (NL) need not be a single control character, but
284     can be a sequence of control characters. A system MAY define TEXT to
285     be a larger set of characters than .

288  2.3 URL Encoding

290     Some variables and constructs used here are described as being `URL-
291     encoded'. This encoding is described in section 2 of RFC 2396 [3]. In
292     a URL-encoded string an escape sequence consists of a percent
293     character ("%") followed by two hexadecimal digits, where the two
294     hexadecimal digits form an octet. An escape sequence represents the
295     graphic character which has the octet as its code within the US-ASCII
296     [20] coded character set, if it exists. Currently there is no
297     provision within the URI syntax to identify which character set non-
298     ASCII codes represent, so CGI handles this issue on an ad-hoc basis
299     for each case.

301     Note that some unsafe (reserved) characters may have different
302     semantics when encoded. The definition of which characters are unsafe
303     depends on the context; see section 2 of RFC 2396 [3], updated by RFC
304     2732 [11], for an authoritative treatment. These reserved characters
305     are generally used to provide syntactic structure to the character
306     string, for example as field separators. In all cases, the string is
307     first processed with regard to any reserved characters present, and
308     then the resulting data can be URL-decoded by replacing "%" escapes
309     by their character values.

311     To encode a character string, all reserved and forbidden characters
312     are replaced by the corresponding "%" escapes. The string can then be
313     used in assembling a URI. The reserved characters will vary from
314     context to context, but will always be drawn from this set:

316        reserved = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" | "$" |
317                   "," | "[" | "]"

319     The last two characters were added by RFC 2732 [11]. In any
320     particular context, a sub-set of these characters will be reserved;
321     the other characters from this set MUST NOT be encoded when a string
322     is URL-encoded in that context. Other basic rules used to describe
323     URI syntax are:

325        hex        = digit | "A" | "B" | "C" | "D" | "E" | "F" | "a" | "b"
326                     | "c" | "d" | "e" | "f"
327        escaped    = "%" hex hex
328        unreserved = alpha | digit | mark
329        mark       = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"

331  3 Invoking the Script

333  3.1 Server Responsibilities

335     The server acts as an application gateway. It receives the request
336     from the client, selects a CGI script to handle the request, converts
337     the request to a CGI request, executes the script and converts the CGI
338     response into a response for the client. When processing the client
339     request, it is responsible for implementing any protocol or transport
340     level authentication and security. The server MAY also function in a
341     `non-transparent' manner, modifying the request or response in order
342     to provide some additional service, such as media type transformation
343     or protocol reduction.

345     The server MUST perform translations and protocol conversions on the
346     request data required by this specification. Furthermore, the server
347     retains its responsibility to the client to conform to the network
348     protocol even if the CGI script fails to conform to this
349     specification.

351     If the server is applying authentication to the request, then it MUST
352     NOT execute the script unless the request passes all defined access
353     controls.

355  3.2 Script Selection

357     The server determines the CGI script to be executed based on a
358     generic-form URI supplied by the client. This URI includes a
359     hierarchical path with components separated by "/". For any
360     particular request, the server will identify all or a leading part of
361     this path with an individual script, thus placing the script at a
362     particular point in the path hierarchy. The remainder of the path, if
363     any, identifies a resource or sub-resource identifier to be
364     interpreted by the script.

366     Information about this split of the path is available to the script
367     in the meta-variables, described below. Support for non-hierarchical
368     URI schemes is outside the scope of this specification.
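
   The path split described in section 3.2 can be illustrated with a
   short sketch. It is not part of this specification. The prefix table
   SCRIPT_PREFIXES and the function split_script_path are hypothetical
   names, and longest-prefix matching is only one possible policy; the
   actual mapping is defined by the server implementation and its
   configuration.

```python
# Hypothetical table of URL path prefixes that identify scripts; a real
# server would derive this from its own configuration.
SCRIPT_PREFIXES = ["/cgi-bin/somescript", "/cgi-bin/other"]


def split_script_path(url_path, prefixes=SCRIPT_PREFIXES):
    """Split a hierarchical URL path into (script_path, extra_path).

    The script is identified with all or a leading part of the path;
    the remainder, if any, is left for the script to interpret.
    Returns None when no configured script matches.
    """
    # Try longer prefixes first so that nested scripts win (an assumption
    # of this sketch, not a rule of the specification).
    for prefix in sorted(prefixes, key=len, reverse=True):
        if url_path == prefix or url_path.startswith(prefix + "/"):
            return prefix, url_path[len(prefix):]
    return None
```

   Under these assumptions the path /cgi-bin/somescript/a/b would be
   split into the script path /cgi-bin/somescript and the remainder
   /a/b, while a path matching no prefix is not treated as a CGI
   request at all.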

370  3.3 The Script-URI

372     The mapping from request URI to choice of script is defined by the
373     particular server implementation and its configuration. The server
374     MAY allow the script to be identified with a set of several different
375     URI path hierarchies, and therefore is permitted to replace the URI
376     by other members of this set during processing and generation of the
377     meta-variables. The server

379        - MAY preserve the URI in the particular request; or

381        - MAY select a canonical URI from the set of possible values for
382          each script; or

384        - can implement any other selection of URI from the set.

386     From the meta-variables thus generated, a URI, the `Script-URI', can
387     be constructed. This MUST have the property that if the client had
388     accessed this URI instead, then the script would have been executed
389     with the same values for the PATH_INFO and QUERY_STRING meta-
390     variables. The Script-URI has the syntax of a generic URI as defined
391     in section 3 of RFC 2396 [3], with the exception that object
392     parameters and fragment identifiers are not permitted. The various
393     components of the Script-URI are defined by some of the meta-
394     variables (see below):

396        script-URI    = scheme "://" server-name [ ":" server-port ]
397                        [ script-path [ extra-path ["?" query-string] ] ]
398        script-path   = abs-path
399        extra-path    = abs-path
400        abs-path      = "/" path-segments
401        path-segments = segment *( "/" segment)
402        segment       = *lchar
403        lchar         = unreserved | escaped | extra
404        extra         = ":" | "@" | "&" | "=" | "+" | "$" | ","

406     where `scheme' is found from SERVER_PROTOCOL, and script-path and
407     extra-path are URL-encoded versions of SCRIPT_NAME and PATH_INFO,
408     respectively, with ";", "=" and "?" reserved. See section 4.1.5 for
409     more information about the PATH_INFO meta-variable.

411     The scheme and the protocol are not identical as the scheme
412     identifies an access method in addition to a protocol. For instance,
413     a resource accessed using Transport Layer Security (TLS) [7] may have
414     a request URI with a scheme of https whilst using the HTTP protocol
415     [16]. CGI/1.1 provides no generic means for the script to reconstruct
416     this, and therefore the Script-URI as defined includes the base
417     protocol used. However, a script MAY make use of scheme-specific
418     meta-variables to better deduce the URI scheme.

420     Note that this definition also allows URIs to be constructed which
421     would invoke the script with any permissible values for the path-info
422     or query-string, by modifying the appropriate components.

424  3.4 Execution

426     The script is invoked in a system defined manner. Unless specified
427     otherwise, the file containing the script will be invoked as an
428     executable program.

430  4 The CGI Request

432     Information about a request comes from two different sources: the
433     request meta-variables and any associated message-body.

435  4.1 Request Meta-Variables

437     Meta-variables contain data about the request passed from the server
438     to the script, and are accessed by the script in a system defined
439     manner. Meta-variables are identified by case-insensitive names;
440     there cannot be two different variables whose names differ in case
441     only. Here they are shown using a canonical representation of
442     capitals plus underscore ("_"). A particular system can define a
443     different representation.
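
   As an illustration of case-insensitive meta-variable names, the
   following sketch looks a name up without regard to case. It assumes
   that meta-variables are exposed as operating-system environment
   variables, which section 1.4 describes as the most common
   implementation but which is not required. The function name
   get_meta_variable is hypothetical, and returning an empty string
   for an unset variable is a choice made for this sketch.

```python
import os


def get_meta_variable(name, environ=None):
    """Return the value of a meta-variable, matching its name
    case-insensitively; return "" when the variable is not set.

    `environ` defaults to the process environment (an assumption of
    this sketch; access to meta-variables is system defined).
    """
    env = os.environ if environ is None else environ
    wanted = name.upper()  # canonical form: capitals plus underscore
    for key, value in env.items():
        if key.upper() == wanted:
            return value
    return ""
```

   A script written this way behaves the same whether a particular
   system presents the name as REQUEST_METHOD or request_method.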

445        meta-variable-name = "AUTH_TYPE" | "CONTENT_LENGTH" |
446                             "CONTENT_TYPE" | "GATEWAY_INTERFACE" |
447                             "PATH_INFO" | "PATH_TRANSLATED" |
448                             "QUERY_STRING" | "REMOTE_ADDR" |
449                             "REMOTE_HOST" | "REMOTE_IDENT" |
450                             "REMOTE_USER" | "REQUEST_METHOD" |
451                             "SCRIPT_NAME" | "SERVER_NAME" |
452                             "SERVER_PORT" | "SERVER_PROTOCOL" |
453                             "SERVER_SOFTWARE" | scheme |
454                             protocol-var-name | extension-var-name
455        protocol-var-name  = ( protocol | scheme ) "_" var-name
456        var-name           = token
457        extension-var-name = token

459     Meta-variables with the name of a scheme, and names beginning with
460     the name of a protocol or scheme (e.g. HTTP_ACCEPT) may also be
461     specified. The number and meaning of these variables may change
462     independently of this specification. (See also section 4.1.18.)

464     The server MAY define additional implementation-specific extension
465     meta-variables, whose names SHOULD be prefixed with `X_'.

467     This specification does not distinguish between zero-length (NULL)
468     values and missing values. For example, a script cannot distinguish
469     between the requests http://host/script and http://host/script? ; in
470     both cases the QUERY_STRING meta-variable would be NULL. An optional
471     meta-variable may be omitted (left unset) if its value is NULL.

473        meta-variable-value = "" |

475     Meta-variable values MUST be considered case-sensitive except as
476     noted otherwise. The representation of the characters in the meta-
477     variables is system defined; the server MUST convert values to that
478     character set.

480  4.1.1 AUTH_TYPE

482     The AUTH_TYPE variable identifies any mechanism used by the server to
483     authenticate the user. Currently defined values are specific to
484     requests made via the HTTP protocol.

486     If the client request required authentication for external access,
487     then the server MUST set the value of this variable from the `auth-
488     scheme' token in the request Authorization HTTP header field.
489     Otherwise the variable is set to NULL. The syntax for this
490     variable is described in RFC 2617 [9]:

492        AUTH_TYPE   = "" | auth-scheme
493        auth-scheme = "Basic" | "Digest" | token

495     HTTP access authentication schemes are described in section 11 of the
496     HTTP/1.1 specification [8]. The auth-scheme is not case-sensitive.

498  4.1.2 CONTENT_LENGTH

500     The CONTENT_LENGTH variable contains the size of the message-body
501     entity attached to the request, if any, in decimal number of octets.
502     If no data is attached, then the variable is NULL (or unset).

504        CONTENT_LENGTH = "" | 1*digit

506     The server MUST set this meta-variable if the request is accompanied
507     by a message-body entity. The CONTENT_LENGTH value must reflect the
508     length of the message-body after the server has removed any transfer-
509     codings or content-codings.

511  4.1.3 CONTENT_TYPE

513     If the request includes a message-body, the CONTENT_TYPE variable is
514     set to the Internet Media Type [10] of the attached entity.

516        CONTENT_TYPE = "" | media-type
517        media-type   = type "/" subtype *( ";" parameter )
518        type         = token
519        subtype      = token
520        parameter    = attribute "=" value
521        attribute    = token
522        value        = token | quoted-string

524     The type, subtype and parameter attribute names are not case-
525     sensitive. Parameter values may be case sensitive. Media types and
526     their use in HTTP are described in section 3.7 of the HTTP/1.1
527     specification [8].

529     There is no default value for this variable. If and only if it is
530     unset, then the script MAY attempt to determine the media type from
531     the data received. If the type remains unknown, then the script MAY
532     choose to assume a type of application/octet-stream or it may reject
533     the request with an error (as described in section 6.3.3).

535     Each media-type defines a set of optional and mandatory parameters.
536     This may include a charset parameter with a case-insensitive value
537     defining the coded character set for the attached entity. If the
538     charset parameter is omitted, then the default value should be
539     derived according to whichever of the following rules is the first to
540     apply:

542        - There MAY be a system-defined default charset for some media-
543          types.

545        - The default for media-types of type `text' is ISO-8859-1 [8].

547        - Any default defined in the media-type specification.

549        - The default is US-ASCII.

550     The server MUST set this meta-variable if an HTTP Content-Type field
551     is present in the original request header. If the server receives a
552     request with an attached entity but no Content-Type header field, it
553     MAY attempt to determine the correct content type, otherwise it
554     should omit this meta-variable.

556  4.1.4 GATEWAY_INTERFACE

558     The GATEWAY_INTERFACE variable MUST be set to the dialect of CGI
559     being used by the server to communicate with the script. Syntax:

561        GATEWAY_INTERFACE = "CGI" "/" 1*digit "." 1*digit

563     Note that the major and minor numbers are treated as separate
564     integers and hence each may be incremented higher than a single
565     digit. Thus CGI/2.4 is a lower version than CGI/2.13 which in turn is
566     lower than CGI/12.3. Leading zeros MUST be ignored by the script and
567     MUST NOT be generated by the server.

569     This document defines the 1.1 version of the CGI interface.

571  4.1.5 PATH_INFO

573     The PATH_INFO variable specifies a path to be interpreted by the CGI
574     script. It identifies the resource or sub-resource to be returned by
575     the CGI script, and MUST be derived from the portion of the URI
576     path hierarchy following that part that identifies the script itself.
577     Unlike a URI path, the PATH_INFO is not URL-encoded, and cannot
578     contain path-segment parameters. A PATH_INFO of "/" represents a
579     single void path segment.

      PATH_INFO = "" | ( "/" path )
      path      = psegment *( "/" psegment )
      psegment  = *pchar
      pchar     =

   The value is considered case-sensitive and the server MUST preserve
   the case of the path as presented in the request URI. The server MAY
   impose restrictions and limitations on what values it permits for
   PATH_INFO, and MAY reject the request with an error if it encounters
   any values considered objectionable. Similarly, treatment of non
   US-ASCII characters in the path is system defined.

   URL-encoded, the PATH_INFO string forms the extra-path component of
   the Script-URI (see section 3.2) that follows the SCRIPT_NAME part of
   that path.

4.1.6 PATH_TRANSLATED

   The PATH_TRANSLATED variable is derived by taking the PATH_INFO,
   parsing it as a URI in its own right, and performing any
   virtual-to-physical translation appropriate to map it onto the
   server's document repository structure.

      PATH_TRANSLATED = *TEXT

   This is the file location that would be accessed by a request for

      scheme "://" server-name ":" server-port enc(PATH_INFO)

   where `scheme' is found from SERVER_PROTOCOL (as described in section
   3.2) and `enc(PATH_INFO)' is a URL-encoded version of PATH_INFO, with
   ";", "=" and "?" reserved. For example, given a request such as the
   following:

      http://somehost.com/cgi-bin/somescript/this%2eis%2ethe%2epath%3binfo

   the PATH_INFO component would be decoded, and the result parsed as
   though it were a request for the following:

      http://somehost.com/this.is.the.path%3binfo

   This would then be translated to a location in the server's document
   repository, perhaps a filesystem path something like this:

      /usr/local/www/htdocs/this.is.the.path;info

   The result of the translation is the value of PATH_TRANSLATED.

   The value of PATH_TRANSLATED may or may not map to a valid repository
   location. The server MUST preserve the case of the path-info segment
   if and only if the underlying repository supports case-sensitive
   names. If the repository is only case-aware, case-preserving, or
   case-blind with regard to document names, the server is not required
   to preserve the case of the original segment through the translation.

   The set of characters permitted in the repository location is system
   defined.

   The translation algorithm the server uses to derive PATH_TRANSLATED
   is implementation defined; CGI scripts which use this variable may
   suffer limited portability.

   The server SHOULD set this meta-variable if the request URI includes
   a path-info component. If PATH_INFO is NULL, then the PATH_TRANSLATED
   variable MUST be set to NULL (or unset).

4.1.7 QUERY_STRING

   The QUERY_STRING variable contains a URL-encoded search or parameter
   string; it provides information to the CGI script to affect or refine
   the document to be returned by the script.

   The URL syntax for a search string is described in section 3 of RFC
   2396 [3]. The QUERY_STRING value is case-sensitive.

      QUERY_STRING = query-string
      query-string = *uric
      uric         = reserved | unreserved | escaped

   When parsing and decoding the query string, the detail of the
   parsing, reserved characters and non US-ASCII characters depends on
   the context. For example, form submission from an HTML document [15]
   uses application/x-www-form-urlencoded encoding, in which the
   characters "+", "&" and "=" are reserved, and the ISO 8859-1 encoding
   may be used for non US-ASCII characters.

   The QUERY_STRING value provides the query-string part of the
   Script-URI. (See section 3.2.)

   The server MUST set this variable; if the Script-URI does not include
   a query component, the QUERY_STRING MUST be defined as an empty
   string ("").
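
   As an illustration of the form-submission case in section 4.1.7, the
   following Python sketch decodes QUERY_STRING with the standard
   library. The function name parse_form_query and the sample query are
   hypothetical, and ISO-8859-1 is only one encoding the text permits;
   a real script would choose the encoding from its own context.

```python
import os
from urllib.parse import parse_qs


def parse_form_query(environ=os.environ):
    """Decode QUERY_STRING as application/x-www-form-urlencoded data."""
    # The server MUST set QUERY_STRING; a missing value is treated here
    # like the empty string (an assumption of this sketch).
    query = environ.get("QUERY_STRING", "")
    # "+", "&" and "=" are reserved in this encoding. ISO-8859-1 is
    # assumed for non US-ASCII octets.
    return parse_qs(query, keep_blank_values=True, encoding="iso-8859-1")


if __name__ == "__main__":
    sample = {"QUERY_STRING": "name=J%FCrgen&tag=a+b&tag=c&empty="}
    print(parse_form_query(sample))
```

   Repeated names are kept as lists, so a script does not silently lose
   values when a form field occurs more than once.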

4.1.8 REMOTE_ADDR

   The REMOTE_ADDR variable MUST be set to the network address of the
   client sending the request to the server.

      REMOTE_ADDR  = hostnumber
      hostnumber   = ipv4-address | ipv6-address
      ipv4-address = 1*3digit "." 1*3digit "." 1*3digit "." 1*3digit
      ipv6-address = hexpart [ ":" ipv4-address ]
      hexpart      = hexseq | ( [ hexseq ] "::" [ hexseq ] )
      hexseq       = 1*4hex *( ":" 1*4hex )

   The format of IPv6 addresses is defined in RFC 2373 [12].

4.1.9 REMOTE_HOST

   The REMOTE_HOST variable contains the fully qualified domain name of
   the client sending the request to the server, if available, otherwise
   NULL. Fully qualified domain names take the form as described in
   section 3.5 of RFC 1034 [14] and section 2.1 of RFC 1123 [4]. Domain
   names are not case sensitive.

      REMOTE_HOST   = "" | hostname | hostnumber
      hostname      = *( domainlabel "." ) toplabel
      domainlabel   = alphanum [ *alphahypdigit alphanum ]
      toplabel      = alpha [ *alphahypdigit alphanum ]
      alphahypdigit = alphanum | "-"

   The server SHOULD set this variable. If the hostname is not available
   for performance reasons or otherwise, the server MAY substitute the
   REMOTE_ADDR value.

4.1.10 REMOTE_IDENT

   The REMOTE_IDENT variable MAY be used to provide identity
   information reported about the connection by an RFC 1413 [17] request
   to the remote agent, if available. The server may choose not to
   support this feature, or not to request the data for efficiency
   reasons, or not to return available identity data. The server should

      REMOTE_IDENT = *TEXT

   The data returned may be used for authentication purposes, but the
   level of trust reposed in it should be minimal.

4.1.11 REMOTE_USER

   The REMOTE_USER variable provides a user identification string
   supplied by the client as part of user authentication.

      REMOTE_USER = *TEXT

   If the request required HTTP Authentication [9] (i.e. the AUTH_TYPE
   meta-variable is set to `Basic' or `Digest'), then the value of the
   REMOTE_USER meta-variable MUST be set to the user-ID supplied.

4.1.12 REQUEST_METHOD

   The REQUEST_METHOD meta-variable MUST be set to the method that
   should be used by the script to process the request, as described in
   section 5.1.1 of the HTTP/1.0 specification [2] and section 5.1.1 of
   the HTTP/1.1 specification [8].

      REQUEST_METHOD   = method
      method           = "GET" | "POST" | "HEAD" | extension-method
      extension-method = "PUT" | "DELETE" | token

   The method is case sensitive. The methods are described in section
   4.3.

4.1.13 SCRIPT_NAME

   The SCRIPT_NAME variable MUST be set to a URI path that could
   identify the CGI script (rather than the script's output). The syntax
   is the same as for PATH_INFO (section 4.1.5).

      SCRIPT_NAME = "" | ( "/" path )

   The leading "/" is not part of the path. It is optional if the path
   is NULL; however, the variable MUST still be set in that case.

   The SCRIPT_NAME string forms some leading part of the path component
   of the Script-URI derived in some implementation defined manner. No
   PATH_INFO segment (see section 4.1.5) is included in the SCRIPT_NAME
   value.

4.1.14 SERVER_NAME

   The SERVER_NAME variable MUST be set to the name of the server host
   to which the client request is directed. It is a case-insensitive
   hostname or network address. It forms the host part of the
   Script-URI. The syntax for an IPv6 address in a URI is defined in RFC
   2732 [11].

      SERVER_NAME = server-name
      server-name = hostname | ipv4-address | ( "[" ipv6-address "]" )

   A deployed server can have more than one possible value for this
   variable, where several HTTP virtual hosts share the same IP address.
   In that case, the server uses the contents of the Host header to
   select the correct virtual host.

4.1.15 SERVER_PORT

   The SERVER_PORT variable MUST be set to the TCP/IP port number on
   which this request is received from the client. This value is used in
   the port part of the Script-URI.

      SERVER_PORT = server-port
      server-port = 1*digit

   Note that this variable MUST be set to the port number, even if the
   port is the default port for the scheme and could otherwise be
   omitted from a URI.

4.1.16 SERVER_PROTOCOL

   The SERVER_PROTOCOL variable MUST be set to the name and revision of
   the application protocol used for this CGI request. This is not
   necessarily the same as the protocol version used by the server in
   its response to the client.

      SERVER_PROTOCOL   = HTTP-Version | "INCLUDED" | extension-version
      HTTP-Version      = "HTTP" "/" 1*digit "." 1*digit
      extension-version = protocol [ "/" 1*digit "." 1*digit ]
      protocol          = token

   `protocol' is a version of the scheme part of the Script-URI, and is
   not case sensitive. By convention, `protocol' is in upper case. The
   protocol may not be identical to the scheme of the request; for
   example, the request may have scheme `https', whilst the protocol is
   `HTTP'.

   A well-known value for SERVER_PROTOCOL which the server MAY use is
   `INCLUDED', which signals that the current document is being included
   as part of a composite document, rather than being the direct target
   of the client request. The script MAY treat this as an HTTP/1.0
   request.

   The server MUST set this meta-variable.

4.1.17 SERVER_SOFTWARE

   The SERVER_SOFTWARE meta-variable MUST be set to the name and version
   of the information server software answering the request (and running
   the gateway). It SHOULD be the same as the server description
   reported to the client, if any.

      SERVER_SOFTWARE = 1*( product | comment )
      product         = token [ "/" product-version ]
      product-version = token
      comment         = "(" *( ctext | comment ) ")"
      ctext           =

4.1.18 Protocol-Specific Meta-Variables

   The server SHOULD set meta-variables specific to the protocol and
   scheme for the request. Interpretation of protocol-specific variables
   depends on the protocol version in SERVER_PROTOCOL. The server MAY
   set a meta-variable with the name of the scheme to a non-NULL value
   if the scheme is different to the protocol. The presence of such a
   variable indicates to a script which scheme is used by the request.

   Meta-variables with names beginning with `HTTP_' contain values read
   from the client request header fields, if the protocol used is HTTP.
   The HTTP header field name is converted to upper case, has all
   occurrences of "-" replaced with "_" and has `HTTP_' prepended to
   give the meta-variable name. The header data can be presented as sent
   by the client, or can be rewritten in ways which do not change its
   semantics. If multiple header fields with the same field-name are
   received then the server MUST rewrite them as a value having the
   same semantics. Similarly, a header field that is received on more
   than one line must be merged onto a single line. The server MUST, if
   necessary, change the representation of the data (for example, the
   character set) to be appropriate for a CGI meta-variable.

   The server is not required to create meta-variables for all the
   headers that it receives. In particular, it SHOULD remove any headers
   carrying authentication information, such as `Authorization'; or
   which are available to the script via other variables, such as
   `Content-Length' and `Content-Type'. The server MAY remove headers
   which relate solely to client-side communication issues, such as
   `Connection'.
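
   The naming rule in section 4.1.18 can be sketched in Python as
   follows. The names SUPPRESSED, header_to_meta_name and
   build_http_meta_variables are hypothetical. Joining repeated fields
   with ", " follows HTTP's list-valued header convention and is an
   assumption of this sketch, as is the exact set of suppressed headers;
   a server may make different choices within the rules above.

```python
import re

# Headers this sketch withholds from the script (illustrative only):
# authentication data, and fields already exposed via other variables.
SUPPRESSED = {"authorization", "proxy-authorization",
              "content-length", "content-type"}


def header_to_meta_name(field_name):
    """Upper-case the field name, map "-" to "_" and prepend HTTP_."""
    return "HTTP_" + field_name.upper().replace("-", "_")


def build_http_meta_variables(headers):
    """Build HTTP_* meta-variables from a list of (name, value) pairs."""
    meta = {}
    for name, value in headers:
        if name.lower() in SUPPRESSED:
            continue
        # Merge a header received on more than one line onto one line.
        value = re.sub(r"\r?\n[ \t]+", " ", value).strip()
        key = header_to_meta_name(name)
        if key in meta:
            # Assumed rewrite for repeated fields: comma-separated list.
            meta[key] = meta[key] + ", " + value
        else:
            meta[key] = value
    return meta
```

   For example, header_to_meta_name("User-Agent") gives
   "HTTP_USER_AGENT".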

4.2 Request Message-Body

   As there may be a data entity attached to the request, there MUST be
   a system defined method for the script to read this data. Unless
   defined otherwise, this will be via the `standard input' file
   descriptor.

   If the CONTENT_LENGTH is not NULL, the server MUST make at least that
   many bytes available for the script to read. The script is not
   obliged to read the data. The server MAY signal an end-of-file
   condition after CONTENT_LENGTH bytes have been read, but is not
   obliged to do so. Therefore, the script MUST NOT attempt to read
   more than CONTENT_LENGTH bytes, even if more data is available.

   For non-parsed header (NPH) scripts (section 5), the server SHOULD
   attempt to ensure that the data supplied to the script is precisely
   as supplied by the client and is unaltered by the server.

   For a regular parsed-header script, the server MUST remove any
   transfer-codings from the message-body (and re-calculate the
   CONTENT_LENGTH), and it MAY remove any content-codings.

4.3 Request Methods

   The Request Method, as supplied in the REQUEST_METHOD meta-variable,
   identifies the processing method to be applied by the script in
   producing a response. The script author can choose to implement the
   methods most appropriate for the particular application. If the
   script receives a request with a method it does not support, it
   SHOULD reject it with an error (see section 6.3.3).

4.3.1 GET

   The GET method indicates that the script should produce a document
   based on the meta-variable values. By convention, the GET method is
   `safe' and `idempotent' and SHOULD NOT have the significance of
   taking an action other than producing a document.

   The meaning of the GET method may be modified and refined by
   protocol-specific meta-variables.
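
   For methods that carry a message body, the reading rule of section
   4.2 can be sketched in Python as below. The function name
   read_request_body is hypothetical, and treating an unset or empty
   CONTENT_LENGTH as "no body" is an assumption of this sketch.

```python
def read_request_body(environ, stream):
    """Read at most CONTENT_LENGTH bytes from a binary stream."""
    raw_length = environ.get("CONTENT_LENGTH", "")
    if raw_length == "":
        return b""  # no entity attached
    remaining = int(raw_length)  # ValueError if the value is malformed
    chunks = []
    # Never wait for end-of-file: the server is not obliged to signal
    # it, so stop as soon as CONTENT_LENGTH bytes have been read.
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:
            break  # fewer bytes were available than announced
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```

   On a UNIX system a script would typically pass os.environ and
   sys.stdin.buffer as the two arguments.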

4.3.2 POST

   The POST method is used to request the script perform processing and
   produce a document based on the data in the request message body, in
   addition to meta-variable values. A common use is form submission in
   HTML [15], intended to initiate processing by the script that has a
   permanent effect, such as a change in a database.

   The script MUST check the value of the CONTENT_LENGTH variable before
   reading the attached message body, and SHOULD check the CONTENT_TYPE
   value before processing it.

4.3.3 HEAD

   The HEAD method requests the script to do sufficient processing to
   return the response header fields, without providing a response
   message body. The script MUST NOT provide a response message body for
   a HEAD request. If it does, conformance to the HTTP standard would
   REQUIRE a server to remove the response message body when returning
   the response to the client.

4.3.4 Protocol-Specific Methods

   The script MAY implement any protocol-specific method, such as
   HTTP/1.1 PUT and DELETE; it SHOULD check the value for
   SERVER_PROTOCOL when doing so.

   The server MAY decide that some methods are not appropriate or
   permitted for a script, and may handle the methods itself or return
   an error to the client.

4.4 The Script Command Line

   Some systems support a method for supplying an array of strings to
   the CGI script. This is only used in the case of an `indexed' HTTP
   query. This is identified by a `GET' or `HEAD' request with a URI
   query string not containing any unencoded "=" characters. For such a
   request, the server SHOULD treat the query-string as a search-string
   and parse it into words, using the rules

      search-string = search-word *( "+" search-word )
      search-word   = 1*schar
      schar         = unreserved | escape | xreserved
      xreserved     = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "," |
                      "$"

   After parsing, each search-word is URL-decoded, optionally encoded in
   a system defined manner and then added to the argument list.

   If the server cannot create any part of the argument list, then the
   server MUST NOT generate any command line information. For example,
   the number of arguments may be greater than operating system or
   server limitations, or one of the words may not be representable as
   an argument.

   The script SHOULD check to see if the QUERY_STRING value contains an
   unencoded "=" character, and SHOULD NOT use the command line
   arguments if it does.

5 NPH Scripts

5.1 Identification

   The server MAY support NPH (Non-Parsed Header) scripts; these are
   scripts to which the server passes all responsibility for response
   processing.

   This specification provides no mechanism for an NPH script to be
   identified on the basis of its output data alone. By convention,
   therefore, any particular script can only ever provide output of one
   type (NPH or CGI) and hence the script itself is described as an `NPH
   script'. A server with NPH support MUST provide an
   implementation-defined mechanism for identifying NPH scripts, perhaps
   based on the name or location of the script.

5.2 NPH Response

   There MUST be a system defined method for the script to send data
   back to the server or client; a script MUST always return some data.
   Unless defined otherwise, this will be the same as for conventional
   CGI scripts.

   Currently, NPH scripts are only defined for HTTP client requests. An
   (HTTP) NPH script MUST return a complete HTTP response message, as
   described in section 6 of the HTTP specifications [2], [8], as
   revised from time to time. The script MUST use the SERVER_PROTOCOL
   variable to determine the appropriate format for a response. It MUST
   also take account of any generic or protocol-specific meta-variables
   in the request as might be mandated by the particular protocol
   specification.

   The server MUST ensure that the script output is sent to the client
   unmodified. Note that this requires the script to use the correct
   character set (US-ASCII [20] and ISO-Latin-1 [21] for HTTP) in the
   headers. The server SHOULD attempt to ensure that the script output
   is sent directly to the client, with minimal internal and no
   transport-visible buffering.

   Unless the implementation defines otherwise, the script MUST NOT
   indicate in its response that the client can send further requests
   over the same connection.

6 CGI Response

6.1 Response Handling

   A script MUST always provide a non-empty response, and so there MUST
   be a system defined method for it to send this data back to the
   server or client. Unless defined otherwise, this will be via the
   `standard output' file descriptor.

   The script MUST check the REQUEST_METHOD variable when processing the
   request and preparing its response.

   The server MAY implement a timeout period within which data must be
   received from the script. If a server implementation defines such a
   timeout and receives no data from a script within the timeout period,
   the server MAY terminate the script process.

6.2 Response Types

   The response comprises a header and a body, separated by a blank
   line. The body may be NULL.

      generic-response = 1*header-field NL [Entity-Body]

   The script MUST return one of either a document response, a local
   redirect response or a client redirect (with optional document)
   response. In the response definitions below, the order of header
   fields in a response is not significant (despite appearing so in the
   BNF). The header fields are defined in section 6.3.

      CGI-Response = document-response | local-redir-response |
                     client-redir-response | client-redirdoc-response

6.2.1 Document Response

   The CGI script can return a document to the user in a document
   response, with an optional error code indicating the success status
   of the response.

      document-response = Content-Type [ Status ] *other-field NL
                          Entity-Body

   The script MUST return a Content-Type header field. A Status header
   field is optional, and status 200 `OK' is assumed if it is omitted.
   The server MUST make any appropriate modifications to the script's
   output to ensure that the response to the client complies with the
   response protocol version.

6.2.2 Local Redirect Response

   The CGI script can return a URI path and query-string
   (`local-pathquery') for a local resource in a Location header. This
   indicates to the server that it should re-process the request using
   the path specified.

      local-redir-response = local-Location NL

   The script MUST NOT return any other header fields or an entity body,
   and the server MUST generate the response that it would have produced
   in response to a request containing the URL

      scheme "://" server-name ":" server-port local-pathquery

6.2.3 Client Redirect Response

   The CGI script can return an absolute URI path in a Location header,
   to indicate to the client that it should re-process the request using
   the URI specified.

      client-redir-response = client-Location *other-field NL

   The script MUST NOT provide any other header fields. For an HTTP
   client request, the server MUST generate a 302 `Found' HTTP response
   message.

6.2.4 Client Redirect Response with Document

   The CGI script can return an absolute URI path in a Location header
   together with an attached document, to indicate to the client that it
   should re-process the request using the URI specified.

      client-redirdoc-response = client-Location Status Content-Type
                                 *other-field NL Entity-Body

   The Status header field MUST be supplied and MUST contain a status
   value of 302 `Found'. The server MUST make any appropriate
   modifications to the script's output to ensure that the response to
   the client complies with the response protocol version.

6.3 Response Header Fields

   The header fields are either CGI or extension header fields to be
   interpreted by the server, or protocol-specific headers to be
   included in the response returned to the client. At least one CGI
   field MUST be supplied, and no CGI field can be used more than once
   in a response. The response headers have the syntax:

      header-field    = CGI-field | other-field
      CGI-field       = Content-Type | Location | Status
      other-field     = protocol-field | extension-field
      protocol-field  = generic-field
      extension-field = generic-field
      generic-field   = field-name ":" [ field-value ] NL
      field-name      = token
      field-value     = *( field-content | LWSP )
      field-content   = *( token | separator | quoted-string )

   The field-name is not case sensitive. A NULL field value is
   equivalent to a field not being sent. Note that each header field in
   a CGI-Response MUST be specified on a single line; CGI/1.1 does not
   support continuation lines. Whitespace is permitted between the ":"
   and the field-value (but not between the field-name and the ":"), and
   also between tokens in the field-value.

6.3.1 Content-Type

   The Content-Type response field sets the Internet Media Type [10] of
   the entity body, which SHOULD be sent unmodified to the client,
   except for any required transfer-codings or content-codings.

      Content-Type = "Content-Type:" media-type NL

   If an entity body is returned, the script MUST supply a Content-Type
   field in the response. If it fails to do so, the server SHOULD NOT
   attempt to determine the correct content type. This field MUST NOT
   appear more than once in the response.

   Unless it is otherwise system-defined, the default charset assumed by
   the client for text media-types is ISO-8859-1 if the protocol is HTTP
   and US-ASCII otherwise. Hence the script SHOULD include a charset
   parameter. See section 3.4.1 of the HTTP/1.1 specification [8] for a
   discussion of this.

6.3.2 Location

   The Location header field is used to specify to the server that the
   script is returning a reference to a document rather than an actual
   document. It is either an absolute URI (with fragment), indicating
   that the client is to fetch the referenced document, or a local path
   (with query string), indicating that the server is to fetch the
   referenced document.

      Location        = local-Location | client-Location
      client-Location = "Location:" fragment-URI NL
      local-Location  = "Location:" local-pathquery NL
      fragment-URI    = absoluteURI [ "#" fragment ]
      fragment        = *uric
      local-pathquery = abs-path [ "?" query-string ]

   The syntax of an absoluteURI is incorporated into this document from
   that specified in RFC 2396 [3] and RFC 2732 [11]. The two forms can
   be distinguished, as a local-pathquery must start with a "/"
   character, whereas an absoluteURI must start with a scheme; scheme
   names cannot contain "/" characters.

   Note that any message body attached to the request (such as for a
   POST request) may not be available to the resource that is the target
   of the redirect. This field MUST NOT appear more than once in the
   response.

6.3.3 Status

   The Status header field is used to indicate to the server what status
   code the server MUST use in the response message.

      Status        = "Status:" status-code SP reason-phrase NL
      status-code   = 200 | 302 | 400 | 501 | 3digit
      reason-phrase = *TEXT

   Status code 200 `OK' indicates success, and is the default value
   assumed for a document response. Status code 302 `Found' is used with
   a Location header-field and response entity body. Status code 400
   `Bad Request' may be used for an unknown request format, such as a
   missing CONTENT_TYPE. Status code 501 `Not Implemented' may be
   returned by a script if it receives an unsupported REQUEST_METHOD.

   Other valid status codes are listed in section 6.1.1 of the HTTP
   specifications [2], [8], and also the IANA HTTP Status Code Registry
   [18], and can be used in addition to or instead of the ones listed
   above. The script SHOULD check the value of SERVER_PROTOCOL before
   using HTTP/1.1 status codes.

   Note that returning an error status code does not have to mean an
   error condition with the script itself. For example, a script that is
   invoked as an error handler by the server should return the code
   appropriate to the server's error condition. This field MUST NOT
   appear more than once in the response.

   The reason-phrase is a textual description of the error to be
   returned to the client for human consumption.

6.3.4 Protocol-Specific Header Fields

   The script MAY return any other headers that relate to the response
   message defined by the specification for the SERVER_PROTOCOL
   (HTTP/1.0 [2] or HTTP/1.1 [8]). The server MUST translate the header
   data from the CGI header syntax to the HTTP header syntax if these
   differ. For example, the character sequence for newline (such as
   UNIX's US-ASCII LF) used by CGI scripts may not be the same as that
   used by HTTP (US-ASCII CR followed by LF).
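
   The translation described above can be sketched in Python for the
   simplest case, a document response (section 6.2.1). The function name
   cgi_to_http and its protocol parameter are hypothetical. The sketch
   deliberately ignores Location redirects, duplicate-field checks and
   the removal of client-side communication headers, all of which a real
   server would also have to handle.

```python
import re


def cgi_to_http(raw, protocol="HTTP/1.0"):
    """Turn a CGI document response (bytes) into an HTTP response."""
    # Header and body are separated by the first blank line; both LF
    # and CR LF are accepted as the CGI newline (NL) sequence.
    match = re.search(rb"\r?\n\r?\n", raw)
    if match is None:
        raise ValueError("no blank line after the CGI header")
    head, body = raw[:match.start()], raw[match.end():]

    status = "200 OK"  # default assumed when no Status field is sent
    fields = []
    for line in re.split(rb"\r?\n", head):
        name, sep, value = line.decode("iso-8859-1").partition(":")
        if not sep:
            raise ValueError("malformed header line")
        if name.lower() == "status":  # field names are not case sensitive
            status = value.strip()
        else:
            fields.append((name, value.strip()))

    # HTTP uses CR LF as its line terminator.
    lines = [f"{protocol} {status}"] + [f"{n}: {v}" for n, v in fields]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("iso-8859-1") + body
```

   The entity body is passed through untouched; only the header syntax
   is rewritten.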

   The script MUST NOT return any header fields that relate to
   client-side communication issues and could affect the server's
   ability to send the response to the client. The server MAY remove any
   such header fields returned by the script. It SHOULD resolve any
   conflicts between headers returned by the script and headers that it
   would otherwise send itself.

6.3.5 Extension Header Fields

   The server may define additional implementation-specific CGI header
   fields, whose field names SHOULD begin with `X-CGI-'. It MAY ignore
   (and delete) any unrecognised header-fields with names beginning
   `X-CGI-'.

6.4 Response Message Body

   The response entity body is a message body to be returned to the
   client by the server. The server MUST read all the data provided by
   the script, until the script signals the end of the entity body by
   way of an end of file condition.

      Entity-Body = *OCTET

7 System Specifications

7.1 AmigaDOS

   Meta-Variables
      The server SHOULD use environment variables as the mechanism of
      providing request meta-data to the CGI script. These are accessed
      by the DOS library routine GetVar. The flags argument SHOULD be 0.
      Case is ignored, but upper case is recommended for compatibility
      with case-sensitive systems.

   The current working directory
      The current working directory for the script is set to the
      directory containing the script.

   Character set
      The US-ASCII character set [20] is used for the definition of
      meta-variables, headers and values; the newline (NL) sequence is
      LF; servers SHOULD also accept CR LF as a newline.

7.2 UNIX

   For UNIX compatible operating systems, the following are defined:

   Meta-Variables
      The server MUST use environment variables as the mechanism of
      providing request meta-data to the CGI script. These are accessed
      by the C library routine getenv.

   The command line
      This is accessed using the argc and argv arguments to main().
      The words have any characters which are `active' in the Bourne
      shell escaped with a backslash.

   The current working directory
      The current working directory for the script SHOULD be set to the
      directory containing the script.

   Character set
      The US-ASCII character set [20], excluding NUL, is used for the
      definition of meta-variables, headers and CHAR values; TEXT values
      are ISO-8859-1. The newline (NL) sequence is LF; servers should
      also accept CR LF as a newline.

7.3 EBCDIC/POSIX

   For POSIX compatible operating systems using the EBCDIC character
   set, the following are defined:

   Meta-Variables
      The server MUST use environment variables as the mechanism of
      providing request meta-data to the CGI script. These are accessed
      by the C library routine getenv.

   The command line
      This is accessed using the argc and argv arguments to main().
      The words have any characters which are `active' in the Bourne
      shell escaped with a backslash.

   The current working directory
      The current working directory for the script SHOULD be set to the
      directory containing the script.

   Character set
      The EBCDIC-CP-US character set [19], excluding NUL, is used for
      the definition of meta-variables, headers and all values. The
      newline (NL) sequence is LF; servers should also accept CR LF as a
      newline.

   media-type charset default
      The default charset value for text (and other
      implementation-defined) media types is EBCDIC-CP-US.

8 Implementation

8.1 Recommendations for Servers

   Servers may reject with error 404 `Not Found' any requests that would
   result in an encoded "/" being decoded into PATH_INFO or SCRIPT_NAME,
   as this might represent a loss of information to the script.
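
   A server could implement this check along the following lines. This
   Python sketch uses the hypothetical names ENCODED_SLASH and
   decode_extra_path, and assumes ISO-8859-1 for decoded octets; it
   returns None where a server following the recommendation would answer
   404 `Not Found'.

```python
import re
from urllib.parse import unquote

# Matches an encoded "/" in either hex-digit case (%2F or %2f).
ENCODED_SLASH = re.compile(r"%2f", re.IGNORECASE)


def decode_extra_path(encoded_path):
    """Return the decoded path, or None if decoding would lose data."""
    if ENCODED_SLASH.search(encoded_path):
        # After decoding, this "/" could not be told apart from a real
        # path-segment separator, so the request is refused instead.
        return None
    return unquote(encoded_path, encoding="iso-8859-1")
```

   For instance, decode_extra_path("/a%2Fb") yields None, whereas an
   encoded "." or ";" is decoded normally.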

   Although the server and the CGI script need not be consistent in
   their handling of URL paths (client URLs and the PATH_INFO data,
   respectively), server authors may wish to impose consistency. So the
   server implementation should define its behaviour for the following
   cases:

      - define any restrictions on allowed path segments, in particular
        whether non-terminal NULL segments are permitted;

      - define the behaviour for "." or ".." path segments; i.e. whether
        they are prohibited, treated as ordinary path segments or
        interpreted in accordance with the relative URL specification
        [3];

      - define any limits of the implementation, including limits on
        path or search string lengths, and limits on the volume of
        headers the server will parse.

   Servers may generate the Script-URI in any way from the client URI,
   or from any other data (but the behaviour should be documented).

8.2 Recommendations for Scripts

   The server might interrupt or terminate script execution at any time
   and without warning, so the script SHOULD be prepared to handle
   abnormal termination.

   The script MAY reject with error 405 `Method Not Allowed' HTTP/1.1
   requests made using a method it does not support. If the script does
   not intend processing the PATH_INFO data, then it should reject the
   request with 404 `Not Found' if PATH_INFO is not NULL.

   If the output of a form is being processed, check that CONTENT_TYPE
   is `application/x-www-form-urlencoded' [15] or `multipart/form-data'
   [13]. If CONTENT_TYPE is blank, the script can reject the request
   with a 415 `Unsupported Media Type' error, where supported by the
   protocol.

   When parsing PATH_INFO, PATH_TRANSLATED or SCRIPT_NAME the script
   SHOULD be careful of void path segments ("//") and special path
   segments ("." and ".."). They SHOULD either be removed from the path
   before use in OS system calls, or the request SHOULD be rejected with
   404 `Not Found'.

   When returning headers, the script SHOULD try to send the CGI headers
   as soon as possible, and SHOULD send them before any HTTP headers.
   This may help reduce the server's memory requirements.

9 Security Considerations

9.1 Safe Methods

   As discussed in the security considerations of the HTTP
   specifications [2], [8], the convention has been established that the
   GET and HEAD methods should be `safe' and `idempotent'; they should
   cause no side-effects and only have the significance of resource
   retrieval. An idempotent request may be repeated an arbitrary number
   of times and produce side effects identical to a single request.

9.2 HTTP Headers Containing Sensitive Information

   Some HTTP headers may carry sensitive information which the server
   should not pass on to the script unless explicitly configured to do
   so. For example, if the server protects the script using the Basic
   authentication scheme, then the client will send an Authorization
   header containing a username and password. If the server, rather than
   the script, validates this information then it should not pass on the
   password via the HTTP_AUTHORIZATION meta-variable without careful
   consideration. This also applies to the Proxy-Authorization header
   field and the corresponding HTTP_PROXY_AUTHORIZATION meta-variable.

9.3 Data Privacy

   Confidential data in a request should be placed in a message-body as
   part of a POST request, and not placed in the URI or message headers.
   On some systems, the environment used to pass meta-variables to a
   script may be visible to other scripts or users. In addition, many
   existing servers, proxies and clients will log the URI where it might
   be visible to third parties.
1382  9.4 TLS Connection Endpoint

1384     For a connection using TLS, the security applies between the client
1385     and the server, and not between the client and the script. It is the
1386     server's responsibility to handle the TLS session, and thus it is the
1387     server that is authenticated to the client, not the CGI script.

1389  9.5 Server/Script Authentication

1391     This specification provides no mechanism for the script to
1392     authenticate the server that invoked it. There is no enforced
1393     integrity on the CGI request and response messages.

1395  9.6 Script Interference with the Server

1397     The most common implementation of CGI invokes the script as a child
1398     process using the same user and group as the server process. It
1399     should therefore be ensured that the script cannot interfere with the
1400     server process, its configuration, documents or log files.

1402     If the script is executed by calling a function linked in to the
1403     server software (either at compile-time or run-time) then precautions
1404     should be taken to protect the core memory of the server, or to
1405     ensure that untrusted code cannot be executed.

1407  9.7 Data Length and Buffering Considerations

1409     This specification places no limits on the length of the message-body
1410     presented to the script. The script should not assume that statically
1411     allocated buffers of any size are sufficient to contain the entire
1412     submission at one time. Use of a fixed length buffer without careful
1413     overflow checking may result in an attacker exploiting `stack-
1414     smashing' or `stack-overflow' vulnerabilities of the operating
1415     system. The script may spool large submissions to disk or other
1416     buffering media, but a rapid succession of large submissions may
1417     result in denial of service conditions.
         If the CONTENT_LENGTH of a
1418     message-body is larger than resource considerations allow, scripts
1419     should respond with an error status appropriate for the protocol
1420     version; potentially applicable status codes include 503 `Service
1421     Unavailable' (HTTP/1.0 and HTTP/1.1), 413 `Request Entity Too Large'
1422     (HTTP/1.1), and 414 `Request-URI Too Large' (HTTP/1.1).

1424     Similar considerations apply to the server's handling of the CGI
1425     response from the script. There is no limit on the length of the
1426     message body returned by the script; the server should not assume
1427     that statically allocated buffers of any size are sufficient to
1428     contain the entire response.

1430  9.8 Stateless Processing

1432     The stateless nature of the Web makes each script execution and
1433     resource retrieval independent of all others even when multiple
1434     requests constitute a single conceptual Web transaction. Because of
1435     this, a script should not make any assumptions about the context of
1436     the user-agent submitting a request. In particular, scripts should
1437     examine data obtained from the client and verify that they are valid,
1438     both in form and content, before allowing them to be used for
1439     sensitive purposes such as input to other applications, commands, or
1440     operating system services. These uses include, but are not limited
1441     to: system call arguments, database writes, dynamically evaluated
1442     source code, and input to billing or other secure processes. It is
1443     important that applications be protected from invalid input
1444     regardless of whether the invalidity is the result of user error,
1445     logic error, or malicious action.

1447     Authors of scripts involved in multi-request transactions should be
1448     particularly cautious about validating the state information;
1449     undesirable effects may result from the substitution of dangerous
1450     values for portions of the submission which might otherwise be
1451     presumed safe.
         Subversion of this type occurs when alterations are
1452     made to data from a prior stage of the transaction that were not
1453     meant to be controlled by the client (e.g., hidden HTML form
1454     elements, cookies, embedded URLs, etc.).

1456  9.9 Non-parsed Header Output

1458     If a script returns a non-parsed header output, to be interpreted by
1459     the client in its native protocol, then the script MUST address all
1460     security considerations relating to that protocol.

1462  10 Acknowledgements

1464     This work is based on the original CGI interface that arose out of
1465     discussions on the `www-talk' mailing list. In particular, Rob
1466     McCool, John Franks, Ari Luotonen, George Phillips and Tony Sanders
1467     deserve special recognition for their efforts in defining and
1468     implementing the early versions of this interface.

1470     This document has also greatly benefited from the comments and
1471     suggestions made by Chris Adie, Dave Kristol and Mike Meyer; also David
1472     Morris, Jeremy Madea, Patrick McManus, Adam Donahue, Ross Patterson
1473     and Harald Alvestrand.

1475  11 References

1477     [1] Berners-Lee, T., `Universal Resource Identifiers in WWW: A
1478         Unifying Syntax for the Expression of Names and Addresses of
1479         Objects on the Network as used in the World-Wide Web', RFC 1630,
1480         CERN, June 1994.

1482     [2] Berners-Lee, T., Fielding, R. T. and Frystyk, H., `Hypertext
1483         Transfer Protocol -- HTTP/1.0', RFC 1945, MIT/LCS, UC Irvine,
1484         May 1996.

1486     [3] Berners-Lee, T., Fielding, R. and Masinter, L., `Uniform
1487         Resource Identifiers (URI): Generic Syntax', RFC 2396, MIT/LCS,
1488         U.C. Irvine, Xerox Corporation, August 1998.

1490     [4] Braden, R., Editor, `Requirements for Internet Hosts --
1491         Application and Support', STD 3, RFC 1123, IETF, October 1989.

1493     [5] Bradner, S., `Key words for use in RFCs to Indicate Requirement
1494         Levels', BCP 14, RFC 2119, Harvard University, March 1997.
1496     [6] Crocker, D.H., `Standard for the Format of ARPA Internet Text
1497         Messages', STD 11, RFC 822, University of Delaware, August 1982.

1499     [7] Dierks, T. and Allen, C., `The TLS Protocol Version 1.0', RFC
1500         2246, Certicom, January 1999.

1502     [8] Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L.,
1503         Leach, P. and Berners-Lee, T., `Hypertext Transfer Protocol --
1504         HTTP/1.1', RFC 2616, UC Irvine, Compaq/W3C, Compaq, W3C/MIT,
1505         Xerox, Microsoft, W3C/MIT, June 1999.

1507     [9] Franks, J., Hallam-Baker, P., Hostetler, J., Lawrence, S.,
1508         Leach, P., Luotonen, A. and Stewart, L., `HTTP Authentication:
1509         Basic and Digest Access Authentication', RFC 2617, Northwestern
1510         University, Verisign Inc., AbiSource, Inc., Agranat Systems,
1511         Inc., Microsoft Corporation, Netscape Communications
1512         Corporation, Open Market, Inc., June 1999.

1514     [10] Freed, N. and Borenstein, N., `Multipurpose Internet Mail
1515          Extensions (MIME) Part Two: Media Types', RFC 2046, Innosoft,
1516          First Virtual, November 1996.

1518     [11] Hinden, R., Carpenter, B. and Masinter, L., `Format for Literal
1519          IPv6 Addresses in URL's', RFC 2732, Nokia, IBM, AT&T, December
1520          1999.

1522     [12] Hinden, R. and Deering, S., `IP Version 6 Addressing
1523          Architecture', RFC 2373, Nokia, Cisco Systems, July 1998.

1525     [13] Masinter, L., `Returning Values from Forms: multipart/form-
1526          data', RFC 2388, Xerox Corporation, August 1998.

1528     [14] Mockapetris, P., `Domain Names - Concepts and Facilities', STD
1529          13, RFC 1034, ISI, November 1987.

1531     [15] Raggett, Dave, Le Hors, Arnaud and Jacobs, Ian (eds), `HTML 4.01
1532          Specification', W3C Recommendation December 1999,
1533          http://www.w3.org/TR/html401/.

1535     [16] Rescorla, E., `HTTP Over TLS', RFC 2818, RTFM, May 2000.

1537     [17] St. Johns, M., `Identification Protocol', RFC 1413, US
1538          Department of Defense, February 1993.
1540     [18] `HTTP Status Code Registry',
1541          http://www.iana.org/assignments/http-status-codes, IANA.

1543     [19] IBM National Language Support Reference Manual Volume 2,
1544          SE09-8002-01, March 1990.

1546     [20] `Information Systems -- Coded Character Sets -- 7-bit American
1547          Standard Code for Information Interchange (7-Bit ASCII)', ANSI
1548          INCITS.4-1986 (R2002).

1550     [21] `Information technology -- 8-bit single-byte coded graphic
1551          character sets -- Part 1: Latin alphabet No. 1', ISO/IEC
1552          8859-1:1998.

1554     [22] `The Common Gateway Interface',
1555          http://hoohoo.ncsa.uiuc.edu/cgi/, NCSA, University of Illinois.

1557  12 Authors' Addresses

1559     David Robinson
1560     Apache Software Foundation
1561     Email: drtr@apache.org

1563     Ken A. L. Coar
1564     MeepZor Consulting
1565     7824 Mayfaire Crest Lane, Suite 202
1566     Raleigh, NC 27615-4875
1567     USA
1568     Tel: +1 (919) 254 4237
1569     Fax: +1 (919) 254 5420
1570     Email: Ken.Coar@Golux.com