INTERNET-DRAFT                                           David Robinson
draft-coar-cgi-v11-04.txt                    Apache Software Foundation
Expires 18 April 2004                                     Ken A.L. Coar
                                                        IBM Corporation
                                                        19 October 2003

             The Common Gateway Interface (CGI) Version 1.1

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as 'work in progress'.

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   Distribution of this document is unlimited. Please send comments to
   the authors, or via the CGI-WG mailing list; see the project Web page
   at .

Abstract

   The Common Gateway Interface (CGI) is a simple interface for running
   external programs, software or gateways under an information server
   in a platform-independent manner.
   Currently, the supported information servers are HTTP servers.

   The interface has been in use by the World-Wide Web since 1993. This
   specification defines the 'current practice' parameters of the
   'CGI/1.1' interface developed and documented at the U.S. National
   Centre for Supercomputing Applications. This document also defines
   the use of the CGI/1.1 interface on UNIX(R) and other, similar
   systems.

Contents

   1 Introduction
      1.1 Purpose
      1.2 Requirements
      1.3 Specifications
      1.4 Terminology

   2 Notational Conventions and Generic Grammar
      2.1 Augmented BNF
      2.2 Basic Rules
      2.3 URL Encoding

   3 Invoking the Script
      3.1 Server Responsibilities
      3.2 Script Selection
      3.3 The Script-URI
      3.4 Execution

   4 The CGI Request
      4.1 Request Meta-Variables
         4.1.1 AUTH_TYPE
         4.1.2 CONTENT_LENGTH
         4.1.3 CONTENT_TYPE
         4.1.4 GATEWAY_INTERFACE
         4.1.5 PATH_INFO
         4.1.6 PATH_TRANSLATED
         4.1.7 QUERY_STRING
         4.1.8 REMOTE_ADDR
         4.1.9 REMOTE_HOST
         4.1.10 REMOTE_IDENT
         4.1.11 REMOTE_USER
         4.1.12 REQUEST_METHOD
         4.1.13 SCRIPT_NAME
         4.1.14 SERVER_NAME
         4.1.15 SERVER_PORT
         4.1.16 SERVER_PROTOCOL
         4.1.17 SERVER_SOFTWARE
         4.1.18 Protocol-Specific Meta-Variables
      4.2 Request Message-Body
      4.3 Request Methods
         4.3.1 GET
         4.3.2 POST
         4.3.3 HEAD
         4.3.4 Protocol-Specific Methods
      4.4 The Script Command Line

   5 NPH Scripts
      5.1 Identification
      5.2 NPH Response

   6 CGI Response
      6.1 Response Handling
      6.2 Response Types
         6.2.1 Document Response
         6.2.2 Local Redirect Response
         6.2.3 Client Redirect Response
         6.2.4 Client Redirect Response with Document
      6.3 Response Header Fields
         6.3.1 Content-Type
         6.3.2 Location
         6.3.3 Status
         6.3.4 Protocol-Specific Header Fields
         6.3.5 Extension Header Fields
      6.4 Response Message-Body

   7 System Specifications
      7.1 AmigaDOS
      7.2 UNIX
      7.3 EBCDIC/POSIX

   8 Implementation
      8.1 Recommendations for Servers
      8.2 Recommendations for Scripts

   9 Security Considerations
      9.1 Safe Methods
      9.2 Header Fields Containing Sensitive Information
      9.3 Data Privacy
      9.4 Information Security Model
      9.5 Script Interference with the Server
      9.6 Data Length and Buffering Considerations
      9.7 Stateless Processing
      9.8 Relative Paths
      9.9 Non-parsed Header Output

   10 Acknowledgements

   11 References

   12 Authors' Addresses

1 Introduction

1.1 Purpose

   The Common Gateway Interface (CGI) [21] allows an HTTP [2], [8]
   server and a CGI script to share responsibility for responding to
   client requests. The client request comprises a Universal Resource
   Identifier (URI) [1], a request method and various ancillary
   information about the request provided by the transport protocol.

   The CGI defines the abstract parameters, known as meta-variables,
   which describe the client's request. Together with a concrete
   programmer interface this specifies a platform-independent interface
   between the script and the HTTP server.

   The server is responsible for managing connection, data transfer,
   transport and network issues related to the client request, whereas
   the CGI script handles the application issues, such as data access
   and document processing.

1.2 Requirements

   The key words 'MUST', 'MUST NOT', 'REQUIRED', 'SHALL', 'SHALL NOT',
   'SHOULD', 'SHOULD NOT', 'RECOMMENDED', 'MAY' and 'OPTIONAL' in this
   document are to be interpreted as described in RFC 2119 [5].

   An implementation is not compliant if it fails to satisfy one or more
   of the 'must' requirements for the protocols it implements. An
   implementation that satisfies all of the 'must' and all of the
   'should' requirements for its features is said to be 'unconditionally
   compliant'; one that satisfies all of the 'must' requirements but not
   all of the 'should' requirements for its features is said to be
   'conditionally compliant'.

1.3 Specifications

   Not all of the functions and features of the CGI are defined in the
   main part of this specification. The following phrases are used to
   describe the features that are not specified:

   'system defined'
      The feature may differ between systems, but must be the same for
      different implementations using the same system. A system will
      usually identify a class of operating-systems. Some systems are
      defined in section 7 of this document. New systems may be defined
      by new specifications without revision of this document.

   'implementation defined'
      The behaviour of the feature may vary from implementation to
      implementation; a particular implementation must document its
      behaviour.

1.4 Terminology

   This specification uses many terms defined in the HTTP/1.1
   specification [8]; however, the following terms are used here in a
   sense which may not accord with their definitions in that document,
   or with their common meaning.

   'meta-variable'
      A named parameter which carries information from the server to the
      script. It is not necessarily a variable in the operating-system's
      environment, although that is the most common implementation.

   'script'
      The software that is invoked by the server according to this
      interface. It need not be a standalone program, but could be a
      dynamically-loaded or shared library, or even a subroutine in the
      server. It might be a set of statements interpreted at run-time,
      as the term 'script' is frequently understood, but that is not a
      requirement and within the context of this specification the term
      has the broader definition stated.

   'server'
      The application program that invokes the script in order to
      service requests from the client.

2 Notational Conventions and Generic Grammar

2.1 Augmented BNF

   All of the mechanisms specified in this document are described in
   both prose and an augmented Backus-Naur Form (BNF) similar to that
   used by RFC 822 [6]. Unless stated otherwise, the elements are
   case-sensitive. This augmented BNF contains the following
   constructs:

   name = definition
      The name of a rule and its definition are separated by the equals
      character ('='). Whitespace is only significant in that
      continuation lines of a definition are indented.

   "literal"
      Double quotation marks (") surround literal text, except for a
      literal quotation mark, which is surrounded by angle-brackets ('<'
      and '>').

   rule1 | rule2
      Alternative rules are separated by a vertical bar ('|').

   (rule1 rule2 rule3)
      Elements enclosed in parentheses are treated as a single element.

   *rule
      A rule preceded by an asterisk ('*') may have zero or more
      occurrences. The full form is 'n*m rule' indicating at least n
      and at most m occurrences of the rule. n and m are optional
      decimal values with default values of 0 and infinity respectively.

   [rule]
      An element enclosed in square brackets ('[' and ']') is optional,
      and is equivalent to '*1 rule'.

   N rule
      A rule preceded by a decimal number represents exactly N
      occurrences of the rule. It is equivalent to 'N*N rule'.

2.2 Basic Rules

   This specification uses a BNF-like grammar defined in terms of
   characters. Unlike many specifications which define the bytes
   allowed by a protocol, here each literal in the grammar corresponds
   to the character it represents. How these characters are represented
   in terms of bits and bytes within a system is either system defined
   or specified in the particular context. The single exception is the
   rule 'OCTET', defined below.

   The following rules are used throughout this specification to
   describe basic parsing constructs.

      alpha         = lowalpha | hialpha
      lowalpha      = "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" |
                      "i" | "j" | "k" | "l" | "m" | "n" | "o" | "p" |
                      "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" |
                      "y" | "z"
      hialpha       = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" |
                      "I" | "J" | "K" | "L" | "M" | "N" | "O" | "P" |
                      "Q" | "R" | "S" | "T" | "U" | "V" | "W" | "X" |
                      "Y" | "Z"
      digit         = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" |
                      "8" | "9"
      alphanum      = alpha | digit
      OCTET         = <any 8-bit byte>
      CHAR          = alpha | digit | separator | "!" | "#" | "$" |
                      "%" | "&" | "'" | "*" | "+" | "-" | "." | "`" |
                      "^" | "_" | "{" | "|" | "}" | "~" | CTL
      CTL           = <any control character>
      SP            = <space character>
      HT            = <horizontal tab character>
      NL            = <newline>
      LWSP          = SP | HT | NL
      separator     = "(" | ")" | "<" | ">" | "@" | "," | ";" | ":" |
                      "\" | <"> | "/" | "[" | "]" | "?" | "=" | "{" |
                      "}" | SP | HT
      token         = 1*<any CHAR except CTLs or separators>
      quoted-string = <"> *qdtext <">
      qdtext        = <any TEXT except <"> and CTLs but including LWSP>
      TEXT          = <any printable character>

   Note that newline (NL) need not be a single control character, but
   can be a sequence of control characters. A system MAY define TEXT to
   be a larger set of characters than <any printable character>.
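
   The token and quoted-string rules translate directly into character
   tests. The Python sketch below is illustrative only: it assumes
   US-ASCII, takes 'control character' to mean octets 0-31 and 127, and
   treats NL as a single line-feed, all of which this section leaves to
   the system. The names TOKEN_CHARS, is_token, is_qdtext and
   is_quoted_string are hypothetical and not part of CGI.

```python
import string

# token = 1*<any CHAR except CTLs or separators>.  Removing the separators
# from CHAR leaves letters, digits and exactly these marks.
TOKEN_CHARS = frozenset(string.ascii_letters + string.digits + "!#$%&'*+-.^_`|~")


def is_token(s: str) -> bool:
    # One or more token characters, nothing else.
    return len(s) > 0 and all(ch in TOKEN_CHARS for ch in s)


def is_qdtext(ch: str) -> bool:
    # qdtext = <any TEXT except <"> and CTLs but including LWSP>.
    # Assumption: TEXT is printable US-ASCII (0x20-0x7E) and NL is "\n".
    if ch in " \t\n":  # LWSP is explicitly allowed
        return True
    return 0x20 < ord(ch) < 0x7F and ch != '"'


def is_quoted_string(s: str) -> bool:
    # quoted-string = <"> *qdtext <">  (the body may be empty)
    return (
        len(s) >= 2
        and s[0] == '"'
        and s[-1] == '"'
        and all(is_qdtext(ch) for ch in s[1:-1])
    )
```

   With these definitions 'Basic' is a token but 'text/html' is not,
   because "/" is a separator; a media-type value is therefore built
   from two tokens joined by a literal "/" rather than being a single
   token.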

2.3 URL Encoding

   Some variables and constructs used here are described as being
   'URL-encoded'. This encoding is described in section 2 of RFC 2396
   [3]. In a URL-encoded string an escape sequence consists of a
   percent character ("%") followed by two hexadecimal digits, where the
   two hexadecimal digits form an octet. An escape sequence represents
   the graphic character that has the octet as its code within the
   US-ASCII [20] coded character set, if it exists. Currently there is
   no provision within the URI syntax to identify which character set
   non-ASCII codes represent, so CGI handles this issue on an ad-hoc
   basis.

   Note that some unsafe (reserved) characters may have different
   semantics when encoded. The definition of which characters are
   unsafe depends on the context; see section 2 of RFC 2396 [3], updated
   by RFC 2732 [11], for an authoritative treatment. These reserved
   characters are generally used to provide syntactic structure to the
   character string, for example as field separators. In all cases, the
   string is first processed with regard to any reserved characters
   present, and then the resulting data can be URL-decoded by replacing
   "%" escapes by their character values.

   To encode a character string, all reserved and forbidden characters
   are replaced by the corresponding "%" escapes. The string can then
   be used in assembling a URI. The reserved characters will vary from
   context to context, but will always be drawn from this set:

      reserved = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" | "$" |
                 "," | "[" | "]"

   The last two characters were added by RFC 2732 [11]. In any
   particular context, a sub-set of these characters will be reserved;
   the other characters from this set MUST NOT be encoded when a string
   is URL-encoded in that context.
   Other basic rules used to describe URI syntax are:

      hex        = digit | "A" | "B" | "C" | "D" | "E" | "F" | "a" | "b"
                   | "c" | "d" | "e" | "f"
      escaped    = "%" hex hex
      unreserved = alpha | digit | mark
      mark       = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"

3 Invoking the Script

3.1 Server Responsibilities

   The server acts as an application gateway. It receives the request
   from the client, selects a CGI script to handle the request, converts
   the client request to a CGI request, executes the script and converts
   the CGI response into a response for the client. When processing the
   client request, it is responsible for implementing any protocol or
   transport level authentication and security. The server MAY also
   function in a 'non-transparent' manner, modifying the request or
   response in order to provide some additional service, such as media
   type transformation or protocol reduction.

   The server MUST perform translations and protocol conversions on the
   client request data required by this specification. Furthermore, the
   server retains its responsibility to the client to conform to the
   relevant network protocol even if the CGI script fails to conform to
   this specification.

   If the server is applying authentication to the request, then it MUST
   NOT execute the script unless the request passes all defined access
   controls.

3.2 Script Selection

   The server determines which CGI script is to be executed based on a
   generic-form URI supplied by the client. This URI includes a
   hierarchical path with components separated by "/". For any
   particular request, the server will identify all or a leading part of
   this path with an individual script, thus placing the script at a
   particular point in the path hierarchy. The remainder of the path,
   if any, is a resource or sub-resource identifier to be interpreted by
   the script.
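
   One way a server might perform this identification is to compare
   whole "/"-separated segments of the (already decoded) request path
   against its configured scripts and take the longest leading match.
   The Python sketch below assumes that strategy and a hypothetical set
   of configured script paths; the actual mapping is left to each
   server implementation and its configuration (section 3.3), and
   split_script_path is an illustrative name only.

```python
from typing import Optional, Set, Tuple


def split_script_path(path: str, scripts: Set[str]) -> Optional[Tuple[str, str]]:
    """Split path into (script part, remainder) on whole path segments.

    `scripts` is a hypothetical set of configured script paths such as
    {"/cgi-bin/search"}.  Returns None when no leading part matches.
    """
    segments = path.split("/")
    # Try the longest leading run of segments first, then shorter ones.
    for n in range(len(segments), 0, -1):
        candidate = "/".join(segments[:n])
        if candidate in scripts:
            if n == len(segments):
                return candidate, ""  # the whole path names the script
            return candidate, "/" + "/".join(segments[n:])
    return None
```

   With scripts = {"/cgi-bin/search"}, the path
   "/cgi-bin/search/docs/a.txt" splits into "/cgi-bin/search" and
   "/docs/a.txt", a trailing "/" yields a remainder of "/", and
   "/cgi-bin/searchx" matches nothing because partial segments are
   never compared.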

   Information about this split of the path is available to the script
   in the meta-variables, described below. Support for non-hierarchical
   URI schemes is outside the scope of this specification.

3.3 The Script-URI

   The mapping from client request URI to choice of script is defined by
   the particular server implementation and its configuration. The
   server may allow the script to be identified with a set of several
   different URI path hierarchies, and therefore is permitted to replace
   the URI by other members of this set during processing and generation
   of the meta-variables. The server

      1. MAY preserve the URI in the particular client request; or

      2. MAY select a canonical URI from the set of possible values for
         each script; or

      3. can implement any other selection of URI from the set.

   From the meta-variables thus generated, a URI, the 'Script-URI', can
   be constructed. This MUST have the property that if the client had
   accessed this URI instead, then the script would have been executed
   with the same values for the SCRIPT_NAME, PATH_INFO and QUERY_STRING
   meta-variables. The Script-URI has the structure of a generic URI as
   defined in section 3 of RFC 2396 [3], with the exception that object
   parameters and fragment identifiers are not permitted. The various
   components of the Script-URI are defined by some of the
   meta-variables (see below):

      script-URI = <scheme> "://" <server-name> ":" <server-port>
                   <script-path> <extra-path> "?" <query-string>

   where <scheme> is found from SERVER_PROTOCOL, <server-name>,
   <server-port> and <query-string> are the values of the respective
   meta-variables. The SCRIPT_NAME and PATH_INFO values, URL-encoded
   with ";", "=" and "?" reserved, give <script-path> and <extra-path>.
   See section 4.1.5 for more information about the PATH_INFO
   meta-variable.

   The scheme and the protocol are not identical, as the scheme
   identifies the access method in addition to the protocol.
   For example, a resource accessed using Transport Layer Security (TLS)
   [7] would have a request URI with a scheme of https when using the
   HTTP protocol [16]. CGI/1.1 provides no generic means for the script
   to reconstruct this, and therefore the Script-URI as defined includes
   the base protocol used. However, a script MAY make use of
   scheme-specific meta-variables to better deduce the URI scheme.

   Note that this definition also allows URIs to be constructed which
   would invoke the script with any permitted values for the path-info
   or query-string, by modifying the appropriate components.

3.4 Execution

   The script is invoked in a system defined manner. Unless specified
   otherwise, the file containing the script will be invoked as an
   executable program. The server prepares the CGI request as described
   in section 4; this comprises the request meta-variables (immediately
   available to the script on execution) and request message data. The
   request data need not be immediately available to the script; the
   script can be executed before all this data has been received by the
   server from the client. The response from the script is returned to
   the server as described in sections 5 and 6.

   In the event of an error condition, the server can interrupt or
   terminate script execution at any time and without warning. That
   could occur, for example, in the event of a transport failure between
   the server and the client; so the script SHOULD be prepared to handle
   abnormal termination.

4 The CGI Request

   Information about a request comes from two different sources: the
   request meta-variables and any associated message-body.

4.1 Request Meta-Variables

   Meta-variables contain data about the request passed from the server
   to the script, and are accessed by the script in a system defined
   manner.
   Meta-variables are identified by case-insensitive names; there
   cannot be two different variables whose names differ in case only.
   Here they are shown using a canonical representation of capitals
   plus underscore ("_"). A particular system can define a different
   representation.

      meta-variable-name = "AUTH_TYPE" | "CONTENT_LENGTH" |
                           "CONTENT_TYPE" | "GATEWAY_INTERFACE" |
                           "PATH_INFO" | "PATH_TRANSLATED" |
                           "QUERY_STRING" | "REMOTE_ADDR" |
                           "REMOTE_HOST" | "REMOTE_IDENT" |
                           "REMOTE_USER" | "REQUEST_METHOD" |
                           "SCRIPT_NAME" | "SERVER_NAME" |
                           "SERVER_PORT" | "SERVER_PROTOCOL" |
                           "SERVER_SOFTWARE" | scheme |
                           protocol-var-name | extension-var-name
      protocol-var-name  = ( protocol | scheme ) "_" var-name
      scheme             = alpha *( alpha | digit | "+" | "-" | "." )
      var-name           = token
      extension-var-name = token

   Meta-variables with the same name as a scheme, and names beginning
   with the name of a protocol or scheme (e.g. HTTP_ACCEPT) are also
   defined. The number and meaning of these variables may change
   independently of this specification. (See also section 4.1.18.)

   The server MAY define additional implementation-specific extension
   meta-variables, whose names SHOULD be prefixed with "X_".

   This specification does not distinguish between zero-length (NULL)
   values and missing values. For example, a script cannot distinguish
   between the two requests http://host/script and http://host/script?
   as in both cases the QUERY_STRING meta-variable would be NULL.

      meta-variable-value = "" | 1*<TEXT, CHAR or tokens of value>

   An optional meta-variable may be omitted (left unset) if its value is
   NULL. Meta-variable values MUST be considered case-sensitive except
   as noted otherwise. The representation of the characters in the
   meta-variables is system defined; the server MUST convert values to
   that representation.
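
   On systems where meta-variables are passed as environment variables
   (the most common implementation, per section 1.4), a script can
   respect the rule that NULL and missing values are equivalent by
   reading every meta-variable with an empty-string default. The Python
   sketch below assumes the system uses the canonical upper-case names
   in the environment; meta_var and protocol_vars are hypothetical
   helper names, and the optional env argument exists only so the
   functions can be exercised without a real server.

```python
import os
from typing import Dict, Mapping, Optional


def meta_var(name: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return a meta-variable's value; unset and empty both give ""."""
    if env is None:
        env = os.environ
    # Names are case-insensitive; assume the system stores them upper-case.
    return env.get(name.upper(), "")


def protocol_vars(prefix: str, env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Collect protocol-var-name entries, e.g. HTTP_ACCEPT for prefix "http"."""
    if env is None:
        env = os.environ
    wanted = prefix.upper() + "_"
    return {k: v for k, v in env.items() if k.upper().startswith(wanted)}
```

   For example, meta_var("QUERY_STRING", {}) and
   meta_var("QUERY_STRING", {"QUERY_STRING": ""}) both return "",
   mirroring the http://host/script versus http://host/script? case
   above. Note that a scheme variable such as HTTPS is not collected by
   protocol_vars("http"), because only names followed by "_" match.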

4.1.1 AUTH_TYPE

   The AUTH_TYPE variable identifies any mechanism used by the server to
   authenticate the user. It contains a case-insensitive value defined
   by the client protocol or server implementation.

   For HTTP, if the client request required authentication for external
   access, then the server MUST set the value of this variable from the
   'auth-scheme' token in the request Authorization header field.

      AUTH_TYPE      = "" | auth-scheme
      auth-scheme    = "Basic" | "Digest" | extension-auth
      extension-auth = token

   HTTP access authentication schemes are described in RFC 2617 [9].

4.1.2 CONTENT_LENGTH

   The CONTENT_LENGTH variable contains the size of the message-body
   attached to the request, if any, in decimal number of octets. If no
   data is attached, then the value is NULL (or unset).

      CONTENT_LENGTH = "" | 1*digit

   The server MUST set this meta-variable if and only if the request is
   accompanied by a message-body entity. The CONTENT_LENGTH value must
   reflect the length of the message-body after the server has removed
   any transfer-codings or content-codings.

4.1.3 CONTENT_TYPE

   If the request includes a message-body, the CONTENT_TYPE variable is
   set to the Internet Media Type [10] of the message-body.

      CONTENT_TYPE = "" | media-type
      media-type   = type "/" subtype *( ";" parameter )
      type         = token
      subtype      = token
      parameter    = attribute "=" value
      attribute    = token
      value        = token | quoted-string

   The type, subtype and parameter attribute names are not
   case-sensitive. Parameter values may be case sensitive. Media types
   and their use in HTTP are described in section 3.7 of the HTTP/1.1
   specification [8].

   There is no default value for this variable. If and only if it is
   unset, then the script MAY attempt to determine the media type from
   the data received.
   If the type remains unknown, then the script MAY choose to assume a
   type of application/octet-stream or it may reject the request with an
   error (as described in section 6.3.3).

   Each media-type defines a set of optional and mandatory parameters.
   This may include a charset parameter with a case-insensitive value
   defining the coded character set for the message-body. If the
   charset parameter is omitted, then the default value should be
   derived according to whichever of the following rules is the first to
   apply:

      1. There MAY be a system-defined default charset for some
         media-types.

      2. The default for media-types of type "text" is ISO-8859-1 [8].

      3. Any default defined in the media-type specification.

      4. The default is US-ASCII.

   The server MUST set this meta-variable if an HTTP Content-Type field
   is present in the client request header. If the server receives a
   request with an attached entity but no Content-Type header field, it
   MAY attempt to determine the correct content type, otherwise it
   should omit this meta-variable.

4.1.4 GATEWAY_INTERFACE

   The GATEWAY_INTERFACE variable MUST be set to the dialect of CGI
   being used by the server to communicate with the script. Syntax:

      GATEWAY_INTERFACE = "CGI" "/" 1*digit "." 1*digit

   Note that the major and minor numbers are treated as separate
   integers and hence each may be incremented higher than a single
   digit. Thus CGI/2.4 is a lower version than CGI/2.13 which in turn
   is lower than CGI/12.3. Leading zeros MUST be ignored by the script
   and MUST NOT be generated by the server.

   This document defines the 1.1 version of the CGI interface.

4.1.5 PATH_INFO

   The PATH_INFO variable specifies a path to be interpreted by the CGI
   script.
   It identifies the resource or sub-resource to be returned by the CGI
   script, and is derived from the portion of the URI path hierarchy
   following the part that identifies the script itself. Unlike a URI
   path, the PATH_INFO is not URL-encoded, and cannot contain
   path-segment parameters. A PATH_INFO of "/" represents a single void
   path segment.

      PATH_INFO = "" | ( "/" path )
      path      = lsegment *( "/" lsegment )
      lsegment  = *lchar
      lchar     = <any TEXT or CTL except "/">

   The value is considered case-sensitive and the server MUST preserve
   the case of the path as presented in the request URI. The server MAY
   impose restrictions and limitations on what values it permits for
   PATH_INFO, and MAY reject the request with an error if it encounters
   any values considered objectionable. That MAY include any requests
   that would result in an encoded "/" being decoded into PATH_INFO, as
   this might represent a loss of information to the script. Similarly,
   treatment of non US-ASCII characters in the path is system defined.

   URL-encoded, the PATH_INFO string forms the extra-path component of
   the Script-URI (see section 3.3) which follows the SCRIPT_NAME part
   of that path.

4.1.6 PATH_TRANSLATED

   The PATH_TRANSLATED variable is derived by taking the PATH_INFO
   value, parsing it as a local URI in its own right, and performing any
   virtual-to-physical translation appropriate to map it onto the
   server's document repository structure. The set of characters
   permitted in the result is system defined.

      PATH_TRANSLATED = *<any character>

   This is the file location that would be accessed by a request for

      <scheme> "://" <server-name> ":" <server-port> <extra-path>

   where <scheme> is the scheme for the original client request and
   <extra-path> is a URL-encoded version of PATH_INFO, with ";", "=" and
   "?" reserved.
   For example, a request such as the following:

      http://somehost.com/cgi-bin/somescript/this%2eis%2ethe%2epath%3binfo

   would result in a PATH_INFO value of

      /this.is.the.path;info

   An internal URI is constructed from the scheme, server location and
   the URL-encoded PATH_INFO:

      http://somehost.com/this.is.the.path%3binfo

   This would then be translated to a location in the server's document
   repository, perhaps a filesystem path such as this:

      /usr/local/www/htdocs/this.is.the.path;info

   The result of the translation is the value of PATH_TRANSLATED.

   The value of PATH_TRANSLATED is derived in this way irrespective of
   whether it maps to a valid repository location. The server MUST
   preserve the case of the extra-path segment unless the underlying
   repository supports case-insensitive names. If the repository is
   only case-aware, case-preserving, or case-blind with regard to
   document names, the server is not required to preserve the case of
   the original segment through the translation.

   The translation algorithm the server uses to derive PATH_TRANSLATED
   is implementation-defined; CGI scripts which use this variable may
   suffer limited portability.

   The server SHOULD set this meta-variable if the request URI includes
   a path-info component. If PATH_INFO is NULL, then the
   PATH_TRANSLATED variable MUST be set to NULL (or unset).

4.1.7 QUERY_STRING

   The QUERY_STRING variable contains a URL-encoded search or parameter
   string; it provides information to the CGI script to affect or refine
   the document to be returned by the script.

   The URL syntax for a search string is described in section 3 of RFC
   2396 [3]. The QUERY_STRING value is case-sensitive.

      QUERY_STRING = query-string
      query-string = *uric
      uric         = reserved | unreserved | escaped

   When parsing and decoding the query string, the details of the
   parsing, reserved characters and support for non-US-ASCII characters
   depend on the context. For example, form submission from an HTML
   document [15] uses application/x-www-form-urlencoded encoding, in
   which the characters "+", "&" and "=" are reserved, and the ISO
   8859-1 encoding may be used for non-US-ASCII characters.

   The QUERY_STRING value provides the query-string part of the
   Script-URI (see section 3.3).

   The server MUST set this variable; if the Script-URI does not include
   a query component, the QUERY_STRING MUST be defined as an empty
   string ("").

4.1.8 REMOTE_ADDR

   The REMOTE_ADDR variable MUST be set to the network address of the
   client sending the request to the server.

      REMOTE_ADDR  = hostnumber
      hostnumber   = ipv4-address | ipv6-address
      ipv4-address = 1*3digit "." 1*3digit "." 1*3digit "." 1*3digit
      ipv6-address = hexpart [ ":" ipv4-address ]
      hexpart      = hexseq | ( [ hexseq ] "::" [ hexseq ] )
      hexseq       = 1*4hex *( ":" 1*4hex )

   The format of IPv6 addresses is defined in RFC 2373 [12].

4.1.9 REMOTE_HOST

   The REMOTE_HOST variable contains the fully qualified domain name of
   the client sending the request to the server, if available, otherwise
   NULL. Fully qualified domain names take the form described in
   section 3.5 of RFC 1034 [14] and section 2.1 of RFC 1123 [4]. Domain
   names are not case sensitive.

      REMOTE_HOST   = "" | hostname | hostnumber
      hostname      = *( domainlabel "." ) toplabel [ "." ]
      domainlabel   = alphanum [ *alphahypdigit alphanum ]
      toplabel      = alpha [ *alphahypdigit alphanum ]
      alphahypdigit = alphanum | "-"

   The server SHOULD set this variable. If the hostname is not
   available for performance reasons or otherwise, the server MAY
   substitute the REMOTE_ADDR value.
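
   As an illustration only, a script wanting a single client identifier
   might combine REMOTE_HOST and REMOTE_ADDR as sketched below in
   Python. The function name client_identity and its return convention
   are hypothetical. The sketch relies only on the rules above:
   REMOTE_HOST may be empty, may hold a host number substituted from
   REMOTE_ADDR, and domain names are not case sensitive.

```python
import ipaddress
from typing import Mapping, Tuple


def client_identity(environ: Mapping[str, str]) -> Tuple[str, str]:
    """Return ("hostname", name), ("address", addr) or ("none", "").

    Hypothetical helper; environ is any mapping such as os.environ.
    """
    host = environ.get("REMOTE_HOST", "")
    if host:
        try:
            # The server MAY have substituted the REMOTE_ADDR value.
            return ("address", str(ipaddress.ip_address(host)))
        except ValueError:
            # Domain names are not case sensitive: normalise to lower case.
            return ("hostname", host.lower())
    # REMOTE_HOST is NULL: fall back to REMOTE_ADDR, which MUST be set.
    addr = environ.get("REMOTE_ADDR", "")
    if addr:
        return ("address", str(ipaddress.ip_address(addr)))
    return ("none", "")
```

   Normalising addresses through the ipaddress module also gives IPv6
   values a single textual form, which may help when logging or
   comparing them.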

4.1.10 REMOTE_IDENT

   The REMOTE_IDENT variable MAY be used to provide identity information
   reported about the connection by an RFC 1413 [17] request to the
   remote agent, if available. The server may choose not to support
   this feature, or not to request the data for efficiency reasons, or
   not to return available identity data.

      REMOTE_IDENT = *TEXT

   The data returned may be used for authentication purposes, but the
   level of trust reposed in it should be minimal.

4.1.11 REMOTE_USER

   The REMOTE_USER variable provides a user identification string
   supplied by the client as part of user authentication.

      REMOTE_USER = *TEXT

   If the client request required HTTP Authentication [9] (e.g., the
   AUTH_TYPE meta-variable is set to "Basic" or "Digest"), then the
   value of the REMOTE_USER meta-variable MUST be set to the user-ID
   supplied.

4.1.12 REQUEST_METHOD

   The REQUEST_METHOD meta-variable MUST be set to the method which
   should be used by the script to process the request, as described in
   section 4.3.

      REQUEST_METHOD   = method
      method           = "GET" | "POST" | "HEAD" | extension-method
      extension-method = "PUT" | "DELETE" | token

   The method is case sensitive. The HTTP methods are described in
   section 5.1.1 of the HTTP/1.0 specification [2] and section 5.1.1 of
   the HTTP/1.1 specification [8].

4.1.13 SCRIPT_NAME

   The SCRIPT_NAME variable MUST be set to a URI path (not URL-encoded)
   which could identify the CGI script (rather than the script's
   output). The syntax is the same as for PATH_INFO (section 4.1.5).

      SCRIPT_NAME = "" | ( "/" path )

   The leading "/" is not part of the path. It is optional if the path
   is NULL; however, the variable MUST still be set in that case.

   The SCRIPT_NAME string forms some leading part of the path component
   of the Script-URI derived in some implementation-defined manner.
   No PATH_INFO segment (see section 4.1.5) is included in the
   SCRIPT_NAME value.

4.1.14 SERVER_NAME

   The SERVER_NAME variable MUST be set to the name of the server host
   to which the client request is directed. It is a case-insensitive
   hostname or network address. It forms the host part of the
   Script-URI. The syntax for an IPv6 address in a URI is defined in
   RFC 2373 [12].

      SERVER_NAME = server-name
      server-name = hostname | ipv4-address | ( "[" ipv6-address "]" )

   A deployed server can have more than one possible value for this
   variable, where several HTTP virtual hosts share the same IP address.
   In that case, the server uses the contents of the Host header field
   to select the correct virtual host.

4.1.15 SERVER_PORT

   The SERVER_PORT variable MUST be set to the TCP/IP port number on
   which this request is received from the client. This value is used
   in the port part of the Script-URI.

      SERVER_PORT = server-port
      server-port = 1*digit

   Note that this variable MUST be set, even if the port is the default
   port for the scheme and could otherwise be omitted from a URI.

4.1.16 SERVER_PROTOCOL

   The SERVER_PROTOCOL variable MUST be set to the name and version of
   the application protocol used for this CGI request. This is not
   necessarily the same as the protocol version used by the server in
   its communication with the client.

      SERVER_PROTOCOL   = HTTP-Version | "INCLUDED" | extension-version
      HTTP-Version      = "HTTP" "/" 1*digit "." 1*digit
      extension-version = protocol [ "/" 1*digit "." 1*digit ]
      protocol          = token

   'protocol' is a version of the scheme part of the Script-URI, and is
   not case sensitive. By convention, 'protocol' is in upper case. The
   protocol need not be identical to the scheme of the request; for
   example, the request may have scheme "https", whilst the protocol is
   "HTTP".
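
   The SERVER_PROTOCOL grammar can be split into its parts with a small
   parser. The following Python sketch (parse_server_protocol is a
   hypothetical name) upper-cases the case-insensitive 'protocol' and
   compares versions as separate integers, in the same way as
   GATEWAY_INTERFACE (section 4.1.4). As an assumption of the sketch,
   its regular expression accepts any run of characters other than "/"
   and whitespace as 'protocol', which is looser than the token grammar.

```python
import re
from typing import Optional, Tuple

# protocol [ "/" 1*digit "." 1*digit ]; 'protocol' is matched loosely as
# any run of characters other than "/" and whitespace.
_SERVER_PROTOCOL = re.compile(r"([^/\s]+)(?:/([0-9]+)\.([0-9]+))?")


def parse_server_protocol(value: str) -> Tuple[str, Optional[Tuple[int, int]]]:
    """Split SERVER_PROTOCOL into (PROTOCOL, (major, minor) or None)."""
    match = _SERVER_PROTOCOL.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"malformed SERVER_PROTOCOL: {value!r}")
    protocol = match.group(1).upper()  # 'protocol' is not case sensitive
    if match.group(2) is None:
        return protocol, None  # e.g. "INCLUDED" carries no version
    # Separate integers: 1.10 is above 1.9, and leading zeros are ignored.
    return protocol, (int(match.group(2)), int(match.group(3)))
```

   A script could compare the returned version tuple against (1, 1)
   before choosing an HTTP/1.1-only status code; a None version needs
   separate handling.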

   A well-known value for SERVER_PROTOCOL which the server MAY use is
   "INCLUDED", which signals that the current document is being included
   as part of a composite document, rather than being the direct target
   of the client request. The script should treat this as an HTTP/1.0
   request.

4.1.17 SERVER_SOFTWARE

   The SERVER_SOFTWARE meta-variable MUST be set to the name and version
   of the information server software making the CGI request (and
   running the gateway). It SHOULD be the same as the server
   description reported to the client, if any.

      SERVER_SOFTWARE = 1*( product | comment )
      product         = token [ "/" product-version ]
      product-version = token
      comment         = "(" *( ctext | comment ) ")"
      ctext           = <any TEXT excluding "(" and ")">

4.1.18 Protocol-Specific Meta-Variables

   The server SHOULD set meta-variables specific to the protocol and
   scheme for the request. Interpretation of protocol-specific
   variables depends on the protocol version in SERVER_PROTOCOL. The
   server MAY set a meta-variable with the name of the scheme to a
   non-NULL value if the scheme is not the same as the protocol. The
   presence of such a variable indicates to a script which scheme is
   used by the request.

   Meta-variables with names beginning with "HTTP_" contain values read
   from the client request header fields, if the protocol used is HTTP.
   The HTTP header field name is converted to upper case, has all
   occurrences of "-" replaced with "_" and has "HTTP_" prepended to
   give the meta-variable name. The header data can be presented as
   sent by the client, or can be rewritten in ways which do not change
   its semantics. If multiple header fields with the same field-name
   are received, then the server MUST rewrite them as a single value
   having the same semantics. Similarly, a header field that spans
   multiple lines must be merged onto a single line.
   The server MUST, if necessary, change the representation of the data
   (for example, the character set) to be appropriate for a CGI
   meta-variable.

   The server is not required to create meta-variables for all the
   header fields that it receives. In particular, it SHOULD remove any
   header fields carrying authentication information, such as
   'Authorization', or that are available to the script in other
   variables, such as 'Content-Length' and 'Content-Type'. The server
   MAY remove header fields that relate solely to client-side
   communication issues, such as 'Connection'.

4.2 Request Message-Body

   Request data is accessed by the script in a system-defined method;
   unless defined otherwise, this will be by reading the 'standard
   input' file descriptor or file handle.

      Request-Data   = [ request-body ] [ extension-data ]
      request-body   = <CONTENT_LENGTH>OCTET
      extension-data = *OCTET

   A request-body is supplied with the request if the CONTENT_LENGTH is
   not NULL. The server MUST make at least that many bytes available
   for the script to read. The server MAY signal an end-of-file
   condition after CONTENT_LENGTH bytes have been read or it MAY supply
   extension data. Therefore, the script MUST NOT attempt to read more
   than CONTENT_LENGTH bytes, even if more data is available. However,
   it is not obliged to read any of the data.

   For non-parsed header (NPH) scripts (section 5), the server SHOULD
   attempt to ensure that the data supplied to the script is precisely
   as supplied by the client and is unaltered by the server.

   As transfer-codings are not supported on the request-body, the server
   MUST remove any such codings from the message-body, and recalculate
   the CONTENT_LENGTH. If this is not possible (for example, because of
   large buffering requirements), the server SHOULD reject the client
   request. It MAY also remove content-codings from the message-body.
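
   A script reading the request-body under these rules must stop after
   CONTENT_LENGTH bytes even if the stream holds more. A minimal Python
   sketch, assuming the meta-variables arrive as a mapping (such as
   os.environ) and the body as a binary stream (such as
   sys.stdin.buffer); read_request_body is a hypothetical helper name:

```python
import io
from typing import BinaryIO, Mapping


def read_request_body(environ: Mapping[str, str], stream: BinaryIO) -> bytes:
    """Read exactly CONTENT_LENGTH bytes of request-body from stream."""
    raw = environ.get("CONTENT_LENGTH", "")
    if raw == "":
        return b""  # NULL CONTENT_LENGTH: no request-body was supplied
    length = int(raw)  # raises ValueError for a malformed value
    if length < 0:
        raise ValueError(f"negative CONTENT_LENGTH: {raw!r}")
    chunks = []
    remaining = length
    while remaining > 0:
        # Never request more than what remains: anything after the body
        # may be extension-data, which the script MUST NOT attempt to read.
        chunk = stream.read(min(remaining, 65536))
        if not chunk:
            raise EOFError(f"expected {length} bytes, got {length - remaining}")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


if __name__ == "__main__":
    # Simulated standard input: a 7-byte body followed by extension-data.
    stdin = io.BytesIO(b"a=1&b=2EXTRA")
    print(read_request_body({"CONTENT_LENGTH": "7"}, stdin))  # b'a=1&b=2'
```

   Raising on a short read is a design choice of this sketch: since the
   server MUST make at least CONTENT_LENGTH bytes available, a premature
   end-of-file suggests a server or connection failure rather than a
   valid request.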

4.3 Request Methods

   The Request Method, as supplied in the REQUEST_METHOD meta-variable,
   identifies the processing method to be applied by the script in
   producing a response. The script author can choose to implement the
   methods most appropriate for the particular application. If the
   script receives a request with a method it does not support, it
   SHOULD reject it with an error (see section 6.3.3).

4.3.1 GET

   The GET method indicates that the script should produce a
   document based on the meta-variable values. By convention, the GET
   method is 'safe' and 'idempotent' and SHOULD NOT have the
   significance of taking an action other than producing a document.

   The meaning of the GET method may be modified and refined by
   protocol-specific meta-variables.

4.3.2 POST

   The POST method is used to request that the script perform
   processing and produce a document based on the data in the request
   message-body, in addition to meta-variable values. A common use is
   form submission in HTML [15], intended to initiate processing by the
   script that has a permanent effect, such as a change in a database.

   The script MUST check the value of the CONTENT_LENGTH variable before
   reading the attached message-body, and SHOULD check the CONTENT_TYPE
   value before processing it.

4.3.3 HEAD

   The HEAD method requests the script to do sufficient processing to
   return the response header fields, without providing a response
   message-body. The script MUST NOT provide a response message-body
   for a HEAD request. If it does, then the server MUST discard the
   message-body when reading the response.

4.3.4 Protocol-Specific Methods

   The script MAY implement any protocol-specific method, such as
   HTTP/1.1 PUT and DELETE; it SHOULD check the value of SERVER_PROTOCOL
   when doing so.
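
   The method rules above can be gathered into one decision step at the
   start of a script. The Python sketch below is illustrative only:
   plan_response is a hypothetical name, the supported set (GET, HEAD
   and POST) is an assumption, and the choice between 405 and 501
   follows the status code guidance in section 6.3.3. It returns the
   extra CGI header fields to send and whether a message-body may follow.

```python
from typing import List, Mapping, Tuple

SUPPORTED_METHODS = ("GET", "HEAD", "POST")  # assumption for this sketch


def plan_response(environ: Mapping[str, str]) -> Tuple[List[Tuple[str, str]], bool]:
    """Return (extra CGI header fields, whether a message-body may be sent)."""
    method = environ.get("REQUEST_METHOD", "")
    if method not in SUPPORTED_METHODS:  # the method is case sensitive
        if environ.get("SERVER_PROTOCOL", "").upper() == "HTTP/1.1":
            # HTTP/1.1 [8] also requires an Allow field on a 405 response.
            return [("Status", "405 Method Not Allowed"),
                    ("Allow", ", ".join(SUPPORTED_METHODS))], True
        return [("Status", "501 Not Implemented")], True
    # HEAD gets the same header fields as GET but MUST NOT get a body.
    # An empty field list means status 200 'OK' is assumed.
    return [], method != "HEAD"
```

   The Allow field on the 405 response comes from the HTTP/1.1
   specification rather than from this document.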

   The server MAY decide that some methods are not appropriate or
   permitted for a script, and may handle the methods itself or return
   an error to the client.

4.4 The Script Command Line

   Some systems support a method for supplying an array of strings to
   the CGI script. This is only used in the case of an 'indexed' HTTP
   query, which is identified by a 'GET' or 'HEAD' request with a URI
   query string that does not contain any unencoded "=" characters. For
   such a request, the server SHOULD treat the query-string as a
   search-string and parse it into words, using the rules:

      search-string = search-word *( "+" search-word )
      search-word   = 1*schar
      schar         = unreserved | escaped | xreserved
      xreserved     = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "," |
                      "$"

   After parsing, each search-word is URL-decoded, optionally encoded in
   a system-defined manner and then added to the argument list.

   If the server cannot create any part of the argument list, then the
   server MUST NOT generate any command line information. For example,
   the number of arguments may be greater than operating system or
   server limits, or one of the words may not be representable as an
   argument.

   The script SHOULD check to see if the QUERY_STRING value contains an
   unencoded "=" character, and SHOULD NOT use the command line
   arguments if it does.

5 NPH Scripts

5.1 Identification

   The server MAY support NPH (Non-Parsed Header) scripts; these are
   scripts to which the server passes all responsibility for response
   processing.

   This specification provides no mechanism for an NPH script to be
   identified on the basis of its output data alone. By convention,
   therefore, any particular script can only ever provide output of one
   type (NPH or CGI) and hence the script itself is described as an 'NPH
   script'.
   A server with NPH support MUST provide an implementation-defined
   mechanism for identifying NPH scripts, perhaps based on the name or
   location of the script.

5.2 NPH Response

   There MUST be a system-defined method for the script to send data
   back to the server or client; a script MUST always return some data.
   Unless defined otherwise, this will be the same as for conventional
   CGI scripts.

   Currently, NPH scripts are only defined for HTTP client requests. An
   (HTTP) NPH script MUST return a complete HTTP response message,
   currently described in section 6 of the HTTP specifications [2], [8].
   The script MUST use the SERVER_PROTOCOL variable to determine the
   appropriate format for a response. It MUST also take account of any
   generic or protocol-specific meta-variables in the request as might
   be mandated by the particular protocol specification.

   The server MUST ensure that the script output is sent to the client
   unmodified. Note that this requires the script to use the correct
   character set (US-ASCII [20] and ISO 8859-1 [21] for HTTP) in the
   header fields. The server SHOULD attempt to ensure that the script
   output is sent directly to the client, with minimal internal and no
   transport-visible buffering.

   Unless the implementation defines otherwise, the script MUST NOT
   indicate in its response that the client can send further requests
   over the same connection.

6 CGI Response

6.1 Response Handling

   A script MUST always provide a non-empty response, and so there is a
   system-defined method for it to send this data back to the server.
   Unless defined otherwise, this will be via the 'standard output' file
   descriptor.

   The script MUST check the REQUEST_METHOD variable when processing the
   request and preparing its response.

   The server MAY implement a timeout period within which data must be
   received from the script.
   If a server implementation defines such a timeout and receives no
   data from a script within the timeout period, the server MAY
   terminate the script process.

6.2 Response Types

   The response comprises a message-header and a message-body, separated
   by a blank line. The message-header contains one or more header
   fields. The body may be NULL.

      generic-response = 1*header-field NL [ response-body ]

   The script MUST return one of a document response, a local redirect
   response, or a client redirect (with optional document) response. In
   the response definitions below, the order of header fields in a
   response is not significant (despite appearing so in the BNF). The
   header fields are defined in section 6.3.

      CGI-Response = document-response | local-redir-response |
                     client-redir-response | client-redirdoc-response

6.2.1 Document Response

   The CGI script can return a document to the user in a document
   response, with an optional status code indicating the success status
   of the response.

      document-response = Content-Type [ Status ] *other-field NL
                          response-body

   The script MUST return a Content-Type header field. A Status header
   field is optional, and status 200 'OK' is assumed if it is omitted.
   The server MUST make any appropriate modifications to the script's
   output to ensure that the response to the client complies with the
   response protocol version.

6.2.2 Local Redirect Response

   The CGI script can return a URI path and query-string
   ('local-pathquery') for a local resource in a Location header field.
   This indicates to the server that it should reprocess the request
   using the path specified.

      local-redir-response = local-Location NL

   The script MUST NOT return any other header fields or a message-body,
   and the server MUST generate the response that it would have produced
   in response to a request containing the URL

      scheme "://" server-name ":" server-port local-pathquery

6.2.3 Client Redirect Response

   The CGI script can return an absolute URI in a Location header
   field, to indicate to the client that it should reprocess the request
   using the URI specified.

      client-redir-response = client-Location *extension-field NL

   The script MUST NOT provide any other header fields, except for
   server-defined CGI extension fields. For an HTTP client request, the
   server MUST generate a 302 'Found' HTTP response message.

6.2.4 Client Redirect Response with Document

   The CGI script can return an absolute URI in a Location header
   field together with an attached document, to indicate to the client
   that it should reprocess the request using the URI specified.

      client-redirdoc-response = client-Location Status Content-Type
                                 *other-field NL response-body

   The Status header field MUST be supplied and MUST contain a status
   value of 302 'Found'. The server MUST make any appropriate
   modifications to the script's output to ensure that the response to
   the client complies with the response protocol version.

6.3 Response Header Fields

   The response header fields are either CGI or extension header fields
   to be interpreted by the server, or protocol-specific headers to be
   included in the response returned to the client. At least one CGI
   field MUST be supplied; each CGI field MUST NOT appear more than once
   in the response.
   The response header fields have the syntax:

      header-field    = CGI-field | other-field
      CGI-field       = Content-Type | Location | Status
      other-field     = protocol-field | extension-field
      protocol-field  = generic-field
      extension-field = generic-field
      generic-field   = field-name ":" [ field-value ] NL
      field-name      = token
      field-value     = *( field-content | LWSP )
      field-content   = *( token | separator | quoted-string )

   The field-name is not case sensitive. A NULL field value is
   equivalent to a field not being sent. Note that each header field in
   a CGI-Response MUST be specified on a single line; CGI/1.1 does not
   support continuation lines. Whitespace is permitted between the ":"
   and the field-value (but not between the field-name and the ":"), and
   also between tokens in the field-value.

6.3.1 Content-Type

   The Content-Type response field sets the Internet Media Type [10] of
   the entity body.

      Content-Type = "Content-Type:" media-type NL

   If an entity body is returned, the script MUST supply a Content-Type
   field in the response. If it fails to do so, the server SHOULD NOT
   attempt to determine the correct content type. The value SHOULD be
   sent unmodified to the client, except for any charset parameter
   changes.

   Unless it is otherwise system-defined, the default charset assumed by
   the client for text media-types is ISO-8859-1 if the protocol is HTTP
   and US-ASCII otherwise. Hence the script SHOULD include a charset
   parameter. See section 3.4.1 of the HTTP/1.1 specification [8] for a
   discussion of this issue.

6.3.2 Location

   The Location header field is used to specify to the server that the
   script is returning a reference to a document rather than an actual
   document.
   It is either an absolute URI (with optional fragment),
   indicating that the client is to fetch the referenced document, or a
   local URI path (with optional query string), indicating that the
   server is to fetch the referenced document.

      Location        = local-Location | client-Location
      client-Location = "Location:" fragment-URI NL
      local-Location  = "Location:" local-pathquery NL
      fragment-URI    = absoluteURI [ "#" fragment ]
      fragment        = *uric
      local-pathquery = abs-path [ "?" query-string ]
      abs-path        = "/" path-segments
      path-segments   = segment *( "/" segment )
      segment         = *pchar
      pchar           = unreserved | escaped | extra
      extra           = ":" | "@" | "&" | "=" | "+" | "$" | ","

   The syntax of an absoluteURI is incorporated into this document from
   that specified in RFC 2396 [3] and RFC 2732 [11]. A valid
   absoluteURI always starts with the name of a scheme followed by ":";
   scheme names start with a letter and continue with alphanumerics,
   "+", "-" or ".". The local URI path and query must be an absolute
   path, and not a relative path or NULL, and hence must start with a
   "/".

   Note that any message-body attached to the request (such as for a
   POST request) may not be available to the resource that is the target
   of the redirect.

6.3.3 Status

   The Status header field contains a 3-digit integer result code that
   indicates the level of success of the script's attempt to handle the
   request.

      Status        = "Status:" status-code SP reason-phrase NL
      status-code   = "200" | "302" | "400" | "501" | 3digit
      reason-phrase = *TEXT

   Status code 200 'OK' indicates success, and is the default value
   assumed for a document response. Status code 302 'Found' is used
   with a Location header field and response message-body. Status code
   400 'Bad Request' may be used for an unknown request format, such as
   a missing CONTENT_TYPE.
   Status code 501 'Not Implemented' may be
   returned by a script if it receives an unsupported REQUEST_METHOD.

   Other valid status codes are listed in section 6.1.1 of the HTTP
   specifications [2], [8], and also in the IANA HTTP Status Code
   Registry [18], and can be used in addition to or instead of the ones
   listed above. The script SHOULD check the value of SERVER_PROTOCOL
   before using HTTP/1.1 status codes. The script MAY reject HTTP/1.1
   requests made using a method it does not support with error 405
   'Method Not Allowed'.

   Note that returning an error status code does not have to mean an
   error condition with the script itself. For example, a script that
   is invoked as an error handler by the server should return the code
   appropriate to the server's error condition.

   The reason-phrase is a textual description of the status to be
   returned to the client for human consumption.

6.3.4 Protocol-Specific Header Fields

   The script MAY return any other header fields that relate to the
   response message defined by the specification for the SERVER_PROTOCOL
   (HTTP/1.0 [2] or HTTP/1.1 [8]). The server MUST translate the header
   data from the CGI header syntax to the HTTP header syntax if these
   differ. For example, the character sequence for newline (such as
   UNIX's US-ASCII LF) used by CGI scripts may not be the same as that
   used by HTTP (US-ASCII CR followed by LF).

   The script MUST NOT return any header fields that relate to
   client-side communication issues and could affect the server's
   ability to send the response to the client. The server MAY remove
   any such header fields returned by the script. It SHOULD resolve any
   conflicts between headers returned by the script and headers that it
   would otherwise send itself.
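
   On the server side, sections 6.2 and 6.3 amount to splitting the
   script's output at the first blank line, classifying the response,
   and re-emitting the header fields with HTTP's CR LF newlines. The
   Python sketch below is a minimal illustration under stated
   assumptions: parse_cgi_response and http_head are hypothetical
   names, header text is decoded as ISO-8859-1, both LF and CR LF are
   accepted as NL (as the system specifications in section 7 suggest),
   and only some of the constraints of section 6.2 are checked (for
   example, it does not verify that a client redirect carries only
   extension fields).

```python
from typing import Dict, List, Optional, Tuple

Fields = List[Tuple[str, str]]
CGI_FIELDS = ("content-type", "location", "status")


def _split_header_block(raw: bytes) -> Tuple[List[str], bytes]:
    """Return the header lines and the body; NL may be LF or CR LF."""
    lines: List[str] = []
    pos = 0
    while True:
        end = raw.find(b"\n", pos)
        if end < 0:
            raise ValueError("no blank line ends the header block")
        line = raw[pos:end]
        if line.endswith(b"\r"):
            line = line[:-1]
        pos = end + 1
        if not line:
            return lines, raw[pos:]
        lines.append(line.decode("latin-1"))  # assumed ISO-8859-1


def parse_cgi_response(raw: bytes) -> Tuple[str, Optional[int], Fields, bytes]:
    """Classify a CGI response as (kind, status, fields, body)."""
    head, body = _split_header_block(raw)
    cgi: Dict[str, str] = {}
    fields: Fields = []
    for line in head:
        name, sep, value = line.partition(":")
        # No whitespace is allowed between the field-name and the ":".
        if not sep or not name or name != name.strip():
            raise ValueError(f"malformed header line: {line!r}")
        value = value.strip()
        if not value:
            continue  # a NULL field value is equivalent to no field
        key = name.lower()  # field-names are not case sensitive
        if key in CGI_FIELDS:
            if key in cgi:
                raise ValueError(f"CGI field appears twice: {name}")
            cgi[key] = value
        if key != "status":
            fields.append((name, value))
    if not cgi:
        raise ValueError("at least one CGI field MUST be supplied")

    status = 200  # assumed when a document response omits Status
    if "status" in cgi:
        text = cgi["status"]
        code = text[:3]
        valid = (len(code) == 3 and all(c in "0123456789" for c in code)
                 and text[3:4] in ("", " "))
        if not valid:
            raise ValueError(f"malformed Status: {text!r}")
        status = int(code)

    location = cgi.get("location")
    if location is not None and location.startswith("/"):
        # Local redirect: Location alone, and no message-body.
        if len(fields) != 1 or "status" in cgi or body:
            raise ValueError("local redirect must consist of Location alone")
        return "local-redirect", None, fields, b""
    if location is not None:
        if "content-type" not in cgi:
            return "client-redirect", 302, fields, b""
        if status != 302:
            raise ValueError("client redirect with document needs Status 302")
        return "client-redirect-document", 302, fields, body
    if "content-type" not in cgi:
        raise ValueError("a document response needs Content-Type")
    return "document", status, fields, body


def http_head(version: str, status: int, reason: str, fields: Fields) -> bytes:
    """Serialise a status line and header fields with CR LF newlines."""
    lines = [f"{version} {status} {reason}"] + [f"{n}: {v}" for n, v in fields]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("latin-1")
```

   A local redirect returns no status because the server must reprocess
   the request itself rather than answer the client directly.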

6.3.5 Extension Header Fields

   The server may define additional implementation-specific CGI header
   fields, whose field names SHOULD begin with "X-CGI-". It MAY ignore
   (and delete) any unrecognised header fields with names beginning with
   "X-CGI-".

6.4 Response Message-Body

   The response message-body is an attached document to be returned to
   the client by the server. The server MUST read all the data provided
   by the script, until the script signals the end of the message-body
   by way of an end-of-file condition. The message-body SHOULD be sent
   unmodified to the client, except for HEAD requests or any required
   transfer-codings, content-codings or charset conversions.

      response-body = *OCTET

7 System Specifications

7.1 AmigaDOS

   Meta-Variables
      Meta-variables are passed to the script in identically named
      environment variables. These are accessed by the DOS library
      routine GetVar(). The flags argument SHOULD be 0. Case is
      ignored, but upper case is recommended for compatibility with
      case-sensitive systems.

   The current working directory
      The current working directory for the script is set to the
      directory containing the script.

   Character set
      The US-ASCII character set [20] is used for the definition of
      meta-variables, header fields and values; the newline (NL)
      sequence is LF; servers SHOULD also accept CR LF as a newline.

7.2 UNIX

   For UNIX-compatible operating systems, the following are defined:

   Meta-Variables
      Meta-variables are passed to the script in identically named
      environment variables. These are accessed by the C library
      routine getenv() or the variable environ.

   The command line
      This is accessed using the argc and argv arguments to main().
      The words have any characters which are 'active' in the Bourne
      shell escaped with a backslash.

   The current working directory
      The current working directory for the script SHOULD be set to the
      directory containing the script.

   Character set
      The US-ASCII character set [20], excluding NUL, is used for the
      definition of meta-variables, header fields and CHAR values; TEXT
      values use ISO-8859-1. The PATH_TRANSLATED value can contain any
      8-bit byte except NUL. The newline (NL) sequence is LF; servers
      should also accept CR LF as a newline.

7.3 EBCDIC/POSIX

   For POSIX-compatible operating systems using the EBCDIC character
   set, the following are defined:

   Meta-Variables
      Meta-variables are passed to the script in identically named
      environment variables. These are accessed by the C library
      routine getenv().

   The command line
      This is accessed using the argc and argv arguments to main().
      The words have any characters which are 'active' in the Bourne
      shell escaped with a backslash.

   The current working directory
      The current working directory for the script SHOULD be set to the
      directory containing the script.

   Character set
      The IBM1047 character set [19], excluding NUL, is used for the
      definition of meta-variables, header fields, values, TEXT strings
      and the PATH_TRANSLATED value. The newline (NL) sequence is LF;
      servers should also accept CR LF as a newline.

   media-type charset default
      The default charset value for text (and other
      implementation-defined) media types is IBM1047.

8 Implementation

8.1 Recommendations for Servers

   Although the server and the CGI script need not be consistent in
   their handling of URL paths (client URLs and the PATH_INFO data,
   respectively), server authors may wish to impose consistency. So the
   server implementation should specify its behaviour for the following
   cases:

   1. define any restrictions on allowed path segments, in particular
      whether non-terminal NULL segments are permitted;

   2. define the behaviour for "." or ".." path segments; i.e.,
      whether they are prohibited, treated as ordinary path segments
      or interpreted in accordance with the relative URL
      specification [3];

   3. define any limits of the implementation, including limits on
      path or search string lengths, and limits on the volume of
      header fields the server will parse.

8.2 Recommendations for Scripts

   If the script does not intend to process the PATH_INFO data, then it
   should reject the request with 404 'Not Found' if PATH_INFO is not
   NULL.

   If the output of a form is being processed, check that CONTENT_TYPE
   is "application/x-www-form-urlencoded" [15] or "multipart/form-data"
   [13]. If CONTENT_TYPE is blank, the script can reject the request
   with a 415 'Unsupported Media Type' error, where supported by the
   protocol.

   When parsing PATH_INFO, PATH_TRANSLATED or SCRIPT_NAME, the script
   should be careful of void path segments ("//") and special path
   segments ("." and ".."). They should either be removed from the path
   before use in OS system calls, or the request should be rejected with
   404 'Not Found'.

   When returning header fields, the script should try to send the CGI
   headers as soon as possible, and should send them before any HTTP
   headers. This may help reduce the server's memory requirements.

9 Security Considerations

9.1 Safe Methods

   As discussed in the security considerations of the HTTP
   specifications [2], [8], the convention has been established that the
   GET and HEAD methods should be 'safe' and 'idempotent' (repeated
   requests have the same effect as a single request). See section 9.1
   of RFC 2616 [8] for a full discussion.
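
   The path-segment advice for scripts in section 8.2 also has a
   security side, since an unchecked ".." segment passed into an OS
   system call may escape the intended directory. A conservative Python
   sketch (safe_path_segments is a hypothetical name) that rejects
   rather than removes such segments; a script could map the ValueError
   to 404 'Not Found'. Rejecting a PATH_INFO of "/" (a single void
   segment) and trailing "/" is a choice of this sketch, not a
   requirement of this document.

```python
from typing import List


def safe_path_segments(path_info: str) -> List[str]:
    """Split PATH_INFO into segments, rejecting void and dot segments."""
    if path_info == "":
        return []  # NULL PATH_INFO: nothing to interpret
    if not path_info.startswith("/"):
        raise ValueError(f"PATH_INFO must start with '/': {path_info!r}")
    segments = path_info[1:].split("/")
    for segment in segments:
        # "" comes from "//" (or a trailing "/"); "." and ".." could
        # otherwise climb out of the intended directory in OS calls.
        if segment in ("", ".", ".."):
            raise ValueError(f"unsafe path segment {segment!r} in {path_info!r}")
    return segments
```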

9.2 Header Fields Containing Sensitive Information

   Some HTTP header fields may carry sensitive information which the
   server should not pass on to the script unless explicitly configured
   to do so. For example, if the server protects the script using the
   Basic authentication scheme, then the client will send an
   Authorization header field containing a username and password. The
   server validates this information and so it should not pass on the
   password via the HTTP_AUTHORIZATION meta-variable without careful
   consideration. This also applies to the Proxy-Authorization header
   field and the corresponding HTTP_PROXY_AUTHORIZATION meta-variable.

9.3 Data Privacy

   Confidential data in a request should be placed in a message-body as
   part of a POST request, and not placed in the URI or message headers.
   On some systems, the environment used to pass meta-variables to a
   script may be visible to other scripts or users. In addition, many
   existing servers, proxies and clients will permanently record the URI
   where it might be visible to third parties.

9.4 Information Security Model

   For a client connection using TLS, the security model applies between
   the client and the server, and not between the client and the script.
   It is the server's responsibility to handle the TLS session, and thus
   it is the server which is authenticated to the client, not the CGI
   script.

   This specification provides no mechanism for the script to
   authenticate the server which invoked it. There is no enforced
   integrity on the CGI request and response messages.

9.5 Script Interference with the Server

   The most common implementation of CGI invokes the script as a child
   process using the same user and group as the server process. It
   should therefore be ensured that the script cannot interfere with the
   server process, its configuration, documents or log files.
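On the server side, the child-process model described above can be combined with the header-field advice of section 9.2 and the working-directory rule of section 7. The following is a hypothetical sketch: the names run_script, build_env, SENSITIVE and pass_credentials, and the 30-second timeout, are all assumptions rather than anything this specification defines. It runs a script in its own directory with an environment built only from the supplied meta-variables, withholds HTTP_AUTHORIZATION and HTTP_PROXY_AUTHORIZATION unless explicitly configured, and passes the message-body on standard input. It does not change the user or group of the child process, and it returns the raw script output without parsing the CGI response.

```python
# Hypothetical server-side sketch; names and the timeout are assumptions.
import os
import subprocess
from typing import Dict, Mapping, Sequence, Tuple

# Credential-bearing meta-variables withheld unless configured (section 9.2).
SENSITIVE = {"HTTP_AUTHORIZATION", "HTTP_PROXY_AUTHORIZATION"}


def build_env(meta_vars: Mapping[str, str],
              pass_credentials: bool = False) -> Dict[str, str]:
    """Copy the meta-variables, dropping the sensitive ones by default."""
    return {name: value for name, value in meta_vars.items()
            if pass_credentials or name not in SENSITIVE}


def run_script(script_path: str, meta_vars: Mapping[str, str],
               body: bytes = b"", interpreter: Sequence[str] = (),
               timeout: float = 30.0,
               pass_credentials: bool = False) -> Tuple[int, bytes]:
    """Run a CGI script as a child process; return (exit status, raw output)."""
    script = os.path.abspath(script_path)
    completed = subprocess.run(
        [*interpreter, script],
        input=body,                                  # message-body on standard input
        env=build_env(meta_vars, pass_credentials),  # only the meta-variables given
        cwd=os.path.dirname(script),                 # cwd: the script's directory
        capture_output=True,                         # collect all output; no fixed-size buffer
        timeout=timeout,                             # raises subprocess.TimeoutExpired
    )
    return completed.returncode, completed.stdout
```

Because the child inherits nothing but the meta-variables, a real server would normally add any platform variables the script needs (PATH, for instance) to meta_vars itself. The timeout is one possible guard against a script that never finishes; this specification does not itself require one.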

   If the script is executed by calling a function linked into the
   server software (either at compile-time or run-time) then precautions
   should be taken to protect the core memory of the server, or to
   ensure that untrusted code cannot be executed.

9.6 Data Length and Buffering Considerations

   This specification places no limits on the length of the message-body
   presented to the script. The script should not assume that
   statically allocated buffers of any size are sufficient to contain
   the entire submission at one time. Use of a fixed-length buffer
   without careful overflow checking may result in an attacker
   exploiting 'stack-smashing' or 'stack-overflow' vulnerabilities of
   the operating system. The script may spool large submissions to disk
   or other buffering media, but a rapid succession of large submissions
   may result in denial-of-service conditions. If the CONTENT_LENGTH of
   a message-body is larger than resource considerations allow, scripts
   should respond with an error status appropriate for the protocol
   version; potentially applicable status codes include 503 'Service
   Unavailable' (HTTP/1.0 and HTTP/1.1), 413 'Request Entity Too Large'
   (HTTP/1.1), and 414 'Request-URI Too Large' (HTTP/1.1).

   Similar considerations apply to the server's handling of the CGI
   response from the script. There is no limit on the length of the
   header or message-body returned by the script; the server should not
   assume that statically allocated buffers of any size are sufficient
   to contain the entire response.

9.7 Stateless Processing

   The stateless nature of the Web makes each script execution and
   resource retrieval independent of all others even when multiple
   requests constitute a single conceptual Web transaction. Because of
   this, a script should not make any assumptions about the context of
   the user-agent submitting a request.
   In particular, scripts should examine data obtained from the client
   and verify that they are valid, both in form and content, before
   allowing them to be used for sensitive purposes such as input to
   other applications, commands, or operating system services. These
   uses include (but are not limited to) system call arguments, database
   writes, dynamically evaluated source code, and input to billing or
   other secure processes. It is important that applications be
   protected from invalid input regardless of whether the invalidity is
   the result of user error, logic error, or malicious action.

   Authors of scripts involved in multi-request transactions should be
   particularly cautious about validating the state information;
   undesirable effects may result from the substitution of dangerous
   values for portions of the submission which might otherwise be
   presumed safe. Subversion of this type occurs when alterations are
   made to data from a prior stage of the transaction that were not
   meant to be controlled by the client (e.g., hidden HTML form
   elements, cookies, embedded URLs, etc.).

9.8 Relative Paths

   The server should be careful of ".." path segments in the request
   URI. These should be removed or resolved in the request URI before
   it is split into the script-path and extra-path. Alternatively, when
   the extra-path is used to find the PATH_TRANSLATED, care should be
   taken to prevent the path resolution from producing translated paths
   outside an expected path hierarchy.

9.9 Non-parsed Header Output

   If a script returns non-parsed header output, to be interpreted by
   the client in its native protocol, then the script must address all
   security considerations relating to that protocol.

10 Acknowledgements

   This work is based on the original CGI interface that arose out of
   discussions on the 'www-talk' mailing list.
In particular, Rob 1509 McCool, John Franks, Ari Luotonen, George Phillips and Tony Sanders 1510 deserve special recognition for their efforts in defining and 1511 implementing the early versions of this interface. 1513 This document has also greatly benefited from the comments and 1514 suggestions made Chris Adie, Dave Kristol and Mike Meyer; also David 1515 Morris, Jeremy Madea, Patrick McManus, Adam Donahue, Ross Patterson 1516 and Harald Alvestrand. 1518 11 References 1520 [1] Berners-Lee, T., 'Universal Resource Identifiers in WWW: A 1521 Unifying Syntax for the Expression of Names and Addresses of 1522 Objects on the Network as used in the World-Wide Web', RFC 1630, 1523 CERN, June 1994. 1525 [2] Berners-Lee, T., Fielding, R. T. and Frystyk, H., 'Hypertext 1526 Transfer Protocol -- HTTP/1.0', RFC 1945, MIT/LCS, UC Irvine, 1527 May 1996. 1529 [3] Berners-Lee, T., Fielding, R. and Masinter, L., 'Uniform 1530 Resource Identifiers (URI) : Generic Syntax', RFC 2396, MIT/LC, 1531 U.C. Irvine, Xerox Corporation, August 1998. 1533 [4] Braden, R. (Editor), 'Requirements for Internet Hosts -- 1534 Application and Support', STD 3, RFC 1123, IETF, October 1989. 1536 [5] Bradner, S., 'Key words for use in RFCs to Indicate Requirements 1537 Levels', BCP 14, RFC 2119, Harvard University, March 1997. 1539 [6] Crocker, D.H., 'Standard for the Format of ARPA Internet Text 1540 Messages', STD 11, RFC 822, University of Delaware, August 1982. 1542 [7] Dierks, T. and Allen, C., 'The TLS Protocol Version 1.0', RFC 1543 2246, Certicom, January 1999. 1545 [8] Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L., 1546 Leach, P. and Berners-Lee, T., 'Hypertext Transfer Protocol -- 1547 HTTP/1.1', RFC 2616, UC Irving, Compaq/W3C, Compaq, W3C/MIT, 1548 Xerox, Microsoft, W3C/MIT, June 1999. 1550 [9] Franks, J., Hallam-Baker, P., Hostetler, J., Lawrence, S., 1551 Leach, P., Luotonen, A. 
        and Stewart, L., 'HTTP Authentication: Basic and Digest Access
        Authentication', RFC 2617, Northwestern University, Verisign
        Inc., AbiSource, Inc., Agranat Systems, Inc., Microsoft
        Corporation, Netscape Communications Corporation, Open Market,
        Inc., June 1999.

   [10] Freed, N. and Borenstein, N., 'Multipurpose Internet Mail
        Extensions (MIME) Part Two: Media Types', RFC 2046, Innosoft,
        First Virtual, November 1996.

   [11] Hinden, R., Carpenter, B. and Masinter, L., 'Format for Literal
        IPv6 Addresses in URL's', RFC 2732, Nokia, IBM, AT&T, December
        1999.

   [12] Hinden, R. and Deering, S., 'IP Version 6 Addressing
        Architecture', RFC 2373, Nokia, Cisco Systems, July 1998.

   [13] Masinter, L., 'Returning Values from Forms:
        multipart/form-data', RFC 2388, Xerox Corporation, August 1998.

   [14] Mockapetris, P., 'Domain Names - Concepts and Facilities', STD
        13, RFC 1034, ISI, November 1987.

   [15] Raggett, D., Le Hors, A. and Jacobs, I. (eds), 'HTML 4.01
        Specification', W3C Recommendation, December 1999,
        http://www.w3.org/TR/html401/.

   [16] Rescorla, E., 'HTTP Over TLS', RFC 2818, RTFM, May 2000.

   [17] St. Johns, M., 'Identification Protocol', RFC 1413, US
        Department of Defense, February 1993.

   [18] 'HTTP Status Code Registry',
        http://www.iana.org/assignments/http-status-codes, IANA.

   [19] IBM National Language Support Reference Manual Volume 2,
        SE09-8002-01, March 1990.

   [20] 'Information Systems -- Coded Character Sets -- 7-bit American
        Standard Code for Information Interchange (7-Bit ASCII)', ANSI
        INCITS.4-1986 (R2002).

   [21] 'Information technology -- 8-bit single-byte coded graphic
        character sets -- Part 1: Latin alphabet No. 1', ISO/IEC
        8859-1:1998.

   [22] 'The Common Gateway Interface',
        http://hoohoo.ncsa.uiuc.edu/cgi/, NCSA, University of Illinois.

12 Authors' Addresses

   David Robinson
   Apache Software Foundation
   Email: drtr@apache.org

   Ken A. L. Coar
   MeepZor Consulting
   7824 Mayfaire Crest Lane, Suite 202
   Raleigh, NC 27615-4875
   USA
   Tel: +1 (919) 254 4237
   Fax: +1 (919) 254 5420
   Email: Ken.Coar@Golux.com