Network Working Group                                   R. Fielding, Ed.
Internet-Draft                                              Day Software
Obsoletes: 2616 (if approved)                                  J. Gettys
Intended status: Standards Track                    One Laptop per Child
Expires: July 15, 2008                                          J. Mogul
                                                                      HP
                                                              H. Frystyk
                                                               Microsoft
                                                             L. Masinter
                                                           Adobe Systems
                                                                P. Leach
                                                               Microsoft
                                                          T. Berners-Lee
                                                                 W3C/MIT
                                                           Y. Lafon, Ed.
                                                                     W3C
                                                         J. Reschke, Ed.
                                                              greenbytes
                                                        January 12, 2008


       HTTP/1.1, part 3: Message Payload and Content Negotiation
                    draft-ietf-httpbis-p3-payload-01

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on July 15, 2008.

Copyright Notice

   Copyright (C) The IETF Trust (2008).

Abstract

   The Hypertext Transfer Protocol (HTTP) is an application-level
   protocol for distributed, collaborative, hypermedia information
   systems.  HTTP has been in use by the World Wide Web global
   information initiative since 1990.
   This document is Part 3 of the seven-part specification that defines
   the protocol referred to as "HTTP/1.1" and, taken together, obsoletes
   RFC 2616.  Part 3 defines HTTP message content, metadata, and content
   negotiation.

Editorial Note (To be removed by RFC Editor)

   Discussion of this draft should take place on the HTTPBIS working
   group mailing list (ietf-http-wg@w3.org).  The current issues list is
   at  and related
   documents (including fancy diffs) can be found at
   .

   This draft incorporates those issue resolutions that were either
   collected in the original RFC2616 errata list
   (), or which were agreed upon on the
   mailing list between October 2006 and November 2007 (as published in
   "draft-lafon-rfc2616bis-03").

Table of Contents

   1.  Introduction
     1.1.  Requirements
   2.  Protocol Parameters
     2.1.  Character Sets
       2.1.1.  Missing Charset
     2.2.  Content Codings
     2.3.  Media Types
       2.3.1.  Canonicalization and Text Defaults
       2.3.2.  Multipart Types
     2.4.  Quality Values
     2.5.  Language Tags
   3.  Entity
     3.1.  Entity Header Fields
     3.2.  Entity Body
       3.2.1.  Type
       3.2.2.  Entity Length
   4.  Content Negotiation
     4.1.  Server-driven Negotiation
     4.2.  Agent-driven Negotiation
     4.3.  Transparent Negotiation
   5.  Header Field Definitions
     5.1.  Accept
     5.2.  Accept-Charset
     5.3.  Accept-Encoding
     5.4.  Accept-Language
     5.5.  Content-Encoding
     5.6.  Content-Language
     5.7.  Content-Location
     5.8.  Content-MD5
     5.9.  Content-Type
   6.  IANA Considerations
   7.  Security Considerations
     7.1.  Privacy Issues Connected to Accept Headers
     7.2.  Content-Disposition Issues
   8.  Acknowledgments
   9.  References
     9.1.  Normative References
     9.2.  Informative References
   Appendix A.  Differences Between HTTP Entities and RFC 2045
                Entities
     A.1.  MIME-Version
     A.2.  Conversion to Canonical Form
     A.3.  Introduction of Content-Encoding
     A.4.  No Content-Transfer-Encoding
     A.5.  Introduction of Transfer-Encoding
     A.6.  MHTML and Line Length Limitations
   Appendix B.  Additional Features
     B.1.  Content-Disposition
   Appendix C.  Compatibility with Previous Versions
     C.1.  Changes from RFC 2068
     C.2.  Changes from RFC 2616
   Appendix D.  Change Log (to be removed by RFC Editor before
                publication)
     D.1.  Since RFC2616
     D.2.  Since draft-ietf-httpbis-p3-payload-00
   Index
   Authors' Addresses
   Intellectual Property and Copyright Statements

1.  Introduction

   This document defines HTTP/1.1 message payloads (a.k.a., content),
   the associated metadata header fields that define how the payload is
   intended to be interpreted by a recipient, the request header fields
   that may influence content selection, and the various selection
   algorithms that are collectively referred to as HTTP content
   negotiation.

   This document is currently disorganized in order to minimize the
   changes between drafts and enable reviewers to see the smaller errata
   changes.  The next draft will reorganize the sections to better
   reflect the content.  In particular, the sections on entities will be
   renamed payload and moved to the first half of the document, while
   the sections on content negotiation and associated request header
   fields will be moved to the second half.  The current mess reflects
   how widely dispersed these topics and associated requirements had
   become in [RFC2616].

1.1.  Requirements

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

   An implementation is not compliant if it fails to satisfy one or more
   of the MUST or REQUIRED level requirements for the protocols it
   implements.  An implementation that satisfies all the MUST or
   REQUIRED level and all the SHOULD level requirements for its
   protocols is said to be "unconditionally compliant"; one that
   satisfies all the MUST level requirements but not all the SHOULD
   level requirements for its protocols is said to be "conditionally
   compliant."

2.  Protocol Parameters

2.1.  Character Sets

   HTTP uses the same definition of the term "character set" as that
   described for MIME:

      The term "character set" is used in this document to refer to a
      method used with one or more tables to convert a sequence of
      octets into a sequence of characters.  Note that unconditional
      conversion in the other direction is not required, in that not
      all characters may be available in a given character set and a
      character set may provide more than one sequence of octets to
      represent a particular character.

      This definition is intended to allow various kinds of character
      encoding, from simple single-table mappings such as US-ASCII to
      complex table switching methods such as those that use ISO-2022's
      techniques.  However, the definition associated with a MIME
      character set name MUST fully specify the mapping to be performed
      from octets to characters.  In particular, use of external
      profiling information to determine the exact mapping is not
      permitted.

      Note: This use of the term "character set" is more commonly
      referred to as a "character encoding."  However, since HTTP and
      MIME share the same registry, it is important that the terminology
      also be shared.
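
   The octet-to-character mapping quoted above, and the fact that the
   reverse direction is not guaranteed for every character, can be
   illustrated with Python's built-in codecs.  This is a sketch under
   the assumption that Python's codec names "iso-8859-1" and "utf-8"
   correspond to the IANA-registered character sets of the same names;
   the helpers decode_octets and can_represent are hypothetical.

```python
def decode_octets(octets: bytes, charset: str) -> str:
    """Convert a sequence of octets into characters using a charset."""
    return octets.decode(charset)


def can_represent(text: str, charset: str) -> bool:
    """Report whether every character of text exists in the charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    # The same character maps to different octet sequences per charset.
    assert "\u00e9".encode("iso-8859-1") == b"\xe9"
    assert "\u00e9".encode("utf-8") == b"\xc3\xa9"
    assert decode_octets(b"\xe9", "ISO-8859-1") == "\u00e9"
    # Conversion back to octets is not unconditional: EURO SIGN
    # (U+20AC) has no octet in ISO-8859-1, but it does in UTF-8.
    assert not can_represent("\u20ac", "iso-8859-1")
    assert can_represent("\u20ac", "utf-8")
```

   A sender that cannot represent a character in the charset it has
   chosen has to pick a different charset rather than emit octets that
   a recipient would map to some other character.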

   HTTP character sets are identified by case-insensitive tokens.  The
   complete set of tokens is defined by the IANA Character Set registry
   ().

     charset = token

   Although HTTP allows an arbitrary token to be used as a charset
   value, any token that has a predefined value within the IANA
   Character Set registry MUST represent the character set defined by
   that registry.  Applications SHOULD limit their use of character sets
   to those defined by the IANA registry.

   HTTP uses charset in two contexts: within an Accept-Charset request
   header (in which the charset value is an unquoted token) and as the
   value of a parameter in a Content-Type header (within a request or
   response), in which case the parameter value of the charset parameter
   may be quoted.

   Implementors should be aware of IETF character set requirements
   [RFC3629] [RFC2277].

2.1.1.  Missing Charset

   Some HTTP/1.0 software has incorrectly interpreted a Content-Type
   header without a charset parameter to mean "recipient should guess."
   Senders wishing to defeat this behavior MAY include a charset
   parameter even when the charset is ISO-8859-1 ([ISO-8859-1]) and
   SHOULD do so when it is known that it will not confuse the recipient.

   Unfortunately, some older HTTP/1.0 clients did not deal properly with
   an explicit charset parameter.  HTTP/1.1 recipients MUST respect the
   charset label provided by the sender, and those user agents that have
   a provision to "guess" a charset MUST use the charset from the
   Content-Type field if they support that charset, rather than the
   recipient's preference, when initially displaying a document.  See
   Section 2.3.1.

2.2.  Content Codings

   Content coding values indicate an encoding transformation that has
   been or can be applied to an entity.
   Content codings are primarily used to allow a document to be
   compressed or otherwise usefully transformed without losing the
   identity of its underlying media type and without loss of
   information.  Frequently, the entity is stored in coded form,
   transmitted directly, and only decoded by the recipient.

     content-coding = token

   All content-coding values are case-insensitive.  HTTP/1.1 uses
   content-coding values in the Accept-Encoding (Section 5.3) and
   Content-Encoding (Section 5.5) header fields.  Although the value
   describes the content-coding, what is more important is that it
   indicates what decoding mechanism will be required to remove the
   encoding.

   The Internet Assigned Numbers Authority (IANA) acts as a registry for
   content-coding value tokens.  Initially, the registry contains the
   following tokens:

   gzip

      An encoding format produced by the file compression program "gzip"
      (GNU zip) as described in [RFC1952].  This format is a Lempel-Ziv
      coding (LZ77) with a 32-bit CRC.

   compress

      The encoding format produced by the common UNIX file compression
      program "compress".  This format is an adaptive Lempel-Ziv-Welch
      coding (LZW).

      Use of program names for the identification of encoding formats is
      not desirable and is discouraged for future encodings.  Their use
      here is representative of historical practice, not good design.
      For compatibility with previous implementations of HTTP,
      applications SHOULD consider "x-gzip" and "x-compress" to be
      equivalent to "gzip" and "compress" respectively.

   deflate

      The "zlib" format defined in [RFC1950] in combination with the
      "deflate" compression mechanism described in [RFC1951].

   identity

      The default (identity) encoding; the use of no transformation
      whatsoever.  This content-coding is used only in the
      Accept-Encoding header, and SHOULD NOT be used in the
      Content-Encoding header.
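
   A recipient has to map each content-coding token to the decoding
   mechanism it names.  The following Python sketch assumes a single
   coding per entity-body and uses only the standard library; the names
   normalize_coding and decode_content are hypothetical.  It folds case
   and the legacy "x-" aliases as described above, and it rejects
   "compress" because the Python standard library has no LZW decoder.

```python
import gzip
import zlib

# Legacy tokens that Section 2.2 says SHOULD be treated as equivalent.
_ALIASES = {"x-gzip": "gzip", "x-compress": "compress"}


def normalize_coding(token: str) -> str:
    """Case-fold a content-coding token and map x-gzip/x-compress."""
    token = token.strip().lower()
    return _ALIASES.get(token, token)


def decode_content(body: bytes, coding: str) -> bytes:
    """Remove one content-coding from an entity-body."""
    coding = normalize_coding(coding)
    if coding == "identity":
        return body                    # no transformation whatsoever
    if coding == "gzip":
        return gzip.decompress(body)   # gzip file format, RFC 1952
    if coding == "deflate":
        # zlib wrapper (RFC 1950) around deflate data (RFC 1951)
        return zlib.decompress(body)
    # "compress" (adaptive LZW) has no decoder in the standard library.
    raise ValueError(f"unsupported content-coding: {coding!r}")


if __name__ == "__main__":
    data = b"An entity-body that compresses well, well, well."
    assert decode_content(gzip.compress(data), "X-Gzip") == data
    assert decode_content(zlib.compress(data), "deflate") == data
    assert decode_content(data, "identity") == data
```

   The sketch tolerates "identity" on input even though it SHOULD NOT
   appear in Content-Encoding; being lenient here costs nothing because
   the body is returned unchanged.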

   New content-coding value tokens SHOULD be registered; to allow
   interoperability between clients and servers, specifications of the
   content coding algorithms needed to implement a new value SHOULD be
   publicly available and adequate for independent implementation, and
   conform to the purpose of content coding defined in this section.

2.3.  Media Types

   HTTP uses Internet Media Types [RFC2046] in the Content-Type
   (Section 5.9) and Accept (Section 5.1) header fields in order to
   provide open and extensible data typing and type negotiation.

     media-type = type "/" subtype *( ";" parameter )
     type       = token
     subtype    = token

   Parameters MAY follow the type/subtype in the form of attribute/value
   pairs.

     parameter  = attribute "=" value
     attribute  = token
     value      = token | quoted-string

   The type, subtype, and parameter attribute names are
   case-insensitive.  Parameter values might or might not be
   case-sensitive, depending on the semantics of the parameter name.
   Linear white space (LWS) MUST NOT be used between the type and
   subtype, nor between an attribute and its value.  The presence or
   absence of a parameter might be significant to the processing of a
   media-type, depending on its definition within the media type
   registry.

   Note that some older HTTP applications do not recognize media type
   parameters.  When sending data to older HTTP applications,
   implementations SHOULD only use media type parameters when they are
   required by that type/subtype definition.

   Media-type values are registered with the Internet Assigned Numbers
   Authority (IANA).  The media type registration process is outlined in
   [RFC4288].  Use of non-registered media types is discouraged.

2.3.1.  Canonicalization and Text Defaults

   Internet media types are registered with a canonical form.
   An entity-body transferred via HTTP messages MUST be represented in
   the appropriate canonical form prior to its transmission except for
   "text" types, as defined in the next paragraph.

   When in canonical form, media subtypes of the "text" type use CRLF as
   the text line break.  HTTP relaxes this requirement and allows the
   transport of text media with plain CR or LF alone representing a line
   break when it is done consistently for an entire entity-body.  HTTP
   applications MUST accept CRLF, bare CR, and bare LF as being
   representative of a line break in text media received via HTTP.  In
   addition, if the text is represented in a character set that does not
   use octets 13 and 10 for CR and LF respectively, as is the case for
   some multi-byte character sets, HTTP allows the use of whatever octet
   sequences are defined by that character set to represent the
   equivalent of CR and LF for line breaks.  This flexibility regarding
   line breaks applies only to text media in the entity-body; a bare CR
   or LF MUST NOT be substituted for CRLF within any of the HTTP control
   structures (such as header fields and multipart boundaries).

   If an entity-body is encoded with a content-coding, the underlying
   data MUST be in a form defined above prior to being encoded.

   The "charset" parameter is used with some media types to define the
   character set (Section 2.1) of the data.  When no explicit charset
   parameter is provided by the sender, media subtypes of the "text"
   type are defined to have a default charset value of "ISO-8859-1" when
   received via HTTP.  Data in character sets other than "ISO-8859-1" or
   its subsets MUST be labeled with an appropriate charset value.  See
   Section 2.1.1 for compatibility problems.

2.3.2.  Multipart Types

   MIME provides for a number of "multipart" types -- encapsulations of
   one or more entities within a single message-body.
   All multipart types share a common syntax, as defined in Section
   5.1.1 of [RFC2046], and MUST include a boundary parameter as part of
   the media type value.  The message body is itself a protocol element
   and MUST therefore use only CRLF to represent line breaks between
   body-parts.  Unlike in RFC 2046, the epilogue of any multipart
   message MUST be empty; HTTP applications MUST NOT transmit the
   epilogue (even if the original multipart contains an epilogue).
   These restrictions exist in order to preserve the self-delimiting
   nature of a multipart message-body, wherein the "end" of the
   message-body is indicated by the ending multipart boundary.

   In general, HTTP treats a multipart message-body no differently than
   any other media type: strictly as payload.  The one exception is the
   "multipart/byteranges" type (Appendix A of [Part5]) when it appears
   in a 206 (Partial Content) response.  In all other cases, an HTTP
   user agent SHOULD follow the same or similar behavior as a MIME user
   agent would upon receipt of a multipart type.  The MIME header fields
   within each body-part of a multipart message-body do not have any
   significance to HTTP beyond that defined by their MIME semantics.

   If an application receives an unrecognized multipart subtype, the
   application MUST treat it as being equivalent to "multipart/mixed".

      Note: The "multipart/form-data" type has been specifically defined
      for carrying form data suitable for processing via the POST
      request method, as described in [RFC2388].

2.4.  Quality Values

   HTTP content negotiation (Section 4) uses short "floating point"
   numbers to indicate the relative importance ("weight") of various
   negotiable parameters.
   A weight is normalized to a real number in
   the range 0 through 1, where 0 is the minimum and 1 the maximum
   value.  If a parameter has a quality value of 0, then content with
   this parameter is "not acceptable" for the client.  HTTP/1.1
   applications MUST NOT generate more than three digits after the
   decimal point.  User configuration of these values SHOULD also be
   limited in this fashion.

     qvalue = ( "0" [ "." 0*3DIGIT ] )
            | ( "1" [ "." 0*3("0") ] )

   "Quality values" is a misnomer, since these values merely represent
   relative degradation in desired quality.

2.5.  Language Tags

   A language tag identifies a natural language spoken, written, or
   otherwise conveyed by human beings for communication of information
   to other human beings.  Computer languages are explicitly excluded.
   HTTP uses language tags within the Accept-Language and
   Content-Language fields.

   The syntax and registry of HTTP language tags are the same as those
   defined by [RFC1766].  In summary, a language tag is composed of one
   or more parts: a primary language tag and a possibly empty series of
   subtags:

     language-tag  = primary-tag *( "-" subtag )
     primary-tag   = 1*8ALPHA
     subtag        = 1*8ALPHA

   White space is not allowed within the tag and all tags are
   case-insensitive.  The name space of language tags is administered by
   the IANA.  Example tags include:

     en, en-US, en-cockney, i-cherokee, x-pig-latin

   where any two-letter primary-tag is an ISO-639 language abbreviation
   and any two-letter initial subtag is an ISO-3166 country code.  (The
   last three tags above are not registered tags; all but the last are
   examples of tags which could be registered in the future.)

3.  Entity

   Request and Response messages MAY transfer an entity if not otherwise
   restricted by the request method or response status code.
   An entity consists of entity-header fields and an entity-body,
   although some responses will only include the entity-headers.

   In this section, both sender and recipient refer to either the client
   or the server, depending on who sends and who receives the entity.

3.1.  Entity Header Fields

   Entity-header fields define metainformation about the entity-body or,
   if no body is present, about the resource identified by the request.

     entity-header  = Allow                    ; [Part2], Section 10.1
                    | Content-Encoding         ; Section 5.5
                    | Content-Language         ; Section 5.6
                    | Content-Length           ; [Part1], Section 8.2
                    | Content-Location         ; Section 5.7
                    | Content-MD5              ; Section 5.8
                    | Content-Range            ; [Part5], Section 5.2
                    | Content-Type             ; Section 5.9
                    | Expires                  ; [Part6], Section 15.3
                    | Last-Modified            ; [Part4], Section 6.6
                    | extension-header

     extension-header = message-header

   The extension-header mechanism allows additional entity-header fields
   to be defined without changing the protocol, but these fields cannot
   be assumed to be recognizable by the recipient.  Unrecognized header
   fields SHOULD be ignored by the recipient and MUST be forwarded by
   transparent proxies.

3.2.  Entity Body

   The entity-body (if any) sent with an HTTP request or response is in
   a format and encoding defined by the entity-header fields.

     entity-body    = *OCTET

   An entity-body is only present in a message when a message-body is
   present, as described in Section 4.3 of [Part1].  The entity-body is
   obtained from the message-body by decoding any Transfer-Encoding that
   might have been applied to ensure safe and proper transfer of the
   message.

3.2.1.  Type

   When an entity-body is included with a message, the data type of that
   body is determined via the header fields Content-Type and
   Content-Encoding.
   These define a two-layer, ordered encoding model:

     entity-body := Content-Encoding( Content-Type( data ) )

   Content-Type specifies the media type of the underlying data.
   Content-Encoding may be used to indicate any additional content
   codings applied to the data, usually for the purpose of data
   compression, that are a property of the requested resource.  There is
   no default encoding.

   Any HTTP/1.1 message containing an entity-body SHOULD include a
   Content-Type header field defining the media type of that body.  If
   and only if the media type is not given by a Content-Type field, the
   recipient MAY attempt to guess the media type via inspection of its
   content and/or the name extension(s) of the URI used to identify the
   resource.  If the media type remains unknown, the recipient SHOULD
   treat it as type "application/octet-stream".

3.2.2.  Entity Length

   The entity-length of a message is the length of the message-body
   before any transfer-codings have been applied.  Section 4.4 of
   [Part1] defines how the transfer-length of a message-body is
   determined.

4.  Content Negotiation

   Most HTTP responses include an entity which contains information for
   interpretation by a human user.  Naturally, it is desirable to supply
   the user with the "best available" entity corresponding to the
   request.  Unfortunately for servers and caches, not all users have
   the same preferences for what is "best," and not all user agents are
   equally capable of rendering all entity types.  For that reason, HTTP
   has provisions for several mechanisms for "content negotiation" --
   the process of selecting the best representation for a given
   response when there are multiple representations available.

      Note: This is not called "format negotiation" because the
      alternate representations may be of the same media type, but use
      different capabilities of that type, be in different languages,
      etc.

   Any response containing an entity-body MAY be subject to negotiation,
   including error responses.

   There are two kinds of content negotiation which are possible in
   HTTP: server-driven and agent-driven negotiation.  These two kinds of
   negotiation are orthogonal and thus may be used separately or in
   combination.  One method of combination, referred to as transparent
   negotiation, occurs when a cache uses the agent-driven negotiation
   information provided by the origin server in order to provide
   server-driven negotiation for subsequent requests.

4.1.  Server-driven Negotiation

   If the selection of the best representation for a response is made by
   an algorithm located at the server, it is called server-driven
   negotiation.  Selection is based on the available representations of
   the response (the dimensions over which it can vary; e.g., language,
   content-coding, etc.) and the contents of particular header fields in
   the request message or on other information pertaining to the request
   (such as the network address of the client).

   Server-driven negotiation is advantageous when the algorithm for
   selecting from among the available representations is difficult to
   describe to the user agent, or when the server desires to send its
   "best guess" to the client along with the first response (hoping to
   avoid the round-trip delay of a subsequent request if the "best
   guess" is good enough for the user).  In order to improve the
   server's guess, the user agent MAY include request header fields
   (Accept, Accept-Language, Accept-Encoding, etc.) which describe its
   preferences for such a response.

   Server-driven negotiation has disadvantages:

   1.  It is impossible for the server to accurately determine what
       might be "best" for any given user, since that would require
       complete knowledge of both the capabilities of the user agent and
       the intended use for the response (e.g., does the user want to
       view it on screen or print it on paper?).

   2.  Having the user agent describe its capabilities in every request
       can be both very inefficient (given that only a small percentage
       of responses have multiple representations) and a potential
       violation of the user's privacy.

   3.  It complicates the implementation of an origin server and the
       algorithms for generating responses to a request.

   4.  It may limit a public cache's ability to use the same response
       for multiple users' requests.

   HTTP/1.1 includes the following request-header fields for enabling
   server-driven negotiation through description of user agent
   capabilities and user preferences: Accept (Section 5.1),
   Accept-Charset (Section 5.2), Accept-Encoding (Section 5.3),
   Accept-Language (Section 5.4), and User-Agent (Section 10.9 of
   [Part2]).  However, an origin server is not limited to these
   dimensions and MAY vary the response based on any aspect of the
   request, including information outside the request-header fields or
   within extension header fields not defined by this specification.

   The Vary header field (Section 15.5 of [Part6]) can be used to
   express the parameters the server uses to select a representation
   that is subject to server-driven negotiation.

4.2.  Agent-driven Negotiation

   With agent-driven negotiation, selection of the best representation
   for a response is performed by the user agent after receiving an
   initial response from the origin server.
   Selection is based on a list of the available representations of the response included within the header fields or entity-body of the initial response, with each representation identified by its own URI. Selection from among the representations may be performed automatically (if the user agent is capable of doing so) or manually by the user selecting from a generated (possibly hypertext) menu.

   Agent-driven negotiation is advantageous when the response would vary over commonly-used dimensions (such as type, language, or encoding), when the origin server is unable to determine a user agent's capabilities from examining the request, and generally when public caches are used to distribute server load and reduce network usage.

   Agent-driven negotiation suffers from the disadvantage of needing a second request to obtain the best alternate representation. This second request is only efficient when caching is used. In addition, this specification does not define any mechanism for supporting automatic selection, though it also does not prevent any such mechanism from being developed as an extension and used within HTTP/1.1.

   HTTP/1.1 defines the 300 (Multiple Choices) and 406 (Not Acceptable) status codes for enabling agent-driven negotiation when the server is unwilling or unable to provide a varying response using server-driven negotiation.

4.3. Transparent Negotiation

   Transparent negotiation is a combination of both server-driven and agent-driven negotiation. When a cache is supplied with a form of the list of available representations of the response (as in agent-driven negotiation) and the dimensions of variance are completely understood by the cache, then the cache becomes capable of performing server-driven negotiation on behalf of the origin server for subsequent requests on that resource.

   Transparent negotiation has the advantage of distributing the negotiation work that would otherwise be required of the origin server and also removing the second request delay of agent-driven negotiation when the cache is able to correctly guess the right response.

   This specification does not define any mechanism for transparent negotiation, though it also does not prevent any such mechanism from being developed as an extension that could be used within HTTP/1.1.

5. Header Field Definitions

   This section defines the syntax and semantics of HTTP/1.1 header fields related to the payload of messages.

   For entity-header fields, both sender and recipient refer to either the client or the server, depending on who sends and who receives the entity.

5.1. Accept

   The Accept request-header field can be used to specify certain media types which are acceptable for the response. Accept headers can be used to indicate that the request is specifically limited to a small set of desired types, as in the case of a request for an in-line image.

     Accept = "Accept" ":"
              #( media-range [ accept-params ] )

     media-range = ( "*/*"
                   | ( type "/" "*" )
                   | ( type "/" subtype )
                   ) *( ";" parameter )
     accept-params  = ";" "q" "=" qvalue *( accept-extension )
     accept-extension = ";" token [ "=" ( token | quoted-string ) ]

   The asterisk "*" character is used to group media types into ranges, with "*/*" indicating all media types and "type/*" indicating all subtypes of that type. The media-range MAY include media type parameters that are applicable to that range.

   Each media-range MAY be followed by one or more accept-params, beginning with the "q" parameter for indicating a relative quality factor. The first "q" parameter (if any) separates the media-range parameter(s) from the accept-params.
   Quality factors allow the user or user agent to indicate the relative degree of preference for that media-range, using the qvalue scale from 0 to 1 (Section 2.4). The default value is q=1.

      Note: Use of the "q" parameter name to separate media type parameters from Accept extension parameters is due to historical practice. Although this prevents any media type parameter named "q" from being used with a media range, such an event is believed to be unlikely given the lack of any "q" parameters in the IANA media type registry and the rare usage of any media type parameters in Accept. Future media types are discouraged from registering any parameter named "q".

   The example

       Accept: audio/*; q=0.2, audio/basic

   SHOULD be interpreted as "I prefer audio/basic, but send me any audio type if it is the best available after an 80% mark-down in quality."

   If no Accept header field is present, then it is assumed that the client accepts all media types. If an Accept header field is present, and if the server cannot send a response which is acceptable according to the combined Accept field value, then the server SHOULD send a 406 (Not Acceptable) response.

   A more elaborate example is

       Accept: text/plain; q=0.5, text/html,
               text/x-dvi; q=0.8, text/x-c

   Verbally, this would be interpreted as "text/html and text/x-c are the preferred media types, but if they do not exist, then send the text/x-dvi entity, and if that does not exist, send the text/plain entity."

   Media ranges can be overridden by more specific media ranges or specific media types. If more than one media range applies to a given type, the most specific reference has precedence.
   For example, the media ranges in

       Accept: text/*, text/html, text/html;level=1, */*

   have the following precedence:

       1) text/html;level=1
       2) text/html
       3) text/*
       4) */*

   The media type quality factor associated with a given type is determined by finding the media range with the highest precedence which matches that type. For example,

       Accept: text/*;q=0.3, text/html;q=0.7, text/html;level=1,
               text/html;level=2;q=0.4, */*;q=0.5

   would cause the following values to be associated:

       text/html;level=1         = 1
       text/html                 = 0.7
       text/plain                = 0.3
       image/jpeg                = 0.5
       text/html;level=2         = 0.4
       text/html;level=3         = 0.7

   Note: A user agent might be provided with a default set of quality values for certain media ranges. However, unless the user agent is a closed system which cannot interact with other rendering agents, this default set ought to be configurable by the user.

5.2. Accept-Charset

   The Accept-Charset request-header field can be used to indicate what character sets are acceptable for the response. This field allows clients capable of understanding more comprehensive or special-purpose character sets to signal that capability to a server which is capable of representing documents in those character sets.

     Accept-Charset = "Accept-Charset" ":"
             1#( ( charset | "*" ) [ ";" "q" "=" qvalue ] )

   Character set values are described in Section 2.1. Each charset MAY be given an associated quality value which represents the user's preference for that charset. The default value is q=1. An example is

       Accept-Charset: iso-8859-5, unicode-1-1;q=0.8

   The special value "*", if present in the Accept-Charset field, matches every character set (including ISO-8859-1) which is not mentioned elsewhere in the Accept-Charset field.
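
   The "*" rule above and the ISO-8859-1 default stated in the paragraph that follows together assign a quality value to every charset. The following is a minimal Python sketch of that lookup. It assumes well-formed input with no quoted strings, and it compares charset names case-insensitively. The function names parse_accept_charset and charset_quality are hypothetical and do not come from any library.

```python
def parse_accept_charset(value):
    """Parse an Accept-Charset field value into a {charset: qvalue} dict.

    Simplified sketch: assumes well-formed input without quoted strings.
    Names are lowercased so that lookups are case-insensitive.
    """
    prefs = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        parts = [p.strip() for p in item.split(";")]
        name = parts[0].lower()
        q = 1.0  # default quality value when no "q" parameter is given
        for param in parts[1:]:
            key, _, val = param.partition("=")
            if key.strip().lower() == "q":
                q = float(val.strip())
        prefs[name] = q
    return prefs


def charset_quality(value, charset):
    """Return the quality value an Accept-Charset field assigns to charset."""
    prefs = parse_accept_charset(value)
    name = charset.lower()
    if name in prefs:
        return prefs[name]       # explicitly mentioned in the field
    if "*" in prefs:
        return prefs["*"]        # "*" matches every charset not mentioned
    if name == "iso-8859-1":
        return 1.0               # no "*": ISO-8859-1 still defaults to q=1
    return 0.0                   # no "*": other unmentioned charsets get q=0
```

   With the example field above, "unicode-1-1" gets 0.8 and "iso-8859-5" gets 1. An unmentioned "utf-8" gets 0. An unmentioned ISO-8859-1 still gets 1, because the field contains no "*".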

   If no "*" is present in an Accept-Charset field, then all character sets not explicitly mentioned get a quality value of 0, except for ISO-8859-1, which gets a quality value of 1 if not explicitly mentioned.

   If no Accept-Charset header is present, the default is that any character set is acceptable. If an Accept-Charset header is present, and if the server cannot send a response which is acceptable according to the Accept-Charset header, then the server SHOULD send an error response with the 406 (Not Acceptable) status code, though the sending of an unacceptable response is also allowed.

5.3. Accept-Encoding

   The Accept-Encoding request-header field is similar to Accept, but restricts the content-codings (Section 2.2) that are acceptable in the response.

     Accept-Encoding  = "Accept-Encoding" ":"
                        #( codings [ ";" "q" "=" qvalue ] )
     codings          = ( content-coding | "*" )

   Examples of its use are:

       Accept-Encoding: compress, gzip
       Accept-Encoding:
       Accept-Encoding: *
       Accept-Encoding: compress;q=0.5, gzip;q=1.0
       Accept-Encoding: gzip;q=1.0, identity; q=0.5, *;q=0

   A server tests whether a content-coding is acceptable, according to an Accept-Encoding field, using these rules:

   1.  If the content-coding is one of the content-codings listed in the Accept-Encoding field, then it is acceptable, unless it is accompanied by a qvalue of 0. (As defined in Section 2.4, a qvalue of 0 means "not acceptable.")

   2.  The special "*" symbol in an Accept-Encoding field matches any available content-coding not explicitly listed in the header field.

   3.  If multiple content-codings are acceptable, then the acceptable content-coding with the highest non-zero qvalue is preferred.

   4.  The "identity" content-coding is always acceptable, unless specifically refused because the Accept-Encoding field includes "identity;q=0", or because the field includes "*;q=0" and does not explicitly include the "identity" content-coding. If the Accept-Encoding field-value is empty, then only the "identity" encoding is acceptable.

   If an Accept-Encoding field is present in a request, and if the server cannot send a response which is acceptable according to the Accept-Encoding header, then the server SHOULD send an error response with the 406 (Not Acceptable) status code.

   If no Accept-Encoding field is present in a request, the server MAY assume that the client will accept any content coding. In this case, if "identity" is one of the available content-codings, then the server SHOULD use the "identity" content-coding, unless it has additional information that a different content-coding is meaningful to the client.

      Note: If the request does not include an Accept-Encoding field, and if the "identity" content-coding is unavailable, then content-codings commonly understood by HTTP/1.0 clients (i.e., "gzip" and "compress") are preferred; some older clients improperly display messages sent with other content-codings. The server might also make this decision based on information about the particular user-agent or client.

      Note: Most HTTP/1.0 applications do not recognize or obey qvalues associated with content-codings. This means that qvalues will not work and are not permitted with x-gzip or x-compress.

5.4. Accept-Language

   The Accept-Language request-header field is similar to Accept, but restricts the set of natural languages that are preferred as a response to the request. Language tags are defined in Section 2.5.

     Accept-Language = "Accept-Language" ":"
                       1#( language-range [ ";" "q" "=" qvalue ] )
     language-range  = ( ( 1*8ALPHA *( "-" 1*8ALPHA ) ) | "*" )

   Each language-range MAY be given an associated quality value which represents an estimate of the user's preference for the languages specified by that range. The quality value defaults to "q=1". For example,

       Accept-Language: da, en-gb;q=0.8, en;q=0.7

   would mean: "I prefer Danish, but will accept British English and other types of English." A language-range matches a language-tag if it exactly equals the tag, or if it exactly equals a prefix of the tag such that the first tag character following the prefix is "-". The special range "*", if present in the Accept-Language field, matches every tag not matched by any other range present in the Accept-Language field.

      Note: This use of a prefix matching rule does not imply that language tags are assigned to languages in such a way that it is always true that if a user understands a language with a certain tag, then this user will also understand all languages with tags for which this tag is a prefix. The prefix rule simply allows the use of prefix tags if this is the case.

   The language quality factor assigned to a language-tag by the Accept-Language field is the quality value of the longest language-range in the field that matches the language-tag. If no language-range in the field matches the tag, the language quality factor assigned is 0. If no Accept-Language header is present in the request, the server SHOULD assume that all languages are equally acceptable. If an Accept-Language header is present, then all languages which are assigned a quality factor greater than 0 are acceptable.

   It might be contrary to the privacy expectations of the user to send an Accept-Language header with the complete linguistic preferences of the user in every request.
   For a discussion of this issue, see Section 7.1.

   As intelligibility is highly dependent on the individual user, it is recommended that client applications make the choice of linguistic preference available to the user. If the choice is not made available, then the Accept-Language header field MUST NOT be given in the request.

      Note: When making the choice of linguistic preference available to the user, implementors should keep in mind that users are not familiar with the details of language matching as described above, and should provide appropriate guidance. As an example, users might assume that on selecting "en-gb", they will be served any kind of English document if British English is not available. In such a case, a user agent might suggest adding "en" to get the best matching behavior.

5.5. Content-Encoding

   The Content-Encoding entity-header field is used as a modifier to the media-type. When present, its value indicates what additional content codings have been applied to the entity-body, and thus what decoding mechanisms must be applied in order to obtain the media-type referenced by the Content-Type header field. Content-Encoding is primarily used to allow a document to be compressed without losing the identity of its underlying media type.

     Content-Encoding  = "Content-Encoding" ":" 1#content-coding

   Content codings are defined in Section 2.2. An example of its use is

       Content-Encoding: gzip

   The content-coding is a characteristic of the entity identified by the Request-URI. Typically, the entity-body is stored with this encoding and is only decoded before rendering or analogous usage. However, a non-transparent proxy MAY modify the content-coding if the new coding is known to be acceptable to the recipient, unless the "no-transform" cache-control directive is present in the message.
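
   The following minimal Python sketch shows how a sender can apply content codings and how a recipient can remove them to recover the media type named by Content-Type. It supports only "gzip", "deflate" and "identity" (Section 2.2), using Python's standard gzip and zlib modules. Here "deflate" means the zlib format of [RFC1950] wrapping [RFC1951] data. The helper names apply_codings and remove_codings are hypothetical. Codings are removed in reverse listing order, because this section requires multiple codings to be listed in the order in which they were applied.

```python
import gzip
import zlib

# Content-codings supported by this sketch only.
# "deflate" = zlib format (RFC 1950) wrapping DEFLATE data (RFC 1951).
_ENCODERS = {
    "gzip": gzip.compress,
    "deflate": zlib.compress,
    "identity": lambda data: data,
}
_DECODERS = {
    "gzip": gzip.decompress,
    "deflate": zlib.decompress,
    "identity": lambda data: data,
}


def apply_codings(body, codings):
    """Apply codings in the given order.

    Returns (encoded_body, Content-Encoding value). An empty value means
    only "identity" was used and the header field can be omitted.
    """
    for coding in codings:
        body = _ENCODERS[coding](body)  # KeyError for unsupported codings
    listed = [c for c in codings if c != "identity"]
    return body, ", ".join(listed)


def remove_codings(body, content_encoding):
    """Undo the codings in a Content-Encoding value, last applied first."""
    # Content-coding tokens are compared case-insensitively.
    codings = [c.strip().lower() for c in content_encoding.split(",") if c.strip()]
    for coding in reversed(codings):
        if coding not in _DECODERS:
            raise ValueError(f"unsupported content-coding: {coding}")
        body = _DECODERS[coding](body)
    return body
```

   For example, apply_codings(data, ["deflate", "gzip"]) yields the header value "deflate, gzip". The function remove_codings then decompresses gzip first and deflate second.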

   If the content-coding of an entity is not "identity", then the response MUST include a Content-Encoding entity-header (Section 5.5) that lists the non-identity content-coding(s) used.

   If the content-coding of an entity in a request message is not acceptable to the origin server, the server SHOULD respond with a status code of 415 (Unsupported Media Type).

   If multiple encodings have been applied to an entity, the content codings MUST be listed in the order in which they were applied. Additional information about the encoding parameters MAY be provided by other entity-header fields not defined by this specification.

5.6. Content-Language

   The Content-Language entity-header field describes the natural language(s) of the intended audience for the enclosed entity. Note that this might not be equivalent to all the languages used within the entity-body.

     Content-Language = "Content-Language" ":" 1#language-tag

   Language tags are defined in Section 2.5. The primary purpose of Content-Language is to allow a user to identify and differentiate entities according to the user's own preferred language. Thus, if the body content is intended only for a Danish-literate audience, the appropriate field is

       Content-Language: da

   If no Content-Language is specified, the default is that the content is intended for all language audiences. This might mean that the sender does not consider it to be specific to any natural language, or that the sender does not know for which language it is intended.

   Multiple languages MAY be listed for content that is intended for multiple audiences.
   For example, a rendition of the "Treaty of Waitangi," presented simultaneously in the original Maori and English versions, would call for

       Content-Language: mi, en

   However, just because multiple languages are present within an entity does not mean that it is intended for multiple linguistic audiences. An example would be a beginner's language primer, such as "A First Lesson in Latin," which is clearly intended to be used by an English-literate audience. In this case, the Content-Language would properly only include "en".

   Content-Language MAY be applied to any media type -- it is not limited to textual documents.

5.7. Content-Location

   The Content-Location entity-header field MAY be used to supply the resource location for the entity enclosed in the message when that entity is accessible from a location separate from the requested resource's URI. A server SHOULD provide a Content-Location for the variant corresponding to the response entity; especially in the case where a resource has multiple entities associated with it, and those entities actually have separate locations by which they might be individually accessed, the server SHOULD provide a Content-Location for the particular variant which is returned.

     Content-Location = "Content-Location" ":"
                       ( absoluteURI | relativeURI )

   The value of Content-Location also defines the base URI for the entity.

   The Content-Location value is not a replacement for the original requested URI; it is only a statement of the location of the resource corresponding to this particular entity at the time of the request. Future requests MAY specify the Content-Location URI as the request-URI if the desire is to identify the source of that particular entity.
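
   This section says that Content-Location defines the base URI for the entity and that a relative value is interpreted relative to the Request-URI. The following minimal sketch shows how a client might compute that base URI. It uses Python's urllib.parse.urljoin, which follows RFC 3986 reference resolution; relying on RFC 3986 is an assumption beyond the relativeURI syntax cited here. Falling back to the Request-URI when the header is absent is also an assumption of the sketch. The function name entity_base_uri and the example URIs are hypothetical.

```python
from typing import Optional
from urllib.parse import urljoin


def entity_base_uri(request_uri: str, content_location: Optional[str] = None) -> str:
    """Return the base URI for an entity.

    A relative Content-Location is resolved against the Request-URI; an
    absolute one is used as-is. Falling back to the Request-URI when no
    Content-Location is present is an assumption of this sketch.
    """
    if content_location is None:
        return request_uri
    return urljoin(request_uri, content_location.strip())


# Relative references inside the entity are then resolved against that base.
base = entity_base_uri("http://example.com/docs/report", "report.en.html")
link = urljoin(base, "images/chart.png")
```

   In this example, base is "http://example.com/docs/report.en.html" and link is "http://example.com/docs/images/chart.png". As the paragraph above notes, a cache must not treat this resolved URI as a replacement for the Request-URI.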

   A cache cannot assume that an entity with a Content-Location different from the URI used to retrieve it can be used to respond to later requests on that Content-Location URI. However, the Content-Location can be used to differentiate between multiple entities retrieved from a single requested resource, as described in Section 7 of [Part6].

   If the Content-Location is a relative URI, the relative URI is interpreted relative to the Request-URI.

   The meaning of the Content-Location header in PUT or POST requests is undefined; servers are free to ignore it in those cases.

5.8. Content-MD5

   The Content-MD5 entity-header field, as defined in [RFC1864], is an MD5 digest of the entity-body for the purpose of providing an end-to-end message integrity check (MIC) of the entity-body. (Note: a MIC is good for detecting accidental modification of the entity-body in transit, but is not proof against malicious attacks.)

     Content-MD5   = "Content-MD5" ":" md5-digest
     md5-digest    = <base64 of 128 bit MD5 digest as per [RFC1864]>

   The Content-MD5 header field MAY be generated by an origin server or client to function as an integrity check of the entity-body. Only origin servers or clients MAY generate the Content-MD5 header field; proxies and gateways MUST NOT generate it, as this would defeat its value as an end-to-end integrity check. Any recipient of the entity-body, including gateways and proxies, MAY check that the digest value in this header field matches that of the entity-body as received.

   The MD5 digest is computed based on the content of the entity-body, including any content-coding that has been applied, but not including any transfer-encoding applied to the message-body. If the message is received with a transfer-encoding, that encoding MUST be removed prior to checking the Content-MD5 value against the received entity.
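
   The following minimal Python sketch generates and checks the field value as described above. It computes the digest over the entity-body, including any content-coding, after removing the transfer-coding. It assumes that "chunked" (defined in [Part1]) is the only transfer-coding in use. For brevity it skips chunk-extensions and ignores any trailer. The function names are hypothetical.

```python
import base64
import hashlib
from typing import Optional


def content_md5(entity_body: bytes) -> str:
    """Return a Content-MD5 value: base64 of the 128-bit MD5 digest."""
    return base64.b64encode(hashlib.md5(entity_body).digest()).decode("ascii")


def decode_chunked(message_body: bytes) -> bytes:
    """Remove the "chunked" transfer-coding from a message-body.

    Simplified: chunk-extensions are skipped and any trailer is ignored.
    """
    entity_body = bytearray()
    pos = 0
    while True:
        line_end = message_body.index(b"\r\n", pos)
        # chunk-size is hexadecimal; anything after ";" is a chunk-extension
        size_field = message_body[pos:line_end].split(b";", 1)[0]
        size = int(size_field.strip(), 16)
        pos = line_end + 2
        if size == 0:
            return bytes(entity_body)  # last-chunk reached
        entity_body += message_body[pos:pos + size]
        pos += size + 2  # skip the chunk-data and the CRLF after it


def check_content_md5(message_body: bytes, header_value: str,
                      transfer_coding: Optional[str] = None) -> bool:
    """Compare Content-MD5 against the entity-body after removing transfer-coding."""
    if transfer_coding is not None and transfer_coding.strip().lower() == "chunked":
        message_body = decode_chunked(message_body)
    # Any content-coding (e.g. gzip) is deliberately left in place.
    return content_md5(message_body) == header_value.strip()
```

   A recipient that ignores the transfer-coding computes the digest over the chunk framing instead of the entity-body, so the check fails. That is why this section requires the transfer-encoding to be removed first.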

   This has the result that the digest is computed on the octets of the entity-body exactly as, and in the order that, they would be sent if no transfer-encoding were being applied.

   HTTP extends RFC 1864 to permit the digest to be computed for MIME composite media-types (e.g., multipart/* and message/rfc822), but this does not change how the digest is computed as defined in the preceding paragraph.

   There are several consequences of this. The entity-body for composite types MAY contain many body-parts, each with its own MIME and HTTP headers (including Content-MD5, Content-Transfer-Encoding, and Content-Encoding headers). If a body-part has a Content-Transfer-Encoding or Content-Encoding header, it is assumed that the content of the body-part has had the encoding applied, and the body-part is included in the Content-MD5 digest as is -- i.e., after the application. The Transfer-Encoding header field is not allowed within body-parts.

   Conversion of all line breaks to CRLF MUST NOT be done before computing or checking the digest: the line break convention used in the text actually transmitted MUST be left unaltered when computing the digest.

      Note: while the definition of Content-MD5 is exactly the same for HTTP as in RFC 1864 for MIME entity-bodies, there are several ways in which the application of Content-MD5 to HTTP entity-bodies differs from its application to MIME entity-bodies. One is that HTTP, unlike MIME, does not use Content-Transfer-Encoding, and does use Transfer-Encoding and Content-Encoding. Another is that HTTP more frequently uses binary content types than MIME, so it is worth noting that, in such cases, the byte order used to compute the digest is the transmission byte order defined for the type.
      Lastly, HTTP allows transmission of text types with any of several line break conventions and not just the canonical form using CRLF.

5.9. Content-Type

   The Content-Type entity-header field indicates the media type of the entity-body sent to the recipient or, in the case of the HEAD method, the media type that would have been sent had the request been a GET.

     Content-Type   = "Content-Type" ":" media-type

   Media types are defined in Section 2.3. An example of the field is

       Content-Type: text/html; charset=ISO-8859-4

   Further discussion of methods for identifying the media type of an entity is provided in Section 3.2.1.

6. IANA Considerations

   TBD.

7. Security Considerations

   This section is meant to inform application developers, information providers, and users of the security limitations in HTTP/1.1 as described by this document. The discussion does not include definitive solutions to the problems revealed, though it does make some suggestions for reducing security risks.

7.1. Privacy Issues Connected to Accept Headers

   Accept request-headers can reveal information about the user to all servers which are accessed. The Accept-Language header in particular can reveal information the user would consider to be of a private nature, because the understanding of particular languages is often strongly correlated to the membership of a particular ethnic group. User agents which offer the option to configure the contents of an Accept-Language header to be sent in every request are strongly encouraged to let the configuration process include a message which makes the user aware of the loss of privacy involved.

   An approach that limits the loss of privacy would be for a user agent to omit the sending of Accept-Language headers by default, and to ask the user whether or not to start sending Accept-Language headers to a server if it detects, by looking for any Vary response-header fields generated by the server, that such sending could improve the quality of service.

   Elaborate user-customized accept header fields sent in every request, in particular if these include quality values, can be used by servers as relatively reliable and long-lived user identifiers. Such user identifiers would allow content providers to do click-trail tracking, and would allow collaborating content providers to match cross-server click-trails or form submissions of individual users. Note that for many users not behind a proxy, the network address of the host running the user agent will also serve as a long-lived user identifier. In environments where proxies are used to enhance privacy, user agents ought to be conservative in offering accept header configuration options to end users. As an extreme privacy measure, proxies could filter the accept headers in relayed requests. General purpose user agents which provide a high degree of header configurability SHOULD warn users about the loss of privacy which can be involved.

7.2. Content-Disposition Issues

   [RFC1806], from which the often implemented Content-Disposition (see Appendix B.1) header in HTTP is derived, has a number of very serious security considerations. Content-Disposition is not part of the HTTP standard, but since it is widely implemented, we are documenting its use and risks for implementors. See [RFC2183] (which updates [RFC1806]) for details.

8. Acknowledgments

9. References

9.1. Normative References

   [ISO-8859-1]  International Organization for Standardization, "Information technology -- 8-bit single-byte coded graphic character sets -- Part 1: Latin alphabet No. 1", ISO/IEC 8859-1:1998, 1998.

   [Part1]  Fielding, R., Ed., Gettys, J., Mogul, J., Frystyk, H., Masinter, L., Leach, P., Berners-Lee, T., Lafon, Y., Ed., and J. Reschke, Ed., "HTTP/1.1, part 1: URIs, Connections, and Message Parsing", draft-ietf-httpbis-p1-messaging-01 (work in progress), January 2008.

   [Part2]  Fielding, R., Ed., Gettys, J., Mogul, J., Frystyk, H., Masinter, L., Leach, P., Berners-Lee, T., Lafon, Y., Ed., and J. Reschke, Ed., "HTTP/1.1, part 2: Message Semantics", draft-ietf-httpbis-p2-semantics-01 (work in progress), January 2008.

   [Part4]  Fielding, R., Ed., Gettys, J., Mogul, J., Frystyk, H., Masinter, L., Leach, P., Berners-Lee, T., Lafon, Y., Ed., and J. Reschke, Ed., "HTTP/1.1, part 4: Conditional Requests", draft-ietf-httpbis-p4-conditional-01 (work in progress), January 2008.

   [Part5]  Fielding, R., Ed., Gettys, J., Mogul, J., Frystyk, H., Masinter, L., Leach, P., Berners-Lee, T., Lafon, Y., Ed., and J. Reschke, Ed., "HTTP/1.1, part 5: Range Requests and Partial Responses", draft-ietf-httpbis-p5-range-01 (work in progress), January 2008.

   [Part6]  Fielding, R., Ed., Gettys, J., Mogul, J., Frystyk, H., Masinter, L., Leach, P., Berners-Lee, T., Lafon, Y., Ed., and J. Reschke, Ed., "HTTP/1.1, part 6: Caching", draft-ietf-httpbis-p6-cache-01 (work in progress), January 2008.

   [RFC1766]  Alvestrand, H., "Tags for the Identification of Languages", RFC 1766, March 1995.

   [RFC1864]  Myers, J. and M. Rose, "The Content-MD5 Header Field", RFC 1864, October 1995.

   [RFC1950]  Deutsch, L. and J-L. Gailly, "ZLIB Compressed Data Format Specification version 3.3", RFC 1950, May 1996.

              RFC1950 is an Informational RFC, thus it may be less stable than this specification. On the other hand, this downward reference has been present since [RFC2068] (published in 1997); therefore it is unlikely to cause problems in practice.

   [RFC1951]  Deutsch, P., "DEFLATE Compressed Data Format Specification version 1.3", RFC 1951, May 1996.

              RFC1951 is an Informational RFC, thus it may be less stable than this specification. On the other hand, this downward reference has been present since [RFC2068] (published in 1997); therefore it is unlikely to cause problems in practice.

   [RFC1952]  Deutsch, P., Gailly, J-L., Adler, M., Deutsch, L., and G. Randers-Pehrson, "GZIP file format specification version 4.3", RFC 1952, May 1996.

              RFC1952 is an Informational RFC, thus it may be less stable than this specification. On the other hand, this downward reference has been present since [RFC2068] (published in 1997); therefore it is unlikely to cause problems in practice.

   [RFC2045]  Freed, N. and N. Borenstein, "Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies", RFC 2045, November 1996.

   [RFC2046]  Freed, N. and N. Borenstein, "Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types", RFC 2046, November 1996.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC4288]  Freed, N. and J. Klensin, "Media Type Specifications and Registration Procedures", BCP 13, RFC 4288, December 2005.

9.2. Informative References

   [RFC1806]  Troost, R. and S. Dorner, "Communicating Presentation Information in Internet Messages: The Content-Disposition Header", RFC 1806, June 1995.

   [RFC1945]  Berners-Lee, T., Fielding, R., and H. Nielsen, "Hypertext Transfer Protocol -- HTTP/1.0", RFC 1945, May 1996.

   [RFC2049]  Freed, N. and N.
              Borenstein, "Multipurpose Internet Mail Extensions (MIME) Part Five: Conformance Criteria and Examples", RFC 2049, November 1996.

   [RFC2068]  Fielding, R., Gettys, J., Mogul, J., Nielsen, H., and T. Berners-Lee, "Hypertext Transfer Protocol -- HTTP/1.1", RFC 2068, January 1997.

   [RFC2076]  Palme, J., "Common Internet Message Headers", RFC 2076, February 1997.

   [RFC2183]  Troost, R., Dorner, S., and K. Moore, "Communicating Presentation Information in Internet Messages: The Content-Disposition Header Field", RFC 2183, August 1997.

   [RFC2277]  Alvestrand, H., "IETF Policy on Character Sets and Languages", BCP 18, RFC 2277, January 1998.

   [RFC2388]  Masinter, L., "Returning Values from Forms: multipart/form-data", RFC 2388, August 1998.

   [RFC2557]  Palme, F., Hopmann, A., Shelness, N., and E. Stefferud, "MIME Encapsulation of Aggregate Documents, such as HTML (MHTML)", RFC 2557, March 1999.

   [RFC2616]  Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.

   [RFC2822]  Resnick, P., "Internet Message Format", RFC 2822, April 2001.

   [RFC3629]  Yergeau, F., "UTF-8, a transformation format of ISO 10646", RFC 3629, STD 63, November 2003.

Appendix A. Differences Between HTTP Entities and RFC 2045 Entities

   HTTP/1.1 uses many of the constructs defined for Internet Mail ([RFC2822]) and the Multipurpose Internet Mail Extensions (MIME [RFC2045]) to allow entities to be transmitted in an open variety of representations and with extensible mechanisms. However, RFC 2045 discusses mail, and HTTP has a few features that are different from those described in RFC 2045.
   These differences were carefully chosen to optimize performance over binary connections, to allow greater freedom in the use of new media types, to make date comparisons easier, and to acknowledge the practice of some early HTTP servers and clients.

   This appendix describes specific areas where HTTP differs from RFC 2045. Proxies and gateways to strict MIME environments SHOULD be aware of these differences and provide the appropriate conversions where necessary. Proxies and gateways from MIME environments to HTTP also need to be aware of the differences because some conversions might be required.

A.1. MIME-Version

   HTTP is not a MIME-compliant protocol. However, HTTP/1.1 messages MAY include a single MIME-Version general-header field to indicate what version of the MIME protocol was used to construct the message. Use of the MIME-Version header field indicates that the message is in full compliance with the MIME protocol (as defined in [RFC2045]). Proxies/gateways are responsible for ensuring full compliance (where possible) when exporting HTTP messages to strict MIME environments.

     MIME-Version   = "MIME-Version" ":" 1*DIGIT "." 1*DIGIT

   MIME version "1.0" is the default for use in HTTP/1.1. However, HTTP/1.1 message parsing and semantics are defined by this document and not the MIME specification.

A.2. Conversion to Canonical Form

   [RFC2045] requires that an Internet mail entity be converted to canonical form prior to being transferred, as described in Section 4 of [RFC2049]. Section 2.3.1 of this document describes the forms allowed for subtypes of the "text" media type when transmitted over HTTP. [RFC2046] requires that content with a type of "text" represent line breaks as CRLF and forbids the use of CR or LF outside of line break sequences.
   HTTP allows CRLF, bare CR, and bare LF to indicate a line break
   within text content when a message is transmitted over HTTP.

   Where possible, a proxy or gateway from HTTP to a strict MIME
   environment SHOULD translate all line breaks within the text media
   types described in Section 2.3.1 of this document to the RFC 2049
   canonical form of CRLF.  Note, however, that this might be
   complicated by the presence of a Content-Encoding and by the fact
   that HTTP allows the use of some character sets which do not use
   octets 13 and 10 to represent CR and LF, as is the case for some
   multi-byte character sets.

   Implementors should note that conversion will break any cryptographic
   checksums applied to the original content unless the original content
   is already in canonical form.  Therefore, the canonical form is
   recommended for any content that uses such checksums in HTTP.

A.3.  Introduction of Content-Encoding

   RFC 2045 does not include any concept equivalent to HTTP/1.1's
   Content-Encoding header field.  Since this acts as a modifier on the
   media type, proxies and gateways from HTTP to MIME-compliant
   protocols MUST either change the value of the Content-Type header
   field or decode the entity-body before forwarding the message.  (Some
   experimental applications of Content-Type for Internet mail have used
   a media-type parameter of ";conversions=" to perform a function
   equivalent to Content-Encoding.  However, this parameter is not part
   of RFC 2045.)

A.4.  No Content-Transfer-Encoding

   HTTP does not use the Content-Transfer-Encoding field of RFC 2045.
   Proxies and gateways from MIME-compliant protocols to HTTP MUST
   remove any Content-Transfer-Encoding prior to delivering the response
   message to an HTTP client.
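   The following is a non-normative Python sketch of the
   Content-Transfer-Encoding removal required above for a gateway from a
   MIME-compliant protocol to HTTP.  It assumes, for illustration only,
   that the header fields are held in a plain dict with one value per
   field name and that only the RFC 2045 encodings "7bit", "8bit",
   "binary", "quoted-printable", and "base64" occur; the function name
   strip_content_transfer_encoding is hypothetical.

```python
import base64
import quopri


def strip_content_transfer_encoding(
    headers: dict[str, str], body: bytes
) -> tuple[dict[str, str], bytes]:
    """Decode a MIME body and drop its Content-Transfer-Encoding field.

    `headers` is a simplified, hypothetical representation: one value per
    field name.  Field names are compared case-insensitively.
    """
    cte_name = next(
        (name for name in headers if name.lower() == "content-transfer-encoding"),
        None,
    )
    if cte_name is None:
        return dict(headers), body  # nothing to remove

    encoding = headers[cte_name].strip().lower()
    if encoding == "base64":
        # By default, octets outside the base64 alphabet (e.g. CRLF) are discarded.
        decoded = base64.b64decode(body)
    elif encoding == "quoted-printable":
        decoded = quopri.decodestring(body)
    elif encoding in ("7bit", "8bit", "binary"):
        decoded = body  # identity encodings: the octets are already the content
    else:
        raise ValueError(f"unsupported Content-Transfer-Encoding: {encoding}")

    # Keep every other field; the Content-Transfer-Encoding field is removed.
    # (A real gateway would presumably also need to recompute Content-Length.)
    remaining = {name: value for name, value in headers.items() if name != cte_name}
    return remaining, decoded
```

   For example, a body of b"aGVsbG8=\r\n" labeled "base64" comes back
   as b"hello", with the Content-Transfer-Encoding field removed and the
   other fields left unchanged.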
   Proxies and gateways from HTTP to MIME-compliant protocols are
   responsible for ensuring that the message is in the correct format
   and encoding for safe transport on that protocol, where "safe
   transport" is defined by the limitations of the protocol being used.
   Such a proxy or gateway SHOULD label the data with an appropriate
   Content-Transfer-Encoding if doing so will improve the likelihood of
   safe transport over the destination protocol.

A.5.  Introduction of Transfer-Encoding

   HTTP/1.1 introduces the Transfer-Encoding header field (Section 8.7
   of [Part1]).  Proxies/gateways MUST remove any transfer-coding prior
   to forwarding a message via a MIME-compliant protocol.

A.6.  MHTML and Line Length Limitations

   HTTP implementations which share code with MHTML [RFC2557]
   implementations need to be aware of MIME line length limitations.
   Since HTTP does not have this limitation, HTTP does not fold long
   lines.  MHTML messages being transported by HTTP follow all
   conventions of MHTML, including line length limitations and folding,
   canonicalization, etc., since HTTP transports all message-bodies as
   payload (see Section 2.3.2) and does not interpret the content or any
   MIME header lines that might be contained therein.

Appendix B.  Additional Features

   [RFC1945] and [RFC2068] document protocol elements used by some
   existing HTTP implementations, but not consistently and correctly
   across most HTTP/1.1 applications.  Implementors are advised to be
   aware of these features, but cannot rely upon their presence in, or
   interoperability with, other HTTP/1.1 applications.  Some of these
   describe proposed experimental features, and some describe features
   that experimental deployment found lacking that are now addressed in
   the base HTTP/1.1 specification.
   A number of other headers, such as Content-Disposition and Title,
   from SMTP and MIME are also often implemented (see [RFC2076]).

B.1.  Content-Disposition

   The Content-Disposition response-header field has been proposed as a
   means for the origin server to suggest a default filename if the user
   requests that the content be saved to a file.  This usage is derived
   from the definition of Content-Disposition in [RFC1806].

     content-disposition  = "Content-Disposition" ":"
                            disposition-type *( ";" disposition-parm )
     disposition-type     = "attachment" | disp-extension-token
     disposition-parm     = filename-parm | disp-extension-parm
     filename-parm        = "filename" "=" quoted-string
     disp-extension-token = token
     disp-extension-parm  = token "=" ( token | quoted-string )

   An example is

     Content-Disposition: attachment; filename="fname.ext"

   The receiving user agent SHOULD NOT respect any directory path
   information present in the filename-parm parameter, which is the only
   parameter believed to apply to HTTP implementations at this time.
   The filename SHOULD be treated as a terminal component only.

   If this header is used in a response with the
   application/octet-stream content-type, the implied suggestion is that
   the user agent should not display the response, but directly enter a
   `save response as...' dialog.

   See Section 7.2 for Content-Disposition security issues.

Appendix C.  Compatibility with Previous Versions

C.1.  Changes from RFC 2068

   Transfer-coding and message lengths interact in ways that required
   fixing exactly when chunked encoding is used (to allow for transfer
   encodings that may not be self-delimiting); it was important to
   straighten out exactly how message lengths are computed (Section
   3.2.2; see also [Part1], [Part5], and [Part6]).
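   To make the interaction between transfer-coding and message length
   concrete: with the chunked transfer-coding, the body length is not
   announced in advance but is recovered from the chunk-size lines,
   ending at a zero-size last chunk.  The following non-normative Python
   sketch decodes a complete chunked body held in memory.  It assumes
   the chunk syntax defined in [Part1] (hexadecimal chunk-size, optional
   chunk extensions, CRLF delimiters, optional trailer); the function
   name decode_chunked is hypothetical.  Removing the coding in this way
   is also what a proxy forwarding to a MIME-compliant protocol has to
   do (Appendix A.5).

```python
def decode_chunked(data: bytes) -> bytes:
    """Remove the chunked transfer-coding from a complete message body.

    Minimal sketch: the whole chunked body must already be in memory,
    chunk extensions are ignored, and trailer fields are discarded.
    bytes.index raises ValueError if a required CRLF is missing.
    """
    body = bytearray()
    pos = 0
    while True:
        # Each chunk begins with a hexadecimal size line terminated by CRLF.
        line_end = data.index(b"\r\n", pos)
        size_field = data[pos:line_end].split(b";", 1)[0]  # drop chunk extensions
        size = int(size_field.strip(), 16)
        pos = line_end + 2
        if size == 0:
            break  # last-chunk: no more chunk data follows
        chunk = data[pos:pos + size]
        if len(chunk) != size:
            raise ValueError("truncated chunk data")
        body += chunk
        pos += size
        if data[pos:pos + 2] != b"\r\n":
            raise ValueError("chunk data not followed by CRLF")
        pos += 2
    # Skip any trailer fields up to the terminating empty line.
    while True:
        line_end = data.index(b"\r\n", pos)
        if line_end == pos:
            break
        pos = line_end + 2
    return bytes(body)
```

   For example, decode_chunked(b"5\r\nhello\r\n6\r\n world\r\n0\r\n\r\n")
   returns b"hello world": the message length (11 octets) is only known
   once the last chunk has been read.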
   Charset wildcarding is introduced to avoid an explosion of character
   set names in accept headers.  (Section 5.2)

   Content-Base was deleted from the specification: it was not
   implemented widely, and there is no simple, safe way to introduce it
   without a robust extension mechanism.  In addition, it is used in a
   similar, but not identical, fashion in MHTML [RFC2557].

   A content-coding of "identity" was introduced to solve problems
   discovered in caching.  (Section 2.2)

   Quality Values of zero should indicate that "I don't want something"
   to allow clients to refuse a representation.  (Section 2.4)

   The Alternates, Content-Version, Derived-From, Link, URI, Public and
   Content-Base header fields were defined in previous versions of this
   specification, but not commonly implemented.  See [RFC2068].

C.2.  Changes from RFC 2616

   Clarify the contexts in which charset is used.  (Section 2.1)

   Remove reference to non-existent identity transfer-coding value
   tokens.  (Appendix A.4)

Appendix D.  Change Log (to be removed by RFC Editor before publication)

D.1.  Since RFC2616

   Extracted relevant partitions from [RFC2616].

D.2.  Since draft-ietf-httpbis-p3-payload-00

   Closed issues:

   o  : "Media Type Registrations" ()

   o  : "Clarification regarding quoting of charset values" ()

   o  : "Remove 'identity' token references" ()

   o  : "Accept-Encoding BNF"

   o  : "Normative and Informative references"

   o  : "RFC1700 references"

   o  : "Informative references"

   o  : "ISO-8859-1 Reference"

   o  : "Encoding References Normative"

   o  : "Normative up-to-date references"

Index

   A
      Accept header  16
      Accept-Charset header  18
      Accept-Encoding header  18
      Accept-Language header  20
      Alternates header  33

   C
      compress  7
      Content-Base header  33
      Content-Disposition header  31
      Content-Encoding header  21
      Content-Language header  22
      Content-Location header  22
      Content-MD5 header  23
      Content-Type header  24
      Content-Version header  33

   D
      deflate  7
      Derived-From header  33

   G
      Grammar
         Accept  16
         Accept-Charset  18
         Accept-Encoding  18
         accept-extension  16
         Accept-Language  20
         accept-params  16
         attribute  8
         charset  6
         codings  18
         content-coding  7
         content-disposition  32
         Content-Encoding  21
         Content-Language  22
         Content-Location  23
         Content-MD5  23
         Content-Type  24
         disp-extension-parm  32
         disp-extension-token  32
         disposition-parm  32
         disposition-type  32
         entity-body  12
         entity-header  11
         extension-header  11
         filename-parm  32
         language-range  20
         language-tag  11
         md5-digest  23
         media-range  16
         media-type  8
         MIME-Version  29
         parameter  8
         primary-tag  11
         qvalue  10
         subtag  11
         subtype  8
         type  8
         value  8
      gzip  7

   H
      Headers
         Accept  16
         Accept-Charset  18
         Accept-Encoding  18
         Accept-Language  20
         Alternates  33
         Content-Base  33
         Content-Disposition  31
         Content-Encoding  21
         Content-Language  22
         Content-Location  22
         Content-MD5  23
         Content-Type  24
         Content-Version  33
         Derived-From  33
         Link  33
         Public  33
         URI  33

   I
      identity  8

   L
      Link header  33

   P
      Public header  33

   U
      URI header  33

Authors' Addresses

   Roy T. Fielding (editor)
   Day Software
   23 Corporate Plaza DR, Suite 280
   Newport Beach, CA 92660
   USA

   Phone: +1-949-706-5300
   Fax:   +1-949-706-5305
   Email: fielding@gbiv.com
   URI:   http://roy.gbiv.com/


   Jim Gettys
   One Laptop per Child
   21 Oak Knoll Road
   Carlisle, MA 01741
   USA

   Email: jg@laptop.org
   URI:   http://www.laptop.org/


   Jeffrey C. Mogul
   Hewlett-Packard Company
   HP Labs, Large Scale Systems Group
   1501 Page Mill Road, MS 1177
   Palo Alto, CA 94304
   USA

   Email: JeffMogul@acm.org


   Henrik Frystyk Nielsen
   Microsoft Corporation
   1 Microsoft Way
   Redmond, WA 98052
   USA

   Email: henrikn@microsoft.com


   Larry Masinter
   Adobe Systems, Incorporated
   345 Park Ave
   San Jose, CA 95110
   USA

   Email: LMM@acm.org
   URI:   http://larry.masinter.net/


   Paul J. Leach
   Microsoft Corporation
   1 Microsoft Way
   Redmond, WA 98052

   Email: paulle@microsoft.com


   Tim Berners-Lee
   World Wide Web Consortium
   MIT Computer Science and Artificial Intelligence Laboratory
   The Stata Center, Building 32
   32 Vassar Street
   Cambridge, MA 02139
   USA

   Email: timbl@w3.org
   URI:   http://www.w3.org/People/Berners-Lee/


   Yves Lafon (editor)
   World Wide Web Consortium
   W3C / ERCIM
   2004, rte des Lucioles
   Sophia-Antipolis, AM 06902
   France

   Email: ylafon@w3.org
   URI:   http://www.raubacapeu.net/people/yves/


   Julian F. Reschke (editor)
   greenbytes GmbH
   Hafenweg 16
   Muenster, NW 48155
   Germany

   Phone: +49 251 2807760
   Fax:   +49 251 2807761
   Email: julian.reschke@greenbytes.de
   URI:   http://greenbytes.de/tech/webdav/

Full Copyright Statement

   Copyright (C) The IETF Trust (2008).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Intellectual Property

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.
   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.

Acknowledgment

   Funding for the RFC Editor function is provided by the IETF
   Administrative Support Activity (IASA).