idnits 2.17.1

draft-ietf-websec-mime-sniff-03.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** You're using the IETF Trust Provisions' Section 6.b License Notice from
     12 Sep 2009 rather than the newer Notice from 28 Dec 2009.  (See
     https://trustee.ietf.org/license-info/)

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack a Security Considerations section.

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (May 7, 2011) is 4737 days in the past.  Is this
     intentional?

  -- Found something which looks like a code comment -- if you have code
     sections in the document, please surround them with '<CODE BEGINS>' and
     '<CODE ENDS>' lines.
  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Looks like a reference, but probably isn't: '0' on line 656

  -- Looks like a reference, but probably isn't: '1' on line 656

  -- Looks like a reference, but probably isn't: '2' on line 656

  ** Obsolete normative reference: RFC 2616 (Obsoleted by RFC 7230, RFC 7231,
     RFC 7232, RFC 7233, RFC 7234, RFC 7235)

     Summary: 4 errors (**), 0 flaws (~~), 2 warnings (==), 5 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

2   None                                                            A. Barth
3   Internet-Draft                                                I. Hickson
4   Expires: November 8, 2011                                   Google, Inc.
5                                                                May 7, 2011

7                             Media Type Sniffing
8                       draft-ietf-websec-mime-sniff-03

10  Abstract

12     Many web servers supply incorrect Content-Type header fields with
13     their HTTP responses.  In order to be compatible with these servers,
14     user agents consider the content of HTTP responses as well as the
15     Content-Type header fields when determining the effective media type
16     of the response.  This document describes an algorithm for
17     determining the effective media type of HTTP responses that balances
18     security and compatibility considerations.

20     Please send feedback on this draft to websec@ietf.org.

22  Status of this Memo

24     This Internet-Draft is submitted to IETF in full conformance with the
25     provisions of BCP 78 and BCP 79.

27     Internet-Drafts are working documents of the Internet Engineering
28     Task Force (IETF), its areas, and its working groups.  Note that
29     other groups may also distribute working documents as Internet-
30     Drafts.

32     Internet-Drafts are draft documents valid for a maximum of six months
33     and may be updated, replaced, or obsoleted by other documents at any
34     time.  It is inappropriate to use Internet-Drafts as reference
35     material or to cite them other than as "work in progress."

37     The list of current Internet-Drafts can be accessed at
38     http://www.ietf.org/ietf/1id-abstracts.txt.

40     The list of Internet-Draft Shadow Directories can be accessed at
41     http://www.ietf.org/shadow.html.

43     This Internet-Draft will expire on November 8, 2011.

45  Copyright Notice

47     Copyright (c) 2011 IETF Trust and the persons identified as the
48     document authors.  All rights reserved.

50     This document is subject to BCP 78 and the IETF Trust's Legal
51     Provisions Relating to IETF Documents
52     (http://trustee.ietf.org/license-info) in effect on the date of
53     publication of this document.  Please review these documents
54     carefully, as they describe your rights and restrictions with respect
55     to this document.  Code Components extracted from this document must
56     include Simplified BSD License text as described in Section 4.e of
57     the Trust Legal Provisions and are provided without warranty as
58     described in the BSD License.

60  Table of Contents

62     1.  Introduction
63     2.  Conventions
64     3.  Metadata
65     4.  Web Pages
66     5.  Text or Binary
67     6.  Unknown Type
68       6.1.  Signature for MP4
69     7.  Image
70     8.  Video
71     9.  Fonts
72     10. Feed or HTML
73     11. References
74       11.1.  Normative References
75       11.2.  Informative References
76     Authors' Addresses

78  1.  Introduction

80     The HTTP Content-Type header field indicates the media type of an
81     HTTP response.  However, many HTTP servers supply a Content-Type that
82     does not match the actual contents of the response.  Historically,
83     web browsers have tolerated these servers by examining the content of
84     HTTP responses in addition to the Content-Type header field to
85     determine the effective media type of the response.

87     Without a clear specification of how to "sniff" the media type, each
88     user agent implementor was forced to reverse engineer the behavior of
89     the other user agents and to develop their own algorithm.  These
90     divergent algorithms have led to a lack of interoperability between
91     user agents and to security issues when the server intends an HTTP
92     response to be interpreted as one media type but some user agents
93     interpret the response as another media type.

95     These security issues are most severe when an "honest" server lets
96     potentially malicious users upload files and then serves the contents
97     of those files with a low-privilege media type (such as text/plain or
98     image/jpeg).  (Malicious servers, of course, can specify an arbitrary
99     media type in the Content-Type header field.)  In the absence of
100    media type sniffing, this user-generated content would not be
101    interpreted as a high-privilege media type, such as text/html.
102    However, if a user agent does interpret a low-privilege media type,
103    such as image/gif, as a high-privilege media type, such as text/html,
104    the user agent has created a privilege escalation vulnerability in
105    the server.  For example, a malicious user might be able to leverage
106    content sniffing to mount a cross-site scripting attack by including
107    JavaScript code in the uploaded file that a user agent treats as
108    text/html.
110    This document describes a content sniffing algorithm that carefully
111    balances the compatibility needs of user agent implementors with the
112    security constraints.  The algorithm has been constructed with
113    reference to content sniffing algorithms present in popular user
114    agents, an extensive database of existing web content, and metrics
115    collected from implementations deployed to a sizable number of users
116    [BarthCaballeroSong2009].

118    WARNING!  Whenever possible, user agents SHOULD NOT employ a content
119    sniffing algorithm.  However, if a user agent does employ a content
120    sniffing algorithm, the user agent SHOULD use the algorithm in this
121    document because using a different content sniffing algorithm than
122    servers expect causes security problems.  For example, if a server
123    believes that the client will treat a contributed file as an image
124    (and thus treat it as benign), but a user agent believes the content
125    to be HTML (and thus privileged to execute any scripts contained
126    therein), an attacker might be able to steal the user's
127    authentication credentials and mount other cross-site scripting
128    attacks.

130    Conformance requirements phrased as algorithms or specific steps MAY
131    be implemented in any manner, so long as the end result is
132    equivalent.  (In particular, the algorithms defined in this
133    specification are intended to be easy to follow, and not intended to
134    be performant.)

136 2.  Conventions

138    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
139    "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
140    document are to be interpreted as described in [RFC2119].

142    Requirements phrased in the imperative as part of algorithms (such as
143    "strip any leading space characters" or "return false and abort these
144    steps") are to be interpreted with the meaning of the key word
145    ("MUST", "SHOULD", "MAY", etc) used in introducing the algorithm.
147    Conformance requirements phrased as algorithms or specific steps can
148    be implemented in any manner, so long as the end result is
149    equivalent.  In particular, the algorithms defined in this
150    specification are intended to be easy to understand and are not
151    intended to be performant.

153 3.  Metadata

155    The explicit media type metadata associated with a sequence of octets
156    depends on the protocol that was used to fetch the octets.

158    For octets received via HTTP, the Content-Type HTTP header field, if
159    present, indicates the media type.  Let the official-type be the
160    media type indicated by the HTTP Content-Type header field, if
161    present.  If the Content-Type header field is absent or if its value
162    cannot be interpreted as a media type (e.g. because its value doesn't
163    contain a U+002F SOLIDUS ('/') character), then there is no official-
164    type.  (Such messages are invalid according to [RFC2616].)

166       Note: If an HTTP response contains multiple Content-Type header
167       fields, the user agent MUST use the textually last Content-Type
168       header field to determine the official-type.  For example, if the
169       last Content-Type header field contains the value "foo", then
170       there is no official-type because "foo" cannot be interpreted as a
171       media type (even if the HTTP response contains another Content-
172       Type header field that could be interpreted as a media type).

174    For octets fetched from the file system, user agents should use
175    platform-specific conventions (e.g., operating system file extension/
176    type mappings) to determine the official-type.

178       Note: It is essential that file extensions are not used for
179       determining the media type for octets fetched over HTTP because,
180       in some cases, file extensions can be supplied by malicious
181       parties.  For example, most PHP installations let the attacker
182       append arbitrary path information to URLs (e.g.,
183       http://example.com/foo.php/bar.html) and thereby determine the
184       file extension.

186    For octets fetched over some other protocols, e.g. FTP [RFC0959],
187    there is no type information.

189       Note: Comparisons between media types, as defined by MIME
190       specifications, are done in an ASCII case-insensitive manner.
191       [RFC2046]

193 4.  Web Pages

195    The user agent MUST use the following algorithm to determine the
196    sniffed-type of a sequence of octets:

198    1.  If the user agent is configured to strictly obey the official-
199        type, then let the sniffed-type be the official-type and abort
200        these steps.

202    2.  If the octets were fetched via HTTP and there is an HTTP Content-
203        Type header field and the value of the last such header field has
204        octets that *exactly* match the octets contained in one of the
205        following lines:

207        +-------------------------------+--------------------------------+
208        | Bytes in Hexadecimal          | Textual Representation         |
209        +-------------------------------+--------------------------------+
210        | 74 65 78 74 2f 70 6c 61 69 6e | text/plain                     |
211        +-------------------------------+--------------------------------+
212        | 74 65 78 74 2f 70 6c 61 69 6e | text/plain; charset=ISO-8859-1 |
213        | 3b 20 63 68 61 72 73 65 74 3d |                                |
214        | 49 53 4f 2d 38 38 35 39 2d 31 |                                |
215        +-------------------------------+--------------------------------+
216        | 74 65 78 74 2f 70 6c 61 69 6e | text/plain; charset=iso-8859-1 |
217        | 3b 20 63 68 61 72 73 65 74 3d |                                |
218        | 69 73 6f 2d 38 38 35 39 2d 31 |                                |
219        +-------------------------------+--------------------------------+
220        | 74 65 78 74 2f 70 6c 61 69 6e | text/plain; charset=UTF-8      |
221        | 3b 20 63 68 61 72 73 65 74 3d |                                |
222        | 55 54 46 2d 38                |                                |
223        +-------------------------------+--------------------------------+

225        ...then jump to the "text or binary" section below.

227    3.  If there is no official-type, jump to the "unknown type" section
228        below.

230    4.  If the official-type is "unknown/unknown", "application/unknown",
231        or "*/*", jump to the "unknown type" section below.

233    5.  If the official-type ends in "+xml", or if it is either "text/
234        xml" or "application/xml", then let the sniffed-type be the
235        official-type and abort these steps.

237    6.  If the official-type is an image type supported by the user agent
238        (e.g., "image/png", "image/gif", "image/jpeg", etc), then jump to
239        the "image" section below.

241    7.  If the official-type is "text/html", then jump to the "feed or
242        HTML" section below.

244    8.  Let the sniffed-type be the official-type.

246 5.  Text or Binary

248    This section defines the *rules for distinguishing if a resource is
249    text or binary*.

251    1.  The user agent MAY wait for 512 or more octets to arrive.

253           Note: Waiting for 512 octets to arrive causes the text-
254           or-binary algorithm to be deterministic for a given sequence
255           of octets.  However, in some cases, the user agent might need
256           to wait an arbitrary length of time for these octets to
257           arrive.  User agents SHOULD wait for 512 octets to arrive,
258           when feasible.

260    2.  Let n be the smaller of either 512 or the number of octets that
261        have already arrived.

263    3.  If n is greater than or equal to 3, and the first 2 or 3 octets
264        match one of the following octet sequences:

266        +----------------------+--------------+
267        | Bytes in Hexadecimal | Description  |
268        +----------------------+--------------+
269        | FE FF                | UTF-16BE BOM |
270        | FF FE                | UTF-16LE BOM |
271        | EF BB BF             | UTF-8 BOM    |
272        +----------------------+--------------+

274        ...then let the sniffed-type be "text/plain" and abort these
275        steps.

277    4.  If none of the first n octets are binary data octets then let the
278        sniffed-type be "text/plain" and abort these steps.
280        +-------------------------+
281        | Binary Data Byte Ranges |
282        +-------------------------+
283        | 0x00 -- 0x08            |
284        | 0x0B                    |
285        | 0x0E -- 0x1A            |
286        | 0x1C -- 0x1F            |
287        +-------------------------+

289    5.  If the first octets match one of the octet sequences in the
290        "pattern" column of the table in the "unknown type" section
291        below, ignoring any rows whose cell in the "security" column says
292        "scriptable" (or "n/a"), then let the sniffed-type be the type
293        given in the corresponding cell in the "sniffed type" column on
294        that row and abort these steps.

296           WARNING!  It is critical that this step not ever return a
297           scriptable type (e.g., text/html), because otherwise that
298           would allow a privilege escalation attack.

300    6.  Otherwise, let the sniffed-type be "application/octet-stream" and
301        abort these steps.

303 6.  Unknown Type

305    1.  The user agent MAY wait for 512 or more octets to arrive for the
306        same reason as in the "text or binary" section above.

308    2.  Let n be the smaller of either 512 or the number of octets that
309        have already arrived.

311    3.  For each row in the table below:

313        *  If the row has no "WS" or "_>" octets:

315           1.  Let pattern-length be the length of the pattern.

317           2.  If n is smaller than pattern-length then skip this row.

319           3.  Apply the bit-wise "and" operator to the first pattern-
320               length octets and the given mask, and let the result be
321               the masked-data.

323           4.  If the octets of the masked-data match the given pattern
324               octets exactly, then let the sniffed-type be the type
325               given in the cell of the third column in that row and
326               abort these steps.

328        *  If the row has a "WS" octet or a "_>" octet:

330           1.  Let index-pattern be an index into the mask and pattern
331               octet strings of the row.

333           2.  Let index-stream be an index into the octet stream being
334               examined.

336           3.  LOOP: If index-stream points beyond the end of the octet
337               stream, then this row doesn't match; skip this row.

339           4.  Examine the index-stream-th octet of the octet stream as
340               follows:

342               -  If the index-pattern-th octet of the pattern is a
343                  normal hexadecimal octet and not a "WS" octet or a "_>"
344                  octet:

346                     If the bit-wise "and" operator, applied to the
347                     index-stream-th octet of the stream and the index-
348                     pattern-th octet of the mask, yields a value
349                     different from the index-pattern-th octet of the
350                     pattern, then skip this row.

352                     Otherwise, increment index-pattern to the next octet
353                     in the mask and pattern and index-stream to the next
354                     octet in the octet stream.

356               -  Otherwise, if the index-pattern-th octet of the pattern
357                  is a "WS" octet:

359                     "WS" means "whitespace", and allows insignificant
360                     whitespace to be skipped when sniffing for a type
361                     signature.

363                     If the index-stream-th octet of the stream is one of
364                     0x09 (ASCII TAB), 0x0A (ASCII LF), 0x0C (ASCII FF),
365                     0x0D (ASCII CR), or 0x20 (ASCII space), then
366                     increment only the index-stream to the next octet in
367                     the octet stream.

369                     Otherwise, increment only the index-pattern to the
370                     next octet in the mask and pattern.

372               -  Otherwise, if the index-pattern-th octet of the pattern
373                  is a "_>" octet:

375                     "_>" means "space-or-bracket", and allows HTML tag
376                     names to terminate with either a space or a greater
377                     than sign.

379                     If the index-stream-th octet of the stream is neither
380                     0x20 (ASCII space) nor 0x3E (ASCII ">"), then skip
381                     this row.

383                     Otherwise, increment index-pattern to the next octet
384                     in the mask and pattern and index-stream to the next
385                     octet in the octet stream.

387           5.  If index-pattern does not point beyond the end of the mask
388               and pattern octet strings, then jump back to the LOOP step
389               in this algorithm.

391           6.  Otherwise, let the sniffed-type be the type given in the
392               cell of the third column in that row and abort these
393               steps.

395    4.  If the first n octets match the signature for MP4 (as defined in
396        Section 6.1), then let the sniffed-type be "video/mp4" and abort
397        these steps.

399    5.  If none of the first n octets are binary data (as defined in the
400        "text or binary" section), then let the sniffed-type be "text/
401        plain" and abort these steps.

403    6.  Otherwise, let the sniffed-type be "application/octet-stream" and
404        abort these steps.

406    The table used by the above algorithm is:

408    +-------------------+-------------------+-----------------+------------+
409    | Mask in Hex       | Pattern in Hex    | Sniffed Type    | Security   |
410    +-------------------+-------------------+-----------------+------------+
411    | FF FF FF DF DF DF | WS 3C 21 44 4F 43 | text/html       | Scriptable |
412    | DF DF DF DF FF DF | 54 59 50 45 20 48 |                 |            |
413    | DF DF DF FF       | 54 4D 4C _>       |                 |            |
414    | Comment: | | |
418    | Comment: | | |
422    | Comment: | | |
426    | Comment: