                                                                A. Barth
Internet-Draft                                                I. Hickson
Expires: August 8, 2011                                     Google, Inc.
                                                        February 4, 2011


                           Media Type Sniffing
                     draft-ietf-websec-mime-sniff-02

Abstract

Many web servers supply incorrect Content-Type header fields with their HTTP responses. In order to be compatible with these servers, user agents consider the content of HTTP responses as well as the Content-Type header fields when determining the effective media type of the response. This document describes an algorithm for determining the effective media type of HTTP responses that balances security and compatibility considerations.

Please send feedback on this draft to websec@ietf.org.

Status of this Memo

This Internet-Draft is submitted to IETF in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

This Internet-Draft will expire on August 8, 2011.

Copyright Notice

Copyright (c) 2011 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document.
Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the BSD License.

Table of Contents

   1.  Introduction
   2.  Metadata
   3.  Web Pages
   4.  Text or Binary
   5.  Unknown Type
     5.1.  Signature for H.264
   6.  Image
   7.  Video
   8.  Fonts
   9.  Feed or HTML
   10. References
   Authors' Addresses

1.  Introduction

The HTTP Content-Type header field indicates the media type of an HTTP response. However, many HTTP servers supply a Content-Type that does not match the actual contents of the response. Historically, web browsers have tolerated these servers by examining the content of HTTP responses in addition to the Content-Type header field to determine the effective media type of the response. Without a clear specification of how to "sniff" the media type, each user agent implementor was forced to reverse engineer the behavior of the other user agents and to develop their own algorithm.
These divergent algorithms have led to a lack of interoperability between user agents and to security issues when the server intends an HTTP response to be interpreted as one media type but some user agents interpret the response as another media type.

These security issues are most severe when an "honest" server lets potentially malicious users upload files and then serves the contents of those files with a low-privilege media type (such as text/plain or image/jpeg). (Malicious servers, of course, can specify an arbitrary media type in the Content-Type header field.) In the absence of media type sniffing, this user-generated content would not be interpreted as a high-privilege media type, such as text/html. However, if a user agent does interpret a low-privilege media type, such as image/gif, as a high-privilege media type, such as text/html, the user agent has created a privilege escalation vulnerability in the server. For example, a malicious user might be able to leverage content sniffing to mount a cross-site scripting attack by including JavaScript code in the uploaded file that a user agent treats as text/html.

This document describes a content sniffing algorithm that carefully balances the compatibility needs of user agent implementors with the security constraints. The algorithm has been constructed with reference to content sniffing algorithms present in popular user agents, an extensive database of existing web content, and metrics collected from implementations deployed to a sizable number of users [BarthCaballeroSong2009].

WARNING! Whenever possible, user agents SHOULD NOT employ a content sniffing algorithm. However, if a user agent does employ a content sniffing algorithm, the user agent SHOULD use the algorithm in this document because using a different content sniffing algorithm than servers expect causes security problems.
For example, if a server believes that the client will treat a contributed file as an image (and thus treat it as benign), but a user agent believes the content to be HTML (and thus privileged to execute any scripts contained therein), an attacker might be able to steal the user's authentication credentials and mount other cross-site scripting attacks.

Conformance requirements phrased as algorithms or specific steps MAY be implemented in any manner, so long as the end result is equivalent. (In particular, the algorithms defined in this specification are intended to be easy to follow, and not intended to be performant.)

2.  Metadata

The explicit media type metadata information associated with a sequence of octets depends on the protocol that was used to fetch the octets.

For octets received via HTTP, the Content-Type HTTP header field, if present, indicates the media type. Let the official-type be the media type indicated by the HTTP Content-Type header field, if present. If the Content-Type header field is absent or if its value cannot be interpreted as a media type (e.g. because its value doesn't contain a U+002F SOLIDUS ('/') character), then there is no official-type.

Note: If an HTTP response contains multiple Content-Type header fields, the user agent MUST use the textually last Content-Type header field to determine the official-type. For example, if the last Content-Type header field contains the value "foo", then there is no official media type because "foo" cannot be interpreted as a media type (even if the HTTP response contains another Content-Type header field that could be interpreted as a media type).
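
The rule for HTTP can be illustrated with a short Python sketch. It is not part of the algorithm text: the function name `official_type` and the idea of passing the header values as a list are assumptions made for illustration, and the sketch applies only the SOLIDUS test mentioned above rather than a full media type grammar. Stripping parameters and lowercasing the result are also simplifying assumptions.

```python
from typing import Optional, Sequence


def official_type(content_type_values: Sequence[str]) -> Optional[str]:
    """Return the official-type, or None when there is no official-type.

    content_type_values is a hypothetical input: the values of every
    Content-Type header field in the response, in textual order.
    """
    if not content_type_values:
        return None  # header field absent: no official-type
    # Only the textually last Content-Type header field is considered.
    last = content_type_values[-1]
    # Assumption: drop parameters such as "; charset=UTF-8".
    essence = last.split(";", 1)[0].strip()
    if "/" not in essence:
        return None  # cannot be interpreted as a media type
    # Assumption: lowercase so later comparisons are case-insensitive.
    return essence.lower()
```

With this sketch, a response carrying "text/html" followed by "foo" has no official-type, because only the last field is examined.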
For octets fetched from the file system, user agents should use platform-specific conventions (e.g., operating system file extension/type mappings) to determine the official-type.

Note: It is essential that file extensions are not used for determining the media type for octets fetched over HTTP because, in some cases, file extensions can be supplied by malicious parties. For example, most PHP installations let the attacker append arbitrary path information to URLs (e.g., http://example.com/foo.php/bar.html) and thereby determine the file extension.

For octets fetched over some other protocols, e.g. FTP, there is no type information.

Note: Comparisons between media types, as defined by MIME specifications, are done in an ASCII case-insensitive manner. [RFC2046]

3.  Web Pages

The user agent MUST use the following algorithm to determine the sniffed-type of a sequence of octets:

1.  If the user agent is configured to strictly obey the official-type, then let the sniffed-type be the official-type and abort these steps.

2.  If the octets were fetched via HTTP and there is an HTTP Content-Type header field and the value of the last such header field has octets that *exactly* match the octets contained in one of the following lines:

    +-------------------------------+--------------------------------+
    | Bytes in Hexadecimal          | Textual Representation         |
    +-------------------------------+--------------------------------+
    | 74 65 78 74 2f 70 6c 61 69 6e | text/plain                     |
    +-------------------------------+--------------------------------+
    | 74 65 78 74 2f 70 6c 61 69 6e | text/plain; charset=ISO-8859-1 |
    | 3b 20 63 68 61 72 73 65 74 3d |                                |
    | 49 53 4f 2d 38 38 35 39 2d 31 |                                |
    +-------------------------------+--------------------------------+
    | 74 65 78 74 2f 70 6c 61 69 6e | text/plain; charset=iso-8859-1 |
    | 3b 20 63 68 61 72 73 65 74 3d |                                |
    | 69 73 6f 2d 38 38 35 39 2d 31 |                                |
    +-------------------------------+--------------------------------+
    | 74 65 78 74 2f 70 6c 61 69 6e | text/plain; charset=UTF-8      |
    | 3b 20 63 68 61 72 73 65 74 3d |                                |
    | 55 54 46 2d 38                |                                |
    +-------------------------------+--------------------------------+

    ...then jump to the "text or binary" section below.

3.  If there is no official-type, jump to the "unknown type" section below.

4.  If the official-type is "unknown/unknown", "application/unknown", or "*/*", jump to the "unknown type" section below.

5.  If the official-type ends in "+xml", or if it is either "text/xml" or "application/xml", then let the sniffed-type be the official-type and abort these steps.

6.  If the official-type is an image type supported by the user agent (e.g., "image/png", "image/gif", "image/jpeg", etc.), then jump to the "images" section below.

7.  If the official-type is "text/html", then jump to the "feed or HTML" section below.

8.  Let the sniffed-type be the official-type.
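
The eight steps above amount to a dispatch on the official-type. The following Python sketch shows one way to express that dispatch; it is illustrative only. The function name `dispatch`, the `("sniffed", type)` / `("jump", section)` return convention, and the `supported_images` parameter are all hypothetical, and the sketch assumes the caller passes `None` for `last_header_value` when the octets were not fetched via HTTP or no Content-Type header field was present.

```python
from typing import FrozenSet, Optional, Tuple

# Step 2: header values that must match octet-for-octet.
TEXT_OR_BINARY_VALUES = frozenset({
    b"text/plain",
    b"text/plain; charset=ISO-8859-1",
    b"text/plain; charset=iso-8859-1",
    b"text/plain; charset=UTF-8",
})
UNKNOWN_TYPES = frozenset({"unknown/unknown", "application/unknown", "*/*"})
# Assumption: a user agent that supports exactly these image types.
DEFAULT_IMAGES = frozenset({"image/png", "image/gif", "image/jpeg"})


def dispatch(
    official: Optional[str],
    last_header_value: Optional[bytes],
    strict: bool = False,
    supported_images: FrozenSet[str] = DEFAULT_IMAGES,
) -> Tuple[str, Optional[str]]:
    """Return ("sniffed", type) or ("jump", section name)."""
    if strict:                                        # step 1
        return ("sniffed", official)
    if last_header_value in TEXT_OR_BINARY_VALUES:    # step 2 (exact match)
        return ("jump", "text or binary")
    if official is None:                              # step 3
        return ("jump", "unknown type")
    official = official.lower()  # media types compare case-insensitively
    if official in UNKNOWN_TYPES:                     # step 4
        return ("jump", "unknown type")
    if official.endswith("+xml") or official in ("text/xml", "application/xml"):
        return ("sniffed", official)                  # step 5
    if official in supported_images:                  # step 6
        return ("jump", "images")
    if official == "text/html":                       # step 7
        return ("jump", "feed or HTML")
    return ("sniffed", official)                      # step 8
```

Note that step 2 compares the raw header octets, so a value such as "text/plain; charset=utf-8" (lowercase "utf-8") does not trigger the "text or binary" rules in this sketch and falls through to step 8.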
4.  Text or Binary

This section defines the *rules for distinguishing if a resource is text or binary*.

1.  The user agent MAY wait for 512 or more octets to arrive.

    Note: Waiting for 512 octets to arrive causes the text-or-binary algorithm to be deterministic for a given sequence of octets. However, in some cases, the user agent might need to wait an arbitrary length of time for these octets to arrive. User agents SHOULD wait for 512 octets to arrive, when feasible.

2.  Let n be the smaller of either 512 or the number of octets that have already arrived.

3.  If n is greater than or equal to 3, and the first 2 or 3 octets match one of the following octet sequences:

    +----------------------+--------------+
    | Bytes in Hexadecimal | Description  |
    +----------------------+--------------+
    | FE FF                | UTF-16BE BOM |
    | FF FE                | UTF-16LE BOM |
    | EF BB BF             | UTF-8 BOM    |
    +----------------------+--------------+

    ...then let the sniffed-type be "text/plain" and abort these steps.

4.  If none of the first n octets are binary data octets then let the sniffed-type be "text/plain" and abort these steps.

    +-------------------------+
    | Binary Data Byte Ranges |
    +-------------------------+
    | 0x00 -- 0x08            |
    | 0x0B                    |
    | 0x0E -- 0x1A            |
    | 0x1C -- 0x1F            |
    +-------------------------+

5.  If the first octets match one of the octet sequences in the "pattern" column of the table in the "unknown type" section below, ignoring any rows whose cell in the "security" column says "scriptable" (or "n/a"), then let the sniffed-type be the type given in the corresponding cell in the "sniffed type" column on that row and abort these steps.

    WARNING! It is critical that this step never return a scriptable type (e.g., text/html), because otherwise that would allow a privilege escalation attack.

6.  Otherwise, let the sniffed-type be "application/octet-stream" and abort these steps.

5.  Unknown Type

1.  The user agent MAY wait for 512 or more octets to arrive for the same reason as in the "text or binary" section above.

2.  Let n be the smaller of either 512 or the number of octets that have already arrived.

3.  For each row in the table below:

    *  If the row has no "WS" octets and no "_>" octets:

        1.  Let pattern-length be the length of the pattern.

        2.  If n is smaller than pattern-length then skip this row.

        3.  Apply the bit-wise "and" operator to the first pattern-length octets and the given mask, and let the result be the masked-data.

        4.  If the octets of the masked-data match the given pattern octets exactly, then let the sniffed-type be the type given in the cell of the third column in that row and abort these steps.

    *  If the row has a "WS" octet or a "_>" octet:

        1.  Let index-pattern be an index into the mask and pattern octet strings of the row.

        2.  Let index-stream be an index into the octet stream being examined.

        3.  LOOP: If index-stream points beyond the end of the octet stream, then this row doesn't match and skip this row.

        4.  Examine the index-stream-th octet of the octet stream as follows:

            -  If the index-pattern-th octet of the pattern is a normal hexadecimal octet and not a "WS" octet or a "_>" octet:

               If the bit-wise "and" operator, applied to the index-stream-th octet of the stream and the index-pattern-th octet of the mask, yields a value different from the index-pattern-th octet of the pattern, then skip this row.

               Otherwise, increment index-pattern to the next octet in the mask and pattern and index-stream to the next octet in the octet stream.
            -  Otherwise, if the index-pattern-th octet of the pattern is a "WS" octet:

               "WS" means "whitespace", and allows insignificant whitespace to be skipped when sniffing for a type signature.

               If the index-stream-th octet of the stream is one of 0x09 (ASCII TAB), 0x0A (ASCII LF), 0x0C (ASCII FF), 0x0D (ASCII CR), or 0x20 (ASCII space), then increment only the index-stream to the next octet in the octet stream.

               Otherwise, increment only the index-pattern to the next octet in the mask and pattern.

            -  Otherwise, if the index-pattern-th octet of the pattern is a "_>" octet:

               "_>" means "space-or-bracket", and allows HTML tag names to terminate with either a space or a greater than sign.

               If the index-stream-th octet of the stream is neither 0x20 (ASCII space) nor 0x3E (ASCII ">"), then skip this row.

               Otherwise, increment index-pattern to the next octet in the mask and pattern and index-stream to the next octet in the octet stream.

        5.  If index-pattern does not point beyond the end of the mask and pattern octet strings, then jump back to the LOOP step in this algorithm.

        6.  Otherwise, let the sniffed-type be the type given in the cell of the third column in that row and abort these steps.

4.  If the first n octets match the signature for H.264 (as defined in Section 5.1), then let the sniffed-type be "video/H264" and abort these steps.

5.  If none of the first n octets are binary data (as defined in the "text or binary" section), then let the sniffed-type be "text/plain" and abort these steps.

6.  Otherwise, let the sniffed-type be "application/octet-stream" and abort these steps.
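
The row-matching procedure in step 3 and the fallback in steps 5 and 6 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the names `row_matches`, `is_binary_octet`, and `fallback_type` are hypothetical, a pattern is assumed to be a sequence whose items are either integers or the marker strings "WS" and "_>", and the caller is assumed to pass only the first n octets as `data`. The H.264 check of step 4 is omitted.

```python
from typing import Sequence, Union

WS = "WS"   # marker: skip insignificant whitespace
SB = "_>"   # marker: space-or-bracket
WHITESPACE = {0x09, 0x0A, 0x0C, 0x0D, 0x20}
PatternItem = Union[int, str]


def is_binary_octet(b: int) -> bool:
    """True for octets in the binary data byte ranges."""
    return b <= 0x08 or b == 0x0B or 0x0E <= b <= 0x1A or 0x1C <= b <= 0x1F


def row_matches(data: bytes, mask: Sequence[int],
                pattern: Sequence[PatternItem]) -> bool:
    """Match one table row against data (the first n octets)."""
    if WS not in pattern and SB not in pattern:
        # Fixed-length row: mask the leading octets and compare exactly.
        if len(data) < len(pattern):
            return False
        return all((d & m) == p for d, m, p in zip(data, mask, pattern))
    index_pattern = 0
    index_stream = 0
    while index_pattern < len(pattern):
        if index_stream >= len(data):
            return False  # ran out of octets: row does not match
        item = pattern[index_pattern]
        octet = data[index_stream]
        if item == WS:
            if octet in WHITESPACE:
                index_stream += 1    # consume whitespace, stay on "WS"
            else:
                index_pattern += 1   # leave "WS", re-examine this octet
        elif item == SB:
            if octet not in (0x20, 0x3E):
                return False
            index_pattern += 1
            index_stream += 1
        else:
            if (octet & mask[index_pattern]) != item:
                return False
            index_pattern += 1
            index_stream += 1
    return True


def fallback_type(data: bytes) -> str:
    """Steps 5 and 6: choose text/plain or application/octet-stream."""
    n = min(512, len(data))
    if not any(is_binary_octet(b) for b in data[:n]):
        return "text/plain"
    return "application/octet-stream"
```

The 0xDF mask octets clear bit 0x20 of an ASCII letter, which is what makes a masked comparison case-insensitive for the letters in a signature.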
The table used by the above algorithm is:

+-------------------+-------------------+-----------------+------------+
| Mask in Hex       | Pattern in Hex    | Sniffed Type    | Security   |
+-------------------+-------------------+-----------------+------------+
| FF FF FF DF DF DF | WS 3C 21 44 4F 43 | text/html       | Scriptable |
| DF DF DF DF FF DF | 54 59 50 45 20 48 |                 |            |
| DF DF DF FF       | 54 4D 4C _>       |                 |            |
| Comment: | | | | Comment: | | | | Comment: | | | | Comment: