idnits 2.17.1 

draft-abarth-mime-sniff-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack a Security Considerations section.

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (January 9, 2009) is 5576 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Missing reference section? 'HTTP' on line 123 looks like a reference

  -- Missing reference section? 'RFC2616' on line 178 looks like a reference

  -- Missing reference section? 'RFC2046' on line 216 looks like a reference

  -- Missing reference section? '0' on line 508 looks like a reference

  -- Missing reference section? '1' on line 508 looks like a reference

  -- Missing reference section? '2' on line 508 looks like a reference


     Summary: 2 errors (**), 0 flaws (~~), 2 warnings (==), 7 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Working Group                                                   A. Barth
3	Internet-Draft                                             U.C. Berkeley
4	Expires: July 13, 2009                                        I. Hickson
5	                                                            Google, Inc.
6	                                                         January 9, 2009

8	                     Content-Type Processing Model
9	                       draft-abarth-mime-sniff-00

11	Status of this Memo

13	   This Internet-Draft is submitted to IETF in full conformance with the
14	   provisions of BCP 78 and BCP 79.

16	   Internet-Drafts are working documents of the Internet Engineering
17	   Task Force (IETF), its areas, and its working groups.  Note that
18	   other groups may also distribute working documents as Internet-
19	   Drafts.

21	   Internet-Drafts are draft documents valid for a maximum of six months
22	   and may be updated, replaced, or obsoleted by other documents at any
23	   time.  It is inappropriate to use Internet-Drafts as reference
24	   material or to cite them other than as "work in progress."

26	   The list of current Internet-Drafts can be accessed at
27	   http://www.ietf.org/ietf/1id-abstracts.txt.

29	   The list of Internet-Draft Shadow Directories can be accessed at
30	   http://www.ietf.org/shadow.html.

32	   This Internet-Draft will expire on July 13, 2009.

34	Copyright Notice

36	   Copyright (c) 2009 IETF Trust and the persons identified as the
37	   document authors.  All rights reserved.

39	   This document is subject to BCP 78 and the IETF Trust's Legal
40	   Provisions Relating to IETF Documents
41	   (http://trustee.ietf.org/license-info) in effect on the date of
42	   publication of this document.  Please review these documents
43	   carefully, as they describe your rights and restrictions with respect
44	   to this document.

46	Abstract

48	   Many Web servers supply incorrect Content-Type headers with their
49	   HTTP responses.  In order to be compatible with these Web servers,
50	   Web browsers must consider the content of HTTP responses as well as
51	   the Content-Type header when determining the effective mime type of
52	   the response.  This document describes an algorithm for determining
53	   the effective mime type of HTTP responses that balances security and
54	   compatibility considerations.

56	Table of Contents

58	   1.  Introduction
59	   2.  Metadata
60	   3.  Web Pages
61	   4.  Text or Binary
62	   5.  Unknown Type
63	   6.  Image
64	   7.  Feed or HTML
65	   Authors' Addresses

67	1.  Introduction

69	   The HTTP Content-Type header indicates the mime type of an HTTP
70	   response.  However, many HTTP servers supply a Content-Type that
71	   does not match the actual contents of the response.  Historically,
72	   Web browsers have tolerated these servers by examining the
73	   content of HTTP responses in addition to the Content-Type header to
74	   determine the effective mime type of the response.

76	   Without a clear specification of how to "sniff" the mime type, each
77	   browser vendor was forced to reverse engineer the behavior of the
78	   other browsers and to develop its own algorithm.  These divergent
79	   algorithms have led to a lack of interoperability between browsers
80	   and to security issues when the site intends an HTTP response to be
81	   interpreted as one mime type but the browser interprets the
82	   response as another mime type.

84	   These security issues are most severe when a Web site lets users
85	   upload files and then serves the contents of those files with a low-
86	   privilege mime type (such as text/plain or image/jpeg).  In the
87	   absence of mime sniffing, this user-generated content will not be
88	   able to run JavaScript, but if the browser treats the response as
89	   text/html, then the user can mount a cross-site scripting attack by
90	   including JavaScript code in the uploaded file.

92	   This document describes a mime sniffing algorithm that carefully
93	   balances the compatibility needs of browser vendors with the security
94	   constraints.  The algorithm has been constructed with reference to
95	   mime sniffing algorithms present in popular Web browsers, an
96	   extensive database of Web content, and metrics collected from
97	   implementations deployed to a sizable number of Web users.

99	   Warning!  It is imperative that the algorithm in this document be
100	   followed exactly.  When a user agent uses different heuristics for
101	   content type detection than the server expects, security problems can
102	   occur.  For example, if a server believes that the client will treat
103	   a contributed file as an image (and thus treat it as benign), but a
104	   Web browser believes the content to be HTML (and thus execute any
105	   scripts contained therein), the end user can be exposed to malicious
106	   content, making the user vulnerable to cookie theft attacks and other
107	   cross-site scripting attacks.

109	2.  Metadata

111	   What explicit Content-Type metadata is associated with the resource
112	   (the resource's type information) depends on the protocol that was
113	   used to fetch the resource.

115	   For HTTP resources, only the first Content-Type HTTP header, if any,
116	   contributes any type information; the explicit type of the resource
117	   is then the value of that header, interpreted as described by the
118	   HTTP specifications.  If the Content-Type HTTP header is present but
119	   the value of the first such header cannot be interpreted as described
120	   by the HTTP specifications (e.g. because its value doesn't contain a
121	   U+002F SOLIDUS ('/') character), then the resource has no type
122	   information (even if there are multiple Content-Type HTTP headers and
123	   one of the other ones is syntactically correct).  [HTTP]

125	   For resources fetched from the file system, user agents should use
126	   platform-specific conventions, e.g. operating system extension/type
127	   mappings.

129	   Extensions must not be used for determining resource types for
130	   resources fetched over HTTP.

132	   For resources fetched over most other protocols, e.g.  FTP, there is
133	   no type information.

135	   The algorithm for extracting an encoding from a Content-Type, given a
136	   string s, is as follows.  It either returns an encoding or nothing.

138	   1.  Find the first seven characters in s that are an ASCII case-
139	       insensitive match for the word "charset".  If no such match is
140	       found, return nothing.

142	   2.  Skip any U+0009, U+000A, U+000C, U+000D, or U+0020 characters
143	       that immediately follow the word 'charset' (there might not be
144	       any).

146	   3.  If the next character is not a U+003D EQUALS SIGN ('='), return
147	       nothing.

149	   4.  Skip any U+0009, U+000A, U+000C, U+000D, or U+0020 characters
150	       that immediately follow the equals sign (there might not be any).

152	   5.  Process the next character as follows:

154	       *  If it is a U+0022 QUOTATION MARK ('"') and there is a later
155	          U+0022 QUOTATION MARK ('"') in s, or

157	       *  If it is a U+0027 APOSTROPHE ("'") and there is a later U+0027
158	          APOSTROPHE ("'") in s

160	             Return the string between this character and the next
161	             earliest occurrence of this character.

163	       *  If it is an unmatched U+0022 QUOTATION MARK ('"'),

165	       *  If it is an unmatched U+0027 APOSTROPHE ("'"), or

167	       *  If there is no next character

169	             Return nothing.

171	       *  Otherwise

173	             Return the string from this character to the first U+0009,
174	             U+000A, U+000C, U+000D, U+0020, or U+003B character or the
175	             end of s, whichever comes first.

177	   Note: The above algorithm is a willful violation of the HTTP
178	   specification.  [RFC2616]
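	   The charset extraction steps above can be sketched in Python as
	   follows.  This is an illustrative, non-normative sketch: the
	   function name extract_encoding and the use of None for "nothing"
	   are assumptions made for the example, not part of this document.

```python
import re

# U+0009, U+000A, U+000C, U+000D, U+0020 (the characters skipped in steps 2 and 4)
_WS = "\t\n\x0c\r "


def extract_encoding(s):
    """Return the encoding named in Content-Type value s, or None for "nothing"."""
    # Step 1: first ASCII case-insensitive match for "charset".
    m = re.search("charset", s, re.IGNORECASE | re.ASCII)
    if m is None:
        return None
    pos = m.end()
    # Step 2: skip whitespace after the word.
    while pos < len(s) and s[pos] in _WS:
        pos += 1
    # Step 3: the next character must be "=".
    if pos >= len(s) or s[pos] != "=":
        return None
    pos += 1
    # Step 4: skip whitespace after the equals sign.
    while pos < len(s) and s[pos] in _WS:
        pos += 1
    # Step 5: quoted value, unmatched quote / nothing left, or bare token.
    if pos >= len(s):
        return None
    ch = s[pos]
    if ch in "\"'":
        end = s.find(ch, pos + 1)
        if end == -1:
            return None  # unmatched quotation mark or apostrophe
        return s[pos + 1:end]
    end = pos
    while end < len(s) and s[end] not in _WS + ";":
        end += 1
    return s[pos:end]
```

	   For example, under these assumptions the value
	   'text/html; CHARSET = "iso-8859-1"' yields "iso-8859-1", while a
	   value with an unmatched quote yields nothing.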

180	3.  Web Pages

182	   The sniffed type of a resource must be found as follows:

184	   1.  If the user agent is configured to strictly obey Content-Type
185	       headers for this resource, then jump to the last step in this set
186	       of steps.

188	   2.  If the resource was fetched over an HTTP protocol and there is an
189	       HTTP Content-Type header and the value of the first such header
190	       has bytes that exactly match one of the following lines:

192	      +-------------------------------+--------------------------------+
193	      | Bytes in Hexadecimal          | Textual representation         |
194	      +-------------------------------+--------------------------------+
195	      | 74 65 78 74 2f 70 6c 61 69 6e | text/plain                     |
196	      +-------------------------------+--------------------------------+
197	      | 74 65 78 74 2f 70 6c 61 69 6e | text/plain; charset=ISO-8859-1 |
198	      | 3b 20 63 68 61 72 73 65 74 3d |                                |
199	      | 49 53 4f 2d 38 38 35 39 2d 31 |                                |
200	      +-------------------------------+--------------------------------+
201	      | 74 65 78 74 2f 70 6c 61 69 6e | text/plain; charset=iso-8859-1 |
202	      | 3b 20 63 68 61 72 73 65 74 3d |                                |
203	      | 69 73 6f 2d 38 38 35 39 2d 31 |                                |
204	      +-------------------------------+--------------------------------+
205	      | 74 65 78 74 2f 70 6c 61 69 6e | text/plain; charset=UTF-8      |
206	      | 3b 20 63 68 61 72 73 65 74 3d |                                |
207	      | 55 54 46 2d 38                |                                |
208	      +-------------------------------+--------------------------------+

210	       ...then jump to the "text or binary" section below.

212	   3.  Let official type be the type given by the Content-Type metadata
213	       for the resource, ignoring parameters.  If there is no such type,
214	       jump to the "unknown type" section below.  Comparisons with this
215	       type, as defined by MIME specifications, are done in an ASCII
216	       case-insensitive manner.  [RFC2046]

218	   4.  If official type is "unknown/unknown" or "application/unknown",
219	       jump to the "unknown type" section below.

221	   5.  If official type ends in "+xml", or if it is either "text/xml" or
222	       "application/xml", then the sniffed type of the resource is
223	       official type; return that and abort these steps.

225	   6.  If official type is an image type supported by the user agent
226	       (e.g. "image/png", "image/gif", "image/jpeg", etc), then jump to
227	       the "image" section below, passing it the official type.

229	   7.  If official type is "text/html", then jump to the feed or HTML
230	       section below.

232	   8.  The sniffed type of the resource is official type.
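	   The dispatch performed by the steps above might be sketched as
	   follows.  This is a non-normative illustration: the function name
	   choose_branch, the branch labels it returns, and the set of
	   supported image types are all assumptions made for the example.
	   The sketch takes the raw bytes of the first Content-Type header
	   (or None) and reports which section would handle the resource.

```python
# Step 2: header values that trigger the "text or binary" section (exact bytes).
SNIFF_TEXT_VALUES = {
    b"text/plain",
    b"text/plain; charset=ISO-8859-1",
    b"text/plain; charset=iso-8859-1",
    b"text/plain; charset=UTF-8",
}
UNKNOWN_TYPES = {"unknown/unknown", "application/unknown"}
XML_TYPES = {"text/xml", "application/xml"}
# Assumption: the image types this hypothetical user agent supports.
SUPPORTED_IMAGES = {"image/png", "image/gif", "image/jpeg"}


def choose_branch(first_content_type, strict=False):
    """Return (branch, official_type) for a resource fetched over HTTP."""
    official = None
    if first_content_type is not None and b"/" in first_content_type:
        # Ignore parameters; compare ASCII case-insensitively.
        essence = first_content_type.split(b";", 1)[0].strip()
        official = essence.decode("ascii", "replace").lower()
    if strict:                                    # step 1
        return ("official", official)
    if first_content_type in SNIFF_TEXT_VALUES:   # step 2
        return ("text-or-binary", official)
    if official is None or official in UNKNOWN_TYPES:   # steps 3 and 4
        return ("unknown", official)
    if official.endswith("+xml") or official in XML_TYPES:  # step 5
        return ("official", official)
    if official in SUPPORTED_IMAGES:              # step 6
        return ("image", official)
    if official == "text/html":                   # step 7
        return ("feed-or-html", official)
    return ("official", official)                 # step 8
```

	   Note that in this sketch "text/plain; charset=utf-8" (lowercase
	   charset) is not one of the four exact byte strings, so it is
	   taken at face value rather than sniffed.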

234	4.  Text or Binary

236	   1.  The user agent may wait for 512 or more bytes of the resource to
237	       be available.

239	   2.  Let n be the smaller of either 512 or the number of bytes already
240	       available.

242	   3.  If n is 4 or more, and the first bytes of the resource match one
243	       of the following byte sets:

245	                   +----------------------+--------------+
246	                   | Bytes in Hexadecimal | Description  |
247	                   +----------------------+--------------+
248	                   | FE FF                | UTF-16BE BOM |
249	                   | FF FE                | UTF-16LE BOM |
250	                   | EF BB BF             | UTF-8 BOM    |
251	                   +----------------------+--------------+

253	       ...then the sniffed type of the resource is "text/plain".  Abort
254	       these steps.

256	   4.  If none of the first n bytes of the resource are binary data
257	       bytes then the sniffed type of the resource is "text/plain".
258	       Abort these steps.

260	                         +-------------------------+
261	                         | Binary data byte ranges |
262	                         +-------------------------+
263	                         | 0x00 -- 0x08            |
264	                         | 0x0B                    |
265	                         | 0x0E -- 0x1A            |
266	                         | 0x1C -- 0x1F            |
267	                         +-------------------------+

269	   5.  If the first bytes of the resource match one of the byte
270	       sequences in the "pattern" column of the table in the unknown
271	       type section below, ignoring any rows whose cell in the
272	       "security" column says "scriptable" (or "n/a"), then the sniffed
273	       type of the resource is the type given in the corresponding cell
274	       in the "sniffed type" column on that row; abort these steps.

276	          Warning!  It is critical that this step not ever return a
277	          scriptable type (e.g. text/html), as otherwise that would
278	          allow a privilege escalation attack.

280	   6.  Otherwise, the sniffed type of the resource is "application/
281	       octet-stream".
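	   A non-normative sketch of the steps in this section follows.  The
	   function name sniff_text_or_binary is an assumption made for the
	   example, and SAFE_SIGNATURES transcribes only the rows of the
	   unknown type table whose "security" column says "Safe" (all of
	   which use an all-FF mask, so a plain prefix comparison suffices).

```python
BOMS = (b"\xfe\xff", b"\xff\xfe", b"\xef\xbb\xbf")

# Binary data byte ranges: 0x00-0x08, 0x0B, 0x0E-0x1A, 0x1C-0x1F.
BINARY_BYTES = frozenset(
    list(range(0x00, 0x09)) + [0x0B]
    + list(range(0x0E, 0x1B)) + list(range(0x1C, 0x20))
)

# Only non-scriptable rows, so step 5 can never return text/html or PDF.
SAFE_SIGNATURES = (
    (b"%!PS-Adobe-", "application/postscript"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"BM", "image/bmp"),
    (b"\x00\x00\x01\x00", "image/vnd.microsoft.icon"),
)


def sniff_text_or_binary(data):
    """Sniff a resource that was labeled with one of the text/plain values."""
    head = data[:512]          # steps 1 and 2: at most 512 bytes
    n = len(head)
    # Step 3: a leading byte order mark means text.
    if n >= 4 and head.startswith(BOMS):
        return "text/plain"
    # Step 4: no binary data bytes means text.
    if not any(b in BINARY_BYTES for b in head):
        return "text/plain"
    # Step 5: safe (non-scriptable) signatures only.
    for signature, mime in SAFE_SIGNATURES:
        if head.startswith(signature):
            return mime
    # Step 6.
    return "application/octet-stream"
```

	   With this sketch, content that begins with "<html>" but contains
	   a binary data byte is reported as application/octet-stream, never
	   as text/html.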

283	5.  Unknown Type

285	   1.  The user agent may wait for 512 or more bytes of the resource to
286	       be available.

288	   2.  Let stream length be the smaller of either 512 or the number of
289	       bytes already available.

291	   3.  For each row in the table below:

293	       *  If the row has no "WS" bytes:

295	          1.  Let pattern length be the length of the pattern (number of
296	              bytes described by the cell in the second column of the
297	              row).

299	          2.  If stream length is smaller than pattern length then skip
300	              this row.

302	          3.  Apply the "and" operator to the first pattern length bytes
303	              of the resource and the given mask (the bytes in the cell
304	              of first column of that row), and let the result be the
305	              data.

307	          4.  If the bytes of the data matches the given pattern bytes
308	              exactly, then the sniffed type of the resource is the type
309	              given in the cell of the third column in that row; abort
310	              these steps.

312	       *  If the row has a "WS" byte:

314	          1.  Let index_pattern be an index into the mask and pattern
315	              byte strings of the row.

317	          2.  Let index_stream be an index into the byte stream being
318	              examined.

320	          3.  Loop: If index_stream points beyond the end of the byte
321	              stream, then this row doesn't match; skip this row.

323	          4.  Examine the index_streamth byte of the byte stream as
324	              follows:

326	              -  If the index_patternth byte of the pattern is a normal
327	                 hexadecimal byte and not a "WS" byte:

329	                    If the "and" operator, applied to the index_streamth
330	                    byte of the stream and the index_patternth byte of
331	                    the mask, yields a value different from the
332	                    index_patternth byte of the pattern, then skip this
333	                    row.

335	                    Otherwise, increment index_pattern to the next byte
336	                    in the mask and pattern and index_stream to the next
337	                    byte in the byte stream.

339	              -  Otherwise, if the index_patternth byte of the pattern
340	                 is a "WS" byte:

342	                    "WS" means "whitespace", and allows insignificant
343	                    whitespace to be skipped when sniffing for a type
344	                    signature.

346	                    If the index_streamth byte of the stream is one of
347	                    0x09 (ASCII TAB), 0x0A (ASCII LF), 0x0C (ASCII FF),
348	                    0x0D (ASCII CR), or 0x20 (ASCII space), then
349	                    increment only the index_stream to the next byte in
350	                    the byte stream.

352	                    Otherwise, increment only the index_pattern to the
353	                    next byte in the mask and pattern.

355	          5.  If index_pattern does not point beyond the end of the mask
356	              and pattern byte strings, then jump back to the loop step
357	              in this algorithm.

359	          6.  Otherwise, the sniffed type of the resource is the type
360	              given in the cell of the third column in that row; abort
361	              these steps.

363	   4.  If none of the first stream length bytes of the resource are
364	       binary data bytes, then the sniffed type of the resource is
365	       "text/plain".  Abort these steps.

367	   5.  Otherwise, the sniffed type of the resource is "application/
368	       octet-stream".

370	   The table used by the above algorithm is:

372	+-------------------+-------------------+-----------------+------------+
373	| Mask in Hex       | Pattern in Hex    | Sniffed type    | Security   |
374	+-------------------+-------------------+-----------------+------------+
375	| FF FF DF DF DF DF | 3C 21 44 4F 43 54 | text/html       | Scriptable |
376	| DF DF DF FF DF DF | 59 50 45 20 48 54 |                 |            |
377	| DF DF             | 4D 4C             |                 |            |
378	|                                                                      |
379	| Comment: The string "<!DOCTYPE HTML" in US-ASCII or compatible       |
380	|          encodings, case-insensitively.                              |
381	+-------------------+-------------------+-----------------+------------+
382	| FF FF DF DF DF DF | WS 3C 48 54 4D 4C | text/html       | Scriptable |
383	|                                                                      |
384	| Comment: The string "<HTML" in US-ASCII or compatible encodings,     |
385	|          case-insensitively, possibly with leading spaces.           |
386	+-------------------+-------------------+-----------------+------------+
387	| FF FF DF DF DF DF | WS 3C 48 45 41 44 | text/html       | Scriptable |
388	|                                                                      |
389	| Comment: The string "<HEAD" in US-ASCII or compatible encodings,     |
390	|          case-insensitively, possibly with leading spaces.           |
391	+-------------------+-------------------+-----------------+------------+
392	| FF FF DF DF DF DF | WS 3C 53 43 52 49 | text/html       | Scriptable |
393	| DF DF             | 50 54             |                 |            |
394	|                                                                      |
395	| Comment: The string "<SCRIPT" in US-ASCII or compatible              |
396	|          encodings, case-insensitively, possibly with leading        |
397	|          spaces.                                                     |
398	+-------------------+-------------------+-----------------+------------+
399	| FF FF FF FF FF    | 25 50 44 46 2D    | application/pdf | Scriptable |
400	|                                                                      |
401	| Comment: The string "%PDF-", the PDF signature.                      |
402	+-------------------+-------------------+-----------------+------------+
403	| FF FF FF FF FF FF | 25 21 50 53 2D 41 | application/    | Safe       |
404	| FF FF FF FF FF    | 64 6F 62 65 2D    |      postscript |            |
405	|                                                                      |
406	| Comment: The string "%!PS-Adobe-", the PostScript signature.         |
407	+-------------------+-------------------+-----------------+------------+
408	| FF FF 00 00       | FE FF 00 00       | text/plain      | n/a        |
409	|                                                                      |
410	| Comment: UTF-16BE BOM                                                |
411	+-------------------+-------------------+-----------------+------------+
412	| FF FF 00 00       | FF FE 00 00       | text/plain      | n/a        |
413	|                                                                      |
414	| Comment: UTF-16LE BOM                                                |
415	+-------------------+-------------------+-----------------+------------+
416	| FF FF FF 00       | EF BB BF 00       | text/plain      | n/a        |
417	|                                                                      |
418	| Comment: UTF-8 BOM                                                   |
419	+-------------------+-------------------+-----------------+------------+
420	| FF FF FF FF FF FF | 47 49 46 38 37 61 | image/gif       | Safe       |
421	|                                                                      |
422	| Comment: The string "GIF87a", a GIF signature.                       |
423	+-------------------+-------------------+-----------------+------------+
424	| FF FF FF FF FF FF | 47 49 46 38 39 61 | image/gif       | Safe       |
425	|                                                                      |
426	| Comment: The string "GIF89a", a GIF signature.                       |
427	+-------------------+-------------------+-----------------+------------+
428	| FF FF FF FF FF FF | 89 50 4E 47 0D 0A | image/png       | Safe       |
429	| FF FF             | 1A 0A             |                 |            |
430	|                                                                      |
431	| Comment: The PNG signature.                                          |
432	+-------------------+-------------------+-----------------+------------+
433	| FF FF FF          | FF D8 FF          | image/jpeg      | Safe       |
434	|                                                                      |
435	| Comment: A JPEG SOI marker followed by a byte of another marker.     |
436	+-------------------+-------------------+-----------------+------------+
437	| FF FF             | 42 4D             | image/bmp       | Safe       |
438	|                                                                      |
439	| Comment: The string "BM", a BMP signature.                           |
440	+-------------------+-------------------+-----------------+------------+
441	| FF FF FF FF       | 00 00 01 00       | image/vnd.      | Safe       |
442	|                   |                   |  microsoft.icon |            |
443	|                                                                      |
444	| Comment: A 0 word followed by a 1 word, a Windows Icon signature.    |
445	+-------------------+-------------------+-----------------+------------+

447	   Note: I'd like to add types like MPEG, AVI, Flash, Java, etc, to the
448	   above table.

450	   User agents may support further types if desired, by implicitly
451	   adding to the above table.  However, user agents should not use any
452	   other patterns for types already mentioned in the table above, as
453	   this could then be used for privilege escalation (where, e.g., a
454	   server uses the above table to determine that content is not HTML and
455	   thus safe from XSS attacks, but then a user agent detects it as HTML
456	   anyway and allows script to execute).

458	   The column marked "security" is used by the algorithm in the "text or
459	   binary" section, to avoid sniffing text/plain content as a type that
460	   can be used for a privilege escalation attack.
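	   The masked matching used in this section, including the "WS"
	   handling, can be sketched as follows.  This is a non-normative
	   illustration: the names row, row_matches, and sniff_unknown are
	   assumptions made for the example, and TABLE holds only a few
	   sample rows rather than the full table.  The same loop serves
	   rows with and without "WS" bytes, because running out of stream
	   before the pattern is exhausted means the row does not match.

```python
WS = None                      # sentinel for a "WS" cell in a pattern
WS_BYTES = b"\t\n\x0c\r "      # 0x09, 0x0A, 0x0C, 0x0D, 0x20
BINARY_BYTES = frozenset(
    list(range(0x00, 0x09)) + [0x0B]
    + list(range(0x0E, 0x1B)) + list(range(0x1C, 0x20))
)


def row(mask_hex, pattern_hex, mime):
    """Build a table row from the hexadecimal notation used in the table."""
    mask = [int(h, 16) for h in mask_hex.split()]
    pattern = [WS if h == "WS" else int(h, 16) for h in pattern_hex.split()]
    return mask, pattern, mime


# A subset of the table, for illustration only.
TABLE = [
    row("FF FF DF DF DF DF DF DF DF FF DF DF DF DF",
        "3C 21 44 4F 43 54 59 50 45 20 48 54 4D 4C", "text/html"),
    row("FF FF DF DF DF DF", "WS 3C 48 54 4D 4C", "text/html"),
    row("FF FF FF FF FF", "25 50 44 46 2D", "application/pdf"),
    row("FF FF FF FF FF FF", "47 49 46 38 39 61", "image/gif"),
]


def row_matches(stream, mask, pattern):
    index_pattern = 0
    index_stream = 0
    while index_pattern < len(pattern):
        if index_stream >= len(stream):
            return False                      # ran out of bytes: no match
        if pattern[index_pattern] is not WS:
            if stream[index_stream] & mask[index_pattern] != pattern[index_pattern]:
                return False
            index_pattern += 1
            index_stream += 1
        elif stream[index_stream] in WS_BYTES:
            index_stream += 1                 # skip insignificant whitespace
        else:
            index_pattern += 1                # whitespace run has ended
    return True


def sniff_unknown(data, table=TABLE):
    stream = data[:512]
    for mask, pattern, mime in table:
        if row_matches(stream, mask, pattern):
            return mime
    if not any(b in BINARY_BYTES for b in stream):
        return "text/plain"
    return "application/octet-stream"
```

	   The DF mask clears bit 0x20, which folds ASCII lowercase letters
	   onto uppercase; this is how "<html" and "<HTML" match the same
	   pattern bytes.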

462	6.  Image

464	   If the resource's official type is "image/svg+xml", then the sniffed
465	   type of the resource is its official type (an XML type).

467	   Otherwise, if the first bytes of the resource match one of the byte
468	   sequences in the first column of the following table, then the
469	   sniffed type of the resource is the type given in the corresponding
470	   cell in the second column on the same row:

472	     +-------------------------+--------------------------+----------+
473	     | Bytes in Hexadecimal    | Sniffed type             | Comment  |
474	     +-------------------------+--------------------------+----------+
475	     | 47 49 46 38 37 61       | image/gif                | "GIF87a" |
476	     | 47 49 46 38 39 61       | image/gif                | "GIF89a" |
477	     | 89 50 4E 47 0D 0A 1A 0A | image/png                |          |
478	     | FF D8 FF                | image/jpeg               |          |
479	     | 42 4D                   | image/bmp                | "BM"     |
480	     | 00 00 01 00             | image/vnd.microsoft.icon |          |
481	     +-------------------------+--------------------------+----------+

483	   Otherwise, the sniffed type of the resource is the same as its
484	   official type.
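	   A minimal, non-normative sketch of the image rules above follows.
	   The function name sniff_image is an assumption made for the
	   example; the signatures are transcribed from the table in this
	   section.

```python
IMAGE_SIGNATURES = (
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"BM", "image/bmp"),
    (b"\x00\x00\x01\x00", "image/vnd.microsoft.icon"),
)


def sniff_image(data, official_type):
    """Return the sniffed type for a resource whose official type is an image."""
    # SVG is an XML type and is never re-sniffed.
    if official_type == "image/svg+xml":
        return official_type
    # A recognized signature overrides the official image type.
    for signature, mime in IMAGE_SIGNATURES:
        if data.startswith(signature):
            return mime
    # Otherwise the official type stands.
    return official_type
```

	   For example, a PNG file served as "image/gif" is reported by this
	   sketch as image/png, while unrecognized content keeps its
	   official type.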

486	7.  Feed or HTML

488	   1.   The user agent may wait for 512 or more bytes of the resource to
489	        be available.

491	   2.   Let s be the stream of bytes, and let s[i] represent the byte in
492	        s with position i, treating s as zero-indexed (so the first byte
493	        is at i=0).

495	   3.   If at any point this algorithm requires the user agent to
496	        determine the value of a byte in s which is not yet available,
497	        or which is past the first 512 bytes of the resource, or which
498	        is beyond the end of the resource, the user agent must stop this
499	        algorithm, and assume that the sniffed type of the resource is
500	        "text/html".

502	           Note: User agents are allowed, by the first step of this
503	           algorithm, to wait until the first 512 bytes of the resource
504	           are available.

506	   4.   Initialize pos to 0.

508	   5.   If s[0] is 0xEF, s[1] is 0xBB, and s[2] is 0xBF, then set pos to
509	        3.  (This skips over a leading UTF-8 BOM, if any.)

511	   6.   Loop start: Examine s[pos].

513	        *  If it is 0x09 (ASCII tab), 0x20 (ASCII space), 0x0A (ASCII
514	           LF), or 0x0D (ASCII CR)

516	              Increase pos by 1 and repeat this step.

518	        *  If it is 0x3C (ASCII "<")

520	              Increase pos by 1 and go to the next step.

522	        *  If it is anything else

524	              The sniffed type of the resource is "text/html".  Abort
525	              these steps.

527	   7.   If the bytes with positions pos to pos+2 in s are exactly equal
528	        to 0x21, 0x2D, 0x2D respectively (ASCII for "!--"), then:

530	        1.  Increase pos by 3.

532	        2.  If the bytes with positions pos to pos+2 in s are exactly
533	            equal to 0x2D, 0x2D, 0x3E respectively (ASCII for "-->"),
534	            then increase pos by 3 and jump back to the previous step
535	            (the step labeled loop start) in the overall algorithm in
536	            this section.

538	        3.  Otherwise, increase pos by 1.

540	        4.  Return to step 2 in these substeps.

542	   8.   If s[pos] is 0x21 (ASCII "!"):

544	        1.  Increase pos by 1.

546	        2.  If s[pos] equals 0x3E, then increase pos by 1 and jump back
547	            to the step labeled loop start in the overall algorithm in
548	            this section.

550	        3.  Otherwise, return to step 1 in these substeps.

552	   9.   If s[pos] is 0x3F (ASCII "?"):

554	        1.  Increase pos by 1.

556	        2.  If s[pos] and s[pos+1] equal 0x3F and 0x3E respectively,
557	            then increase pos by 2 and jump back to the step labeled
558	            loop start in the overall algorithm in this section.

560	        3.  Otherwise, return to step 1 in these substeps.

562	   10.  Otherwise, if the bytes in s starting at pos match any of the
563	        sequences of bytes in the first column of the following table,
564	        then the user agent must follow the steps given in the
565	        corresponding cell in the second column of the same row.

567	+----------------------+-----------------------------------+-----------+
568	| Bytes in Hexadecimal | Requirement                       | Comment   |
569	+----------------------+-----------------------------------+-----------+
570	| 72 73 73             | The sniffed type of the resource  | "rss"     |
571	|                      | is "application/rss+xml"; abort   |           |
572	|                      | these steps.                      |           |
573	+----------------------+-----------------------------------+-----------+
574	| 66 65 65 64          | The sniffed type of the resource  | "feed"    |
575	|                      | is "application/atom+xml"; abort  |           |
576	|                      | these steps.                      |           |
577	+----------------------+-----------------------------------+-----------+
578	| 72 64 66 3A 52 44 46 | Continue to the next step in this | "rdf:RDF" |
579	|                      | algorithm.                        |           |
580	+----------------------+-----------------------------------+-----------+

582	        If none of the byte sequences above match the bytes in s
583	        starting at pos, then the sniffed type of the resource is "text/
584	        html".  Abort these steps.

586	   11.  ????  If, before the next ">", you find two xmlns* attributes
587	        with http://www.w3.org/1999/02/22-rdf-syntax-ns# and
588	        http://purl.org/rss/1.0/ as the namespaces, then the sniffed
589	        type of the resource is "application/rss+xml", abort these
590	        steps. (maybe we only need to check for http://purl.org/rss/1.0/
591	        actually) ????

593	   12.  Otherwise, the sniffed type of the resource is "text/html".

595	   For efficiency reasons, implementations may wish to implement this
596	   algorithm and the algorithm for detecting the character encoding of
597	   HTML documents in parallel.
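	   The algorithm in this section might be sketched as follows.  This
	   is a non-normative illustration resting on several assumptions.
	   The function name sniff_feed_or_html is hypothetical.  Reading a
	   byte that is unavailable is modeled by Python's IndexError, with
	   multi-byte comparisons stopping at the first mismatch.  After
	   matching "?>" the sketch advances past both bytes, since
	   advancing only one byte would leave pos on ">" and the loop start
	   step would then return "text/html".  The "rdf:RDF" branch
	   approximates the tentative namespace check with a substring test
	   on the rest of the start tag; that step is still marked as an
	   open question in this document.

```python
RDF_NS = b"http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS1_NS = b"http://purl.org/rss/1.0/"


def sniff_feed_or_html(data):
    s = data[:512]  # bytes past the first 512 are treated as unavailable

    def starts(pos, seq):
        # Byte-by-byte comparison; s[...] raises IndexError past the end.
        return all(s[pos + i] == b for i, b in enumerate(seq))

    try:
        # Steps 4 and 5: skip a leading UTF-8 BOM.
        pos = 3 if starts(0, b"\xef\xbb\xbf") else 0
        while True:
            # Step 6 (loop start): skip whitespace, expect "<".
            while s[pos] in b"\t \n\r":
                pos += 1
            if s[pos] != 0x3C:
                return "text/html"
            pos += 1
            if starts(pos, b"!--"):        # step 7: comment
                pos += 3
                while not starts(pos, b"-->"):
                    pos += 1
                pos += 3
            elif s[pos] == 0x21:           # step 8: "<!" ... ">"
                pos += 1
                while s[pos] != 0x3E:
                    pos += 1
                pos += 1
            elif s[pos] == 0x3F:           # step 9: "<?" ... "?>"
                pos += 1
                while not starts(pos, b"?>"):
                    pos += 1
                pos += 2                   # assumption: skip both bytes
            else:
                break
        # Step 10: look at the name of the first element.
        if starts(pos, b"rss"):
            return "application/rss+xml"
        if starts(pos, b"feed"):
            return "application/atom+xml"
        if starts(pos, b"rdf:RDF"):
            # Tentative step 11, loosely approximated (assumption).
            end = s.find(b">", pos)
            if end != -1:
                tag = s[pos:end]
                if RDF_NS in tag and RSS1_NS in tag:
                    return "application/rss+xml"
        return "text/html"
    except IndexError:
        # Step 3: a needed byte is unavailable.
        return "text/html"
```

	   Under these assumptions, a document that begins with an XML
	   declaration followed by an "rss" element is reported as
	   application/rss+xml, and anything that is not recognizably a feed
	   remains text/html.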

599	Authors' Addresses

601	   Adam Barth
602	   University of California, Berkeley

604	   Email: abarth@eecs.berkeley.edu
605	   URI:   http://www.adambarth.com/

607	   Ian Hickson
608	   Google, Inc.

610	   Email: ian@hixie.ch
611	   URI:   http://ln.hixie.ch/