| < draft-alakuijala-brotli-08.txt | draft-alakuijala-brotli-11.txt > | |||
|---|---|---|---|---|
| Network Working Group J. Alakuijala | Network Working Group J. Alakuijala | |||
| Internet-Draft Z. Szabadka | Internet-Draft Z. Szabadka | |||
| Intended Status: Informational Google, Inc | Intended Status: Informational Google, Inc | |||
| Expires: June 10, 2016 December 2015 | Expires: November 24, 2016 May 2016 | |||
| Brotli Compressed Data Format | Brotli Compressed Data Format | |||
| draft-alakuijala-brotli-08 | draft-alakuijala-brotli-11 | |||
| Abstract | Abstract | |||
| This specification defines a lossless compressed data format that | This specification defines a lossless compressed data format that | |||
| compresses data using a combination of the LZ77 algorithm and Huffman | compresses data using a combination of the LZ77 algorithm and Huffman | |||
| coding, with efficiency comparable to the best currently available | coding, with efficiency comparable to the best currently available | |||
| general-purpose compression methods. | general-purpose compression methods. | |||
| Status of this Memo | Status of this Memo | |||
| skipping to change at page 1, line 33 ¶ | skipping to change at page 1, line 33 ¶ | |||
| Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
| Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
| working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
| Drafts is at http://datatracker.ietf.org/drafts/current/. | Drafts is at http://datatracker.ietf.org/drafts/current/. | |||
| Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
| and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
| time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
| material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
| This Internet-Draft will expire on June 10, 2016. | This Internet-Draft will expire on November 24, 2016. | |||
| Copyright Notice | Copyright Notice | |||
| Copyright (c) 2015 IETF Trust and the persons identified as the | Copyright (c) 2016 IETF Trust and the persons identified as the | |||
| document authors. All rights reserved. | document authors. All rights reserved. | |||
| This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
| Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
| (http://trustee.ietf.org/license-info) in effect on the date of | (http://trustee.ietf.org/license-info) in effect on the date of | |||
| publication of this document. Please review these documents | publication of this document. Please review these documents | |||
| carefully, as they describe your rights and restrictions with respect | carefully, as they describe your rights and restrictions with respect | |||
| to this document. Code Components extracted from this document must | to this document. Code Components extracted from this document must | |||
| include Simplified BSD License text as described in Section 4.e of | include Simplified BSD License text as described in Section 4.e of | |||
| the Trust Legal Provisions and are provided without warranty as | the Trust Legal Provisions and are provided without warranty as | |||
| described in the Simplified BSD License. | described in the Simplified BSD License. | |||
| Table of Contents | Table of Contents | |||
| 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 | |||
| 1.1. Purpose . . . . . . . . . . . . . . . . . . . . . . . . . 3 | 1.1. Purpose . . . . . . . . . . . . . . . . . . . . . . . . . 3 | |||
| 1.2. Intended audience . . . . . . . . . . . . . . . . . . . . 3 | 1.2. Intended audience . . . . . . . . . . . . . . . . . . . . 3 | |||
| 1.3. Scope . . . . . . . . . . . . . . . . . . . . . . . . . . 3 | 1.3. Scope . . . . . . . . . . . . . . . . . . . . . . . . . . 4 | |||
| 1.4. Compliance . . . . . . . . . . . . . . . . . . . . . . . . 4 | 1.4. Compliance . . . . . . . . . . . . . . . . . . . . . . . . 4 | |||
| 1.5. Definitions of terms and conventions used . . . . . . . . 4 | 1.5. Definitions of terms and conventions used . . . . . . . . 4 | |||
| 1.5.1. Packing into bytes . . . . . . . . . . . . . . . . . 4 | 1.5.1. Packing into bytes . . . . . . . . . . . . . . . . . 5 | |||
| 2. Compressed representation overview . . . . . . . . . . . . . . 5 | 2. Compressed representation overview . . . . . . . . . . . . . . 6 | |||
| 3. Compressed representation of prefix codes . . . . . . . . . . . 9 | 3. Compressed representation of prefix codes . . . . . . . . . . 10 | |||
| 3.1. Introduction to prefix coding . . . . . . . . . . . . . . 9 | 3.1. Introduction to prefix coding . . . . . . . . . . . . . . 10 | |||
| 3.2. Use of prefix coding in the brotli format . . . . . . . . 10 | 3.2. Use of prefix coding in the brotli format . . . . . . . . 11 | |||
| 3.3. Alphabet sizes . . . . . . . . . . . . . . . . . . . . . 12 | 3.3. Alphabet sizes . . . . . . . . . . . . . . . . . . . . . 12 | |||
| 3.4. Simple prefix codes . . . . . . . . . . . . . . . . . . . 13 | 3.4. Simple prefix codes . . . . . . . . . . . . . . . . . . . 13 | |||
| 3.5. Complex prefix codes . . . . . . . . . . . . . . . . . . 14 | 3.5. Complex prefix codes . . . . . . . . . . . . . . . . . . 14 | |||
| 4. Encoding of distances . . . . . . . . . . . . . . . . . . . . 16 | 4. Encoding of distances . . . . . . . . . . . . . . . . . . . . 17 | |||
| 5. Encoding of literal insertion lengths and copy lengths . . . . 18 | 5. Encoding of literal insertion lengths and copy lengths . . . . 18 | |||
| 6. Encoding of block switch commands . . . . . . . . . . . . . . 20 | 6. Encoding of block switch commands . . . . . . . . . . . . . . 20 | |||
| 7. Context modeling . . . . . . . . . . . . . . . . . . . . . . . 22 | 7. Context modeling . . . . . . . . . . . . . . . . . . . . . . . 22 | |||
| 7.1. Context modes and context ID lookup for literals . . . . 22 | 7.1. Context modes and context ID lookup for literals . . . . 22 | |||
| 7.2. Context ID for distances . . . . . . . . . . . . . . . . 24 | 7.2. Context ID for distances . . . . . . . . . . . . . . . . 24 | |||
| 7.3. Encoding of the context map . . . . . . . . . . . . . . . 24 | 7.3. Encoding of the context map . . . . . . . . . . . . . . . 24 | |||
| 8. Static dictionary . . . . . . . . . . . . . . . . . . . . . . 26 | 8. Static dictionary . . . . . . . . . . . . . . . . . . . . . . 26 | |||
| 9. Compressed data format . . . . . . . . . . . . . . . . . . . . 28 | 9. Compressed data format . . . . . . . . . . . . . . . . . . . . 29 | |||
| 9.1. Format of the stream header . . . . . . . . . . . . . . . 29 | 9.1. Format of the stream header . . . . . . . . . . . . . . . 29 | |||
| 9.2. Format of the meta-block header . . . . . . . . . . . . . 29 | 9.2. Format of the meta-block header . . . . . . . . . . . . . 30 | |||
| 9.3. Format of the meta-block data . . . . . . . . . . . . . . 32 | 9.3. Format of the meta-block data . . . . . . . . . . . . . . 32 | |||
| 10. Decoding algorithm . . . . . . . . . . . . . . . . . . . . . 33 | 10. Decoding algorithm . . . . . . . . . . . . . . . . . . . . . 34 | |||
| 11. Security Considerations . . . . . . . . . . . . . . . . . . . 36 | 11. Considerations for compressor implementations . . . . . . . . 36 | |||
| 12. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 36 | 11.1. Trivial compressor . . . . . . . . . . . . . . . . . . . 36 | |||
| 13. Informative References . . . . . . . . . . . . . . . . . . . 36 | 11.2. Aligning compressed meta-blocks to byte boundaries . . . 37 | |||
| 14. Source code . . . . . . . . . . . . . . . . . . . . . . . . . 37 | 11.3. Creating self-contained parts within the compressed data 37 | |||
| 15. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 37 | 12. Security Considerations . . . . . . . . . . . . . . . . . . . 38 | |||
| Appendix A. Static dictionary data . . . . . . . . . . . . . . . 37 | 13. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 40 | |||
| Appendix B. List of word transformations . . . . . . . . . . . . 117 | 14. Informative References . . . . . . . . . . . . . . . . . . . 40 | |||
| Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 120 | 15. Source code . . . . . . . . . . . . . . . . . . . . . . . . . 40 | |||
| 16. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 41 | ||||
| Appendix A. Static dictionary data . . . . . . . . . . . . . . . 41 | ||||
| Appendix B. List of word transformations . . . . . . . . . . . . 121 | ||||
| Appendix C. Computing CRC-32 check values . . . . . . . . . . . 124 | ||||
| Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 124 | ||||
| 1. Introduction | 1. Introduction | |||
| 1.1. Purpose | 1.1. Purpose | |||
| The purpose of this specification is to define a lossless compressed | The purpose of this specification is to define a lossless compressed | |||
| data format that: | data format that: | |||
| * Is independent of CPU type, operating system, file system, | * Is independent of CPU type, operating system, file system, | |||
| and character set, and hence can be used for interchange; | and character set, and hence can be used for interchange; | |||
| skipping to change at page 3, line 30 ¶ | skipping to change at page 3, line 30 ¶ | |||
| best currently available general-purpose compression methods, | best currently available general-purpose compression methods, | |||
| and in particular, considerably better than the gzip program; | and in particular, considerably better than the gzip program; | |||
| * Decompresses much faster than current LZMA implementations. | * Decompresses much faster than current LZMA implementations. | |||
| The data format defined by this specification does not attempt to: | The data format defined by this specification does not attempt to: | |||
| * Allow random access to compressed data; | * Allow random access to compressed data; | |||
| * Compress specialized data (e.g., raster graphics) as well | * Compress specialized data (e.g., raster graphics) as well | |||
| as the best currently available specialized algorithms. | as the best currently available specialized algorithms. | |||
| This document is the authoritative specification of the brotli | ||||
| compressed data format. It defines the set of valid brotli compressed | ||||
| data streams and a decoder algorithm that produces the uncompressed | ||||
| data stream from a valid brotli compressed data stream. | ||||
| 1.2. Intended audience | 1.2. Intended audience | |||
| This specification is intended for use by software implementers to | This specification is intended for use by software implementers to | |||
| compress data into and/or decompress data from the brotli format. | compress data into and/or decompress data from the brotli format. | |||
| The text of the specification assumes a basic background in | The text of the specification assumes a basic background in | |||
| programming at the level of bits and other primitive data | programming at the level of bits and other primitive data | |||
| representations. Familiarity with the technique of Huffman coding is | representations. Familiarity with the technique of Huffman coding is | |||
| helpful but not required. | helpful but not required. | |||
| skipping to change at page 5, line 26 ¶ | skipping to change at page 5, line 38 ¶ | |||
| starting with the least-significant bit of the data | starting with the least-significant bit of the data | |||
| element. These are referred to here as integer values | element. These are referred to here as integer values | |||
| and are considered unsigned. | and are considered unsigned. | |||
| * Prefix codes are packed starting with the most-significant | * Prefix codes are packed starting with the most-significant | |||
| bit of the code. | bit of the code. | |||
| In other words, if one were to print out the compressed data as a | In other words, if one were to print out the compressed data as a | |||
| sequence of bytes, starting with the first byte at the *right* margin | sequence of bytes, starting with the first byte at the *right* margin | |||
| and proceeding to the *left*, with the most-significant bit of each | and proceeding to the *left*, with the most-significant bit of each | |||
| byte on the left as usual, one would be able to parse the result from | byte on the left as usual, one would be able to parse the result from | |||
| right to left, with fixed-width elements in the correct MSB-to-LSB | right to left, with fixed-width elements in the correct msb-to-lsb | |||
| order and prefix codes in bit-reversed order (i.e., with the first | order and prefix codes in bit-reversed order (i.e., with the first | |||
| bit of the code in the relative LSB position). | bit of the code in the relative lsb position). | |||
| As an example, consider packing the following data elements into a | ||||
| sequence of 3 bytes: 3-bit integer value 6, 4-bit integer value 2, | ||||
| prefix code 110, prefix code 10, 12-bit integer value 3628. | ||||
| byte 2 byte 1 byte 0 | ||||
| +--------+--------+--------+ | ||||
| |11100010|11000101|10010110| | ||||
| +--------+--------+--------+ | ||||
| ^ ^ ^ ^ ^ | ||||
| | | | | | | ||||
| | | | | +------ integer value 6 | ||||
| | | | +---------- integer value 2 | ||||
| | | +-------------- prefix code 110 | ||||
| | +---------------- prefix code 10 | ||||
| +----------------------------- integer value 3628 | ||||
| 2. Compressed representation overview | 2. Compressed representation overview | |||
| A compressed data set consists of a header and a series of meta- | A compressed data set consists of a header and a series of meta- | |||
| blocks. Each meta-block decompresses to a sequence of 0 to 16,777,216 | blocks. Each meta-block decompresses to a sequence of 0 to 16,777,216 | |||
| (16 MiB) uncompressed bytes. The final uncompressed data is the | (16 MiB) uncompressed bytes. The final uncompressed data is the | |||
| concatenation of the uncompressed sequences from each meta-block. | concatenation of the uncompressed sequences from each meta-block. | |||
| The header contains the size of the sliding window that was used | The header contains the size of the sliding window that was used | |||
| during compression. The decompressor must retain at least that | during compression. The decompressor must retain at least that | |||
| skipping to change at page 14, line 36 ¶ | skipping to change at page 14, line 46 ¶ | |||
| or 1, 2, 3, 3 (tree-select bit 1). | or 1, 2, 3, 3 (tree-select bit 1). | |||
| 3.5. Complex prefix codes | 3.5. Complex prefix codes | |||
| A complex prefix code is a canonical prefix code, defined by the | A complex prefix code is a canonical prefix code, defined by the | |||
| sequence of code lengths, as discussed in Section 3.2., above. For | sequence of code lengths, as discussed in Section 3.2., above. For | |||
| even greater compactness, the code length sequences themselves are | even greater compactness, the code length sequences themselves are | |||
| compressed using a prefix code. The alphabet for code lengths is as | compressed using a prefix code. The alphabet for code lengths is as | |||
| follows: | follows: | |||
| 0 - 15: Represent code lengths of 0 - 15 | 0..15: Represent code lengths of 0..15 | |||
| 16: Copy the previous non-zero code length 3 - 6 times | 16: Copy the previous non-zero code length 3..6 times | |||
| The next 2 bits indicate repeat length | The next 2 bits indicate repeat length | |||
| (0 = 3, ... , 3 = 6) | (0 = 3, ... , 3 = 6) | |||
| If this is the first code length, or all previous | If this is the first code length, or all previous | |||
| code lengths are zero, a code length of 8 is | code lengths are zero, a code length of 8 is | |||
| repeated 3 - 6 times | repeated 3..6 times | |||
| A repeated code length code of 16 modifies the | A repeated code length code of 16 modifies the | |||
| repeat count of the previous one as follows: | repeat count of the previous one as follows: | |||
| repeat count = (4 * (repeat count - 2)) + | repeat count = (4 * (repeat count - 2)) + | |||
| (3 - 6 on the next 2 bits) | (3..6 on the next 2 bits) | |||
| Example: Codes 7, 16 (+2 bits 11), 16 (+2 bits 10) | Example: Codes 7, 16 (+2 bits 11), 16 (+2 bits 10) | |||
| will expand to 22 code lengths of 7 | will expand to 22 code lengths of 7 | |||
| (1 + 4 * (6 - 2) + 5) | (1 + 4 * (6 - 2) + 5) | |||
| 17: Repeat a code length of 0 for 3 - 10 times. | 17: Repeat a code length of 0 for 3..10 times. | |||
| (3 bits of length) | (3 bits of length) | |||
| A repeated code length code of 17 modifies the | A repeated code length code of 17 modifies the | |||
| repeat count of the previous one as follows: | repeat count of the previous one as follows: | |||
| repeat count = (8 * (repeat count - 2)) + | repeat count = (8 * (repeat count - 2)) + | |||
| (3 - 10 on the next 3 bits) | (3..10 on the next 3 bits) | |||
| Note that a code of 16 that follows an immediately preceding 16 | Note that a code of 16 that follows an immediately preceding 16 | |||
| modifies the previous repeat count, which becomes the new repeat | modifies the previous repeat count, which becomes the new repeat | |||
| count. The same is true for a 17 following a 17. A sequence of three | count. The same is true for a 17 following a 17. A sequence of three | |||
| or more 16 codes in a row or three of more 17 codes in a row is | or more 16 codes in a row or three of more 17 codes in a row is | |||
| possible, modifying the count each time. Only the final repeat count | possible, modifying the count each time. Only the final repeat count | |||
| is used. The modification only applies if the same code follows. A 16 | is used. The modification only applies if the same code follows. A 16 | |||
| repeat does not modify an immediately preceding 17 count nor vice | repeat does not modify an immediately preceding 17 count nor vice | |||
| versa. | versa. | |||
| skipping to change at page 15, line 51 ¶ | skipping to change at page 16, line 14 ¶ | |||
| Code lengths for symbols in the code length alphabet given | Code lengths for symbols in the code length alphabet given | |||
| just above, in the order: 1, 2, 3, 4, 0, 5, 17, 6, 16, 7, | just above, in the order: 1, 2, 3, 4, 0, 5, 17, 6, 16, 7, | |||
| 8, 9, 10, 11, 12, 13, 14, 15. If HSKIP is 2, then the | 8, 9, 10, 11, 12, 13, 14, 15. If HSKIP is 2, then the | |||
| code lengths for symbols 1 and 2 are zero, and the first | code lengths for symbols 1 and 2 are zero, and the first | |||
| code length is for symbol 3. If HSKIP is 3, then the code | code length is for symbol 3. If HSKIP is 3, then the code | |||
| length for symbol 3 is also zero, and the first code length | length for symbol 3 is also zero, and the first code length | |||
| is for symbol 4. | is for symbol 4. | |||
| The code lengths of code length symbols are between 0 and | The code lengths of code length symbols are between 0 and | |||
| 5, and they are represented with 2 - 4 bits according to | 5, and they are represented with 2..4 bits according to | |||
| the variable length code above. A code length of 0 means | the variable length code above. A code length of 0 means | |||
| the corresponding code length symbol is not used. | the corresponding code length symbol is not used. | |||
| If HSKIP is 2 or 3, a respective number of leading code | If HSKIP is 2 or 3, a respective number of leading code | |||
| lengths are implicit zeros and are not present in the | lengths are implicit zeros and are not present in the | |||
| code lengths sequence above. | code lengths sequence above. | |||
| If there are at least two non-zero code lengths, any | If there are at least two non-zero code lengths, any | |||
| trailing zero code lengths are omitted, i.e. the last | trailing zero code lengths are omitted, i.e. the last | |||
| code length in the sequence must be non-zero. In this | code length in the sequence must be non-zero. In this | |||
| skipping to change at page 17, line 6 ¶ | skipping to change at page 17, line 16 ¶ | |||
| As described in Section 2., one component of a compressed meta-block | As described in Section 2., one component of a compressed meta-block | |||
| is a sequence of backward distances. In this section we provide the | is a sequence of backward distances. In this section we provide the | |||
| details to the encoding of distances. | details to the encoding of distances. | |||
| Each distance in the compressed data part of a meta-block is | Each distance in the compressed data part of a meta-block is | |||
| represented with a pair <distance code, extra bits>. The distance | represented with a pair <distance code, extra bits>. The distance | |||
| code and the extra bits are encoded back-to-back, the distance code | code and the extra bits are encoded back-to-back, the distance code | |||
| is encoded using a prefix code over the distance alphabet, while the | is encoded using a prefix code over the distance alphabet, while the | |||
| extra bits value is encoded as a fixed-width integer value. The | extra bits value is encoded as a fixed-width integer value. The | |||
| number of extra bits can be 0 - 24, and it is dependent on the | number of extra bits can be 0..24, and it is dependent on the | |||
| distance code. | distance code. | |||
| To convert a distance code and associated extra bits to a backward | To convert a distance code and associated extra bits to a backward | |||
| distance, we need the sequence of past distances and two additional | distance, we need the sequence of past distances and two additional | |||
| parameters, the number of "postfix bits", denoted by NPOSTFIX (0..3), | parameters, the number of "postfix bits", denoted by NPOSTFIX (0..3), | |||
| and the number of direct distance codes, denoted by NDIRECT (0..120). | and the number of direct distance codes, denoted by NDIRECT (0..120). | |||
| Both of these parameters are encoded in the meta-block header. We | Both of these parameters are encoded in the meta-block header. We | |||
| will also use the following derived parameter: | will also use the following derived parameter: | |||
| POSTFIX_MASK = (1 << NPOSTFIX) - 1 | POSTFIX_MASK = (1 << NPOSTFIX) - 1 | |||
| skipping to change at page 18, line 43 ¶ | skipping to change at page 19, line 4 ¶ | |||
| Each <insertion length, copy length> pair in the compressed data part | Each <insertion length, copy length> pair in the compressed data part | |||
| of a meta-block is represented with the following triplet: | of a meta-block is represented with the following triplet: | |||
| <insert-and-copy length code, insert extra bits, copy extra bits> | <insert-and-copy length code, insert extra bits, copy extra bits> | |||
| The insert-and-copy length code, the insert extra bits, and the copy | The insert-and-copy length code, the insert extra bits, and the copy | |||
| extra bits are encoded back-to-back, the insert-and-copy length code | extra bits are encoded back-to-back, the insert-and-copy length code | |||
| is encoded using a prefix code over the insert-and-copy length code | is encoded using a prefix code over the insert-and-copy length code | |||
| alphabet, while the extra bits values are encoded as fixed-width | alphabet, while the extra bits values are encoded as fixed-width | |||
| integer values. The number of insert and copy extra bits can be 0 - | integer values. The number of insert and copy extra bits can be | |||
| 24, and they are dependent on the insert-and-copy length code. | 0..24, and they are dependent on the insert-and-copy length code. | |||
| Some of the insert-and-copy length codes also express the fact that | Some of the insert-and-copy length codes also express the fact that | |||
| the distance symbol of the distance in the same command is 0, i.e. | the distance symbol of the distance in the same command is 0, i.e. | |||
| the distance component of the command is the same as that of the | the distance component of the command is the same as that of the | |||
| previous command. In this case, the distance code and extra bits for | previous command. In this case, the distance code and extra bits for | |||
| the distance are omitted from the compressed data stream. | the distance are omitted from the compressed data stream. | |||
| We describe the insert-and-copy length code alphabet in terms of the | We describe the insert-and-copy length code alphabet in terms of the | |||
| (not directly used) insert length code and copy length code | (not directly used) insert length code and copy length code | |||
| alphabets. The symbols of the insert length code alphabet, along with | alphabets. The symbols of the insert length code alphabet, along with | |||
| the number of insert extra bits, and the range of the insert lengths | the number of insert extra bits, and the range of the insert lengths | |||
| are as follows: | are as follows: | |||
| Extra Extra Extra | Extra Extra Extra | |||
| Code Bits Lengths Code Bits Lengths Code Bits Lengths | Code Bits Lengths Code Bits Lengths Code Bits Lengths | |||
| ---- ---- ------ ---- ---- ------- ---- ---- ------- | ---- ---- ------- ---- ---- ------- ---- ---- ------- | |||
| 0 0 0 8 2 10-13 16 6 130-193 | 0 0 0 8 2 10..13 16 6 130..193 | |||
| 1 0 1 9 2 14-17 17 7 194-321 | 1 0 1 9 2 14..17 17 7 194..321 | |||
| 2 0 2 10 3 18-25 18 8 322-577 | 2 0 2 10 3 18..25 18 8 322..577 | |||
| 3 0 3 11 3 26-33 19 9 578-1089 | 3 0 3 11 3 26..33 19 9 578..1089 | |||
| 4 0 4 12 4 34-49 20 10 1090-2113 | 4 0 4 12 4 34..49 20 10 1090..2113 | |||
| 5 0 5 13 4 50-65 21 12 2114-6209 | 5 0 5 13 4 50..65 21 12 2114..6209 | |||
| 6 1 6,7 14 5 66-97 22 14 6210-22593 | 6 1 6,7 14 5 66..97 22 14 6210..22593 | |||
| 7 1 8,9 15 5 98-129 23 24 22594-16799809 | 7 1 8,9 15 5 98..129 23 24 22594..16799809 | |||
| The symbols of the copy length code alphabet, along with the number | The symbols of the copy length code alphabet, along with the number | |||
| of copy extra bits, and the range of copy lengths are as follows: | of copy extra bits, and the range of copy lengths are as follows: | |||
| Extra Extra Extra | Extra Extra Extra | |||
| Code Bits Lengths Code Bits Lengths Code Bits Lengths | Code Bits Lengths Code Bits Lengths Code Bits Lengths | |||
| ---- ---- ------ ---- ---- ------- ---- ---- ------- | ---- ---- ------- ---- ---- ------- ---- ---- ------- | |||
| 0 0 2 8 1 10,11 16 5 70-101 | 0 0 2 8 1 10,11 16 5 70..101 | |||
| 1 0 3 9 1 12,13 17 5 102-133 | 1 0 3 9 1 12,13 17 5 102..133 | |||
| 2 0 4 10 2 14-17 18 6 134-197 | 2 0 4 10 2 14..17 18 6 134..197 | |||
| 3 0 5 11 2 18-21 19 7 198-325 | 3 0 5 11 2 18..21 19 7 198..325 | |||
| 4 0 6 12 3 22-29 20 8 326-581 | 4 0 6 12 3 22..29 20 8 326..581 | |||
| 5 0 7 13 3 30-37 21 9 582-1093 | 5 0 7 13 3 30..37 21 9 582..1093 | |||
| 6 0 8 14 4 38-53 22 10 1094-2117 | 6 0 8 14 4 38..53 22 10 1094..2117 | |||
| 7 0 9 15 4 54-69 23 24 2118-16779333 | 7 0 9 15 4 54..69 23 24 2118..16779333 | |||
| To convert an insert-and-copy length code to an insert length code | To convert an insert-and-copy length code to an insert length code | |||
| and a copy length code, the following table can be used: | and a copy length code, the following table can be used: | |||
| Insert | Insert | |||
| length Copy length code | length Copy length code | |||
| code 0-7 8-15 16-23 | code 0..7 8..15 16..23 | |||
| +---------+---------+ | +----------+----------+ | |||
| | | | | | | | | |||
| 0-7 | 0-63 | 64-127 | <--- distance symbol 0 | 0..7 | 0..63 | 64..127 | <--- distance symbol 0 | |||
| | | | | | | | | |||
| +---------+---------+---------+ | +----------+----------+----------+ | |||
| | | | | | | | | | | |||
| 0-7 | 128-191 | 192-255 | 384-447 | | 0..7 | 128..191 | 192..255 | 384..447 | | |||
| | | | | | | | | | | |||
| +---------+---------+---------+ | +----------+----------+----------+ | |||
| | | | | | | | | | | |||
| 8-15 | 256-319 | 320-383 | 512-575 | | 8..15 | 256..319 | 320..383 | 512..575 | | |||
| | | | | | | | | | | |||
| +---------+---------+---------+ | +----------+----------+----------+ | |||
| | | | | | | | | | | |||
| 16-23 | 448-511 | 576-639 | 640-703 | | 16..23 | 448..511 | 576..639 | 640..703 | | |||
| | | | | | | | | | | |||
| +---------+---------+---------+ | +----------+----------+----------+ | |||
| First, look up the cell with the 64 value range containing the | First, look up the cell with the 64 value range containing the | |||
| insert-and-copy length code, this gives the insert length code and | insert-and-copy length code, this gives the insert length code and | |||
| the copy length code ranges, both 8 values long. The copy length | the copy length code ranges, both 8 values long. The copy length | |||
| code within its range is determined by bits 0-2 (counted from the | code within its range is determined by bits 0..2 (counted from the | |||
| LSB) of the insert-and-copy length code. The insert length code | lsb) of the insert-and-copy length code. The insert length code | |||
| within its range is determined by bits 3-5 (counted from the LSB) of | within its range is determined by bits 3..5 (counted from the lsb) of | |||
| the insert-and-copy length code. Given the insert length and copy | the insert-and-copy length code. Given the insert length and copy | |||
| length codes, the actual insert and copy lengths can be obtained by | length codes, the actual insert and copy lengths can be obtained by | |||
| reading the number of extra bits given by the tables above. | reading the number of extra bits given by the tables above. | |||
| If the insert-and-copy length code is between 0 and 127, the distance | If the insert-and-copy length code is between 0 and 127, the distance | |||
| code of the command is set to zero (the last distance reused). | code of the command is set to zero (the last distance reused). | |||
| 6. Encoding of block switch commands | 6. Encoding of block switch commands | |||
| As described in Section 2., a block-switch command is a pair <block | As described in Section 2., a block-switch command is a pair <block | |||
| skipping to change at page 21, line 5 ¶ | skipping to change at page 21, line 5 ¶ | |||
| particular block category. | particular block category. | |||
| Each block type in the compressed data is represented with a block | Each block type in the compressed data is represented with a block | |||
| type code, encoded using a prefix code over the block type code | type code, encoded using a prefix code over the block type code | |||
| alphabet. A block type symbol 0 means that the new block type is the | alphabet. A block type symbol 0 means that the new block type is the | |||
| same as the type of the previous block from the same block category, | same as the type of the previous block from the same block category, | |||
| i.e. the block type that preceded the current type, while a block | i.e. the block type that preceded the current type, while a block | |||
| type symbol 1 means that the new block type equals the current block | type symbol 1 means that the new block type equals the current block | |||
| type plus one. If the current block type is the maximal possible, | type plus one. If the current block type is the maximal possible, | |||
| then a block type symbol of 1 results in wrapping to a new block type | then a block type symbol of 1 results in wrapping to a new block type | |||
| of 0. Block type symbols 2 - 257 represent block types 0 - 255 | of 0. Block type symbols 2..257 represent block types 0..255 | |||
| respectively. The previous and current block types are initialized to | respectively. The previous and current block types are initialized to | |||
| 1 and 0, respectively, at the end of the meta-block header. | 1 and 0, respectively, at the end of the meta-block header. | |||
| Since the first block type of each block category is 0, the block | Since the first block type of each block category is 0, the block | |||
| type of the first block-switch command is not encoded in the | type of the first block-switch command is not encoded in the | |||
| compressed data. If a block category has only one block type, the | compressed data. If a block category has only one block type, the | |||
| block count of the first block-switch command is also omitted from | block count of the first block-switch command is also omitted from | |||
| the compressed data, otherwise it is encoded in the meta-block | the compressed data, otherwise it is encoded in the meta-block | |||
| header. | header. | |||
| skipping to change at page 21, line 37 ¶ | skipping to change at page 21, line 37 ¶ | |||
| [0..NBLTYPESI-1], and [0..NBLTYPESD-1], respectively. From this it | [0..NBLTYPESI-1], and [0..NBLTYPESD-1], respectively. From this it | |||
| follows that the alphabet size of literal, insert-and-copy length, | follows that the alphabet size of literal, insert-and-copy length, | |||
| and distance block type codes is NBLTYPESL + 2, NBLTYPESI + 2, and | and distance block type codes is NBLTYPESL + 2, NBLTYPESI + 2, and | |||
| NBLTYPESD + 2, respectively. | NBLTYPESD + 2, respectively. | |||
| Each block count in the compressed data is represented with a pair | Each block count in the compressed data is represented with a pair | |||
| <block count code, extra bits>. The block count code and the extra | <block count code, extra bits>. The block count code and the extra | |||
| bits are encoded back-to-back, the block count code is encoded using | bits are encoded back-to-back, the block count code is encoded using | |||
| a prefix code over the block count code alphabet, while the extra | a prefix code over the block count code alphabet, while the extra | |||
| bits value is encoded as a fixed-width integer value. The number of | bits value is encoded as a fixed-width integer value. The number of | |||
| extra bits can be 0 - 24, and it is dependent on the block count | extra bits can be 0..24, and it is dependent on the block count code. | |||
| code. The symbols of the block count code alphabet, along with the | The symbols of the block count code alphabet, along with the number | |||
| number of extra bits, and the range of block counts are as follows: | of extra bits, and the range of block counts are as follows: | |||
| Extra Extra Extra | Extra Extra Extra | |||
| Code Bits Lengths Code Bits Lengths Code Bits Lengths | Code Bits Lengths Code Bits Lengths Code Bits Lengths | |||
| ---- ---- ------ ---- ---- ------- ---- ---- ------- | ---- ---- ------- ---- ---- ------- ---- ---- ------- | |||
| 0 2 1-4 9 4 65-80 18 7 369-496 | 0 2 1..4 9 4 65..80 18 7 369..496 | |||
| 1 2 5-8 10 4 81-96 19 8 497-752 | 1 2 5..8 10 4 81..96 19 8 497..752 | |||
| 2 2 9-12 11 4 97-112 20 9 753-1264 | 2 2 9..12 11 4 97..112 20 9 753..1264 | |||
| 3 2 13-16 12 5 113-144 21 10 1265-2288 | 3 2 13..16 12 5 113..144 21 10 1265..2288 | |||
| 4 3 17-24 13 5 145-176 22 11 2289-4336 | 4 3 17..24 13 5 145..176 22 11 2289..4336 | |||
| 5 3 25-32 14 5 177-208 23 12 4337-8432 | 5 3 25..32 14 5 177..208 23 12 4337..8432 | |||
| 6 3 33-40 15 5 209-240 24 13 8433-16624 | 6 3 33..40 15 5 209..240 24 13 8433..16624 | |||
| 7 3 41-48 16 6 241-304 25 24 16625-16793840 | 7 3 41..48 16 6 241..304 25 24 16625..16793840 | |||
| 8 4 49-64 17 6 305-368 | 8 4 49..64 17 6 305..368 | |||
| The first block-switch command of each block category is special in | The first block-switch command of each block category is special in | |||
| the sense that it is encoded in the meta-block header, and as | the sense that it is encoded in the meta-block header, and as | |||
| described earlier, the block type code is omitted since it is an | described earlier, the block type code is omitted since it is an | |||
| implicit zero. | implicit zero. | |||
| 7. Context modeling | 7. Context modeling | |||
| As described in Section 2., the prefix tree used to encode a literal | As described in Section 2., the prefix tree used to encode a literal | |||
| byte or a distance code depends on the block type and the context ID. | byte or a distance code depends on the block type and the context ID. | |||
| skipping to change at page 24, line 13 ¶ | skipping to change at page 24, line 13 ¶ | |||
| 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, | 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, | |||
| 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, | 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, | |||
| 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, | 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, | |||
| 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, | 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, | |||
| 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, | 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, | |||
| 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, | 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, | |||
| 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, | 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, | |||
| 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, | 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, | |||
| 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 7 | 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 7 | |||
| The lengths and the zlib CRC-32 (ITU-T Recommendation V.42) check | The lengths and the CRC-32 check values (see Appendix C.) of each of | |||
| values of each of these tables as a sequence of bytes are as follows: | these tables as a sequence of bytes are as follows: | |||
| Table Length CRC-32 | Table Length CRC-32 | |||
| ----- ------ ------ | ----- ------ ------ | |||
| Lut0 256 0x8e91efb7 | Lut0 256 0x8e91efb7 | |||
| Lut1 256 0xd01a32f4 | Lut1 256 0xd01a32f4 | |||
| Lut2 256 0x0dd7a0d6 | Lut2 256 0x0dd7a0d6 | |||
| Given p1 is the last uncompressed byte and p2 is the second-to-last | Given p1 is the last uncompressed byte and p2 is the second-to-last | |||
| uncompressed byte, the context IDs can be computed as follows: | uncompressed byte, the context IDs can be computed as follows: | |||
| skipping to change at page 25, line 42 ¶ | skipping to change at page 25, line 42 ¶ | |||
| times, read RLEMAX bits for repeat length | times, read RLEMAX bits for repeat length | |||
| RLEMAX + 1: value 1 | RLEMAX + 1: value 1 | |||
| ... | ... | |||
| RLEMAX + NTREES - 1: value NTREES - 1 | RLEMAX + NTREES - 1: value NTREES - 1 | |||
| If RLEMAX = 0, the run length coding is not used, and the symbols of | If RLEMAX = 0, the run length coding is not used, and the symbols of | |||
| the alphabet are directly the values in the context map. We can now | the alphabet are directly the values in the context map. We can now | |||
| define the format of the context map (the same format is used for | define the format of the context map (the same format is used for | |||
| literal and distance context maps): | literal and distance context maps): | |||
| 1-5 bits: RLEMAX, 0 is encoded with one 0 bit, and values | 1..5 bits: RLEMAX, 0 is encoded with one 0 bit, and values | |||
| 1 - 16 are encoded with bit pattern xxxx1 (so 01001 | 1..16 are encoded with bit pattern xxxx1 (so 01001 | |||
| is 5) | is 5) | |||
| Prefix code with alphabet size NTREES + RLEMAX | Prefix code with alphabet size NTREES + RLEMAX | |||
| Context map size values encoded with the above prefix code | Context map size values encoded with the above prefix code | |||
| and run length coding for zero values. If a run length | and run length coding for zero values. If a run length | |||
| would result in more lengths in total than the size of | would result in more lengths in total than the size of | |||
| the context map, then the stream should be rejected as | the context map, then the stream should be rejected as | |||
| invalid. | invalid. | |||
| 1 bit: IMTF bit, if set, we do an inverse move-to-front | 1 bit: IMTF bit, if set, we do an inverse move-to-front | |||
| skipping to change at page 27, line 45 ¶ | skipping to change at page 27, line 45 ¶ | |||
| greater than 120, or the length is smaller than 4 or greater than 24, | greater than 120, or the length is smaller than 4 or greater than 24, | |||
| then the compressed stream should be rejected as invalid. | then the compressed stream should be rejected as invalid. | |||
| Each word transformation has the following form: | Each word transformation has the following form: | |||
| transform_i(word) = prefix_i + T_i(word) + suffix_i | transform_i(word) = prefix_i + T_i(word) + suffix_i | |||
| where the _i subscript denotes the transform_id above. Each T_i is | where the _i subscript denotes the transform_id above. Each T_i is | |||
| one of the following 21 elementary transforms: | one of the following 21 elementary transforms: | |||
| Identity, UppercaseFirst, UppercaseAll, | Identity, FermentFirst, FermentAll, | |||
| OmitFirst1, ..., OmitFirst9, OmitLast1, ..., OmitLast9 | OmitFirst1, ..., OmitFirst9, OmitLast1, ..., OmitLast9 | |||
| The form of these elementary transforms is as follows: | The form of these elementary transforms is as follows: | |||
| Identity(word) = word | Identity(word) = word | |||
| UppercaseFirst(word) = first UTF-8 character of word upper-cased | FermentFirst(word) = see below | |||
| UppercaseAll(word) = all UTF-8 characters of word upper-cased | FermentAll(word) = see below | |||
| OmitFirstk(word) = the last (length(word) - k) bytes of word, or | OmitFirstk(word) = the last (length(word) - k) bytes of word, or | |||
| empty string if length(word) < k | empty string if length(word) < k | |||
| OmitLastk(word) = the first (length(word) - k) bytes of word, or | OmitLastk(word) = the first (length(word) - k) bytes of word, or | |||
| empty string if length(word) < k | empty string if length(word) < k | |||
| For the purposes of UppercaseAll, word is parsed into UTF-8 | We define the FermentFirst and FermentAll transforms used in this | |||
| characters and converted to upper-case by taking 1 - 3 bytes at a | specification by the following C language functions: | |||
| time, using the algorithm below: | ||||
| i = 0 | int Ferment(uint8_t* word, int word_len, int pos) { | |||
| while i < length(word): | if (word[pos] < 192) { | |||
| if word[i] < 192: | if (word[pos] >= 97 and word[pos] <= 122) { | |||
| if word[i] >= 97 and word[i] <= 122: | word[pos] = word[pos] ^ 32; | |||
| word[i] = word[i] ^ 32 | } | |||
| i = i + 1 | return 1; | |||
| else if word[i] < 224: | } else if (word[pos] < 224) { | |||
| if i + 1 < length(word): | if (pos + 1 < word_len) { | |||
| word[i + 1] = word[i + 1] ^ 32 | word[pos + 1] = word[pos + 1] ^ 32; | |||
| i = i + 2 | } | |||
| else: | return 2; | |||
| if i + 2 < length(word): | } else { | |||
| word[i + 2] = word[i + 2] ^ 5 | if (pos + 2 < word_len) { | |||
| i = i + 3 | word[pos + 2] = word[pos + 2] ^ 5; | |||
| } | ||||
| return 3; | ||||
| } | ||||
| } | ||||
| For UppercaseFirst, the same algorithm is used, but the loop is | void FermentFirst(uint8_t* word, int word_len) { | |||
| executed only once. | if (word_len > 0) { | |||
| Ferment(word, word_len, 0); | ||||
| } | ||||
| } | ||||
| void FermentAll(uint8_t* word, int word_len) { | ||||
| int i = 0; | ||||
| while (i < word_len) { | ||||
| i += Ferment(word, word_len, i); | ||||
| } | ||||
| } | ||||
| Appendix B. contains the list of transformations by specifying the | Appendix B. contains the list of transformations by specifying the | |||
| prefix, elementary transform and suffix components of each of them. | prefix, elementary transform and suffix components of each of them. | |||
| Note that the OmitFirst8 elementary transform is not used in the list | Note that the OmitFirst8 elementary transform is not used in the list | |||
| of transformations. The strings in Appendix B. are in C string format | of transformations. The strings in Appendix B. are in C string format | |||
| with respect to escape (backslash) characters. | with respect to escape (backslash) characters. | |||
| The maximum number of additional bytes that a transform may add to a | The maximum number of additional bytes that a transform may add to a | |||
| base word is 13. Since the largest base word is 24 bytes long, a | base word is 13. Since the largest base word is 24 bytes long, a | |||
| buffer of 38 bytes is sufficient to store any transformed words | buffer of 38 bytes is sufficient to store any transformed words | |||
| skipping to change at page 29, line 9 ¶ | skipping to change at page 29, line 22 ¶ | |||
| 9. Compressed data format | 9. Compressed data format | |||
| In this section we describe the format of the compressed data set in | In this section we describe the format of the compressed data set in | |||
| terms of the format of the individual data items described in the | terms of the format of the individual data items described in the | |||
| previous sections. | previous sections. | |||
| 9.1. Format of the stream header | 9.1. Format of the stream header | |||
| The stream header has only the following one field: | The stream header has only the following one field: | |||
| 1-7 bits: WBITS, a value in the range 10 - 24, encoded with | 1..7 bits: WBITS, a value in the range 10..24, encoded with | |||
| the following variable length code (as it appears in | the following variable length code (as it appears in | |||
| the compressed data, where the bits are parsed from | the compressed data, where the bits are parsed from | |||
| right to left): | right to left): | |||
| Value Bit Pattern | Value Bit Pattern | |||
| ----- ----------- | ----- ----------- | |||
| 10 0100001 | 10 0100001 | |||
| 11 0110001 | 11 0110001 | |||
| 12 1000001 | 12 1000001 | |||
| 13 1010001 | 13 1010001 | |||
| 14 1100001 | 14 1100001 | |||
| 15 1110001 | 15 1110001 | |||
| 16 0 | 16 0 | |||
| skipping to change at page 30, line 34 ¶ | skipping to change at page 30, line 49 ¶ | |||
| length | length | |||
| MSKIPBYTES x 8 bits: MSKIPLEN - 1, where MSKIPLEN is | MSKIPBYTES x 8 bits: MSKIPLEN - 1, where MSKIPLEN is | |||
| the number of metadata bytes; this field is | the number of metadata bytes; this field is | |||
| only present if MSKIPBYTES is positive, | only present if MSKIPBYTES is positive, | |||
| otherwise MSKIPLEN is 0 (if MSKIPBYTES is | otherwise MSKIPLEN is 0 (if MSKIPBYTES is | |||
| greater than 1, and the last byte is all | greater than 1, and the last byte is all | |||
| zeros, then the stream should be rejected | zeros, then the stream should be rejected | |||
| as invalid) | as invalid) | |||
| 0 - 7 bits: fill bits until the next byte boundary, | 0..7 bits: fill bits until the next byte boundary, | |||
| must be all zeros | must be all zeros | |||
| MSKIPLEN bytes of metadata, not part of the | MSKIPLEN bytes of metadata, not part of the | |||
| uncompressed data or the sliding window | uncompressed data or the sliding window | |||
| MNIBBLES x 4 bits: MLEN - 1, where MLEN is the length | MNIBBLES x 4 bits: MLEN - 1, where MLEN is the length | |||
| of the meta-block uncompressed data in bytes (if | of the meta-block uncompressed data in bytes (if | |||
| MNIBBLES is greater than 4, and the last | MNIBBLES is greater than 4, and the last | |||
| nibble is all zeros, then the stream should be | nibble is all zeros, then the stream should be | |||
| rejected as invalid) | rejected as invalid) | |||
| 1 bit: ISUNCOMPRESSED, if set to 1, any bits of compressed | 1 bit: ISUNCOMPRESSED, if set to 1, any bits of compressed | |||
| data up to the next byte boundary are ignored, and | data up to the next byte boundary are ignored, and | |||
| the rest of the meta-block contains MLEN bytes of | the rest of the meta-block contains MLEN bytes of | |||
| literal data; this field is only present if the | literal data; this field is only present if the | |||
| ISLAST bit is not set (if the ignored bits are not | ISLAST bit is not set (if the ignored bits are not | |||
| all zeros, the stream should be rejected as invalid) | all zeros, the stream should be rejected as invalid) | |||
| 1-11 bits: NBLTYPESL, # of literal block types, encoded with | 1..11 bits: NBLTYPESL, # of literal block types, encoded with | |||
| the following variable length code (as it appears in | the following variable length code (as it appears in | |||
| the compressed data, where the bits are parsed from | the compressed data, where the bits are parsed from | |||
| right to left, so 0110111 has the value 12): | right to left, so 0110111 has the value 12): | |||
| Value Bit Pattern | Value Bit Pattern | |||
| ----- ----------- | ----- ----------- | |||
| 1 0 | 1 0 | |||
| 2 0001 | 2 0001 | |||
| 3-4 x0011 | 3..4 x0011 | |||
| 5-8 xx0101 | 5..8 xx0101 | |||
| 9-16 xxx0111 | 9..16 xxx0111 | |||
| 17-32 xxxx1001 | 17..32 xxxx1001 | |||
| 33-64 xxxxx1011 | 33..64 xxxxx1011 | |||
| 65-128 xxxxxx1101 | 65..128 xxxxxx1101 | |||
| 129-256 xxxxxxx1111 | 129..256 xxxxxxx1111 | |||
| Prefix code over the block type code alphabet for | Prefix code over the block type code alphabet for | |||
| literal block types, appears only if NBLTYPESL >= 2 | literal block types, appears only if NBLTYPESL >= 2 | |||
| Prefix code over the block count code alphabet for | Prefix code over the block count code alphabet for | |||
| literal block counts, appears only if NBLTYPESL >= 2 | literal block counts, appears only if NBLTYPESL >= 2 | |||
| Block count code + extra bits for first literal | Block count code + extra bits for first literal | |||
| block count, appears only if NBLTYPESL >= 2 | block count, appears only if NBLTYPESL >= 2 | |||
| 1-11 bits: NBLTYPESI, # of insert-and-copy block types, encoded | 1..11 bits: NBLTYPESI, # of insert-and-copy block types, encoded | |||
| with the same variable length code as above | with the same variable length code as above | |||
| Prefix code over the block type code alphabet for | Prefix code over the block type code alphabet for | |||
| insert-and-copy block types, appears only if NBLTYPESI >= 2 | insert-and-copy block types, appears only if NBLTYPESI >= 2 | |||
| Prefix code over the block count code alphabet for | Prefix code over the block count code alphabet for | |||
| insert-and-copy block counts, appears only if NBLTYPESI >= 2 | insert-and-copy block counts, appears only if NBLTYPESI >= 2 | |||
| Block count code + extra bits for first insert-and-copy | Block count code + extra bits for first insert-and-copy | |||
| block count, appears only if NBLTYPESI >= 2 | block count, appears only if NBLTYPESI >= 2 | |||
| 1-11 bits: NBLTYPESD, # of distance block types, encoded | 1..11 bits: NBLTYPESD, # of distance block types, encoded | |||
| with the same variable length code as above | with the same variable length code as above | |||
| Prefix code over the block type code alphabet for | Prefix code over the block type code alphabet for | |||
| distance block types, appears only if NBLTYPESD >= 2 | distance block types, appears only if NBLTYPESD >= 2 | |||
| Prefix code over the block count code alphabet for | Prefix code over the block count code alphabet for | |||
| distance block counts, appears only if NBLTYPESD >= 2 | distance block counts, appears only if NBLTYPESD >= 2 | |||
| Block count code + extra bits for first distance | Block count code + extra bits for first distance | |||
| block count, appears only if NBLTYPESD >= 2 | block count, appears only if NBLTYPESD >= 2 | |||
| 2 bits: NPOSTFIX, parameter used in the distance coding | 2 bits: NPOSTFIX, parameter used in the distance coding | |||
| 4 bits: four most-significant bits of NDIRECT, to get the | 4 bits: four most-significant bits of NDIRECT, to get the | |||
| actual value of the parameter NDIRECT, left-shift | actual value of the parameter NDIRECT, left-shift | |||
| this four-bit number by NPOSTFIX bits | this four-bit number by NPOSTFIX bits | |||
| NBLTYPESL x 2 bits: context mode for each literal block type | NBLTYPESL x 2 bits: context mode for each literal block type | |||
| 1-11 bits: NTREESL, # of literal prefix trees, encoded | 1..11 bits: NTREESL, # of literal prefix trees, encoded | |||
| with the same variable length code as NBLTYPESL | with the same variable length code as NBLTYPESL | |||
| Literal context map, encoded as described in Section 7.3., | Literal context map, encoded as described in Section 7.3., | |||
| appears only if NTREESL >= 2, otherwise the context map | appears only if NTREESL >= 2, otherwise the context map | |||
| has only zero values | has only zero values | |||
| 1-11 bits: NTREESD, # of distance prefix trees, encoded | 1..11 bits: NTREESD, # of distance prefix trees, encoded | |||
| with the same variable length code as NBLTYPESD | with the same variable length code as NBLTYPESD | |||
| Distance context map, encoded as described in Section 7.3., | Distance context map, encoded as described in Section 7.3., | |||
| appears only if NTREESD >= 2, otherwise the context map | appears only if NTREESD >= 2, otherwise the context map | |||
| has only zero values | has only zero values | |||
| NTREESL prefix codes for literals | NTREESL prefix codes for literals | |||
| NBLTYPESI prefix codes for insert-and-copy lengths | NBLTYPESI prefix codes for insert-and-copy lengths | |||
| NTREESD prefix codes for distances | NTREESD prefix codes for distances | |||
| skipping to change at page 36, line 15 ¶ | skipping to change at page 36, line 30 ¶ | |||
| Note that a duplicated string reference may refer to a string in a | Note that a duplicated string reference may refer to a string in a | |||
| previous meta-block, i.e. the backward distance may cross one or more | previous meta-block, i.e. the backward distance may cross one or more | |||
| meta-block boundaries. However a backward copy distance will not | meta-block boundaries. However a backward copy distance will not | |||
| refer past the beginning of the uncompressed stream or the window | refer past the beginning of the uncompressed stream or the window | |||
| size; any such distance is interpreted as a reference to a static | size; any such distance is interpreted as a reference to a static | |||
| dictionary word. Also note that the referenced string may overlap the | dictionary word. Also note that the referenced string may overlap the | |||
| current position, for example, if the last 2 bytes decoded have | current position, for example, if the last 2 bytes decoded have | |||
| values X and Y, a string reference with <length = 5, distance = 2> | values X and Y, a string reference with <length = 5, distance = 2> | |||
| adds X,Y,X,Y,X to the uncompressed stream. | adds X,Y,X,Y,X to the uncompressed stream. | |||
| 11. Security Considerations | 11. Considerations for compressor implementations | |||
| Since the intent of this document is to define the brotli compressed | ||||
| data format without reference to any particular compression | ||||
| algorithm, the material in this section is not part of the definition | ||||
| of the format, and a compressor need not follow it in order to be | ||||
| compliant. | ||||
| 11.1. Trivial compressor | ||||
| In this section we present a very simple algorithm that produces a | ||||
| valid brotli stream representing an arbitrary sequence of | ||||
| uncompressed bytes in the form of the following C++ language | ||||
| function. | ||||
| string BrotliCompressTrivial(const string& u) { | ||||
| if (u.empty()) { | ||||
| return string(1, 6); | ||||
| } | ||||
| int i; | ||||
| string c; | ||||
| c.append(1, 12); | ||||
| for (i = 0; i + 65535 < u.size(); i += 65536) { | ||||
| c.append(1, 248); | ||||
| c.append(1, 255); | ||||
| c.append(1, 15); | ||||
| c.append(&u[i], 65536); | ||||
| } | ||||
| if (i < u.size()) { | ||||
| int r = u.size() - i - 1; | ||||
| c.append(1, (r & 31) << 3); | ||||
| c.append(1, r >> 5); | ||||
| c.append(1, 8 + (r >> 13)); | ||||
| c.append(&u[i], r + 1); | ||||
| } | ||||
| c.append(1, 3); | ||||
| return c; | ||||
| } | ||||
| Note that this simple algorithm does not actually compress data, that | ||||
| is, the brotli representation will always be bigger than the | ||||
| original, but it shows that every sequence of N uncompressed bytes | ||||
| can be represented with a valid brotli stream that is not longer than | ||||
| N + (3 * (N >> 16) + 5) bytes. | ||||
| 11.2. Aligning compressed meta-blocks to byte boundaries | ||||
| As described in Section 9., only those meta-blocks that immediately | ||||
| follow an uncompressed meta-block or a metadata meta-block are | ||||
| guaranteed to start on a byte boundary. In some applications, it | ||||
| might be required that every non-metadata meta-block starts on a byte | ||||
| boundary. This can be achieved by appending an empty metadata meta- | ||||
| block after every non-metadata meta-block that does not end on a byte | ||||
| boundary. | ||||
| 11.3. Creating self-contained parts within the compressed data | ||||
| In some encoder implementations it might be required to make a | ||||
| sequence of bytes within a brotli stream self-contained, that is, | ||||
| such that they can be decompressed independently from previous parts | ||||
| of the compressed data. This is a useful feature for three reasons. | ||||
| First, if a large compressed file is damaged, it is possible to | ||||
| recover some of the file after the damage. Second, it is useful when | ||||
| doing differential transfer of compressed data. If a sequence of | ||||
| uncompressed bytes is unchanged and compressed independently from | ||||
| previous data, then the compressed representation may also be | ||||
| unchanged and can therefore be transferred very cheaply. Third, if | ||||
| sequences of uncompressed bytes are compressed independently, it | ||||
| allows for parallel compression of these byte sequences within the | ||||
| same file, in addition to parallel compression of multiple files. | ||||
| Given two sequences of uncompressed bytes, U0 and U1, we will now | ||||
| describe how to create two sequences of compressed bytes, C0 and C1, | ||||
| such that the concatenation of C0 and C1 is a valid brotli stream, | ||||
| and that C0 and C1 (together with the first byte of C0 that contains | ||||
| the window size) can be decompressed independently from each other to | ||||
| U0 and U1. | ||||
| When compressing the byte sequence U0 to produce C0, we can use any | ||||
| compressor that works on the complete set of uncompressed bytes U0, | ||||
| with the following two changes. First, the ISLAST bit of the last | ||||
| meta-block of C0 must not be set. Second, C0 must end at a byte- | ||||
| boundary, which can be ensured by appending an empty metadata meta- | ||||
| block to it, as in Section 11.2. | ||||
| When compressing the byte sequence U1 to produce C1, we can use any | ||||
| compressor that starts a new meta-block at the beginning of U1 within | ||||
| the U0+U1 input stream, with the following two changes. First, | ||||
| backward distances in C1 must not refer to static dictionary words or | ||||
| uncompressed bytes in U0. Even if a sequence of bytes in U1 would | ||||
| match a static dictionary word, or a sequence of bytes that overlaps | ||||
| U0, the compressor must represent this sequence of bytes with a | ||||
| combination of literal insertions and backward references to bytes in | ||||
| U1 instead. Second, the ring buffer of last four distances must be | ||||
| replenished first with distances in C1 before using it to encode | ||||
| other distances in C1. Note that both compressors producing C0 and C1 | ||||
| have to use the same window size, but the stream header is emitted | ||||
| only by the compressor that produces C0. | ||||
| Note that this method can be easily generalized to more than two | ||||
| sequences of uncompressed bytes. | ||||
| 12. Security Considerations | ||||
| As with any compressed file formats, decompressor implementations | As with any compressed file formats, decompressor implementations | |||
| should handle all compressed data byte sequences, not only those that | should handle all compressed data byte sequences, not only those that | |||
| conform to this specification, where non-conformant compressed data | conform to this specification, where non-conformant compressed data | |||
| sequences should be discarded. A possible attack against a system | sequences should be rejected as invalid. | |||
| containing a decompressor implementation (e.g. a web browser) is to | ||||
| exploit a buffer overflow caused by an invalid compressed data. | ||||
| Therefore decompressor implementations should perform bounds-checking | ||||
| for each memory access that result from values decoded from the | ||||
| compressed stream. | ||||
| 12. IANA Considerations | A possible attack against a system containing a decompressor | |||
| implementation (e.g. a web browser) is to exploit a buffer overflow | ||||
| triggered by invalid compressed data. Therefore decompressor | ||||
| implementations should perform bounds-checking for each memory access | ||||
| that result from values decoded from the compressed stream and | ||||
| derivatives therof. | ||||
| Another possible attack against a system containing a decompressor | ||||
| implementation is to provide it (either valid or invalid) compressed | ||||
| data that can make the decompressor system's resource consumption | ||||
| (cpu, memory, or storage) to be disproportionately large compared to | ||||
| the size of the compressed data. In addition to the size of the | ||||
| compressed data, the amount of cpu, memory and storage required to | ||||
| decompress a single compressed meta-block within a brotli stream is | ||||
| controlled by the following two paramters: the size of the | ||||
| uncompressed meta-block, which is encoded at the start of the | ||||
| compressed meta-block, and the size of the sliding window, which is | ||||
| encoded at the start of the brotli stream. Decompressor | ||||
| implementations in systems where memory or storage is constrained | ||||
| should perform a sanity-check on these two parameters. The | ||||
| uncompressed meta-block size that was decoded from the compressed | ||||
| stream should be compared against either a hard limit, given by the | ||||
| system's constraints or some expectation about the uncompressed data, | ||||
| or against a certain multiple of the size of the compressed data. If | ||||
| the uncompressed meta-block size is determined to be too high, the | ||||
| compressed data should be rejected. Likewise, when the complete | ||||
| uncompressed stream is kept in the system containing the decompressor | ||||
| implementation, the total uncompressed size of the stream should be | ||||
| checked before decompressing each additional meta-block. If the size | ||||
| of the sliding window that was decoded from the start of the | ||||
| compressed stream is greater than a certain soft limit, then the | ||||
| decompressor implementation should, at first, allocate a smaller | ||||
| sliding window that fits the first uncompressed meta-block, and | ||||
| afterwards, before decompressing each additional meta-block, it | ||||
| should increase the size of the sliding window until the sliding | ||||
| window size specified in the compressed data is reached. | ||||
| Correspondingly, possible attacks against a system containing a | ||||
| compressor implementation (e.g. a web server) are to exploit a buffer | ||||
| overflow or cause disproportionately large resource consumption by | ||||
| providing e.g. uncompressible data. As described in Section 11.1., | ||||
| an output buffer of | ||||
| S(N) = N + (3 * (N >> 16) + 5) | ||||
| bytes is sufficient to hold a valid compressed brotli stream | ||||
| representing an arbitrary sequence of N uncompressed bytes. | ||||
| Therefore compressor implementations should allocate at least S(N) | ||||
| bytes of output buffer before compressing N bytes of data with | ||||
| unknown compressibility and should perform bounds-checking for each | ||||
| write into this output buffer. If their output buffer is full, | ||||
| compresor implementations should revert to the trivial compression | ||||
| algorithm described in Section 11.1. The resourse consumption of a | ||||
| compressor implementation for a particular input data depends mostly | ||||
| on the algorithm used to find backward matches and on the algorithm | ||||
| used to construct context maps and prefix codes and only to a lesser | ||||
| extent on the input data itself. If the system containing a | ||||
| compressor implementation is overloaded, a possible way to reduce | ||||
| resource usage is to switch to more simple algorithms for backward | ||||
| reference search and prefix code construction, or to fall back to the | ||||
| trivial compression algorithm described in Section 11.1. | ||||
| A possible attack against a system that sends compressed data over an | ||||
| encrypted channel is the following. An attacker who can repeatedly | ||||
| mix arbitrary (attacker-supplied) data with secret data (passwords, | ||||
| cookies) and observe the length of the ciphertext can potentially | ||||
| reconstruct the secret data. To protect against this kind of attack, | ||||
| applications should not mix sensitive data with non-sensitive, | ||||
| potentially attacker-supplied data in the same compressed stream. | ||||
| 13. IANA Considerations | ||||
| The "HTTP Content Coding Registry" has been updated with the | The "HTTP Content Coding Registry" has been updated with the | |||
| registration below: | registration below: | |||
| +-------+-------------------------------------+------------+ | +-------+-------------------------------------+------------+ | |||
| | Name | Description | Reference | | | Name | Description | Reference | | |||
| +-------+-------------------------------------+------------+ | +-------+-------------------------------------+------------+ | |||
| | br | Brotli Compressed Data Format | RFCXXXX | | | br | Brotli Compressed Data Format | RFCXXXX | | |||
| +-------+-------------------------------------+------------+ | +-------+-------------------------------------+------------+ | |||
| 13. Informative References | 14. Informative References | |||
| [HUFFMAN] Huffman, D. A., "A Method for the Construction of Minimum | [HUFFMAN] Huffman, D. A., "A Method for the Construction of Minimum | |||
| Redundancy Codes", Proceedings of the Institute of Radio | Redundancy Codes", Proceedings of the Institute of Radio | |||
| Engineers, September 1952, Volume 40, Number 9, pp. | Engineers, September 1952, Volume 40, Number 9, pp. | |||
| 1098-1101. | 1098-1101. | |||
| [LZ77] Ziv J., Lempel A., "A Universal Algorithm for Sequential | [LZ77] Ziv J., Lempel A., "A Universal Algorithm for Sequential | |||
| Data Compression", IEEE Transactions on Information | Data Compression", IEEE Transactions on Information | |||
| Theory, Vol. 23, No. 3, pp. 337-343. | Theory, Vol. 23, No. 3, pp. 337-343. | |||
| [RFC1951] Deutsch, P., "DEFLATE Compressed Data Format Specification | [RFC1951] Deutsch, P., "DEFLATE Compressed Data Format Specification | |||
| version 1.3", RFC 1951, Aladdin Enterprises, May 1996. | version 1.3", RFC 1951, Aladdin Enterprises, May 1996. | |||
| http://www.ietf.org/rfc/rfc1951.txt | http://www.ietf.org/rfc/rfc1951.txt | |||
| [WOFF2] Levantovsky, V. (ed.), Levien, R. (ed.), "WOFF File Format | [WOFF2] Levantovsky, V. (ed.), Levien, R. (ed.), "WOFF File Format | |||
| 2.0", W3C WebFonts Working Group, | 2.0", W3C WebFonts Working Group, | |||
| http://www.w3.org/TR/WOFF2/ | http://www.w3.org/TR/WOFF2/ | |||
| 14. Source code | 15. Source code | |||
| Source code for a C language implementation of a brotli compliant | Source code for a C language implementation of a brotli compliant | |||
| decompressor and a C++ language implementation of a compressor is | decompressor and a C++ language implementation of a compressor is | |||
| available in the brotli open-source project: | available in the brotli open-source project: | |||
| https://github.com/google/brotli | https://github.com/google/brotli | |||
| 15. Acknowledgments | 16. Acknowledgments | |||
| The authors would like to thank Mark Adler, Robert Obryk, Thomas | The authors would like to thank Mark Adler, Robert Obryk, Thomas | |||
| Pickert, and Joe Tsai for providing helpful review comments, | Pickert, and Joe Tsai for providing helpful review comments, | |||
| validating the specification by writing an independent decompressor, | validating the specification by writing an independent decompressor, | |||
| and suggesting improvements to the format and the text of the | and suggesting improvements to the format and the text of the | |||
| specification. | specification. | |||
| Appendix A. Static dictionary data | Appendix A. Static dictionary data | |||
| The hexadecimal form of the DICT array is the following, where the | The hexadecimal form of the DICT array is the following, where the | |||
| length is 122,784 bytes and the zlib CRC-32 of the byte sequence is | length is 122,784 bytes and the CRC-32 of the byte sequence is | |||
| 0x5136cb04. | 0x5136cb04. | |||
| 74696d65646f776e6c6966656c6566746261636b636f64656461746173686f77 | 74696d65646f776e6c6966656c6566746261636b636f64656461746173686f77 | |||
| 6f6e6c7973697465636974796f70656e6a7573746c696b6566726565776f726b | 6f6e6c7973697465636974796f70656e6a7573746c696b6566726565776f726b | |||
| 74657874796561726f766572626f64796c6f7665666f726d626f6f6b706c6179 | 74657874796561726f766572626f64796c6f7665666f726d626f6f6b706c6179 | |||
| 6c6976656c696e6568656c70686f6d65736964656d6f7265776f72646c6f6e67 | 6c6976656c696e6568656c70686f6d65736964656d6f7265776f72646c6f6e67 | |||
| 7468656d7669657766696e64706167656461797366756c6c686561647465726d | 7468656d7669657766696e64706167656461797366756c6c686561647465726d | |||
| 656163686172656166726f6d747275656d61726b61626c6575706f6e68696768 | 656163686172656166726f6d747275656d61726b61626c6575706f6e68696768 | |||
| 646174656c616e646e6577736576656e6e65787463617365626f7468706f7374 | 646174656c616e646e6577736576656e6e65787463617365626f7468706f7374 | |||
| 757365646d61646568616e6468657265776861746e616d654c696e6b626c6f67 | 757365646d61646568616e6468657265776861746e616d654c696e6b626c6f67 | |||
| skipping to change at page 117, line 44 ¶ | skipping to change at page 121, line 33 ¶ | |||
| Appendix B. List of word transformations | Appendix B. List of word transformations | |||
| The string literals are in C format, with respect to the use of | The string literals are in C format, with respect to the use of | |||
| backslash escape characters. | backslash escape characters. | |||
| In order to generate a length and check value, the transforms can be | In order to generate a length and check value, the transforms can be | |||
| converted to a series of bytes, where each transform is the prefix | converted to a series of bytes, where each transform is the prefix | |||
| sequence of bytes plus a terminating zero byte, a single byte value | sequence of bytes plus a terminating zero byte, a single byte value | |||
| identifying the transform, and the suffix sequence of bytes plus a | identifying the transform, and the suffix sequence of bytes plus a | |||
| terminating zero. The value for the transforms are 0 for Identity, 1 for | terminating zero. The value for the transforms are 0 for Identity, 1 for | |||
| UppercaseFirst, 2 for UppercaseAll, 3 to 11 for OmitFirst1 to | FermentFirst, 2 for FermentAll, 3 to 11 for OmitFirst1 to OmitFirst9, | |||
| OmitFirst9, and 12 to 20 for OmitLast1 to OmitLast9. The byte sequences | and 12 to 20 for OmitLast1 to OmitLast9. The byte sequences that | |||
| that represent the 121 transforms are then concatenated to a single | represent the 121 transforms are then concatenated to a single sequence | |||
| sequence of bytes. The length of that sequence is 648 bytes, and the | of bytes. The length of that sequence is 648 bytes, and the CRC-32 is | |||
| zlib CRC is 0x3d965f81. | 0x3d965f81. | |||
| ID Prefix Transform Suffix | ID Prefix Transform Suffix | |||
| -- ------ --------- ------ | -- ------ --------- ------ | |||
| 0 "" Identity "" | 0 "" Identity "" | |||
| 1 "" Identity " " | 1 "" Identity " " | |||
| 2 " " Identity " " | 2 " " Identity " " | |||
| 3 "" OmitFirst1 "" | 3 "" OmitFirst1 "" | |||
| 4 "" UppercaseFirst " " | 4 "" FermentFirst " " | |||
| 5 "" Identity " the " | 5 "" Identity " the " | |||
| 6 " " Identity "" | 6 " " Identity "" | |||
| 7 "s " Identity " " | 7 "s " Identity " " | |||
| 8 "" Identity " of " | 8 "" Identity " of " | |||
| 9 "" UppercaseFirst "" | 9 "" FermentFirst "" | |||
| 10 "" Identity " and " | 10 "" Identity " and " | |||
| 11 "" OmitFirst2 "" | 11 "" OmitFirst2 "" | |||
| 12 "" OmitLast1 "" | 12 "" OmitLast1 "" | |||
| 13 ", " Identity " " | 13 ", " Identity " " | |||
| 14 "" Identity ", " | 14 "" Identity ", " | |||
| 15 " " UppercaseFirst " " | 15 " " FermentFirst " " | |||
| 16 "" Identity " in " | 16 "" Identity " in " | |||
| 17 "" Identity " to " | 17 "" Identity " to " | |||
| 18 "e " Identity " " | 18 "e " Identity " " | |||
| 19 "" Identity "\"" | 19 "" Identity "\"" | |||
| 20 "" Identity "." | 20 "" Identity "." | |||
| 21 "" Identity "\">" | 21 "" Identity "\">" | |||
| 22 "" Identity "\n" | 22 "" Identity "\n" | |||
| 23 "" OmitLast3 "" | 23 "" OmitLast3 "" | |||
| 24 "" Identity "]" | 24 "" Identity "]" | |||
| 25 "" Identity " for " | 25 "" Identity " for " | |||
| 26 "" OmitFirst3 "" | 26 "" OmitFirst3 "" | |||
| 27 "" OmitLast2 "" | 27 "" OmitLast2 "" | |||
| 28 "" Identity " a " | 28 "" Identity " a " | |||
| 29 "" Identity " that " | 29 "" Identity " that " | |||
| 30 " " UppercaseFirst "" | 30 " " FermentFirst "" | |||
| 31 "" Identity ". " | 31 "" Identity ". " | |||
| 32 "." Identity "" | 32 "." Identity "" | |||
| 33 " " Identity ", " | 33 " " Identity ", " | |||
| 34 "" OmitFirst4 "" | 34 "" OmitFirst4 "" | |||
| 35 "" Identity " with " | 35 "" Identity " with " | |||
| 36 "" Identity "'" | 36 "" Identity "'" | |||
| 37 "" Identity " from " | 37 "" Identity " from " | |||
| 38 "" Identity " by " | 38 "" Identity " by " | |||
| 39 "" OmitFirst5 "" | 39 "" OmitFirst5 "" | |||
| 40 "" OmitFirst6 "" | 40 "" OmitFirst6 "" | |||
| 41 " the " Identity "" | 41 " the " Identity "" | |||
| 42 "" OmitLast4 "" | 42 "" OmitLast4 "" | |||
| 43 "" Identity ". The " | 43 "" Identity ". The " | |||
| 44 "" UppercaseAll "" | 44 "" FermentAll "" | |||
| 45 "" Identity " on " | 45 "" Identity " on " | |||
| 46 "" Identity " as " | 46 "" Identity " as " | |||
| 47 "" Identity " is " | 47 "" Identity " is " | |||
| 48 "" OmitLast7 "" | 48 "" OmitLast7 "" | |||
| 49 "" OmitLast1 "ing " | 49 "" OmitLast1 "ing " | |||
| 50 "" Identity "\n\t" | 50 "" Identity "\n\t" | |||
| 51 "" Identity ":" | 51 "" Identity ":" | |||
| 52 " " Identity ". " | 52 " " Identity ". " | |||
| 53 "" Identity "ed " | 53 "" Identity "ed " | |||
| 54 "" OmitFirst9 "" | 54 "" OmitFirst9 "" | |||
| 55 "" OmitFirst7 "" | 55 "" OmitFirst7 "" | |||
| 56 "" OmitLast6 "" | 56 "" OmitLast6 "" | |||
| 57 "" Identity "(" | 57 "" Identity "(" | |||
| 58 "" UppercaseFirst ", " | 58 "" FermentFirst ", " | |||
| 59 "" OmitLast8 "" | 59 "" OmitLast8 "" | |||
| 60 "" Identity " at " | 60 "" Identity " at " | |||
| 61 "" Identity "ly " | 61 "" Identity "ly " | |||
| 62 " the " Identity " of " | 62 " the " Identity " of " | |||
| 63 "" OmitLast5 "" | 63 "" OmitLast5 "" | |||
| 64 "" OmitLast9 "" | 64 "" OmitLast9 "" | |||
| 65 " " UppercaseFirst ", " | 65 " " FermentFirst ", " | |||
| 66 "" UppercaseFirst "\"" | 66 "" FermentFirst "\"" | |||
| 67 "." Identity "(" | 67 "." Identity "(" | |||
| 68 "" UppercaseAll " " | 68 "" FermentAll " " | |||
| 69 "" UppercaseFirst "\">" | 69 "" FermentFirst "\">" | |||
| 70 "" Identity "=\"" | 70 "" Identity "=\"" | |||
| 71 " " Identity "." | 71 " " Identity "." | |||
| 72 ".com/" Identity "" | 72 ".com/" Identity "" | |||
| 73 " the " Identity " of the " | 73 " the " Identity " of the " | |||
| 74 "" UppercaseFirst "'" | 74 "" FermentFirst "'" | |||
| 75 "" Identity ". This " | 75 "" Identity ". This " | |||
| 76 "" Identity "," | 76 "" Identity "," | |||
| 77 "." Identity " " | 77 "." Identity " " | |||
| 78 "" UppercaseFirst "(" | 78 "" FermentFirst "(" | |||
| 79 "" UppercaseFirst "." | 79 "" FermentFirst "." | |||
| 80 "" Identity " not " | 80 "" Identity " not " | |||
| 81 " " Identity "=\"" | 81 " " Identity "=\"" | |||
| 82 "" Identity "er " | 82 "" Identity "er " | |||
| 83 " " UppercaseAll " " | 83 " " FermentAll " " | |||
| 84 "" Identity "al " | 84 "" Identity "al " | |||
| 85 " " UppercaseAll "" | 85 " " FermentAll "" | |||
| 86 "" Identity "='" | 86 "" Identity "='" | |||
| 87 "" UppercaseAll "\"" | 87 "" FermentAll "\"" | |||
| 88 "" UppercaseFirst ". " | 88 "" FermentFirst ". " | |||
| 89 " " Identity "(" | 89 " " Identity "(" | |||
| 90 "" Identity "ful " | 90 "" Identity "ful " | |||
| 91 " " UppercaseFirst ". " | 91 " " FermentFirst ". " | |||
| 92 "" Identity "ive " | 92 "" Identity "ive " | |||
| 93 "" Identity "less " | 93 "" Identity "less " | |||
| 94 "" UppercaseAll "'" | 94 "" FermentAll "'" | |||
| 95 "" Identity "est " | 95 "" Identity "est " | |||
| 96 " " UppercaseFirst "." | 96 " " FermentFirst "." | |||
| 97 "" UppercaseAll "\">" | 97 "" FermentAll "\">" | |||
| 98 " " Identity "='" | 98 " " Identity "='" | |||
| 99 "" UppercaseFirst "," | 99 "" FermentFirst "," | |||
| 100 "" Identity "ize " | 100 "" Identity "ize " | |||
| 101 "" UppercaseAll "." | 101 "" FermentAll "." | |||
| 102 "\xc2\xa0" Identity "" | 102 "\xc2\xa0" Identity "" | |||
| 103 " " Identity "," | 103 " " Identity "," | |||
| 104 "" UppercaseFirst "=\"" | 104 "" FermentFirst "=\"" | |||
| 105 "" UppercaseAll "=\"" | 105 "" FermentAll "=\"" | |||
| 106 "" Identity "ous " | 106 "" Identity "ous " | |||
| 107 "" UppercaseAll ", " | 107 "" FermentAll ", " | |||
| 108 "" UppercaseFirst "='" | 108 "" FermentFirst "='" | |||
| 109 " " UppercaseFirst "," | 109 " " FermentFirst "," | |||
| 110 " " UppercaseAll "=\"" | 110 " " FermentAll "=\"" | |||
| 111 " " UppercaseAll ", " | 111 " " FermentAll ", " | |||
| 112 "" UppercaseAll "," | 112 "" FermentAll "," | |||
| 113 "" UppercaseAll "(" | 113 "" FermentAll "(" | |||
| 114 "" UppercaseAll ". " | 114 "" FermentAll ". " | |||
| 115 " " UppercaseAll "." | 115 " " FermentAll "." | |||
| 116 "" UppercaseAll "='" | 116 "" FermentAll "='" | |||
| 117 " " UppercaseAll ". " | 117 " " FermentAll ". " | |||
| 118 " " UppercaseFirst "=\"" | 118 " " FermentFirst "=\"" | |||
| 119 " " UppercaseAll "='" | 119 " " FermentAll "='" | |||
| 120 " " UppercaseFirst "='" | 120 " " FermentFirst "='" | |||
| Appendix C. Computing CRC-32 check values | ||||
| For the purpose of this specification, we define the CRC-32 check value | ||||
| of a byte sequence with the following C language function: | ||||
| uint32_t CRC32(const uint8_t* v, const int len) { | ||||
| const uint32_t poly = 0xedb88320UL; | ||||
| uint32_t crc, c; | ||||
| int i, k; | ||||
| crc = 0xffffffffUL; | ||||
| for (i = 0; i < len; ++i) { | ||||
| c = (crc ^ v[i]) & 0xff; | ||||
| for (k = 0; k < 8; k++) c = c & 1 ? poly ^ (c >> 1) : c >> 1; | ||||
| crc = c ^ (crc >> 8); | ||||
| } | ||||
| return crc ^ 0xffffffffUL; | ||||
| } | ||||
| Authors' Addresses | Authors' Addresses | |||
| Jyrki Alakuijala | Jyrki Alakuijala | |||
| Google, Inc | Google, Inc | |||
| Email: jyrki@google.com | Email: jyrki@google.com | |||
| Zoltan Szabadka | Zoltan Szabadka | |||
| Google, Inc | Google, Inc | |||
| End of changes. 72 change blocks. | ||||
| 198 lines changed or deleted | 422 lines changed or added | |||
This html diff was produced by rfcdiff 1.48. The latest version is available from http://tools.ietf.org/tools/rfcdiff/ | ||||