| < draft-deutsch-deflate-spec-00.txt | draft-deutsch-deflate-spec-01.txt > | |||
|---|---|---|---|---|
| INTERNET-DRAFT L. P. Deutsch | INTERNET-DRAFT L. P. Deutsch | |||
| DEFLATE 1.3 Aladdin Enterprises | DEFLATE 1.3 Aladdin Enterprises | |||
| Expires: 06 Aug 1996 01 Feb 1996 | Expires: 17 Aug 1996 12 Feb 1996 | |||
| DEFLATE Compressed Data Format Specification version 1.3 | DEFLATE Compressed Data Format Specification version 1.3 | |||
| File draft-deutsch-deflate-spec-00.txt | File draft-deutsch-deflate-spec-01.txt | |||
| Status of this Memo | Status of this Memo | |||
| This document is an Internet-Draft. Internet-Drafts are working | This document is an Internet-Draft. Internet-Drafts are working | |||
| documents of the Internet Engineering Task Force (IETF), its areas, | documents of the Internet Engineering Task Force (IETF), its areas, | |||
| and its working groups. Note that other groups may also distribute | and its working groups. Note that other groups may also distribute | |||
| working documents as Internet-Drafts. | working documents as Internet-Drafts. | |||
| Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
| and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
| skipping to change at line 308 ¶ | skipping to change at line 308 ¶ | |||
| A parser can decode the next symbol from an encoded input | A parser can decode the next symbol from an encoded input | |||
| stream by walking down the tree from the root, at each step | stream by walking down the tree from the root, at each step | |||
| choosing the edge corresponding to the next input bit. | choosing the edge corresponding to the next input bit. | |||
| Given an alphabet with known symbol frequencies, the Huffman | Given an alphabet with known symbol frequencies, the Huffman | |||
| algorithm allows the construction of an optimal prefix code | algorithm allows the construction of an optimal prefix code | |||
| (one which represents strings with those symbol frequencies | (one which represents strings with those symbol frequencies | |||
| using the fewest bits of any possible prefix codes for that | using the fewest bits of any possible prefix codes for that | |||
| alphabet). Such a code is called a Huffman code. (See | alphabet). Such a code is called a Huffman code. (See | |||
| reference [HUFFMAN] in Chapter 5, references for additional | reference [1] in Chapter 5, references for additional | |||
| information on Huffman codes.) | information on Huffman codes.) | |||
| Note that in the 'deflate' format, the Huffman codes for the | Note that in the 'deflate' format, the Huffman codes for the | |||
| various alphabets must not exceed certain maximum code lengths. | various alphabets must not exceed certain maximum code lengths. | |||
| This constraint complicates the algorithm for computing code | This constraint complicates the algorithm for computing code | |||
| lengths from symbol frequencies. Again, see Chapter 5, | lengths from symbol frequencies. Again, see Chapter 5, | |||
| references for details. | references for details. | |||
| 3.2.2. Use of Huffman coding in the 'deflate' format | 3.2.2. Use of Huffman coding in the 'deflate' format | |||
| skipping to change at line 471 ¶ | skipping to change at line 471 ¶ | |||
| move backwards distance bytes in the output | move backwards distance bytes in the output | |||
| stream, and copy length bytes from this | stream, and copy length bytes from this | |||
| position to the output stream. | position to the output stream. | |||
| end loop | end loop | |||
| while not last block | while not last block | |||
| Note that a duplicated string reference may refer to a string | Note that a duplicated string reference may refer to a string | |||
| in a previous block; i.e., the backward distance may cross one | in a previous block; i.e., the backward distance may cross one | |||
| or more block boundaries. However a distance cannot refer past | or more block boundaries. However a distance cannot refer past | |||
| the beginning of the output stream. Note also that the | the beginning of the output stream. (An application using a | |||
| referenced string may overlap the current position; for | preset dictionary might discard part of the output stream; a | |||
| example, if the last 2 bytes decoded have values X and Y, a | distance can refer to that part of the output stream anyway) | |||
| string reference with <length = 5, distance = 2> adds X,Y,X,Y,X | Note also that the referenced string may overlap the current | |||
| to the output stream. | position; for example, if the last 2 bytes decoded have values | |||
| X and Y, a string reference with <length = 5, distance = 2> | ||||
| adds X,Y,X,Y,X to the output stream. | ||||
| Deutsch [Page 9] | ||||
| We now specify each compression method in turn. | We now specify each compression method in turn. | |||
| Deutsch [Page 9] | ||||
| 3.2.4. Non-compressed blocks (BTYPE=00) | 3.2.4. Non-compressed blocks (BTYPE=00) | |||
| Any bits of input up to the next byte boundary are ignored. | Any bits of input up to the next byte boundary are ignored. | |||
| The rest of the block consists of the following information: | The rest of the block consists of the following information: | |||
| 0 1 2 3 4... | 0 1 2 3 4... | |||
| +---+---+---+---+=================================+ | +---+---+---+---+=================================+ | |||
| | LEN | NLEN |... LEN bytes of literal data...| | | LEN | NLEN |... LEN bytes of literal data...| | |||
| +---+---+---+---+=================================+ | +---+---+---+---+=================================+ | |||
| skipping to change at line 653 ¶ | skipping to change at line 655 ¶ | |||
| A compliant decompressor must accept the full range of possible | A compliant decompressor must accept the full range of possible | |||
| values defined in the previous section, and must accept blocks of | values defined in the previous section, and must accept blocks of | |||
| arbitrary size. | arbitrary size. | |||
| 4. Compression algorithm details | 4. Compression algorithm details | |||
| While it is the intent of this document to define the 'deflate' | While it is the intent of this document to define the 'deflate' | |||
| compressed data format without reference to any particular | compressed data format without reference to any particular | |||
| compression algorithm, the format is related to the compressed | compression algorithm, the format is related to the compressed | |||
| formats produced by LZ77 (Lempel-Ziv 1977, see reference [LZ77] | formats produced by LZ77 (Lempel-Ziv 1977, see reference [2] below); | |||
| below); since many variations of LZ77 are patented, it is strongly | since many variations of LZ77 are patented, it is strongly | |||
| recommended that the implementor of a compressor follow the general | recommended that the implementor of a compressor follow the general | |||
| algorithm presented here, which is known not to be patented per se. | algorithm presented here, which is known not to be patented per se. | |||
| The material in this section is not part of the definition of the | The material in this section is not part of the definition of the | |||
| specification per se, and a compressor need not follow it in order to | specification per se, and a compressor need not follow it in order to | |||
| be compliant. | be compliant. | |||
| The compressor terminates a block when it determines that starting a | The compressor terminates a block when it determines that starting a | |||
| new block with fresh trees would be useful, or when the block size | new block with fresh trees would be useful, or when the block size | |||
| fills up the compressor's block buffer. | fills up the compressor's block buffer. | |||
| skipping to change at line 709 ¶ | skipping to change at line 711 ¶ | |||
| complete second search regardless of the length of the first match. | complete second search regardless of the length of the first match. | |||
| In the normal case, if the current match is "long enough", the | In the normal case, if the current match is "long enough", the | |||
| compressor reduces the search for a longer match, thus speeding up | compressor reduces the search for a longer match, thus speeding up | |||
| the process. If speed is most important, the compressor inserts new | the process. If speed is most important, the compressor inserts new | |||
| strings in the hash table only when no match was found, or when the | strings in the hash table only when no match was found, or when the | |||
| match is not "too long". This degrades the compression ratio but | match is not "too long". This degrades the compression ratio but | |||
| saves time since there are both fewer insertions and fewer searches. | saves time since there are both fewer insertions and fewer searches. | |||
| 5. References | 5. References | |||
| [GZIP] Gailly, J.-L., and Adler, M., gzip documentation and sources, | [1] Huffman, D. A., "A Method for the Construction of Minimum | |||
| available in prep.ai.mit.edu:/pub/gnu/gzip-*.tar | Redundancy Codes", Proceedings of the Institute of Radio Engineers, | |||
| September 1952, Volume 40, Number 9, pp. 1098-1101. | ||||
| [ZLIB] Gailly, J.-L., and Adler, M., zlib documentation and sources, | ||||
| available in ftp.uu.net:/pub/archiving/zip/doc/zlib* | ||||
| [LZ77] Ziv J., Lempel A., "A Universal Algorithm for Sequential Data | [2] Ziv J., Lempel A., "A Universal Algorithm for Sequential Data | |||
| Compression", IEEE Transactions on Information Theory", Vol. 23, No. | Compression", IEEE Transactions on Information Theory", Vol. 23, No. | |||
| 3, pp. 337-343. | 3, pp. 337-343. | |||
| [HUFFMAN] Huffman, D. A., 'A Method for the Construction of Minimum | [3] Gailly, J.-L., and Adler, M., zlib documentation and sources, | |||
| Redundancy Codes', Proceedings of the Institute of Radio Engineers, | available in ftp.uu.net:/pub/archiving/zip/doc/zlib* | |||
| September 1952, Volume 40, Number 9, pp. 1098-1101. | ||||
| [SCHWARTZ] Schwartz, E. S., and Kallick, B. "Generating a canonical | [4] Gailly, J.-L., and Adler, M., gzip documentation and sources, | |||
| prefix encoding." Comm. ACM, 7,3 (Mar. 1964), pp. 166-169. | available in prep.ai.mit.edu:/pub/gnu/gzip-*.tar | |||
| [HIRSCHBERG] "Efficient decoding of prefix codes", Hirschberg and | [5] Schwartz, E. S., and Kallick, B. "Generating a canonical prefix | |||
| Lelewer, Comm. ACM, 33,4, April 1990, pp. 449-459. | encoding." Comm. ACM, 7,3 (Mar. 1964), pp. 166-169. | |||
| [6] "Efficient decoding of prefix codes", Hirschberg and Lelewer, | ||||
| Comm. ACM, 33,4, April 1990, pp. 449-459. | ||||
| 6. Security considerations | 6. Security considerations | |||
| Any data compression method involves the reduction of redundancy in | Any data compression method involves the reduction of redundancy in | |||
| the data. Consequently, any corruption of the data is likely to have | the data. Consequently, any corruption of the data is likely to have | |||
| severe effects and be difficult to correct. Uncompressed text, on | severe effects and be difficult to correct. Uncompressed text, on | |||
| the other hand, will probably still be readable despite the presence | the other hand, will probably still be readable despite the presence | |||
| of some corrupted bytes. | of some corrupted bytes. | |||
| It is recommended that systems using this data format provide some | It is recommended that systems using this data format provide some | |||
| Deutsch [Page 14] | Deutsch [Page 14] | |||
| means of validating the integrity of the compressed data. See | means of validating the integrity of the compressed data. See | |||
| reference [ZLIB], for example. | reference [3], for example. | |||
| 7. Source code | 7. Source code | |||
| Source code for a C language implementation of a 'deflate' compliant | Source code for a C language implementation of a 'deflate' compliant | |||
| compressor and decompressor is available within the zlib package at | compressor and decompressor is available within the zlib package at | |||
| ftp.uu.net:/pub/archiving/zip/zlib/zlib*. | ftp.uu.net:/pub/archiving/zip/zlib/zlib*. | |||
| 8. Acknowledgements | 8. Acknowledgements | |||
| Trademarks cited in this document are the property of their | Trademarks cited in this document are the property of their | |||
| skipping to change at line 780 ¶ | skipping to change at line 782 ¶ | |||
| Questions about the technical content of this specification can be | Questions about the technical content of this specification can be | |||
| sent by email to | sent by email to | |||
| Jean-loup Gailly <gzip@prep.ai.mit.edu> and | Jean-loup Gailly <gzip@prep.ai.mit.edu> and | |||
| Mark Adler <madler@alumni.caltech.edu> | Mark Adler <madler@alumni.caltech.edu> | |||
| Editorial comments on this specification can be sent by email to | Editorial comments on this specification can be sent by email to | |||
| L. Peter Deutsch <ghost@aladdin.com> and | L. Peter Deutsch <ghost@aladdin.com> and | |||
| Glenn Randers-Pehrson <glennrp@arl.mil> | Glenn Randers-Pehrson <randeg@alumni.rpi.edu> | |||
| Deutsch [Page 15] | Deutsch [Page 15] | |||
| End of changes. 14 change blocks. | ||||
| 26 lines changed or deleted | 28 lines changed or added | |||
This html diff was produced by rfcdiff 1.48. The latest version is available from http://tools.ietf.org/tools/rfcdiff/ | ||||