< draft-deutsch-deflate-spec-00.txt   draft-deutsch-deflate-spec-01.txt >
INTERNET-DRAFT L. P. Deutsch INTERNET-DRAFT L. P. Deutsch
DEFLATE 1.3 Aladdin Enterprises DEFLATE 1.3 Aladdin Enterprises
Expires: 06 Aug 1996 01 Feb 1996 Expires: 17 Aug 1996 12 Feb 1996
DEFLATE Compressed Data Format Specification version 1.3 DEFLATE Compressed Data Format Specification version 1.3
File draft-deutsch-deflate-spec-00.txt File draft-deutsch-deflate-spec-01.txt
Status of this Memo Status of this Memo
This document is an Internet-Draft. Internet-Drafts are working This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas, documents of the Internet Engineering Task Force (IETF), its areas,
and its working groups. Note that other groups may also distribute and its working groups. Note that other groups may also distribute
working documents as Internet-Drafts. working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
skipping to change at line 308 skipping to change at line 308
A parser can decode the next symbol from an encoded input A parser can decode the next symbol from an encoded input
stream by walking down the tree from the root, at each step stream by walking down the tree from the root, at each step
choosing the edge corresponding to the next input bit. choosing the edge corresponding to the next input bit.
Given an alphabet with known symbol frequencies, the Huffman Given an alphabet with known symbol frequencies, the Huffman
algorithm allows the construction of an optimal prefix code algorithm allows the construction of an optimal prefix code
(one which represents strings with those symbol frequencies (one which represents strings with those symbol frequencies
using the fewest bits of any possible prefix codes for that using the fewest bits of any possible prefix codes for that
alphabet). Such a code is called a Huffman code. (See alphabet). Such a code is called a Huffman code. (See
reference [HUFFMAN] in Chapter 5, references for additional reference [1] in Chapter 5, references for additional
information on Huffman codes.) information on Huffman codes.)
Note that in the 'deflate' format, the Huffman codes for the Note that in the 'deflate' format, the Huffman codes for the
various alphabets must not exceed certain maximum code lengths. various alphabets must not exceed certain maximum code lengths.
This constraint complicates the algorithm for computing code This constraint complicates the algorithm for computing code
lengths from symbol frequencies. Again, see Chapter 5, lengths from symbol frequencies. Again, see Chapter 5,
references for details. references for details.
3.2.2. Use of Huffman coding in the 'deflate' format 3.2.2. Use of Huffman coding in the 'deflate' format
skipping to change at line 471 skipping to change at line 471
move backwards distance bytes in the output move backwards distance bytes in the output
stream, and copy length bytes from this stream, and copy length bytes from this
position to the output stream. position to the output stream.
end loop end loop
while not last block while not last block
Note that a duplicated string reference may refer to a string Note that a duplicated string reference may refer to a string
in a previous block; i.e., the backward distance may cross one in a previous block; i.e., the backward distance may cross one
or more block boundaries. However a distance cannot refer past or more block boundaries. However a distance cannot refer past
the beginning of the output stream. Note also that the the beginning of the output stream. (An application using a
referenced string may overlap the current position; for preset dictionary might discard part of the output stream; a
example, if the last 2 bytes decoded have values X and Y, a distance can refer to that part of the output stream anyway)
string reference with <length = 5, distance = 2> adds X,Y,X,Y,X Note also that the referenced string may overlap the current
to the output stream. position; for example, if the last 2 bytes decoded have values
X and Y, a string reference with <length = 5, distance = 2>
adds X,Y,X,Y,X to the output stream.
Deutsch [Page 9]
We now specify each compression method in turn. We now specify each compression method in turn.
Deutsch [Page 9]
3.2.4. Non-compressed blocks (BTYPE=00) 3.2.4. Non-compressed blocks (BTYPE=00)
Any bits of input up to the next byte boundary are ignored. Any bits of input up to the next byte boundary are ignored.
The rest of the block consists of the following information: The rest of the block consists of the following information:
0 1 2 3 4... 0 1 2 3 4...
+---+---+---+---+=================================+ +---+---+---+---+=================================+
| LEN | NLEN |... LEN bytes of literal data...| | LEN | NLEN |... LEN bytes of literal data...|
+---+---+---+---+=================================+ +---+---+---+---+=================================+
skipping to change at line 653 skipping to change at line 655
A compliant decompressor must accept the full range of possible A compliant decompressor must accept the full range of possible
values defined in the previous section, and must accept blocks of values defined in the previous section, and must accept blocks of
arbitrary size. arbitrary size.
4. Compression algorithm details 4. Compression algorithm details
While it is the intent of this document to define the 'deflate' While it is the intent of this document to define the 'deflate'
compressed data format without reference to any particular compressed data format without reference to any particular
compression algorithm, the format is related to the compressed compression algorithm, the format is related to the compressed
formats produced by LZ77 (Lempel-Ziv 1977, see reference [LZ77] formats produced by LZ77 (Lempel-Ziv 1977, see reference [2] below);
below); since many variations of LZ77 are patented, it is strongly since many variations of LZ77 are patented, it is strongly
recommended that the implementor of a compressor follow the general recommended that the implementor of a compressor follow the general
algorithm presented here, which is known not to be patented per se. algorithm presented here, which is known not to be patented per se.
The material in this section is not part of the definition of the The material in this section is not part of the definition of the
specification per se, and a compressor need not follow it in order to specification per se, and a compressor need not follow it in order to
be compliant. be compliant.
The compressor terminates a block when it determines that starting a The compressor terminates a block when it determines that starting a
new block with fresh trees would be useful, or when the block size new block with fresh trees would be useful, or when the block size
fills up the compressor's block buffer. fills up the compressor's block buffer.
skipping to change at line 709 skipping to change at line 711
complete second search regardless of the length of the first match. complete second search regardless of the length of the first match.
In the normal case, if the current match is "long enough", the In the normal case, if the current match is "long enough", the
compressor reduces the search for a longer match, thus speeding up compressor reduces the search for a longer match, thus speeding up
the process. If speed is most important, the compressor inserts new the process. If speed is most important, the compressor inserts new
strings in the hash table only when no match was found, or when the strings in the hash table only when no match was found, or when the
match is not "too long". This degrades the compression ratio but match is not "too long". This degrades the compression ratio but
saves time since there are both fewer insertions and fewer searches. saves time since there are both fewer insertions and fewer searches.
5. References 5. References
[GZIP] Gailly, J.-L., and Adler, M., gzip documentation and sources, [1] Huffman, D. A., "A Method for the Construction of Minimum
available in prep.ai.mit.edu:/pub/gnu/gzip-*.tar Redundancy Codes", Proceedings of the Institute of Radio Engineers,
September 1952, Volume 40, Number 9, pp. 1098-1101.
[ZLIB] Gailly, J.-L., and Adler, M., zlib documentation and sources,
available in ftp.uu.net:/pub/archiving/zip/doc/zlib*
[LZ77] Ziv J., Lempel A., "A Universal Algorithm for Sequential Data [2] Ziv J., Lempel A., "A Universal Algorithm for Sequential Data
Compression", IEEE Transactions on Information Theory", Vol. 23, No. Compression", IEEE Transactions on Information Theory", Vol. 23, No.
3, pp. 337-343. 3, pp. 337-343.
[HUFFMAN] Huffman, D. A., 'A Method for the Construction of Minimum [3] Gailly, J.-L., and Adler, M., zlib documentation and sources,
Redundancy Codes', Proceedings of the Institute of Radio Engineers, available in ftp.uu.net:/pub/archiving/zip/doc/zlib*
September 1952, Volume 40, Number 9, pp. 1098-1101.
[SCHWARTZ] Schwartz, E. S., and Kallick, B. "Generating a canonical [4] Gailly, J.-L., and Adler, M., gzip documentation and sources,
prefix encoding." Comm. ACM, 7,3 (Mar. 1964), pp. 166-169. available in prep.ai.mit.edu:/pub/gnu/gzip-*.tar
[HIRSCHBERG] "Efficient decoding of prefix codes", Hirschberg and [5] Schwartz, E. S., and Kallick, B. "Generating a canonical prefix
Lelewer, Comm. ACM, 33,4, April 1990, pp. 449-459. encoding." Comm. ACM, 7,3 (Mar. 1964), pp. 166-169.
[6] "Efficient decoding of prefix codes", Hirschberg and Lelewer,
Comm. ACM, 33,4, April 1990, pp. 449-459.
6. Security considerations 6. Security considerations
Any data compression method involves the reduction of redundancy in Any data compression method involves the reduction of redundancy in
the data. Consequently, any corruption of the data is likely to have the data. Consequently, any corruption of the data is likely to have
severe effects and be difficult to correct. Uncompressed text, on severe effects and be difficult to correct. Uncompressed text, on
the other hand, will probably still be readable despite the presence the other hand, will probably still be readable despite the presence
of some corrupted bytes. of some corrupted bytes.
It is recommended that systems using this data format provide some It is recommended that systems using this data format provide some
Deutsch [Page 14] Deutsch [Page 14]
means of validating the integrity of the compressed data. See means of validating the integrity of the compressed data. See
reference [ZLIB], for example. reference [3], for example.
7. Source code 7. Source code
Source code for a C language implementation of a 'deflate' compliant Source code for a C language implementation of a 'deflate' compliant
compressor and decompressor is available within the zlib package at compressor and decompressor is available within the zlib package at
ftp.uu.net:/pub/archiving/zip/zlib/zlib*. ftp.uu.net:/pub/archiving/zip/zlib/zlib*.
8. Acknowledgements 8. Acknowledgements
Trademarks cited in this document are the property of their Trademarks cited in this document are the property of their
skipping to change at line 780 skipping to change at line 782
Questions about the technical content of this specification can be Questions about the technical content of this specification can be
sent by email to sent by email to
Jean-loup Gailly <gzip@prep.ai.mit.edu> and Jean-loup Gailly <gzip@prep.ai.mit.edu> and
Mark Adler <madler@alumni.caltech.edu> Mark Adler <madler@alumni.caltech.edu>
Editorial comments on this specification can be sent by email to Editorial comments on this specification can be sent by email to
L. Peter Deutsch <ghost@aladdin.com> and L. Peter Deutsch <ghost@aladdin.com> and
Glenn Randers-Pehrson <glennrp@arl.mil> Glenn Randers-Pehrson <randeg@alumni.rpi.edu>
Deutsch [Page 15] Deutsch [Page 15]
 End of changes. 14 change blocks. 
26 lines changed or deleted 28 lines changed or added

This html diff was produced by rfcdiff 1.48. The latest version is available from http://tools.ietf.org/tools/rfcdiff/