INTERNET-DRAFT                                          L. Peter Deutsch
ZLIB 3.3                                             Aladdin Enterprises
Expires: 06 Aug 1996                                    Jean-loup Gailly
                                                                Info-ZIP
                                                             01 Feb 1996

         ZLIB Compressed Data Format Specification version 3.3

File draft-deutsch-zlib-spec-00.txt

Status of this Memo

   This document is an Internet-Draft.  Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups.  Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as ``work in progress.''

   To learn the current status of any Internet-Draft, please check the
   ``1id-abstracts.txt'' listing contained in the Internet-Drafts
   Shadow Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
   munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
   ftp.isi.edu (US West Coast).

   Distribution of this memo is unlimited.

Notices

   Copyright (C) 1996 L. Peter Deutsch and Jean-loup Gailly

   Permission is granted to copy and distribute this document for any
   purpose and without charge, including translations into other
   languages and incorporation into compilations, provided that it is
   copied as a whole (including the copyright notice and this notice)
   and with no changes.

Abstract

   This specification defines a lossless compressed data format.  The
   data can be produced or consumed, even for an arbitrarily long
   sequentially presented input data stream, using only an a priori
   bounded amount of intermediate storage.  The format presently uses
   the DEFLATE compression method but can be easily extended to use
   other compression methods.  It can be implemented readily in a manner
   not covered by patents.  This specification also defines the Adler-32
   checksum (an extension and improvement of the Fletcher checksum),
   used for detection of data corruption, and provides an algorithm for
   computing it.

Deutsch and Gailly                                             [Page  1]

Table of contents

   1. Introduction
      1.1 Purpose
      1.2 Intended audience
      1.3 Scope
      1.4 Compliance
      1.5 Definitions of terms and conventions used
      1.6 Changes from previous versions
   2. Detailed specification
      2.1 Overall conventions
      2.2 Data format
      2.3 Compliance
   3. References
   4. Source code
   5. Security considerations
   6. Acknowledgements
   7. Authors' addresses
   8. Appendix: Rationale
   9. Appendix: Sample code

1. Introduction

   1.1. Purpose

      The purpose of this specification is to define a lossless
      compressed data format that:

          o Is independent of CPU type, operating system, file system,
            and character set, and hence can be used for interchange;

          o Can be produced or consumed, even for an arbitrarily long
            sequentially presented input data stream, using only an a
            priori bounded amount of intermediate storage, and hence can
            be used in data communications or similar structures such as
            Unix filters;

          o Can use a number of different compression methods;

          o Can be implemented readily in a manner not covered by
            patents, and hence can be practiced freely.

      The data format defined by this specification does not attempt to
      allow random access to compressed data.

   1.2. Intended audience

      This specification is intended for use by implementors of software
      to compress data into zlib format and/or decompress data from zlib
      format.

      The text of the specification assumes a basic background in
      programming at the level of bits and other primitive data
      representations.

   1.3. Scope

      This specification defines a compressed data format that can be
      used for in-memory compression of a sequence of arbitrary bytes.

   1.4. Compliance

      Unless otherwise indicated below, a compliant decompressor must be
      able to accept and decompress any data set that conforms to all
      the specifications presented here; a compliant compressor must
      produce data sets that conform to all the specifications presented
      here.

   1.5. Definitions of terms and conventions used

      byte: 8 bits stored or transmitted as a unit (same as an octet).
      (For this specification, a byte is exactly 8 bits, even on
      machines which store a character on a number of bits different
      from 8.)  See Section 2.1, below, for the numbering of bits within
      a byte.

   1.6. Changes from previous versions

      Version 3.1 was the first public release of this specification.
      In version 3.2, some terminology was changed and the Adler-32
      sample code was rewritten for clarity.  In version 3.3, support
      for preset dictionaries was introduced, and the specification was
      converted to Internet Draft style.

2. Detailed specification

   2.1. Overall conventions

      In the diagrams below, a box like this:

         +---+
         |   | <-- the vertical bars might be missing
         +---+

      represents one byte; a box like this:

         +==============+
         |              |
         +==============+

      represents a variable number of bytes.

      Bytes stored within a computer do not have a 'bit order', since
      they are always treated as a unit.  However, a byte considered as
      an integer between 0 and 255 does have a most- and
      least-significant bit, and since we write numbers with the
      most-significant digit on the left, we also write bytes with the
      most-significant bit on the left.  In the diagrams below, we number
      the bits of a byte so that bit 0 is the least-significant bit,
      i.e., the bits are numbered:

         +--------+
         |76543210|
         +--------+

      Within a computer, a number may occupy multiple bytes.  All
      multi-byte numbers in the format described here are stored with
      the MOST-significant byte first (at the lower memory address).
      For example, the decimal number 520 is stored as:

             0        1
         +--------+--------+
         |00000010|00001000|
         +--------+--------+
          ^        ^
          |        |
          |        + less significant byte = 8
          + more significant byte = 2 x 256
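
      As an illustration only, the two conventions above can be written
      in C as follows.  This is a minimal sketch; the helper names
      get_bit, read_be16, read_be32 and write_be32 are hypothetical and
      are not defined by this specification.

```c
#include <stdint.h>

/* Bit i of byte b, where bit 0 is the least-significant bit. */
static unsigned get_bit(uint8_t b, unsigned i)
{
    return (b >> i) & 1u;
}

/* Read a 16-bit number stored MOST-significant byte first. */
static uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)(((unsigned)p[0] << 8) | p[1]);
}

/* Read a 32-bit number stored MOST-significant byte first. */
static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Store a 32-bit number MOST-significant byte first. */
static void write_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}
```

      With these helpers, the two bytes 0x02 0x08 shown above read back
      as read_be16 = 2*256 + 8 = 520.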

   2.2. Data format

      A zlib stream has the following structure:

           0   1
         +---+---+
         |CMF|FLG|   (more-->)
         +---+---+

      (if FLG.FDICT set)

           0   1   2   3
         +---+---+---+---+
         |     DICTID    |   (more-->)
         +---+---+---+---+

         +=====================+---+---+---+---+
         |...compressed data...|    ADLER32    |
         +=====================+---+---+---+---+

      Any data which may appear after ADLER32 are not part of the zlib
      stream.

      CMF (Compression Method and flags)

         This byte is divided into a 4-bit compression method and a
         4-bit information field depending on the compression method.

            bits 0 to 3  CM     Compression method
            bits 4 to 7  CINFO  Compression info

      CM (Compression method)

         This identifies the compression method used in the file. CM = 8
         denotes the 'deflate' compression method with a window size up
         to 32K.  This is the method used by gzip and PNG (see
         references [GZIP] and [PNG] in Chapter 3, below, for the
         reference documents).  CM = 15 is reserved.  It might be used
         in a future version of this specification to indicate the
         presence of an extra field before the compressed data.

      CINFO (Compression info)

         For CM = 8, CINFO is the base-2 logarithm of the LZ77 window
         size, minus eight (CINFO=7 indicates a 32K window size). Values
         of CINFO above 7 are not allowed in this version of the
         specification.  CINFO is not defined in this specification for
         CM not equal to 8.
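
         A minimal C sketch of splitting CMF into CM and CINFO and
         deriving the LZ77 window size for CM = 8 follows.  The
         structure cmf_fields and the function decode_cmf are
         hypothetical names; the sketch does not itself reject CINFO
         values above 7, which a compliant decoder must treat as an
         error.

```c
#include <stdint.h>

/* Fields of the CMF byte. */
struct cmf_fields {
    unsigned cm;          /* bits 0-3: compression method */
    unsigned cinfo;       /* bits 4-7: compression info */
    unsigned long window; /* LZ77 window size in bytes; 0 if CM != 8 */
};

static struct cmf_fields decode_cmf(uint8_t cmf)
{
    struct cmf_fields f;
    f.cm = cmf & 0x0Fu;
    f.cinfo = (cmf >> 4) & 0x0Fu;
    /* For CM = 8, CINFO = log2(window size) - 8, so the window is
       2^(CINFO + 8) bytes; CINFO = 7 gives 32768 (32K). */
    f.window = (f.cm == 8) ? (1UL << (f.cinfo + 8)) : 0;
    return f;
}
```

         For the common CMF value 0x78, this yields CM = 8, CINFO = 7
         and a 32768-byte window.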

      FLG (FLaGs)

         This flag byte is divided as follows:

            bits 0 to 4  FCHECK  (check bits for CMF and FLG)
            bit  5       FDICT   (preset dictionary)
            bits 6 to 7  FLEVEL  (compression level)

         The FCHECK value must be such that CMF and FLG, when viewed as
         a 16-bit unsigned integer stored in MSB order (CMF*256 + FLG),
         is a multiple of 31.
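
         A compressor can choose FCHECK by computing the remainder of
         (CMF*256 + FLG) modulo 31 with bits 0 to 4 of FLG still zero,
         then adding whatever brings the total up to the next multiple
         of 31.  The following C sketch assumes the hypothetical
         function names make_flg and header_check_ok.

```c
#include <stdint.h>

/* Build FLG from FDICT and FLEVEL, choosing FCHECK (bits 0-4) so that
   CMF*256 + FLG is a multiple of 31. */
static uint8_t make_flg(uint8_t cmf, unsigned fdict, unsigned flevel)
{
    unsigned flg = ((flevel & 3u) << 6) | ((fdict & 1u) << 5);
    unsigned rem = ((unsigned)cmf * 256u + flg) % 31u;
    /* Bits 0-4 are still zero, so OR-ing in a value 0..30 adds it. */
    flg |= (31u - rem) % 31u;
    return (uint8_t)flg;
}

/* Nonzero if the CMF/FLG pair passes the FCHECK test. */
static int header_check_ok(uint8_t cmf, uint8_t flg)
{
    return ((unsigned)cmf * 256u + flg) % 31u == 0;
}
```

         For CMF = 0x78 with FDICT clear, FLEVEL values 0, 2 and 3 give
         FLG = 0x01, 0x9C and 0xDA respectively; for example
         0x789C = 30876 = 31 * 996.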

      FDICT (Preset dictionary)

         If FDICT is set, a DICTID dictionary identifier is present
         immediately after the FLG byte.  The dictionary is a sequence of
         bytes which are initially fed to the compressor without
         producing any compressed output.  DICTID is the Adler-32
         checksum of this sequence of bytes (see the definition of
         ADLER32 below).  The decompressor can use this identifier to
         determine which dictionary has been used by the compressor.
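
         Since FDICT determines whether the four DICTID bytes are
         present, a decoder must test it before it knows where the
         compressed data begins.  A minimal C sketch, using the
         hypothetical function name locate_data and reading DICTID in
         the MSB-first order of Section 2.1:

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the offset of the compressed data in a zlib stream of len
   bytes, or -1 if the stream is too short.  If FDICT (FLG bit 5) is
   set, *has_dict becomes 1 and *dictid receives DICTID; otherwise
   *has_dict becomes 0. */
static long locate_data(const uint8_t *s, size_t len,
                        int *has_dict, uint32_t *dictid)
{
    if (len < 2)
        return -1;
    *has_dict = (s[1] >> 5) & 1;
    if (!*has_dict)
        return 2;                 /* CMF, FLG, then compressed data */
    if (len < 6)
        return -1;
    *dictid = ((uint32_t)s[2] << 24) | ((uint32_t)s[3] << 16) |
              ((uint32_t)s[4] << 8)  |  (uint32_t)s[5];
    return 6;                     /* CMF, FLG, DICTID, then data */
}
```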

      FLEVEL (Compression level)

         These flags are available for use by specific compression
         methods.  The 'deflate' method (CM = 8) sets these flags as
         follows:

            0 - compressor used fastest algorithm
            1 - compressor used fast algorithm
            2 - compressor used default algorithm
            3 - compressor used maximum compression, slowest algorithm

         The information in FLEVEL is not needed for decompression; it
         is there to indicate if recompression might be worthwhile.
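
         Extracting FLEVEL is a matter of reading bits 6 and 7 of FLG.
         The C sketch below (flevel_name is a hypothetical helper) maps
         the value to the 'deflate' descriptions above; as noted, a
         decompressor does not need this information.

```c
#include <stdint.h>

/* FLEVEL is bits 6-7 of FLG; the strings follow the CM = 8 meanings. */
static const char *flevel_name(uint8_t flg)
{
    static const char *const names[4] = {
        "fastest", "fast", "default", "maximum compression"
    };
    return names[(flg >> 6) & 3u];
}
```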

      compressed data

         For compression method 8, the compressed data is stored in the
         deflate compressed data format as described in the document
         "'Deflate' Compressed Data Format Specification" by L. Peter
         Deutsch.  (See reference [DEFLATE] in Chapter 3, below.)

         Other compressed data formats are not specified in this version
         of the zlib specification.

      ADLER32 (Adler-32 checksum)

         This contains a checksum value of the uncompressed data
         (excluding any dictionary data) computed according to the
         Adler-32 algorithm.  This algorithm is a 32-bit extension and
         improvement of the Fletcher algorithm, used in the ITU-T X.224 /
         ISO 8073 standard.  (See references [FLETCHER] and [ITU-T] in
         Chapter 3, below.)

         Adler-32 is composed of two sums accumulated per byte: s1 is
         the sum of all bytes, s2 is the sum of all s1 values.  Both sums
         are done modulo 65521.  s1 is initialized to 1, s2 to zero.  The
         Adler-32 checksum is stored as s2*65536 + s1 in
         most-significant-byte first (network) order.
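
         As a worked example, for the three bytes "abc" (97, 98, 99)
         the sums evolve as s1 = 98, 196, 295 and s2 = 98, 294, 589,
         giving 589*65536 + 295 = 0x024D0127, which is stored as the
         bytes 02 4D 01 27.  The C sketch below follows the definition
         literally; adler32_def and store_adler32 are hypothetical
         names, and the sample code in the appendix gives the
         specification's own version.

```c
#include <stddef.h>
#include <stdint.h>

#define ADLER_MOD 65521u

/* Adler-32 as defined above: s1 starts at 1, s2 at 0, both updated
   per byte modulo 65521; the result is s2*65536 + s1. */
static uint32_t adler32_def(const uint8_t *buf, size_t len)
{
    uint32_t s1 = 1, s2 = 0;
    for (size_t i = 0; i < len; i++) {
        s1 = (s1 + buf[i]) % ADLER_MOD;
        s2 = (s2 + s1) % ADLER_MOD;
    }
    return (s2 << 16) | s1;
}

/* Store the checksum as the 4-byte ADLER32 field, MSB first. */
static void store_adler32(uint8_t out[4], uint32_t adler)
{
    out[0] = (uint8_t)(adler >> 24);
    out[1] = (uint8_t)(adler >> 16);
    out[2] = (uint8_t)(adler >> 8);
    out[3] = (uint8_t)adler;
}
```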

   2.3. Compliance

      A compliant compressor must produce streams with correct CMF, FLG
      and ADLER32, but need not support preset dictionaries.  When the
      zlib data format is used as part of another standard data format,
      the compressor may use only preset dictionaries that are specified
      by this other data format.  If this other format does not use the
      preset dictionary feature, the compressor must not set the FDICT
      flag.

      A compliant decompressor must check CMF, FLG, and ADLER32, and
      provide an error indication if any of these have incorrect values.
      A compliant decompressor must give an error indication if CM is
      not one of the values defined in this specification (only the
      value 8 is permitted in this version), since another value could
      indicate the presence of new features that would cause subsequent
      data to be interpreted incorrectly.  A compliant decompressor must
      give an error indication if FDICT is set and DICTID is not the
      identifier of a known preset dictionary.  A decompressor may
      ignore FLEVEL and still be compliant.  When the zlib data format
      is being used as a part of another standard format, a compliant
      decompressor must support all the preset dictionaries specified by
      the other format.  When the other format does not use the preset
      dictionary feature, a compliant decompressor must reject any
      stream in which the FDICT flag is set.
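
      The header checks required above can be collected in one
      function.  The following C sketch is illustrative only: the enum
      values, the function name check_header and the known_ids table
      (which would come from the enclosing data format) are assumptions
      of this example.  Checking ADLER32 requires the decompressed data
      and is not shown here.

```c
#include <stddef.h>
#include <stdint.h>

enum zhdr_status {
    ZHDR_OK,
    ZHDR_TRUNCATED,
    ZHDR_BAD_FCHECK,   /* CMF*256 + FLG is not a multiple of 31 */
    ZHDR_BAD_CM,       /* CM is not 8 */
    ZHDR_BAD_CINFO,    /* CINFO is above 7 */
    ZHDR_UNKNOWN_DICT  /* FDICT set but DICTID not in known_ids */
};

/* known_ids lists the DICTIDs the enclosing format allows; with
   n_known = 0, any stream with FDICT set is rejected. */
static enum zhdr_status check_header(const uint8_t *s, size_t len,
                                     const uint32_t *known_ids,
                                     size_t n_known)
{
    if (len < 2)
        return ZHDR_TRUNCATED;
    if (((unsigned)s[0] * 256u + s[1]) % 31u != 0)
        return ZHDR_BAD_FCHECK;
    if ((s[0] & 0x0Fu) != 8)
        return ZHDR_BAD_CM;
    if ((s[0] >> 4) > 7)
        return ZHDR_BAD_CINFO;
    if (s[1] & 0x20u) {                         /* FDICT */
        if (len < 6)
            return ZHDR_TRUNCATED;
        uint32_t id = ((uint32_t)s[2] << 24) | ((uint32_t)s[3] << 16) |
                      ((uint32_t)s[4] << 8)  |  (uint32_t)s[5];
        for (size_t i = 0; i < n_known; i++)
            if (known_ids[i] == id)
                return ZHDR_OK;
        return ZHDR_UNKNOWN_DICT;
    }
    return ZHDR_OK;                              /* FLEVEL is ignored */
}
```

      For example, 0x78 0x9C passes, 0x79 0x18 has a valid FCHECK but
      CM = 9 and is rejected, and 0x88 0x1C is rejected because
      CINFO = 8.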

3. References

   [GZIP] Deutsch, L. P., "'Gzip' Compressed Data Format Specification".
   Available in ftp.uu.net:/pub/archiving/zip/doc/gzip-*.doc

   [DEFLATE] Deutsch, L. P., "'Deflate' Compressed Data Format
   Specification".  Available in
   ftp.uu.net:/pub/archiving/zip/doc/deflate-*.doc

   [PNG] Boutell, T., "PNG (Portable Network Graphics) specification".
   Available in ftp://ftp.uu.net/graphics/png/png*

   [FLETCHER] Fletcher, J. G., "An Arithmetic Checksum for Serial
   Transmissions", IEEE Transactions on Communications, Vol. COM-30,
   No. 1, January 1982, pp. 247-252.

   [ITU-T] ITU-T Recommendation X.224, Annex D, "Checksum Algorithms",
   November 1993, pp. 144, 145.  (Available from gopher://info.itu.ch).
   ITU-T X.224 is also the same as ISO 8073.

4. Source code

   Source code for a C language implementation of a 'zlib' compliant
   library is available at ftp.uu.net:/pub/archiving/zip/zlib/zlib*.

5. Security considerations

   A decoder that fails to check the ADLER32 checksum value may be
   subject to undetected data corruption.
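
   A minimal C sketch of the trailer check follows; the function names
   adler32_of and trailer_matches are hypothetical, and the sketch
   assumes the caller has already decompressed the data and knows where
   the stream ends.  For example, the 8-byte stream
   78 9C 03 00 00 00 00 01 encodes empty input, and its trailing ADLER32
   field 00 00 00 01 equals the Adler-32 of zero bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Adler-32 of the decompressed output: s1 = 1, s2 = 0, mod 65521. */
static uint32_t adler32_of(const uint8_t *buf, size_t len)
{
    uint32_t s1 = 1, s2 = 0;
    for (size_t i = 0; i < len; i++) {
        s1 = (s1 + buf[i]) % 65521u;
        s2 = (s2 + s1) % 65521u;
    }
    return (s2 << 16) | s1;
}

/* Returns 1 if the last four bytes of the stream (the MSB-first
   ADLER32 field) match the checksum of the decompressed data. */
static int trailer_matches(const uint8_t *stream, size_t stream_len,
                           const uint8_t *out, size_t out_len)
{
    if (stream_len < 6)
        return 0;   /* too short to hold CMF, FLG and ADLER32 */
    const uint8_t *t = stream + stream_len - 4;
    uint32_t stored = ((uint32_t)t[0] << 24) | ((uint32_t)t[1] << 16) |
                      ((uint32_t)t[2] << 8)  |  (uint32_t)t[3];
    return stored == adler32_of(out, out_len);
}
```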

6. Acknowledgements

   Trademarks cited in this document are the property of their
   respective owners.

   Jean-loup Gailly and Mark Adler designed the zlib format and wrote
   the related software described in this specification.  Glenn
   Randers-Pehrson converted this document to Internet Draft and HTML
   format.

7. Authors' addresses

      L. Peter Deutsch

      Aladdin Enterprises
      203 Santa Margarita Ave.
      Menlo Park, CA 94025

      Phone: (415) 322-0103 (AM only)
      FAX:   (415) 322-1734
      EMail:

      Jean-loup Gailly
      EMail:

   Questions about the technical content of this specification can be
   sent by email to

      Jean-loup Gailly  and
      Mark Adler

   Editorial comments on this specification can be sent by email to

      L. Peter Deutsch  and
      Glenn Randers-Pehrson

8. Appendix: Rationale

   8.1. Preset dictionaries

      A preset dictionary is especially useful for compressing short
      input sequences.  The compressor can take advantage of the
      dictionary context to encode the input in a more compact manner.
      The decompressor can be initialized with the appropriate context
      by virtually decompressing a compressed version of the dictionary
      without producing any output.  However, for certain compression
      algorithms such as the deflate algorithm, this operation can be
      optimized without actually performing any decompression.

      The compressor and the decompressor must use exactly the same
      dictionary.  The dictionary may be fixed or may be chosen among a
      certain number of predefined dictionaries, according to the kind
      of input data.  The decompressor can determine which dictionary has
      been chosen by the compressor by checking the dictionary
      identifier.  This document does not specify the contents of
      predefined dictionaries, since the optimal dictionaries are
      application specific.  Standard data formats using this feature of
      the zlib specification must precisely define the allowed
      dictionaries.
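
      One way a decompressor might select among predefined dictionaries
      is to compare DICTID against the Adler-32 of each candidate.  In
      this C sketch the preset_dict table, dict_adler32 and find_dict
      are hypothetical; a real format would define the table, and should
      ensure that no two of its dictionaries share an Adler-32 value,
      since otherwise the identifier would be ambiguous.

```c
#include <stddef.h>
#include <stdint.h>

/* One predefined dictionary, as specified by the enclosing format. */
struct preset_dict {
    const uint8_t *data;
    size_t len;
};

static uint32_t dict_adler32(const uint8_t *buf, size_t len)
{
    uint32_t s1 = 1, s2 = 0;
    for (size_t i = 0; i < len; i++) {
        s1 = (s1 + buf[i]) % 65521u;
        s2 = (s2 + s1) % 65521u;
    }
    return (s2 << 16) | s1;
}

/* Return the dictionary whose Adler-32 equals dictid, or NULL if none
   matches (a compliant decompressor must then report an error). */
static const struct preset_dict *find_dict(const struct preset_dict *tab,
                                           size_t n, uint32_t dictid)
{
    for (size_t i = 0; i < n; i++)
        if (dict_adler32(tab[i].data, tab[i].len) == dictid)
            return &tab[i];
    return NULL;
}
```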

   8.2. The Adler-32 algorithm

      The Adler-32 algorithm is much faster than the CRC32 algorithm yet
      still provides an extremely low probability of undetected errors.

      The modulo on 32-bit unsigned long accumulators can be delayed for
      5552 bytes, so the modulo operation time is negligible.  If the
      bytes are a, b, c, the second sum is 3a + 2b + c + 3, and so is
      position and order sensitive, unlike the first sum, which is just
      a checksum.  That 65521 is prime is important to avoid a possible
      large class of two-byte errors that leave the check unchanged.
      (The Fletcher checksum uses 255, which is not prime and which also
      makes the Fletcher check insensitive to single byte changes
      0 <-> 255.)
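
      The figure 5552 can be reproduced: starting a block with s1 and s2
      at most 65520 and adding n bytes of value 255, s2 grows by at most
      255n(n+1)/2 + n*65520, so the block is safe as long as
      255n(n+1)/2 + (n+1)*65520 fits in 32 bits, and 5552 is the largest
      such n.  The C sketch below (adler_nmax and adler32_deferred are
      hypothetical names) checks that bound and reduces modulo 65521
      only once per 5552-byte block, using uint32_t so the 32-bit
      assumption holds even where unsigned long is wider.

```c
#include <stddef.h>
#include <stdint.h>

#define ADLER_BASE 65521u
#define ADLER_NMAX 5552u   /* the value adler_nmax() derives */

/* Largest n with 255*n*(n+1)/2 + (n+1)*(BASE-1) <= 2^32 - 1. */
static unsigned long adler_nmax(void)
{
    unsigned long long n = 0;
    /* Advance while n + 1 still satisfies the bound. */
    while (255ULL * (n + 1) * (n + 2) / 2 + (n + 2) * (ADLER_BASE - 1ULL)
           <= 0xFFFFFFFFULL)
        n++;
    return (unsigned long)n;
}

/* Adler-32 with the modulo deferred to once per ADLER_NMAX bytes. */
static uint32_t adler32_deferred(const uint8_t *buf, size_t len)
{
    uint32_t s1 = 1, s2 = 0;
    while (len > 0) {
        size_t block = len < ADLER_NMAX ? len : ADLER_NMAX;
        len -= block;
        while (block--) {
            s1 += *buf++;   /* cannot overflow within one block */
            s2 += s1;
        }
        s1 %= ADLER_BASE;
        s2 %= ADLER_BASE;
    }
    return (s2 << 16) | s1;
}
```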

      The sum s1 is initialized to 1 instead of zero to make the length
      of the sequence part of s2, so that the length does not have to be
      checked separately.  (Any sequence of zeroes has a Fletcher
      checksum of zero.)
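
      The effect is easy to see on a run of zero bytes: s1 stays at 1,
      so each byte adds 1 to s2 and n zero bytes (n < 65521) give the
      checksum n*65536 + 1.  In the C sketch below, sums_with_init is a
      hypothetical helper that keeps Adler-32's modulus but lets the
      initial s1 vary; starting from 0 only mimics the Fletcher-style
      start and does not reproduce Fletcher's actual checksum.

```c
#include <stddef.h>
#include <stdint.h>

/* Adler-style sums with a chosen initial s1: 1 is Adler-32, 0 mimics a
   Fletcher-style start (same modulus 65521, for comparison only). */
static uint32_t sums_with_init(const uint8_t *buf, size_t len,
                               uint32_t init_s1)
{
    uint32_t s1 = init_s1, s2 = 0;
    for (size_t i = 0; i < len; i++) {
        s1 = (s1 + buf[i]) % 65521u;
        s2 = (s2 + s1) % 65521u;
    }
    return (s2 << 16) | s1;
}
```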

9. Appendix: Sample code

   The following C code computes the Adler-32 checksum of a data buffer.
   It is written for clarity, not for speed.  The sample code is in the
   ANSI C programming language.  Non-C users may find it easier to read
   with these hints:

      &      Bitwise AND operator.
      >>     Bitwise right shift operator. When applied to an
             unsigned quantity, as here, right shift inserts zero bit(s)
             at the left.
      <<     Bitwise left shift operator. Left shift inserts zero
             bit(s) at the right.
      ++     "n++" increments the variable n.
      %      modulo operator: a % b is the remainder of a divided by b.

      #define BASE 65521 /* largest prime smaller than 65536 */

      /*
         Update a running Adler-32 checksum with the bytes buf[0..len-1]
         and return the updated checksum.  The Adler-32 checksum should
         be initialized to 1.

         Usage example (read_buffer, original_adler and error() stand
         for application code and are not defined here):

           unsigned long adler = 1L;

           while (read_buffer(buffer, length) != EOF) {
             adler = update_adler32(adler, buffer, length);
           }
           if (adler != original_adler) error();
      */
      unsigned long update_adler32(unsigned long adler,
         unsigned char *buf, int len)
      {
        unsigned long s1 = adler & 0xffff;          /* low half: s1 */
        unsigned long s2 = (adler >> 16) & 0xffff;  /* high half: s2 */
        int n;

        for (n = 0; n < len; n++) {
          s1 = (s1 + buf[n]) % BASE;   /* running sum of the bytes */
          s2 = (s2 + s1)     % BASE;   /* running sum of the s1 values */
        }
        return (s2 << 16) + s1;
      }

      /* Return the adler32 of the bytes buf[0..len-1] */

      unsigned long adler32(unsigned char *buf, int len)
      {
        return update_adler32(1L, buf, len);
      }