
INTERNET-DRAFT                                          L. Peter Deutsch
ZLIB 3.3                                             Aladdin Enterprises
Expires: 26 Sep 1996                                    Jean-Loup Gailly
                                                                Info-ZIP
                                                             21 Mar 1996

         ZLIB Compressed Data Format Specification version 3.3

File draft-deutsch-zlib-spec-03.txt

Status of this Memo

   This document is an Internet-Draft.  Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups.  Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as ``work in progress.''

   To learn the current status of any Internet-Draft, please check the
   ``1id-abstracts.txt'' listing contained in the Internet-Drafts
   Shadow Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
   munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
   ftp.isi.edu (US West Coast).

   Distribution of this memo is unlimited.

   A pointer to the latest version of this and related documentation in
   HTML format can be found at the URL
   .

Notices

   Copyright (c) 1996 L. Peter Deutsch and Jean-loup Gailly

   Permission is granted to copy and distribute this document for any
   purpose and without charge, including translations into other
   languages and incorporation into compilations, provided that the
   copyright notice and this notice are preserved, and that any
   substantive changes or deletions from the original are clearly
   marked.

Deutsch and Gailly                                             [Page  1]

Abstract

   This specification defines a lossless compressed data format.  The
   data can be produced or consumed, even for an arbitrarily long
   sequentially presented input data stream, using only an a priori
   bounded amount of intermediate storage.  The format presently uses
   the DEFLATE compression method but can be easily extended to use
   other compression methods.  It can be implemented readily in a manner
   not covered by patents.  This specification also defines the ADLER-32
   checksum (an extension and improvement of the Fletcher checksum),
   used for detection of data corruption, and provides an algorithm for
   computing it.

Table of Contents

   1. Introduction
      1.1. Purpose
      1.2. Intended audience
      1.3. Scope
      1.4. Compliance
      1.5. Definitions of terms and conventions used
      1.6. Changes from previous versions
   2. Detailed specification
      2.1. Overall conventions
      2.2. Data format
      2.3. Compliance
   3. References
   4. Source code
   5. Security considerations
   6. Acknowledgements
   7. Authors' addresses
   8. Appendix: Rationale
   9. Appendix: Sample code

1. Introduction

   1.1. Purpose

      The purpose of this specification is to define a lossless
      compressed data format that:

          * Is independent of CPU type, operating system, file system,
            and character set, and hence can be used for interchange;

          * Can be produced or consumed, even for an arbitrarily long
            sequentially presented input data stream, using only an a
            priori bounded amount of intermediate storage, and hence can
            be used in data communications or similar structures such as
            Unix filters;

          * Can use a number of different compression methods;

          * Can be implemented readily in a manner not covered by
            patents, and hence can be practiced freely.

      The data format defined by this specification does not attempt to
      allow random access to compressed data.

   1.2. Intended audience

      This specification is intended for use by implementors of software
      to compress data into zlib format and/or decompress data from zlib
      format.

      The text of the specification assumes a basic background in
      programming at the level of bits and other primitive data
      representations.

   1.3. Scope

      This specification defines a compressed data format that can be
      used for in-memory compression of a sequence of arbitrary bytes.

   1.4. Compliance

      Unless otherwise indicated below, a compliant decompressor must be
      able to accept and decompress any data set that conforms to all
      the specifications presented here; a compliant compressor must
      produce data sets that conform to all the specifications presented
      here.

   1.5. Definitions of terms and conventions used

      byte: 8 bits stored or transmitted as a unit (same as an octet).
      (For this specification, a byte is exactly 8 bits, even on
      machines which store a character on a number of bits different
      from 8.)  See below for the numbering of bits within a byte.

   1.6. Changes from previous versions

      Version 3.1 was the first public release of this specification.
      In version 3.2, some terminology was changed and the Adler-32
      sample code was rewritten for clarity.  In version 3.3, the
      support for a preset dictionary was introduced, and the
      specification was converted to Internet-Draft style.

2. Detailed specification

   2.1. Overall conventions

      In the diagrams below, a box like this:

         +---+
         |   | <-- the vertical bars might be missing
         +---+

      represents one byte; a box like this:

         +==============+
         |              |
         +==============+

      represents a variable number of bytes.

      Bytes stored within a computer do not have a 'bit order', since
      they are always treated as a unit.  However, a byte considered as
      an integer between 0 and 255 does have a most- and least-
      significant bit, and since we write numbers with the most-
      significant digit on the left, we also write bytes with the most-
      significant bit on the left.  In the diagrams below, we number the
      bits of a byte so that bit 0 is the least-significant bit, i.e.,
      the bits are numbered:

         +--------+
         |76543210|
         +--------+

      Within a computer, a number may occupy multiple bytes.  All
      multi-byte numbers in the format described here are stored with
      the MOST-significant byte first (at the lower memory address).
      For example, the decimal number 520 is stored as:

             0        1
         +--------+--------+
         |00000010|00001000|
         +--------+--------+
          ^        ^
          |        |
          |        + less significant byte = 8
          + more significant byte = 2 x 256
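
      The byte-order convention can be sketched in C as follows.  The
      helper names read_be16, read_be32 and write_be32 are illustrative
      only and are not defined by this specification; assembling each
      value from individual bytes keeps the code independent of the host
      CPU's own byte order.

```c
#include <stdint.h>

/* Illustrative helpers (not part of this specification) for the
   MOST-significant-byte-first convention used by zlib. */

/* Read a 16-bit value stored MSB first, e.g. CMF*256 + FLG. */
static uint16_t read_be16(const unsigned char *p)
{
    return (uint16_t)(((unsigned)p[0] << 8) | p[1]);
}

/* Read a 32-bit value stored MSB first, e.g. DICTID or ADLER32. */
static uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Store a 32-bit value MSB first (most significant byte at the
   lowest address). */
static void write_be32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}
```

      Applied to the two bytes 00000010 00001000 from the diagram
      above, read_be16 returns 520.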

   2.2. Data format

      A zlib stream has the following structure:

           0   1
         +---+---+
         |CMF|FLG|   (more-->)
         +---+---+

      (if FLG.FDICT set)

           0   1   2   3
         +---+---+---+---+
         |     DICTID    |   (more-->)
         +---+---+---+---+

         +=====================+---+---+---+---+
         |...compressed data...|    ADLER32    |
         +=====================+---+---+---+---+

      Any data which may appear after ADLER32 are not part of the zlib
      stream.
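
      To make the layout concrete, the following C sketch builds a
      complete zlib stream without a preset dictionary.  Beyond this
      document, it assumes the "stored" (uncompressed) block layout of
      the DEFLATE format in reference [3]: one byte holding BFINAL = 1
      and BTYPE = 00, then LEN and its one's complement NLEN, each two
      bytes stored least-significant byte first.  The function name
      zlib_wrap_stored is hypothetical, and the header bytes 0x78 0x01
      are just one valid choice (CM = 8, CINFO = 7, FLEVEL = 0, FDICT
      clear).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Adler-32 computed directly from its definition (see ADLER32 below). */
static uint32_t adler32_simple(const unsigned char *buf, size_t len)
{
    uint32_t s1 = 1, s2 = 0;
    for (size_t i = 0; i < len; i++) {
        s1 = (s1 + buf[i]) % 65521u;
        s2 = (s2 + s1) % 65521u;
    }
    return (s2 << 16) | s1;
}

/* Hypothetical helper: wrap len (<= 65535) bytes in a zlib stream whose
   compressed data is a single DEFLATE stored block.  Returns the number
   of bytes written to out, or 0 if len or outsize is unsuitable. */
static size_t zlib_wrap_stored(const unsigned char *data, size_t len,
                               unsigned char *out, size_t outsize)
{
    size_t total = 2 + 5 + len + 4;  /* CMF FLG, block header, data, ADLER32 */
    uint32_t adler;

    if (len > 0xFFFFu || outsize < total)
        return 0;

    out[0] = 0x78;                   /* CMF: CM = 8, CINFO = 7 */
    out[1] = 0x01;                   /* FLG: FLEVEL = 0, FDICT = 0, FCHECK = 1 */

    out[2] = 0x01;                                 /* BFINAL = 1, BTYPE = 00 */
    out[3] = (unsigned char)(len & 0xFF);          /* LEN, LSB first */
    out[4] = (unsigned char)(len >> 8);
    out[5] = (unsigned char)(~len & 0xFF);         /* NLEN = ~LEN, LSB first */
    out[6] = (unsigned char)((~len >> 8) & 0xFF);
    memcpy(out + 7, data, len);

    adler = adler32_simple(data, len);             /* ADLER32, MSB first */
    out[7 + len] = (unsigned char)(adler >> 24);
    out[8 + len] = (unsigned char)(adler >> 16);
    out[9 + len] = (unsigned char)(adler >> 8);
    out[10 + len] = (unsigned char)adler;
    return total;
}
```

      For the five bytes "hello" this sketch yields the 16-byte stream
      78 01 01 05 00 FA FF 68 65 6C 6C 6F 06 2C 02 15 (hexadecimal).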

      CMF (Compression Method and flags)
         This byte is divided into a 4-bit compression method and a
         4-bit information field depending on the compression method.

            bits 0 to 3  CM     Compression method
            bits 4 to 7  CINFO  Compression info

      CM (Compression method)
         This identifies the compression method used in the file. CM = 8
         denotes the 'deflate' compression method with a window size up
         to 32K.  This is the method used by gzip and PNG (see
         references [1] and [2] in Chapter 3, below, for the reference
         documents).  CM = 15 is reserved.  It might be used in a future
         version of this specification to indicate the presence of an
         extra field before the compressed data.

      CINFO (Compression info)
         For CM = 8, CINFO is the base-2 logarithm of the LZ77 window
         size, minus eight (CINFO=7 indicates a 32K window size). Values
         of CINFO above 7 are not allowed in this version of the
         specification.  CINFO is not defined in this specification for
         CM not equal to 8.
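
      A minimal C sketch of splitting CMF into its two fields and
      deriving the window size for CM = 8 follows; the names decode_cmf
      and deflate_window_size are illustrative and not part of this
      specification.

```c
/* Fields of the CMF byte. */
struct zlib_cmf {
    unsigned cm;     /* bits 0 to 3: compression method */
    unsigned cinfo;  /* bits 4 to 7: compression info */
};

static struct zlib_cmf decode_cmf(unsigned char cmf)
{
    struct zlib_cmf f;
    f.cm = cmf & 0x0Fu;
    f.cinfo = (cmf >> 4) & 0x0Fu;
    return f;
}

/* For CM = 8 the LZ77 window is 2^(CINFO + 8) bytes.  Returns 0 when
   the method is not deflate or CINFO is above the permitted maximum 7. */
static unsigned long deflate_window_size(unsigned char cmf)
{
    struct zlib_cmf f = decode_cmf(cmf);
    if (f.cm != 8 || f.cinfo > 7)
        return 0;
    return 1UL << (f.cinfo + 8);
}
```

      For example, deflate_window_size(0x78) gives 32768, matching
      CINFO = 7.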

      FLG (FLaGs)
         This flag byte is divided as follows:

            bits 0 to 4  FCHECK  (check bits for CMF and FLG)
            bit  5       FDICT   (preset dictionary)
            bits 6 to 7  FLEVEL  (compression level)

         The FCHECK value must be such that CMF and FLG, when viewed as
         a 16-bit unsigned integer stored in MSB order (CMF*256 + FLG),
         is a multiple of 31.
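
         The FCHECK computation can be sketched in C as below; make_flg
         and header_check_ok are hypothetical names.  With CMF = 0x78 it
         reproduces FLG bytes commonly seen at the start of zlib
         streams, for example 0x9C for FLEVEL = 2.

```c
/* Build an FLG byte for the given CMF, FDICT and FLEVEL, choosing
   FCHECK (bits 0 to 4) so that CMF*256 + FLG is a multiple of 31. */
static unsigned char make_flg(unsigned char cmf, int fdict, unsigned flevel)
{
    unsigned flg = ((flevel & 3u) << 6) | (fdict ? 0x20u : 0u);
    unsigned rem = ((unsigned)cmf * 256u + flg) % 31u;

    if (rem != 0)
        flg += 31u - rem;   /* at most 30, so it fits in the 5 FCHECK bits */
    return (unsigned char)flg;
}

/* Nonzero if the two header bytes pass the FCHECK test. */
static int header_check_ok(unsigned char cmf, unsigned char flg)
{
    return ((unsigned)cmf * 256u + flg) % 31u == 0;
}
```

         For instance, 0x7880 leaves a remainder of 3 when divided by
         31, so FCHECK = 28 and FLG = 0x80 + 28 = 0x9C.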

      FDICT (Preset dictionary)
         If FDICT is set, a DICTID dictionary identifier is present
         immediately after the FLG byte.  The dictionary is a sequence of
         bytes which are initially fed to the compressor without
         producing any compressed output.  DICTID is the Adler-32
         checksum of this sequence of bytes (see the definition of
         ADLER32 below).  The decompressor can use this identifier to
         determine which dictionary has been used by the compressor.
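
         As a sketch, a decoder might extract DICTID and compare it with
         the Adler-32 checksum of each dictionary it knows.  The
         functions zlib_get_dictid and dictionary_matches are
         hypothetical; the Adler-32 routine follows the definition of
         ADLER32 given below.  Because DICTID is only a 32-bit checksum,
         a match identifies the dictionary rather than proving it.

```c
#include <stddef.h>
#include <stdint.h>

/* Adler-32 as defined for the ADLER32 field. */
static uint32_t adler32_of(const unsigned char *buf, size_t len)
{
    uint32_t s1 = 1, s2 = 0;
    for (size_t i = 0; i < len; i++) {
        s1 = (s1 + buf[i]) % 65521u;
        s2 = (s2 + s1) % 65521u;
    }
    return (s2 << 16) | s1;
}

/* Returns 1 and stores DICTID (read MSB first) if FDICT is set,
   0 if FDICT is clear, and -1 if the input is too short. */
static int zlib_get_dictid(const unsigned char *s, size_t n, uint32_t *dictid)
{
    if (n < 2)
        return -1;
    if ((s[1] & 0x20) == 0)          /* FLG bit 5 is FDICT */
        return 0;
    if (n < 6)
        return -1;
    *dictid = ((uint32_t)s[2] << 24) | ((uint32_t)s[3] << 16) |
              ((uint32_t)s[4] << 8) | (uint32_t)s[5];
    return 1;
}

/* Nonzero if dict is the dictionary that DICTID identifies. */
static int dictionary_matches(uint32_t dictid,
                              const unsigned char *dict, size_t len)
{
    return adler32_of(dict, len) == dictid;
}
```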

      FLEVEL (Compression level)
         These flags are available for use by specific compression
         methods.  The 'deflate' method (CM = 8) sets these flags as
         follows:

            0 - compressor used fastest algorithm
            1 - compressor used fast algorithm
            2 - compressor used default algorithm
            3 - compressor used maximum compression, slowest algorithm

         The information in FLEVEL is not needed for decompression; it
         is there to indicate whether recompression might be worthwhile.

      compressed data
         For compression method 8, the compressed data is stored in the
         deflate compressed data format as described in the document
         "'Deflate' Compressed Data Format Specification" by L. Peter
         Deutsch.  (See reference [3] in Chapter 3, below.)

         Other compressed data formats are not specified in this version
         of the zlib specification.

      ADLER32 (Adler-32 checksum)
         This contains a checksum value of the uncompressed data
         (excluding any dictionary data) computed according to the
         Adler-32 algorithm.  This algorithm is a 32-bit extension and
         improvement of the Fletcher algorithm, used in the ITU-T X.224 /
         ISO 8073 standard.  (See references [4] and [5] in Chapter 3,
         below.)

         Adler-32 is composed of two sums accumulated per byte: s1 is
         the sum of all bytes, s2 is the sum of all s1 values.  Both sums
         are done modulo 65521.  s1 is initialized to 1, s2 to zero.  The
         Adler-32 checksum is stored as s2*65536 + s1 in most-
         significant-byte first (network) order.
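
         A direct transcription of this definition into C follows;
         adler32_def and put_adler32_trailer are illustrative names, and
         the code favors clarity over speed.  The trailer is written s2
         first, so the high-order half of s2*65536 + s1 lands at the
         lowest address.

```c
#include <stddef.h>
#include <stdint.h>

/* s1 starts at 1, s2 at 0; each byte is added to s1, then s1 to s2,
   both modulo 65521.  The result is s2*65536 + s1. */
static uint32_t adler32_def(const unsigned char *buf, size_t len)
{
    uint32_t s1 = 1, s2 = 0;
    for (size_t i = 0; i < len; i++) {
        s1 = (s1 + buf[i]) % 65521u;
        s2 = (s2 + s1) % 65521u;
    }
    return s2 * 65536u + s1;
}

/* Store the checksum in network (MSB-first) order. */
static void put_adler32_trailer(unsigned char out[4], uint32_t adler)
{
    out[0] = (unsigned char)(adler >> 24);  /* high byte of s2 */
    out[1] = (unsigned char)(adler >> 16);  /* low byte of s2 */
    out[2] = (unsigned char)(adler >> 8);   /* high byte of s1 */
    out[3] = (unsigned char)adler;          /* low byte of s1 */
}
```

         For the nine bytes "Wikipedia", s1 ends at 920 (0x398) and s2
         at 4582 (0x11E6), so the checksum is 0x11E60398 and the trailer
         bytes are 11 E6 03 98.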

   2.3. Compliance

      A compliant compressor must produce streams with correct CMF, FLG
      and ADLER32, but need not support preset dictionaries.  When the
      zlib data format is used as part of another standard data format,
      the compressor may use only preset dictionaries that are specified
      by this other data format.  If this other format does not use the
      preset dictionary feature, the compressor must not set the FDICT
      flag.

      A compliant decompressor must check CMF, FLG, and ADLER32, and
      provide an error indication if any of these have incorrect values.
      A compliant decompressor must give an error indication if CM is
      not one of the values defined in this specification (only the
      value 8 is permitted in this version), since another value could
      indicate the presence of new features that would cause subsequent
      data to be interpreted incorrectly.  A compliant decompressor must
      give an error indication if FDICT is set and DICTID is not the
      identifier of a known preset dictionary.  A decompressor may
      ignore FLEVEL and still be compliant.  When the zlib data format
      is being used as a part of another standard format, a compliant
      decompressor must support all the preset dictionaries specified by
      the other format.  When the other format does not use the preset
      dictionary feature, a compliant decompressor must reject any
      stream in which the FDICT flag is set.
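
      The header checks required above might be sketched in C as
      follows.  The function check_zlib_header and its status codes are
      hypothetical; known_ids is assumed to hold the DICTID values of
      the preset dictionaries allowed by the enclosing format (none when
      that format does not use them).  Verifying ADLER32 can happen only
      after the compressed data has been decompressed and is not shown.

```c
#include <stddef.h>

/* Hypothetical status codes for header validation. */
enum zhdr_status {
    ZHDR_OK = 0,
    ZHDR_SHORT,        /* too few bytes for CMF/FLG (and DICTID) */
    ZHDR_BAD_FCHECK,   /* CMF*256 + FLG is not a multiple of 31 */
    ZHDR_BAD_METHOD,   /* CM is not 8 */
    ZHDR_BAD_WINDOW,   /* CINFO is above 7 */
    ZHDR_BAD_DICT      /* FDICT set but DICTID is not a known dictionary */
};

/* Validate CMF, FLG and, if present, DICTID.  On success *hdr_len is the
   number of header bytes (2, or 6 with a DICTID).  FLEVEL is ignored. */
static enum zhdr_status check_zlib_header(const unsigned char *s, size_t n,
                                          const unsigned long *known_ids,
                                          size_t n_known, size_t *hdr_len)
{
    unsigned cmf, flg;
    unsigned long id;
    size_t i;

    if (n < 2)
        return ZHDR_SHORT;
    cmf = s[0];
    flg = s[1];
    if ((cmf * 256u + flg) % 31u != 0)
        return ZHDR_BAD_FCHECK;
    if ((cmf & 0x0Fu) != 8)
        return ZHDR_BAD_METHOD;
    if ((cmf >> 4) > 7)
        return ZHDR_BAD_WINDOW;
    if ((flg & 0x20u) == 0) {            /* no preset dictionary */
        *hdr_len = 2;
        return ZHDR_OK;
    }
    if (n < 6)
        return ZHDR_SHORT;
    id = ((unsigned long)s[2] << 24) | ((unsigned long)s[3] << 16) |
         ((unsigned long)s[4] << 8) | (unsigned long)s[5];
    for (i = 0; i < n_known; i++) {
        if (known_ids[i] == id) {
            *hdr_len = 6;
            return ZHDR_OK;
        }
    }
    return ZHDR_BAD_DICT;                /* unknown, or dictionaries not allowed */
}
```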

3. References

   [1] Deutsch, L.P., "'Gzip' Compressed Data Format Specification",
       available in ftp.uu.net:/pub/archiving/zip/doc/gzip-*.doc

   [2] Boutell, T., "PNG (Portable Network Graphics) specification",
       available in ftp://ftp.uu.net/graphics/png/png*

   [3] Deutsch, L.P., "'Deflate' Compressed Data Format Specification",
       available in ftp.uu.net:/pub/archiving/zip/doc/deflate-*.doc

   [4] Fletcher, J.G., "An Arithmetic Checksum for Serial
       Transmissions", IEEE Transactions on Communications, Vol. COM-30,
       No. 1, January 1982, pp. 247-252.

   [5] ITU-T Recommendation X.224, Annex D, "Checksum Algorithms",
       November 1993, pp. 144, 145.  (Available from
       gopher://info.itu.ch.)  ITU-T X.224 is also the same as ISO 8073.

4. Source code

   Source code for a C language implementation of a 'zlib' compliant
   library is available at ftp.uu.net:/pub/archiving/zip/zlib/zlib*.

5. Security considerations

   A decoder that fails to check the ADLER32 checksum value may be
   subject to undetected data corruption.

6. Acknowledgements

   Trademarks cited in this document are the property of their
   respective owners.

   Jean-Loup Gailly and Mark Adler designed the zlib format and wrote
   the related software described in this specification.  Glenn
   Randers-Pehrson converted this document to Internet-Draft and HTML
   format.

7. Authors' addresses

   L. Peter Deutsch

      Aladdin Enterprises
      203 Santa Margarita Ave.
      Menlo Park, CA 94025

      Phone: (415) 322-0103 (AM only)
      FAX:   (415) 322-1734
      EMail: 

   Jean-loup Gailly

      EMail: 

   Questions about the technical content of this specification can be
   sent by email to

      Jean-loup Gailly  and
      Mark Adler 

   Editorial comments on this specification can be sent by email to

      L. Peter Deutsch  and
      Glenn Randers-Pehrson 

8. Appendix: Rationale

   8.1. Preset dictionaries

      A preset dictionary is especially useful for compressing short
      input sequences.  The compressor can take advantage of the
      dictionary context to encode the input in a more compact manner.
      The decompressor can be initialized with the appropriate context
      by virtually decompressing a compressed version of the dictionary
      without producing any output.  However, for certain compression
      algorithms, such as the deflate algorithm, this operation can be
      achieved without actually performing any decompression.

      The compressor and the decompressor must use exactly the same
      dictionary.  The dictionary may be fixed or may be chosen among a
      certain number of predefined dictionaries, according to the kind
      of input data.  The decompressor can determine which dictionary
      has been chosen by the compressor by checking the dictionary
      identifier.  This document does not specify the contents of
      predefined dictionaries, since the optimal dictionaries are
      application-specific.  Standard data formats using this feature
      of the zlib specification must precisely define the allowed
      dictionaries.

   8.2. The Adler-32 algorithm

      The Adler-32 algorithm is much faster than the CRC32 algorithm yet
      still provides an extremely low probability of undetected errors.

      The modulo on unsigned long accumulators can be delayed for up to
      5552 bytes, so the modulo operation time is negligible.  If the
      bytes are a, b, c, the second sum is 3a + 2b + c + 3, and so is
      position and order sensitive, unlike the first sum, which is just
      a checksum.  That 65521 is prime is important to avoid a possible
      large class of two-byte errors that leave the check unchanged.
      (The Fletcher checksum uses 255, which is not prime and which also
      makes the Fletcher check insensitive to single byte changes
      0 <-> 255.)
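
      The deferred modulo can be sketched in C as follows, assuming
      32-bit unsigned accumulators (uint32_t).  Starting from reduced
      sums below 65521 and adding at most 5552 bytes of value 255, s2
      stays below 2^32, so the remainder need only be taken once per
      5552-byte run.  The names adler32_deferred and adler32_naive are
      illustrative; the naive version reduces after every byte, as the
      sample code in Chapter 9 does, and is included only for
      comparison.

```c
#include <stddef.h>
#include <stdint.h>

#define ADLER_BASE 65521u  /* largest prime smaller than 65536 */
#define ADLER_NMAX 5552    /* bytes per run before s2 could exceed 2^32 - 1 */

/* Update an Adler-32 value, reducing the sums only once per run of at
   most ADLER_NMAX bytes. */
static uint32_t adler32_deferred(uint32_t adler, const unsigned char *buf,
                                 size_t len)
{
    uint32_t s1 = adler & 0xFFFFu;
    uint32_t s2 = (adler >> 16) & 0xFFFFu;

    while (len > 0) {
        size_t run = len < ADLER_NMAX ? len : ADLER_NMAX;
        len -= run;
        while (run--) {
            s1 += *buf++;
            s2 += s1;
        }
        s1 %= ADLER_BASE;
        s2 %= ADLER_BASE;
    }
    return (s2 << 16) | s1;
}

/* Reference version: reduce after every byte. */
static uint32_t adler32_naive(uint32_t adler, const unsigned char *buf,
                              size_t len)
{
    uint32_t s1 = adler & 0xFFFFu;
    uint32_t s2 = (adler >> 16) & 0xFFFFu;

    for (size_t i = 0; i < len; i++) {
        s1 = (s1 + buf[i]) % ADLER_BASE;
        s2 = (s2 + s1) % ADLER_BASE;
    }
    return (s2 << 16) | s1;
}
```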

      The sum s1 is initialized to 1 instead of zero to make the length
      of the sequence part of s2, so that the length does not have to be
      checked separately.  (Any sequence of zeroes has a Fletcher
      checksum of zero.)
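
      The effect of the initial value can be checked with the
      hypothetical helper two_sums below, which runs the Adler-32 sums
      with a caller-chosen starting s1.  Starting s1 at 0 only imitates
      Fletcher's behaviour on zero bytes (Fletcher also uses a different
      modulus): every run of zeroes then checks to 0.  Starting at 1, n
      zero bytes give s1 = 1 and s2 = n, so runs of different lengths
      are distinguished.  The same helper reproduces the second-sum
      formula 3a + 2b + c + 3 stated in the previous paragraph.

```c
#include <stddef.h>
#include <stdint.h>

/* Adler-32 style sums with a caller-chosen initial s1 (Adler-32 uses 1). */
static uint32_t two_sums(uint32_t s1_init, const unsigned char *buf,
                         size_t len)
{
    uint32_t s1 = s1_init, s2 = 0;
    for (size_t i = 0; i < len; i++) {
        s1 = (s1 + buf[i]) % 65521u;
        s2 = (s2 + s1) % 65521u;
    }
    return (s2 << 16) | s1;
}
```

      For example, the bytes 10 20 30 and 30 20 10 share s1 = 61 but
      give s2 = 103 and 143 respectively.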

9. Appendix: Sample code

   The following C code computes the Adler-32 checksum of a data buffer.
   It is written for clarity, not for speed.  The sample code is in the
   ANSI C programming language.  Non-C users may find it easier to read
   with these hints:

      &      Bitwise AND operator.
      >>     Bitwise right shift operator.  When applied to an
             unsigned quantity, as here, right shift inserts zero bit(s)
             at the left.
      <<     Bitwise left shift operator.  Left shift inserts zero
             bit(s) at the right.
      ++     "n++" increments the variable n.
      %      modulo operator: a % b is the remainder of a divided by b.

      #define BASE 65521 /* largest prime smaller than 65536 */

      /*
         Update a running Adler-32 checksum with the bytes buf[0..len-1]
         and return the updated checksum.  The Adler-32 checksum should
         be initialized to 1.

         Usage example:

           unsigned long adler = 1L;

           while (read_buffer(buffer, length) != EOF) {
             adler = update_adler32(adler, buffer, length);
           }
           if (adler != original_adler) error();
      */
      unsigned long update_adler32(unsigned long adler,
         unsigned char *buf, int len)
      {
        unsigned long s1 = adler & 0xffff;
        unsigned long s2 = (adler >> 16) & 0xffff;
        int n;

        for (n = 0; n < len; n++) {
          s1 = (s1 + buf[n]) % BASE;
          s2 = (s2 + s1)     % BASE;
        }
        return (s2 << 16) + s1;
      }

      /* Return the adler32 of the bytes buf[0..len-1] */

      unsigned long adler32(unsigned char *buf, int len)
      {
        return update_adler32(1L, buf, len);
      }
