idnits 2.17.1

draft-ietf-rescap-blob-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate.  This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** The document is more than 15 pages and seems to lack a Table of Contents.

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.

  ** There are 6 instances of too long lines in the document, the longest one
     being 4 characters in excess of 72.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == Line 34 has weird spacing: '...-01.txt  in an...'

  == Line 784 has weird spacing: '...abel xx conte...'

  == The document seems to lack the recommended RFC 2119 boilerplate, even if
     it appears to use RFC 2119 keywords -- however, there's a paragraph with
     a matching beginning.  Boilerplate error?

     (The document does seem to have the reference to RFC 2119 which the
     ID-Checklist requires).

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer.
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (1 March 2002) is 8085 days in the past.  Is this
     intentional?

  -- Found something which looks like a code comment -- if you have code
     sections in the document, please surround them with '<CODE BEGINS>' and
     '<CODE ENDS>' lines.


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: '0' is mentioned on line 826, but not defined

  -- Possible downref: Non-RFC (?) normative reference: ref. '2'

  -- Possible downref: Non-RFC (?) normative reference: ref. '3'

  ** Obsolete normative reference: RFC 1832 (ref. '4') (Obsoleted by RFC 4506)

  -- Possible downref: Non-RFC (?) normative reference: ref. '6'

  ** Obsolete normative reference: RFC 2234 (ref. '7') (Obsoleted by RFC 4234)


     Summary: 7 errors (**), 0 flaws (~~), 5 warnings (==), 6 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1	Network Working Group                                        Keith Moore
2	Internet-Draft                                   University of Tennessee
3	Expires: 1 September 2002                                   1 March 2002

5	          The Binary Low-Overhead Block Presentation Protocol

7	                     draft-ietf-rescap-blob-01.txt

9	Status of this Memo

11	This document is an Internet-Draft and is in full conformance with all
12	provisions of Section 10 of RFC2026.

14	Internet-Drafts are working documents of the Internet Engineering Task
15	Force (IETF), its areas, and its working groups.  Note that other groups
16	may also distribute working documents as Internet-Drafts.

18	Internet-Drafts are draft documents valid for a maximum of six months
19	and may be updated, replaced, or obsoleted by other documents at any
20	time.  It is inappropriate to use Internet-Drafts as reference material
21	or to cite them other than as "work in progress."

23	The list of current Internet-Drafts can be accessed at
24	http://www.ietf.org/ietf/1id-abstracts.txt

26	The list of Internet-Draft Shadow Directories can be accessed at
27	http://www.ietf.org/shadow.html

29	This document is being submitted as a contribution to the IETF rescap
30	working group.  Comments regarding this internet-draft should be sent to
31	the rescap mailing list at rescap@cs.utk.edu, or to the author at the
32	address listed below.  Requests to subscribe to the rescap mailing list
33	should be sent to rescap-REQUEST@cs.utk.edu.  Please include the
34	document identifier draft-ietf-rescap-blob-01.txt  in any comments.

36	Known errata of this specification, as well as sample code, will be made
37	available at http://www.cs.utk.edu/~moore/blob/

39	This Internet-Draft will expire on 1 September 2002.

41	ABSTRACT

43	This memo describes the Binary Low-Overhead Block (BLOB) protocol for
44	on-the-wire presentation of data in the context of higher-level
45	protocols.  BLOB is designed to encode and decode data with low overhead
46	on most CPUs, to be reasonably space-efficient, and for its
47	representation to be sufficiently precise that it is suitable as a
48	canonical format for digital signatures.

50	1. Introduction

52	When designing application-layer protocols there is sometimes a need to
53	have an efficient means of encoding protocol elements or protocol data
54	units.  Existing solutions in this space may be deemed inadequate, for
55	various reasons.  For example:

57	-    ASN.1 [2] and BER [3] are baroque both in terms of the abstract
58	     syntax and available on-the-wire representations, and complex to
59	     implement.

61	-    ONC XDR [4] requires a stub generator and support libraries which
62	     are not easily available on all platforms, and there are subtle
63	     differences between the APIs provided by different implementations.
64	     XDR is large enough that it's not usually feasible to write your
65	     own implementation, and it's difficult to write portable code that
66	     can work with the various implementations that are deployed.  Many
67	     XDR implementations have significant unnecessary processing
68	     overhead.  This impairs performance of applications based on XDR
69	     and gives the protocol itself a worse reputation than it otherwise
70	     deserves.

72	-    The design of MIME [5] was heavily influenced by the need to be
73	     able to operate over existing text-based mail systems which imposed
74	     a number of constraints.  This worked out well for email, but for
75	     other applications, MIME is neither efficient in terms of storage
76	     density nor easy to parse.

78	-    XML [6] is easier to parse than MIME, but still requires
79	     significant processing overhead.  There is also a large and growing
80	     body of "culture" regarding how XML should be used, which
81	     paradoxically imposes a significant barrier to use of XML.  (To be
82	     fair, MIME also has a fair amount of "culture" associated with it.)
83	     Finally, for small and regular data structures XML imposes a lot of
84	     overhead.

86	BLOB was designed to serve as an alternative to these presentation
87	layers for use in representing relatively simple structures, consisting
88	of a limited set of primitive data types, and where the structures can
89	reasonably be contained within a single protocol data unit.

91	BLOB is designed with the following considerations:

93	-    It should be easy and efficient to generate the encoded form.

95	-    The encoded form should require minimal processing to decode,
96	     ideally being usable in-place (without allocating memory or
97	     copying) on most platforms.

99	-    It should be easy to write programs which manipulate and exchange
100	     blobs, without needing significant external support in the form of
101	     libraries or stub generators.

103	-    The structure should be easy and efficient to verify for internal
104	     consistency.

106	-    For any structure to be represented there should be a unique
107	     (canonical) on-the-wire encoding which is always used.

109	-    It should be reasonably space-efficient.  However, this is
110	     secondary to minimizing processing overhead.

112	The BLOB approach is more feasible now than in years past because data
113	representations have become more uniform across different computing
114	platforms.  Essentially all widely-used computers now support 32-bit
115	integers, can address 32-bit integers which are not aligned on any
116	larger boundary, use word sizes which are a multiple of 8 bits, and can
117	directly address strings of 8-bit characters which are not aligned on
118	any boundary larger than an octet.  Such computers are termed "well-
119	behaved" with respect to BLOB.  BLOB is designed to be usable on
120	machines which do not have these characteristics, but such machines will
121	necessarily incur more data conversion overhead.

123	1.1. Notation

125	The word BLOB in upper case letters is used to refer to the protocol;
126	that is, the algorithm used to define the encoding and decoding of data
127	structures defined in this memo.  The word "blob" in lower case letters
128	refers to a data structure (sequence of octets) that has been produced
129	by, or can be decoded by, the BLOB protocol.

131	The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
132	"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
133	document, when spelled entirely in upper case letters, are to be
134	interpreted as described in [1].

136	2. BLOB Overview

138	A "blob" is a linear (octet-stream) encoding of some data structure,
139	which is used as a protocol data unit within some application.  The
140	structure encoded by a blob is a collection of "components".  Each of
141	the components of a blob is either a "scalar" (meaning that the
142	component consists of exactly one instance of that data type) or an
143	"array" (meaning that the component consists of a sequence of zero or
144	more "elements" of a uniform data type).

146	The data types which can appear as components of a blob are: unsigned
147	integer (32 bits in length), string (a variable-length sequence of
148	octets with arbitrary values), or blob.  Any of these types can occur as
149	a scalar or in an array.

151	Since one blob can contain other blobs, complex nesting of structures is
152	possible.  However the blob encoder and decoder treat "embedded" blobs
153	(blobs which occur as components of an outer blob) as opaque structures.
154	For example, embedded blobs are not automatically decoded along with
155	outer blobs, and a formatting error in an embedded blob does not create
156	a formatting error for any blob that contains it.

158	"Variable-length" here means that the lengths of arrays need not be pre-
159	determined by the protocol using BLOB.  The maximum lengths of strings
160	and arrays are constrained by the use of a 32-bit unsigned integer for
161	the length of the blob, and the representation of offsets of data
162	relative to the start of the blob as 32-bit unsigned integers.  Lengths
163	may be further constrained by the higher-level protocol's choice of
164	transmission medium - for instance, if the blob must fit into a UDP
165	datagram.  The number of arrays of each data type is limited to 255 (by
166	the 8-bit counts in array_counts_and_flags), but this should be adequate
167	for most data structures needed in network protocols.

169	2.1 Use of Data Types Not Supported by BLOB

171	The primitive types (unsigned 32-bit integer and octet string) were
172	chosen because they represent the majority of data types used in network
173	protocols, they are directly supported by most computer hardware, and
174	because data types outside of this set are often specific to the higher-
175	level protocol anyway.  Having a small set of data types allows BLOB to
176	be a compact yet self-describing encoding, which is efficient to decode
177	and which does not require separate marshaling routines for each
178	protocol data unit used by an application.  A few additional types (in
179	particular, single- and double-precision floating point) are being
180	considered for future versions of BLOB.  The BLOB protocol is intended
181	to allow new primitive types to be added without changing the format of
182	blobs that do not include these types.

184	When a higher-level protocol needs to use a data type that is not
185	directly supported by BLOB, such data must be represented in terms of
186	the available types. The higher-level protocol specification must define
187	the representation of such data in terms of types supported by BLOB, and
188	the conversion between the blob representation and the native format
189	must be explicitly managed by the applications.  For instance:

191	-    A signed 32-bit integer may be transmitted as an unsigned 32-bit
192	     integer by encoding the signed integer in two's-complement format.
193	     On most modern machines no conversion will be necessary; however on
194	     machines for which the smallest integer representation is larger
195	     than 32 bits it will be necessary for the application to sign-
196	     extend the result.

198	-    A 64-bit integer may be transmitted as two consecutive 32-bit
199	     integers (with the most significant word first), which would
200	     require that the receiving application arrange those two integers
201	     according to its native byte ordering.  Alternatively a 64-bit
202	     integer may be transmitted as eight consecutive octets within a
203	     string (most significant byte first), which would require that the
204	     receiving application re-arrange those octets according to its
205	     local byte ordering.

207	-    A multi-dimensional array may be represented as a single-
208	     dimensional array with the dimensions of the array passed as
209	     separate integer components.

211	-    In the current version of BLOB, floating point numbers may be
212	     encoded in IEEE format and transmitted as either integers (modulo
213	     sign-extension issues) or strings (modulo alignment issues).
214	     Future versions of BLOB may support floating point numbers
215	     directly.

217	-    A small dense set may be represented as bits within a scalar
218	     integer.  A larger dense set may be encoded using individual bits
219	     of the elements of an integer array.
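As an illustration only, the first two conversions above might look like the following Python sketch.  The function names are hypothetical and not part of BLOB; the sketch assumes the application holds values as Python integers that are taken from, or destined for, BLOB unsigned integer components.

```python
MASK32 = 0xFFFFFFFF


def signed_to_u32(value: int) -> int:
    """Encode a signed 32-bit integer in two's-complement unsigned form."""
    if not -2**31 <= value < 2**31:
        raise ValueError("value does not fit in 32 bits")
    return value & MASK32


def u32_to_signed(word: int) -> int:
    """Sign-extend a received unsigned 32-bit word back to a signed value."""
    return word - 2**32 if word & 0x80000000 else word


def split_u64(value: int) -> tuple[int, int]:
    """Split a 64-bit unsigned integer into (most, least) significant words."""
    if not 0 <= value < 2**64:
        raise ValueError("value does not fit in 64 bits")
    return value >> 32, value & MASK32


def join_u64(high: int, low: int) -> int:
    """Reassemble two consecutive 32-bit components, most significant first."""
    return (high << 32) | low
```

For the eight-octet string alternative, Python's built-in value.to_bytes(8, "big") and int.from_bytes(octets, "big") perform the same most-significant-byte-first conversion.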

221	3. BLOB Organization

223	At the most basic level, the blob consists of an integer portion
224	followed by an opaque portion.  The integer portion is a sequence of
225	unsigned 32-bit (4-octet) quantities, represented on-the-wire in network
226	byte ("big-endian") order.  The opaque portion is a sequence of 8-bit
227	(1-octet) quantities.

229	The blob is separated into opaque and integer portions in order to
230	facilitate efficient decoding on little-endian machines, or on any
231	machine with a word size other than 32 bits.  Having all of the integers
232	within a blob co-located in a contiguous area allows an implementation
233	to efficiently convert all of the integers to local format at the same
234	time.  Strings of octets are assumed to have the same representation on
235	all platforms, so conversion is unlikely to be needed for the opaque
236	portion.

238	The integer portion of a blob is further divided into a header, a list
239	of array bases, and an integer pool.  The header is used to store
240	various data needed to decode the blob and check it for consistency.
241	The array bases portion contains the offsets (positions relative to the
242	start of the blob) of each of the arrays in the blob (including the
243	arrays used to store scalar components).  The integer pool is used for
244	storing integer data as well as the offsets of embedded blobs and
245	strings.

247	The opaque portion is divided into a blob pool and a string pool.  The
248	blob pool is used to store embedded blobs; the string pool is used to
249	store strings.  The blob pool occurs immediately following the integer
250	pool in order to ensure that embedded blobs are always aligned on a
251	four-octet boundary (relative to the start of the blob).

253	Each embedded blob is padded with 0-3 zero octets until its length is an
254	exact multiple of 4 octets.  This ensures that all embedded blobs are
255	aligned to 4-octet boundaries, allowing the blob decoder to assume (if
256	the outer blob is on an aligned boundary) that each of the embedded
257	blobs is also aligned.

259	Each string is padded with a single octet with a value of zero, which is
260	not part of the string.  This is for convenience when strings are used
261	to store character data, with programming languages that use a zero-
262	valued octet as a string terminator.

264	Embedded blobs are opaque to their enclosing blob and are NOT
265	automatically parsed or decoded when the outer blob is decoded.  If the
266	receiving application wishes to examine contents of an inner blob, it
267	must decode it separately from the enclosing blob.

269	A blob can have both scalar and array components.  For simplicity in
270	decoding and to eliminate some edge cases, all of the scalar integers of
271	a blob are stored in a "scalar integer array" which immediately follows
272	the last integer array component of the blob.  Similarly, all of the
273	scalar (embedded) blob components are stored in a "scalar blob array"
274	which immediately follows the last blob array component, and all of the
275	scalar string components are stored in a "scalar string array" which
276	follows the last string array component.

278	3.1 Representation of data types

280	In general, all components of a blob are elements of an array.  A
281	distinguished array of each type is used to store scalar components of
282	that type.  The base of any array (whether it is a numbered array
283	component or an array used to hold scalar components) is stored in
284	array_bases; its position there follows from array_counts_and_flags.

286	Since strings (and blobs) can be of varying length, an array of strings
287	(or blobs) is represented internally by an array of integers.  Each of
288	these integers indicates the storage location (within the blob) of the
289	contents of the string or blob.  These integers are consecutive; the
290	offset of element 2 of an array immediately follows the offset of
291	element 1.  Similarly, the array elements occupy consecutive storage -
292	the storage occupied by string 3 of an array immediately follows that
293	occupied by string 2.  This allows the number of elements in array N
294	to be computed by subtracting its base from that of the following array
295	and dividing by four; this works for any numbered array.  It also allows
296	the length of element M to be computed by subtracting its offset from
297	that of the following element; this works for elements (within bounds)
298	of numbered arrays.  The last scalar blob and the last scalar string are
299	boundary cases which require an explicit test to determine their length.
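A minimal Python sketch of the size rule follows.  It assumes the decoder already holds every array_bases value as a native integer, in the order given in Section 3.2, and it assumes that the last array in that list (the scalar string array) ends at blob_pool_offset, the end of the integer pool.  The function name is hypothetical.

```python
def array_element_counts(bases: list[int], blob_pool_offset: int) -> list[int]:
    """Return the number of 32-bit elements stored in each array.

    bases holds every array base in array_bases order.  Because arrays are
    stored contiguously in the integer pool, each one ends where the next
    begins; the final array ends at blob_pool_offset.
    """
    ends = bases[1:] + [blob_pool_offset]
    counts = []
    for base, end in zip(bases, ends):
        # Bases must be non-decreasing and a whole number of words apart.
        if end < base or (end - base) % 4:
            raise ValueError("inconsistent array bases")
        counts.append((end - base) // 4)
    return counts
```

For example, a blob with one two-element integer array, one scalar integer, no embedded blobs and one scalar string has bases [36, 44, 48, 48] and a blob_pool_offset of 52, giving counts [2, 1, 0, 1].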

301	The individual components of a blob are encoded as follows:

303	3.1.1 integers and integer arrays

305	An unsigned integer is represented as a 32-bit quantity in big-endian
306	format.  All integer components appear in the integer_pool section of a
307	blob.

309	An integer array is represented as zero or more contiguous 32-bit
310	integers, that are stored within the integer_pool section of the blob.
311	The location (or "base") of the array relative to the start of the blob
312	is stored as a 32-bit integer offset.  The base of this array is stored
313	in the array_bases portion of the blob.

315	Scalar integer components of a blob are encoded in a scalar integer
316	array.  The storage for the elements of this array is in the integer
317	pool, and immediately follows the storage used by the last numbered
318	integer array.  The offset of the scalar integer array appears in the
319	array_bases portion of the blob.

321	3.1.2 (embedded) blobs and blob arrays

323	An embedded blob component is represented as a series of octets which is
324	an integral multiple of four octets long.  The storage for embedded
325	blobs is taken from the blob pool of the enclosing blob.  An integer
326	offset (relative to the beginning of the blob) indicates the starting
327	location of the embedded blob.  For scalar embedded blob components
328	these offsets are encoded in a scalar blob array.  This array (of blob
329	offsets) is stored in the integer pool and immediately follows the
330	offsets of the numbered blob arrays.

332	A blob array is represented as an integer base (stored in array_bases)
333	which points to an array of integers (stored in the integer pool), each
334	element of which is the offset of a blob (within the blob pool).

336	Each embedded blob (within the blob pool) is followed by from 0-3 octets
337	with the value zero, so that any subsequent blob will be aligned on a
338	four-octet boundary.  These padding octets are not considered part of
339	the blob; however, the length of the inner blob (as seen from the
340	enclosing blob) will include any padding.
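A decoder can recover each embedded blob from consecutive offsets in the same way.  The Python sketch below is illustrative only: the function name is hypothetical, and it assumes the caller has already gathered every blob offset in storage order (blob array elements, then scalar blobs) and that the last embedded blob ends at string_pool_offset, where the blob_pool ends.

```python
def embedded_blobs(blob: bytes, offsets: list[int],
                   string_pool_offset: int) -> list[bytes]:
    """Slice each embedded blob out of the enclosing blob's blob_pool.

    Each slice runs from its offset to the next offset (the last one runs
    to string_pool_offset), so it includes the 0-3 padding octets.
    """
    ends = offsets[1:] + [string_pool_offset]
    inner = []
    for start, end in zip(offsets, ends):
        if start % 4 or end < start or end > len(blob):
            raise ValueError("misaligned or out-of-range blob offset")
        inner.append(blob[start:end])
    return inner
```

Each slice is still opaque to the enclosing blob; to examine it, the application decodes it as a separate blob, whose own blob_length excludes the padding.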

342	3.1.3 strings and string arrays

344	A string is represented as a sequence of octets; these octets may have
345	arbitrary values.  The contents of strings are stored in the
346	string_pool.  An integer offset (stored in integer_pool) indicates the
347	location of the contents of the string.

349	A string array is represented as an integer base (stored in array_bases)
350	which points to an array of integers (stored in the integer pool), each
351	element of which indicates the offset of a string (stored in the
352	string_pool).

354	Each string is followed in the string_pool by a zero octet which is not
355	part of the string.  Thus the length of any string (other than the last
356	scalar string component) can be calculated by subtracting its offset
357	from the offset of the subsequent string and then subtracting 1.

359	Strings can be of zero length, in which case the corresponding offset
360	points to a zero octet which is immediately followed by the next string
361	in the string_pool.
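The length rule can be sketched in Python as follows.  This is illustrative, not normative: the function names are hypothetical, and the sketch assumes the offsets are listed in storage order (string array elements, then scalar strings) and that the string_pool ends exactly at blob_length, so the zero octet of the last scalar string is the final octet of the blob.

```python
def string_lengths(offsets: list[int], blob_length: int) -> list[int]:
    """Length of each string, excluding its trailing zero octet."""
    ends = offsets[1:] + [blob_length]  # last string: explicit boundary case
    return [end - start - 1 for start, end in zip(offsets, ends)]


def get_string(blob: bytes, offset: int, length: int) -> bytes:
    """Return one string after checking for its terminating zero octet."""
    if offset + length >= len(blob) or blob[offset + length] != 0:
        raise ValueError("string is not followed by a zero octet")
    return blob[offset:offset + length]
```

With a string_pool of b"ab\x00\x00xyz\x00" starting at offset 40, the offsets [40, 43, 44] give lengths [2, 0, 3]; the empty second string's offset points directly at a zero octet, which is immediately followed by the next string.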

363	3.2 Structure of a blob

365	The structure of a blob is as follows:

367	       octet offset                name

369	                  0 +--------------------------------+ \
370	                    |          blob_length           | |
371	                  4 +--------------------------------+ |
372	                    |      integer_pool_offset       | |
373	                  8 +--------------------------------+ |
374	                    |        blob_pool_offset        | |
375	                 12 +--------------------------------+ |
376	                    |      string_pool_offset        | |
377	                 16 +--------------------------------+ |
378	                    |     array_counts_and_flags     | |
379	                 20 +--------------------------------+ + integer portion
380	                    :                                : |
381	                    :          array_bases           : |
382	                    :                                : |
383	integer_pool_offset +--------------------------------+ |
384	                    :                                : |
385	                    :          integer_pool          : |
386	                    :                                : /
387	   blob_pool_offset +--------------------------------+ \
388	                    :                                : |
389	                    :            blob_pool           : |
390	                    :                                : |
391	 string_pool_offset +--------------------------------+ + opaque portion
392	                    :                                : |
393	                    :           string_pool          : |
394	                    :                                : |
395	        blob_length +--------------------------------+ /

397	For this version of the BLOB protocol, the integer portion begins at
398	offset 0 and is blob_pool_offset octets in length.  The opaque portion
399	begins at blob_pool_offset and is (blob_length - blob_pool_offset)
400	octets in length.

402	Future versions of the BLOB protocol may add additional pools for other
403	data types, and therefore may change these formulas.  BLOB decoder
404	implementations MUST therefore decode 'array_counts_and_flags' (see
405	below) and verify that the flags portion of this field is equal to zero,
406	before translating the remainder of the integer portion to the format
407	used by the local machine.
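The following Python sketch shows one way a decoder might honor that ordering requirement.  It is a sketch under stated assumptions, not a reference implementation: the function name is hypothetical, it checks only the header fields it needs, and it relies on the standard struct module to convert the whole integer portion from network byte order in a single call.

```python
import struct


def split_blob(blob: bytes) -> tuple[list[int], bytes]:
    """Return (integer portion as native ints, opaque portion) of a blob."""
    if len(blob) < 20:
        raise ValueError("blob shorter than its 20-octet header")
    (blob_length, _integer_pool_offset, blob_pool_offset,
     _string_pool_offset, counts_and_flags) = struct.unpack_from(">5I", blob, 0)
    # Check the flags octet first: a nonzero value means an extension whose
    # layout this decoder does not understand.
    if counts_and_flags >> 24:
        raise ValueError("unsupported BLOB extension flags")
    if blob_pool_offset % 4 or not 20 <= blob_pool_offset <= blob_length <= len(blob):
        raise ValueError("inconsistent blob_length or blob_pool_offset")
    # Convert the entire integer portion from network byte order in one call.
    words = list(struct.unpack_from(">%dI" % (blob_pool_offset // 4), blob, 0))
    return words, blob[blob_pool_offset:blob_length]
```

Slicing the opaque portion at blob_length rather than at the end of the buffer means an embedded blob can be passed in together with its trailing padding octets.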

409	The following paragraphs describe the fields within a blob:

411	blob_length
412	     The blob_length is the length of the entire blob in octets.  The
413	     length includes the space occupied by blob_length.  blob_length
414	     does not include any padding which is added to make an embedded
415	     blob a multiple of four octets long.

417	integer_pool_offset
418	     The integer_pool_offset is the octet offset (relative to the start
419	     of the blob) of the integer_pool field of the blob.
420	     integer_pool_offset MUST be a multiple of four, greater than or
421	     equal to 24, and less than or equal to blob_pool_offset.  If the
422	     length of integer_pool is zero, integer_pool_offset will be equal
423	     to blob_pool_offset.

425	blob_pool_offset
426	     The blob_pool_offset is the offset (relative to the start of the
427	     blob) of the blob_pool field of the blob.  blob_pool_offset MUST be
428	     a multiple of four, greater than or equal to integer_pool_offset,
429	     and less than or equal to string_pool_offset.  If the length of the
430	     blob_pool is zero, blob_pool_offset will be equal to
431	     string_pool_offset.

433	string_pool_offset
434	     The string_pool_offset is the offset (relative to the start of the
435	     blob) of the string_pool portion of the blob.  It MUST be a
436	     multiple of four, greater than or equal to blob_pool_offset, and
437	     less than or equal to blob_length.  If the length of the
438	     string_pool is zero, string_pool_offset will be equal to
439	     blob_length.
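The constraints on the three pool offsets translate directly into checks.  A minimal Python sketch, assuming the blob is held in a bytes object; the function name and the returned dictionary are illustrative choices, not part of the protocol.

```python
import struct

HEADER_NAMES = ("blob_length", "integer_pool_offset", "blob_pool_offset",
                "string_pool_offset", "array_counts_and_flags")


def check_header(blob: bytes) -> dict:
    """Decode the five header words and apply the offset constraints."""
    if len(blob) < 20:
        raise ValueError("blob shorter than its 20-octet header")
    h = dict(zip(HEADER_NAMES, struct.unpack_from(">5I", blob, 0)))
    ipo = h["integer_pool_offset"]
    bpo = h["blob_pool_offset"]
    spo = h["string_pool_offset"]
    if ipo % 4 or bpo % 4 or spo % 4:
        raise ValueError("pool offsets must be multiples of four")
    # 24 <= integer_pool_offset <= blob_pool_offset
    #    <= string_pool_offset <= blob_length
    if not 24 <= ipo <= bpo <= spo <= h["blob_length"]:
        raise ValueError("pool offsets out of order")
    if h["blob_length"] > len(blob):
        raise ValueError("blob_length exceeds the data received")
    return h
```

The received buffer may be longer than blob_length (for example, an embedded blob followed by its padding), which is why the last check is an inequality.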

441	array_counts_and_flags
442	     The array_counts_and_flags field indicates how many numbered arrays
443	     of each type are contained within the blob.  This field is
444	     calculated as follows:

446	          array_counts_and_flags = (num_int_arrays) +
447	                                   (num_blob_arrays << 8) +
448	                                   (num_string_arrays << 16) +
449	                                   (flags << 24)

451	     where num_xxx_arrays is the number of arrays of type xxx.

453	     The "flags" portion of this field is used to indicate extensions to
454	     this format.  Blobs that do not use these extensions will have a
455	     flags field of zero.  For this version of the BLOB protocol, the
456	     flags field MUST be zero.
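A direct Python transcription of the formula, with the range checks an encoder would presumably want (each count must fit in eight bits).  The function names are hypothetical.

```python
def pack_counts(num_int_arrays: int, num_blob_arrays: int,
                num_string_arrays: int, flags: int = 0) -> int:
    """Combine the four 8-bit fields into array_counts_and_flags."""
    for n in (num_int_arrays, num_blob_arrays, num_string_arrays, flags):
        if not 0 <= n <= 255:
            raise ValueError("each field must fit in one octet")
    return (num_int_arrays
            + (num_blob_arrays << 8)
            + (num_string_arrays << 16)
            + (flags << 24))


def unpack_counts(value: int) -> tuple[int, int, int]:
    """Split array_counts_and_flags, rejecting nonzero flags."""
    if value >> 24:
        raise ValueError("nonzero flags: unsupported BLOB extension")
    return value & 0xFF, (value >> 8) & 0xFF, (value >> 16) & 0xFF
```

Because each count occupies its own octet, the additions in the formula never carry into a neighboring field, so a bitwise OR would give the same result.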

458	array_bases
459	     The array_bases field contains the bases (offsets relative to the
460	     start of the blob) of each of the arrays in the blob, including
461	     those arrays which contain the scalar components of the blob (using
462	     separate arrays for scalar integer, blob, and string components).
463	     Specifically the array_bases field contains, in order:

465	     1.   The base of each integer array.  There are num_int_arrays
466	          (possibly zero) of these.

468	     2.   The base of the scalar integer array.  This base is always
469	          present, even if there are no scalar integer components.  If
470	          there are no scalar integer components of the blob, the scalar
471	          integer array base will be the same as the base of blob array
472	          0.  (If there are no blob arrays in the blob, the base of the
473	          scalar integer array will be the same as the base of the
474	          scalar blob array.)

476	     3.   The base of each blob array.  There are num_blob_arrays
477	          (possibly zero) of these.

479	     4.   The base of the scalar blob array.  This base is always
480	          present.  If there are no embedded scalar blob components in
481	          the blob, the scalar blob array base will have the same value
482	          as the base of string array 0.  (If there are no string arrays
483	          in this blob, this offset will be the same as the base of the
484	          scalar string array.)

486	     5.   The base of each string array.  There are num_string_arrays
487	          (possibly zero) of these.

489	     6.   The base of the scalar string array.  If there are no scalar
490	          string components of the blob, the base of the scalar string
491	          array will be equal to blob_pool_offset.

493	     7.   Any additional bases of arrays, or offsets of scalar
494	          components, which might be defined by future versions of this
495	          protocol.  The presence of additional data types not supported
496	          in this version of the BLOB protocol will be indicated by a
497	          nonzero value in the flags portion of the
498	          array_counts_and_flags field.
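Since the position of every base in array_bases follows from the three counts, a decoder can compute where to look without scanning.  A minimal Python sketch with a hypothetical helper, assuming (as items 2 and 4 state for their arrays) that all three scalar-array bases are always present.  Index 0 is the word at octet offset 20, so slot k lies at octet offset 20 + 4k.

```python
def base_slots(num_int_arrays: int, num_blob_arrays: int,
               num_string_arrays: int) -> dict:
    """Index of each base within array_bases, in the order listed above.

    Numbered arrays get a range of indices; each scalar array gets one.
    """
    i = num_int_arrays                 # scalar integer array slot
    b = i + 1 + num_blob_arrays        # scalar blob array slot
    s = b + 1 + num_string_arrays      # scalar string array slot
    return {
        "int_arrays": range(0, i),
        "scalar_int": i,
        "blob_arrays": range(i + 1, b),
        "scalar_blob": b,
        "string_arrays": range(b + 1, s),
        "scalar_string": s,
        "total": s + 1,
    }
```

With no numbered arrays at all, array_bases still holds three words under this assumption, so it ends at octet offset 32.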

500	integer_pool
501	     The integer_pool contains 32-bit integers, assumed to be unsigned.
502	     These may be scalar integers, elements of integer arrays, offsets
503	     of scalar blobs or strings, or offsets of the elements of blob or
504	     string arrays.  The integers within the integer_pool MUST appear
505	     in the following order:

507	     1.   The elements of integer arrays.  The integer array components
508	          appear in order, and within each array, the elements appear in
509	          order.  The arrays and their elements are numbered from zero.
510	          Thus the 0th element of the 1st integer array immediately
511	          follows the last element of the 0th integer array.

513	     2.   The elements of the scalar integer array.  Thus integer scalar
514	          component 0 immediately follows the last element of the last
515	          integer array; followed by integer scalar component 1, etc.
516	          (If there are no integer arrays, the offset of integer scalar
517	          0 is integer_pool_offset.)

519	     3.   The offsets of elements of blob arrays.  Each blob offset MUST
520	          be an integral multiple of four, and each blob offset MUST
521	          point into the blob_pool.  The offset of element 0 of blob
522	          array 0 MUST be equal to blob_pool_offset.  Each subsequent
523	          element of a blob array MUST have an offset equal to the
524	          offset of the preceding blob plus the declared length of the
525	          preceding blob (after padding).

527	          NOTE: The data within an embedded blob is considered opaque to
528	          the enclosing blob; the only reason for separating blobs from
529	          strings is to ensure padding of blobs to 4-octet boundaries.
530	          Blob encoders SHOULD NOT insist that the length field of an
531	          embedded blob is consistent with the length declared for that
532	          blob, and blob decoders SHOULD NOT check the length fields of
533	          embedded blobs when decoding the enclosing blob.

535	     4.   The offsets of elements of the scalar blob array.  Each blob
536	          offset MUST be an integral multiple of four, and MUST point
537	          into the blob_pool.  Scalar blob component 0 MUST immediately
538	          follow the (padded) last element of the last blob array.
539	          (If there are no blob arrays, the offset of scalar blob
540	          component 0 is blob_pool_offset.)  Each subsequent scalar blob
541	          component MUST have an offset equal to the offset of the
542	          preceding blob plus the length of the preceding blob (after
543	          padding).

545	     5.   The offsets of elements of string arrays.  These offsets MUST
546	          point into the string_pool.  Element 0 of string array 0 MUST
547	          have an offset equal to string_pool_offset, and each
548	          subsequent string MUST have an offset equal to the preceding
549	          string's offset, plus the length of the preceding string, plus
550	          1 (for the trailing zero octet).

552	     6.   The offsets of elements of the scalar string array.  These
553	          offsets MUST point into the string_pool.  Scalar string
554	          component 0 MUST have an offset equal to the offset of the
555	          last element of the last string array, plus the length of that
556	          string, plus 1 (for the trailing zero octet).  (If there are
557	          no string arrays, it is string_pool_offset.)
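The string offset rules in items 5 and 6 can be sketched in C as follows. This is a minimal sketch under stated assumptions, not the sample code mentioned in Appendix C: the hypothetical function blob_string_offsets() assumes the caller already knows the length of every string, listed in string_pool order (all string array elements first, then the scalar strings), and that string_pool_offset has already been computed. It performs no overflow checks.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of BLOB): compute the offset of every
 * string in string_pool order, following rules 5 and 6.  lens[i] is the
 * length of string i, excluding the appended zero octet.  Returns the
 * offset just past the last string's zero octet.
 */
uint32_t blob_string_offsets(uint32_t string_pool_offset,
                             const uint32_t *lens, size_t n,
                             uint32_t *offsets)
{
    uint32_t off = string_pool_offset;   /* element 0 of string array 0 */
    for (size_t i = 0; i < n; i++) {
        offsets[i] = off;
        off += lens[i] + 1;              /* string plus trailing zero octet */
    }
    return off;
}
```

With the strings of Appendix A ("a", "b", "cc", "dd", "ee", then the scalar "string") and a string_pool_offset of 0x5c, this yields offsets 0x5c, 0x5e, 0x60, 0x63, 0x66 and 0x69 and returns 0x70, which matches blob_length in that dump.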

559	blob_pool
560	     The blob_pool contains structures which are encoded in blob format.
561	     These structures may be scalar blob components of the outer blob,
562	     or elements of blob arrays of the outer blob.  The contents
563	     of blob_pool appear in the following order:

565	     1.   The contents of each element of each blob array.  Element 0 of
566	          blob array 0 appears first, followed by element 1 of blob
567	          array 0, etc.

569	     2.   The contents of each element of the scalar blob array, used to
570	          store scalar (embedded) blob components of the outer blob.

572	     Each blob in the blob pool MUST be padded with from zero to three
573	     octets, each with a value of zero, so that the length of each blob
574	     is an exact multiple of four octets.
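The padding rule, together with the blob offset rules in items 3 and 4 of the integer_pool description, might be expressed as below. This is a minimal sketch assuming the declared length of each embedded blob is already known and that blob_pool_offset is a multiple of four; all function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Round a length up to the next multiple of four octets. */
uint32_t blob_pad4(uint32_t len)
{
    return (len + 3u) & ~(uint32_t)3u;
}

/* Number of zero octets (0 to 3) to append after a blob of length len. */
uint32_t blob_pad_octets(uint32_t len)
{
    return blob_pad4(len) - len;
}

/*
 * Hypothetical helper: compute the offset of every embedded blob in
 * blob_pool order (blob array elements first, then scalar blobs).
 * Returns the offset just past the last padded blob.
 */
uint32_t blob_blob_offsets(uint32_t blob_pool_offset,
                           const uint32_t *lens, size_t n,
                           uint32_t *offsets)
{
    uint32_t off = blob_pool_offset;
    for (size_t i = 0; i < n; i++) {
        offsets[i] = off;
        off += blob_pad4(lens[i]);       /* each blob padded to 4 octets */
    }
    return off;
}
```

For example, two embedded blobs of 32 and 37 octets starting at 0x5c are placed at 0x5c and 0x7c, and the second is followed by three zero octets, so the blob_pool ends at 0xa4.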

576	string_pool
577	     The string_pool contains unaligned strings of arbitrary octets.
578	     These strings may be used for character data or for any other data
579	     which can be represented as a string of octets.  BLOB makes no
580	     assumptions regarding the format of data (character encoding
581	     scheme, etc.) that is stored in strings.

583	     The contents of the string_pool appear in the following order:

585	     1.   The contents of each element of each string array of the blob.

587	     2.   The contents of each element of the scalar string array.

589	     For compatibility with programming languages which terminate
590	     strings with a zero octet, a zero octet is appended to each string
591	     in the string_pool.  This zero octet is not part of the string.
592	     Since zero octets MAY appear within BLOB strings, the appended zero
593	     octet MUST NOT be used as a string terminator unless the
594	     higher-level protocol has specified that it may be used in this
595	     way.
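Because a string may contain zero octets, a decoder cannot use strlen() to find its length. One approach, sketched below, derives each length from the offset of the next string, or from blob_length for the last string. This sketch assumes that the string_pool extends exactly to blob_length (as in Appendix A) and that the offsets have already passed the checks in Section 4.2; blob_string_length() is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * offsets[] holds all n string offsets in string_pool order (string
 * array elements first, then scalar strings).  The next string's offset,
 * or blob_length for the last string, marks the end of string i's zero
 * octet.  ASSUMPTION: the string_pool extends exactly to blob_length.
 */
uint32_t blob_string_length(const uint32_t *offsets, size_t n, size_t i,
                            uint32_t blob_length)
{
    uint32_t end = (i + 1 < n) ? offsets[i + 1] : blob_length;
    return end - offsets[i] - 1;         /* exclude the appended zero octet */
}
```

For the scalar string in Appendix A (offset 0x69, the last string, blob_length 0x70) this gives 6, the length of "string". A three-octet string "a", zero, "b" stored the same way would report length 3, where strlen() would report 1.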

597	4. Use of blobs by higher-level protocols

599	Higher-level protocols using BLOB as an encoding mechanism need to
600	define their protocol data units in terms of blobs.  Since BLOB groups
601	all similarly-typed data together within the blob (for ease of
602	conversion), and since BLOB rigidly defines the order in which data must
603	appear, applications generally cannot refer to protocol elements within
604	a blob by a fixed offset.  Instead, the application code references
605	protocol elements in terms of "the second scalar string component", "the
606	third scalar integer component" or "the second element of the fourth
607	integer array component".  Macros or functions which allow these
608	elements to be accessed from a decoded blob structure are easily
609	constructed.
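As an illustration of such accessors, here is a minimal C sketch. It assumes that integers are transmitted most significant octet first, that the blob has already passed the checks in Section 4.2, and that the argument bases follow the 20-octet header in the order shown in Appendix A (integer array bases, then the scalar integer base, blob array bases, the scalar blob base, string array bases, and finally the scalar string base). The function and macro names are hypothetical; the last macro imitates the style produced by the program in Appendix B.

```c
#include <stddef.h>
#include <stdint.h>

/* Read the 32-bit word at byte offset off.
   ASSUMPTION: most significant octet first on the wire. */
static uint32_t blob_word(const uint8_t *blob, uint32_t off)
{
    return ((uint32_t)blob[off] << 24) | ((uint32_t)blob[off + 1] << 16) |
           ((uint32_t)blob[off + 2] << 8) | (uint32_t)blob[off + 3];
}

/* Byte offset of argument base k: the header is 20 octets (5 words). */
#define BLOB_ARG_WORD(k) (20u + 4u * (uint32_t)(k))

/* Scalar integer component k.  n_int_arrays is the number of integer
   arrays in the blob, which precede the scalar integer base. */
uint32_t blob_scalar_int(const uint8_t *blob, unsigned n_int_arrays,
                         unsigned k)
{
    uint32_t base = blob_word(blob, BLOB_ARG_WORD(n_int_arrays));
    return blob_word(blob, base + 4u * k);
}

/* Element e of integer array a. */
uint32_t blob_int_array_elem(const uint8_t *blob, unsigned a, unsigned e)
{
    uint32_t base = blob_word(blob, BLOB_ARG_WORD(a));
    return blob_word(blob, base + 4u * e);
}

/* Hypothetical generated macro: structure "msg" has a second scalar
   integer component named "count". */
#define msg_count_i 1
```

Applied to the blob in Appendix A (one integer array), blob_scalar_int(blob, 1, msg_count_i) would return 20 and blob_int_array_elem(blob, 0, 3) would return 4.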

611	It is possible to design a simple specification language which allows
612	the elements of a blob to be specified in the order that makes the most
613	sense to an application, and which produces a list of macros which map
614	from protocol data element names to routines which can access those data
615	elements.  This hides the details of BLOB's reordering from the
616	application without significantly impairing efficiency.  An example of
617	such a language is given in Appendix B.

619	If higher-level protocols employ data types other than the BLOB
620	primitive data types, they must define how the application-specific data
621	types are represented as one or more BLOB primitive types, and
622	implementations of the protocol will be responsible for conversion.
623	Applications which require a canonical form (for example, for
624	signing) should specify the conversion from application data types to
625	BLOB types so that there is exactly one possible representation of
626	each application data value within BLOB.
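For example, a protocol that needs 64-bit integers might define each one as two consecutive BLOB integer components, high-order 32 bits first. The sketch below shows such a conversion; because the split is fixed, every 64-bit value has exactly one representation. The convention and the function names are illustrative assumptions, not part of BLOB.

```c
#include <stdint.h>

/* Hypothetical convention: a 64-bit value is carried as two BLOB
   integers, the high-order 32 bits first. */
void u64_to_blob_ints(uint64_t v, uint32_t out[2])
{
    out[0] = (uint32_t)(v >> 32);            /* high-order half */
    out[1] = (uint32_t)(v & 0xFFFFFFFFu);    /* low-order half */
}

/* Inverse conversion; every pair of integers maps to one value. */
uint64_t u64_from_blob_ints(const uint32_t in[2])
{
    return ((uint64_t)in[0] << 32) | in[1];
}
```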

628	Since each blob is self-contained with its own header, each embedded
629	blob adds at least 32 octets of overhead.  Protocol designers should
630	avoid unnecessary nesting of structures.  For instance, what is
631	conceptually an array of structures to an application might be better
632	represented within BLOB as several parallel arrays.  However, nesting of
633	blobs is useful when an inner blob should be opaque to the layer of a
634	protocol that decodes the outer blob.

636	4.1. Encoding Issues

638	Most blobs will contain at least one variable-length data structure.
639	This implies that the offsets of the components within the blob will not
640	be known in advance, and a program that encodes a blob will usually be
641	unable to generate the elements of a blob in-place. The encoder routine
642	will generally need to copy the elements of a blob from their various
643	locations into a contiguous area of memory, in the order prescribed by
644	the BLOB specification.
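The usual two-pass approach (compute every offset first, then copy each element into a single buffer in BLOB order) is shown in the minimal encoder sketch below. For brevity it handles only blobs whose components are scalar integers and scalar strings without embedded zero octets. It assumes that integers are written most significant octet first, that array_counts_and_flags is zero when there are no arrays, and that the three argument bases (scalar integer, scalar blob, scalar string) are laid out as in Appendix A. encode_scalar_blob() is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Store v at p, most significant octet first (ASSUMED wire order). */
static void put_word(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Encode a blob containing only scalar integers and scalar strings.
   Returns a malloc'd buffer (caller frees) and sets *out_len. */
uint8_t *encode_scalar_blob(const uint32_t *ints, size_t nint,
                            const char *const *strs, size_t nstr,
                            size_t *out_len)
{
    /* Pass 1: offsets.  20-octet header plus 3 argument bases. */
    uint32_t integer_pool = 20 + 3 * 4;
    uint32_t str_ptrs = integer_pool + 4 * (uint32_t)nint;
    uint32_t blob_pool = str_ptrs + 4 * (uint32_t)nstr;
    uint32_t string_pool = blob_pool;          /* no embedded blobs */
    size_t total = string_pool;
    for (size_t i = 0; i < nstr; i++)
        total += strlen(strs[i]) + 1;          /* string + zero octet */

    /* Pass 2: copy everything into one zero-filled buffer. */
    uint8_t *b = calloc(total, 1);
    if (b == NULL)
        return NULL;
    put_word(b + 0, (uint32_t)total);          /* blob_length */
    put_word(b + 4, integer_pool);
    put_word(b + 8, blob_pool);
    put_word(b + 12, string_pool);
    put_word(b + 16, 0);                       /* array_counts_and_flags (assumed) */
    put_word(b + 20, integer_pool);            /* scalar integer base */
    put_word(b + 24, str_ptrs);                /* scalar blob base (no entries) */
    put_word(b + 28, str_ptrs);                /* scalar string base */
    for (size_t i = 0; i < nint; i++)
        put_word(b + integer_pool + 4 * i, ints[i]);
    size_t off = string_pool;
    for (size_t i = 0; i < nstr; i++) {
        size_t len = strlen(strs[i]);
        put_word(b + str_ptrs + 4 * i, (uint32_t)off);
        memcpy(b + off, strs[i], len);         /* zero octet already present */
        off += len + 1;
    }
    *out_len = total;
    return b;
}
```

Encoding the integers 10 and 20 with the single string "string" produces a 51-octet blob whose integer_pool_offset is 32 and whose blob_pool_offset and string_pool_offset are both 44.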

646	4.2. Decoding Issues

648	On "well-behaved" machines (those with 32-bit integers that can be
649	read at 4-octet-aligned addresses) it should be possible to use blobs
650	in-place after converting the integer portion of the blob to the local
651	byte order, and then to access protocol elements within it with macros.
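The byte-order conversion might look like the following sketch. It assumes that integers are transmitted most significant octet first, that the integer portion is everything before blob_pool_offset, and that the buffer is 4-octet aligned so the converted words can afterwards be read directly. blob_to_host_order() is a hypothetical name. The blob_pool and string_pool are left untouched, so embedded blobs stay opaque until their own layer decodes them.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Convert every word before blob_pool_offset to host byte order, in
 * place.  Returns 0 on success, -1 if blob_pool_offset is implausible.
 * The remaining consistency checks are still required afterwards.
 */
int blob_to_host_order(uint8_t *blob, size_t received)
{
    if (received < 20)
        return -1;
    uint32_t blob_pool_offset = read_be32(blob + 8);
    if (blob_pool_offset < 20 || blob_pool_offset % 4 != 0 ||
        blob_pool_offset > received)
        return -1;
    for (uint32_t off = 0; off < blob_pool_offset; off += 4) {
        uint32_t v = read_be32(blob + off);
        memcpy(blob + off, &v, sizeof v);    /* store in host order */
    }
    return 0;
}
```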

653	It is necessary to check the blob for consistency before using it.  In
654	particular:

656	-    The blob_length must be consistent with the length of the PDU or
657	     buffer in which the blob was received.  (For instance, it must not
658	     be greater than the length of data received.)

660	-    The blob_length must be at least 32 (the length of an empty blob,
661	     which has no arrays and no scalar components).

663	-    The 'flags' portion of array_counts_and_flags MUST be zero.

665	-    The integer_pool_offset must be equal to the number of
666	     arguments (decoded from array_counts_and_flags) multiplied by 4,
667	     plus 20.

669	-    The blob_pool_offset must be greater than or equal to
670	     integer_pool_offset.

672	-    The string_pool_offset must be greater than or equal to
673	     blob_pool_offset.

675	-    The string_pool_offset must be less than or equal to blob_length.

677	-    The base of each integer array and each blob array must be an
678	     integral multiple of 4.

680	-    The base of the first integer array (if any) must be equal to
681	     integer_pool_offset.

683	-    Each subsequent integer array base must be greater than or equal to
684	     the previous integer array base, and less than or equal to
685	     blob_pool_offset.

687	-    The offset of element 0 of the first blob array (if any) must be
688	     equal to blob_pool_offset.

690	-    Each subsequent blob offset must be greater than the previous blob
691	     offset.

693	-    The last blob offset must be less than string_pool_offset.

695	-    The first string component must have an offset equal to
696	     string_pool_offset.

698	-    The offset of each subsequent string must be greater than the
699	     offset of the previous string.

701	-    Except for the first string, there must be a zero octet preceding
702	     each offset of each string component or string array element.

704	-    If the string_pool is not empty, its last octet must be zero.
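Several of these checks translate directly into C. The sketch below covers only the header checks and the string checks. It assumes integers are transmitted most significant octet first; because the layout of array_counts_and_flags is not repeated here, the caller is assumed to have decoded the number of arguments from it (and checked the flags) and to pass that count in. The string check also rejects any offset at or beyond blob_length, a defensive test implied by the last-octet rule. Function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Header checks from the list above.  Returns 0 if all pass. */
int blob_check_header(const uint8_t *blob, size_t received, uint32_t nargs)
{
    if (received < 32)
        return -1;                       /* shorter than an empty blob */
    uint32_t blob_length = be32(blob);
    uint32_t integer_pool = be32(blob + 4);
    uint32_t blob_pool = be32(blob + 8);
    uint32_t string_pool = be32(blob + 12);

    if (blob_length > received || blob_length < 32)
        return -1;
    if ((uint64_t)integer_pool != 20 + 4 * (uint64_t)nargs)
        return -1;
    if (blob_pool < integer_pool || string_pool < blob_pool ||
        string_pool > blob_length)
        return -1;
    return 0;
}

/* String checks.  offs[] holds all n string offsets in string_pool
   order; the header is assumed to have passed blob_check_header(). */
int blob_check_strings(const uint8_t *blob, uint32_t string_pool,
                       uint32_t blob_length, const uint32_t *offs, size_t n)
{
    if (n == 0)
        return 0;
    if (offs[0] != string_pool)
        return -1;
    for (size_t i = 1; i < n; i++) {
        if (offs[i] <= offs[i - 1] || offs[i] >= blob_length)
            return -1;                   /* not increasing, or out of range */
        if (blob[offs[i] - 1] != 0)
            return -1;                   /* no zero octet before string */
    }
    if (blob_length == string_pool || blob[blob_length - 1] != 0)
        return -1;                       /* last octet must be zero */
    return 0;
}
```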

706	4.3. Encoding and Decoding Code

708	A free software sample blob encoder and decoder have been written and
709	will be made available at the location listed in Appendix C.

711	5. Security Considerations

713	It is believed that each data structure has exactly one BLOB
714	encoding, so BLOB can serve as a useful 'canonical form' for a data
715	structure.  However, if higher-level protocols encode non-native data
716	types as BLOB primitive types, they must also define a unique
717	representation for each value of those data types.

719	In order to prevent attacks by transmission of blobs containing bogus
720	offsets, it is essential to perform the bounds checks listed in
721	Section 4.2 while decoding blobs.  While such attacks could not easily
722	overwrite memory with data chosen by an attacker, they could cause a
723	server to malfunction.

725	6. Author's Address

727	Keith Moore
728	University of Tennessee
729	1122 Volunteer Blvd, Suite 203
730	Knoxville TN 37996-3450
731	email: moore@cs.utk.edu

733	7. References

735	[1]  Bradner, S., "Key words for use in RFCs to Indicate Requirement
736	     Levels", RFC 2119, March 1997.

738	[2]  "Information technology - Abstract Syntax Notation One (ASN.1):
739	     Specification of basic notation", ITU-T Recommendation X.680,
740	     December 1997.  Available from http://www.itu.int/ITU-
741	     T/studygroups/com17/languages/.

743	[3]  "Information technology - ASN.1 encoding rules: Specification of
744	     Basic Encoding Rules (BER), Canonical Encoding Rules (CER) and
745	     Distinguished Encoding Rules (DER)", ITU-T Recommendation X.690,
746	     December 1997.  Available from http://www.itu.int/ITU-
747	     T/studygroups/com17/languages/.

749	[4]  Srinivasan, R., "XDR: External Data Representation Standard", RFC
750	     1832, August 1995.

752	[5]  Freed, N. and N. Borenstein, "Multipurpose Internet Mail Extensions
753	     (MIME) Part One: Format of Internet Message Bodies", RFC 2045,
754	     November 1996.

756	[6]  "Extensible Markup Language (XML) 1.0 (Second Edition)", W3C
757	     Recommendation, October 2000.

760	[7]  Crocker, D. (ed.) and P. Overell, "Augmented BNF for Syntax
761	     Specifications: ABNF", RFC 2234, November 1997.

763	Appendix A. ASCII-Art Picture of a blob

765	This diagram attempts to illustrate the ordering of the various elements
766	of a blob and the relationship of the offsets to the elements to which
767	they point.

769	The following is a dump, in an assembler-like notation, of a blob which
770	encodes:

772	     2 scalar integers with values 10, 20 (decimal)
773	     1 integer array, with elements { 1 2 3 4 }
774	     0 scalar blobs
775	     0 blob arrays
776	     1 scalar string with the value "string"
777	     2 string arrays, with elements { "a" "b" } and { "cc" "dd" "ee" }.

779	"label" denotes the name assigned to a particular offset; "xx" gives the
780	offset in hexadecimal; "contents" gives the value of the octet or octets
781	which appear at that offset; and "description" gives a description of
782	the value that appears in that location.

784	                        label xx  contents description
785	     ------------------------:--:---------:------------------------
786	                             :00: 00000070: blob_length
787	                             :04: 0000002c: integer_pool
788	                             :08: 0000005c: blob_pool
789	                             :0c: 0000005c: string_pool
                             :10: 00020002: array_counts_and_flags
791	                             :14: 0000002c: int_array_base_0
792	                             :18: 0000003c: scalar_int_array_base
793	                             :1c: 00000044: scalar_blob_array_base
794	                             :20: 00000044: string_array_base_0
795	                             :24: 0000004c: string_array_base_1
796	                             :28: 00000058: scalar_string_array_base
797	                 integer_pool:
798	             int_array_base_0:2c: 00000001:
799	                             :30: 00000002:
800	                             :34: 00000003:
801	                             :38: 00000004:
802	        scalar_int_array_base:3c: 0000000a: (10 decimal)
803	                             :40: 00000014: (20 decimal)
804	       scalar_blob_array_base:
805	          string_array_base_0:44: 0000005c: ptr_to_str[0,0]
806	                             :48: 0000005e: ptr_to_str[0,1]
807	          string_array_base_1:4c: 00000060: ptr_to_str[1,0]
808	                             :50: 00000063: ptr_to_str[1,1]
809	                             :54: 00000066: ptr_to_str[1,2]
810	     scalar_string_array_base:58: 00000069: ptr_to_scalar_str[0]
811	                    blob_pool:
812	                  string_pool:
813	              ptr_to_str[0,0]:5c: 61: 'a'
814	                             :5d: 00:
815	              ptr_to_str[0,1]:5e: 62: 'b'
816	                             :5f: 00:
817	              ptr_to_str[1,0]:60: 63: 'c'
818	                             :61: 63: 'c'
819	                             :62: 00:
820	              ptr_to_str[1,1]:63: 64: 'd'
821	                             :64: 64: 'd'
822	                             :65: 00:
823	              ptr_to_str[1,2]:66: 65: 'e'
824	                             :67: 65: 'e'
825	                             :68: 00:
826	         ptr_to_scalar_str[0]:69: 73: 's'
827	                             :6a: 74: 't'
828	                             :6b: 72: 'r'
829	                             :6c: 69: 'i'
830	                             :6d: 6e: 'n'
831	                             :6e: 67: 'g'
832	                             :6f: 00:
833	                  blob_length:70:

835	Appendix B. Example Abstract Syntax

837	The syntax used to describe BLOB structures is given below using the
838	ABNF syntax from [7]:

840	     file = *(block / comment-line)

842	     block = "BEGIN" 1*space id [ 1*space comment ] CRLF
843	             *element
844	             "END" [ comment ] CRLF

846	     element = "int" 1*space id [ comment ] CRLF /
847	               "string" 1*space id [ comment ] CRLF /
848	               "int<>" 1*space id [ comment ] CRLF /
849	               "string<>" 1*space id [ comment ] CRLF /
850	               "struct" 1*space id [ comment ] CRLF /
851	               "struct<>" 1*space id [ comment ] CRLF

853	     comment = *space "#" *char

855	     comment-line = comment CRLF

857	     id = letter *(letter / digit / "_")

859	     letter = %x41-5A / %x61-7A   ; A-Z and a-z

861	     digit = %x30-39               ; 0-9

863	     space = %x20 / %x09

865	     char = %x01-09 / %x0B-0C / %x0E-FF

867	     CRLF = 0*1%x0D 0*1%x0A

869	Here is a simple awk program to interpret this syntax and produce a list
870	of C #define macros.  The macros are of the form

872	     #define structname_element_type number

874	where 'structname' is the name of the structure, 'element' is the name
875	of the element, and 'type' is a suffix indicating the type of the
876	element (i = int, b = blob, s = string, ia = integer array, ba = blob
877	array, sa = string array) for ease in visual type checking.

879	This program is quite simplistic and performs no error checking.

881	#!/bin/sh
882	# tr strips CRs from CRLF line ends; sed deletes comments
883	tr -d '\r' | sed -e 's/[[:blank:]]*#.*//' | awk '
884	$1 == "BEGIN" {
885	        current_id = $2;
886	        nint = nblob = nstr = ninta = nbloba = nstra = 0;
887	}
888	$1 == "int" {
889	        inames[nint] = $2;
890	        nint++;
891	        next;
892	}
893	$1 == "string" {
894	        snames[nstr] = $2;
895	        nstr++;
896	        next;
897	}
898	$1 == "struct" {
899	        bnames[nblob] = $2;
900	        nblob++;
901	        next;
902	}
903	$1 == "int<>" {
904	        ianames[ninta] = $2;
905	        ninta++;
906	        next;
907	}
908	$1 == "string<>" {
909	        sanames[nstra] = $2;
910	        nstra++;
911	        next;
912	}
913	$1 == "struct<>" {
914	        banames[nbloba] = $2;
915	        nbloba++;
916	        next;
917	}
918	$1 == "END" {
919	        for (i = 0; i < nint; ++i)
920	            printf ("#define %s_%s_i %d\n", current_id, inames[i], i);
921	        for (i = 0; i < nblob; ++i)
922	            printf ("#define %s_%s_b %d\n", current_id, bnames[i], i);
923	        for (i = 0; i < nstr; ++i)
924	            printf ("#define %s_%s_s %d\n", current_id, snames[i], i);
925	        for (i = 0; i < ninta; ++i)
926	            printf ("#define %s_%s_ia %d\n", current_id, ianames[i], i);
927	        for (i = 0; i < nbloba; ++i)
928	            printf ("#define %s_%s_ba %d\n", current_id, banames[i], i);
930	        for (i = 0; i < nstra; ++i)
931	            printf ("#define %s_%s_sa %d\n", current_id, sanames[i], i);
932	        next;
933	}'

935	Appendix C. Example Encoding and Decoding Code

937	Check http://www.cs.utk.edu/~moore/blob for the latest version.