Network Working Group                                       Keith Moore
Internet-Draft                                  University of Tennessee
Expires: 1 September 2002                                  1 March 2002

          The Binary Low-Overhead Block Presentation Protocol

                     draft-ietf-rescap-blob-01.txt

Status of this Memo

This document is an Internet-Draft and is in full conformance with all
provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF), its areas, and its working groups. Note that other groups
may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html

This document is being submitted as a contribution to the IETF rescap
working group. Comments regarding this Internet-Draft should be sent to
the rescap mailing list at rescap@cs.utk.edu, or to the author at the
address listed below. Requests to subscribe to the rescap mailing list
should be sent to rescap-REQUEST@cs.utk.edu. Please include the
document identifier draft-ietf-rescap-blob-01.txt in any comments.

Known errata of this specification, as well as sample code, will be made
available at http://www.cs.utk.edu/~moore/blob/

This Internet-Draft will expire on 1 September 2002.

ABSTRACT

This memo describes the Binary Low-Overhead Block (BLOB) protocol for
on-the-wire presentation of data in the context of higher-level
protocols. BLOB is designed to encode and decode data with low overhead
on most CPUs, to be reasonably space-efficient, and for its
representation to be sufficiently precise that it is suitable as a
canonical format for digital signatures.

1. Introduction

When designing application-layer protocols there is sometimes a need to
have an efficient means of encoding protocol elements or protocol data
units. Existing solutions in this space may be deemed inadequate, for
various reasons. For example:

- ASN.1 [2] and BER [3] are baroque both in terms of the abstract
  syntax and available on-the-wire representations, and complex to
  implement.

- ONC XDR [4] requires a stub generator and support libraries which
  are not easily available on all platforms, and there are subtle
  differences between the APIs provided by different implementations.
  XDR is large enough that it's not usually feasible to write your
  own implementation, and it's difficult to write portable code that
  can work with the various implementations that are deployed. Many
  XDR implementations have significant unnecessary processing
  overhead. This impairs performance of applications based on XDR
  and gives the protocol itself a worse reputation than it otherwise
  deserves.

- The design of MIME [5] was heavily influenced by the need to be
  able to operate over existing text-based mail systems which imposed
  a number of constraints. This worked out well for email, but for
  other applications, MIME is neither efficient in terms of storage
  density nor easy to parse.

- XML [6] is easier to parse than MIME, but still requires
  significant processing overhead. There is also a large and growing
  body of "culture" regarding how XML should be used, which
  paradoxically imposes a significant barrier to use of XML. (To be
  fair, MIME also has a fair amount of "culture" associated with it.)
  Finally, for small and regular data structures XML imposes a lot of
  overhead.

BLOB was designed to serve as an alternative to these presentation
layers for use in representing relatively simple structures, consisting
of a limited set of primitive data types, and where the structures can
reasonably be contained within a single protocol data unit.

BLOB is designed with the following considerations:

- It should be easy and efficient to generate the encoded form.

- The encoded form should require minimal processing to decode,
  ideally being usable in-place (without allocating memory or
  copying) on most platforms.

- It should be easy to write programs which manipulate and exchange
  BLOBs, without needing significant external support in the form of
  libraries or stub generators.

- The structure should be easy and efficient to verify for internal
  consistency.

- For any structure to be represented there should be a unique
  (canonical) on-the-wire encoding which is always used.

- It should be reasonably space-efficient. However, this is
  secondary to minimizing processing overhead.

The BLOB approach is more feasible now than in years past because data
representations have become more uniform across different computing
platforms. Essentially all widely-used computers now support 32-bit
integers, can address 32-bit integers which are not aligned on any
larger boundary, use word sizes which are a multiple of 8 bits, and can
directly address strings of 8-bit characters which are not aligned on
any boundary larger than an octet. Such computers are termed
"well-behaved" with respect to BLOB. BLOB is designed to be usable on
machines which do not have these characteristics, but such machines will
necessarily incur more data conversion overhead.

1.1. Notation

The word BLOB in upper case letters is used to refer to the protocol;
that is, the algorithm used to define the encoding and decoding of data
structures defined in this memo. The word "blob" in lower case letters
refers to a data structure (sequence of octets) that has been produced
by, or can be decoded by, the BLOB protocol.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document, when spelled entirely in upper case letters, are to be
interpreted as described in [1].

2. BLOB Overview

A "blob" is a linear (octet-stream) encoding of some data structure,
which is used as a protocol data unit within some application.
The structure encoded by a blob is a collection of "components". Each
of the components of a blob is either a "scalar" (meaning that the
component consists of exactly one instance of that data type) or an
"array" (meaning that the component consists of a sequence of zero or
more "elements" of a uniform data type).

The data types which can appear as components of a blob are: unsigned
integer (32 bits in length), string (a variable-length sequence of
octets with arbitrary values), or blob. Any of these types can occur as
a scalar or in an array.

Since one blob can contain other blobs, complex nesting of structures is
possible. However, the blob encoder and decoder treat "embedded" blobs
(blobs which occur as components of an outer blob) as opaque structures.
For example, embedded blobs are not automatically decoded along with
outer blobs, and a formatting error in an embedded blob does not create
a formatting error for any blob that contains it.

"Variable-length" here means that the lengths of strings and arrays need
not be predetermined by the protocol using BLOB. The maximum lengths of
strings and arrays are constrained by the use of a 32-bit unsigned
integer for the length of the blob, and the representation of offsets of
data relative to the start of the blob as 32-bit unsigned integers.
Lengths may be further constrained by the higher-level protocol's choice
of transmission medium - for instance, if the blob must fit into a UDP
datagram. The number of arrays is limited to 255 of each data type, but
this should be adequate for most data structures needed in network
protocols.

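As an illustration of the data model described in this section, the
following Python sketch holds the components of a blob before encoding.
It is not part of this specification: the class name BlobComponents, its
field names, and the problems() helper are hypothetical, and embedded
blobs are assumed to be kept as already-encoded octet strings.

```python
from dataclasses import dataclass, field
from typing import List

MAX_UINT32 = 0xFFFFFFFF      # integers are unsigned and 32 bits long
MAX_ARRAYS_PER_TYPE = 255    # limit on the number of arrays of each type


@dataclass
class BlobComponents:
    """Hypothetical container for the components of one blob."""
    int_arrays: List[List[int]] = field(default_factory=list)
    int_scalars: List[int] = field(default_factory=list)
    blob_arrays: List[List[bytes]] = field(default_factory=list)
    blob_scalars: List[bytes] = field(default_factory=list)
    string_arrays: List[List[bytes]] = field(default_factory=list)
    string_scalars: List[bytes] = field(default_factory=list)

    def problems(self) -> List[str]:
        """Return the violations of the limits stated in this section."""
        found = []
        for name in ("int_arrays", "blob_arrays", "string_arrays"):
            if len(getattr(self, name)) > MAX_ARRAYS_PER_TYPE:
                found.append("too many " + name)
        # Collect every integer, scalar or array element, for range checks.
        integers = list(self.int_scalars)
        for array in self.int_arrays:
            integers.extend(array)
        for value in integers:
            if not 0 <= value <= MAX_UINT32:
                found.append("integer out of range: %d" % value)
        return found
```

Keeping scalars and arrays of each type in separate lists mirrors the
way BLOB groups similarly-typed data; an encoder could walk these lists
in order when laying out the blob.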
2.1 Use of Data Types Not Supported by BLOB

The primitive types (unsigned 32-bit integer and octet string) were
chosen because they represent the majority of data types used in network
protocols, they are directly supported by most computer hardware, and
because data types outside of this set are often specific to the
higher-level protocol anyway. Having a small set of data types allows
BLOB to be a compact yet self-describing encoding, which is efficient to
decode and which does not require separate marshaling routines for each
protocol data unit used by an application. A few additional types (in
particular, single- and double-precision floating point) are being
considered for future versions of BLOB. The BLOB protocol is intended
to allow new primitive types to be added without changing the format of
blobs that do not include these types.

When a higher-level protocol needs to use a data type that is not
directly supported by BLOB, such data must be represented in terms of
the available types. The higher-level protocol specification must define
the representation of such data in terms of types supported by BLOB, and
the conversion between the blob representation and the native format
must be explicitly managed by the applications. For instance:

- A signed 32-bit integer may be transmitted as an unsigned 32-bit
  integer by encoding the signed integer in twos-complement format.
  On most modern machines no conversion will be necessary; however on
  machines for which the smallest integer representation is larger
  than 32 bits it will be necessary for the application to
  sign-extend the result.

- A 64-bit integer may be transmitted as two consecutive 32-bit
  integers (with the most significant word first), which would
  require that the receiving application arrange those two integers
  according to its native byte ordering.
  Alternatively a 64-bit integer may be transmitted as eight
  consecutive octets within a string (most significant byte first),
  which would require that the receiving application re-arrange those
  octets according to its local byte ordering.

- A multi-dimensional array may be represented as a single-dimensional
  array with the dimensions of the array passed as separate integer
  components.

- In the current version of BLOB, floating point numbers may be
  encoded in IEEE format and transmitted as either integers (modulo
  sign-extension issues) or strings (modulo alignment issues).
  Future versions of BLOB may support floating point numbers
  directly.

- A small dense set may be represented as bits within a scalar
  integer. A larger dense set may be encoded using individual bits
  of the elements of an integer array.

3. BLOB Organization

At the most basic level, the blob consists of an integer portion
followed by an opaque portion. The integer portion is a sequence of
unsigned 32-bit (4-octet) quantities, represented on-the-wire in network
byte ("big-endian") order. The opaque portion is a sequence of 8-bit
(1-octet) quantities.

The blob is separated into opaque and integer portions in order to
facilitate efficient decoding on little-endian machines, or on any
machine with a word size other than 32 bits. Having all of the integers
within a blob co-located in a contiguous area allows an implementation
to efficiently convert all of the integers to local format at the same
time. Strings of octets are assumed to have the same representation on
all platforms, so conversion is unlikely to be needed for the opaque
portion.

The integer portion of a blob is further divided into a header, a list
of array bases, and an integer pool. The header is used to store
various data needed to decode the blob and check it for consistency.
The array bases portion contains the offsets (positions relative to the
start of the blob) of each of the arrays in the blob (including the
arrays used to store scalar components). The integer pool is used for
storing integer data as well as the offsets of embedded blobs and
strings.

The opaque portion is divided into a blob pool and a string pool. The
blob pool is used to store embedded blobs; the string pool is used to
store strings. The blob pool occurs immediately following the integer
pool in order to ensure that embedded blobs are always aligned on a
four-octet boundary (relative to the start of the blob).

Each embedded blob is padded with 0-3 zero octets until its length is an
exact multiple of 4 octets. This ensures that all embedded blobs are
aligned to 4-octet boundaries, allowing the blob decoder to assume (if
the outer blob is on an aligned boundary) that each of the embedded
blobs is also aligned.

Each string is padded with a single octet with a value of zero, which is
not part of the string. This is for convenience when strings are used
to store character data, with programming languages that use a
zero-valued octet as a string terminator.

Embedded blobs are opaque to their enclosing blob and are NOT
automatically parsed or decoded when the outer blob is decoded. If the
receiving application wishes to examine contents of an inner blob, it
must decode it separately from the enclosing blob.

A blob can have both scalar and array components. For simplicity in
decoding and to eliminate some edge cases, all of the scalar integers of
a blob are stored in a "scalar integer array" which immediately follows
the last integer array component of the blob.
Similarly, all of the scalar (embedded) blob parameters are stored in a
"scalar blob array" which immediately follows the last blob array
component, and all of the scalar string parameters are stored in a
"scalar string array" which follows the last string array component.

3.1 Representation of data types

In general, all components of a blob are elements of an array. A
distinguished array of each type is used to store scalar components of
that type. The base of any array (whether it is a numbered array
component or an array used to hold scalar components) can be determined
by decoding the array_counts_and_flags field of the blob header.

Since strings (and blobs) can be of varying length, an array of strings
(or blobs) is represented internally by an array of integers. Each of
these integers indicates the storage location (within the blob) of the
contents of the string or blob. These integers are consecutive; the
offset of element 2 of an array immediately follows the offset of
element 1. Similarly, the array elements occupy consecutive storage -
the storage occupied by string 3 of an array immediately follows that
occupied by string 2. This allows the size of array N to be computed by
subtracting its offset from that of the following array; this works for
any numbered array. It also allows the length of element M to be
computed by subtracting its offset from that of the following element;
this works for elements (within bounds) of numbered arrays. The last
scalar blob or string is a boundary case; these require an explicit test
to correctly determine their length.

The individual components of a blob are encoded as follows:

3.1.1 integers and integer arrays

An unsigned integer is represented as a 32-bit quantity in big-endian
format. All integer components appear in the integer_pool section of a
blob.

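The big-endian integer representation, and the rule that the size of a
numbered array is the difference between its base and the base of the
following array, can be sketched in Python as follows. This is an
illustration only: the function names read_u32 and int_array_elements
are hypothetical, and the two bases are assumed to have been obtained
from the blob already.

```python
import struct
from typing import List


def read_u32(buf: bytes, offset: int) -> int:
    """Return the big-endian unsigned 32-bit integer stored at offset."""
    return struct.unpack_from(">I", buf, offset)[0]


def int_array_elements(buf: bytes, base: int, next_base: int) -> List[int]:
    """Decode the integer array stored between base and next_base.

    The element count is not stored explicitly; it follows from the
    distance between two consecutive array bases.
    """
    size = next_base - base
    if size < 0 or size % 4 != 0:
        raise ValueError("array bases are inconsistent")
    count = size // 4
    # ">%dI" reads `count` consecutive big-endian 32-bit integers.
    return list(struct.unpack_from(">%dI" % count, buf, base))
```

Two equal bases describe an array of zero elements, so no special case
is needed for empty arrays.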
An integer array is represented as zero or more contiguous 32-bit
integers that are stored within the integer_pool section of the blob.
The location (or "base") of the array relative to the start of the blob
is stored as a 32-bit integer offset. The base of this array is stored
in the array_bases portion of the blob.

Scalar integer components of a blob are encoded in a scalar integer
array. The storage for the elements of this array is in the integer
pool, and immediately follows the storage used by the last numbered
integer array. The offset of the scalar integer array appears in the
array_bases portion of the blob.

3.1.2 (embedded) blobs and blob arrays

An embedded blob component is represented as a series of octets which is
an integral multiple of four octets long. The storage for embedded
blobs is taken from the blob pool of the enclosing blob. An integer
offset (relative to the beginning of the blob) indicates the starting
location of the embedded blob. For scalar embedded blob components
these offsets are encoded in a scalar blob array. This array (of blob
offsets) is stored in the integer pool and immediately follows the
offsets of the numbered blob arrays.

A blob array is represented as an integer base (stored in array_bases)
which points to an array of integers (stored in the integer pool), each
element of which is the offset of a blob (within the blob pool).

Each embedded blob (within the blob pool) is followed by 0-3 octets
with the value zero, so that any subsequent blob will be aligned on a
four-octet boundary. These padding octets are not considered part of
the blob; however, the length of the inner blob (as seen from the
enclosing blob) will include any padding.

3.1.3 strings and string arrays

A string is represented as a sequence of octets; these octets may have
arbitrary values.
The contents of strings are stored in the string_pool. An integer
offset (stored in integer_pool) indicates the location of the contents
of the string.

A string array is represented as an integer base (stored in array_bases)
which points to an array of integers (stored in the integer pool), each
element of which indicates the offset of a string (stored in the string
pool).

Each string is followed in the string_pool by a zero octet which is not
part of the string. Thus the length of any string (other than the last
scalar string component) can be calculated by subtracting its offset
from the offset of the subsequent string, minus 1.

Strings can be of zero length, in which case the corresponding offset
points to a zero octet which is immediately followed by the next string
in the string_pool.

3.2 Structure of a blob

The structure of a blob is as follows:

       octet offset                name

                  0 +--------------------------------+ \
                    |          blob_length           | |
                  4 +--------------------------------+ |
                    |      integer_pool_offset       | |
                  8 +--------------------------------+ |
                    |        blob_pool_offset        | |
                 12 +--------------------------------+ |
                    |       string_pool_offset       | |
                 16 +--------------------------------+ |
                    |     array_counts_and_flags     | |
                 20 +--------------------------------+ + integer portion
                    :                                : |
                    :          array_bases           : |
                    :                                : |
integer_pool_offset +--------------------------------+ |
                    :                                : |
                    :          integer_pool          : |
                    :                                : /
   blob_pool_offset +--------------------------------+ \
                    :                                : |
                    :           blob_pool            : |
                    :                                : |
 string_pool_offset +--------------------------------+ + opaque portion
                    :                                : |
                    :          string_pool           : |
                    :                                : |
        blob_length +--------------------------------+ /

For this version of the BLOB protocol, the integer portion begins at
offset 0 and is blob_pool_offset octets in length.
The opaque portion begins at blob_pool_offset and is
(blob_length - blob_pool_offset) octets in length.

Future versions of the BLOB protocol may add additional pools for other
data types, and therefore may change these formulas. BLOB decoder
implementations MUST therefore decode 'array_counts_and_flags' (see
below) and verify that the flags portion of this field is equal to zero,
before translating the remainder of the integer portion to the format
used by the local machine.

The following paragraphs describe the fields within a blob:

blob_length
   The blob_length is the length of the entire blob in octets. The
   length includes the space occupied by blob_length. blob_length
   does not include any padding which is added to make an embedded
   blob a multiple of four octets long.

integer_pool_offset
   The integer_pool_offset is the octet offset (relative to the start
   of the blob) of the integer_pool field of the blob.
   integer_pool_offset MUST be a multiple of four, greater than or
   equal to 24, and less than or equal to blob_pool_offset. If the
   length of integer_pool is zero, integer_pool_offset will be equal
   to blob_pool_offset.

blob_pool_offset
   The blob_pool_offset is the offset (relative to the start of the
   blob) of the blob_pool field of the blob. blob_pool_offset MUST be
   a multiple of four, greater than or equal to integer_pool_offset,
   and less than or equal to string_pool_offset. If the length of the
   blob_pool is zero, blob_pool_offset will be equal to
   string_pool_offset.

string_pool_offset
   The string_pool_offset is the offset (relative to the start of the
   blob) of the string_pool portion of the blob. It MUST be a
   multiple of four, greater than or equal to blob_pool_offset, and
   less than or equal to blob_length. If the length of the
   string_pool is zero, string_pool_offset will be equal to
   blob_length.

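The constraints stated for the first four header fields can be checked
directly once those fields are read in network byte order. The Python
sketch below is illustrative only: the function name
pool_offset_problems and its report strings are hypothetical, and it
covers only the rules given for blob_length and the three pool offsets.

```python
import struct
from typing import List


def pool_offset_problems(buf: bytes) -> List[str]:
    """Check blob_length and the three pool offsets of a blob header."""
    if len(buf) < 16:
        return ["buffer too short for the header fields"]
    # The four fields are consecutive big-endian 32-bit integers.
    blob_length, int_off, blob_off, str_off = struct.unpack_from(">4I", buf, 0)
    found = []
    for name, value in (("integer_pool_offset", int_off),
                        ("blob_pool_offset", blob_off),
                        ("string_pool_offset", str_off)):
        if value % 4 != 0:
            found.append(name + " is not a multiple of four")
    if int_off < 24:
        found.append("integer_pool_offset is less than 24")
    if not int_off <= blob_off <= str_off <= blob_length:
        found.append("pool offsets are out of order")
    return found
```

An empty list means the four fields satisfy these rules; it says
nothing about the rest of the blob, which needs the further checks
described in this memo.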
array_counts_and_flags
   The array_counts_and_flags field indicates how many arrays of each
   kind are contained within the blob. This field is calculated as
   follows:

      array_counts_and_flags = (num_int_arrays) +
                               (num_blob_arrays << 8) +
                               (num_string_arrays << 16) +
                               (flags << 24)

   where num_xxx_arrays is the number of array components of type xxx.

   The "flags" portion of this field is used to indicate extensions to
   this format. Blobs that do not use these extensions will have a
   flags field of zero. For this version of the BLOB protocol, the
   flags field MUST be zero.

array_bases
   The array_bases field contains the bases (offsets relative to the
   start of the blob) of each of the arrays in the blob, including
   those arrays which contain the scalar components of the blob (using
   separate arrays for scalar integer, blob, and string components).
   Specifically the array_bases field contains, in order:

   1. The base of each integer array. There are num_int_arrays
      (possibly zero) of these.

   2. The base of the scalar integer array. This base is always
      present, even if there are no scalar integer components. If
      there are no scalar integer components of the blob, the scalar
      integer array base will be the same as the base of blob array
      0. (If there are no blob arrays in the blob, the base of the
      scalar integer array will be the same as the base of the
      scalar blob array.)

   3. The base of each blob array. There are num_blob_arrays
      (possibly zero) of these.

   4. The base of the scalar blob array. This base is always
      present. If there are no embedded scalar blob components in
      the blob, the scalar blob array base will have the same value
      as the base of string array 0. (If there are no string arrays
      in this blob, this offset will be the same as the base of the
      scalar string array.)

   5. The base of each string array.
      There are num_string_arrays (possibly zero) of these.

   6. The base of the scalar string array. If there are no scalar
      string components of the blob, the base of the scalar string
      array will be equal to blob_length.

   7. Any additional bases of arrays, or offsets of scalar
      components, which might be defined by future versions of this
      protocol. The presence of additional data types not supported
      in this version of the BLOB protocol will be indicated by a
      nonzero value in the flags portion of the
      array_counts_and_flags field.

integer_pool
   The integer_pool contains 32-bit integers, assumed to be unsigned.
   These may be scalar integers, elements of integer arrays, offsets
   of scalar blobs or strings, or offsets of elements of blob or
   string arrays. The integers within the integer_pool MUST appear in
   the following order:

   1. The elements of integer arrays. The integer array components
      appear in order, and within each array, the elements appear in
      order. The arrays and their elements are numbered from zero.
      Thus the 0th element of the 1st integer array immediately
      follows the last element of the 0th integer array.

   2. The elements of the scalar integer array. Thus integer scalar
      component 0 immediately follows the last element of the last
      integer array, followed by integer scalar component 1, etc.
      (If there are no integer arrays, the offset of integer scalar
      0 is integer_pool_offset.)

   3. The offsets of elements of blob arrays. Each blob offset MUST
      be an integral multiple of four, and each blob offset MUST
      point into the blob_pool. The offset of element 0 of blob
      array 0 MUST be equal to blob_pool_offset. Each subsequent
      element of a blob array MUST have an offset equal to the
      offset of the preceding blob plus the declared length of the
      preceding blob (after padding).

      NOTE: The data within an embedded blob is considered opaque to
      the enclosing blob; the only reason for separating blobs from
      strings is to ensure padding of blobs to 4-octet boundaries.
      Blob encoders SHOULD NOT insist that the length field of an
      embedded blob is consistent with the length declared for that
      blob, and blob decoders SHOULD NOT check the length fields of
      embedded blobs when decoding the enclosing blob.

   4. The offsets of elements of the scalar blob array. Each blob
      offset MUST be an integral multiple of four, and MUST point
      into the blob_pool. The offset of scalar blob component 0 MUST
      immediately follow the last element of the last blob array.
      (If there are no blob arrays, the offset of scalar blob
      component 0 is blob_pool_offset.) Each subsequent scalar blob
      component MUST have an offset equal to the offset of the
      preceding blob plus the length of the preceding blob (after
      padding).

   5. The offsets of elements of string arrays. These offsets MUST
      point into the string_pool. Element 0 of string array 0 MUST
      have an offset equal to string_pool_offset, and each
      subsequent string MUST have an offset equal to the preceding
      string's offset, plus the length of the preceding string, plus
      1 (for the trailing zero octet).

   6. The offsets of elements of the scalar string array. These
      offsets MUST point into the string_pool. The scalar string
      component 0 MUST have an offset equal to the offset of the
      preceding string, plus the length of the preceding string,
      plus 1 (for the trailing zero octet). (If there are no string
      arrays, the offset of scalar string 0 is string_pool_offset.)

blob_pool
   The blob_pool contains structures which are encoded in blob format.
   These structures may be scalar blob components of the outer blob,
   or elements of blob arrays of the outer blob. The contents of
   blob_pool appear in the following order:

   1. The contents of each element of each blob array. Element 0 of
      blob array 0 appears first, followed by element 1 of blob
      array 0, etc.

   2. The contents of each element of the scalar blob array, used to
      store scalar (embedded) blob components of the outer blob.

   Each blob in the blob pool MUST be padded with from zero to three
   octets, each with a value of zero, so that the length of each blob
   is an exact multiple of four octets.

string_pool
   The string_pool contains unaligned strings of arbitrary octets.
   These strings may be used for character data or for any other data
   which can be represented as a string of octets. BLOB makes no
   assumptions regarding the format of data (character encoding
   scheme, etc.) that is stored in strings.

   The contents of the string_pool appear in the following order:

   1. The contents of each element of each string array of the blob.

   2. The contents of each element of the scalar string array.

   For compatibility with programming languages which terminate
   strings with a zero octet, a zero octet is automatically appended
   to each string in the string_pool. This zero octet is not part of
   the string. Since zero octets MAY appear within BLOB strings, the
   zero octet that is appended to each string MUST NOT be used as a
   string terminator except when the higher-level protocol has
   specified that it may be used in this way.

4. Use of blobs by higher-level protocols

Higher-level protocols using BLOB as an encoding mechanism need to
define their protocol data units in terms of blobs. Since BLOB groups
all similarly-typed data together within the blob (for ease of
conversion), and since BLOB rigidly defines the order in which data must
appear, applications generally cannot refer to protocol elements within
a blob by a fixed offset.
Instead, the application code references
protocol elements in terms of "the second scalar string component", "the
third scalar integer component" or "the second element of the fourth
integer array component". Macros or functions which allow these
elements to be accessed from a decoded blob structure are easily
constructed.

It is possible to design a simple specification language which allows
the elements of a blob to be specified in the order that makes the most
sense to an application, and which produces a list of macros which map
from protocol data element names to routines which can access those data
elements. This hides the details of BLOB's reordering from the
application without significantly impairing efficiency. An example of
such a language is given in Appendix B.

If higher-level protocols employ data types other than the BLOB
primitive data types, they must define how the application-specific data
types are represented as one or more BLOB primitive types, and
implementations of the protocol will be responsible for conversion.
Applications which require a canonical form (say for signing) should
specify the conversion from application data types to BLOB types so that
there is exactly one possible representation of each application data
type within BLOB.

Since each blob is self-contained with its own header, embedded blobs
add a bit of overhead. Protocol designers should avoid unnecessary
nesting of structures. For instance, what is conceptually an array of
structures to an application might be better represented within BLOB as
several parallel arrays. However, nesting of blobs is useful when it is
desired that an inner blob be opaque to the layer of a protocol that
decodes the outer blob.

4.1. Encoding Issues

Most blobs will contain at least one variable-length data structure.
This implies that the offsets of the components within the blob will not
be known in advance, and a program that encodes a blob will usually be
unable to generate the elements of a blob in-place. The encoder routine
will generally need to copy the elements of a blob from their various
locations into a contiguous area of memory, in the order prescribed by
the BLOB specification.

4.2. Decoding Issues

On "well-behaved" machines it should be possible to use blobs in-place
after converting the integer portion of the blob to the local byte
order. The protocol elements within the blob can then be accessed with
macros.

It is necessary to check the blob for consistency before using it. In
particular:

  - The blob_length must be consistent with the length of the PDU or
    buffer in which the blob was received. (For instance, it must not
    be less than the length of data received.)

  - The blob_length must be at least 32 (which would be the length of
    an empty blob with no arguments).

  - The 'flags' portion of array_counts_and_flags MUST be zero.

  - The integer_pool_offset must be equal to the number of
    arguments (decoded from array_counts_and_flags) multiplied by 4,
    plus 20.

  - The blob_pool_offset must be greater than or equal to
    integer_pool_offset.

  - The string_pool_offset must be greater than or equal to
    blob_pool_offset.

  - The string_pool_offset must be less than or equal to blob_length.

  - The base of each integer array and each blob array must be an
    integral multiple of 4.

  - The base of the first integer array (if any) must be equal to
    integer_pool_offset.

  - Each subsequent integer array base must be greater than or equal to
    the previous integer array base, and less than or equal to
    blob_pool_offset.

  - The offset of element 0 of the first blob array (if any) must be
    equal to blob_pool_offset.

  - Each subsequent blob offset must be greater than the previous blob
    offset.

  - The last blob offset must be less than string_pool_offset.

  - The first string component must have an offset equal to
    string_pool_offset.

  - The offset of each subsequent string must be greater than the
    offset of the first element of the previous string.

  - Except for the first string, there must be a zero octet preceding
    each offset of each string component or string array element.

  - The last octet in the string_pool must be a zero.

4.3. Encoding and Decoding Code

A free software sample blob encoder and decoder have been written and
will be made available at the location listed in Appendix C.

5. Security Considerations

It is believed that the BLOB encoding is unique and can serve as a
useful 'canonical form' for a data structure. However, if higher-level
protocols encode non-native data types as BLOB primitive types, they
must also define a unique representation for each quantity to be stored
in that data-type.

In order to prevent possible attacks by transmission of blobs containing
bogus offsets, it is essential to perform the bounds checks listed in
section 4.2 while decoding blobs. While such attacks could not easily
overwrite memory with data chosen by an attacker, they could cause a
server to malfunction.

6. Author's Address

   Keith Moore
   University of Tennessee
   1122 Volunteer Blvd, Suite 203
   Knoxville TN 37996-3450
   email: moore@cs.utk.edu

7. References

[1] Bradner, S., "Key words for use in RFCs to Indicate Requirement
    Levels", RFC 2119, March 1997.

[2] "Information technology - Abstract Notation One (ASN.1):
    Specification of basic notation", ITU-T recommendation X.680,
    December 1997. Available from
    http://www.itu.int/ITU-T/studygroups/com17/languages/.

[3] "Information technology - ASN.1 encoding rules: Specification of
    Basic Encoding Rules (BER), Canonical Encoding Rules (CER) and
    Distinguished Encoding Rules (DER)", ITU-T recommendation X.690,
    December 1997. Available from
    http://www.itu.int/ITU-T/studygroups/com17/languages/.

[4] Srinivasan, R., "XDR: External Data Representation Standard", RFC
    1832, August 1995.

[5] Freed, N. and N. Borenstein, "Multipurpose Internet Mail Extensions
    (MIME) Part One: Format of Internet Message Bodies", RFC 2045,
    November 1996.

[6] "Extensible Markup Language (XML) 1.0 (Second Edition)", W3C
    Recommendation, October 2000.

[7] Crocker, D. (ed.) and P. Overell, "Augmented BNF for Syntax
    Specifications: ABNF", RFC 2234, November 1997.

Appendix A. ASCII-Art Picture of a blob

This diagram attempts to illustrate the ordering of the various elements
of a blob and the relationship of the offsets to the elements to which
they point.

The following is a dump, in an assembler-like notation, of a blob which
encodes:

   2 scalar integers with values 10, 20 (decimal)
   1 integer array, with elements { 1 2 3 4 }
   0 scalar blobs
   0 blob arrays
   1 scalar string with the value "string"
   2 string arrays, with elements { "a" "b" } and { "cc" "dd" "ee" }.

"label" denotes the name assigned to a particular offset; "xx" gives the
offset in hexadecimal; "contents" gives the value of the octet or octets
which appear at that offset; and "description" gives a description of
the value that appears in that location.

                   label xx  contents  description
------------------------:--:---------:------------------------
                        :00: 00000070: blob_length
                        :04: 0000002c: integer_pool_offset
                        :08: 0000005c: blob_pool_offset
                        :0c: 0000005c: string_pool_offset
                        :10: 00020002: array_counts_and_flags
                        :14: 0000002c: int_array_base_0
                        :18: 0000003c: scalar_int_array_base
                        :1c: 00000044: scalar_blob_array_base
                        :20: 00000044: string_array_base_0
                        :24: 0000004c: string_array_base_1
                        :28: 00000058: scalar_string_array_base
            integer_pool:
        int_array_base_0:2c: 00000001:
                        :30: 00000002:
                        :34: 00000003:
                        :38: 00000004:
   scalar_int_array_base:3c: 0000000a: (10 decimal)
                        :40: 00000014: (20 decimal)
  scalar_blob_array_base:
     string_array_base_0:44: 0000005c: ptr_to_str[0,0]
                        :48: 0000005e: ptr_to_str[0,1]
     string_array_base_1:4c: 00000060: ptr_to_str[1,0]
                        :50: 00000063: ptr_to_str[1,1]
                        :54: 00000066: ptr_to_str[1,2]
scalar_string_array_base:58: 00000069: ptr_to_scalar_str[0]
               blob_pool:
             string_pool:
         ptr_to_str[0,0]:5c:       61: 'a'
                        :5d:       00:
         ptr_to_str[0,1]:5e:       62: 'b'
                        :5f:       00:
         ptr_to_str[1,0]:60:       63: 'c'
                        :61:       63: 'c'
                        :62:       00:
         ptr_to_str[1,1]:63:       64: 'd'
                        :64:       64: 'd'
                        :65:       00:
         ptr_to_str[1,2]:66:       65: 'e'
                        :67:       65: 'e'
                        :68:       00:
    ptr_to_scalar_str[0]:69:       73: 's'
                        :6a:       74: 't'
                        :6b:       72: 'r'
                        :6c:       69: 'i'
                        :6d:       6e: 'n'
                        :6e:       67: 'g'
                        :6f:       00:
             blob_length:70:

Appendix B. Example Abstract Syntax

The syntax used to describe BLOB structures is given below in the ABNF
notation of [7]:

   file         = *(block / comment-line)

   block        = "BEGIN" 1*space id [ 1*space comment ] CRLF
                  *element
                  "END" [ comment ] CRLF

   element      = "int" 1*space id [ comment ] CRLF /
                  "string" 1*space id [ comment ] CRLF /
                  "int<>" 1*space id [ comment ] CRLF /
                  "string<>" 1*space id [ comment ] CRLF /
                  "struct" 1*space id [ comment ] CRLF /
                  "struct<>" 1*space id [ comment ] CRLF

   comment      = *space "#" *char

   comment-line = comment CRLF

   id           = letter *(letter / digit / "_")

   letter       = %x41-5A / %x61-7A      ; A-Z and a-z

   digit        = %x30-39                ; 0-9

   space        = %x20 / %x09

   char         = %x01-09 / %x0B / %x0C / %x0E-FF

   CRLF         = [ %x0D ] [ %x0A ]

Here is a simple awk program to interpret this syntax and produce a list
of C #define macros. The macros are of the form

   #define structname_element_type number

where 'structname' is the name of the structure, 'element' is the name
of the element, and 'type' is a suffix indicating the type of the
element (i = int, b = blob, s = string, ia = integer array, ba = blob
array, sa = string array) for ease in visual type checking.

This program is quite simplistic and performs no error checking.

   #!/bin/sh
   # the sed line deletes comments (and any whitespace before them)
   sed -e 's/[[:space:]]*#.*//' | awk '
   # BEGIN starts a new structure: remember its name, reset the counters
   $1 == "BEGIN" {
       current_id = $2;
       nint = nblob = nstr = ninta = nbloba = nstra = 0;
       next;
   }
   # each element line records its name under the next index of its type
   $1 == "int" {
       inames[nint] = $2;
       nint++;
       next;
   }
   $1 == "string" {
       snames[nstr] = $2;
       nstr++;
       next;
   }
   $1 == "struct" {
       bnames[nblob] = $2;
       nblob++;
       next;
   }
   $1 == "int<>" {
       ianames[ninta] = $2;
       ninta++;
       next;
   }
   $1 == "string<>" {
       sanames[nstra] = $2;
       nstra++;
       next;
   }
   $1 == "struct<>" {
       banames[nbloba] = $2;
       nbloba++;
       next;
   }
   # END emits one #define per element, grouped by type
   $1 == "END" {
       for (i = 0; i < nint; ++i)
           printf ("#define %s_%s_i %d\n", current_id, inames[i], i);
       for (i = 0; i < nblob; ++i)
           printf ("#define %s_%s_b %d\n", current_id, bnames[i], i);
       for (i = 0; i < nstr; ++i)
           printf ("#define %s_%s_s %d\n", current_id, snames[i], i);
       for (i = 0; i < ninta; ++i)
           printf ("#define %s_%s_ia %d\n", current_id, ianames[i], i);
       for (i = 0; i < nbloba; ++i)
           printf ("#define %s_%s_ba %d\n", current_id, banames[i], i);
       for (i = 0; i < nstra; ++i)
           printf ("#define %s_%s_sa %d\n", current_id, sanames[i], i);
       next;
   }'

Appendix C. Example Encoding and Decoding Code

Check http://www.cs.utk.edu/~moore/blob for the latest version.
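
Independently of the sample code referenced above, the header-level
consistency checks of section 4.2 might be sketched as follows. This is
an illustration only, not the author's decoder. It makes several
assumptions that this document does not confirm: header integers are
read as big-endian, "consistent with the buffer" is taken to mean an
exact length match, and the argument count is passed in by the caller
because the decoding of array_counts_and_flags is not reproduced here.
The names check_blob_header, num_args and SAMPLE are hypothetical. The
per-array and per-string checks are omitted.

```python
import struct

HEADER_SIZE = 20        # five 4-octet header fields precede the argument list
MIN_BLOB_LENGTH = 32    # empty blob with no arguments (section 4.2)


def check_blob_header(buf, num_args):
    """Return a list of section 4.2 header violations; empty means none found.

    num_args is the argument count the caller has already decoded from
    array_counts_and_flags (that decoding is not shown here).
    """
    if len(buf) < HEADER_SIZE:
        return ["buffer too short for a blob header"]

    # ASSUMPTION: header integers are big-endian (network byte order).
    blob_length, int_off, blob_off, str_off, _counts_and_flags = \
        struct.unpack_from(">5I", buf, 0)

    errors = []
    # ASSUMPTION: "consistent with the buffer" is read as an exact match.
    if blob_length != len(buf):
        errors.append("blob_length does not match buffer length")
    if blob_length < MIN_BLOB_LENGTH:
        errors.append("blob_length is less than 32")
    if int_off != num_args * 4 + HEADER_SIZE:
        errors.append("integer_pool_offset != num_args * 4 + 20")
    if blob_off < int_off:
        errors.append("blob_pool_offset precedes integer_pool_offset")
    if str_off < blob_off:
        errors.append("string_pool_offset precedes blob_pool_offset")
    if str_off > blob_length:
        errors.append("string_pool_offset exceeds blob_length")
    return errors


# The blob dumped in Appendix A: 23 words (header, array bases, integers,
# string offsets) followed by the 20-octet string_pool.
SAMPLE = struct.pack(
    ">23I",
    0x70, 0x2C, 0x5C, 0x5C, 0x00020002,      # header
    0x2C, 0x3C, 0x44, 0x44, 0x4C, 0x58,      # six array bases (arguments)
    1, 2, 3, 4, 10, 20,                      # integer array, scalar integers
    0x5C, 0x5E, 0x60, 0x63, 0x66, 0x69,      # string offsets
) + b"a\0b\0cc\0dd\0ee\0string\0"

print(check_blob_header(SAMPLE, 6))
```

Returning a list of violations rather than stopping at the first one is
a design choice made for this sketch; a production decoder would more
likely reject the blob at the first failed check.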