idnits 2.17.1

draft-ietf-cbor-array-tags-05.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (June 20, 2019) is 1771 days in the past. Is this
     intentional?

  -- Found something which looks like a code comment -- if you have code
     sections in the document, please surround them with '<CODE BEGINS>' and
     '<CODE ENDS>' lines.

  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  -- Looks like a reference, but probably isn't: '2' on line 369

  -- Looks like a reference, but probably isn't: '3' on line 261

  ** Obsolete normative reference: RFC 7049 (Obsoleted by RFC 8949)

     Summary: 1 error (**), 0 flaws (~~), 1 warning (==), 4 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

2   Network Working Group                                    C. Bormann, Ed.
3   Internet-Draft                                   Universitaet Bremen TZI
4   Intended status: Informational                             June 20, 2019
5   Expires: December 22, 2019

7       Concise Binary Object Representation (CBOR) Tags for Typed Arrays
8                         draft-ietf-cbor-array-tags-05

10  Abstract

12     The Concise Binary Object Representation (CBOR, RFC 7049) is a data
13     format whose design goals include the possibility of extremely small
14     code size, fairly small message size, and extensibility without the
15     need for version negotiation.

17     The present document makes use of this extensibility to define a
18     number of CBOR tags for typed arrays of numeric data, as well as
19     additional tags for multi-dimensional and homogeneous arrays. It is
20     intended as the reference document for the IANA registration of the
21     CBOR tags defined.

23  Status of This Memo

25     This Internet-Draft is submitted in full conformance with the
26     provisions of BCP 78 and BCP 79.

28     Internet-Drafts are working documents of the Internet Engineering
29     Task Force (IETF). Note that other groups may also distribute
30     working documents as Internet-Drafts. The list of current
31     Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

33     Internet-Drafts are draft documents valid for a maximum of six months
34     and may be updated, replaced, or obsoleted by other documents at any
35     time. It is inappropriate to use Internet-Drafts as reference
36     material or to cite them other than as "work in progress."

38     This Internet-Draft will expire on December 22, 2019.

40  Copyright Notice

42     Copyright (c) 2019 IETF Trust and the persons identified as the
43     document authors. All rights reserved.

45     This document is subject to BCP 78 and the IETF Trust's Legal
46     Provisions Relating to IETF Documents
47     (https://trustee.ietf.org/license-info) in effect on the date of
48     publication of this document. Please review these documents
49     carefully, as they describe your rights and restrictions with respect
50     to this document. Code Components extracted from this document must
51     include Simplified BSD License text as described in Section 4.e of
52     the Trust Legal Provisions and are provided without warranty as
53     described in the Simplified BSD License.

55  Table of Contents

57     1. Introduction
58       1.1. Terminology
59     2. Typed Arrays
60       2.1. Types of numbers
61     3. Additional Array Tags
62       3.1. Multi-dimensional Array
63         3.1.1. Row-major Order
64         3.1.2. Column-Major order
65       3.2. Homogeneous Array
66     4. Discussion
67     5. CDDL typenames
68     6. IANA Considerations
69     7. Security Considerations
70     8. References
71       8.1. Normative References
72       8.2. Informative References
73     Contributors
74     Acknowledgements
75     Author's Address

77  1. Introduction

79     The Concise Binary Object Representation (CBOR, [RFC7049]) provides
80     for the interchange of structured data without a requirement for a
81     pre-agreed schema. RFC 7049 defines a basic set of data types, as
82     well as a tagging mechanism that enables extending the set of data
83     types supported via an IANA registry.
85     Recently, a simple form of typed arrays of numeric data has received
86     interest both in the Web graphics community [TypedArray] and in the
87     JavaScript specification [TypedArrayES6], as well as in corresponding
88     implementations [ArrayBuffer].

90     Since these typed arrays may carry significant amounts of data, there
91     is interest in interchanging them in CBOR without the need for lengthy
92     conversion of each number in the array. This can also save the space
93     overhead of encoding a type for each element of an array.

95     This document defines a number of interrelated CBOR tags that cover
96     these typed arrays, as well as additional tags for
97     multi-dimensional and homogeneous arrays. It is intended as the
98     reference document for the IANA registration of the tags defined.

100    Note that an application that generates CBOR with these tags has
101    considerable freedom in choosing variants, e.g., with respect to
102    endianness, embedded type (signed vs. unsigned), and number of bits
103    per element, or whether a tag defined in this specification is used
104    at all instead of more basic CBOR. In contrast to representation
105    variants of single CBOR numbers, there is no representation that
106    could be identified as "preferred". If deterministic encoding is
107    desired in a CBOR-based protocol making use of these tags, the
108    protocol has to define which of the encoding variants are used in
109    which case.

111 1.1. Terminology

113    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
114    "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
115    "OPTIONAL" in this document are to be interpreted as described in
116    BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
117    capitals, as shown here.

119    The term "byte" is used in its now customary sense as a synonym for
120    "octet". Where bit arithmetic is explained, this document uses the
121    notation familiar from the programming language C (including C++14's
122    0bnnn binary literals), except that the operator "**" stands for
123    exponentiation.

125    The term "array" is used in a general sense in this document, unless
126    further specified. The term "classical CBOR array" describes an
127    array represented with CBOR major type 4. A "homogeneous array" is
128    an array of elements that are all of the same type (the term is
129    neutral as to whether that is a representation type or an application
130    data model type).

132 2. Typed Arrays

134    Typed arrays are homogeneous arrays of numbers, all of which are
135    encoded in a single form of binary representation. The concatenation
136    of these representations is encoded as a single CBOR byte string
137    (major type 2), enclosed by a single tag indicating the type and
138    encoding of all the numbers represented in the byte string.

140 2.1. Types of numbers

142    Three classes of numbers are of interest: unsigned integers (uint),
143    signed integers (two's complement, sint), and IEEE 754 binary
144    floating point numbers (which are always signed). For each of these
145    classes, there are multiple representation lengths in active use:

147              +-----------+--------+--------+-----------+
148              | Length ll | uint   | sint   | float     |
149              +-----------+--------+--------+-----------+
150              | 0         | uint8  | sint8  | binary16  |
151              | 1         | uint16 | sint16 | binary32  |
152              | 2         | uint32 | sint32 | binary64  |
153              | 3         | uint64 | sint64 | binary128 |
154              +-----------+--------+--------+-----------+

156                        Table 1: Length values

158    Here, sintN stands for a signed integer of exactly N bits (for
159    instance, sint16), and uintN stands for an unsigned integer of
160    exactly N bits (for instance, uint32). The name binaryN stands for
161    the number form of the same name defined in IEEE 754 [IEEE754].
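As a rough illustration of the byte string described in Section 2, the
following C sketch concatenates the representations of uint16 elements.
It assumes big-endian (network byte order) element layout. The function
name pack_uint16_be and the caller-supplied output buffer are
hypothetical and are not defined by this document.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not defined by this document): writes the
 * big-endian representation of n uint16 values into out, back to
 * back, and returns the number of bytes written.  The caller must
 * provide room for at least 2 * n bytes in out. */
size_t pack_uint16_be(const uint16_t *values, size_t n, uint8_t *out)
{
    assert(values != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        out[2 * i]     = (uint8_t)(values[i] >> 8);   /* high byte first */
        out[2 * i + 1] = (uint8_t)(values[i] & 0xff); /* then low byte */
    }
    return 2 * n;
}
```

For the six values 2, 4, 8, 4, 16, 256 this produces the 12 bytes
00 02 00 04 00 08 00 04 00 10 01 00, which would then be carried as the
content of a single CBOR byte string (major type 2).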
163    Since one objective of these tags is to be able to directly ship the
164    ArrayBuffers underlying the Typed Arrays without re-encoding them,
165    and these may be either in big endian (network byte order) or in
166    little endian form, we need to define tags for both variants.

168    In total, this leads to 24 variants. In the tag, we need to express
169    the choice between integer and floating point, the signedness (for
170    integers), the endianness, and one of the four length values.

172    In order to simplify implementation, a range of tags is being
173    allocated that allows retrieving all this information from the bits
174    of the tag: Tag values from 64 to 87.

176    The value is split up into five bit fields: 0b010_f_s_e_ll, as
177    detailed in Table 2.

179    +-------+-------------------------------------------------------+
180    | Field | Use                                                   |
181    +-------+-------------------------------------------------------+
182    | 0b010 | the constant bits 0, 1, 0                             |
183    | f     | 0 for integer, 1 for float                            |
184    | s     | 0 for unsigned integer or float, 1 for signed integer |
185    | e     | 0 for big endian, 1 for little endian                 |
186    | ll    | A number for the length (Table 1).                    |
187    +-------+-------------------------------------------------------+

189            Table 2: Bit fields in the low 8 bits of the tag

191    The number of bytes in each array element can then be calculated by
192    "2**(f + ll)" (or "1 << (f + ll)" in a typical programming language).
193    (Notice that the bit pairs "0f" and "ll" are the two least
194    significant bits of the upper and the lower 4-bit nibble of the byte,
       respectively.)

196    In the CBOR representation, the total number of elements in the array
197    is not expressed explicitly, but implied from the length of the byte
198    string and the length of each representation. It can be computed
199    inversely to the previous formula from the length of the byte string
200    in bytes: "bytelength >> (f + ll)".

202    For the uint8/sint8 values, the endianness is redundant. Only the
203    tag for the big endian variant is used and assigned as such. The Tag
204    that would signify the little endian variant of sint8 MUST NOT be
205    used; its tag number is marked as reserved. As a special case, the
206    Tag that would signify the little endian variant of uint8 is instead
207    assigned to signify that the numbers in the array are using clamped
208    conversion from integers, as described in more detail in
209    Section 7.1.11 ("ToUint8Clamp") of the ES6 JavaScript specification
210    [TypedArrayES6]; the assumption here is that a program-internal
211    representation of this array after decoding would be marked this way
212    for further processing, providing "roundtripping" of JavaScript typed
213    arrays through CBOR.

215    IEEE 754 binary floating point numbers are always signed. Therefore,
216    for the float variants ("f" == 1), there is no need to distinguish
217    between signed and unsigned variants; the "s" bit is always zero.
218    The Tag numbers where "s" would be one (which would have Tag values
219    88 to 95) remain free for use by other specifications.

221 3. Additional Array Tags

223    This specification defines three additional array tags. The
224    Multi-dimensional Array tags can be combined with classical CBOR
225    arrays as well as with Typed Arrays in order to build
226    multi-dimensional arrays with constant numbers of elements in the
227    sub-arrays. The Homogeneous Array tag can be used as a signal by an
228    application to identify a classical CBOR array as a homogeneous
229    array, even when a Typed Array does not apply.

231 3.1. Multi-dimensional Array

233    A multi-dimensional array is represented as a tagged array that
234    contains two (one-dimensional) arrays. The first array defines the
235    dimensions of the multi-dimensional array (in the sequence of outer
236    dimensions towards inner dimensions) while the second array
237    represents the contents of the multi-dimensional array. If the
238    second array is itself tagged as a Typed Array then the element type
239    of the multi-dimensional array is known to be the same type as that
240    of the Typed Array.

242    Two tags are defined by this document, one for elements arranged in
243    row-major order, and one for column-major order.

245 3.1.1. Row-major Order

247    Tag: 40

249    Data Item: array (major type 4) of two arrays, one array (major type
250       4) of dimensions, which are unsigned integers distinct from zero,
251       and one array (either a CBOR array of major type 4, or a Typed
252       Array, or a Homogeneous Array) of elements

254    Data in the second array consists of consecutive values where the
255    last dimension is considered contiguous (row-major order).

257    Figure 1 shows a declaration of a two-dimensional array in the C
258    language and a representation of it in CBOR using both a
259    multi-dimensional array tag and a typed array tag.

261       uint16_t a[2][3] = {
262         {2, 4, 8},     /* row 0 */
263         {4, 16, 256},  /* row 1 */
264       };

266       <Tag 40> # multi-dimensional array tag
267          82       # array(2)
268            82      # array(2)
269              02     # unsigned(2) 1st Dimension
270              03     # unsigned(3) 2nd Dimension
271            <Tag 65> # uint16 array
272              4c     # byte string(12)
273                0002 # unsigned(2)
274                0004 # unsigned(4)
275                0008 # unsigned(8)
276                0004 # unsigned(4)
277                0010 # unsigned(16)
278                0100 # unsigned(256)

280            Figure 1: Multi-dimensional array in C and CBOR

282    Figure 2 shows the same two-dimensional array using the
283    multi-dimensional array tag in conjunction with a basic CBOR array
284    (which, with the small numbers chosen for the example, happens to be
285    shorter).

287       <Tag 40> # multi-dimensional array tag
288          82       # array(2)
289            82      # array(2)
290              02     # unsigned(2) 1st Dimension
291              03     # unsigned(3) 2nd Dimension
292            86      # array(6)
293              02      # unsigned(2)
294              04      # unsigned(4)
295              08      # unsigned(8)
296              04      # unsigned(4)
297              10      # unsigned(16)
298              19 0100 # unsigned(256)

300        Figure 2: Multi-dimensional array using basic CBOR array

302 3.1.2. Column-Major order

304    The multi-dimensional arrays specified in the previous sub-subsection
305    are in "row major" order, which is the preferred order for the
306    purposes of this specification. An analogous representation that
307    uses "column major" order arrays is provided in this subsection under
308    the tag 1040, as illustrated in Figure 3.

310    Tag: 1040

312    Data Item: as with tag 40, except that the data in the second array
313       consists of consecutive values where the first dimension is
314       considered contiguous (column-major order).

316       <Tag 1040> # multi-dimensional array tag, column major order
317          82       # array(2)
318            82      # array(2)
319              02     # unsigned(2) 1st Dimension
320              03     # unsigned(3) 2nd Dimension
321            86      # array(6)
322              02      # unsigned(2)
323              04      # unsigned(4)
324              04      # unsigned(4)
325              10      # unsigned(16)
326              08      # unsigned(8)
327              19 0100 # unsigned(256)

329      Figure 3: Multi-dimensional array using basic CBOR array, column
330                                major order

332 3.2. Homogeneous Array

334    Tag: 41

336    Data Item: array (major type 4)

338    This tag identifies the classical CBOR array (a one-dimensional
339    array) tagged by it as a homogeneous array, that is, it has elements
340    that are all of the same application model data type. The element
341    type of the array is thus determined by the application model data
342    type of the first array element.

344    This can be used in application data models that apply specific
345    semantics to homogeneous arrays. Also, in certain cases,
346    implementations in strongly typed languages may be able to create
347    native homogeneous arrays of specific types instead of ordered lists
348    while decoding. Which CBOR data items constitute elements of the
349    same application type is specific to the application.

351    Figure 4 shows an example for a homogeneous array of booleans in C++
352    and CBOR.
354       bool boolArray[2] = { true, false };

356       <Tag 41>  # Homogeneous Array Tag
357          82      # array(2)
358            F5     # true
359            F4     # false

361              Figure 4: Homogeneous array in C++ and CBOR

363    Figure 5 extends the example with a more complex structure.

365       typedef struct {
366         bool active;
367         int value;
368       } foo;
369       foo myArray[2] = { {true, 3}, {true, -4} };

371       <Tag 41>
372          82      # array(2)
373            82     # array(2)
374              F5    # true
375              03    # 3
376            82     # array(2)
377              F5    # true
378              23    # -4

380              Figure 5: Homogeneous array in C++ and CBOR

382 4. Discussion

384    Support for both little- and big-endian representation may seem out
385    of character with CBOR, which is otherwise fully big endian. This
386    support is in line with the intended use of the typed arrays and the
387    objective not to require conversion of each array element.

389    This specification allocates a sizable chunk out of the single-byte
390    tag space. This use of code point space is justified by the wide use
391    of typed arrays in data interchange.

393    Providing a column-major order variant of the multi-dimensional array
394    may seem superfluous to some, and useful to others. It is cheap to
395    define the additional tag so it is available when actually needed.
396    Allocating it out of a different number space makes the preference
397    for row-major evident.

399    Applying a Homogeneous Array tag to a Typed Array would usually be
400    redundant and is therefore not provided by the present specification.

402 5. CDDL typenames

404    For use with CDDL [RFC8610], the typenames defined in Figure 6
405    are recommended:

407       ta-uint8 = #6.64(bstr)
408       ta-uint16be = #6.65(bstr)
409       ta-uint32be = #6.66(bstr)
410       ta-uint64be = #6.67(bstr)
411       ta-uint8-clamped = #6.68(bstr)
412       ta-uint16le = #6.69(bstr)
413       ta-uint32le = #6.70(bstr)
414       ta-uint64le = #6.71(bstr)
415       ta-sint8 = #6.72(bstr)
416       ta-sint16be = #6.73(bstr)
417       ta-sint32be = #6.74(bstr)
418       ta-sint64be = #6.75(bstr)
419       ; reserved: #6.76(bstr)
420       ta-sint16le = #6.77(bstr)
421       ta-sint32le = #6.78(bstr)
422       ta-sint64le = #6.79(bstr)
423       ta-float16be = #6.80(bstr)
424       ta-float32be = #6.81(bstr)
425       ta-float64be = #6.82(bstr)
426       ta-float128be = #6.83(bstr)
427       ta-float16le = #6.84(bstr)
428       ta-float32le = #6.85(bstr)
429       ta-float64le = #6.86(bstr)
430       ta-float128le = #6.87(bstr)
431       homogeneous<array> = #6.41(array)
432       multi-dim<dim, array> = #6.40([dim, array])
433       multi-dim-column-major<dim, array> = #6.1040([dim, array])

435                Figure 6: Recommended typenames for CDDL

437 6. IANA Considerations

439    IANA has allocated the tags in Table 3, with the present document as
440    the specification reference. (The reserved value is reserved for a
441    future revision of typed array tags.)

443    The allocations came out of the "specification required" space
444    (24..255), with the exception of 1040, which came out of the "first
445    come first served" space (256..).
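The tag numbers 64 to 87 behind the typenames of Figure 6 follow from
the bit fields of Table 2. The C sketch below is illustrative only: it
splits a tag number into those fields, applies the two special cases
for tags 68 and 76, and computes the element size and element count.
The names ta_info, ta_decode and ta_element_count are hypothetical and
are not defined by this document.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded view of a typed array tag (0b010_f_s_e_ll). */
typedef struct {
    unsigned f, s, e, ll;  /* raw bit fields as named in Table 2 */
    size_t element_bytes;  /* 2**(f + ll) */
    bool clamped;          /* tag 68: uint8 with clamped conversion */
} ta_info;

/* Splits a tag number into its bit fields.  Returns false for numbers
 * outside 64..87 and for the reserved tag 76. */
bool ta_decode(uint64_t tag, ta_info *info)
{
    assert(info != NULL);
    if (tag < 64 || tag > 87 || tag == 76)
        return false;
    info->f  = (unsigned)(tag >> 4) & 1;
    info->s  = (unsigned)(tag >> 3) & 1;
    info->e  = (unsigned)(tag >> 2) & 1;
    info->ll = (unsigned)tag & 3;
    info->element_bytes = (size_t)1 << (info->f + info->ll);
    /* For tag 68 the "e" bit does not denote endianness; it marks
     * the clamped uint8 variant instead. */
    info->clamped = (tag == 68);
    return true;
}

/* Number of elements implied by the length of the byte string. */
size_t ta_element_count(const ta_info *info, size_t bytelength)
{
    assert(info != NULL);
    return bytelength >> (info->f + info->ll);
}
```

With tag 65 and the 12-byte string of Figure 1, this gives an element
size of 2 bytes and 6 elements. A real decoder would presumably also
want to reject byte strings whose length is not a multiple of the
element size, which this sketch does not check.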
447    +------+-------------------+----------------------------------------+
448    | Tag  | Data Item         | Semantics                              |
449    +------+-------------------+----------------------------------------+
450    | 64   | byte string       | uint8 Typed Array                      |
451    | 65   | byte string       | uint16, big endian, Typed Array        |
452    | 66   | byte string       | uint32, big endian, Typed Array        |
453    | 67   | byte string       | uint64, big endian, Typed Array        |
454    | 68   | byte string       | uint8 Typed Array, clamped arithmetic  |
455    | 69   | byte string       | uint16, little endian, Typed Array     |
456    | 70   | byte string       | uint32, little endian, Typed Array     |
457    | 71   | byte string       | uint64, little endian, Typed Array     |
458    | 72   | byte string       | sint8 Typed Array                      |
459    | 73   | byte string       | sint16, big endian, Typed Array        |
460    | 74   | byte string       | sint32, big endian, Typed Array        |
461    | 75   | byte string       | sint64, big endian, Typed Array        |
462    | 76   | byte string       | (reserved)                             |
463    | 77   | byte string       | sint16, little endian, Typed Array     |
464    | 78   | byte string       | sint32, little endian, Typed Array     |
465    | 79   | byte string       | sint64, little endian, Typed Array     |
466    | 80   | byte string       | IEEE 754 binary16, big endian, Typed   |
467    |      |                   | Array                                  |
468    | 81   | byte string       | IEEE 754 binary32, big endian, Typed   |
469    |      |                   | Array                                  |
470    | 82   | byte string       | IEEE 754 binary64, big endian, Typed   |
471    |      |                   | Array                                  |
472    | 83   | byte string       | IEEE 754 binary128, big endian, Typed  |
473    |      |                   | Array                                  |
474    | 84   | byte string       | IEEE 754 binary16, little endian,      |
475    |      |                   | Typed Array                            |
476    | 85   | byte string       | IEEE 754 binary32, little endian,      |
477    |      |                   | Typed Array                            |
478    | 86   | byte string       | IEEE 754 binary64, little endian,      |
479    |      |                   | Typed Array                            |
480    | 87   | byte string       | IEEE 754 binary128, little endian,     |
481    |      |                   | Typed Array                            |
482    | 40   | array of two      | Multi-dimensional Array, row-major     |
483    |      | arrays*           | order                                  |
484    | 1040 | array of two      | Multi-dimensional Array, column-major  |
485    |      | arrays*           | order                                  |
486    | 41   | array             | Homogeneous Array                      |
487    +------+-------------------+----------------------------------------+

489                         Table 3: Values for Tags

491    *) 40 or 1040 data item: second element of outer array in data item
492    is native CBOR array (major type 4) or Typed Array (one of Tag
493    64..87)

495 7. Security Considerations

497    The security considerations of RFC 7049 apply; special attention is
498    drawn to the second paragraph of Section 8 of RFC 7049.

500    The Tag for homogeneous arrays makes a promise about its tagged data
501    item that a maliciously constructed CBOR input can then choose to
502    ignore. As always, the decoder therefore has to ensure that it is
503    not driven into an undefined state by array elements that do not
504    fulfill the promise and that it does continue to fulfill its API
505    contract in this case as well.

507 8. References

509 8.1. Normative References

511    [IEEE754]  IEEE, "IEEE Standard for Floating-Point Arithmetic", IEEE
512               Std 754-2008.

514    [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
515               Requirement Levels", BCP 14, RFC 2119,
516               DOI 10.17487/RFC2119, March 1997.

519    [RFC7049]  Bormann, C. and P. Hoffman, "Concise Binary Object
520               Representation (CBOR)", RFC 7049, DOI 10.17487/RFC7049,
521               October 2013.

523    [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
524               2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
525               May 2017.

527    [RFC8610]  Birkholz, H., Vigano, C., and C. Bormann, "Concise Data
528               Definition Language (CDDL): A Notational Convention to
529               Express Concise Binary Object Representation (CBOR) and
530               JSON Data Structures", RFC 8610, DOI 10.17487/RFC8610,
531               June 2019.

533 8.2. Informative References

535    [ArrayBuffer]
536               Mozilla Developer Network, "JavaScript typed arrays",
537               2013.

540    [TypedArray]
541               Vukicevic, V. and K. Russell, "Typed Array Specification",
542               February 2011.
544    [TypedArrayES6]
545               "22.2 TypedArray Objects", in: ECMA-262 6th Edition, The
546               ECMAScript 2015 Language Specification, June 2015.

550 Contributors

552    The initial draft for this specification was written by Johnathan
553    Roatch (roatch@gmail.com). Many thanks for getting this ball
554    rolling.

556    Glenn Engel suggested the tags for multi-dimensional arrays and
557    homogeneous arrays.

559 Acknowledgements

561    Jim Schaad provided helpful comments and reminded us that
562    column-major order still is in use. Jeffrey Yasskin helped improve
563    the definition of homogeneous arrays. IANA helped correct an error in
564    a previous version.

566 Author's Address

568    Carsten Bormann (editor)
569    Universitaet Bremen TZI
570    Postfach 330440
571    Bremen D-28359
572    Germany

574    Phone: +49-421-218-63921
575    Email: cabo@tzi.org