idnits 2.17.1 draft-ietf-cbor-packed-01.txt: -(3): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- == There are 3 instances of lines with non-ascii characters in the document. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- No issues found here. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year -- The document date (27 January 2021) is 1184 days in the past. Is this intentional? Checking references for intended status: Informational ---------------------------------------------------------------------------- ** Obsolete normative reference: RFC 7049 (Obsoleted by RFC 8949) Summary: 1 error (**), 0 flaws (~~), 2 warnings (==), 1 comment (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group C. Bormann 3 Internet-Draft Universität Bremen TZI 4 Intended status: Informational 27 January 2021 5 Expires: 31 July 2021 7 Packed CBOR 8 draft-ietf-cbor-packed-01 10 Abstract 12 The Concise Binary Object Representation (CBOR, RFC 8949) is a data 13 format whose design goals include the possibility of extremely small 14 code size, fairly small message size, and extensibility without the 15 need for version negotiation. 17 CBOR does not provide any forms of data compression. CBOR data 18 items, in particular when generated from legacy data models often 19 allow considerable gains in compactness when applying data 20 compression. While traditional data compression techniques such as 21 DEFLATE (RFC 1951) work well for CBOR, their disadvantage is that the 22 receiver needs to unpack the compressed form to make use of data. 24 This specification describes Packed CBOR, a simple transformation of 25 a CBOR data item into another CBOR data item that is almost as easy 26 to consume as the original CBOR data item. A separate decompression 27 step is therefore often not required at the receiver. 29 Note to Readers 31 This is a working-group draft of the CBOR working group of the IETF, 32 https://datatracker.ietf.org/wg/cbor/about/ 33 (https://datatracker.ietf.org/wg/cbor/about/). Discussion takes 34 places on the github repository https://github.com/cbor-wg/cbor- 35 packed (https://github.com/cbor-wg/cbor-packed) and on the CBOR WG 36 mailing list, https://www.ietf.org/mailman/listinfo/cbor 37 (https://www.ietf.org/mailman/listinfo/cbor). 39 Status of This Memo 41 This Internet-Draft is submitted in full conformance with the 42 provisions of BCP 78 and BCP 79. 44 Internet-Drafts are working documents of the Internet Engineering 45 Task Force (IETF). Note that other groups may also distribute 46 working documents as Internet-Drafts. The list of current Internet- 47 Drafts is at https://datatracker.ietf.org/drafts/current/. 49 Internet-Drafts are draft documents valid for a maximum of six months 50 and may be updated, replaced, or obsoleted by other documents at any 51 time. It is inappropriate to use Internet-Drafts as reference 52 material or to cite them other than as "work in progress." 54 This Internet-Draft will expire on 31 July 2021. 56 Copyright Notice 58 Copyright (c) 2021 IETF Trust and the persons identified as the 59 document authors. All rights reserved. 61 This document is subject to BCP 78 and the IETF Trust's Legal 62 Provisions Relating to IETF Documents (https://trustee.ietf.org/ 63 license-info) in effect on the date of publication of this document. 64 Please review these documents carefully, as they describe your rights 65 and restrictions with respect to this document. Code Components 66 extracted from this document must include Simplified BSD License text 67 as described in Section 4.e of the Trust Legal Provisions and are 68 provided without warranty as described in the Simplified BSD License. 70 Table of Contents 72 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 73 1.1. Terminology . . . . . . . . . . . . . . . . . . . . . . . 3 74 2. Packed CBOR . . . . . . . . . . . . . . . . . . . . . . . . . 4 75 2.1. Packing Tables . . . . . . . . . . . . . . . . . . . . . 4 76 2.2. Referencing Shared Items . . . . . . . . . . . . . . . . 4 77 2.3. Referencing Affix Items . . . . . . . . . . . . . . . . . 5 78 2.4. Discussion . . . . . . . . . . . . . . . . . . . . . . . 7 79 3. Table Setup . . . . . . . . . . . . . . . . . . . . . . . . . 7 80 3.1. Basic Packed CBOR . . . . . . . . . . . . . . . . . . . . 8 81 4. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 8 82 5. Security Considerations . . . . . . . . . . . . . . . . . . . 9 83 6. References . . . . . . . . . . . . . . . . . . . . . . . . . 10 84 6.1. Normative References . . . . . . . . . . . . . . . . . . 10 85 6.2. Informative References . . . . . . . . . . . . . . . . . 10 86 Appendix A. Example . . . . . . . . . . . . . . . . . . . . . . 11 87 Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . 12 88 Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 12 90 1. Introduction 92 (TO DO, expand on text from abstract here; move references here and 93 neuter them in the abstract as per Section 4.3 of [RFC7322].) 94 The specification defines a transformation from a Packed CBOR data 95 item to the original CBOR data item; it does not define an algorithm 96 for an actual packer. Different packers can differ in the amount of 97 effort they invest in arriving at a minimal packed form. 99 Packed CBOR can employ two kinds of optimization: 101 * item sharing: substructures (data items) that occur repeatedly in 102 the original CBOR data item can be collapsed to a simple reference 103 to a common representation of that data item. The processing 104 required during consumption is limited to following that 105 reference. 107 * affix sharing: data items (strings, containers) that share a 108 prefix or suffix (affix) can be replaced by a reference to a 109 common affix plus the rest of the data item. For strings, the 110 processing required during consumption is similar to following the 111 affix reference plus that for an indefinite-length string. 113 A specific application protocol that employs Packed CBOR might allow 114 both kinds of optimization or limit the representation to item 115 sharing only. 117 1.1. Terminology 119 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 120 "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and 121 "OPTIONAL" in this document are to be interpreted as described in 122 BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all 123 capitals, as shown here. 125 Packed reference: A shared item reference or an affix reference 127 Shared item reference: A reference to a shared item as defined in 128 Section 2.2 130 Affix reference: A reference that combines an affix item as defined 131 in Section 2.3. 133 Affix: Prefix or suffix. 135 Packing tables: The triple of a shared item table, a prefix table, 136 and a suffix table. 138 Expansion: The result of applying a packed reference in the context 139 of given Packing tables. 141 The definitions of [RFC8949] apply. The term "byte" is used in its 142 now customary sense as a synonym for "octet". Where bit arithmetic 143 is explained, this document uses the notation familiar from the 144 programming language C (including C++14's 0bnnn binary literals), 145 except that, in the plain text form of this document, the operator 146 "^" stands for exponentiation. 148 2. Packed CBOR 150 Packed CBOR is defined in two parts: Referencing packing tables (this 151 section) and setting up packing tables (Section 3). 153 2.1. Packing Tables 155 At any point within a data item making use of Packed CBOR, there is a 156 Current Set of packing tables that applies. 158 There are three packing tables in a Current Set: 160 * Shared item table 162 * Prefix table 164 * Suffix table 166 Without any table setup, all these tables are empty arrays. 167 Table setup can cause these arrays to be non-empty, where the 168 elements are (potentially themselves packed) data items. In the 169 abstract, each of the tables is indexed by an unsigned integer 170 (starting from 0). 172 2.2. Referencing Shared Items 174 Shared items are stored in the shared item table of the Current Set. 176 The shared data items are referenced by using the data items in 177 Table 1. When reconstructing the original data item, such a 178 reference is replaced by the referenced data item, which is then 179 recursively unpacked. 181 +===========================+================+ 182 | reference | element number | 183 +===========================+================+ 184 | Simple value 0-15 | 0-15 | 185 +---------------------------+----------------+ 186 | Tag 6(unsigned integer N) | 16 + 2*N | 187 +---------------------------+----------------+ 188 | Tag 6(negative integer N) | 16 - 2*N - 1 | 189 +---------------------------+----------------+ 191 Table 1: Referencing Shared Values 193 Taking into account the encoding of these referring data items, there 194 are 16 one-byte references, 48 two-byte references, 512 three-byte 195 references, 131072 four-byte references, etc. As integers can grow 196 to very large (or small) values, there is no practical limit to how 197 many shared items might be used in a Packed CBOR item. 199 Note that the semantics of Tag 6 depend on its content: An integer 200 turns the tag into a shared item reference, a string or container 201 (map or array) into a prefix reference (see Table 2). 203 2.3. Referencing Affix Items 205 Prefix items are stored in the prefix table of the Current Set; 206 suffix items are stored in the suffix table of the Current Set. We 207 collectively call these items affix items; when referencing, which of 208 the tables is actually used depends on whether a prefix or a suffix 209 reference was used. 211 +===============================+==================+================+ 212 | prefix reference | suffix | element number | 213 | | reference | | 214 +===============================+==================+================+ 215 | Tag 6(suffix) | -- | 0 | 216 +-------------------------------+------------------+----------------+ 217 | Tag 224-255(suffix) | TBD | 1-32 | 218 +-------------------------------+------------------+----------------+ 219 | Tag 28672-32767(suffix) | TBD | 33-4128 | 220 +-------------------------------+------------------+----------------+ 221 | Tag | TBD | 4129-268439584 | 222 | 1879048192-2147483647(suffix) | | | 223 +-------------------------------+------------------+----------------+ 225 Table 2: Referencing Affix Values 227 Affix data items are referenced by using the data items in Table 2. 228 Each of these implies the table used (prefix or suffix), a table 229 index (an unsigned integer) and contains a "rump item". When 230 reconstructing the original data item, such a reference is replaced 231 by a data item constructed from the referenced affix data item 232 (affix, which might need to be recursively unpacked first) 233 "concatenated" with the tag content (rump, again possibly recursively 234 unpacked). 236 * For a rump of type array and map, the affix also needs to be an 237 array or a map. For an array, the elements from the prefix are 238 prepended, and the elements from a suffix are appended to the rump 239 array. For a map, the entries in the affix are added to those of 240 the rump; prefix and suffix references differ in how entries with 241 identical keys are combined: for prefix references, an entry in 242 the rump with the same key as an entry in the affix overrides the 243 one in the affix, while for suffix references, an entry in the 244 affix overrides an entry in the rump that has the same key. 246 | ISSUE: Not sure that we want to use the efficiencies of 247 | overriding, but having default values supplied out of a 248 | dictionary to be overridden by a rump sounds rather handy. 249 | Note that there is no way to remove a map entry from the table. 251 * For a rump of one of the string types, the affix also needs to be 252 one of the string types; the bytes of the strings are concatenated 253 as specified (prefix + rump, rump + suffix). The result of the 254 concatenation gets the type of the rump; this way a single affix 255 can be used to build both byte and text strings, depending on what 256 type of rump is being used. 258 Taking into account the encoding, there is one single-byte prefix 259 reference, 32 two-byte references, 4096 three-byte references, and 260 268435456 five-byte references. 268439585 261 (2^(28)+2^(12)+2^(5)+2^(0)) is an artificial limit, but should be 262 high enough that there, again, is no practical limit to how many 263 prefix items might be used in a Packed CBOR item. 265 | Issue: Table 2 assumes that the numbering system for prefixes 266 | and suffixes is the same, so the same sizes of allocations need 267 | to be made. However, experience suggests that prefix packing 268 | might be more likely than suffix packing. Also for this 269 | reason, there is no intent to spend a 1+0 tag value for suffix 270 | matching. This detail should be defined in the next version. 272 2.4. Discussion 274 This specification uses up a large number of Simple Values and Tags, 275 in particular one of the rare one-byte tags and half of the one-byte 276 simple values. Since the objective is compression, this is warranted 277 if and only if there is consensus that this specific format could be 278 useful for a wide area of applications, while maintaining reasonable 279 simplicity in particular at the side of the consumer. 281 A maliciously crafted Packed CBOR data item might contain a reference 282 loop. A consumer/decompressor MUST protect against that. 284 The current definition does nothing to help with packing CBOR 285 sequences [RFC8742]; maybe it should. 287 3. Table Setup 289 The packing references described in Section 2 assume that packing 290 tables have been set up. 292 By default, all three tables are empty (zero-length arrays). 294 Table setup can happen in one of two ways: 296 * By the application environment, e.g., a media type. These can 297 define tables that amount to a static dictionary that can be used 298 in a CBOR data item for this application environment. 300 * By one or more tags enclosing the packed content. These can be 301 defined to add to the packing tables that already apply to the 302 tag. Usually, the semantics of the tag will be to prepend items 303 to one of the tables. Note that it may be useful to leave a 304 particular efficiency tier alone and only prepend to a higher 305 tier; e.g., a tag could insert shared items at table index 16 and 306 shift anything that was already there further down in the array 307 while leaving index 0 to 15 alone. Explicit additions by tag can 308 combine with application-environment supplied tables that apply to 309 the entire CBOR data item. 311 The present specification only defines a single tag for prepending to 312 the (by default empty) tables. 314 | We could also define a tag for dictionary referencing (or 315 | include that in the basic packed CBOR), but the details are 316 | likely to vary considerably between applications. A URI-based 317 | reference would be easy to add, but might be too inefficient 318 | when used in the likely combination with an "ni:" URI 319 | [RFC6920]. 321 3.1. Basic Packed CBOR 323 A predefined tag for packing table setup is defined in CDDL [RFC8610] 324 as in Figure 1: 326 Basic-Packed-CBOR = #6.51([[*shared], [*prefix], [*suffix], rump]) 327 rump = any 328 prefix = any 329 suffix = any 330 shared = any 332 Figure 1: Packed CBOR in CDDL 334 (This assumes the allocation of tag number 51.) 336 The arrays given as the first, second, and third element of the 337 content of the tag 51 are prepended to the tables for shared items, 338 prefixes, and suffixes that apply to the entire tag (by default empty 339 tables). 341 The original CBOR data item can be reconstructed by recursively 342 replacing shared, prefix, and suffix references encountered in the 343 rump by their expansions. 345 4. IANA Considerations 347 In the registry [IANA.cbor-tags], IANA is requested to allocate the 348 tags defined in Table 3. 350 +=============+==============+===========+========================+ 351 | Tag | Data Item | Semantics | Reference | 352 +=============+==============+===========+========================+ 353 | 6 | integer, | Packed | draft-ietf-cbor-packed | 354 | | text string, | CBOR: | | 355 | | byte string, | shared/ | | 356 | | array, map | prefix | | 357 +-------------+--------------+-----------+------------------------+ 358 | 224-255 | text string, | Packed | draft-ietf-cbor-packed | 359 | | byte string, | CBOR: | | 360 | | array, map | prefix | | 361 +-------------+--------------+-----------+------------------------+ 362 | 28672-32767 | text string, | Packed | draft-ietf-cbor-packed | 363 | | byte string, | CBOR: | | 364 | | array, map | prefix | | 365 +-------------+--------------+-----------+------------------------+ 366 | 1879048192- | text string, | Packed | draft-ietf-cbor-packed | 367 | 2147483647 | byte string, | CBOR: | | 368 | | array, map | prefix | | 369 +-------------+--------------+-----------+------------------------+ 370 | TBD | text string, | Packed | draft-ietf-cbor-packed | 371 | | byte string, | CBOR: | | 372 | | array, map | suffix | | 373 +-------------+--------------+-----------+------------------------+ 375 Table 3: Values for Tag Numbers 377 In the registry [IANA.cbor-simple-values], IANA is requested to 378 allocate the simple values defined in Table 4. 380 +=======+=====================+========================+ 381 | Value | Semantics | Reference | 382 +=======+=====================+========================+ 383 | 0-15 | Packed CBOR: shared | draft-ietf-cbor-packed | 384 +-------+---------------------+------------------------+ 386 Table 4: Simple Values 388 5. Security Considerations 390 The security considerations of [RFC8949] apply. 392 Loops in the Packed CBOR can be used as a denial of service attack, 393 see Section 2.4. 395 As the unpacking is deterministic, packed forms can be used as 396 signing inputs. (Note that if external dictionaries are added to 397 cbor-packed, this requires additional consideration.) 399 6. References 401 6.1. Normative References 403 [IANA.cbor-simple-values] 404 IANA, "Concise Binary Object Representation (CBOR) Simple 405 Values", 406 . 408 [IANA.cbor-tags] 409 IANA, "Concise Binary Object Representation (CBOR) Tags", 410 . 412 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 413 Requirement Levels", BCP 14, RFC 2119, 414 DOI 10.17487/RFC2119, March 1997, 415 . 417 [RFC7049] Bormann, C. and P. Hoffman, "Concise Binary Object 418 Representation (CBOR)", RFC 7049, DOI 10.17487/RFC7049, 419 October 2013, . 421 [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 422 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, 423 May 2017, . 425 [RFC8610] Birkholz, H., Vigano, C., and C. Bormann, "Concise Data 426 Definition Language (CDDL): A Notational Convention to 427 Express Concise Binary Object Representation (CBOR) and 428 JSON Data Structures", RFC 8610, DOI 10.17487/RFC8610, 429 June 2019, . 431 [RFC8949] Bormann, C. and P. Hoffman, "Concise Binary Object 432 Representation (CBOR)", STD 94, RFC 8949, 433 DOI 10.17487/RFC8949, December 2020, 434 . 436 6.2. Informative References 438 [RFC6920] Farrell, S., Kutscher, D., Dannewitz, C., Ohlman, B., 439 Keranen, A., and P. Hallam-Baker, "Naming Things with 440 Hashes", RFC 6920, DOI 10.17487/RFC6920, April 2013, 441 . 443 [RFC7322] Flanagan, H. and S. Ginoza, "RFC Style Guide", RFC 7322, 444 DOI 10.17487/RFC7322, September 2014, 445 . 447 [RFC8742] Bormann, C., "Concise Binary Object Representation (CBOR) 448 Sequences", RFC 8742, DOI 10.17487/RFC8742, February 2020, 449 . 451 Appendix A. Example 453 The (JSON-compatible) CBOR data structure depicted in Figure 2, 400 454 bytes of binary CBOR, could lead to a packed CBOR data item depicted 455 in Figure 3, ~309 bytes. Note that this particular example does not 456 lend itself to prefix compression. 458 { "store": { 459 "book": [ 460 { "category": "reference", 461 "author": "Nigel Rees", 462 "title": "Sayings of the Century", 463 "price": 8.95 464 }, 465 { "category": "fiction", 466 "author": "Evelyn Waugh", 467 "title": "Sword of Honour", 468 "price": 12.99 469 }, 470 { "category": "fiction", 471 "author": "Herman Melville", 472 "title": "Moby Dick", 473 "isbn": "0-553-21311-3", 474 "price": 8.99 475 }, 476 { "category": "fiction", 477 "author": "J. R. R. Tolkien", 478 "title": "The Lord of the Rings", 479 "isbn": "0-395-19395-8", 480 "price": 22.99 481 } 482 ], 483 "bicycle": { 484 "color": "red", 485 "price": 19.95 486 } 487 } 488 } 490 Figure 2: Example original CBOR data item 492 51(["price", "category", "author", "title", "fiction", 8.95, "isbn"], 493 / 0 1 2 3 4 5 6 / 494 [], [], 495 [{"store": { 496 "book": [ 497 {simple(1): "reference", simple(2): "Nigel Rees", 498 simple(3): "Sayings of the Century", simple(0): simple(5)}, 499 {simple(1): simple(4), simple(2): "Evelyn Waugh", 500 simple(3): "Sword of Honour", simple(0): 12.99}, 501 {simple(1): simple(4), simple(2): "Herman Melville", 502 simple(3): "Moby Dick", simple(6): "0-553-21311-3", 503 simple(0): simple(5)}, 504 {simple(1): simple(4), simple(2): "J. R. R. Tolkien", 505 simple(3): "The Lord of the Rings", 506 simple(6): "0-395-19395-8", simple(0): 22.99}], 507 "bicycle": {"color": "red", simple(0): 19.95}}}]) 509 Figure 3: Example packed CBOR data item 511 TBD: Do this for a W3C Thing Description again to get better packing 512 and to exercise prefix compression... 514 Acknowledgements 516 CBOR packing was originally invented with the rest of CBOR, but did 517 not make it into [RFC7049], the predecessor of [RFC8949]. Various 518 attempts to come up with a specification over the years didn't 519 proceed. In 2017, Sebastian Käbisch proposed investigating compact 520 representations of W3C Thing Descriptions, which prompted the author 521 to come up with essentially the present design. 523 Author's Address 525 Carsten Bormann 526 Universität Bremen TZI 527 Postfach 330440 528 D-28359 Bremen 529 Germany 531 Phone: +49-421-218-63921 532 Email: cabo@tzi.org