Network Working Group                                         C. Bormann
Internet-Draft                                    Universität Bremen TZI
Intended status: Informational                              13 July 2020
Expires: 14 January 2021


                              Packed CBOR
                      draft-bormann-cbor-packed-00

Abstract

   The Concise Binary Object Representation (CBOR, RFC 7049) is a data
   format whose design goals include the possibility of extremely small
   code size, fairly small message size, and extensibility without the
   need for version negotiation.

   CBOR does not provide any form of data compression.  CBOR data
   items, in particular when generated from legacy data models, often
   allow considerable gains in compactness when applying data
   compression.  While traditional data compression techniques such as
   DEFLATE (RFC 1951) work well for CBOR, their disadvantage is that the
   receiver needs to unpack the compressed form to make use of data.

   This specification describes Packed CBOR, a simple transformation of
   a CBOR data item into another CBOR data item that is almost as easy
   to consume as the original CBOR data item.  A separate decompression
   step is therefore often not required at the receiver.

Note to Readers

   This is an individual submission to the CBOR working group of the
   IETF, https://datatracker.ietf.org/wg/cbor/about/.  Discussion
   currently takes place on the GitHub repository
   https://github.com/cabo/cbor-packed.  If the CBOR WG believes this is
   a useful document, discussion is likely to move to the CBOR WG
   mailing list and a GitHub repository at the CBOR WG GitHub
   organization, https://github.com/cbor-wg.

   The current version is true work in progress; some of the sections
   haven't been filled in yet, and in particular, permission has not
   been obtained from tag definition authors to copy over their text.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 14 January 2021.

Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Terminology
   2.  Packed CBOR
     2.1.  Referencing Shared Items
     2.2.  Referencing Prefix Items
   3.  Discussion
   4.  IANA Considerations
   5.  Security Considerations
   6.  References
     6.1.  Normative References
     6.2.  Informative References
   Appendix A.  Example
   Acknowledgements
   Author's Address

1.  Introduction

   (TO DO, expand on text from abstract here; move references here and
   neuter them in the abstract as per Section 4.3 of [RFC7322].)

   The specification defines a transformation from a Packed CBOR data
   item to the original CBOR data item; it does not define an algorithm
   for an actual packer.  Different packers can differ in the amount of
   effort they invest in arriving at a minimal packed form.

   Packed CBOR can employ two kinds of optimization:

   *  structure sharing: substructures (data items) that occur
      repeatedly in the original CBOR data item can be collapsed to a
      simple reference to a common representation of that data item.
      The processing required during consumption is limited to following
      that reference.

   *  prefix sharing: strings that share a prefix can be replaced by a
      reference to a common prefix plus the rest of the string.  The
      processing required during consumption is similar to following the
      prefix reference plus that for an indefinite-length string.

   A specific application protocol that employs Packed CBOR might allow
   both kinds of optimization or limit the representation to structure
   sharing only.

1.1.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

   The definitions of [I-D.ietf-cbor-7049bis] apply.  The term "byte" is
   used in its now customary sense as a synonym for "octet".  Where bit
   arithmetic is explained, this document uses the notation familiar
   from the programming language C (including C++14's 0bnnn binary
   literals), except that the operator "**" stands for exponentiation.

2.  Packed CBOR

   Packed CBOR is defined in CDDL [RFC8610] as in Figure 1:

   Packed-CBOR = #6.6([rump, [*prefix], *shared])
   rump = any
   prefix = any
   shared = any

                       Figure 1: Packed CBOR in CDDL

   (This assumes the allocation of tag number 6, which is motivated
   further below.)

   The original CBOR data item can be reconstructed by recursively
   replacing shared and prefix references encountered in the rump by
   their defined values.

2.1.  Referencing Shared Items

   Shared items are stored in the third through last elements of the
   array used as tag content for tag number 6; these elements are
   numbered starting from 2.

   The shared data items are referenced by using the data items in
   Table 1.  When reconstructing the original data item, such a
   reference is replaced by the referenced data item, which is then
   recursively unpacked.

             +===========================+================+
             | reference                 | element number |
             +===========================+================+
             | Simple value 0-15         | 2-17           |
             +---------------------------+----------------+
             | Tag 6(unsigned integer N) | 18 + 2*N       |
             +---------------------------+----------------+
             | Tag 6(negative integer N) | 18 - 2*N - 1   |
             +---------------------------+----------------+

                    Table 1: Referencing Shared Values

   Taking into account the encoding, there are 16 one-byte references,
   48 two-byte references, 512 three-byte references, 131072 four-byte
   references, etc.  As integers can grow to very large (or small)
   values, there is no practical limit to how many shared items might be
   used in a Packed CBOR item.

2.2.  Referencing Prefix Items

   Prefix items are stored in an array that is the second element of the
   array used as tag content for tag number 6.  This array is indexed
   from 0.

   Prefix data items are referenced by using the data items in Table 2.
   When reconstructing the original data item, such a reference is
   replaced by a string constructed from the referenced prefix data item
   (prefix, which might need to be recursively unpacked first)
   concatenated with the tag content (suffix, again possibly recursively
   unpacked).  The result gets the type of the suffix; this way a single
   prefix can be used to build both byte and text strings, depending on
   what type of suffix is being used.

         +===================================+================+
         | reference                         | element number |
         +===================================+================+
         | Tag 6(suffix)                     | 0              |
         +-----------------------------------+----------------+
         | Tag 224-255(suffix)               | 1-32           |
         +-----------------------------------+----------------+
         | Tag 28672-32767(suffix)           | 33-4128        |
         +-----------------------------------+----------------+
         | Tag 1879048192-2147483647(suffix) | 4129-268439584 |
         +-----------------------------------+----------------+

                    Table 2: Referencing Prefix Values

   Taking into account the encoding, there is one one-byte prefix
   reference, 32 two-byte references, 4096 three-byte references, and
   268435456 five-byte references.  268439585 (2**28+2**12+2**5+2**0) is
   an artificial limit, but should be high enough that there, again, is
   no practical limit to how many prefix items might be used in a Packed
   CBOR item.

3.  Discussion

   This specification uses up a large number of Simple Values and Tags,
   in particular one of the rare one-byte tags and half of the one-byte
   simple values.
   Since the objective is compression, this is warranted
   if and only if there is consensus that this specific format could be
   useful for a wide range of applications, while maintaining reasonable
   simplicity, in particular on the side of the consumer.

   Note that the semantics of Tag 6 depend on its content: An integer
   turns the tag into a shared reference, a string into a prefix
   reference, and an array into a complete Packed CBOR data item.

   A maliciously crafted Packed CBOR data item might contain a reference
   loop.  A consumer/decompressor MUST protect against that.

   The current definition does nothing to help with packing CBOR
   sequences [RFC8742]; maybe it should.

   Nesting packed CBOR data items is not useful; maybe it should be.

4.  IANA Considerations

   In the registry [IANA.cbor-tags], IANA is requested to allocate the
   tags defined in Table 3.

   +===========+========+================+===========================+++
   | Tag       | Data   | Semantics      | Reference                 |||
   |           | Item   |                |                           |||
   +===========+========+================+===========================+++
   | 6         | array, | Packed CBOR:   | draft-bormann-cbor-packed |||
   |           |integer,| packed/shared/ |                           |||
   |           | text   | prefix         |                           |||
   |           |string, |                |                           |||
   |           | byte   |                |                           |||
   |           | string |                |                           |||
   +-----------+--------+----------------+---------------------------+++
   | 224-255   | text   | Packed CBOR:   | draft-bormann-cbor-packed |||
   |           | string | prefix         |                           |||
   |           |or byte |                |                           |||
   |           | string |                |                           |||
   +-----------+--------+----------------+---------------------------+++
   |28672-32767| text   | Packed CBOR:   | draft-bormann-cbor-packed |||
   |           | string | prefix         |                           |||
   |           |or byte |                |                           |||
   |           | string |                |                           |||
   +-----------+--------+----------------+---------------------------+++
   |1879048192-| text   | Packed CBOR:   | draft-bormann-cbor-packed |||
   | 2147483647| string | prefix         |                           |||
   |           |or byte |                |                           |||
   |           | string |                |                           |||
   +-----------+--------+----------------+---------------------------+++

                     Table 3: Values for Tag Numbers

   In the registry [IANA.cbor-simple-values], IANA is requested to
   allocate the simple values defined in Table 4.

        +=======+=====================+===========================+
        | Value | Semantics           | Reference                 |
        +=======+=====================+===========================+
        | 0-15  | Packed CBOR: shared | draft-bormann-cbor-packed |
        +-------+---------------------+---------------------------+

                          Table 4: Simple Values

5.  Security Considerations

   The security considerations of RFC 7049 apply.

   Loops in the Packed CBOR can be used as a denial of service attack,
   see Section 3.

6.  References

6.1.  Normative References

   [I-D.ietf-cbor-7049bis]
              Bormann, C. and P. Hoffman, "Concise Binary Object
              Representation (CBOR)", Work in Progress, Internet-Draft,
              draft-ietf-cbor-7049bis-14, 16 June 2020.

   [IANA.cbor-simple-values]
              IANA, "Concise Binary Object Representation (CBOR) Simple
              Values".

   [IANA.cbor-tags]
              IANA, "Concise Binary Object Representation (CBOR) Tags".

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC7049]  Bormann, C. and P. Hoffman, "Concise Binary Object
              Representation (CBOR)", RFC 7049, DOI 10.17487/RFC7049,
              October 2013.

   [RFC8610]  Birkholz, H., Vigano, C., and C. Bormann, "Concise Data
              Definition Language (CDDL): A Notational Convention to
              Express Concise Binary Object Representation (CBOR) and
              JSON Data Structures", RFC 8610, DOI 10.17487/RFC8610,
              June 2019.

6.2.  Informative References

   [RFC7322]  Flanagan, H. and S. Ginoza, "RFC Style Guide", RFC 7322,
              DOI 10.17487/RFC7322, September 2014.

   [RFC8742]  Bormann, C., "Concise Binary Object Representation (CBOR)
              Sequences", RFC 8742, DOI 10.17487/RFC8742, February 2020.

Appendix A.  Example

   The (JSON-compatible) CBOR data structure depicted in Figure 2, 400
   bytes of binary CBOR, could lead to a packed CBOR data item depicted
   in Figure 3, 307 bytes.  Note that this example does not lend itself
   to prefix compression.

   { "store": {
       "book": [
         { "category": "reference",
           "author": "Nigel Rees",
           "title": "Sayings of the Century",
           "price": 8.95
         },
         { "category": "fiction",
           "author": "Evelyn Waugh",
           "title": "Sword of Honour",
           "price": 12.99
         },
         { "category": "fiction",
           "author": "Herman Melville",
           "title": "Moby Dick",
           "isbn": "0-553-21311-3",
           "price": 8.99
         },
         { "category": "fiction",
           "author": "J. R. R. Tolkien",
           "title": "The Lord of the Rings",
           "isbn": "0-395-19395-8",
           "price": 22.99
         }
       ],
       "bicycle": {
         "color": "red",
         "price": 19.95
       }
     }
   }

                Figure 2: Example original CBOR data item

   6([{"store": {
          "book": [
            {simple(1): "reference", simple(2): "Nigel Rees",
             simple(3): "Sayings of the Century", simple(0): simple(5)},
            {simple(1): simple(4), simple(2): "Evelyn Waugh",
             simple(3): "Sword of Honour", simple(0): 12.99},
            {simple(1): simple(4), simple(2): "Herman Melville",
             simple(3): "Moby Dick", simple(6): "0-553-21311-3",
             simple(0): simple(5)},
            {simple(1): simple(4), simple(2): "J. R. R. Tolkien",
             simple(3): "The Lord of the Rings",
             simple(6): "0-395-19395-8", simple(0): 22.99}],
          "bicycle": {"color": "red", simple(0): 19.95}}},
       [],
       "price", "category", "author", "title", "fiction", 8.95, "isbn"])
   /      0          1          2        3         4        5      6   /

                 Figure 3: Example packed CBOR data item

   TBD: Do this for a W3C Thing Description again to get better packing
   and to exercise prefix compression...
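
   The reconstruction described in Section 2 can be sketched in Python
   as follows.  This is a minimal, non-normative sketch and not a
   reference implementation.  It assumes that a CBOR decoder has already
   produced an in-memory form; the classes Simple and Tag and the
   functions shared_index, prefix_index, and unpack are hypothetical
   names introduced for this illustration only.  The index arithmetic
   follows Table 1 and Table 2, and the set of "active" references is
   one possible way to protect against the reference loops mentioned in
   Section 3.  The sample input is abbreviated from Figure 3.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Simple:
    """Hypothetical stand-in for a decoded CBOR simple value."""
    value: int


@dataclass(frozen=True)
class Tag:
    """Hypothetical stand-in for a decoded CBOR tag (number, content)."""
    number: int
    content: Any


def shared_index(item):
    """Element number of a shared reference per Table 1, else None."""
    if isinstance(item, Simple) and 0 <= item.value <= 15:
        return 2 + item.value
    if (isinstance(item, Tag) and item.number == 6
            and isinstance(item.content, int)
            and not isinstance(item.content, bool)):
        n = item.content
        return 18 + 2 * n if n >= 0 else 18 - 2 * n - 1
    return None


def prefix_index(number):
    """Index into the prefix array per Table 2, else None."""
    if number == 6:
        return 0
    for first, last, base in ((224, 255, 1), (28672, 32767, 33),
                              (1879048192, 2147483647, 4129)):
        if first <= number <= last:
            return base + number - first
    return None


def unpack(packed):
    """Reconstruct the original item from 6([rump, [*prefix], *shared])."""
    c = packed.content if isinstance(packed, Tag) else None
    if not (isinstance(c, list) and packed.number == 6
            and len(c) >= 2 and isinstance(c[1], list)):
        raise ValueError("not a Packed CBOR data item")
    rump, prefixes = c[0], c[1]

    def follow(key, table, index, active):
        # 'active' holds the references currently being followed, so a
        # reference loop is rejected instead of recursing forever.
        if key in active:
            raise ValueError("reference loop in Packed CBOR")
        if not 0 <= index < len(table):
            raise ValueError("dangling reference")
        return expand(table[index], active | {key})

    def expand(item, active):
        idx = shared_index(item)
        if idx is not None:
            return follow(("shared", idx), c, idx, active)
        if isinstance(item, Tag):
            p = prefix_index(item.number)
            if p is None:  # unrelated tag: only unpack its content
                return Tag(item.number, expand(item.content, active))
            suffix = expand(item.content, active)
            prefix = follow(("prefix", p), prefixes, p, active)
            if not (isinstance(suffix, (str, bytes))
                    and isinstance(prefix, (str, bytes))):
                raise ValueError("prefix and suffix must be strings")
            # The result gets the type of the suffix.
            if isinstance(suffix, str) and isinstance(prefix, bytes):
                prefix = prefix.decode("utf-8")
            elif isinstance(suffix, bytes) and isinstance(prefix, str):
                prefix = prefix.encode("utf-8")
            return prefix + suffix
        if isinstance(item, list):
            return [expand(x, active) for x in item]
        if isinstance(item, dict):
            return {expand(k, active): expand(v, active)
                    for k, v in item.items()}
        return item

    return expand(rump, frozenset())


# Abbreviated from Figure 3: one book entry and the bicycle.
example = Tag(6, [
    {"book": [{Simple(1): Simple(4), Simple(0): 12.99}],
     "bicycle": {"color": "red", Simple(0): 19.95}},
    [],
    "price", "category", "author", "title", "fiction", 8.95, "isbn",
])
print(unpack(example))
```

   In this sketch, a consumer that only needs a few values would not
   have to call unpack on the whole rump; it could instead resolve
   individual references on demand with shared_index and prefix_index,
   which is the property that distinguishes Packed CBOR from a separate
   decompression step.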

Acknowledgements

   CBOR packing was originally invented with the rest of CBOR, but did
   not make it into [RFC7049].  Various attempts to come up with a
   specification over the years didn't proceed.  In 2017, Sebastian
   Käbisch proposed investigating compact representations of W3C Thing
   Descriptions, which prompted the author to come up with essentially
   the present design.

Author's Address

   Carsten Bormann
   Universität Bremen TZI
   Postfach 330440
   D-28359 Bremen
   Germany

   Phone: +49-421-218-63921
   Email: cabo@tzi.org