Network Working Group                                         C. Bormann
Internet-Draft                                    Universität Bremen TZI
Intended status: Informational                              26 July 2020
Expires: 27 January 2021


                              Packed CBOR
                      draft-bormann-cbor-packed-01

Abstract

   The Concise Binary Object Representation (CBOR, RFC 7049) is a data
   format whose design goals include the possibility of extremely small
   code size, fairly small message size, and extensibility without the
   need for version negotiation.

   CBOR does not provide any form of data compression.  CBOR data
   items, in particular when generated from legacy data models, often
   allow considerable gains in compactness when applying data
   compression.  While traditional data compression techniques such as
   DEFLATE (RFC 1951) work well for CBOR, their disadvantage is that the
   receiver needs to unpack the compressed form to make use of the data.

   This specification describes Packed CBOR, a simple transformation of
   a CBOR data item into another CBOR data item that is almost as easy
   to consume as the original CBOR data item.  A separate decompression
   step is therefore often not required at the receiver.

Note to Readers

   This is an individual submission to the CBOR working group of the
   IETF, https://datatracker.ietf.org/wg/cbor/about/.  Discussion
   currently takes place on the GitHub repository
   https://github.com/cabo/cbor-packed.  If the CBOR WG believes this is
   a useful document, discussion is likely to move to the CBOR WG
   mailing list and a GitHub repository at the CBOR WG GitHub
   organization, https://github.com/cbor-wg.

   The current version is true work in progress; some of the sections
   haven't been filled in yet, and in particular, permission has not
   been obtained from tag definition authors to copy over their text.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 27 January 2021.

Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Terminology
   2.  Packed CBOR
     2.1.  Referencing Shared Items
     2.2.  Referencing Prefix Items
   3.  Discussion
   4.  IANA Considerations
   5.  Security Considerations
   6.  References
     6.1.  Normative References
     6.2.  Informative References
   Appendix A.  Example
   Acknowledgements
   Author's Address

1.  Introduction

   (TO DO, expand on text from abstract here; move references here and
   neuter them in the abstract as per Section 4.3 of [RFC7322].)

   The specification defines a transformation from a Packed CBOR data
   item to the original CBOR data item; it does not define an algorithm
   for an actual packer.  Different packers can differ in the amount of
   effort they invest in arriving at a minimal packed form.

   Packed CBOR can employ two kinds of optimization:

   *  structure sharing: substructures (data items) that occur
      repeatedly in the original CBOR data item can be collapsed to a
      simple reference to a common representation of that data item.
      The processing required during consumption is limited to following
      that reference.

   *  prefix sharing: strings that share a prefix can be replaced by a
      reference to a common prefix plus the rest of the string.  The
      processing required during consumption is similar to following the
      prefix reference plus that for an indefinite-length string.

   A specific application protocol that employs Packed CBOR might allow
   both kinds of optimization or limit the representation to structure
   sharing only.

1.1.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

   The definitions of [I-D.ietf-cbor-7049bis] apply.  The term "byte" is
   used in its now customary sense as a synonym for "octet".
   Where bit arithmetic is explained, this document uses the notation
   familiar from the programming language C (including C++14's 0bnnn
   binary literals), except that, in the plain text form of this
   document, the operator "^" stands for exponentiation.

2.  Packed CBOR

   Packed CBOR is defined in CDDL [RFC8610] as in Figure 1:

   Packed-CBOR = #6.6([rump, [*prefix], *shared])
   rump = any
   prefix = any
   shared = any

                     Figure 1: Packed CBOR in CDDL

   (This assumes the allocation of tag number 6, which is motivated
   further below.  Note that the semantics of Tag 6 depend on its
   content: An integer turns the tag into a shared reference, a string
   into a prefix reference, and an array into a complete Packed CBOR
   data item as described above.)

   The original CBOR data item can be reconstructed by recursively
   replacing shared and prefix references encountered in the rump by
   their defined values.

2.1.  Referencing Shared Items

   Shared items are stored in the third through last elements of the
   array used as tag content for tag number 6; these elements are
   numbered starting at 2.

   The shared data items are referenced by using the data items in
   Table 1.  When reconstructing the original data item, such a
   reference is replaced by the referenced data item, which is then
   recursively unpacked.

   +===========================+================+
   | reference                 | element number |
   +===========================+================+
   | Simple value 0-15         | 2-17           |
   +---------------------------+----------------+
   | Tag 6(unsigned integer N) | 18 + 2*N       |
   +---------------------------+----------------+
   | Tag 6(negative integer N) | 18 - 2*N - 1   |
   +---------------------------+----------------+

                  Table 1: Referencing Shared Values

   Taking into account the encoding, there are 16 one-byte references,
   48 two-byte references, 512 three-byte references, 131072 four-byte
   references, etc.
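
   As an illustration only, the element numbering of Table 1 can be
   sketched in Python.  The function name and the "simple" / "tag6"
   labels used to describe an already decoded reference are assumptions
   made for this sketch; they are not defined by this document and do
   not correspond to any particular CBOR library.

```python
def shared_element_number(kind, n):
    """Return the element number (Table 1) that a shared reference selects.

    kind -- "simple" for a simple value, "tag6" for tag 6 with integer
            content (hypothetical labels used only in this sketch)
    n    -- the simple value number, or the integer inside tag 6
    """
    if kind == "simple":
        if not 0 <= n <= 15:
            raise ValueError("only simple values 0-15 are shared references")
        return 2 + n                 # elements 2..17
    if kind == "tag6":
        if n >= 0:
            return 18 + 2 * n        # unsigned: 18, 20, 22, ...
        return 18 - 2 * n - 1        # negative: -1 -> 19, -2 -> 21, ...
    raise ValueError("not a shared reference")


# Integers that fit into the initial byte (0..23 and -1..-24) yield the
# two-byte references; together they cover elements 18..65 without gaps.
two_byte = sorted(shared_element_number("tag6", n) for n in range(-24, 24))
```

   With this numbering, simple(0) selects element 2, the first shared
   item, and unsigned and negative integers interleave so that every
   element number from 18 upwards is reachable.
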
   As integers can grow to very large (or small) values, there is no
   practical limit to how many shared items might be used in a Packed
   CBOR item.

2.2.  Referencing Prefix Items

   Prefix items are stored in an array that is the second element of the
   array used as tag content for tag number 6.  This array is indexed
   from 0.

   Prefix data items are referenced by using the data items in Table 2.
   When reconstructing the original data item, such a reference is
   replaced by a string constructed from the referenced prefix data item
   (prefix, which might need to be recursively unpacked first)
   concatenated with the tag content (suffix, again possibly recursively
   unpacked).  The result gets the type of the suffix; this way a single
   prefix can be used to build both byte and text strings, depending on
   what type of suffix is being used.

   +===================================+================+
   | reference                         | element number |
   +===================================+================+
   | Tag 6(suffix)                     | 0              |
   +-----------------------------------+----------------+
   | Tag 224-255(suffix)               | 1-32           |
   +-----------------------------------+----------------+
   | Tag 28672-32767(suffix)           | 33-4128        |
   +-----------------------------------+----------------+
   | Tag 1879048192-2147483647(suffix) | 4129-268439584 |
   +-----------------------------------+----------------+

                  Table 2: Referencing Prefix Values

   Taking into account the encoding, there is one one-byte prefix
   reference, 32 two-byte references, 4096 three-byte references, and
   268435456 five-byte references.  268439585 (2^28 + 2^12 + 2^5 + 2^0)
   is an artificial limit, but should be high enough that there, again,
   is no practical limit to how many prefix items might be used in a
   Packed CBOR item.

3.  Discussion

   This specification uses up a large number of Simple Values and Tags,
   in particular one of the rare one-byte tags and half of the one-byte
   simple values.  Since the objective is compression, this is warranted
   if and only if there is consensus that this specific format could be
   useful for a wide range of applications, while maintaining reasonable
   simplicity in particular at the side of the consumer.

   A maliciously crafted Packed CBOR data item might contain a reference
   loop.  A consumer/decompressor MUST protect against that.

   The current definition does nothing to help with packing CBOR
   sequences [RFC8742]; maybe it should.

   Nesting packed CBOR data items is not useful; maybe it should be.

4.  IANA Considerations

   In the registry [IANA.cbor-tags], IANA is requested to allocate the
   tags defined in Table 3.

   +===========+==========+================+===========================+
   | Tag       | Data     | Semantics      | Reference                 |
   |           | Item     |                |                           |
   +===========+==========+================+===========================+
   | 6         | array,   | Packed CBOR:   | draft-bormann-cbor-packed |
   |           | integer, | packed/shared/ |                           |
   |           | text     | prefix         |                           |
   |           | string,  |                |                           |
   |           | byte     |                |                           |
   |           | string   |                |                           |
   +-----------+----------+----------------+---------------------------+
   | 224-255   | text     | Packed CBOR:   | draft-bormann-cbor-packed |
   |           | string   | prefix         |                           |
   |           | or byte  |                |                           |
   |           | string   |                |                           |
   +-----------+----------+----------------+---------------------------+
   |28672-32767| text     | Packed CBOR:   | draft-bormann-cbor-packed |
   |           | string   | prefix         |                           |
   |           | or byte  |                |                           |
   |           | string   |                |                           |
   +-----------+----------+----------------+---------------------------+
   |1879048192-| text     | Packed CBOR:   | draft-bormann-cbor-packed |
   | 2147483647| string   | prefix         |                           |
   |           | or byte  |                |                           |
   |           | string   |                |                           |
   +-----------+----------+----------------+---------------------------+

                   Table 3: Values for Tag Numbers

   In the registry [IANA.cbor-simple-values], IANA is requested to
   allocate the simple values defined in Table 4.

   +=======+=====================+===========================+
   | Value | Semantics           | Reference                 |
   +=======+=====================+===========================+
   | 0-15  | Packed CBOR: shared | draft-bormann-cbor-packed |
   +-------+---------------------+---------------------------+

                        Table 4: Simple Values

5.  Security Considerations

   The security considerations of RFC 7049 apply.

   Loops in the Packed CBOR can be used as a denial of service attack,
   see Section 3.

   As the unpacking is deterministic, packed forms can be used as
   signing inputs.  (Note that if external dictionaries are added to
   cbor-packed, this requires additional consideration.)

6.  References

6.1.  Normative References

   [I-D.ietf-cbor-7049bis]
              Bormann, C. and P. Hoffman, "Concise Binary Object
              Representation (CBOR)", Work in Progress, Internet-Draft,
              draft-ietf-cbor-7049bis-14, 16 June 2020.

   [IANA.cbor-simple-values]
              IANA, "Concise Binary Object Representation (CBOR) Simple
              Values".

   [IANA.cbor-tags]
              IANA, "Concise Binary Object Representation (CBOR) Tags".

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC7049]  Bormann, C. and P. Hoffman, "Concise Binary Object
              Representation (CBOR)", RFC 7049, DOI 10.17487/RFC7049,
              October 2013.

   [RFC8610]  Birkholz, H., Vigano, C., and C. Bormann, "Concise Data
              Definition Language (CDDL): A Notational Convention to
              Express Concise Binary Object Representation (CBOR) and
              JSON Data Structures", RFC 8610, DOI 10.17487/RFC8610,
              June 2019.

6.2.  Informative References

   [RFC7322]  Flanagan, H. and S.
              Ginoza, "RFC Style Guide", RFC 7322,
              DOI 10.17487/RFC7322, September 2014.

   [RFC8742]  Bormann, C., "Concise Binary Object Representation (CBOR)
              Sequences", RFC 8742, DOI 10.17487/RFC8742, February 2020.

Appendix A.  Example

   The (JSON-compatible) CBOR data structure depicted in Figure 2, 400
   bytes of binary CBOR, could lead to a packed CBOR data item depicted
   in Figure 3, 307 bytes.  Note that this example does not lend itself
   to prefix compression.

   { "store": {
       "book": [
         { "category": "reference",
           "author": "Nigel Rees",
           "title": "Sayings of the Century",
           "price": 8.95
         },
         { "category": "fiction",
           "author": "Evelyn Waugh",
           "title": "Sword of Honour",
           "price": 12.99
         },
         { "category": "fiction",
           "author": "Herman Melville",
           "title": "Moby Dick",
           "isbn": "0-553-21311-3",
           "price": 8.99
         },
         { "category": "fiction",
           "author": "J. R. R. Tolkien",
           "title": "The Lord of the Rings",
           "isbn": "0-395-19395-8",
           "price": 22.99
         }
       ],
       "bicycle": {
         "color": "red",
         "price": 19.95
       }
     }
   }

               Figure 2: Example original CBOR data item

   6([{"store": {
          "book": [
            {simple(1): "reference", simple(2): "Nigel Rees",
             simple(3): "Sayings of the Century", simple(0): simple(5)},
            {simple(1): simple(4), simple(2): "Evelyn Waugh",
             simple(3): "Sword of Honour", simple(0): 12.99},
            {simple(1): simple(4), simple(2): "Herman Melville",
             simple(3): "Moby Dick", simple(6): "0-553-21311-3",
             simple(0): simple(5)},
            {simple(1): simple(4), simple(2): "J. R. R. Tolkien",
             simple(3): "The Lord of the Rings",
             simple(6): "0-395-19395-8", simple(0): 22.99}],
          "bicycle": {"color": "red", simple(0): 19.95}}},
       [],
       "price", "category", "author", "title", "fiction", 8.95, "isbn"])
   /    0        1           2         3        4         5      6   /

                Figure 3: Example packed CBOR data item

   TBD: Do this for a W3C Thing Description again to get better packing
   and to exercise prefix compression...

Acknowledgements

   CBOR packing was originally invented with the rest of CBOR, but did
   not make it into [RFC7049].  Various attempts to come up with a
   specification over the years didn't proceed.  In 2017, Sebastian
   Käbisch proposed investigating compact representations of W3C Thing
   Descriptions, which prompted the author to come up with essentially
   the present design.

Author's Address

   Carsten Bormann
   Universität Bremen TZI
   Postfach 330440
   D-28359 Bremen
   Germany

   Phone: +49-421-218-63921
   Email: cabo@tzi.org