CBOR Working Group                                         M. Richardson
Internet-Draft                                  Sandelman Software Works
Intended status: Best Current Practice                     21 April 2021
Expires: 23 October 2021


            On storing CBOR encoded items on stable storage
                     draft-ietf-cbor-file-magic-01

Abstract

   This document proposes an on-disk format for CBOR objects that is
   friendly to common on-disk recognition systems like the Unix file(1)
   command.

   This document is being discussed at:
   https://github.com/cbor-wg/cbor-magic-number

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 23 October 2021.

Copyright Notice

   Copyright (c) 2021 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Requirements for a Magic Number
   3.  Protocol
     3.1.  The CBOR Protocol Specific Tag
     3.2.  CBOR Tag Wrapped
     3.3.  CBOR Tag Sequence
   4.  Security Considerations
   5.  IANA Considerations
     5.1.  CBOR Sequence Tag
   6.  Acknowledgements
   7.  Changelog
   8.  References
     8.1.  Normative References
     8.2.  Informative References
   Appendix A.  Example from Openswan
   Contributors
   Author's Address

1.  Introduction

   Since very early in computing, operating systems have sought ways to
   mark which files could be processed by which programs.

   For instance, the Unix file(1) command, which has existed since 1973
   [file], has been able to identify many file formats for decades.
   Many systems (Linux, MacOS, Windows) will select the correct
   application based upon the file contents, if the system cannot
   determine it by other means: for instance, the classic MacOS
   maintained a resource fork that includes media type ("MIME type")
   information and therefore ideally never needs to know anything
   about the file.  Other systems do this by file extensions.

   While having a media type associated with the file is a better
   solution in general, when files become disconnected from their type
   information, such as when attempting to do forensics on a damaged
   system, being able to identify a file type can become very
   important.

   Note that the media type registration template asks for a magic
   number, if one is available, as well as a file extension.

   A challenge for the file(1) program is that it is often confused by
   the encoding as opposed to the content.  For instance, an Android
   "apk" used to transfer and store an application may be identified as
   a ZIP file.  Both OpenOffice and MSOffice files are ZIP files of XML
   files (unless the OpenOffice files are flat (fodp) files, in which
   case they may appear to be generic XML files).

   As CBOR becomes a more and more common encoding for a wide variety of
   artifacts, identifying them as just "CBOR" is probably not
   sufficient.  This document provides a way to encode a magic number
   into the beginning of a CBOR format file.  Two options are presented;
   typically, a CBOR Protocol author will specify one of them.

   A CBOR Protocol is a specification which uses CBOR as its encoding.
   Examples of CBOR Protocols currently under development include CoSWID
   [I-D.ietf-sacm-coswid] and EAT [I-D.ietf-rats-eat].  COSE itself
   [RFC8152] is considered infrastructure; however, the encoding of
   public keys in CBOR as described in
   [I-D.mattsson-cose-cbor-cert-compress] would be an identified CBOR
   Protocol.

   A major inspiration for this document is observing the mess in ASN.1
   based systems, where most files are PEM encoded and identified by the
   extension "pem", which confuses public keys, private keys,
   certificate requests and S/MIME content.

   These proposals are invasive to how CBOR protocols are written to
   disk, but in both cases, the proposed envelope does not require that
   the tag be transferred on the wire.

   In addition to the on-disk identification aspects, there are some
   protocols which may benefit from having such a magic number on the
   wire if they are presently using a different (legacy) encoding
   scheme.  The presence or absence of the identifiable magic sequence
   signals whether CBOR or the legacy scheme is being used.

2.  Requirements for a Magic Number

   A magic number is ideally a unique fingerprint, present in the first
   4 or 8 bytes of the file, which does not change when the contents
   change, and does not depend upon the length of the file.

   Less ideal solutions have a pattern that needs to be matched, but in
   which some bytes need to be ignored.  While the Unix file(1) command
   can be told to ignore bytes, this can lead to ambiguities.

3.  Protocol

   There are two variations of this practice.  Both use CBOR Tags in a
   way that results in a deterministic first 8 to 12 bytes.

3.1.  The CBOR Protocol Specific Tag

   CBOR Protocol designers should obtain a tag for each kind of object
   that they might store on disk.  As there are more than 4 billion
   available 4-byte tags, there should be little issue in allocating a
   few to each available CBOR Protocol.

   The policy is First Come First Served, so all that is required is an
   email to IANA, having filled in the small template provided in
   section 9.2 of [RFC8949].

   This tag should be allocated by the author of the CBOR Protocol.  So
   that it occupies all four bytes of the four-byte encoding, with no
   leading zero byte, its value should be at least 0x01000000 (decimal
   16777216).

   The use of a sequence of four US-ASCII codes which are mnemonic to
   the protocol is encouraged, but not required.

3.2.  CBOR Tag Wrapped

   This proposal starts with the Self-described CBOR tag, 55799, as
   described in [RFC8949] section 3.4.6.

   A second CBOR Tag is then allocated to describe the specific Protocol
   involved, as described above.

   This proposal wraps the CBOR value as tags usually do.
   Applications that need to send the CBOR value across a constrained
   link may wish to remove the two tags if the use is implicitly
   understood.  This is a decision of the CBOR Protocol specification.

3.3.  CBOR Tag Sequence

   This proposal makes use of CBOR Sequences as described in [RFC8742].

   This proposal consists of two tags and a constant string for a total
   of 12 bytes.

   1.  The file shall start with the Self-described CBOR Sequence tag,
       55800.

   2.  The file shall continue with a CBOR tag, from the First Come
       First Served space, which uniquely identifies the CBOR Protocol.
       The use of a four-byte tag is encouraged.

   3.  The file shall continue with the three-byte CBOR byte string
       containing 0x42_4f_52 ("BOR").  Together with its one-byte head
       0x43 ("C"), the encoded form shows up as "CBOR".

   The first part identifies the file as being CBOR, and does so with
   all the desirable properties explained in [RFC8949] section 3.4.6.
   Specifically, it does not seem to conflict with any known file types,
   and it is not valid Unicode in any Unicode encoding.

   The second part identifies which CBOR Protocol is used, as described
   above.

   The third part is the constant value 0x43_42_4f_52, which reads as
   "CBOR" in US-ASCII.  That is, it is the byte string head 0x43
   followed by the three-byte sequence 0x42_4f_52 ("BOR").  This is the
   data item that is tagged.

   The actual CBOR Protocol value then follows as the next data item(s)
   in the CBOR sequence, without a need for any further specific tag.
   The use of a CBOR Sequence allows the application to trivially remove
   the first item with the two tags.

   Because of the constant, should a file be reviewed by a human
   (directly in an editor, or in a hexdump display), it will include
   the string "CBOR" prominently.  This value is also included simply
   because the two tags need to tag something.

4.  Security Considerations

   This document provides a way to identify CBOR Protocol objects.
   Clearly identifying CBOR contents on disk may have a variety of
   impacts.

   The most obvious is that it may allow malware to identify interesting
   objects on disk, and then corrupt them.

5.  IANA Considerations

   There are no IANA actions.  This section documents the allocation
   that was done.

5.1.  CBOR Sequence Tag

   IANA has allocated tag 55800 as the CBOR Sequence tag.  This tag is
   from the First Come/First Served area.

   The value has been picked to have properties similar to the 55799
   tag.

   The hexadecimal representation is: 0xd9_d9_f8.

   This is not valid UTF-8: the first 0xd9 is the lead byte of a
   two-byte UTF-8 sequence, but the 0xd9 that follows is not a valid
   continuation byte in UTF-8.

   This is not valid UTF-16: the byte sequence 0xd9d9 (in either endian
   order) puts this value into the UTF-16 high-half zone, which would
   signal that this is a 32-bit Unicode value.  However, the following
   16-bit big-endian value 0xf8.. is not a valid second sequence
   according to [RFC2781].  On a little-endian system, it would be
   necessary to examine the fourth byte to determine if it is valid.
   That next byte is determined by the subsequent encoding, and
   [RFC8949] section 3.4.6 has already determined that no valid CBOR
   encodings result in valid UTF-16.

   Data Item:  byte string
   Semantics:  indicates that the file contains CBOR Sequences

6.  Acknowledgements

   The CBOR WG brainstormed this protocol on January 20, 2021.

7.  Changelog

8.  References

8.1.  Normative References

   [BCP14]    Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017.

   [RFC8742]  Bormann, C., "Concise Binary Object Representation (CBOR)
              Sequences", RFC 8742, DOI 10.17487/RFC8742, February 2020.

   [RFC8949]  Bormann, C. and P. Hoffman, "Concise Binary Object
              Representation (CBOR)", STD 94, RFC 8949,
              DOI 10.17487/RFC8949, December 2020.

8.2.  Informative References

   [file]     Wikipedia, "file (command)", 20 January 2021.

   [I-D.ietf-rats-eat]
              Mandyam, G., Lundblade, L., Ballesteros, M., and J.
              O'Donoghue, "The Entity Attestation Token (EAT)", Work in
              Progress, Internet-Draft, draft-ietf-rats-eat-09, 7 March
              2021.

   [I-D.ietf-sacm-coswid]
              Birkholz, H., Fitzgerald-McKay, J., Schmidt, C., and D.
              Waltermire, "Concise Software Identification Tags", Work
              in Progress, Internet-Draft, draft-ietf-sacm-coswid-17,
              22 February 2021.

   [I-D.mattsson-cose-cbor-cert-compress]
              Raza, S., Höglund, J., Selander, G., Mattsson, J. P., and
              M. Furuhed, "CBOR Encoded X.509 Certificates (C509
              Certificates)", Work in Progress, Internet-Draft,
              draft-mattsson-cose-cbor-cert-compress-08, 22 February
              2021.

   [ilbm]     Wikipedia, "Interleaved BitMap", 20 January 2021.

   [RFC2781]  Hoffman, P. and F. Yergeau, "UTF-16, an encoding of ISO
              10646", RFC 2781, DOI 10.17487/RFC2781, February 2000.

   [RFC8152]  Schaad, J., "CBOR Object Signing and Encryption (COSE)",
              RFC 8152, DOI 10.17487/RFC8152, July 2017.

Appendix A.  Example from Openswan

   The Openswan IPsec project has a daemon ("pluto"), and two control
   programs ("addconn" and "whack").  They communicate via a
   Unix-domain socket, over which a C-structure containing pointers to
   strings is serialized using a bespoke mechanism.  This is normally
   not a problem as the structure is compiled by the same compiler, but
   when there are upgrades it is possible for the daemon and the control
   programs to get out of sync with respect to the bespoke
   serialization.  As a result, there are extra compensations to deal
   with shutting the daemon down.  During testing it is sometimes the
   case that upgrades are backed out.

   In addition, when doing unit testing, the easiest way to load policy
   is to use the normal policy reading process, but that is not normally
   loaded in the daemon.  Instead, the IPC that is normally sent across
   the wire is compiled/serialized and placed in a file.  The above
   magic number is included in the file, and also on the IPC, in order
   to distinguish the "shutdown" command CBOR operation.

   In order to reduce the problems due to serialization, the
   serialization is being changed to CBOR.  Additionally, this change
   allows the IPC to be described by CDDL, and allows any language that
   can encode CBOR to be used.

   IANA has allocated the tag 1330664270, or 0x4f_50_53_4e, for this
   purpose.  As a result, each file and each IPC is prefixed with:

   In diagnostic notation:

      55800(1330664270(h'424F52'))

   Or in hex:

      00000000  d9 d9 f8 da 4f 50 53 4e  43 42 4f 52  |....OPSNCBOR|

Contributors

   Carsten Bormann

   Email: cabo@tzi.org

   Josef 'Jeff' Sipek

   Email: jeffpc@josefsipek.net

Author's Address

   Michael Richardson
   Sandelman Software Works

   Email: mcr+ietf@sandelman.ca
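
Illustrative sketch (not part of the specification)

   The 12-byte prefix shown in Appendix A can be reproduced with a few
   lines of Python.  This is only a minimal sketch under stated
   assumptions: the helper names encode_tag_head, magic_prefix and
   strip_prefix are hypothetical and do not come from this document or
   from any CBOR library, and only the standard library is used.  The
   tag values 55800 and 1330664270 are the ones given in Section 5.1
   and Appendix A.

```python
import struct

SEQUENCE_TAG = 55800        # CBOR Sequence tag from Section 5.1
OPENSWAN_TAG = 1330664270   # 0x4f50534e ("OPSN"), from Appendix A


def encode_tag_head(tag: int) -> bytes:
    """Encode the head of a CBOR tag (major type 6) in its shortest form.

    Hypothetical helper for illustration only.
    """
    if tag < 0:
        raise ValueError("tag numbers are unsigned")
    if tag < 24:
        return bytes([0xC0 | tag])
    if tag < 0x100:
        return bytes([0xD8, tag])
    if tag < 0x10000:
        return b"\xd9" + struct.pack(">H", tag)
    if tag < 0x100000000:
        return b"\xda" + struct.pack(">I", tag)
    return b"\xdb" + struct.pack(">Q", tag)


def magic_prefix(protocol_tag: int) -> bytes:
    """Build the Section 3.3 prefix: 55800(protocol_tag(h'424F52'))."""
    # 0x43 is the head of a 3-byte byte string (major type 2, length 3).
    constant = b"\x43" + b"BOR"
    return encode_tag_head(SEQUENCE_TAG) + encode_tag_head(protocol_tag) + constant


def strip_prefix(data: bytes, protocol_tag: int) -> bytes:
    """Return the rest of the CBOR sequence after the magic prefix."""
    prefix = magic_prefix(protocol_tag)
    if not data.startswith(prefix):
        raise ValueError("magic prefix not found")
    return data[len(prefix):]


prefix = magic_prefix(OPENSWAN_TAG)
print(prefix.hex(" "))  # d9 d9 f8 da 4f 50 53 4e 43 42 4f 52
```

   Because the prefix is a complete first item of a CBOR Sequence, a
   reader in this sketch can drop it by slicing off a fixed number of
   bytes, without parsing the protocol value that follows.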