anima Working Group                                        M. Richardson
Internet-Draft                                  Sandelman Software Works
Intended status: Standards Track                         21 January 2021
Expires: 25 July 2021


            On storing CBOR encoded items on stable storage
                  draft-richardson-cbor-file-magic-01

Abstract

   This document proposes an on-disk format for CBOR objects that is
   friendly to common on-disk recognition systems like the Unix file(1)
   command.

   This document is being discussed at:
   https://github.com/mcr/cbor-magic-number

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 25 July 2021.

Copyright Notice

   Copyright (c) 2021 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Requirements for a Magic Number
   3.  Protocol Proposal
   4.  Security Considerations
   5.  IANA Considerations
   6.  Acknowledgements
   7.  Changelog
   8.  References
     8.1.  Normative References
     8.2.  Informative References
   Contributors
   Author's Address

1.  Introduction

   Since very early in computing, operating systems have sought ways to
   mark which files could be processed by which programs.

   For instance, the Unix file(1) command, which has existed since 1973
   ([file]), has been able to identify many file formats for decades.
   Many systems (Linux, MacOS, Windows) will select the correct
   application based upon the file contents, if the system cannot
   determine it by other means: for instance, MacOS maintains a
   resource fork that includes MIME information and therefore ideally
   never needs to know anything about the file itself.  Other systems
   do this by file extensions.

   While having a MIME type associated with the file is a better
   solution in general, when files become disconnected from their type
   information, such as when attempting to do forensics on a damaged
   system, being able to identify a file type can become very
   important.

   Note that the MIME type registration asks for a magic number, if
   available, as well as a file extension.

   A frequent challenge for the file(1) program is that it can confuse
   the encoding with the content.  For instance, an Android "apk" used
   to transfer and store an application may be identified as a ZIP file.
   Both OpenOffice and MSOffice files contain XML, but appear as ZIP
   files unless they are flat files, in which case they appear to be
   generic XML files.

   As CBOR becomes a more and more common encoding for a wide variety of
   artifacts, identifying them merely as CBOR is probably not useful.
   This document provides a way to encode a magic number into the
   beginning of a CBOR format file.  Two options are presented, with
   the intention of standardizing only one.

   These proposals are invasive to how CBOR protocols are written to
   disk, but in both cases, the proposed envelope does not require that
   the tag be transferred on the wire.

   In addition to the on-disk identification aspects, there are some
   protocols which may benefit from having such a magic number on the
   wire if they are presently using a different (legacy) encoding
   scheme.  The presence or absence of the identifiable magic sequence
   signals whether CBOR or a legacy scheme is being used.

2.  Requirements for a Magic Number

   A magic number is ideally a unique fingerprint, present in the first
   4 or 8 bytes of the file, which does not change when the content
   changes, and does not depend upon the length of the file.

   Less ideal solutions have a pattern that needs to be matched, but in
   which some bytes need to be ignored.  While the Unix file(1) command
   can be told to ignore bytes, this can lead to ambiguities.

3.  Protocol Proposal

   This proposal makes use of CBOR Sequences as described in [RFC8742].

   This proposal consists of two tags and a constant string for a total
   of 12 bytes.

   1.  The file shall start with the Self-described CBOR tag, 55799, as
       described in [RFC8949] section 3.4.6.

   2.  The file shall continue with a CBOR tag, from the First Come
       First Served space, which uniquely identifies the CBOR Protocol.
       The use of a four-byte tag is encouraged.

   3.  The three-byte CBOR byte string containing 0x42_4F_52.  When
       encoded it shows up as "CBOR".

   The first part identifies the file as being CBOR, and does so with
   all the desirable properties explained in [RFC8949] section 3.4.6.
   Specifically, it does not seem to conflict with any known file types,
   and it is not valid Unicode.
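
   The three-part prefix listed above can be illustrated with a minimal
   sketch.  It is not part of the proposal.  It assumes that the
   protocol tag lies in the four-byte range (65536 to 4294967295) and
   that the third element is a CBOR byte string of length 3 (initial
   byte 0x43), which is what yields the 12-byte total.  The tag value
   0x12345678 and the function names are hypothetical, chosen only for
   illustration; the tag is not an IANA allocation.

```python
import struct

# Tag 55799 (Self-described CBOR), encoded as a tag with a 2-byte argument.
SELF_DESCRIBED_CBOR_TAG = bytes.fromhex("d9d9f7")

# Byte string of length 3 holding 0x42 0x4F 0x52; the head byte is 0x43 ("C").
BOR_BYTE_STRING = bytes.fromhex("43424f52")

# Hypothetical protocol tag for illustration only; not registered with IANA.
EXAMPLE_PROTOCOL_TAG = 0x12345678


def encode_four_byte_tag(tag: int) -> bytes:
    """Encode a CBOR tag head using the 4-byte argument form (0xDA)."""
    if not 0x10000 <= tag <= 0xFFFFFFFF:
        raise ValueError("tag is outside the four-byte tag range")
    return b"\xda" + struct.pack(">I", tag)


def build_prefix(protocol_tag: int) -> bytes:
    """Return the 12-byte prefix: tag 55799, protocol tag, then 'BOR'."""
    return (SELF_DESCRIBED_CBOR_TAG
            + encode_four_byte_tag(protocol_tag)
            + BOR_BYTE_STRING)


def parse_prefix(data: bytes):
    """Return the protocol tag if data starts with the prefix, else None."""
    if len(data) < 12:
        return None
    if data[:3] != SELF_DESCRIBED_CBOR_TAG or data[3] != 0xDA:
        return None
    if data[8:12] != BOR_BYTE_STRING:
        return None
    return struct.unpack(">I", data[4:8])[0]


if __name__ == "__main__":
    prefix = build_prefix(EXAMPLE_PROTOCOL_TAG)
    print(prefix.hex())
    print(parse_prefix(prefix))
```

   In this sketch the only bytes that vary between protocols are bytes
   4 through 7 (counting from zero), so a recognizer can match the
   fixed bytes around them and read the protocol tag from the middle.
   The protocol's own data items would follow the prefix as further
   elements of the CBOR Sequence.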

   The second part identifies which CBOR Protocol is used.  CBOR
   Protocol designers should obtain a tag for each major object that
   they might store on disk.  As there are more than 4 billion available
   4-byte tags, there should be no issue in allocating a few to all
   available CBOR Protocols.  The policy is First Come First Served, so
   all that is required is an email to IANA, having filled in the small
   template provided in section 9.2 of [RFC8949].

   The third part is a constant value 0x43_42_4F_52, "CBOR".  This means
   that should a file be reviewed by a human (directly in an editor, or
   in a hexdump display), it will include the string "CBOR" prominently.
   The value is also included because the two tags need to tag
   something.

4.  Security Considerations

   This document provides a way to identify CBOR Protocol objects.
   Clearly identifying CBOR contents on disk may have a variety of
   impacts.

   The most obvious is that it may allow malware to identify interesting
   objects on disk, and then corrupt them.

5.  IANA Considerations

   This document makes no new requests to IANA.

6.  Acknowledgements

   The CBOR WG brainstormed this protocol on January 20, 2021.

7.  Changelog

8.  References

8.1.  Normative References

   [BCP14]    Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017.

   [RFC8742]  Bormann, C., "Concise Binary Object Representation (CBOR)
              Sequences", RFC 8742, DOI 10.17487/RFC8742, February 2020.

   [RFC8949]  Bormann, C. and P. Hoffman, "Concise Binary Object
              Representation (CBOR)", STD 94, RFC 8949,
              DOI 10.17487/RFC8949, December 2020.

8.2.  Informative References

   [file]     Wikipedia, "file (command)", 20 January 2021.

   [ilbm]     Wikipedia, "Interleaved BitMap", 20 January 2021.

Contributors

Author's Address

   Michael Richardson
   Sandelman Software Works

   Email: mcr+ietf@sandelman.ca