Internet Engineering Task Force                               D. DeVault
Internet-Draft                                                 SourceHut
Intended status: Informational                           19 October 2020
Expires: 22 April 2021

               Binary Application Record Encoding (BARE)
                         draft-devault-bare-00

Abstract

   The Binary Application Record Encoding (BARE) is a data format used
   to represent application records for storage or transmission between
   programs. BARE messages are concise and have a well-defined schema,
   and implementations may be simple and broadly compatible. A schema
   language is also provided to express message schemas out-of-band.

Comments

   Comments are solicited and should be addressed to the mailing list at
   ~sircmpwn/public-inbox@lists.sr.ht and/or the author(s).

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 22 April 2021.

Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Introduction
     1.1. Terminology
   2. Specification of the BARE Message Encoding
     2.1. Primitive Types
     2.2. Aggregate Types
     2.3. User-Defined Types
     2.4. Invariants
   3. BARE Schema Language Specification
     3.1. Lexical Analysis
     3.2. ABNF Grammar
     3.3. Semantic Elements
   4. Application Considerations
   5. Future Considerations
   6. IANA Considerations
   7. Security Considerations
   8. Normative References
   Appendix A. Example message schema
   Appendix B. Example Messages
   Author's Address

1. Introduction

   The purpose of the BARE message encoding, like hundreds of others, is
   to encode application messages. The goals of such encodings vary
   (leading to their proliferation); BARE's goals are the following:

   * Concise messages

   * A well-defined message schema

   * Broad compatibility with programming environments

   * Simplicity of implementation

   This document specifies the BARE message encoding, as well as a
   schema language which may be used to describe the layout of a BARE
   message. The schema of a message must be agreed upon in advance by
   each party exchanging a BARE message; message structure is not
   encoded into the representation. The schema language is useful for
   this purpose, but not required.

1.1. Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

2. Specification of the BARE Message Encoding

   A BARE message is a single value of a pre-defined type, which may be
   of an aggregate type enclosing multiple values. Unless otherwise
   specified there is no additional container or structure around the
   value; it is encoded plainly.

   A BARE message does not necessarily have a fixed length, but the
   schema author may make a deliberate choice to constrain themselves to
   types of well-defined lengths if this is desired.

   The names for each type are provided to establish a vocabulary for
   describing a BARE message schema out-of-band, by parties who plan to
   exchange BARE messages. The type names used here are provided for
   this informative purpose, but are more rigorously specified by the
   schema language specification in Section 3.

2.1. Primitive Types

   Primitive types represent exactly one value.

   uint
      An unsigned integer with a variable-length encoding. Each
      octet of the encoded value has the most-significant bit set,
      except for the last octet. The remaining bits are the
      integer value in 7-bit groups, least-significant first.

      The maximum precision of such a number is 64 bits. The
      maximum length of an encoded uint is therefore 10 octets.

   int
      A signed integer with a variable-length encoding. Signed
      integers are represented as uint using a "zig-zag" encoding:
      non-negative values x are written as 2x + 0, negative values
      are written as 2(^x) + 1. In other words, negative numbers are
      complemented and whether to complement is encoded in bit 0.

      The maximum precision of such a number is 64 bits. The
      maximum length of an encoded int is therefore 10 octets.

   u8, u16, u32, u64
      Unsigned integers of a fixed precision, respectively 8, 16,
      32, and 64 bits. They are encoded in little-endian (least
      significant octet first).

   i8, i16, i32, i64
      Signed integers of a fixed precision, respectively 8, 16, 32,
      and 64 bits.
      They are encoded in little-endian (least significant octet
      first), with two's complement notation.

   f32, f64
      Floating-point numbers represented with the IEEE 754
      [IEEE.754.1985] binary32 and binary64 floating-point number
      formats.

   bool
      A boolean value, either true or false, encoded as a u8 type
      with a value of one or zero, respectively representing true
      or false.

      If a value other than one or zero is found in the u8
      representation of the bool, the message is considered
      invalid, and the decoder SHOULD raise an error if it
      encounters such a value.

   enum
      An unsigned integer value from a set of possible values
      agreed upon in advance, encoded with the uint type.

      An enum whose uint value is not a member of the values agreed
      upon in advance is considered invalid, and the decoder SHOULD
      raise an error if it encounters such a value.

      Note that this makes the enum type unsuitable for
      representing several enum values which have been combined
      with a bitwise OR operation.

   string
      A string of text. The length of the text in octets is
      encoded first as a uint, followed by the text data
      represented with the UTF-8 encoding [RFC3629].

      If the data is found to contain invalid UTF-8 sequences, it
      is considered invalid, and the decoder SHOULD raise an error
      if it encounters such a value.

   data<length>
      Arbitrary data with a fixed "length" in octets, e.g.
      data<16>. The data is encoded literally in the message, and
      MUST NOT be greater than 18,446,744,073,709,551,615 octets in
      length (the maximum value of a u64).

   data
      Arbitrary data of a variable length in octets. The length is
      encoded first as a uint, followed by the data itself encoded
      literally.

   void
      A type with zero length. It is not encoded into BARE
      messages.

2.2. Aggregate Types

   Aggregate types may store zero or more primitive or aggregate values.
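
   Every aggregate encoding is built by concatenating primitive
   encodings, so the variable-length integers and length-prefixed
   strings of Section 2.1 carry most of the weight. The following is a
   minimal, non-normative Python sketch of those encodings. The
   function names (encode_uint, decode_uint, encode_int, decode_int,
   encode_string) are hypothetical and chosen only for illustration;
   they are not defined by this document.

```python
from typing import Tuple


def encode_uint(value: int) -> bytes:
    """Encode a BARE uint: 7-bit groups, least-significant group first."""
    if not 0 <= value < 1 << 64:
        raise ValueError("uint must fit in 64 bits")
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)  # high bit set: more octets follow
        value >>= 7
    out.append(value)  # last octet has the high bit clear
    return bytes(out)


def decode_uint(buf: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode a uint starting at pos; return (value, next position)."""
    value = 0
    for i in range(10):  # an encoded uint is at most 10 octets long
        if pos + i >= len(buf):
            raise ValueError("truncated uint")
        octet = buf[pos + i]
        value |= (octet & 0x7F) << (7 * i)
        if octet < 0x80:
            if value >= 1 << 64:
                raise ValueError("uint exceeds 64 bits")
            return value, pos + i + 1
    raise ValueError("uint longer than 10 octets")


def encode_int(value: int) -> bytes:
    """Zig-zag: non-negative x becomes 2x, negative x becomes 2(^x) + 1."""
    if not -(1 << 63) <= value < 1 << 63:
        raise ValueError("int must fit in 64 bits")
    zigzag = 2 * value if value >= 0 else 2 * ~value + 1
    return encode_uint(zigzag)


def decode_int(buf: bytes, pos: int = 0) -> Tuple[int, int]:
    """Undo the zig-zag mapping; bit 0 says whether to complement."""
    raw, pos = decode_uint(buf, pos)
    half = raw >> 1
    return (~half if raw & 1 else half), pos


def encode_string(text: str) -> bytes:
    """Length in octets as a uint, followed by the UTF-8 data."""
    data = text.encode("utf-8")
    return encode_uint(len(data)) + data


print(encode_uint(300).hex())  # ac02
print(encode_int(-1).hex())    # 01
```

   Under these assumptions, encode_string("James Smith") yields the
   octet 0b followed by the eleven octets of the name, which is the
   same prefix that appears in the Customer message of Appendix B.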

   optional<type>
      A value of "type" which may or may not be present, e.g.
      optional<u32>. Represented as either a u8 with a value of
      zero, indicating that the optional value is unset; or a u8
      with a value of one, followed by the encoded data of the
      optional type.

      An optional value whose initial u8 is set to a number other
      than zero or one is considered invalid, and the decoder
      SHOULD raise an error if it encounters such a value.

   [length]type
      A list of "length" values of "type", e.g. [10]uint. The
      length is not encoded into the message. The encoded values
      of each member of the list are concatenated to form the
      encoded list.

   []type
      A variable-length list of values of "type", e.g. []string.
      The length of the list (in values) is encoded as a uint,
      followed by the encoded values of each member of the list
      concatenated.

   map[type A]type B
      An associative list of values of type B keyed by values of
      type A, e.g. map[u32]string. The encoded representation of a
      map begins with the number of key/value pairs as a uint,
      followed by the encoded key/value pairs concatenated. Each
      key/value pair is encoded as the encoded key concatenated
      with the encoded value.

      A message with repeated keys is considered invalid, and the
      decoder SHOULD raise an error if it encounters such a value.

   (type | type | ...)
      A tagged union whose value may be one of any type from a set
      of types, e.g. (int | uint | string). Each type in the set
      is assigned a numeric identifier. The value is encoded as
      the selected type's identifier represented with the uint
      encoding, followed by the encoded value of that type.

      A union with a tag value that does not have a corresponding
      type assigned is considered invalid, and the decoder SHOULD
      raise an error if it encounters such a value.

   struct
      A set of values of arbitrary types, concatenated in an order
      agreed upon in advance.
      Each value is referred to as a "field", and each field has a
      name and type.

2.3. User-Defined Types

   A user-defined type gives a name to another type. This creates a
   distinct type whose representation is equivalent to the named type.
   An arbitrary number of user-defined types may be used for the same
   underlying type; each is distinct from the other.

2.4. Invariants

   The following invariants are specified:

   * Any type which is ultimately a void type (either directly or via a
     user-defined type) MUST NOT be used as an optional type, struct
     member, list member, map key, or map value. Void types may only
     be used as members of the set of types in a tagged union.

   * The lengths of fixed-length arrays and data types MUST be at least
     one.

   * Structs MUST have at least one field.

   * Unions MUST have at least one type, and each type MUST NOT be
     repeated.

   * Map keys MUST be of a primitive type which is not data or
     data<length>.

   * Each named value of an enum type MUST have a unique value.

3. BARE Schema Language Specification

   The use of the schema language is optional. Implementations SHOULD
   support decoding arbitrary BARE messages without a schema document,
   by defining the schema in a manner which utilizes more native tools
   available from the programming environment.

   However, it may be useful to have a schema document for use with code
   generation, documentation, or interoperability. A domain-specific
   language is provided for this purpose.

3.1. Lexical Analysis

   During lexical analysis, "#" is used for comments; if encountered,
   the "#" character and any subsequent characters are discarded until a
   line feed (%x0A) is found.

3.2. ABNF Grammar

   The syntax of the schema language is provided here in Augmented
   Backus-Naur form [RFC5234]. However, this grammar differs from
   [RFC5234] in that strings are case-sensitive (e.g. "type" does not
   match TypE).

   schema = [WS] user-types [WS]

   user-type = "type" WS user-type-name WS non-enum-type
   user-type =/ "enum" WS user-type-name WS enum-type
   user-types = user-type / (user-types WS user-type)

   type = non-enum-type / enum-type
   non-enum-type = primitive-type / aggregate-type / user-type-name

   user-type-name = UPPER *(ALPHA / DIGIT) ; First letter is uppercase

   primitive-type = "int" / "i8" / "i16" / "i32" / "i64"
   primitive-type =/ "uint" / "u8" / "u16" / "u32" / "u64"
   primitive-type =/ "f32" / "f64"
   primitive-type =/ "bool"
   primitive-type =/ "string"
   primitive-type =/ "data" / ("data<" integer ">")
   primitive-type =/ "void"

   enum-type = "{" [WS] enum-values [WS] "}"
   enum-values = enum-value / (enum-values WS enum-value)
   enum-value = enum-value-name
   enum-value =/ (enum-value-name [WS] "=" [WS] integer)
   enum-value-name = UPPER *(UPPER / DIGIT / "_")

   aggregate-type = optional-type
   aggregate-type =/ array-type
   aggregate-type =/ map-type
   aggregate-type =/ union-type
   aggregate-type =/ struct-type

   optional-type = "optional<" type ">"

   array-type = "[" [integer] "]" type
   integer = 1*DIGIT

   map-type = "map[" type "]" type

   union-type = "(" union-members ")"
   union-members = union-member
   union-members =/ (union-members [WS] "|" [WS] union-member)
   union-member = type [[WS] "=" [WS] integer]

   struct-type = "{" [WS] fields [WS] "}"
   fields = field / (fields WS field)
   field = 1*ALPHA [WS] ":" [WS] type

   UPPER = %x41-5A ; uppercase ASCII letters
   ALPHA = %x41-5A / %x61-7A ; A-Z / a-z
   DIGIT = %x30-39 ; 0-9

   WS = 1*(%x0A / %x09 / " ") ; whitespace

   See Appendix A for an example schema written in this language.

3.3. Semantic Elements

   The names of fields and user-defined types are informational: they
   are not represented in BARE messages.
   They may be used by code generation tools to inform the generation
   of field and type names in the native programming environment.

   The names of enum values are also informational. Values without an
   integer token are assigned automatically in the order that they
   appear, starting from zero and incrementing for each subsequent
   unassigned value. If a value is explicitly specified, automatic
   assignment continues from that value plus one for subsequent enum
   values.

   Union type members are assigned a tag in the order that they appear,
   starting from zero and incrementing for each subsequent type. If a
   tag value is explicitly specified, automatic assignment continues
   from that value plus one for subsequent values.

4. Application Considerations

   Message authors who wish to design a schema which is backwards- and
   forwards-compatible with future messages are encouraged to use union
   types for this purpose. New types may be appended to the members of
   a union type while retaining backwards compatibility with older
   message types. The choice to do this must be made from the first
   message version: moving a struct into a union _does not_ produce a
   backwards-compatible message.

   The following schema provides an example:

   type Message (MessageV1 | MessageV2 | MessageV3)

   type MessageV1 ...

   type MessageV2 ...

   type MessageV3 ...

   An updated schema which adds a MessageV4 type would still be able to
   decode versions 1, 2, and 3.

   If a message version is later deprecated, it may be removed in a
   manner that remains compatible with versions 2 and 3, provided the
   tag of the first remaining member is specified explicitly.

   type Message (MessageV2 = 1 | MessageV3)

5. Future Considerations

   To ensure message compatibility between implementations and
   backwards- and forwards-compatibility of messages, constraints on
   vendor extensions are required.
   This specification is final, and new types or extensions will not be
   added in the future. Implementors MUST NOT define extensions to this
   specification.

   To support the encoding of novel data structures, the implementor
   SHOULD make use of user-defined types in combination with the data or
   data<length> types.

6. IANA Considerations

   This memo includes no request to IANA.

7. Security Considerations

   Message parsers are common vectors for security vulnerabilities.
   BARE addresses this by making the message format as simple as
   possible. However, the parser MUST be prepared to handle a number of
   error cases when decoding untrusted messages, such as a union type
   with an invalid tag, or an enum with an invalid value. Such errors
   may also arise by mistake, for example when attempting to decode a
   message with the wrong schema.

   Support for data types of an arbitrary, message-defined length
   (lists, maps, strings, etc.) is commonly exploited to cause the
   implementation to exhaust its resources while decoding a message.
   However, legitimate use-cases for extremely large data types
   (possibly larger than the system has the resources to store all at
   once) do exist. The decoder MUST manage its resources accordingly,
   and SHOULD provide the application a means of supplying its own
   decoder implementation for values which are expected to be large.

   There is only one valid interpretation of a BARE message for a given
   schema, and different decoders and encoders should be expected to
   provide that interpretation. If an implementation has limitations
   imposed from the programming environment (such as limits on numeric
   precision), the implementor MUST document these limitations, and
   prevent conflicting interpretations from causing undesired behavior.

8. Normative References

   [IEEE.754.1985]
              Institute of Electrical and Electronics Engineers,
              "Standard for Binary Floating-Point Arithmetic",
              IEEE Standard 754, August 1985.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.

   [RFC3629]  Yergeau, F., "UTF-8, a transformation format of ISO
              10646", STD 63, RFC 3629, DOI 10.17487/RFC3629, November
              2003, <https://www.rfc-editor.org/info/rfc3629>.

   [RFC5234]  Crocker, D., Ed. and P. Overell, "Augmented BNF for Syntax
              Specifications: ABNF", STD 68, RFC 5234,
              DOI 10.17487/RFC5234, January 2008,
              <https://www.rfc-editor.org/info/rfc5234>.

Appendix A. Example message schema

   The following is an example of a schema written in the BARE schema
   language.

   type PublicKey data<128>
   type Time string # ISO 8601

   enum Department {
     ACCOUNTING
     ADMINISTRATION
     CUSTOMER_SERVICE
     DEVELOPMENT

     # Reserved for the CEO
     JSMITH = 99
   }

   type Customer {
     name: string
     email: string
     address: Address
     orders: []{
       orderId: i64
       quantity: i32
     }
     metadata: map[string]data
   }

   type Employee {
     name: string
     email: string
     address: Address
     department: Department
     hireDate: Time
     publicKey: optional<PublicKey>
     metadata: map[string]data
   }

   type TerminatedEmployee void

   type Person (Customer | Employee | TerminatedEmployee)

   type Address {
     address: [4]string
     city: string
     state: string
     country: string
   }

Appendix B. Example Messages

   Some basic example messages in hexadecimal are provided for the
   schema specified in Appendix A.
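
   As an informal aid to reading the messages in this appendix, the
   following non-normative Python sketch decodes the first of them (the
   "Customer" message) by hand-following the Appendix A schema. The
   Reader class and the read_address, read_customer, and read_person
   helpers are hypothetical names introduced only for this
   illustration. Only the Customer member of the Person union is
   handled; error handling is kept to the minimum needed to reject
   truncated input, over-long uints, and repeated map keys.

```python
import struct

# Octets of the "Customer" example message from this appendix.
CUSTOMER_HEX = """
00 0b 4a 61 6d 65 73 20 53 6d 69 74 68 12 6a 73
6d 69 74 68 40 65 78 61 6d 70 6c 65 2e 6f 72 67
0b 31 32 33 20 4d 61 69 6e 20 53 74 00 00 00 0c
50 68 69 6c 61 64 65 6c 70 68 69 61 02 50 41 0d
55 6e 69 74 65 64 20 53 74 61 74 65 73 01 b2 41
de fc 00 00 00 00 05 00 00 00 00
"""


class Reader:
    """Hypothetical cursor over a BARE message (illustration only)."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0

    def take(self, n: int) -> bytes:
        if self.pos + n > len(self.data):
            raise ValueError("truncated message")
        chunk = self.data[self.pos:self.pos + n]
        self.pos += n
        return chunk

    def uint(self) -> int:
        value = 0
        for shift in range(0, 70, 7):  # at most 10 octets
            octet = self.take(1)[0]
            value |= (octet & 0x7F) << shift
            if octet < 0x80:
                return value
        raise ValueError("uint longer than 10 octets")

    def string(self) -> str:
        return self.take(self.uint()).decode("utf-8")

    def i32(self) -> int:
        return struct.unpack("<i", self.take(4))[0]  # little-endian

    def i64(self) -> int:
        return struct.unpack("<q", self.take(8))[0]  # little-endian


def read_address(r: Reader) -> dict:
    # Fields are read in schema order; [4]string has no length prefix.
    return {
        "address": [r.string() for _ in range(4)],
        "city": r.string(),
        "state": r.string(),
        "country": r.string(),
    }


def read_customer(r: Reader) -> dict:
    name, email, address = r.string(), r.string(), read_address(r)
    orders = [{"orderId": r.i64(), "quantity": r.i32()}
              for _ in range(r.uint())]
    metadata = {}
    for _ in range(r.uint()):  # map: pair count, then key/value pairs
        key = r.string()
        if key in metadata:
            raise ValueError("repeated map key")
        metadata[key] = r.take(r.uint())
    return {"name": name, "email": email, "address": address,
            "orders": orders, "metadata": metadata}


def read_person(r: Reader):
    tag = r.uint()  # union tag; 0 is Customer in the Appendix A schema
    if tag != 0:
        raise ValueError("this sketch only decodes the Customer member")
    return "Customer", read_customer(r)


reader = Reader(bytes.fromhex(CUSTOMER_HEX))
kind, customer = read_person(reader)
print(kind, customer["name"], customer["orders"])
```

   The decoder consumes the message exactly; the single order decodes
   to orderId 4242424242 and quantity 5, and the metadata map is empty.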

   A "Person" value of type "Customer" with the following values:

   name      James Smith

   email     jsmith@example.org

   address   123 Main St; Philadelphia; PA; United States

   orders    (1) orderId: 4242424242; quantity: 5

   metadata  (unset)

   Encoded BARE message:

   00 0b 4a 61 6d 65 73 20 53 6d 69 74 68 12 6a 73
   6d 69 74 68 40 65 78 61 6d 70 6c 65 2e 6f 72 67
   0b 31 32 33 20 4d 61 69 6e 20 53 74 00 00 00 0c
   50 68 69 6c 61 64 65 6c 70 68 69 61 02 50 41 0d
   55 6e 69 74 65 64 20 53 74 61 74 65 73 01 b2 41
   de fc 00 00 00 00 05 00 00 00 00

   A "Person" value of type "Employee" with the following values:

   name        Tiffany Doe

   email       tiffanyd@acme.corp

   address     123 Main St; Philadelphia; PA; United States

   department  ADMINISTRATION

   hireDate    2020-06-21T21:18:05+00:00

   publicKey   (unset)

   metadata    (unset)

   Encoded BARE message:

   01 0b 54 69 66 66 61 6e 79 20 44 6f 65 12 74 69
   66 66 61 6e 79 64 40 61 63 6d 65 2e 63 6f 72 70
   0b 31 32 33 20 4d 61 69 6e 20 53 74 00 00 00 0c
   50 68 69 6c 61 64 65 6c 70 68 69 61 02 50 41 0d
   55 6e 69 74 65 64 20 53 74 61 74 65 73 01 19 32
   30 32 30 2d 30 36 2d 32 31 54 32 31 3a 31 38 3a
   30 35 2b 30 30 3a 30 30 00 00

   A "Person" value of type "TerminatedEmployee":

   Encoded BARE message:

   02

Author's Address

   Drew DeVault
   SourceHut
   454 E. Girard Ave #2R
   Philadelphia, PA 19125
   United States of America

   Phone: +1 719 213 5473
   Email: sir@cmpwn.com