Internet Engineering Task Force                               D. DeVault
Internet-Draft                                                 SourceHut
Intended status: Informational                          20 November 2020
Expires: 24 May 2021

               Binary Application Record Encoding (BARE)
                         draft-devault-bare-01

Abstract

   The Binary Application Record Encoding (BARE) is a data format used
   to represent application records for storage or transmission between
   programs. BARE messages are concise and have a well-defined schema,
   and implementations may be simple and broadly compatible. A schema
   language is also provided to express message schemas out-of-band.

Comments

   Comments are solicited and should be addressed to the mailing list at
   ~sircmpwn/public-inbox@lists.sr.ht and/or the author(s).

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 24 May 2021.

Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Terminology
     1.2.  Use-cases
   2.  Specification of the BARE Message Encoding
     2.1.  Primitive Types
     2.2.  Aggregate Types
     2.3.  User-Defined Types
     2.4.  Invariants
   3.  BARE Schema Language Specification
     3.1.  Lexical Analysis
     3.2.  ABNF Grammar
     3.3.  Semantic Elements
   4.  Application Considerations
   5.  Future Considerations
   6.  IANA Considerations
   7.  Security Considerations
   8.  Normative References
   Appendix A.  Example message schema
   Appendix B.  Example Messages
   Author's Address

1. Introduction

   The purpose of the BARE message encoding, like hundreds of others, is
   to encode application messages. The goals of such encodings vary
   (leading to their proliferation); BARE's goals are the following:

   * Concise messages

   * A well-defined message schema

   * Broad compatibility with programming environments

   * Simplicity of implementation

   This document specifies the BARE message encoding, as well as a
   schema language which may be used to describe the layout of a BARE
   message. The schema of a message must be agreed upon in advance by
   each party exchanging a BARE message; message structure is not
   encoded into the representation. The schema language is useful for
   this purpose, but not required.

1.1. Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

1.2. Use-cases

   The goals of a concise, binary, strongly-typed, and
   broadly-compatible structured message encoding format support a wide
   range of use-cases. Examples include:

   * Self-describing authentication tokens for web services

   * Opaque messages for transmitting arbitrary state between unrelated
     internet services

   * A representation for packets in an internet protocol

   * A structured data format for encrypted or signed application
     messages

   * A structured data format for storing data in persistent storage

   The conciseness of a BARE-encoded message enables representing
   structured data under strict limitations on message length in a large
   variety of contexts. The simple binary format may also be easily
   paired with additional tools, such as plain-text encodings,
   compression, or cryptography algorithms, as demanded by the
   application's needs, without increasing the complexity of the message
   encoding. A BARE message has a comparable size and entropy to the
   underlying state it represents.

   The BARE schema language also provides a means of describing the
   format of BARE messages without implementation-specific details.
   This encourages applications which utilize BARE to describe their
   state in a manner which other programmers can easily utilize for
   application inter-operation. The conservative set of primitives
   offered by BARE aids in making such new implementations easy to
   write.

2. Specification of the BARE Message Encoding

   A BARE message is a single value of a pre-defined type, which may be
   of an aggregate type enclosing multiple values. Unless otherwise
   specified there is no additional container or structure around the
   value; it is encoded plainly.

   A BARE message does not necessarily have a fixed length, but the
   schema author may make a deliberate choice to constrain themselves to
   types of well-defined lengths if this is desired.

   The names for each type are provided to establish a vocabulary for
   describing a BARE message schema out-of-band, by parties who plan to
   exchange BARE messages. The type names used here are provided for
   this informative purpose, but are more rigorously specified by the
   schema language specification in Section 3.

2.1. Primitive Types

   Primitive types represent exactly one value.

   uint
      An unsigned integer with a variable-length encoding. Each
      octet of the encoded value has the most-significant bit set,
      except for the last octet. The remaining bits are the
      integer value in 7-bit groups, least-significant first.

      The maximum precision of such a number is 64 bits. The
      maximum length of an encoded uint is therefore 10 octets.

      Numbers which require all ten octets will have 6 bits in the
      final octet which do not have meaning, between the least- and
      most-significant bits. The implementation MUST set these to
      zero.

   int
      A signed integer with a variable-length encoding. Signed
      integers are represented as uint using a "zig-zag" encoding:
      non-negative values x are written as 2x + 0, negative values
      are written as 2(^x) + 1. In other words, negative numbers are
      complemented and whether to complement is encoded in bit 0.

      The maximum precision of such a number is 64 bits. The
      maximum length of an encoded int is therefore 10 octets.

      Numbers which require all ten octets will have 6 bits in the
      final octet which do not have meaning, between the least- and
      most-significant bits. The implementation MUST set these to
      zero.

   u8, u16, u32, u64
      Unsigned integers of a fixed precision, respectively 8, 16,
      32, and 64 bits. They are encoded in little-endian (least
      significant octet first).

   i8, i16, i32, i64
      Signed integers of a fixed precision, respectively 8, 16, 32,
      and 64 bits.
      They are encoded in little-endian (least significant octet
      first), with two's complement notation.

   f32, f64
      Floating-point numbers represented with the IEEE 754
      [IEEE.754.1985] binary32 and binary64 floating point number
      formats.

      The encoder MUST NOT encode NaN into a BARE message, and the
      decoder SHOULD raise an error if it encounters such a value.

   bool
      A boolean value, either true or false, encoded as a u8 type
      with a value of one or zero, respectively representing true
      or false.

      If a value other than one or zero is found in the u8
      representation of the bool, the message is considered
      invalid, and the decoder SHOULD raise an error if it
      encounters such a value.

   enum
      An unsigned integer value from a set of possible values
      agreed upon in advance, encoded with the uint type.

      An enum whose uint value is not a member of the values agreed
      upon in advance is considered invalid, and the decoder SHOULD
      raise an error if it encounters such a value.

      Note that this makes the enum type unsuitable for
      representing several enum values which have been combined
      with a bitwise OR operation.

   string
      A string of text. The length of the text in octets is
      encoded first as a uint, followed by the text data
      represented with the UTF-8 encoding [RFC3629].

      If the data is found to contain invalid UTF-8 sequences, it
      is considered invalid, and the decoder SHOULD raise an error
      if it encounters such a value.

   data<length>
      Arbitrary data with a fixed "length" in octets, e.g.
      data<16>. The data is encoded literally in the message, and
      MUST NOT be greater than 18,446,744,073,709,551,615 octets in
      length (the maximum value of a u64).

   data
      Arbitrary data of a variable length in octets. The length is
      encoded first as a uint, followed by the data itself encoded
      literally.

   void
      A type with zero length. It is not encoded into BARE
      messages.
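
   The variable-length uint and int encodings described in this section
   can be sketched as follows. This is a non-normative illustration in
   Python, not part of the specification; the function names
   (encode_uint, decode_uint, encode_int, decode_int) are hypothetical
   and chosen only for this example.

```python
from typing import Tuple


def encode_uint(value: int) -> bytes:
    """Encode an unsigned 64-bit integer in 7-bit groups, least-significant first."""
    if not 0 <= value < 1 << 64:
        raise ValueError("uint out of range")
    out = bytearray()
    while value >= 0x80:
        # More groups follow: set the most-significant bit of this octet.
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)  # last octet has the most-significant bit clear
    return bytes(out)


def decode_uint(buf: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode a uint starting at pos; return (value, position after it)."""
    value = 0
    for i in range(10):  # an encoded uint is at most 10 octets long
        if pos + i >= len(buf):
            raise ValueError("truncated uint")
        octet = buf[pos + i]
        value |= (octet & 0x7F) << (7 * i)
        if octet < 0x80:
            # Rejects a tenth octet whose unused bits are not zero.
            if value >= 1 << 64:
                raise ValueError("uint exceeds 64 bits")
            return value, pos + i + 1
    raise ValueError("uint longer than 10 octets")


def encode_int(value: int) -> bytes:
    """Zig-zag encode a signed 64-bit integer: 2x for x >= 0, 2(^x) + 1 otherwise."""
    if not -(1 << 63) <= value < 1 << 63:
        raise ValueError("int out of range")
    zigzag = 2 * value if value >= 0 else 2 * (~value) + 1
    return encode_uint(zigzag)


def decode_int(buf: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode a zig-zag int starting at pos; return (value, position after it)."""
    zigzag, pos = decode_uint(buf, pos)
    value = zigzag >> 1
    if zigzag & 1:  # bit 0 records whether the value was complemented
        value = ~value
    return value, pos


print(encode_uint(300).hex())  # ac02
print(encode_int(-1).hex())    # 01
```

   In this sketch the decoder reports an error for a uint that runs past
   ten octets or that carries non-zero bits beyond the 64th, which is
   one possible way to act on the requirement that those bits be zero.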

2.2. Aggregate Types

   Aggregate types may store zero or more primitive or aggregate values.

   optional<type>
      A value of "type" which may or may not be present, e.g.
      optional<u32>. Represented as either a u8 with a value of
      zero, indicating that the optional value is unset; or a u8
      with a value of one, followed by the encoded data of the
      optional type.

      An optional value whose initial u8 is set to a number other
      than zero or one is considered invalid, and the decoder
      SHOULD raise an error if it encounters such a value.

   [length]type
      A list of "length" values of "type", e.g. [10]uint. The
      length is not encoded into the message. The encoded values
      of each member of the list are concatenated to form the
      encoded list.

   []type
      A variable-length list of values of "type", e.g. []string.
      The length of the list (in values) is encoded as a uint,
      followed by the encoded values of each member of the list
      concatenated.

   map[type A]type B
      A mapping of values of type B keyed by values of type A,
      e.g. map[u32]string. The encoded representation of a map
      begins with the number of key/value pairs as a uint, followed
      by the encoded key/value pairs concatenated. Each key/value
      pair is encoded as the encoded key concatenated with the
      encoded value.

      A message with repeated keys is considered invalid, and the
      decoder SHOULD raise an error if it encounters such a value.

   (type | type | ...)
      A tagged union whose value may be one of any type from a set
      of types, e.g. (int | uint | string). Each type in the set
      is assigned a numeric identifier. The value is encoded as
      the selected type's identifier represented with the uint
      encoding, followed by the encoded value of that type.

      A union with a tag value that does not have a corresponding
      type assigned is considered invalid, and the decoder SHOULD
      raise an error if it encounters such a value.

   struct
      A set of values of arbitrary types, concatenated in an order
      agreed upon in advance. Each value is referred to as a
      "field", and each field has a name and type.

2.3. User-Defined Types

   A user-defined type gives a name to another type. This creates a
   distinct type whose representation is equivalent to the named type.
   An arbitrary number of user-defined types may be used for the same
   underlying type; each is distinct from the other.

2.4. Invariants

   The following invariants are specified:

   * Any type which is ultimately a void type (either directly or via a
     user-defined type) MUST NOT be used as an optional type, struct
     member, list member, map key, or map value. Void types may only
     be used as members of the set of types in a tagged union.

   * The lengths of fixed-length arrays and data types MUST be at least
     one.

   * Structs MUST have at least one field.

   * Unions MUST have at least one type, and each type MUST NOT be
     repeated.

   * Map keys MUST be of a primitive type which is not data or
     data<length>.

   * Each named value of an enum type MUST have a unique value.

3. BARE Schema Language Specification

   The use of the schema language is optional. Implementations SHOULD
   support decoding arbitrary BARE messages without a schema document,
   by defining the schema in a manner which utilizes more native tools
   available from the programming environment.

   However, it may be useful to have a schema document for use with code
   generation, documentation, or interoperability. A domain-specific
   language is provided for this purpose.

3.1. Lexical Analysis

   During lexical analysis, "#" is used for comments; if encountered,
   the "#" character and any subsequent characters are discarded until a
   line feed (%x0A) is found.

3.2. ABNF Grammar

   The syntax of the schema language is provided here in Augmented
   Backus-Naur form [RFC5234].
   However, this grammar differs from [RFC5234] in that strings are
   case-sensitive (e.g. "type" does not match TypE).

   schema = [WS] user-types [WS]

   user-type = "type" WS user-type-name WS non-enum-type
   user-type =/ "enum" WS user-type-name WS enum-type
   user-types = user-type / (user-types WS user-type)

   type = non-enum-type / enum-type
   non-enum-type = primitive-type / aggregate-type / user-type-name

   user-type-name = UPPER *(ALPHA / DIGIT) ; First letter is uppercase

   primitive-type = "int" / "i8" / "i16" / "i32" / "i64"
   primitive-type =/ "uint" / "u8" / "u16" / "u32" / "u64"
   primitive-type =/ "f32" / "f64"
   primitive-type =/ "bool"
   primitive-type =/ "string"
   primitive-type =/ "data" / ("data<" integer ">")
   primitive-type =/ "void"

   enum-type = "{" [WS] enum-values [WS] "}"
   enum-values = enum-value / (enum-values WS enum-value)
   enum-value = enum-value-name
   enum-value =/ (enum-value-name [WS] "=" [WS] integer)
   enum-value-name = UPPER *(UPPER / DIGIT / "_")

   aggregate-type = optional-type
   aggregate-type =/ array-type
   aggregate-type =/ map-type
   aggregate-type =/ union-type
   aggregate-type =/ struct-type

   optional-type = "optional<" type ">"

   array-type = "[" [integer] "]" type
   integer = 1*DIGIT

   map-type = "map[" type "]" type

   union-type = "(" union-members ")"
   union-members = union-member
   union-members =/ (union-members [WS] "|" [WS] union-member)
   union-member = type [[WS] "=" [WS] integer]

   struct-type = "{" [WS] fields [WS] "}"
   fields = field / (fields WS field)
   field = 1*ALPHA [WS] ":" [WS] type

   UPPER = %x41-5A ; uppercase ASCII letters
   ALPHA = %x41-5A / %x61-7A ; A-Z / a-z
   DIGIT = %x30-39 ; 0-9

   WS = 1*(%x0A / %x09 / " ") ; whitespace

   See Appendix A for an example schema written in this language.

3.3. Semantic Elements

   The names of fields and user-defined types are informational: they
   are not represented in BARE messages. They may be used by code
   generation tools to inform the generation of field and type names in
   the native programming environment.

   Enum values are also informational. Values without an integer token
   are assigned automatically in the order that they appear, starting
   from zero and incrementing for each subsequent unassigned value. If
   a value is explicitly specified, automatic assignment continues from
   that value plus one for subsequent enum values.

   Union type members are assigned a tag in the order that they appear,
   starting from zero and incrementing for each subsequent type. If a
   tag value is explicitly specified, automatic assignment continues
   from that value plus one for subsequent values.

4. Application Considerations

   Message authors who wish to design a schema which is backwards- and
   forwards-compatible with future messages are encouraged to use union
   types for this purpose. New types may be appended to the members of
   a union type while retaining backwards compatibility with older
   message types. The choice to do this must be made from the first
   message version; moving a struct into a union _does not_ produce a
   backwards-compatible message.

   The following schema provides an example:

   type Message (MessageV1 | MessageV2 | MessageV3)

   type MessageV1 ...

   type MessageV2 ...

   type MessageV3 ...

   An updated schema which adds a MessageV4 type would still be able to
   decode versions 1, 2, and 3.

   If a message version is later deprecated, it may be removed in a
   manner compatible with future versions 2 and 3 if the initial tag is
   specified explicitly.

   type Message (MessageV2 = 1 | MessageV3)

5. Future Considerations

   To ensure message compatibility between implementations and
   backwards- and forwards-compatibility of messages, constraints on
   vendor extensions are required. This specification is final, and new
   types or extensions will not be added in the future. Implementors
   MUST NOT define extensions to this specification.

   To support the encoding of novel data structures, the implementor
   SHOULD make use of user-defined types in combination with the data or
   data<length> types.

6. IANA Considerations

   This memo includes no request to IANA.

7. Security Considerations

   Message parsers are common vectors for security vulnerabilities.
   BARE addresses this by making the message format as simple as
   possible. However, the parser MUST be prepared to handle a number of
   error cases when decoding untrusted messages, such as a union type
   with an invalid tag, or an enum with an invalid value. Such errors
   may also arise by mistake, for example when attempting to decode a
   message with the wrong schema.

   Support for data types of an arbitrary, message-defined length
   (lists, maps, strings, etc.) is commonly exploited to cause the
   implementation to exhaust its resources while decoding a message.
   However, legitimate use-cases for extremely large data types
   (possibly larger than the system has the resources to store all at
   once) do exist. The decoder MUST manage its resources accordingly,
   and SHOULD provide the application a means of providing their own
   decoder implementation for values which are expected to be large.

   There is only one valid interpretation of a BARE message for a given
   schema, and different decoders and encoders should be expected to
   provide that interpretation.
   If an implementation has limitations imposed from the programming
   environment (such as limits on numeric precision), the implementor
   MUST document these limitations, and prevent conflicting
   interpretations from causing undesired behavior.

8. Normative References

   [IEEE.754.1985]
              Institute of Electrical and Electronics Engineers,
              "Standard for Binary Floating-Point Arithmetic",
              IEEE Standard 754, August 1985.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.

   [RFC3629]  Yergeau, F., "UTF-8, a transformation format of ISO
              10646", STD 63, RFC 3629, DOI 10.17487/RFC3629, November
              2003, <https://www.rfc-editor.org/info/rfc3629>.

   [RFC5234]  Crocker, D., Ed. and P. Overell, "Augmented BNF for Syntax
              Specifications: ABNF", STD 68, RFC 5234,
              DOI 10.17487/RFC5234, January 2008,
              <https://www.rfc-editor.org/info/rfc5234>.

Appendix A. Example message schema

   The following is an example of a schema written in the BARE schema
   language.

   type PublicKey data<128>
   type Time string # ISO 8601

   enum Department {
     ACCOUNTING
     ADMINISTRATION
     CUSTOMER_SERVICE
     DEVELOPMENT

     # Reserved for the CEO
     JSMITH = 99
   }

   type Customer {
     name: string
     email: string
     address: Address
     orders: []{
       orderId: i64
       quantity: i32
     }
     metadata: map[string]data
   }

   type Employee {
     name: string
     email: string
     address: Address
     department: Department
     hireDate: Time
     publicKey: optional<PublicKey>
     metadata: map[string]data
   }

   type TerminatedEmployee void

   type Person (Customer | Employee | TerminatedEmployee)

   type Address {
     address: [4]string
     city: string
     state: string
     country: string
   }

Appendix B. Example Messages

   Some basic example messages in hexadecimal are provided for the
   schema specified in Appendix A.
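
   As an informal aid for checking such messages, the following
   non-normative Python sketch decodes a "Person" message whose union
   tag selects "Customer" under the Appendix A schema. The names
   read_uint, read_data, read_string, and decode_customer are
   hypothetical helpers introduced only for this illustration; the
   hexadecimal input is the Customer example message given in this
   appendix.

```python
import struct
from typing import Tuple

# Hex dump of the Customer example message from this appendix.
CUSTOMER_HEX = """
00 0b 4a 61 6d 65 73 20 53 6d 69 74 68 12 6a 73
6d 69 74 68 40 65 78 61 6d 70 6c 65 2e 6f 72 67
0b 31 32 33 20 4d 61 69 6e 20 53 74 00 00 00 0c
50 68 69 6c 61 64 65 6c 70 68 69 61 02 50 41 0d
55 6e 69 74 65 64 20 53 74 61 74 65 73 01 b2 41
de fc 00 00 00 00 05 00 00 00 00
"""


def read_uint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Read a variable-length uint; return (value, next position)."""
    value = 0
    for i in range(10):
        if pos + i >= len(buf):
            raise ValueError("truncated uint")
        octet = buf[pos + i]
        value |= (octet & 0x7F) << (7 * i)
        if octet < 0x80:
            if value >= 1 << 64:
                raise ValueError("uint exceeds 64 bits")
            return value, pos + i + 1
    raise ValueError("uint longer than 10 octets")


def read_data(buf: bytes, pos: int) -> Tuple[bytes, int]:
    """Read variable-length data: a uint length, then that many octets."""
    length, pos = read_uint(buf, pos)
    if pos + length > len(buf):
        raise ValueError("truncated data")
    return buf[pos:pos + length], pos + length


def read_string(buf: bytes, pos: int) -> Tuple[str, int]:
    """Read a string; invalid UTF-8 raises UnicodeDecodeError."""
    raw, pos = read_data(buf, pos)
    return raw.decode("utf-8"), pos


def decode_customer(buf: bytes) -> dict:
    """Decode a Person union value whose tag (0) selects Customer."""
    tag, pos = read_uint(buf, 0)
    if tag != 0:
        raise ValueError("union tag does not select Customer")
    person = {}
    person["name"], pos = read_string(buf, pos)
    person["email"], pos = read_string(buf, pos)
    address = {"address": []}
    for _ in range(4):  # [4]string: the length is not encoded
        line, pos = read_string(buf, pos)
        address["address"].append(line)
    for field in ("city", "state", "country"):
        address[field], pos = read_string(buf, pos)
    person["address"] = address
    count, pos = read_uint(buf, pos)  # []struct: count, then members
    orders = []
    for _ in range(count):
        # i64 then i32, both little-endian with no padding.
        order_id, quantity = struct.unpack_from("<qi", buf, pos)
        orders.append({"orderId": order_id, "quantity": quantity})
        pos += struct.calcsize("<qi")
    person["orders"] = orders
    count, pos = read_uint(buf, pos)  # map: number of key/value pairs
    metadata = {}
    for _ in range(count):
        key, pos = read_string(buf, pos)
        if key in metadata:
            raise ValueError("repeated map key")
        metadata[key], pos = read_data(buf, pos)
    person["metadata"] = metadata
    if pos != len(buf):
        raise ValueError("trailing octets after message")
    return person


customer = decode_customer(bytes.fromhex("".join(CUSTOMER_HEX.split())))
print(customer["name"], customer["orders"])
```

   Because the schema is not carried in the message, the field order
   and types in decode_customer are fixed by hand from Appendix A; a
   code generator would normally derive them from the schema document.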

   A "Person" value of type "Customer" with the following values:

   name        James Smith

   email       jsmith@example.org

   address     123 Main St; Philadelphia; PA; United States

   orders (1)  orderId: 4242424242; quantity: 5

   metadata    (unset)

   Encoded BARE message:

   00 0b 4a 61 6d 65 73 20 53 6d 69 74 68 12 6a 73
   6d 69 74 68 40 65 78 61 6d 70 6c 65 2e 6f 72 67
   0b 31 32 33 20 4d 61 69 6e 20 53 74 00 00 00 0c
   50 68 69 6c 61 64 65 6c 70 68 69 61 02 50 41 0d
   55 6e 69 74 65 64 20 53 74 61 74 65 73 01 b2 41
   de fc 00 00 00 00 05 00 00 00 00

   A "Person" value of type "Employee" with the following values:

   name        Tiffany Doe

   email       tiffanyd@acme.corp

   address     123 Main St; Philadelphia; PA; United States

   department  ADMINISTRATION

   hireDate    2020-06-21T21:18:05+00:00

   publicKey   (unset)

   metadata    (unset)

   Encoded BARE message:

   01 0b 54 69 66 66 61 6e 79 20 44 6f 65 12 74 69
   66 66 61 6e 79 64 40 61 63 6d 65 2e 63 6f 72 70
   0b 31 32 33 20 4d 61 69 6e 20 53 74 00 00 00 0c
   50 68 69 6c 61 64 65 6c 70 68 69 61 02 50 41 0d
   55 6e 69 74 65 64 20 53 74 61 74 65 73 01 19 32
   30 32 30 2d 30 36 2d 32 31 54 32 31 3a 31 38 3a
   30 35 2b 30 30 3a 30 30 00 00

   A "Person" value of type "TerminatedEmployee":

   Encoded BARE message:

   02

Author's Address

   Drew DeVault
   SourceHut
   454 E. Girard Ave #2R
   Philadelphia, PA 19125
   United States of America

   Phone: +1 719 213 5473
   Email: sir@cmpwn.com