CBOR
IETF 101 - London
Tuesday, Mar 20, 2018, 13:30-15:30
Chairs: Joe Hildebrand, Francesca Palombini
Recordings: https://play.conf.meetecho.com/Playout/?session=IETF101-CBOR-20180320-1330 or https://youtu.be/FrVinVcs-P0
  Minutes by Paul Hoffman, Christian Amsüss
  Nothing from the slides reproduced here

* Introduction [15'] : Chairs
  https://youtu.be/FrVinVcs-P0?t=5m55s
  Agenda bashing and status update
  CDDL had WGLC, which generated review comments to be discussed today. 7049bis: implementation matrix updated (see wiki). Array tags got reviews; ready for adoption once the reviews are addressed.


* CDDL, draft-ietf-cbor-cddl-02 Presented by Carsten Bormann
   https://youtu.be/FrVinVcs-P0?t=8m32s
  - Changes since IETF 100: cuts in maps introduced for one particular application. From the regexp discussion there: using XSD regexps, which are still weird but not too much, and [something with unicode].
  - Created a "freezer" document for things that will not go into the main document
  - On map matching: CDDL has single concept of "groups" (for maps and arrays) that are grammars of types; describe linear languages. Linear properties are only used for arrays, and as maps are unordered (thus match anything), implementations need to be driven from grammar and not from text. That also creates some nondeterminism -- that hasn't hurt anywhere, but edge cases might come up.
  - Map validation (though "validation" not a term of CDDL): wildcard in matches could consume explicit matches [if the scribe understood correctly]. For that (and only that), cuts are introduced: once a fork is matched, the path is committed, and if the rest won't match, the whole thing won't. This is only meaningful on a match. That comes with limitations [...].
  - Are there only editorial issues left, or is anything technical still open?
  Jeffrey Yasskin: Semantics of group and cuts are not specified well
    Wants to know precisely what causes a map to match. We need a precise specification of the matching algorithm.
    Carsten: This is precise, as it's a parsing expression grammar, which is greedy. It becomes a problem when expressed as a nondeterministic[?] finite automaton.
  Jim Schaad: Wants to change the ordering.
    Match the most specific keys first, specific before "any"
    Jeffrey: Can't say one type is more specific than other types.
    Choice types can just overlap and not be more specific than each other.
    Jim: Anything is a value > constructed type > groups (Value is more specific than type is more specific than any.)
  Sean Leonard: I want to express that I am opposed to introducing "cuts" into CDDL v1.0. Cuts transform a context-free grammar into a context-sensitive one. An alternative, "subsets" or "constraints", was sketched out in Singapore. From an editorial point, this is introducing new matter at the last minute when it needs to be fleshed out more over the coming months, i.e., after we get this version of CDDL out. (So basically it is similar to Jeffrey Yasskin's point: not well specified and also trying to do too much.)
    Carsten: If we want colon to be a shortcut for "^ =>", we need to do this now; it can't be added later. And that is the meaning most spec writers will probably expect of ":". The alternative is smearing the previous cases into the later, more generic ones. This won't make for concise specifications.
      Jeffrey: It is useful to have the ":" syntax act as a cut; not sure we need the cut syntax. Just the colon is maybe easier to specify.
    Henk Birkholz: Exception is the behavior of cuts
      If we don't want a simple notation, we need to decide this soon
      Likes the last example; intuitive
    Carsten: The interesting proposal here is to only have ":" and not "^".
      It's weird to have a shortcut that you can't have in long form
    Sean: (refers to slide 15) So the point is that you want to say "4 is text only, all other uints are anything else", right? So the slide says "* uint" and that means "* all uints except 4", right? What happens when later you want to say 5 is only a byte string? You have to put ? 5 : at the top, but you can't put it at the top, you are supposed to be putting it at the bottom...
    Jim: If you append a colon-thing to the bottom, you're going to expect enforcement, but it is not checked/enforced.
    Carsten: By creating a sorting mechanism, this consideration could be handled. I don't like the sorting mechanism b/c spec writer can intend a sequence.
    Sean/Jim: Spec writer can intend a sequence, but extension writer can only append.
    Carsten: [...] You can give the first thing a name, and have that as an extension point, and have the wild card after the named.
    Sean: The way that "constraints" work is that you say, with appropriate syntax: * uint -> any FIRST. Then in the subsequent parts of the spec, you identify specific instances of * uint => any. For example: "when uint is 4: text". "When uint is 5: byte string". (This is discussed in draft-seantek-constrained-abnf, for ABNF.)
    Carsten: [...] There are several proposals. Some proposal to have the shortcut but not the long form, others not to do that at all (refer to slide 15). Take it from the list from there.
    Francesca: Yes.
  - Carsten: Next topic: operator precedence. Operator precedence is logical when it comes to groups and types. The same syntax in a map context is unfamiliar in a type choice after a [quantifier]. General changes in operator precedence would create annoyance in the form of needing more parens and raising syntax errors if they are missing. We can add text to explain and encourage a style that doesn't contain surprising cases. Comments? Room: silent.
    Hank: I see the necessity, but it violates the rule of not being noisy, and specs will be paren-laden after that, and move away from being easy to read and write, but I see the point.
    Paul: It's never too hard to read too many parentheses.
    Carsten: It doesn't need to be names, more common is to name the choice and then use that name. We can still make the recommendation w/o littering up specs -- but yes, we should check that.
  Carsten: Addressing Jim's review.
    @Dead code: should not lead to hard errors. A tool might still give warnings on that. It's generally undecidable, but often possible in realistic cases. 
    @generics: grammar says it, text doesn't, but should say it too. @precedence: there were errors.
    @unwrap grammar: found copy/paste error.
    @terminology: we should make it visible that there are CBOR and CDDL terms, and they never mix.
  Jeffrey: CDDL spec is written as a tutorial, not a spec
    Appendix C is a good start, but it should move there before becoming an RFC.
    Carsten: A sensible proposition -- which would need half a year.
      Jeffrey: can we speed that up with a pull-request style?
    Alexey: Is this just reordering?
      Jeffrey: More. For some cases, I don't even know the algorithm Carsten has in mind. For others, it's clear enough that I'd be capable of writing the spec, but it takes time.
      I could sketch something in a month, but getting the exact words would take longer.
    Francesca: The WG said it wants this out as soon as possible
    Sean: Let's get version 1 out now and run a more formal spec on next version if we feel it's necessary.
      Jim: Agrees
    Joe Hildebrand: Sees a ton of value for this, but wants something sooner
    Carsten: Wants a list of the items Jeffrey does not understand
    Jeffrey: Can't currently use this in web specs
    Joe: We know that there is still work to do, we expect a -bis
    Alexey: Is comfortable with this approach
  Alexey: When can you be done?
    Carsten: Before late May. Would like to get input from Jeffrey.
  Francesca: To the WG: keep checking the doc (see github for most recent update). Bring leftover points/issues to the mailing list. After the update, we'll see if we need another WGLC.
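
  The cut discussion above can be made concrete with a small sketch. It is a deliberately simplified model and not the CDDL matching algorithm: it ignores occurrence indicators, and the names Entry and match_map are hypothetical. Entries are tried in order for each map member; an entry marked as a cut (modelling ":" as shorthand for "^ =>") commits once its key matches, so a bad value can no longer be rescued by a later wildcard. The example group models the slide 15 case as described in the discussion ("4 is text only, all other uints are anything").

```python
from typing import Any, Callable, Dict, List, NamedTuple


class Entry(NamedTuple):
    """One map entry of a group (hypothetical, simplified model)."""
    key_ok: Callable[[Any], bool]    # does the key match this entry?
    value_ok: Callable[[Any], bool]  # does the value match this entry?
    cut: bool                        # True models ":" ("^ =>"), False models "=>"


def is_uint(x: Any) -> bool:
    return isinstance(x, int) and not isinstance(x, bool) and x >= 0


def match_map(entries: List[Entry], data: Dict[Any, Any]) -> bool:
    """Return True if every member of data is accepted by some entry."""
    for key, value in data.items():
        accepted = False
        for entry in entries:
            if not entry.key_ok(key):
                continue
            if entry.value_ok(value):
                accepted = True
                break
            if entry.cut:
                # Key matched a cut entry but the value did not:
                # the match is committed, so the whole map fails.
                return False
        if not accepted:
            return False
    return True


def group(cut: bool) -> List[Entry]:
    """Models '4 is text, any other uint maps to anything'."""
    return [
        Entry(lambda k: k == 4, lambda v: isinstance(v, str), cut),
        Entry(is_uint, lambda v: True, False),
    ]


with_cut = group(cut=True)
without_cut = group(cut=False)
```

  In this model, match_map(with_cut, {4: 17}) is False, while match_map(without_cut, {4: 17}) is True because the wildcard entry consumes the member that the explicit entry rejected.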

* CBOR specification, draft-ietf-cbor-7049bis-02 Presented by Carsten Bormann
  https://youtu.be/FrVinVcs-P0?t=51m52s
  - This is about taking this to standard level, learn from first 5 years but don't futz around. This is way beyond errata, but follows the definition of standard level (look it up).
  - Since -00: experience says making readers infer data model from spec is mistake. We now define "generic data model" with extension points.
  - Separation of integer and floating point types (as it has been used). That played back into key equivalence. Now there are environments that don't allow that separation easily -- and we can't fix that. Needs to be considered when writing a model, will need a bit of general guidance.
  Joe: In JSON RFC, if you use something like an integer that is >2^53, you'll have problems
    Carsten: We won't have that problem
    Joe: Never mind, not a useful idea
  - On canonicalization (c14n): this was problematic between the authors of the original spec, but there are uses for it -- let's help those people. Careful: there is key equivalence that can come from the application level. Floats are problematic too. We want to encourage generic encoder writers not to ignore it when users ask for c14n. To help them: provide recommendations (and that's all that is in the RFC). Those recommendation rules were leaky, and then there is the key order (often complained about by implementors). The key order will change to byte-wise lexical -- but we also keep the old one in (though not as a recommendation) so existing specs can still reference it. Too bad, sorry. We will want to be more specific in float normalization; we have 3 models, should we express a preference? Own preference: prefer shortest encoding in all cases (for int, length info, strings, tag number, floating, bignum etc).
    Jim: Worries about things like bignums into ints
      Worries about loss of tagging
    Carsten agrees, but has questions about whether it matters
    Jim: Yes, in the crypto world these are completely different things
    Paul: For example a counter that is supposed to be 128bits
    Carsten: You can represent that in 64 bits if you can
    Jim: TSA signature, 2 int.
    Jeffrey: 2 examples. From FIDO: AGL found software bugs based on processors getting the length wrong. A geolocation extension to WebAuthn that uses floating point did not want short encoding for floats.
    Matt Miller: More in the cryptographic context: if used as a counter, it has the semantics of a counter but the syntax of a set of bits of determinate length. Catastrophic to decrypt. Proposal: if we propose shortest encoding, we need very clear considerations saying that you have to be careful
      Carsten: Example of things that need to be constant size
    Joe: We have the option of saying "don't do canonicalization of any protocol, use bytes"
      Carsten: We can do both. In the crypto space, we have learned it is bad, but in other use spaces, it could be OK
    Thiago Macieira: What happens if you decode and re-encode in a compliant program?
      You may be making bignums in all implementations
      Does this mean that bignums become mandatory?
      Carsten: It might be an option for your decoder
        Another thing a decoder can do is *check* canonicalization.
        Does this answer the question?
      Thiago: It's an answer to the question, but I am not satisfied.
    Matt: An alternative nuclear option: point to a different document
      Sean: A separate document might be good
    Jeffrey: All the specs should specify canonical output for testing
      Also constrains encoders to what they can put out
      We don't have to state a preference, but for basic generic data model we can have one but not extended generic, and giving such a canonicalization a name is sufficient for other specs.
    Sean: To what extent are type and tag assumed to be preserved across encoding?
      Jim: +1. Are you canonicalizing the data model or the CBOR structure?
      Carsten: Yes
        It's important for parser speed that it can discard information that is immaterial to the data model. Whether that includes int/bignum is up to question.
    Joe: If something is in canonical form, and I re-generate canonical form, these are equal.
      Carsten: That's almost the definition of canonical.
    Paul: Suggests that we only list ideas, but no preferences
      We should give some information, "but you make your own rules, and you're gonna cry."
    Jeffrey: Generic canonicalization is a bad thing, protocol-specific canonicalization is important
    Matt: CBOR has a strong idea of what types are, unlike JSON. One more vote to not deal with canonicalization in this doc
    Joe: A world with 20 CBOR canonicalizations would be much worse than where we are today
      ASN.1 are an anti-pattern for writing a parser
      1) Never canonicalize, it's evil
      2) Here is a canonicalization form in this doc
      3) There is a canonicalization form in another doc
    Sean: I guess the point is, to what extent is CBOR type & tag information supposed to be preserved when serializing between different implementations?
    Carsten: We can split the technical issues from the procedural issues.
    Jeffrey: The rest of the set is close to ready, but this is not, suggests different doc
    Alexey: Group needs to decide between "saying very little" and "strongly against it"
    Paul: For this doc we can say "it got took out for a reason". There might be a doc in the future
    Carsten: Splitting out might be the best way. 
    Thiago: Also yank of equivalence of keys
      Joe: Good point, we need to do analysis first
  - implementation matrix
  Jeffrey: There should be a better way to determine consensus to accept PR submitted to github
    Pull requests should be discussed on the list sooner
    Francesca: Reminder: important/big PR should go to the list
    Joe: we can be more aggressive on getting them included when we think the consensus has been reached
    Paul: Maybe wait three weeks after the end of discussion on the mailing list to include them
  Jeffrey: Chrome has two implementations that will get added
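
  To illustrate the canonicalization discussion above: the sketch below encodes integer and text-string map keys with the shortest possible head and compares the two key orders mentioned. It assumes the "old" order is the length-first rule of RFC 7049 (shorter encoded key first, ties broken byte-wise) and that the new order is plain byte-wise lexical order of the encoded keys. The function names are hypothetical, and only integers and text strings are handled.

```python
from typing import List, Union

Key = Union[int, str]


def encode_head(major: int, n: int) -> bytes:
    """Encode a CBOR head (major type plus argument n) in its shortest form."""
    if n < 0:
        raise ValueError("argument must be non-negative")
    if n < 24:
        return bytes([(major << 5) | n])
    # Additional information 24..27 selects a 1, 2, 4 or 8 byte argument.
    for additional_info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if n < 1 << (8 * size):
            return bytes([(major << 5) | additional_info]) + n.to_bytes(size, "big")
    raise ValueError("argument does not fit in 64 bits")


def encode_key(key: Key) -> bytes:
    """Encode an integer or text-string key (the only types sketched here)."""
    if isinstance(key, bool):
        raise TypeError("booleans are not handled in this sketch")
    if isinstance(key, int):
        # Major type 0 for unsigned, major type 1 encodes -1 - n.
        return encode_head(0, key) if key >= 0 else encode_head(1, -1 - key)
    if isinstance(key, str):
        raw = key.encode("utf-8")
        return encode_head(3, len(raw)) + raw
    raise TypeError("unsupported key type")


def bytewise_order(keys: List[Key]) -> List[Key]:
    """Byte-wise lexical order of the encoded keys."""
    return sorted(keys, key=encode_key)


def length_first_order(keys: List[Key]) -> List[Key]:
    """Shorter encoding first, then byte-wise (assumed to be the old rule)."""
    return sorted(keys, key=lambda k: (len(encode_key(k)), encode_key(k)))


KEYS: List[Key] = [100, "aa", -1, "z", 10]
```

  For KEYS, bytewise_order gives [10, 100, -1, "z", "aa"], while length_first_order gives [10, -1, 100, "z", "aa"]: the two rules disagree on where the two-byte key 100 goes relative to the one-byte key -1.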

* Array Tag, draft-jroatch-cbor-tags-07, Presented by Carsten Bormann
  https://youtu.be/FrVinVcs-P0?t=1h35m56s
  2-byte space or 3-byte space
    Paul: 2-byte is fine
    Jim: 3-byte is fine and we might end up regretting 
    Sean: Weak +1 for 3-byte
  Other registration in IANA overlaps
    Alexey: We cannot stop them but maybe we can convince them
    Zach Shelby: Did something in CORE
      Also has a fast track
      Maybe have a separation of rules
  Will be adopted in the WG

* Wrap-up: Chairs
  https://youtu.be/FrVinVcs-P0?t=1h53m38s
  Joe: will be stepping down as CBOR chair