
Re: [rohc] Formal notation, combined thread



> Hi all,
> 
> After last week's discussions, I found it useful to let all new insights
> come to rest before answering again. Below, I have tried to group the 
> issues, although they are of course all closely related to each other.
> 
> BR
> /L-E
> 
> 
> 1. Specification vs. implementation
> 
> We have to separate specification from implementation. Even if
> implementation is in the back of our heads when we specify our
> protocols, the specification is not about how to implement it.

Yes!  Ideally, the way we specify a profile should give
implementers the freedom to use any implementation
strategy they like, without being biased towards any one
strategy in particular.

> What we define for header compression are data structures
> (compressed headers), as well as compression rules for what to put
> in compressed headers and how to interpret what is sent. Even if
> you would implement a ROHC profile using a "notation compiler",
> what we do is to define the notation language, and in the profile
> specification we will use it as a reference (we do not have to
> redefine these for each case).
> 
> > RP: An "oracle" function is just a function which returns an
> > implementation-specific value.  Whenever the compressed header
> > formats offer a choice for implementers, the notation calls
> > an oracle function to find out what choice the implementer
> > wants to make.
> 
> The oracle functions are thus an implementation issue about
> choosing proper compressed formats, so we do not even have to
> mention these anywhere, right? We define the means to communicate
> and how to interpret what is communicated, but not what to say.

All that needs to be specified is the point at which an
implementation decision must be made.  The exact
choice of function to use in making the decision is
of course up to the implementer.
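
As a rough illustration of what "specifying the decision point" could
look like, here is a minimal sketch.  All of the names in it
(compress_field, the Oracle type, the two sample oracles) are
hypothetical and are not taken from the notation draft; the only thing
the sketch fixes is *where* the choice is made, not *how*.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical type: an oracle sees the permitted formats and returns
# the index of the one the implementation wants to use.
Oracle = Callable[[Sequence[str]], int]


def compress_field(value: int, candidates: Sequence[str],
                   oracle: Oracle) -> Tuple[str, int]:
    """Return (chosen format, value); the choice is delegated to the oracle."""
    index = oracle(candidates)  # the decision point fixed by the specification
    if not 0 <= index < len(candidates):
        raise ValueError("oracle returned a choice outside the permitted set")
    return candidates[index], value


# Two interchangeable implementation strategies (both illustrative).
def always_first(candidates: Sequence[str]) -> int:
    return 0


def always_last(candidates: Sequence[str]) -> int:
    return len(candidates) - 1
```

Either oracle yields a valid compressed header; which one is "better"
is purely an implementation matter.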

> > RP: From the notation's perspective the oracle function can be
> > called just like any other function (e.g. the CRC).  When the
> > notation is compiled, an implementation of each of these basic
> > functions must of course be provided.
> 
> I guess you mean "when a profile specified using the notation
> language is compiled...". The oracle functions are just an
> implementation issue, as it is part of "compress" (i.e. not
> specified). We must be careful and separate implementation from
> specification.

Yes.

> > > LEJ: Note that the notation itself would not be something that
> > > would be implementable.
> >
> > RP: I don't understand the above statement.  The notation is a
> > language, so it's possible to implement a compiler which converts
> > (say) a description of TCP/IP written in the notation into
> > executable code.
> 
> Yes, you might do a compiler FOR specifications based on the notation.
> 
> > > LEJ: Of course, the various mechanisms described by the notation
> > > could be implemented, but not the notation itself. The notation
> > > is not a machine, it is a language.
> >
> > RP: Yes, which is why it's possible to implement a compiler for
> > the notation.
> 
> Yes, and such a compiler could of course be a useful tool when
> implementing profiles defined using that notation. But that is only
> about implementation.
> 
> > > LEJ: The point is that, I believe the notation should be
> > > about.........notating, nothing else.
> >
> > RP: Absolutely.  Would it be possible to point out any specific
> > statements which seem to contradict the above (particularly if
> > they're in the notation draft itself)?
> 
> Yes, for example section 4.2 in the notation draft, which talks about
> "implementing the notation".

To clarify the above, we should instead say something like
"implementing a profile specified using the notation".

> 2. Field encoding vs. discriminator encoding
> 
> > > > RP: It shouldn't matter whether the function compresses a
> > > > single field, a whole protocol header, or even the entire
> > > > packet. Also, it shouldn't matter whether the function outputs
> > > > compressed bits that are considered to lie in the
> > > > "discriminator" or in the "compressed body" part of the
> > > > header.
> > >
> > > LEJ: Here I would strongly disagree. I rather think we should
> > > have a few different terms for the different kinds of "encoding
> > > methods", since I think that would heavily simplify
> > > understanding, and make it easier to separate the various
> > > pieces.
> > 
> > RP: However, a single term is still needed for the concept of an
> > "encoding method" as a whole. 
> 
> Actually, I do not think we really need that. Encoding of fields is
> a rather different thing than "discriminator encoding" (or maybe
> "branch encoding"?), and we should make that clear in the notation.

In general it's rather tough to partition the set of encoding
methods into the above categories.  Some encoding methods
(e.g. LSB) are clearly field encodings, and others (e.g. EPIC)
are clearly discriminator encodings.  However, certain
encoding methods (e.g. list encoding) output both
compressed fields and discriminator bits, whilst others (e.g.
label) don't output any compressed bits at all.

> A good name for these would be really useful, as "self-describing
> values" as just another method does not say much about what it is
> or how different the method really is. 
> 
> > > LEJ: Methods to use for generating the actual compressed header
> > > formats, e.g. EPIC-lite, are rather different.
> >
> > RP: However, from the point of view of the notation language, is
> > there any reason to treat EPIC-lite differently from the other
> > encoding methods?
> 
> Yes, because there is a significant difference between these.
> Normal encoding methods are chosen based on how fields behave, 
> while it is a subjective profile specification decision to pick
> a "discriminator encoding method" for the profile. 

The statement that "normal encoding methods are chosen
based on how fields behave" is just as applicable to the
so-called discriminator encoding methods.  For example,
EPIC can calculate an optimally efficient set of
discriminator bits using only knowledge about how the
uncompressed header behaves.  So if compression
efficiency is the main goal then EPIC is the correct
discriminator encoding to use.

Of course, there may be a good reason *not* to use
EPIC as the discriminator encoding (e.g. IPR, complexity),
and the final decision is always up to the profile writers.
But this is exactly the same as for the "normal" encoding
methods!  For example, if we decide that LSB encoding
is too complicated for a certain profile then nothing
prevents us from using IRREGULAR encoding instead.

> > RP: The notation language itself shouldn't make a distinction
> > between EPIC-lite and any other encoding method.
> 
> As I have said above, I think it is important to make a clear
> distinction between these, as they are so different. That is also
> what we said last year, when we talked about field encoding vs.
> compressed header encoding.

As above, I can't see any reason to think of the
discriminator encodings as "special", or to treat them
differently in some way.  The distinction between the
various types of encoding method that we made last year
was only intended to clarify how the notation handles all
of the different compression techniques needed by ROHC
profiles.  From a technical perspective the notation has
never given special treatment to the discriminator
encodings, and I think that it would be confusing and
unnecessarily restrictive to do so.

> 3. Notating the content or the compressed formats
> 
> > RP: In my view, a ROHC-FN description is a complete specification
> > of "bits on the wire" (assuming of course that all of the basic
> > encoding methods have been defined).
> 
> This is not what we discussed last year when we started the 
> notation discussion and talked about field encoding versus
> encoding the complete compressed headers. The notation was said
> to capture only the first. I had not heard anything else so I 
> assumed this was still the case. That absolutely caused some
> confusion on my side.

I think that the confusion might arise because of the split
between the notation and the encoding methods
themselves.  The notation captures the choice of
encoding methods for compressing a particular header,
and for each encoding method it also captures all of the
parameters needed to specify the header behaviour.  For
example, with LSB encoding the notation must specify
two parameters: the number of LSBs and their offset.

However, on its own this information does *not* specify
any "bits on the wire".  We still need the definition of LSB
encoding itself - there is plenty of room for manoeuvre,
as we could choose e.g. to byte-swap the compressed bits,
invert them, XOR them with an arbitrary constant and so
on, all without affecting the efficiency of the encoding.
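
To make this concrete, here is a small sketch of LSB encoding with the
two parameters mentioned above (k bits, offset p), using the
interpretation interval [v_ref - p, v_ref - p + 2^k - 1] as in
RFC 3095.  The function names and the XOR variant are only
illustrations, not definitions from the draft; the variant is there to
show that a different "bits on the wire" choice costs the same k bits.

```python
def lsb_encode(value: int, k: int) -> int:
    """Send only the k least significant bits of value."""
    return value & ((1 << k) - 1)


def lsb_decode(bits: int, k: int, v_ref: int, p: int) -> int:
    """Return the unique value in [v_ref - p, v_ref - p + 2**k - 1]
    whose k least significant bits equal bits."""
    low = v_ref - p
    mask = (1 << k) - 1
    return low + ((bits - low) & mask)


# Illustrative variant: same parameters, different bits on the wire.
def lsb_encode_xor(value: int, k: int, constant: int) -> int:
    mask = (1 << k) - 1
    return lsb_encode(value, k) ^ (constant & mask)


def lsb_decode_xor(bits: int, k: int, v_ref: int, p: int,
                   constant: int) -> int:
    mask = (1 << k) - 1
    return lsb_decode(bits ^ (constant & mask), k, v_ref, p)
```

Both variants carry exactly k bits and recover the same value, so the
notated parameters alone cannot tell them apart.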

Exactly the same situation occurs with the discriminator
encodings.  The notation captures all of the needed
inputs (in this case the probabilities), but we only get
"bits on the wire" once we've defined the discriminator
encoding methods themselves.

> > RP: If the compressed header formats are created manually then the
> > ROHC-FN description captures the exact hand-crafted packet
> > formats - not just the content of the compressed headers, but the
> > hand-crafted "discriminator" bits as well.
> 
> OK, but then we would not even need the notation as a tool, if
> the actual formats (in written form) would just mean the same.

The notation provides a really useful benefit even when
describing hand-crafted compressed headers, which is the
extra implementation option of writing a "notation compiler"
to automatically convert a notated description of the
packet formats into running code.

Of course, not everyone will want to use this
implementation approach - some folks may still want to
implement a particular profile by hand.  But the key
advantage of the notation-based packet formats is that
both implementation options are equally valid - in contrast,
if we describe the packet formats by hand then we're
forced to implement them by hand, which is
unnecessarily restrictive.

> I think the notation should capture the field compression part and
> provide all details useful for arbitrary "discriminator encodings",
> as that would provide flexibility for profile specifications to use
> the same general notation independent of which discriminator method
> is then defined to be used for the profile.

We already have this flexibility, due to the way that the
discriminator encodings are currently defined.

What we notice is that a large class of discriminator
encodings (EPIC, EPIC-lite, arithmetic encoding etc.)
all require the same inputs - namely a set of probability
values whenever we offer a choice of more than one
encoding method.  To accommodate this, we define a
new encoding method called "pick" (denoted by the
symbol "|") whose purpose is to assign the probability
values to the different branches.  The "pick" encoding
method itself outputs no bits on the wire - all it does is
provide the inputs for the profile writer's choice of
discriminator encoding (EPIC, EPIC-lite etc.).

> > RP: The notation describes more than just the compressed header
> > content - it describes the exact packet formats (i.e. bits on the
> > wire). 
> 
> Now I get this, from the definition of "/". However, I do think the
> "/" method should not be totally defined, only have defined
> parameters, then the notated fields would look the same independent
> of which discriminator method is used for the profile. A profile
> might choose to use the current definition of "/", define it as
> "EPIC-Lite", any other method, or define the actual formats by hand.
> The generic "/" should have probability values as parameters, as
> that is essential information independent of how the
> discriminators/formats are defined (not for the simple per-branch
> bit mask as the current "/" uses, but for any sophisticated method).

We already have an encoding method which does
exactly this - namely the "pick" encoding method that
I mentioned above.  Note however that this is a
completely different encoding method from the
"self-describing values" encoding which offers a
choice of two branches and then outputs a 1-bit
indicator flag depending on which branch is chosen.

The two encoding methods work very differently.
"Self-describing values" encoding generates its own
bits on the wire, whereas "pick" just defines the
probability values of each branch, and leaves the
choice of bits on the wire up to a different encoding
method (e.g. EPIC).  Also, "self-describing values"
generates a local discriminator every time a branch
is chosen, whilst "pick" allows a global discriminator
to be used to indicate many different choices.

Since the notation offers both of the above encoding
methods, a profile writer has the flexibility to employ
either depending on the situation.  "Self-describing
values" are less efficient, but simpler (plus they can
help to reduce the memory requirements of a profile,
by reducing the overall number of packet formats).
It's even possible to use a combination of both
approaches within the same profile (just as is done
in RFC 3095).
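
A quick back-of-the-envelope comparison of the two approaches is
sketched below.  It assumes two independent two-way choices, each
taking its common branch 90% of the time (made-up figures), and it
assumes a Huffman code as the global discriminator encoding.  The
function name is hypothetical.

```python
import heapq
from fractions import Fraction
from itertools import product


def expected_huffman_length(probabilities):
    """Expected codeword length of a Huffman code over the given
    probabilities, computed as the sum of all merged node weights."""
    heap = list(probabilities)
    heapq.heapify(heap)
    total = Fraction(0)
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged
        heapq.heappush(heap, merged)
    return total


# Assumed branch probabilities for each of the two choices.
choice = [Fraction(9, 10), Fraction(1, 10)]

# "Self-describing values": one 1-bit flag per choice, always.
local_bits = 2

# "Pick" plus a global discriminator: one codeword per combination.
joint = [a * b for a, b in product(choice, choice)]
global_bits = expected_huffman_length(joint)
```

Under these assumptions the global discriminator averages 1.29 bits
against a fixed 2 bits for the local flags, at the price of four
packet formats rather than two independent flags.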

> > RP: The probabilities aren't part of the notation - they're just
> > the parameters which EPIC-lite needs in order to calculate the
> > bits on the wire.  If the notation is describing hand-crafted
> > packet formats, then the probabilities aren't included (because
> > they're not needed to describe the packet formats).
> 
> As explained above, I would like to make the notation a formal way
> of defining what information to put in compressed headers, but not
> include defining the actual headers (although a profile might choose
> to define "/" formally to be used as you suggest). The difference is
> basically that the "/" should just be an unspecified "branch" or 
> "discriminator" method, which a profile specification defines the
> meaning of (might be current "/" definition, "EPIC-L", or something
> else).

The "unspecified discriminator" method is already
available to the notation, but we don't make it
mandatory - the simpler "self-describing values"
encoding is also available, and it's up to the profile
writers to choose which approach to adopt (or
perhaps use a combination of both).

A minor point - whilst it's certainly possible to
define the various encoding methods differently for
each profile, I'd rather avoid doing this as it will lead
to no end of confusion!  Thus, we should use different
symbols to denote the two approaches to creating
discriminator flags.  The symbol "/" always denotes
"self-describing values" encoding, where the
discriminator flag is created locally.  The symbol
"|" always denotes the so-called "unspecified
discriminator" encoding.  The latter approach
requires a further encoding method (e.g. EPIC)
to be applied to the whole header in order to get
the final bits on the wire - rather than changing the
definition of "|" on a per-profile basis, we should
just include our choice of discriminator encoding
separately as part of the notation (e.g. by writing
EPIC, EPIC-lite or whatever at the start of the
notated packet formats).

> 4. Discriminators
> 
> > > LEJ: I followed you up to "single discriminator", but there you
> > > lost me. Can you elaborate on this?
> 
> > RP: This last point is just talking about a specific feature of
> > EPIC-lite, which is that it always generates a single
> > discriminator for the *entire* packet (as opposed to RFC 3095,
> > where there is usually a discriminator to indicate the overall
> > packet format, as well as separate discriminators on the front
> > of each individual compressed protocol header).
> 
> Do you mean like the bit mask we have in Ext3, which is an additional
> discriminator for compressed header, on top of the "packet type"?

Yes.  RFC 3095 adopts a hybrid approach to packet
format design - some choices of encoding methods
are indicated using a "global" discriminator similar
to the one generated by EPIC-lite, but other choices
are indicated locally using a version of the
"self-describing values" encoding.

Both approaches have their advantages - the
EPIC-lite approach is always more efficient, but the
self-describing values approach is simpler and less
memory-intensive.  From the notation perspective
we should therefore make sure that both approaches
are supported, and leave it up to profile writers to
select the correct approach for a given situation.

> 5. Notation "completeness"
> 
> I think we agree that we can not know that the notation will be
> "complete" for all future needs, but it might have to be extended
> in the future to capture new things we might want to do. However,
> this is actually not a problem, but might even be a good thing,
> as after accepting this we do not have to worry about the future 
> at all. You can of course do a compiler for the notation (if that
> is a preferred choice when implementing a profile), but that
> compiler might have to be modified if new methods are defined and
> used by future profile specifications. The point is that since a
> "notation compiler" is just an implementation tool (in the same
> way as the notation itself is a specification tool), having to
> modify it when implementing new profiles should not be a concern.

Absolutely - ensuring that the notation is
future-proof is much less important than, say, for
SigComp, where we defined a proper virtual machine
(as a mandatory part of a SigComp implementation).
The notation is just a useful tool for profile
writers, and if it needs to be modified or
extended in the future then this is not a problem,
since we lose nothing compared with the
hand-written approach to profile design.

> 6. Human-readable vs machine-readable
> 
> > RP: In the notation draft we have a specific section devoted to
> > "list encoding methods" which handle lists of items.  However,
> > the notation language itself doesn't treat these encoding methods
> > differently from any others.
> 
> This is actually one of my main concerns. Everything is forced to
> become "just another encoding method", and that does not make the
> whole thing clearer. Just because it is possible to do, you do not
> necessarily have to do it that way.

I think that it makes the notation much clearer if
we avoid trying to make any artificial distinctions
between the different types of encoding method.
When using the notation, if we want to compress
a field using LSB encoding then we write "LSB"
followed by the necessary parameters.  If we
want to compress a list of items then we write
"list" followed by the parameters needed by
list encoding.  How are the two cases different?
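
Seen from an implementation, the uniform treatment might look roughly
like the sketch below.  Everything here is hypothetical: the METHODS
table, the (value, width) output format, and in particular "list",
which is a toy stand-in that simply applies one item encoding to every
element and is *not* the list encoding defined in the draft.

```python
from typing import Any, Callable, Dict, List, Tuple

Bits = List[Tuple[int, int]]  # (value, width in bits) pairs


def lsb(value: int, k: int) -> Bits:
    """Toy LSB: emit the k least significant bits of value."""
    return [(value & ((1 << k) - 1), k)]


def list_of(items: List[Any], item_method: str, **item_params: Any) -> Bits:
    """Toy list method: encode every item with the named method."""
    out: Bits = []
    for item in items:
        out.extend(encode(item_method, item, **item_params))
    return out


# Every encoding method is looked up and invoked in the same way.
METHODS: Dict[str, Callable[..., Bits]] = {"LSB": lsb, "list": list_of}


def encode(method: str, value: Any, **params: Any) -> Bits:
    return METHODS[method](value, **params)
```

With this shape, encode("LSB", v, k=4) and
encode("list", items, item_method="LSB", k=4) go through the same
dispatch: a method name followed by its parameters.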

> > RP: The big advantage of making the adjustments within ROHC-FN
> > is that the description of the compressed packet formats remain
> > machine-readable
> 
> We are defining standards, which should be human-readable. There 
> are probably disagreements in this regard, but personally I think
> human-readability is more important than machine-readability.

I agree that human-readability is at least as
important as machine-readability.  However, in
my opinion a notated set of packet formats is
actually rather more human-readable than a set
of packet formats drawn out pictorially.  What I
dislike about the pictorial version of packet
formats is that the pictures themselves don't
include all of the necessary parameters for the
various different encoding methods (e.g. for LSB
encoding, the offset of the interpretation interval
is missing).  The remaining parameters must be
documented separately, which is confusing.

> 7. EPIC-Lite 
>  
> The EPIC and EPIC-Lite terms have been overloaded several times by
> now, and they actually do not say much about what they really do.
> If we want to use the "EPIC-Lite" method for "discriminator
> encoding" and potentially define it in the general notation
> document, we should come up with a name that tells us what it does. 
> 
> Any ideas?

We originally used the term "EPIC-lite" to mean
the entire notation-based solution, including
the notation itself and the complete library of
encoding methods.  The algorithm for generating
the discriminator bits is actually nothing more than
ordinary Huffman encoding, so perhaps we could
just call it that instead?
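
For anyone who wants to see the idea spelled out, here is a minimal
sketch of ordinary Huffman encoding applied to a set of packet
formats.  The format names and weights are invented for illustration,
and the code is a generic textbook construction rather than the exact
procedure from any draft.

```python
import heapq
from itertools import count
from typing import Dict


def huffman_codes(weights: Dict[str, int]) -> Dict[str, str]:
    """Build a prefix code; likelier formats get shorter discriminators.

    Assumes at least two formats (a single format would get an empty
    discriminator).
    """
    tiebreak = count()  # keeps heap entries comparable when weights tie
    heap = [(w, next(tiebreak), {name: ""}) for name, w in weights.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w0, _, group0 = heapq.heappop(heap)
        w1, _, group1 = heapq.heappop(heap)
        merged = {name: "0" + code for name, code in group0.items()}
        merged.update({name: "1" + code for name, code in group1.items()})
        heapq.heappush(heap, (w0 + w1, next(tiebreak), merged))
    return heap[0][2]


# Invented relative frequencies for four hypothetical packet formats.
codes = huffman_codes({"A": 60, "B": 25, "C": 10, "D": 5})
lengths = {name: len(code) for name, code in codes.items()}
```

Here the most common format gets a 1-bit discriminator and the two
rarest get 3 bits each.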

Regards,

Richard
_______________________________________________
Rohc mailing list
Rohc@ietf.org
https://www1.ietf.org/mailman/listinfo/rohc