From: cabo@Informatik.Uni-Bremen.DE (Dr. Carsten Bormann)
Date: Tue Mar 4, 2003 07:07:59 Europe/Berlin
To: "Lars-Erik Jonsson (EAB)" <Lars-Erik.Jonsson@epl.ericsson.se>
Subject: Re: Notation questions
From Lars-Erik.Jonsson@epl.ericsson.se Mon Mar 3 17:25:37 2003
Date: Mon, 3 Mar 2003 17:18:38 +0100
From: "Lars-Erik Jonsson (EAB)" <Lars-Erik.Jonsson@epl.ericsson.se>
To: "Carsten Bormann (E-mail)" <cabo@tzi.org>
Subject: Notation questions
Intro:
For me, notation is about having a well defined way of
describing what to send in compressed headers. That
description would be a result of the characteristics of
the original headers, and the "compression principles"
used for the profile.
I think what we came up with is a little different -- it describes the
meaning of compressed headers in terms of the uncompressed headers.
Of course, you can leave the latter part out, it just becomes hard to
have any kind of formal semantics then. (I.e., you can still decide
whether a compressed header is syntactically correct -- but for a good
compression scheme this is true for almost any bit combination. But
it becomes hard to assign formal semantics to a parse tree if you
don't have a formal notation for the uncompressed headers.)
The notated information could then be used to draw
compressed header formats by hand, or one could
potentially use some automated means to generate compressed
header formats in a more structured way. Whether such formats
would be humanly readable or not would depend on the
mechanism used, but must in either case be documented in
some way in profile specifications.
This drawing mechanism might be a third operational mode of the
compiled notation programs (the other ones are compression and
decompression, but see below).
With EPIC, they had a notation, and two different automated
means to generate formats, called EPIC and EPIC-Lite, where
both generated a very large set of formats that could not
easily be documented. The header generation algorithms do
put limitations on how the formats are done, and imply a
number of specific ways of operating. An algorithm, such as
EPIC, thus constitutes a number of technical compression
decisions, things that are far from obvious how to do.
Which format to use when compressing a packet is further a
separate issue. I believe that header formats that are
generated by automatic means might provide simpler means
to create compressed headers, but these two issues (defining
compressed formats and actually compressing headers) are
still different parts of what we are discussing.
In an implementation, yes. In a formal notation, formally defining
the structure of the compressed packets without defining their
relationship to the uncompressed packets is not very useful (I'm
repeating myself). Once you describe this relationship formally, and
find a way to compile the formal notation, you can build a compressor
and a decompressor this way (modulo oracles, see below).
1) In Richards last e-mails, he seems to say that the notation
itself gives the compressed formats (bits on the wire), and
I do not buy that. How could that be possible?
I believe Richard's/Mark's idea is that the notation defines all the
bits in the compressed packet (its "body"), apart from an initial
discriminator. This discriminator is what differs between EPIC and
EPIC-lite. The discriminator carries all the information that is
needed to parse the body. The body carries information that is too
big to carry in the discriminator; essentially orthogonally encodable
information. The discriminator can use interesting tricks like EPIC
(Huffman on complete discriminators) or, if we want, arithmetic coding
(I don't know when those patents run out, though). The notation
supplies everything that is needed to go into the discriminator
computation (values, probabilities).
2) Richard is confusing compressing headers with defining the
candidate compressed header formats. Do you understand this?
Well, the compressed header formats are in some way related to the
uncompressed packet. The notation captures this relationship. The
notation can use an oracle function to be able to derive more than one
packet format based on the same uncompressed packet.
3) By using a functional language, exactly what do you mean
you would get by compiling the notated information?
That was one question that Richard raised. Right now, in his proposal
you get a program that can either compress or decompress, depending on
a switch thrown initially. I don't know yet how his oracles work; I
still have to read it up.
In my proposal, the Prolog program defines a relation (in the
mathematical sense) between uncompressed and compressed packets.
Since the relation can have multiple compressed packets for one
uncompressed packet, the oracle functionality is already built into
the model. (Obviously, it also can have multiple uncompressed packets
for one compressed packet, so you can write incorrect instances of the
notation. One useful property of a notation would be to allow
automatically catching this error -- this may be easier in Richard's
variant than in mine.)
In both cases, state (context) enters the picture. Even with that, it
is not always clear which packet format to use. The fun part is
leaving the hard decisions (and, preferably, the implementation
dependent decisions) to oracles.
4) Do you really believe that all the kind of stuff we had
in 3095 could be captured this way? Note that Roke Manor
themselves had problems to make the EPIC stuff operating in
anything but unidirectional mode. There is much to consider
when using feedback, and operating towards contexts.
Most definitely, the basic approach can. Whether you would *want* to,
I don't know. In other words, the notation we define (i.e., the
library of encoding rules) may favor packet formats that are different
from the ones we finally decided to use in 3095. This is of course
particularly true if we stick with a single discriminator for the
whole packet (I believe that is a mistake and you need some form of
cross-product generation of packet formats, which calls for separate
discriminators -- something my proposal does not even start to
address yet).