[XCON] On encodings
Since we're on the perennially fun topic of protocol encodings, maybe
we can get past the stereotypes and try to do an engineering
comparison, or at least an attempt at an approximation. Roughly
speaking, an XML element is equivalent in functionality to a TLV
object: it contains a label (element tag and the 'type' field),
length or delimiter fields to figure out the boundaries, and the
actual data (value). ASN.1 BER is close enough to TLV for this apples-
to-pears comparison.
For TLV, you have a type value of 16 to 32 bits, depending on how much
you expect to need to extend the protocol. For XML, you have
<tag></tag>, thus 5 bytes of markup plus twice the length of the tag
name, which can be anything from 'X' to
'WeLikeToPutTheWholeRFCIntoTheTagSoThatTheImplementerDoesntHaveToReadTheRFC'.
For length, this is already counted above for XML; for TLV, it would
presumably be again 16 or 32 bits.
Thus, the basic overhead is fairly similar if you choose small
element names. Since numeric labels aren't self-describing at all,
this only seems like a fair comparison.
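To make the framing overhead concrete, here is a rough Python sketch. The TLV layout (16-bit type, 16-bit length, network byte order) and the function names are my own assumptions for illustration, not any particular protocol; the XML side is a bare element with no attributes or namespaces.

```python
import struct


def tlv_encode(type_id: int, value: bytes) -> bytes:
    """Hypothetical TLV layout: 16-bit type, 16-bit length, then the value."""
    return struct.pack("!HH", type_id, len(value)) + value


def xml_encode(tag: str, value: str) -> bytes:
    """Bare element without attributes or namespaces: <tag>value</tag>."""
    return f"<{tag}>{value}</{tag}>".encode("utf-8")


def tlv_overhead(value: bytes) -> int:
    """Bytes spent on framing (type + length) rather than on the value."""
    return len(tlv_encode(1, value)) - len(value)


def xml_overhead(tag: str, value: str) -> int:
    """Bytes spent on the start and end tags rather than on the value."""
    return len(xml_encode(tag, value)) - len(value.encode("utf-8"))


if __name__ == "__main__":
    for tag in ("X", "id", "participant"):
        print(tag, xml_overhead(tag, "hello"), "vs TLV", tlv_overhead(b"hello"))
```

With 32-bit type and length fields the TLV framing would be 8 bytes instead of 4, so short tags land in the same neighborhood and long tags do not.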
The big difference is in the datatypes. Clearly, if you have long
binary opaque data, the base64 encoding typical for XML costs you about
33% (four output characters for every three input bytes).
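The base64 cost is easy to check for yourself. A small sketch using the Python standard library; the helper name is made up, and the payload is just zero bytes, since the expansion does not depend on the content.

```python
import base64


def base64_overhead(n: int) -> float:
    """Fractional size increase when n opaque bytes are base64-encoded."""
    raw = bytes(n)  # n zero bytes; base64 expansion is content-independent
    encoded = base64.b64encode(raw)
    return len(encoded) / len(raw) - 1.0


if __name__ == "__main__":
    for n in (3, 100, 3000):
        print(n, f"{base64_overhead(n):.1%}")
```

For long inputs this settles at about a third; very short inputs can do worse because of padding.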
Integers and floats are trickier: Unless you believe in ASN.1-style
variable-length encoding, it is quite possible that in many practical
cases, XML integers will be shorter since TLV always has to allocate
the maximum range, presumably 32 bits for integers and 32 (single
precision) or 64 bits for floats. You can build custom fixed-point
types in binary encodings, but they are a pain to get right. (All
kinds of funny things happen because of signed vs. unsigned issues,
for example. Java programmers will hate you if you try this...)
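A quick sketch of the integer argument, assuming a fixed 32-bit field on the binary side and plain decimal text on the XML side (function names are mine, for illustration only). The last helper shows one of the signed vs. unsigned surprises: the same four bytes read back as a different number.

```python
import struct


def text_size(n: int) -> int:
    """Bytes needed for the decimal text form, as it would appear in XML."""
    return len(str(n))


def fixed_size(n: int) -> int:
    """Bytes needed for a fixed 32-bit signed field, regardless of value."""
    return len(struct.pack("!i", n))


def reinterpret_as_signed(n: int) -> int:
    """Write n as unsigned 32-bit, then read the same bytes back as signed."""
    return struct.unpack("!i", struct.pack("!I", n))[0]


if __name__ == "__main__":
    for n in (7, 42, 9999, 100000, -2147483648):
        print(n, "text:", text_size(n), "fixed:", fixed_size(n))
    print(reinterpret_as_signed(4294967295))
```

Anything that fits in four characters is no larger as text than in the fixed field; only the big values lose.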
Strings are obviously the same size in either TLV or XML, except for
the use of delimiters (such as ") or length fields, but that's a wash
at roughly 2 bytes each.
The one big difference for XML is the namespace declaration. The
impact of that depends on the number of namespaces and the size of
the overall document. This is also hard to compare since they give
you proprietary extensions without the kludges like vendor IDs that
binary protocols have to go through (unless they choose similar Java-
style or domain-style tags). Indeed, for namespace-style extensions
that aren't IANA-registered, TLV is usually a pain. You end up either
with long per-TLV labels or re-inventing the XML-style indirection
approach.
This is a rough comparison, but it indicates that for the same
functionality, the cost isn't all that different. I'm guessing that
for protocols with a common application mix of text and mostly-integer
numeric values, we're talking about a size difference of 10 to 30%.
You do pay a price for very long tags, but only if you can't use gzip-
style compression, which essentially does the text-to-code
translation automatically and without penalty.
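To see how much of the long-tag penalty compression takes back, here is a sketch using zlib (the same DEFLATE algorithm gzip uses). The document shape and the tag names are invented for the example, and the exact compressed sizes will depend on the zlib version, so treat the numbers it prints as indicative only.

```python
import zlib


def document(tag: str, count: int) -> bytes:
    """Build a toy document of `count` numbered elements named `tag`."""
    items = "".join(f"<{tag}>{i}</{tag}>" for i in range(count))
    return f"<list>{items}</list>".encode("utf-8")


def compressed_size(data: bytes) -> int:
    """Size after DEFLATE at maximum compression level."""
    return len(zlib.compress(data, 9))


if __name__ == "__main__":
    for tag in ("x", "conferenceParticipantIdentifier"):
        doc = document(tag, 500)
        print(tag, "raw:", len(doc), "compressed:", compressed_size(doc))
```

The raw sizes differ by roughly the ratio of the tag lengths; after compression the gap should shrink considerably, since each repeated tag collapses to a short back-reference.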
I'm not comparing other textual, non-XML approaches here. Frankly, I
don't think they have a chance in the marketplace, and the size
difference for the ones I have seen isn't all that large.
A separate issue is the functionality. This is left as an exercise
for another post.
Henning