[XCON] On encodings

Since we're on the perennially fun topic of protocol encodings, maybe we can get past the stereotypes and attempt an engineering comparison, or at least an approximation of one. Roughly speaking, an XML element is equivalent in functionality to a TLV object: it contains a label (the element tag, corresponding to the 'type' field), length or delimiter fields that mark its boundaries, and the actual data (the value). ASN.1 BER is close enough to TLV for this apples-to-pears comparison.

For TLV, you have a type field of 16 to 32 bits, depending on how much you expect the protocol to need extending. For XML, you have <tag></tag>, thus 5 bytes of markup plus twice the length of the tag name, which can be anything from 'X' to 'WeLikeToPutTheWholeRFCIntoTheTagSoThatTheImplementerDoesntHaveToReadTheRFC'.

For length, this is already counted above for XML; for TLV, it would presumably again be 16 or 32 bits.

Thus, the basic overhead is fairly similar if you choose short element names. Since numeric labels aren't self-describing at all, this seems like the only fair comparison.
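To make the framing arithmetic concrete, here is a minimal sketch that encodes the same 5-byte payload as a TLV object (with the 16- or 32-bit type and length fields mentioned above) and as a bare XML element. The function names and the sample tag "subject" are hypothetical, chosen only for illustration; XML escaping and attributes are ignored.

```python
import struct

def tlv_encode(type_code: int, value: bytes, field_bytes: int = 2) -> bytes:
    """Encode one TLV object: fixed-width type, fixed-width length, then the value.

    field_bytes=2 gives 16-bit type/length fields, field_bytes=4 gives 32-bit ones.
    """
    formats = {2: ">HH", 4: ">II"}
    if field_bytes not in formats:
        raise ValueError("field_bytes must be 2 or 4")
    return struct.pack(formats[field_bytes], type_code, len(value)) + value

def xml_encode(tag: str, value: str) -> bytes:
    """Encode one XML element as <tag>value</tag> (no attributes, no escaping)."""
    return f"<{tag}>{value}</{tag}>".encode("utf-8")

def overhead(encoded: bytes, payload: bytes) -> int:
    """Bytes spent on framing (label plus length or delimiters) rather than data."""
    return len(encoded) - len(payload)

payload = b"hello"
print(overhead(tlv_encode(1, payload), payload))          # 4: 16-bit type + 16-bit length
print(overhead(tlv_encode(1, payload, 4), payload))       # 8: 32-bit type + 32-bit length
print(overhead(xml_encode("X", "hello"), payload))        # 7: 5 markup bytes + 2 * 1
print(overhead(xml_encode("subject", "hello"), payload))  # 19: 5 markup bytes + 2 * 7
```

With one-letter tags, XML framing (7 bytes) sits between the 16-bit and 32-bit TLV variants; it is only the long, self-describing names that push it well past either.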

The big difference is in the datatypes. Clearly, if you have long opaque binary data, the base64 encoding typical for XML costs you about 33%, since every 3 input bytes become 4 output characters.
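A quick way to check the base64 figure is to measure it with Python's standard `base64` module. This sketch uses `b64encode`, which adds no line breaks; MIME-style base64 with a newline every 76 characters would cost slightly more. The helper name is hypothetical.

```python
import base64

def base64_overhead(n_bytes: int) -> float:
    """Fractional size increase when n_bytes of opaque binary data are base64-encoded."""
    encoded = base64.b64encode(bytes(n_bytes))  # the byte values don't affect the length
    return len(encoded) / n_bytes - 1

print(f"{base64_overhead(3000):.1%}")  # 33.3%
```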

Integers and floats are trickier. Unless you believe in ASN.1-style variable-length encoding, it is quite possible that in many practical cases XML integers will be shorter, since TLV always has to allocate the maximum range, presumably 32 bits for integers and 32 (single precision) or 64 bits (double precision) for floats. You can build custom fixed-point types in binary encodings, but they are a pain to get right. (All kinds of funny things happen because of signed vs. unsigned issues, for example. Java programmers will hate you if you try this...)
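The sketch below compares only the value field for a few sample numbers: a fixed signed 32-bit TLV value, the decimal text an XML element would carry, and the content octets of an ASN.1 INTEGER in minimal two's-complement form (the DER rule, which BER also permits). Tag and length bytes are deliberately left out, and the sample values and function names are illustrative assumptions.

```python
import struct

def fixed32_size(n: int) -> int:
    """A TLV value that always reserves a signed 32-bit integer."""
    return len(struct.pack(">i", n))

def xml_text_size(n: int) -> int:
    """Decimal text as it would appear inside an XML element."""
    return len(str(n).encode("ascii"))

def ber_integer_size(n: int) -> int:
    """Content octets of an ASN.1 INTEGER in minimal two's-complement form."""
    bits = (n if n >= 0 else ~n).bit_length() + 1  # +1 for the sign bit
    length = max(1, (bits + 7) // 8)
    return len(n.to_bytes(length, "big", signed=True))

def float_sizes(x: float) -> tuple[int, int, int]:
    """Single precision, double precision, and shortest round-trip decimal text."""
    return len(struct.pack(">f", x)), len(struct.pack(">d", x)), len(repr(x))

for n in (7, 300, 65535, -1, 2_000_000_000):
    print(n, fixed32_size(n), xml_text_size(n), ber_integer_size(n))
print(float_sizes(0.5))  # (4, 8, 3)
```

Small values favor the text and variable-length forms, while values near the top of the 32-bit range favor fixed-width binary, which is why the answer depends on the protocol's actual value distribution.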

Strings are obviously the same size in either TLV or XML, apart from delimiters (such as ") versus length fields, but that's a wash at roughly 2 bytes each.
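A small check of the "wash" claim, assuming a 16-bit length prefix on the TLV side and a quoted attribute value on the XML side. Escaping of characters such as & or " is ignored here; the helper names and the sample string are hypothetical.

```python
import struct

def tlv_string_framing(s: str) -> int:
    """Bytes added by a 16-bit length prefix in front of the UTF-8 string."""
    data = s.encode("utf-8")
    return len(struct.pack(">H", len(data)) + data) - len(data)

def xml_attr_framing(s: str) -> int:
    """Bytes added by surrounding the string with quotation marks (no escaping)."""
    data = s.encode("utf-8")
    return len(b'"' + data + b'"') - len(data)

print(tlv_string_framing("conference-42"), xml_attr_framing("conference-42"))  # 2 2
```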

The one big difference for XML is the namespace declaration. Its impact depends on the number of namespaces and the size of the overall document. This is also hard to compare, since namespaces give you proprietary extensions without kludges like the vendor IDs that binary protocols have to go through (unless they choose similar Java-style or domain-style tags). Indeed, for namespace-style extensions that aren't IANA-registered, TLV is usually a pain: you end up either with long labels in every TLV or re-inventing the XML-style indirection approach.
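To see how the namespace cost amortizes, this sketch compares three ways of marking a block of vendor extension elements: one default-namespace declaration on the extension's root (descendants inherit it), one prefixed declaration plus a prefix on every start and end tag, and a per-TLV vendor ID. The URI "urn:example:vendor" is a placeholder, and the 4-byte vendor ID is an assumption modeled on the Vendor-Id field of the RADIUS Vendor-Specific attribute.

```python
def default_ns_cost(uri: str = "urn:example:vendor") -> int:
    """One xmlns declaration on the extension root; children inherit it for free."""
    return len(f' xmlns="{uri}"')

def prefixed_ns_cost(n_elements: int, prefix: str = "v",
                     uri: str = "urn:example:vendor") -> int:
    """One prefixed declaration plus 'v:' on every start tag and end tag."""
    return len(f' xmlns:{prefix}="{uri}"') + n_elements * 2 * len(f"{prefix}:")

def vendor_id_cost(n_elements: int, vendor_id_bytes: int = 4) -> int:
    """Every vendor-specific TLV repeats its own vendor ID."""
    return n_elements * vendor_id_bytes

for n in (1, 10, 100):
    print(n, default_ns_cost(), prefixed_ns_cost(n), vendor_id_cost(n))
```

Under these assumed sizes the default-namespace declaration (27 bytes) costs less than the per-TLV vendor IDs once an extension carries seven or more elements, while the prefixed form never catches up because it pays per element as well.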

This is a rough comparison, but it indicates that for the same functionality, the cost isn't all that different. I'm guessing that for protocols with a common application mix of text and mostly-integer numeric values, we're talking about a 10 to 30% difference.

You do pay a price for very long tags, but only if you can't use gzip-style compression, which essentially does the text-to-code translation automatically and without penalty.
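The effect of compression on long tags can be tried with Python's standard `gzip` module. This sketch builds a flat document that repeats one element many times, once with the tag 'X' and once with the long joke tag from above, and prints raw and compressed sizes. The document shape and the repetition count are arbitrary assumptions, and exact compressed sizes vary with the zlib version, so none are quoted here.

```python
import gzip

LONG_TAG = "WeLikeToPutTheWholeRFCIntoTheTagSoThatTheImplementerDoesntHaveToReadTheRFC"

def document(tag: str, n: int = 200) -> bytes:
    """A flat XML document containing n elements that all share the same tag."""
    body = "".join(f"<{tag}>{i}</{tag}>" for i in range(n))
    return f"<doc>{body}</doc>".encode("ascii")

for tag in ("X", LONG_TAG):
    raw = document(tag)
    packed = gzip.compress(raw)
    print(f"{len(tag):3d}-char tag: {len(raw):6d} bytes raw, {len(packed):5d} bytes gzipped")
```

Because DEFLATE replaces each repeated tag with a short back-reference, most of the difference between the two raw sizes disappears after compression.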

I'm not comparing other textual, non-XML approaches here. Frankly, I don't think they have a chance in the marketplace, and the size difference for the ones I have seen isn't all that large.

A separate issue is the functionality. This is left as an exercise for another post.

Henning

_______________________________________________
XCON mailing list
XCON at ietf.org
https://www1.ietf.org/mailman/listinfo/xcon
