Re: [Sipping] SIP-CLF: Results on ASCII vs. binary representation
Folks,
Please direct any comments to the DISPATCH WG mailing list.
Thanks,
Mary.
-----Original Message-----
From: sipping-bounces at ietf.org [mailto:sipping-bounces at ietf.org] On
Behalf Of Vijay K. Gurbani
Sent: Wednesday, April 29, 2009 9:32 AM
To: sip-ops at ietf.org; dispatch at ietf.org
Subject: [Sipping] SIP-CLF: Results on ASCII vs. binary representation
[Bcc'd to sipping]
Hello:
During the SF IETF, the SIP CLF work [2] garnered support and attention;
the minutes of the ad-hoc are archived in [1].
While there was near universal support for having a common log format,
there was a lot of discussion about whether the format should be text or
binary, the argument for binary being that it should be much faster to
search. An option for text generation is in [2] and an option for
binary generation is in [3].
We realized the question is not "binary vs. text?" but "should we
optimize for log generation or for log processing?" To that end, this
email shares the performance results we have obtained for generating
both binary and ASCII formats, including a worst-case search simulation:
retrieving the last record from large binary and ASCII files.
To get these results, we generated 1 Million SIP CLF entries into an
ASCII file and the same 1 Million into a binary file.
The ASCII file followed the convention of [2] and the binary file that
of [3]. The last entry in these files was a SIP request with a special
Call-ID. We measured the time it took to search for the special Call-ID
in both the binary and ASCII files. Here are the results, followed by
some discussion; the source code to the programs that generated these
results is also available (see [4].)
Total records in binary and ASCII CLF file: 1 Million
File size:
Binary: 300,999,984 bytes
ASCII: 258,999,984 bytes
Time taken to generate the CLF file with 1 Million records:
Binary CLF: 138.60s
ASCII CLF: 7.26s
This is a difference of almost 20x in favor of the ASCII CLF.
Time taken to seek to the last record of the CLF file:
Binary CLF: 3.08s
ASCII CLF: 16.55s (using perl v5.6.1)
ASCII CLF: 42.92s (using perl v5.8 and v5.10)
The ASCII CLF seek is more than five times slower using perl v5.6.1,
and more than 13x slower using perl v5.8 and v5.10. It looks like later
versions of perl may have inadvertently made the regex compiler
less optimized; we don't know why.
The above data is from experiments run on an Intel dual-core
(T2500 @ 2.00 GHz) IBM T60 laptop running Linux 2.6.27 with 1
GByte of memory.
We also ran the programs on a more powerful machine: an Intel
dual-core (X6800 @ 2.93GHz) machine with 8GB RAM and a
Linux 2.6.24 kernel. The results scaled accordingly.
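For readers who want to check the ratios quoted above, here is a small sketch that recomputes them from the reported figures. The variable names are ours; the numbers are copied from the results in this email.

```python
# Figures copied from the results reported in this email.
gen_binary_s, gen_ascii_s = 138.60, 7.26          # time to generate 1 Million records
seek_binary_s = 3.08                              # time to reach the last record
seek_ascii_perl56_s, seek_ascii_perl58_s = 16.55, 42.92
size_binary, size_ascii = 300_999_984, 258_999_984
records = 1_000_000


def ratio(slow, fast):
    """How many times larger `slow` is than `fast`."""
    return slow / fast


print(f"generation, binary vs ASCII: {ratio(gen_binary_s, gen_ascii_s):.1f}x")
print(f"seek, ASCII (perl v5.6.1) vs binary: {ratio(seek_ascii_perl56_s, seek_binary_s):.1f}x")
print(f"seek, ASCII (perl v5.8/v5.10) vs binary: {ratio(seek_ascii_perl58_s, seek_binary_s):.1f}x")
print(f"bytes per record, binary: {size_binary / records:.1f}")
print(f"bytes per record, ASCII: {size_ascii / records:.1f}")
```

This works out to roughly 19.1x for generation, 5.4x and 13.9x for the two ASCII seeks, and about 301 versus 259 bytes per record.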
Clearly, the biggest difference in the above data is the time taken to
produce the CLF file. ASCII is a lightweight approach since the SIP
entity producing the ASCII CLF file already has the SIP message in text
form. It is then just a matter of writing the fields out on disk. With
the binary form, the SIP entity producing the binary CLF file has to
calculate offsets, which takes a non-negligible amount of time. Since
the entity producing the SIP CLF log file should not be overburdened
with the act of producing it, we feel that ASCII CLF generation is the
only choice here (i.e., we should optimize for log generation.)
Otherwise, the SIP entity producing the binary CLF file will spend an
inordinate time in calculating offsets, creating a table of contents,
etc. to the detriment of providing the service it is supposed to.
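To illustrate the extra bookkeeping, here is a minimal sketch contrasting the two kinds of writer. Both layouts are stand-ins we made up for illustration (tab-separated text, and a header of 16-bit field offsets); they are not the formats of [2] or [3], and the sketch says nothing about where the measured time actually goes.

```python
import struct


def encode_ascii(fields):
    """Text record: the fields are already strings, so just join and write.
    (Tab-separated stand-in layout, not the format of [2].)"""
    return ("\t".join(fields) + "\n").encode("ascii")


def encode_binary(fields):
    """Binary record: field count, one 16-bit offset per field, then the data.
    (Stand-in layout, not the format of [3].)"""
    encoded = [f.encode("ascii") for f in fields]
    header_len = 2 + 2 * len(encoded)      # count + offset table
    offsets = []
    pos = header_len
    for chunk in encoded:                  # the per-field offset calculation
        offsets.append(pos)
        pos += len(chunk)
    header = struct.pack(f"!H{len(offsets)}H", len(offsets), *offsets)
    return header + b"".join(encoded)


def field_at(record, index):
    """Jump straight to one field of a binary record using the offset table."""
    (count,) = struct.unpack_from("!H", record, 0)
    offsets = struct.unpack_from(f"!{count}H", record, 2)
    start = offsets[index]
    end = offsets[index + 1] if index + 1 < count else len(record)
    return record[start:end].decode("ascii")
```

The offset table is what makes direct field access possible on the reading side, and it is also the work the writer has to do for every record.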
That said, it is also clear that the worst-case search for a record
is five to 13x slower when using ASCII. But, because searching is
done offline, we feel that this sub-optimality can well be tolerated.
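The worst case above is a linear scan that finds its match only on the last line. A minimal sketch of such a scan follows; it is not the read-clf-record.pl program from [4], and it uses a plain substring test rather than a regex.

```python
def find_call_id(lines, call_id):
    """Return (line_number, line) for the first line containing call_id,
    or None if no line matches. Every line before the match is read."""
    for number, line in enumerate(lines, start=1):
        if call_id in line:
            return number, line
    return None
```

With a log file this could be called as find_call_id(open("sip.clf"), "special-call-id"), where both the file name and the Call-ID value are hypothetical.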
We also feel that there is value in specifying a binary format: it
lets SIP operators who want to do such searches run an offline process
that converts their captured ASCII files to binary, for optimized
traversal and other such uses.
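As a rough sketch of what such an offline conversion could look like: the code below rewrites a text log as length-prefixed records and builds a Call-ID index of byte offsets, so a later lookup is a single seek. The tab-separated input, the position of the Call-ID field, and the output layout are all assumptions for illustration, not the formats of [2] or [3].

```python
import io
import struct


def convert(ascii_in, binary_out, call_id_field=2):
    """Copy each text record to binary_out with a 4-byte length prefix and
    return {call_id: byte offset of that record in binary_out}.
    call_id_field is a hypothetical column position."""
    index = {}
    for line in ascii_in:
        record = line.rstrip(b"\n")
        call_id = record.split(b"\t")[call_id_field]
        index[call_id] = binary_out.tell()
        binary_out.write(struct.pack("!I", len(record)) + record)
    return index


def lookup(binary_in, index, call_id):
    """Seek directly to the record for call_id instead of scanning."""
    binary_in.seek(index[call_id])
    (length,) = struct.unpack("!I", binary_in.read(4))
    return binary_in.read(length)


# Example with in-memory files standing in for the log files.
src = io.BytesIO(b"INVITE\tsip:a@example.com\tcid-1\n"
                 b"BYE\tsip:b@example.com\tcid-2\n")
dst = io.BytesIO()
idx = convert(src, dst)
print(lookup(dst, idx, b"cid-2"))
```

The cost of building the index is paid once, offline, which keeps it away from the SIP entity that produces the log.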
Comments and discussions on these results are welcome. If you find any
errors in the programs used to generate these results, please do let us
know.
[1] Thread "[Sipping] Meeting Minutes: Ad-hoc Common Log Format
meeting," IETF SIPPING WG, March 27, 2009. Archived at:
http://www.ietf.org/mail-archive/web/sipping/current/msg17199.html
[2] V. Gurbani, E. Burger, T. Anjali, H. Abdelnur and O. Festor,
"The Common Log File (CLF) format for the Session Initiation
Protocol (SIP)," IETF Internet-Draft, work in progress, March 9,
2009. Archived at:
http://tools.ietf.org/html/draft-gurbani-sipping-clf-01
[3] A. Roach, "Binary Syntax for SIP Common Log Format," IETF
Internet-Draft, work in progress, March 25, 2009. Archived at:
http://tools.ietf.org/html/draft-roach-sipping-clf-syntax-00
[4] Source code available at the following URLs; please see the
comment block in write-clf.c on how to generate ASCII and
binary CLF files.
http://ect.bell-labs.com/who/vkg/IETF/sip-clf/write-clf.c
http://ect.bell-labs.com/who/vkg/IETF/sip-clf/clf.h
http://ect.bell-labs.com/who/vkg/IETF/sip-clf/read-clf-record.c
http://ect.bell-labs.com/who/vkg/IETF/sip-clf/read-clf-record.pl
http://ect.bell-labs.com/who/vkg/IETF/sip-clf/Makefile
Thanks,
- vijay
--
Vijay K. Gurbani, Bell Laboratories, Alcatel-Lucent 1960 Lucent Lane,
Rm. 9C-533, Naperville, Illinois 60566 (USA)
Email: vkg at {alcatel-lucent.com,bell-labs.com,acm.org}
Web: http://ect.bell-labs.com/who/vkg/