mirror of git://git.psyc.eu/libpsyc synced 2024-08-15 03:19:02 +00:00
libpsyc/bench/benchmark.org
psyc://psyced.org/~lynX 73f19bd300 Merge commit 'origin'
2011-05-24 20:14:01 +02:00


#+TITLE: libpsyc Performance Benchmarks
In this document we present the results of performance benchmarks
of libpsyc compared with json-c, json-glib, libxml2 and rapidxml.
* PSYC, JSON, XML Syntax Benchmarks
First we look at the mere performance of the PSYC syntax
compared to equivalent XML and JSON encodings. We'll
look at actual XMPP messaging later.
** User Profile
In this test we'll compare the efficiency of the three
syntaxes at serializing the kind of user profile data
typically kept in a database. Let's start with XML:
#+INCLUDE: packets/user_profile.xml src xml
In JSON this could look like this:
#+INCLUDE: packets/user_profile.json src js
Here's a way to model this in PSYC:
#+INCLUDE: packets/user_profile.psyc src psyc
** A message with JSON-unfriendly characters
This message contains some characters which are
impractical to encode in JSON. We should probably
put a lot more inside to actually see an impact
on performance.
#+INCLUDE: packets/json-unfriendly.xml src xml
#+INCLUDE: packets/json-unfriendly.json src js
#+INCLUDE: packets/json-unfriendly.psyc src psyc
** A message with XML-unfriendly characters
The same test with characters that are impractical
in the XML syntax. Here, too, we should put more of
them inside to see an impact.
#+INCLUDE: packets/xml-unfriendly.xml src xml
#+INCLUDE: packets/xml-unfriendly.json src js
#+INCLUDE: packets/xml-unfriendly.psyc src psyc
** A message with PSYC-unfriendly strings
PSYC prefixes data with length as soon as it
exceeds certain sizes or contains certain strings.
In the case of short messages this is less
efficient than scanning the values without lengths.
Also, lengths are harder to edit by hand.
#+INCLUDE: packets/psyc-unfriendly.xml src xml
#+INCLUDE: packets/psyc-unfriendly.json src js
#+INCLUDE: packets/psyc-unfriendly.psyc src psyc
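The trade-off between scanning and length prefixes can be illustrated
with a small Python sketch. This is not libpsyc's parser and it does not
use the actual PSYC wire format: the "<length> <value>" layout and both
function names are assumptions made up for illustration. It only shows
why a length prefix lets a value contain the delimiter itself, at the
cost of an extra field to parse and to keep correct when editing by hand.

#+BEGIN_SRC python
def read_scanned(buf: bytes, start: int = 0) -> tuple[bytes, int]:
    """Read a value by scanning for the terminating newline."""
    end = buf.index(b"\n", start)
    return buf[start:end], end + 1


def read_length_prefixed(buf: bytes, start: int = 0) -> tuple[bytes, int]:
    """Read a value framed as b'<decimal length> <value>\\n'.

    Hypothetical layout, not the PSYC syntax: the length tells us where
    the value ends, so the value may contain newlines.
    """
    sep = buf.index(b" ", start)
    length = int(buf[start:sep])
    value_start = sep + 1
    value_end = value_start + length
    # Skip the newline that follows the value.
    return buf[value_start:value_end], value_end + 1


# A short value without newlines can simply be scanned.
short_value, _ = read_scanned(b"hello\nnext")

# A value containing newlines needs the length prefix to be read whole.
binary_value, _ = read_length_prefixed(b"5 a\nb\nc\nnext")
#+END_SRC

Scanning the second buffer with read_scanned would stop at the first
embedded newline and return only the first byte of the value.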
** Packets containing binary data
We'll use a generator of random binary data to
see how well the formats behave with different
sizes of data. We'll consider 7000 bytes as a possible
size of an icon, 70000 for an avatar, 700000
for a photograph, 7000000 for a piece of music,
70000000 for a large project and
700000000 for the contents of a CD.
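As a rough illustration of what these sizes mean for the text-only
formats, the following Python sketch generates random payloads and
measures their size after base64 encoding, which is what the JSON and
XML inputs in the results use for binary data. The use of os.urandom
and the helper name encoded_size are assumptions of this sketch; the
generator actually used for the benchmark is not described here. Only
the three smallest sizes are included to keep the sketch quick.

#+BEGIN_SRC python
import base64
import os

# Payload sizes in bytes, taken from the text above (smallest three only).
SIZES = [7_000, 70_000, 700_000]


def encoded_size(n: int) -> int:
    """Return the length of n random bytes after base64 encoding."""
    raw = os.urandom(n)
    return len(base64.b64encode(raw))


for n in SIZES:
    size = encoded_size(n)
    print(f"{n} bytes raw, {size} bytes base64 ({size / n:.2f}x)")
#+END_SRC

Base64 turns every 3 input bytes into 4 output characters, so the
encoded payload is about a third larger than the raw one before any
parsing takes place.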
* PSYC vs XMPP Protocol Benchmarks
These tests use typical XMPP messages ("stanzas" in
Jabber lingo) and compare them with equivalent JSON encodings
and with verbose and compact PSYC formats.
** A presence packet
Since presence packets are by far the dominant messaging content
in the XMPP network, we'll start with one of them.
Here's an example from section 4.4.2 of RFC 6121.
#+INCLUDE: packets/presence.xml src xml
And here's the same information in a JSON rendition:
#+INCLUDE: packets/presence.json src js
Here's the equivalent PSYC packet in verbose form
(since it is a multicast, the single recipients do not
need to be mentioned):
#+INCLUDE: packets/presence.psyc src psyc
And the same in compact form:
#+BEGIN_SRC psyc
:c psyc://example.com/~juliet
=da 4
np
|
#+END_SRC
** An average chat message
#+INCLUDE: packets/chat_msg.xml src xml
#+INCLUDE: packets/chat_msg.json src js
#+INCLUDE: packets/chat_msg.psyc src psyc
Why doesn't the PSYC packet have an id? Because packet counting from
contexts and circuits is automatic: the packet already has a number just
by being there.
Also, PSYC by default doesn't mention a "resource" in XMPP terms;
instead it allows for more addressing schemes than just PSYC.
** A new status update activity
Example taken from http://onesocialweb.org/spec/1.0/osw-activities.html
You could call this XML namespace hell:
#+INCLUDE: packets/activity.xml src xml
http://activitystrea.ms/head/json-activity.html proposes a JSON encoding
of this. We'll have to add a routing header to it.
#+INCLUDE: packets/activity.json src js
http://about.psyc.eu/Activity suggests a PSYC mapping for activity
streams. Should a "status post" be considered equivalent to a presence
description announcement or just a message in the "microblogging" channel?
We'll use the latter here:
#+INCLUDE: packets/activity.psyc src psyc
* Results
Parsing time of 1 000 000 packets, in milliseconds.
A simple strlen scan of the respective message is provided for comparison.
| input: | PSYC | | JSON | | | XML | |
| parser: | strlen | libpsyc | json-c | json-glib | libxml sax | libxml | rapidxml |
|-----------------+--------+---------+--------+-----------+------------+--------+----------|
| user profile | 55 | 608 | 4715 | 16503 | 7350 | 12377 | 2477 |
| psyc-unfriendly | 70 | 286 | 2892 | 12567 | 5538 | 8659 | 1896 |
| json-unfriendly | 49 | 430 | 2328 | 10006 | 5141 | 7875 | 1751 |
| xml-unfriendly | 37 | 296 | 2156 | 9591 | 5571 | 8769 | 1765 |
|-----------------+--------+---------+--------+-----------+------------+--------+----------|
| / | < | > | < | > | < | | > |
| | <r> | <r> | <r> | <r> | <r> | <r> | <r> |
Pure syntax comparisons above, protocol performance comparisons below:
| input: | PSYC | | JSON | | | XMPP | |
| parser: | strlen | libpsyc | json-c | json-glib | libxml sax | libxml | rapidxml |
|-----------------+--------+---------+--------+-----------+------------+--------+----------|
| presence | 30 | 236 | 2463 | 10016 | 4997 | 7557 | 1719 |
| chat msg | 40 | 295 | 2147 | 9526 | 5997 | 9777 | 1893 |
| activity | 42 | 353 | 4666 | 16327 | 13357 | 28858 | 4356 |
|-----------------+--------+---------+--------+-----------+------------+--------+----------|
| / | < | > | < | > | < | | > |
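To make the two tables above easier to read, the following Python
sketch converts the totals into time per packet and into factors
relative to libpsyc. The numbers are copied from the tables; the names
TIMES_MS, ns_per_packet and relative_to_libpsyc are only part of this
sketch and not of any benchmark tool.

#+BEGIN_SRC python
# Total parsing times in milliseconds for 1 000 000 packets,
# copied from the two tables above (strlen column omitted).
PACKETS = 1_000_000
PARSERS = ["libpsyc", "json-c", "json-glib", "libxml sax", "libxml", "rapidxml"]
TIMES_MS = {
    "user profile":    [608, 4715, 16503, 7350, 12377, 2477],
    "psyc-unfriendly": [286, 2892, 12567, 5538, 8659, 1896],
    "json-unfriendly": [430, 2328, 10006, 5141, 7875, 1751],
    "xml-unfriendly":  [296, 2156, 9591, 5571, 8769, 1765],
    "presence":        [236, 2463, 10016, 4997, 7557, 1719],
    "chat msg":        [295, 2147, 9526, 5997, 9777, 1893],
    "activity":        [353, 4666, 16327, 13357, 28858, 4356],
}


def ns_per_packet(total_ms: float) -> float:
    """Convert a total in milliseconds into nanoseconds per packet."""
    return total_ms * 1_000_000 / PACKETS


def relative_to_libpsyc(row: list[int]) -> dict[str, float]:
    """Return each parser's time as a multiple of the libpsyc time."""
    base = row[0]
    return {name: t / base for name, t in zip(PARSERS[1:], row[1:])}


for packet, row in TIMES_MS.items():
    ratios = ", ".join(
        f"{name} {factor:.1f}x"
        for name, factor in relative_to_libpsyc(row).items()
    )
    print(f"{packet}: libpsyc {ns_per_packet(row[0]):.0f} ns/packet; {ratios}")
#+END_SRC

Since each test parses one million packets, a total in milliseconds is
numerically equal to the time per packet in nanoseconds.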
Parsing large amounts of binary data. For JSON & XML base64 encoding was used.
Note that the results below include only the parsing time; base64 decoding
was not performed.
| input: | PSYC | | JSON | | | XML | |
| parser: | strlen | libpsyc | json-c | json-glib | libxml sax | libxml | rapidxml |
|---------+--------+---------+--------+------------+------------+-----------+----------|
| 7K | 92 | 77 | 14459 | 98000 | 11445 | 19299 | 8701 |
| 70K | 53 | 77 | 14509 | 1003900 | 96209 | 167738 | 74296 |
| 700K | 42 | 77 | 14551 | 10616000 | 842025 | 1909428 | 729419 |
| 7M | 258 | 78 | 14555 | 120810000 | 12466610 | 16751363 | 7581169 |
| 70M | 304 | 80 | 14534 | 1241000000 | 169622110 | 296017820 | 75308906 |
|---------+--------+---------+--------+------------+------------+-----------+----------|
| / | < | > | < | > | < | | > |
| <r> | | | | | | | |
These tests were performed on a 2.53 GHz Intel(R) Core(TM)2 Duo P9500 CPU.
* Criticism
Are we comparing apples and oranges? Yes and no, depends on what you
need. XML is a syntax best suited for complex structured data in
well-defined formats - especially good for text mark-up. JSON is a syntax
intended to hold arbitrarily structured data suitable for immediate
inclusion in JavaScript source code. The PSYC syntax is an evolved
derivative of RFC 822, the syntax used by HTTP and E-Mail, and is therefore
limited in the kind and depth of data structures that can be represented
with it, but in exchange it is highly performant at doing just that.
In fact we are looking into suitable syntax extensions to represent
generic structures and semantic signatures, but for now PSYC only
provides for simple typed values and lists of typed values.
Another aspect is the availability of these formats for spontaneous
use. You could generate and parse JSON yourself but you have to be
careful about escaping. XML can be rendered manually if you know your
data will not break the syntax, but you can't really parse it without
a bulletproof parser. PSYC is easy to render and parse yourself for
simple tasks, as long as your body does not contain "\n|\n" and your
variables do not contain newlines.
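The two conditions just named for PSYC can be checked mechanically
before rendering a packet by hand. The following Python sketch does only
that; the function name and the variable name "_nick" are hypothetical,
and the sketch does not render or parse actual PSYC syntax.

#+BEGIN_SRC python
def safe_for_naive_rendering(variables: dict[str, str], body: str) -> bool:
    """Check the two conditions stated above for hand-made PSYC packets.

    Returns False if the body contains "\\n|\\n" or if any variable
    name or value contains a newline.
    """
    if "\n|\n" in body:
        return False
    return all(
        "\n" not in name and "\n" not in value
        for name, value in variables.items()
    )


# "_nick" is a made-up variable name used only for this example.
print(safe_for_naive_rendering({"_nick": "juliet"}, "Hello there"))
print(safe_for_naive_rendering({"_nick": "juliet"}, "line\n|\nline"))
#+END_SRC

When the check fails, the data needs the length-prefixed form described
in the section on PSYC-unfriendly strings, which is better left to a
proper renderer.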
In the end it is up to you to find out which format fulfils your
requirements the best. We use PSYC for the majority of messaging where
JSON and XMPP aren't efficient and opaque enough, but we employ XML and
JSON as payloads within PSYC for data that doesn't fit the PSYC model.
For some reason all three formats are being used for messaging, although
only PSYC was actually designed for that purpose.
* Caveats
In every case we'll compare performance of parsing and re-rendering
these messages, but consider also that the applicative processing
of an XML DOM tree is more complicated than just accessing
certain elements in a JSON data structure or PSYC variable
mapping.
For a speed check in real world conditions which also consider the
complexity of processing incoming messages we should compare
the performance of a chat client using the two protocols,
for instance by using libpurple with XMPP and PSYC accounts.
To this purpose we first need to integrate libpsyc into libpurple.
* Conclusions
The Internet has developed two major breeds of protocol formats.
The binary ones are extremely efficient but usually not very flexible
(unless you are willing to recompile all instances each time you
change something)
while the plain-text ones strive for perfection
in data representation while leaving the path of efficiency. Some
protocols such as HTTP and SIP are in-between these two schools,
offering both a text-based extensible syntax (it's actually easier to
add a header to HTTP than to come up with a namespace for XMPP...)
and the ability to deliver binary data. But these protocols do not
come with native data structure support. PSYC is a protocol that
combines the compactness and efficiency of binary protocols with the
extensibility of text-based protocols and still provides for enough
data structuring to rarely require the use of other data formats.
* Futures
After a month of development libpsyc is already performing pretty
well, but we presume various optimizations, like rewriting parts
in assembler, are possible.
* Related Work
If this didn't help, you can also look into:
- Adobe AMF
- ASN.1
- BSON
- Cisco Etch
- Efficient XML
- Facebook Thrift
- Google Protocol Buffers
The drawback of these binary formats is that, unlike PSYC, JSON and XML,
you can't edit them manually and you can't produce valid messages
by replacing variables in a simple text template. You depend on
specialized parsers and renderers being provided.
* Appendix
** Tools used
*** libpsyc
: make bench
which uses the following commands:
: test/testStrlen -sc 1000000 -f $file
: test/testPsycSpeed -sc 1000000 -f $file
: test/testJson -snc 1000000 -f $file
: test/testJsonGlib -snc 1000000 -f $file
*** xmlbench
: parse/libxml-sax 1000000 $file
: parse/libxml 1000000 $file
: parse/rapidxml 1000000 $file