mirror of
git://git.psyc.eu/libpsyc
synced 2024-08-15 03:19:02 +00:00
294 lines
13 KiB
Org Mode
294 lines
13 KiB
Org Mode
#+TITLE: libpsyc Performance Benchmarks
|
|
|
|
In this document we present the results of performance benchmarks
|
|
of libpsyc compared to json-c, libjson-glib, rapidxml and libxml2.
|
|
|
|
* PSYC, JSON, XML Syntax Benchmarks
|
|
First we look at the mere performance of the PSYC syntax
|
|
compared to equivalent XML and JSON encodings. We'll
|
|
look at actual XMPP messaging later.
|
|
|
|
** User Profile
|
|
In this test we'll compare the efficiency of the three
|
|
syntaxes at serializing a typical user data base
|
|
storage information. Let's start with XML:
|
|
|
|
#+INCLUDE: packets/user_profile.xml src xml
|
|
|
|
In JSON this could look like this:
|
|
|
|
#+INCLUDE: packets/user_profile.json src js
|
|
|
|
Here's a way to model this in PSYC (verbose mode):
|
|
|
|
#+INCLUDE: packets/user_profile.psyc src psyc
|
|
|
|
** A message with JSON-unfriendly characters
|
|
This message contains some characters which are
|
|
impractical to encode in JSON. We should probably
|
|
put a lot more inside to actually see an impact
|
|
on performance. *TODO*
|
|
|
|
#+INCLUDE: packets/json-unfriendly.xml src xml
|
|
#+INCLUDE: packets/json-unfriendly.json src js
|
|
#+INCLUDE: packets/json-unfriendly.psyc src psyc
|
|
|
|
** A message with XML-unfriendly characters
|
|
Same test with characters which aren't practical
|
|
in the XML syntax, yet we should put more of
|
|
them inside. *TODO*
|
|
|
|
#+INCLUDE: packets/xml-unfriendly.xml src xml
|
|
#+INCLUDE: packets/xml-unfriendly.json src js
|
|
#+INCLUDE: packets/xml-unfriendly.psyc src psyc
|
|
|
|
** A message with PSYC-unfriendly strings
|
|
PSYC prefixes data with length as soon as it
|
|
exceeds certain sizes or contains certain strings.
|
|
In the case of short messages this is less
|
|
efficient than scanning the values without lengths.
|
|
Also, lengths are harder to edit by hand.
|
|
|
|
#+INCLUDE: packets/psyc-unfriendly.xml src xml
|
|
#+INCLUDE: packets/psyc-unfriendly.json src js
|
|
#+INCLUDE: packets/psyc-unfriendly.psyc src psyc
|
|
|
|
** Packets containing binary data
|
|
We'll use a generator of random binary data to
|
|
see how well the formats behave with different
|
|
sizes of data. We'll consider 7000 as a possible
|
|
size of an icon, 70000 for an avatar, 700000
|
|
for a photograph, 7000000 for a piece of music,
|
|
70000000 for a large project and
|
|
700000000 for the contents of a CD.
|
|
|
|
* PSYC vs XMPP Protocol Benchmarks
|
|
|
|
These tests use typical messages from the XMPP ("stanzas" in
|
|
Jabber lingo) and compare them with equivalent JSON encodings
|
|
and PSYC formats.
|
|
|
|
** A presence packet
|
|
Since presence packets are by far the dominant messaging content
|
|
in the XMPP network, we'll start with one of them.
|
|
Here's an example from paragraph 4.4.2 of RFC 6121.
|
|
|
|
#+INCLUDE: packets/presence.xml src xml
|
|
|
|
And here's the same information in a JSON rendition:
|
|
|
|
#+INCLUDE: packets/presence.json src js
|
|
|
|
Here's the equivalent PSYC packet in verbose mode
|
|
(since it is a multicast, the single recipients do not
|
|
need to be mentioned):
|
|
|
|
#+INCLUDE: packets/presence.psyc src psyc
|
|
|
|
And this is the same message in PSYC's compact form, but since compact mode
|
|
hasn't been implemented nor deployed yet, you should only consider this
|
|
for future projects:
|
|
|
|
#+INCLUDE: packets/presence-c.psyc src psyc
|
|
|
|
** An average chat message
|
|
|
|
#+INCLUDE: packets/chat_msg.xml src xml
|
|
#+INCLUDE: packets/chat_msg.json src js
|
|
#+INCLUDE: packets/chat_msg.psyc src psyc
|
|
|
|
# Why PSYC doesn't have an id? Because for most operations they aren't
|
|
# needed: PSYC has automatic packet counting from contexts and circuits.
|
|
# Therefore, the packet already has an id just by being there.
|
|
# Should you want to tag a packet anyway, you can do so by adding a _tag.
|
|
#
|
|
# Update: XMPP doesn't *need* to have an id there, so we can just remove it.
|
|
|
|
Little difference: PSYC by default doesn't mention a "resource" in XMPP terms,
|
|
instead it allows for more addressing schemes than just PSYC.
|
|
|
|
** A new status updated activity
|
|
Example taken from http://onesocialweb.org/spec/1.0/osw-activities.html
|
|
You could call this XML namespace hell.. :-)
|
|
|
|
#+INCLUDE: packets/activity.xml src xml
|
|
|
|
http://activitystrea.ms/head/json-activity.html proposes a JSON encoding
|
|
of this. We'll have to add a routing header to it.
|
|
|
|
#+INCLUDE: packets/activity.json src js
|
|
|
|
http://about.psyc.eu/Activity suggests a PSYC mapping for activity
|
|
streams. Should a "status post" be considered equivalent to a presence
|
|
description announcement or just a message in the "microblogging" channel?
|
|
We'll use the latter here:
|
|
|
|
#+INCLUDE: packets/activity.psyc src psyc
|
|
|
|
It's nice about XML namespaces how they can by definition never collide,
|
|
but this degree of engineering perfection causes us a lot of overhead.
|
|
The PSYC approach is to just extend the name of the method - as long as
|
|
people use differing method names, protocol extensions can exist next
|
|
to each other happily. Method name unicity cannot mathematically be ensured,
|
|
but it's enough to append your company name to make it unlikely for anyone
|
|
else on earth to have the same name. How this kind of safety is delivered
|
|
when using the JSON syntax of ActivityStreams is unclear. Apparently it was
|
|
no longer an important design criterion.
|
|
|
|
* Results
|
|
|
|
Parsing time of 1 000 000 packets, in milliseconds.
|
|
A simple strlen() scan of the respective message is provided for comparison.
|
|
These tests were performed on a 2.53 GHz Intel(R) Core(TM)2 Duo P9500 CPU.
|
|
|
|
| | strlen | libpsyc | json-c | json-glib | libxml sax | libxml | rapidxml |
|
|
|-----------------+--------+---------+--------+-----------+------------+--------+----------|
|
|
| user profile | 55 | 608 | 4715 | 16503 | 7350 | 12377 | 2477 |
|
|
| psyc-unfriendly | 70 | 286 | 2892 | 12567 | 5538 | 8659 | 1896 |
|
|
| json-unfriendly | 49 | 430 | 2328 | 10006 | 5141 | 7875 | 1751 |
|
|
| xml-unfriendly | 37 | 296 | 2156 | 9591 | 5571 | 8769 | 1765 |
|
|
|-----------------+--------+---------+--------+-----------+------------+--------+----------|
|
|
| / | < | | < | > | < | | > |
|
|
| | <r> | <r> | <r> | <r> | <r> | <r> | <r> |
|
|
|
|
Pure syntax comparisons above, protocol performance comparisons below:
|
|
|
|
| | strlen | libpsyc | libpsyc compact | json-c | json-glib | libxml sax | libxml | rapidxml |
|
|
|----------+--------+---------+-----------------+--------+-----------+------------+--------+----------|
|
|
| presence | 30 | 236 | 122 | 2463 | 10016 | 4997 | 7557 | 1719 |
|
|
| chat msg | 40 | 295 | 258 | 2147 | 9526 | 5911 | 8999 | 1850 |
|
|
| activity | 42 | 353 | 279 | 4666 | 16327 | 13357 | 28858 | 4356 |
|
|
|----------+--------+---------+-----------------+--------+-----------+------------+--------+----------|
|
|
| / | < | | > | < | > | < | | > |
|
|
| | | | <c> | | | | | |
|
|
|
|
Parsing large amounts of binary data. For JSON & XML base64 encoding was used.
|
|
Note that the results below include only the parsing time, base64 decoding was
|
|
not performed.
|
|
|
|
| | strlen | libpsyc | json-c | json-glib | libxml sax | libxml | rapidxml |
|
|
|---------+----------+---------+-----------+------------+------------+-----------+----------|
|
|
| 7K | 978 | 77 | 18609 | 98000 | 11445 | 19299 | 8701 |
|
|
| 70K | 9613 | 77 | 187540 | 1003900 | 96209 | 167738 | 74296 |
|
|
| 700K | 95888 | 77 | 1883500 | 10616000 | 842025 | 1909428 | 729419 |
|
|
| 7M | 1347300 | 78 | 26359000 | 120810000 | 12466610 | 16751363 | 7581169 |
|
|
| 70M | 14414000 | 80 | 357010000 | 1241000000 | 169622110 | 296017820 | 75308906 |
|
|
|---------+----------+---------+-----------+------------+------------+-----------+----------|
|
|
| / | < | > | < | > | < | | > |
|
|
| <r> | | | | | | | |
|
|
|
|
In each case we compared performance of parsing and re-rendering
|
|
these messages, but consider also that the applicative processing
|
|
of an XML DOM tree is more complicated than just accessing
|
|
certain elements in a JSON data structure or PSYC variable mapping.
|
|
|
|
* Explanations
|
|
|
|
As you can tell the PSYC data format outpaces its rivals in all circumstances.
|
|
Extremely so when delivering binary data as PSYC simply returns the starting
|
|
point and the length of the given buffer while the other parsers have to scan
|
|
for the end of the transmission, but also with many simpler operations, when
|
|
PSYC quickly figures out where the data starts and ends and passes such
|
|
information back to the application while the other formats are forced to
|
|
generate a copy of the data in order to process possibly embedded special
|
|
character sequences. PSYC essentially operates like a binary data protocol
|
|
even though it is actually text-based.
|
|
|
|
* Criticism
|
|
|
|
Are we comparing apples and oranges? Yes and no, depends on what you
|
|
need. XML is a syntax best suited for complex structured data in
|
|
well-defined formats - especially good for text mark-up. JSON is a syntax
|
|
intended to hold arbitrarily structured data suitable for immediate
|
|
inclusion in Javascript source codes. The PSYC syntax is an evolved
|
|
derivate of RFC 822, the syntax used by HTTP and E-Mail. It is currently
|
|
limited in the kind and depth of data structures that can be represented
|
|
with it, but it is highly efficient in exchange.
|
|
|
|
In fact we are currently looking into suitable syntax extensions to represent
|
|
generic structures and semantic signatures, but for now PSYC only
|
|
provides for simple typed values and lists of typed values.
|
|
|
|
* Ease of Implementation
|
|
|
|
Another aspect is the availability of these formats for spontaneous
|
|
use. You could generate and parse JSON yourself but you have to be
|
|
careful about escaping. XML can be rendered manually if you know your
|
|
data will not break the syntax, but you shouldn't dare to parse it without
|
|
a bullet proof parser. PSYC is easy to render and parse yourself for
|
|
simple tasks, as long as the body does not contain "\n|\n" and your
|
|
variables do not contain newlines.
|
|
|
|
* Conclusions
|
|
|
|
After all it is up to you to find out which format fulfils your
|
|
requirements the best. We use PSYC for the majority of messaging where
|
|
JSON and XMPP aren't efficient and opaque enough, but we employ XML and
|
|
JSON as payloads within PSYC for data that doesn't fit the PSYC model.
|
|
For some reason all three formats are being used for messaging, although
|
|
only PSYC was actually designed for that purpose.
|
|
|
|
The Internet has developed two major breeds of protocol formats.
|
|
The binary ones are extremely efficient but in most cases you have
|
|
to recompile all instances each time you change something
|
|
while the plain-text ones are reaching out for achieving perfection
|
|
in data representation while leaving the path of efficiency. Some
|
|
protocols such as HTTP and SIP are in-between these two schools,
|
|
offering both a text-based extensible syntax (it's actually easier to
|
|
add a header to HTTP than to come up with a namespace for XMPP...)
|
|
and the ability to deliver binary data. But these protocols do not
|
|
come with native data structure support. PSYC is a protocol that
|
|
combines the compactness and efficiency of binary protocols with the
|
|
extensibility of text-based protocols and still provides for enough
|
|
data structuring to rarely require the use of other data formats.
|
|
|
|
* Futures
|
|
|
|
After a month of development libpsyc is already performing pretty
|
|
well, but we presume various optimizations, like rewriting parts
|
|
in assembler, are possible.
|
|
|
|
* Related Work
|
|
|
|
If this didn't help, you can also look into:
|
|
|
|
- Adobe AMF
|
|
- ASN.1
|
|
- BSON
|
|
- Cisco Etch
|
|
- Efficient XML
|
|
- Facebook Thrift
|
|
- Google Protocol Buffers
|
|
|
|
The drawback of these binary formats is, unlike PSYC, JSON and XML
|
|
you can't edit them manually and you can't produce valid messages
|
|
by replacing variables in a simple text template. You depend on
|
|
specialized parsers and renderers to be provided.
|
|
|
|
There's also
|
|
|
|
- Bittorrent's bencode
|
|
|
|
This format is formally text-based, but not easy to read as it doesn't
|
|
have any visual separators and isn't easy to edit as everything is
|
|
prefixed by lengths even for very short items.
|
|
|
|
* Further Reading
|
|
|
|
http://about.psyc.eu/Spec:Syntax provides you with the ABNF grammar
|
|
of the PSYC 1.0 syntax. You may also be interested in PSYC's decentralized
|
|
state mechanism provided by the +/-/= operators.
|
|
|
|
See http://about.psyc.eu/XML and http://about.psyc.eu/JSON for more
|
|
biased information on the respective formats.
|
|
|
|
* Appendix
|
|
** Tools used
|
|
|
|
This document and its benchmarks are distributed with libpsyc.
|
|
See http://about.psyc.eu/libpsyc on how to obtain it.
|
|
|
|
The benchmarks can be run with the following command
|
|
(xmlbench is needed for the xml tests):
|
|
|
|
: make bench
|