#+TITLE: libpsyc Performance Benchmarks

In this document we present the results of performance benchmarks
of libpsyc compared with json-c, libjson-glib, libxml2 and rapidxml.

* PSYC, JSON, XML Syntax Benchmarks
First we look at the raw performance of the PSYC syntax
compared to equivalent XML and JSON encodings. We'll
look at actual XMPP messaging later.

** User Profile
In this test we'll compare the efficiency of the three
syntaxes at serializing a typical user record as it
would be stored in a database. Let's start with XML:

#+INCLUDE: packets/user_profile.xml src xml

In JSON this could look like this:

#+INCLUDE: packets/user_profile.json src js

Here's a way to model this in PSYC:

#+INCLUDE: packets/user_profile.psyc src psyc

** A message with JSON-unfriendly characters
This message contains some characters which are
impractical to encode in JSON. We should probably
put a lot more of them inside to actually see an impact
on performance.

#+INCLUDE: packets/json-unfriendly.xml src xml
#+INCLUDE: packets/json-unfriendly.json src js
#+INCLUDE: packets/json-unfriendly.psyc src psyc
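
To get a feel for why such characters are awkward in JSON, here is a
small sketch using Python's standard json module. The sample string is
made up for illustration; it is not the content of the packet files
included above, and json_overhead is a hypothetical helper name.

#+BEGIN_SRC python
import json


def json_overhead(text):
    """Return (raw length, length of the JSON string literal without its quotes)."""
    encoded = json.dumps(text)
    return len(text), len(encoded) - 2


# Hypothetical sample: double quotes, a backslash, a newline and a control character.
sample = 'say "hi"\\ \n\x01'

raw_len, json_len = json_overhead(sample)
print(raw_len, json_len)
print(json.dumps(sample))
#+END_SRC

Every quote, backslash and control character has to be escaped, so the
encoded form is longer than the raw text and the parser has to undo
the escapes again.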

** A message with XML-unfriendly characters
The same test with characters which are impractical
in the XML syntax. Here too, we should put more of
them inside.

#+INCLUDE: packets/xml-unfriendly.xml src xml
#+INCLUDE: packets/xml-unfriendly.json src js
#+INCLUDE: packets/xml-unfriendly.psyc src psyc
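
As an illustration of what "impractical" means for XML, the sketch
below escapes a made-up sample with Python's standard
xml.sax.saxutils.escape. The sample is hypothetical and is not taken
from the packet files included above.

#+BEGIN_SRC python
from xml.sax.saxutils import escape


def xml_escaped(text):
    """Escape the characters XML reserves in character data: &, < and >."""
    return escape(text)


# Hypothetical sample: a code fragment full of reserved characters.
sample = "if (a < b && b > c) { }"

print(xml_escaped(sample))
print(len(sample), len(xml_escaped(sample)))
#+END_SRC

Each reserved character turns into a multi-character entity, which
costs both space and parsing work.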

** A message with PSYC-unfriendly strings
PSYC prefixes data with its length as soon as it
exceeds a certain size or contains certain strings.
For short messages this is less
efficient than scanning the values without lengths.
Also, lengths are harder to edit by hand.

#+INCLUDE: packets/psyc-unfriendly.xml src xml
#+INCLUDE: packets/psyc-unfriendly.json src js
#+INCLUDE: packets/psyc-unfriendly.psyc src psyc
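
The trade-off between scanning and length prefixes can be sketched
with a toy framing. This is not the PSYC wire format and not libpsyc
code: the "<decimal length> <value>" layout and both function names
are assumptions made for this example only.

#+BEGIN_SRC python
def read_delimited(buf: bytes, start: int = 0):
    """Scan for the terminating newline; the work grows with the value length."""
    end = buf.index(b"\n", start)
    return buf[start:end], end + 1


def read_length_prefixed(buf: bytes, start: int = 0):
    """Read '<decimal length> <value>' followed by a newline.

    Only the digits are scanned; the value itself is skipped in one step,
    so it may contain newlines or any other bytes.
    """
    space = buf.index(b" ", start)
    length = int(buf[start:space])
    value_start = space + 1
    value_end = value_start + length
    if buf[value_end:value_end + 1] != b"\n":
        # A hand-edited value whose length was not updated ends up here.
        raise ValueError("length does not match the framing")
    return buf[value_start:value_end], value_end + 1


print(read_delimited(b"hello\nrest"))
print(read_length_prefixed(b"3 a\nb\nrest"))
#+END_SRC

For a five-byte value the extra length field is pure overhead, which
matches the remark above about short messages. The ValueError branch
shows why lengths are awkward to edit by hand.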

** Packets containing binary data
We'll use a generator of random binary data to
see how well the formats behave with different
sizes of data. We'll consider 7000 bytes as a possible
size of an icon, 70000 for an avatar, 700000
for a photograph, 7000000 for a piece of music,
70000000 for a large project and
700000000 for the contents of a CD.
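
JSON and XML cannot carry raw bytes, so the measurements in the
Results section use base64 for them. The following sketch works out
how large each payload becomes once encoded, assuming standard padded
base64 without line breaks. base64_size is a hypothetical helper name.

#+BEGIN_SRC python
import base64


def base64_size(n: int) -> int:
    """Size in bytes of n raw bytes after padded base64 encoding."""
    return 4 * ((n + 2) // 3)


for n in (7000, 70000, 700000, 7000000, 70000000, 700000000):
    print(n, base64_size(n), f"{base64_size(n) / n:.3f}")

# Cross-check the formula against the real encoder on the smallest size.
assert len(base64.b64encode(bytes(7000))) == base64_size(7000)
#+END_SRC

The encoded payload is about a third larger than the raw data, and
that is before any cost for decoding it.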

* PSYC vs XMPP Protocol Benchmarks

These tests use typical XMPP messages ("stanzas" in
Jabber lingo) and compare them with equivalent JSON encodings
and with verbose and compact PSYC formats.

** A presence packet
Since presence packets are by far the dominant messaging content
in the XMPP network, we'll start with one of them.
Here's an example from section 4.4.2 of RFC 6121.

#+INCLUDE: packets/presence.xml src xml

And here's the same information in a JSON rendition:

#+INCLUDE: packets/presence.json src js

Here's the equivalent PSYC packet in verbose form
(since it is a multicast, the individual recipients do not
need to be mentioned):

#+INCLUDE: packets/presence.psyc src psyc

And the same in compact form:

#+BEGIN_SRC psyc
:c psyc://example.com/~juliet

=da 4
np
|
#+END_SRC

** An average chat message

#+INCLUDE: packets/chat_msg.xml src xml
#+INCLUDE: packets/chat_msg.json src js
#+INCLUDE: packets/chat_msg.psyc src psyc

Why doesn't PSYC have an id? Because packet counting from contexts
and circuits is automatic: the packet already has a number just by
being there.

Also, PSYC by default doesn't mention a "resource" in XMPP terms;
instead it allows for more addressing schemes than just PSYC.

** A new status update activity
Example taken from http://onesocialweb.org/spec/1.0/osw-activities.html
You could call this XML namespace hell:

#+INCLUDE: packets/activity.xml src xml

http://activitystrea.ms/head/json-activity.html proposes a JSON encoding
of this. We'll have to add a routing header to it.

#+INCLUDE: packets/activity.json src js

http://about.psyc.eu/Activity suggests a PSYC mapping for activity
streams. Should a "status post" be considered equivalent to a presence
description announcement or just a message in the "microblogging" channel?
We'll use the latter here:

#+INCLUDE: packets/activity.psyc src psyc

* Results

Parsing time of 1 000 000 packets, in milliseconds.
A simple strlen scan of the respective message is provided for comparison.

| input:          | PSYC   |         | JSON   |           |            | XML    |          |
| parser:         | strlen | libpsyc | json-c | json-glib | libxml sax | libxml | rapidxml |
|-----------------+--------+---------+--------+-----------+------------+--------+----------|
| user profile    |     55 |     608 |   4715 |     16503 |       7350 |  12377 |     2477 |
| psyc-unfriendly |     70 |     286 |   2892 |     12567 |       5538 |   8659 |     1896 |
| json-unfriendly |     49 |     430 |   2328 |     10006 |       5141 |   7875 |     1751 |
| xml-unfriendly  |     37 |     296 |   2156 |      9591 |       5571 |   8769 |     1765 |
|-----------------+--------+---------+--------+-----------+------------+--------+----------|
| /               | <      | >       | <      | >         | <          |        | >        |
|                 | <r>    | <r>     | <r>    | <r>       | <r>        | <r>    | <r>      |
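
To make the table easier to read, the sketch below derives two
figures from it: each parser's time relative to libpsyc, and the time
per packet in nanoseconds. The numbers are copied from the table
above; the helper names are made up for this example.

#+BEGIN_SRC python
# Parsing times in milliseconds for 1 000 000 packets, copied from the table above.
PARSERS = ["strlen", "libpsyc", "json-c", "json-glib", "libxml sax", "libxml", "rapidxml"]
TIMES_MS = {
    "user profile":    [55, 608, 4715, 16503, 7350, 12377, 2477],
    "psyc-unfriendly": [70, 286, 2892, 12567, 5538, 8659, 1896],
    "json-unfriendly": [49, 430, 2328, 10006, 5141, 7875, 1751],
    "xml-unfriendly":  [37, 296, 2156, 9591, 5571, 8769, 1765],
}


def relative_to_libpsyc(row):
    """Map each parser name to its time divided by the libpsyc time of the same row."""
    base = row[PARSERS.index("libpsyc")]
    return {name: ms / base for name, ms in zip(PARSERS, row)}


def ns_per_packet(total_ms, packets=1_000_000):
    """Convert a total in milliseconds into nanoseconds per packet."""
    return total_ms * 1_000_000 / packets


for name, row in TIMES_MS.items():
    ratios = relative_to_libpsyc(row)
    print(f"{name:16s}", " ".join(f"{ratios[p]:6.1f}" for p in PARSERS))

print(ns_per_packet(TIMES_MS["user profile"][PARSERS.index("libpsyc")]))
#+END_SRC

Since every row covers one million packets, a total in milliseconds
reads directly as nanoseconds per packet.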

Pure syntax comparisons above, protocol performance comparisons below:

| input:   | PSYC   |         | JSON   |           |            | XMPP   |          |
| parser:  | strlen | libpsyc | json-c | json-glib | libxml sax | libxml | rapidxml |
|----------+--------+---------+--------+-----------+------------+--------+----------|
| presence |     30 |     236 |   2463 |     10016 |       4997 |   7557 |     1719 |
| chat msg |     40 |     295 |   2147 |      9526 |       5997 |   9777 |     1893 |
| activity |     42 |     353 |   4666 |     16327 |      13357 |  28858 |     4356 |
|----------+--------+---------+--------+-----------+------------+--------+----------|
| /        | <      | >       | <      | >         | <          |        | >        |

Parsing large amounts of binary data. For JSON & XML base64 encoding was used.
Note that the results below include only the parsing time; base64 decoding was
not performed.

| input:  | PSYC   |         | JSON   |            |            | XML       |          |
| parser: | strlen | libpsyc | json-c | json-glib  | libxml sax | libxml    | rapidxml |
|---------+--------+---------+--------+------------+------------+-----------+----------|
|      7K |     92 |      77 |  14459 |      98000 |      11445 |     19299 |     8701 |
|     70K |     53 |      77 |  14509 |    1003900 |      96209 |    167738 |    74296 |
|    700K |     42 |      77 |  14551 |   10616000 |     842025 |   1909428 |   729419 |
|      7M |    258 |      78 |  14555 |  120810000 |   12466610 |  16751363 |  7581169 |
|     70M |    304 |      80 |  14534 | 1241000000 |  169622110 | 296017820 | 75308906 |
|---------+--------+---------+--------+------------+------------+-----------+----------|
| /       | <      | >       | <      | >          | <          |           | >        |
| <r>     |        |         |        |            |            |           |          |


These tests were performed on a 2.53 GHz Intel(R) Core(TM)2 Duo P9500 CPU.
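
One way to read the binary data table is to look at how each parser's
time grows from one row to the next, where every row holds ten times
more data than the previous one. The numbers below are copied from
the table; growth_factors is a hypothetical helper name.

#+BEGIN_SRC python
SIZES = ["7K", "70K", "700K", "7M", "70M"]

# Times copied from the binary data table above, one entry per size.
TIMES = {
    "libpsyc":    [77, 77, 77, 78, 80],
    "json-c":     [14459, 14509, 14551, 14555, 14534],
    "json-glib":  [98000, 1003900, 10616000, 120810000, 1241000000],
    "libxml sax": [11445, 96209, 842025, 12466610, 169622110],
    "libxml":     [19299, 167738, 1909428, 16751363, 296017820],
    "rapidxml":   [8701, 74296, 729419, 7581169, 75308906],
}


def growth_factors(times):
    """Ratio of each measurement to the one for the next smaller size."""
    return [later / earlier for earlier, later in zip(times, times[1:])]


for parser, times in TIMES.items():
    print(f"{parser:10s}", " ".join(f"{g:6.2f}" for g in growth_factors(times)))
#+END_SRC

In these figures the libpsyc and json-c columns stay essentially flat,
while json-glib, libxml and rapidxml grow by roughly an order of
magnitude per row.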

* Criticism

Are we comparing apples and oranges? Yes and no, depending on what you
need. XML is a syntax best suited for complex structured data in
well-defined formats, and is especially good for text mark-up. JSON is a
syntax intended to hold arbitrarily structured data suitable for immediate
inclusion in JavaScript source code. The PSYC syntax is an evolved
derivative of RFC 822, the syntax used by HTTP and e-mail, and is therefore
limited in the kind and depth of data structures that can be represented
with it, but in exchange it is highly performant at doing just that.
In fact we are looking into suitable syntax extensions to represent
generic structures and semantic signatures, but for now PSYC only
provides for simple typed values and lists of typed values.

Another aspect is the availability of these formats for spontaneous
use. You could generate and parse JSON yourself, but you have to be
careful about escaping. XML can be rendered manually if you know your
data will not break the syntax, but you can't really parse it without
a bulletproof parser. PSYC is easy to render and parse yourself for
simple tasks, as long as your body does not contain "\n|\n" and your
variables do not contain newlines.
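
As a rough sketch of what rendering by hand could look like, the
function below builds a packet shaped like the compact presence
example and enforces the two conditions just mentioned. It is not
libpsyc's API: render_psyc is a hypothetical name, and the tab between
variable name and value is an assumption. Consult the PSYC syntax
specification before relying on the exact layout.

#+BEGIN_SRC python
def render_psyc(routing, entity, method, body=""):
    """Render one packet from plain strings.

    routing and entity are lists of (glyph + name, value) string pairs,
    for example (":c", "psyc://example.com/~juliet").
    """
    for name, value in routing + entity:
        if "\n" in name or "\n" in value:
            raise ValueError(f"newline in variable {name!r}")
    # A line holding only "|" ends the packet, so the body must not contain one.
    # Padding with newlines also catches such a line at the very start or end.
    if "\n|\n" in f"\n{body}\n":
        raise ValueError("body contains a packet delimiter line")
    lines = [f"{name}\t{value}" for name, value in routing]
    lines.append("")  # empty line between routing header and content
    lines += [f"{name}\t{value}" for name, value in entity]
    lines.append(method)
    if body:
        lines.append(body)
    lines.append("|")
    return "\n".join(lines) + "\n"


packet = render_psyc([(":c", "psyc://example.com/~juliet")], [("=da", "4")], "np")
print(packet, end="")
#+END_SRC

Anything that fails these checks is a case for a real renderer, which
can fall back to length prefixes.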

In the end it is up to you to find out which format best fulfils your
requirements. We use PSYC for the majority of messaging where
JSON and XMPP aren't efficient and opaque enough, but we employ XML and
JSON as payloads within PSYC for data that doesn't fit the PSYC model.
For some reason all three formats are being used for messaging, although
only PSYC was actually designed for that purpose.

* Caveats

In every case we'll compare the performance of parsing and re-rendering
these messages, but consider also that the application-level processing
of an XML DOM tree is more complicated than just accessing
certain elements in a JSON data structure or PSYC variable
mapping.

For a speed check in real-world conditions which also considers the
complexity of processing incoming messages, we should compare
the performance of a chat client using the two protocols,
for instance by using libpurple with XMPP and PSYC accounts.
For this purpose we first need to integrate libpsyc into libpurple.

* Conclusions

The Internet has developed two major breeds of protocol formats.
The binary ones are extremely efficient but usually not very flexible
(unless you are willing to recompile all instances each time you
change something),
while the plain-text ones strive for perfection
in data representation at the expense of efficiency. Some
protocols such as HTTP and SIP are in between these two schools,
offering both a text-based extensible syntax (it's actually easier to
add a header to HTTP than to come up with a namespace for XMPP...)
and the ability to deliver binary data. But these protocols do not
come with native data structure support. PSYC is a protocol that
combines the compactness and efficiency of binary protocols with the
extensibility of text-based protocols and still provides for enough
data structuring to rarely require the use of other data formats.

* Futures

After a month of development libpsyc is already performing pretty
well, but we presume that various optimizations, like rewriting parts
in assembler, are possible.

* Related Work

If this didn't help, you can also look into:

- Adobe AMF
- ASN.1
- BSON
- Cisco Etch
- Efficient XML
- Facebook Thrift
- Google Protocol Buffers

The drawback of these binary formats is that, unlike PSYC, JSON and XML,
you can't edit them manually and you can't produce valid messages
by replacing variables in a simple text template. You depend on
specialized parsers and renderers to be provided.

* Appendix
** Tools used

*** libpsyc

: make bench

which uses the following commands:

: test/testStrlen -sc 1000000 -f $file
: test/testPsycSpeed -sc 1000000 -f $file
: test/testJson -snc 1000000 -f $file
: test/testJsonGlib -snc 1000000 -f $file

*** xmlbench

: parse/libxml-sax 1000000 $file
: parse/libxml 1000000 $file
: parse/rapidxml 1000000 $file
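
The source of these test programs is not reproduced here. Purely to
illustrate the measurement idea (repeat one parse many times and
report the elapsed milliseconds), here is a minimal sketch in Python.
It is not the harness used for the numbers above; time_parses_ms and
the sample input are made up for this example.

#+BEGIN_SRC python
import json
import time


def time_parses_ms(parse, data, count):
    """Call parse(data) count times and return the elapsed wall-clock milliseconds."""
    start = time.perf_counter()
    for _ in range(count):
        parse(data)
    return (time.perf_counter() - start) * 1000.0


# Hypothetical input; a real run would read the packet from a file instead.
sample = '{"nick": "juliet", "availability": 4}'

print("length scan:", time_parses_ms(len, sample, 100_000))
print("json.loads: ", time_parses_ms(json.loads, sample, 100_000))
#+END_SRC

The length scan plays the role of the strlen column: it shows how much
of the total is loop overhead and how much is actual parsing.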