mirror of git://git.psyc.eu/libpsyc synced 2024-08-15 03:19:02 +00:00
libpsyc/bench/benchmark.org
psyc://psyced.org/~lynX 0111aa7224 more paketz
2011-05-18 21:59:00 +02:00


#+TITLE: libpsyc Performance Benchmarks
In this document we present the results of performance benchmarks
of libpsyc compared with libjson-glib and libxml2.
* Procedure
We'll use typical XMPP messages ("stanzas" in Jabber
lingo) and compare them with equivalent JSON encodings
and with the verbose and compact PSYC formats.
In some cases we will additionally compare PSYC packets to
a more efficient XML encoding based on PSYC methods, to have
a more accurate comparison of the actual PSYC and XML
syntaxes, rather than the protocol structures of PSYC and XMPP.
* The Benchmarks
** A presence packet
Since presence packets are by far the dominant messaging content
in the XMPP network, we'll start with one of them.
Here's an example from paragraph 4.4.2 of RFC 6121.
#+INCLUDE: packets/presence.xml src xml
And here's the same information in a JSON rendition:
#+INCLUDE: packets/presence.json src js
Here's the equivalent PSYC packet in verbose form
(since it is a multicast, the single recipients do not
need to be mentioned):
#+INCLUDE: packets/presence.psyc src psyc
And the same in compact form:
#+BEGIN_SRC psyc
:c psyc://example.com/~juliet
=da 4
np
|
#+END_SRC
** An average chat message
XMPP:
#+INCLUDE: packets/chat_msg.xml src xml
JSON:
#+INCLUDE: packets/chat_msg.json src js
PSYC:
#+INCLUDE: packets/chat_msg.psyc src psyc
Why doesn't the PSYC packet have an id? Because packet counting from contexts
and circuits is automatic: the packet already has a number just by
being there.
Also, PSYC by default doesn't mention a "resource" in XMPP terms;
instead it allows for more addressing schemes than just PSYC.
** A new status update activity
Example taken from http://onesocialweb.org/spec/1.0/osw-activities.html
You could call this XML namespace hell:
#+INCLUDE: packets/activity.xml src xml
http://activitystrea.ms/head/json-activity.html proposes a JSON encoding
of this. We'll have to add a routing header to it.
#+INCLUDE: packets/activity.json src js
http://about.psyc.eu/Activity suggests a PSYC mapping for activity
streams. Should a "status post" be considered equivalent to a presence
description announcement or just a message in the "microblogging" channel?
We'll use the latter here:
#+INCLUDE: packets/activity.psyc src psyc
** A message with JSON-unfriendly characters
#+INCLUDE: packets/json-unfriendly.xml src xml
#+INCLUDE: packets/json-unfriendly.json src js
#+INCLUDE: packets/json-unfriendly.psyc src psyc
** A message with XML-unfriendly characters
#+INCLUDE: packets/xml-unfriendly.xml src xml
** A message with PSYC-unfriendly strings
#+INCLUDE: packets/psyc-unfriendly.xml src xml
#+INCLUDE: packets/psyc-unfriendly.json src js
#+INCLUDE: packets/psyc-unfriendly.psyc src psyc
** A packet containing a JPEG photograph
... TBD ...
** A random data structure
In this test we won't consider XMPP at all and simply compare the
efficiency of the three syntaxes at serializing a typical record from a
user database. We'll again start with XML:
#+INCLUDE: packets/user_profile.xml src xml
In JSON this would look like this:
#+INCLUDE: packets/user_profile.json src js
Here's a way to model this in PSYC:
#+INCLUDE: packets/user_profile.psyc src psyc
* Results
Parsing time of 1 000 000 packets in milliseconds:
| input:    | PSYC   |         | JSON   |           |            | XML    |          |
| parser:   | strlen | libpsyc | json-c | json-glib | libxml sax | libxml | rapidxml |
|-----------+--------+---------+--------+-----------+------------+--------+----------|
| presence  |     30 |     246 |   2463 |     10197 |       4997 |   7557 |     1719 |
| chat msg  |     41 |     320 |        |           |       5997 |   9777 |     1893 |
| activity  |     42 |     366 |   4666 |     16846 |      13357 |  28858 |     4419 |
| user prof |     55 |     608 |   4715 |     17468 |       7350 |  12377 |     2477 |
|-----------+--------+---------+--------+-----------+------------+--------+----------|
| /         | <      | >       | <      | >         | <          |        | >        |
These tests were performed on a 2.53 GHz Intel(R) Core(TM)2 Duo P9500 CPU.
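To put the numbers in relation to each other, the table can be reduced to per-packet times and to ratios against libpsyc. The following is a minimal sketch in Python; the figures are copied from the table above, while the helper names =slowdown_vs_libpsyc= and =per_packet_us= are made up for this illustration and are not part of libpsyc.

```python
# Parsing times in milliseconds for 1 000 000 packets, copied from the
# table above. None marks a cell that the table leaves blank.
PARSERS = ["strlen", "libpsyc", "json-c", "json-glib",
           "libxml sax", "libxml", "rapidxml"]

ROWS = {
    "presence":  [30, 246, 2463, 10197, 4997, 7557, 1719],
    "chat msg":  [41, 320, None, None, 5997, 9777, 1893],
    "activity":  [42, 366, 4666, 16846, 13357, 28858, 4419],
    "user prof": [55, 608, 4715, 17468, 7350, 12377, 2477],
}

TIMES_MS = {packet: dict(zip(PARSERS, values)) for packet, values in ROWS.items()}


def slowdown_vs_libpsyc(times):
    """Return, per parser, how many times longer it took than libpsyc.

    Hypothetical helper: blank measurements are skipped, not guessed.
    """
    base = times["libpsyc"]
    return {
        parser: elapsed / base
        for parser, elapsed in times.items()
        if parser != "libpsyc" and elapsed is not None
    }


def per_packet_us(total_ms, packets=1_000_000):
    """Convert a total time in milliseconds into microseconds per packet."""
    return total_ms * 1000 / packets


if __name__ == "__main__":
    for packet, times in TIMES_MS.items():
        print(packet, "libpsyc: %.3f us per packet" % per_packet_us(times["libpsyc"]))
        for parser, factor in slowdown_vs_libpsyc(times).items():
            print("  %-10s %.1fx the libpsyc time" % (parser, factor))
```

A factor below 1 (as for the strlen baseline) means the parser took less time than libpsyc; the blank JSON cells of the chat message row simply produce no ratio.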
* Conclusions
... TBD ...
* Criticism
Are we comparing apples and oranges? Yes and no; it depends on what you
need. XML is a syntax best suited for complex structured data in
well-defined formats, and is especially good for text mark-up. JSON is a syntax
intended to hold arbitrarily structured data suitable for immediate
inclusion in JavaScript source code. The PSYC syntax is an evolved
derivative of RFC 822, the syntax used by HTTP and e-mail, and is therefore
limited in the kind and depth of data structures that can be represented
with it, but in exchange it is highly performant at doing just that.
So it is up to you to find out which of the three formats best fulfils your
requirements. We use PSYC for the majority of messaging where
JSON and XMPP aren't efficient and opaque enough, but we employ XML and
JSON as payloads within PSYC for data that doesn't fit the PSYC model.
For some reason all three formats are being used for messaging, although
only PSYC was actually designed for that purpose.
* Caveats
In every case we'll compare the performance of parsing and re-rendering
these messages, but consider also that the application-level processing
of an XML DOM tree is more complicated than just accessing
certain elements in a JSON data structure or PSYC variable
mapping.
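The difference in access effort can be illustrated with a small sketch. It uses Python's standard =xml.dom.minidom= and =json= modules rather than the C libraries benchmarked here, and the two sample packets are hypothetical stand-ins, not the files under packets/.

```python
import json
from xml.dom import minidom

# Hypothetical sample data for illustration only; these are not the
# benchmark's packet files.
XML_STANZA = "<presence from='juliet@example.com/balcony'><show>away</show></presence>"
JSON_PACKET = '{"presence": {"from": "juliet@example.com/balcony", "show": "away"}}'


def show_from_xml(text):
    """Walk the DOM: locate the element, then its text node, then its data."""
    doc = minidom.parseString(text)
    show_elements = doc.getElementsByTagName("show")
    if not show_elements or show_elements[0].firstChild is None:
        return None
    return show_elements[0].firstChild.data


def show_from_json(text):
    """Decode once, then use plain dictionary lookups."""
    return json.loads(text)["presence"].get("show")


if __name__ == "__main__":
    print(show_from_xml(XML_STANZA))
    print(show_from_json(JSON_PACKET))
```

Even for a single value the DOM version has to deal with element lists and text nodes, while the decoded JSON is addressed by key.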
For a speed check in real-world conditions which also considers the
complexity of processing incoming messages, we should compare
the performance of a chat client using the two protocols,
for instance by using libpurple with XMPP and PSYC accounts.
For this purpose we first need to integrate libpsyc into libpurple.
* Futures
After a month of development libpsyc is already performing pretty
well, but we presume various optimizations, like rewriting parts
in assembler, are possible.
* Appendix
** Tools used
libpsyc:
: test/testStrlen -sc 1000000 -f $file
: test/testPsycSpeed -sc 1000000 -f $file
: test/testJson -snc 1000000 -f $file
: test/testJsonGlib -snc 1000000 -f $file
xmlbench:
: parse/libxml-sax 1000000 $file
: parse/libxml 1000000 $file
: parse/rapidxml 1000000 $file
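To repeat the measurements for every packet, the command lines above can be generated in a loop. The sketch below only builds and prints them; it assumes that each tool is fed the packet file matching its syntax (.psyc, .json or .xml) and that the packet names are those of the #+INCLUDE lines in this document. The function name =benchmark_commands= and the directory layout are assumptions made for this example.

```python
# Packet base names taken from the #+INCLUDE lines of this document.
PACKETS = ["presence", "chat_msg", "activity", "user_profile"]
COUNT = "1000000"

# (program, leading arguments, takes "-f" before the file, assumed extension)
# The pairing of tools with file extensions is an assumption.
TOOLS = [
    ("test/testStrlen", ["-sc", COUNT], True, "psyc"),
    ("test/testPsycSpeed", ["-sc", COUNT], True, "psyc"),
    ("test/testJson", ["-snc", COUNT], True, "json"),
    ("test/testJsonGlib", ["-snc", COUNT], True, "json"),
    ("parse/libxml-sax", [COUNT], False, "xml"),
    ("parse/libxml", [COUNT], False, "xml"),
    ("parse/rapidxml", [COUNT], False, "xml"),
]


def benchmark_commands(packet_dir="packets"):
    """Return one argument list per (packet, tool) combination."""
    commands = []
    for name in PACKETS:
        for program, args, dash_f, ext in TOOLS:
            path = f"{packet_dir}/{name}.{ext}"
            file_args = ["-f", path] if dash_f else [path]
            commands.append([program, *args, *file_args])
    return commands


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for argv in benchmark_commands():
        print(" ".join(argv))
```

The printed lines can be pasted into a shell from the respective source trees, or each list can be handed to =subprocess.run= once the binaries have been built.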