#+TITLE: libpsyc Performance Benchmarks

In this document we present the results of performance benchmarks of libpsyc compared with the JSON parsers json-c and json-glib and the XML parsers libxml2 and rapidxml.

* PSYC, JSON, XML Syntax Benchmarks

First we look at the raw performance of the PSYC syntax compared to equivalent XML and JSON encodings. We'll look at actual XMPP messaging later.

** User Profile

In this test we compare how efficiently the three syntaxes serialize typical user profile data as it might be stored in a user database. Let's start with XML:

#+INCLUDE: packets/user_profile.xml src xml

In JSON it could look like this:

#+INCLUDE: packets/user_profile.json src js

Here's a way to model this in PSYC:

#+INCLUDE: packets/user_profile.psyc src psyc

** A message with JSON-unfriendly characters

This message contains some characters which are impractical to encode in JSON. Let's see how much performance impact this has.

#+INCLUDE: packets/json-unfriendly.xml src xml
#+INCLUDE: packets/json-unfriendly.json src js
#+INCLUDE: packets/json-unfriendly.psyc src psyc

** A message with XML-unfriendly characters

The same test with characters which aren't practical in the XML syntax.

#+INCLUDE: packets/xml-unfriendly.xml src xml
#+INCLUDE: packets/xml-unfriendly.json src js
#+INCLUDE: packets/xml-unfriendly.psyc src psyc

** A message with PSYC-unfriendly strings

PSYC prefixes data with its length as soon as it exceeds certain sizes or contains certain strings. For short messages this is less efficient than scanning the values without lengths. Lengths are also harder to edit by hand.

#+INCLUDE: packets/psyc-unfriendly.xml src xml
#+INCLUDE: packets/psyc-unfriendly.json src js
#+INCLUDE: packets/psyc-unfriendly.psyc src psyc

** Packets containing binary data

We'll use a generator of random binary data to see how well the formats behave with different sizes of data.
We'll consider 7000 bytes as a possible size of an icon, 70000 for an avatar, 700000 for a photograph, 7000000 for a piece of music, 70000000 for a large project and 700000000 for the contents of a CD.

* PSYC vs XMPP Protocol Benchmarks

These tests use typical XMPP messages ("stanzas" in Jabber lingo) and compare them with equivalent JSON encodings as well as verbose and compact PSYC formats.

** A presence packet

Since presence packets are by far the dominant messaging content in the XMPP network, we'll start with one of them. Here's an example from section 4.4.2 of RFC 6121:

#+INCLUDE: packets/presence.xml src xml

And here's the same information in a JSON rendition:

#+INCLUDE: packets/presence.json src js

Here's the equivalent PSYC packet in verbose form (since it is a multicast, the individual recipients do not need to be mentioned):

#+INCLUDE: packets/presence.psyc src psyc

And the same in compact form:

#+BEGIN_SRC psyc
:c psyc://example.com/~juliet

=da 4
np
|
#+END_SRC

** An average chat message

#+INCLUDE: packets/chat_msg.xml src xml
#+INCLUDE: packets/chat_msg.json src js
#+INCLUDE: packets/chat_msg.psyc src psyc

Why doesn't PSYC have an id? Because packet counting in contexts and circuits is automatic: the packet already has a number just by being there. Also, PSYC by default doesn't mention a "resource" in XMPP terms; instead it allows for more addressing schemes than just PSYC.

** A new status update activity

Example taken from http://onesocialweb.org/spec/1.0/osw-activities.html

You could call this XML namespace hell:

#+INCLUDE: packets/activity.xml src xml

http://activitystrea.ms/head/json-activity.html proposes a JSON encoding of this. We'll have to add a routing header to it.

#+INCLUDE: packets/activity.json src js

http://about.psyc.eu/Activity suggests a PSYC mapping for activity streams. Should a "status post" be considered equivalent to a presence description announcement or just a message in the "microblogging" channel?
We'll use the latter here:

#+INCLUDE: packets/activity.psyc src psyc

* Results

Parsing time of 1 000 000 packets in milliseconds:

| input:    | PSYC   |         | JSON   |           | XML        |        |          |
| parser:   | strlen | libpsyc | json-c | json-glib | libxml sax | libxml | rapidxml |
|-----------+--------+---------+--------+-----------+------------+--------+----------|
| user prof |     55 |     608 |   4715 |     17468 |       7350 |  12377 |     2477 |
|-----------+--------+---------+--------+-----------+------------+--------+----------|
| /         | <      | >       | <      | >         | <          |        | >        |

Pure syntax comparisons above, protocol performance comparisons below:

| input:    | PSYC   |         | JSON   |           | XMPP       |        |          |
| parser:   | strlen | libpsyc | json-c | json-glib | libxml sax | libxml | rapidxml |
|-----------+--------+---------+--------+-----------+------------+--------+----------|
| presence  |     30 |     246 |   2463 |     10197 |       4997 |   7557 |     1719 |
| chat msg  |     41 |     320 |        |           |       5997 |   9777 |     1893 |
| activity  |     42 |     366 |   4666 |     16846 |      13357 |  28858 |     4419 |
|-----------+--------+---------+--------+-----------+------------+--------+----------|
| /         | <      | >       | <      | >         | <          |        | >        |

These tests were performed on a 2.53 GHz Intel(R) Core(TM)2 Duo P9500 CPU.

* Conclusions

The Internet has developed two major breeds of protocol formats. The binary ones are extremely efficient but usually not very flexible, while the plain-text ones strive for perfection in data representation and leave the path of efficiency behind.

Some protocols such as HTTP and SIP sit in between these two schools, offering both a text-based extensible syntax (it's actually easier to add a header to HTTP than to come up with a namespace for XMPP...) and the ability to deliver binary data. But these protocols do not come with native data structure support.

PSYC is a protocol that combines the compactness and efficiency of binary protocols with the extensibility of text-based protocols, and still provides enough data structuring to rarely require the use of other data formats.
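
The efficiency claim can be read straight off the results tables. As a small worked example, the C sketch below converts the totals into nanoseconds per packet (a total in milliseconds for 1 000 000 packets equals the same number in nanoseconds per packet) and into ratios against libpsyc. The helper names and the struct layout are made up for this illustration; the numbers are copied from the tables, with the empty chat msg JSON cells marked as -1.

#+BEGIN_SRC c
#include <stdio.h>

/* Each table cell is the time in milliseconds to parse this many packets. */
#define PACKETS 1000000.0

/* Hypothetical helper: total milliseconds -> nanoseconds per packet. */
static double ns_per_packet(double total_ms) {
    return total_ms * 1e6 / PACKETS;
}

/* Hypothetical helper: how many times longer a parser took than libpsyc. */
static double ratio_to_libpsyc(double libpsyc_ms, double other_ms) {
    return other_ms / libpsyc_ms;
}

/* One row of the results tables; a negative value marks an empty cell. */
struct result_row {
    const char *input;
    double libpsyc, json_c, json_glib, libxml_sax, libxml, rapidxml;
};

static const struct result_row rows[] = {
    { "user prof", 608, 4715, 17468,  7350, 12377, 2477 },
    { "presence",  246, 2463, 10197,  4997,  7557, 1719 },
    { "chat msg",  320,   -1,    -1,  5997,  9777, 1893 },
    { "activity",  366, 4666, 16846, 13357, 28858, 4419 },
};

/* Prints one ratio column entry, printing "-" for an empty table cell. */
static void print_ratio(double libpsyc_ms, double other_ms) {
    if (other_ms < 0)
        printf(" %9s", "-");
    else
        printf(" %9.1f", ratio_to_libpsyc(libpsyc_ms, other_ms));
}

/* Prints libpsyc's per-packet time and every other parser's ratio. */
static void print_ratios(void) {
    printf("%-10s %8s %9s %9s %9s %9s %9s\n", "input", "ns/pkt",
           "json-c", "json-glib", "sax", "libxml", "rapidxml");
    for (size_t i = 0; i < sizeof rows / sizeof rows[0]; i++) {
        const struct result_row *r = &rows[i];
        printf("%-10s %8.0f", r->input, ns_per_packet(r->libpsyc));
        print_ratio(r->libpsyc, r->json_c);
        print_ratio(r->libpsyc, r->json_glib);
        print_ratio(r->libpsyc, r->libxml_sax);
        print_ratio(r->libpsyc, r->libxml);
        print_ratio(r->libpsyc, r->rapidxml);
        putchar('\n');
    }
}
#+END_SRC

For the activity row, for example, this gives 366 ns per packet for libpsyc, a ratio of roughly 12 for rapidxml and roughly 79 for the non-SAX libxml parser.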
* Criticism

Are we comparing apples and oranges? Yes and no; it depends on what you need. XML is a syntax best suited for complex structured data in well-defined formats, and it is especially good for text mark-up. JSON is a syntax intended to hold arbitrarily structured data suitable for immediate inclusion in JavaScript source code. The PSYC syntax is an evolved derivative of RFC 822, the syntax used by HTTP and e-mail, and is therefore limited in the kind and depth of data structures it can represent, but in exchange it is highly performant at doing just that.

So it is up to you to find out which of the three formats best fulfils your requirements. We use PSYC for the majority of messaging, where JSON and XMPP aren't efficient and opaque enough, but we employ XML and JSON as payloads within PSYC for data that doesn't fit the PSYC model.

For some reason all three formats are being used for messaging, although only PSYC was actually designed for that purpose.

* Caveats

In every case we'll compare the performance of parsing and re-rendering these messages, but consider also that the application-level processing of an XML DOM tree is more complicated than just accessing certain elements in a JSON data structure or a PSYC variable mapping.

For a speed check under real-world conditions, which also considers the complexity of processing incoming messages, we should compare the performance of a chat client using the two protocols, for instance by using libpurple with XMPP and PSYC accounts. To this end we first need to integrate libpsyc into libpurple.

* Futures

After a month of development libpsyc is already performing pretty well, but we presume various optimizations, like rewriting parts in assembler, are possible.
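
The PSYC-unfriendly test above hinges on PSYC's optional length prefix. The rough C sketch below shows both sides of that trade-off: a renderer that adds a length only when it has to, and a parser that gets the value's extent from the length instead of scanning for a newline. It assumes the modifier layout of the PSYC syntax specification as we understand it (operator and name, then either a TAB and a newline-terminated value, or a space, a decimal length, a TAB and exactly that many bytes). The 64-byte threshold and all function names are hypothetical, only a newline is treated as a string that forces a length, and none of this is libpsyc's API.

#+BEGIN_SRC c
#include <stdio.h>
#include <string.h>

/* Illustrative cut-off only; not the value used by PSYC or libpsyc. */
#define LENGTH_THRESHOLD 64

/* A value needs a length prefix if it is long or contains a newline,
   which would otherwise be mistaken for the end of the value. */
static int needs_length(const char *value, size_t len) {
    return len > LENGTH_THRESHOLD || memchr(value, '\n', len) != NULL;
}

/* Renders ":name<TAB>value\n" or ":name <len><TAB>value\n" into buf.
   Returns the number of bytes written, or 0 if buf is too small. */
static size_t render_modifier(char *buf, size_t size, const char *name,
                              const char *value, size_t len) {
    int head = needs_length(value, len)
        ? snprintf(buf, size, ":%s %zu\t", name, len)
        : snprintf(buf, size, ":%s\t", name);
    if (head < 0 || (size_t)head + len + 1 > size)
        return 0;
    memcpy(buf + head, value, len);
    buf[head + len] = '\n';
    return (size_t)head + len + 1;
}

/* p points just past the variable name, end just past the last input
   byte. On success returns the start of the value and stores its length
   in *len; returns NULL on malformed input. */
static const char *parse_value(const char *p, const char *end, size_t *len) {
    if (p < end && *p == ' ') {
        /* Length-prefixed: read the digits; the caller can then skip the
           value without looking at its bytes (no overflow check here). */
        size_t n = 0;
        const char *digits = ++p;
        while (p < end && *p >= '0' && *p <= '9')
            n = n * 10 + (size_t)(*p++ - '0');
        if (p == digits || p >= end || *p != '\t' ||
            (size_t)(end - p - 1) < n)
            return NULL;
        *len = n;
        return p + 1;
    }
    if (p < end && *p == '\t') {
        /* Plain value: scan byte by byte for the terminating newline. */
        const char *nl = memchr(p + 1, '\n', (size_t)(end - p - 1));
        if (nl == NULL)
            return NULL;
        *len = (size_t)(nl - (p + 1));
        return p + 1;
    }
    return NULL;
}
#+END_SRC

The length costs a few bytes and an extra decision when rendering short values, which is what the PSYC-unfriendly packet exercises, but it lets a parser hand over large or binary values without inspecting them.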
* Appendix

** Tools used

libpsyc:

: test/testStrlen -sc 1000000 -f $file
: test/testPsycSpeed -sc 1000000 -f $file
: test/testJson -snc 1000000 -f $file
: test/testJsonGlib -snc 1000000 -f $file

xmlbench:

: parse/libxml-sax 1000000 $file
: parse/libxml 1000000 $file
: parse/rapidxml 1000000 $file

See also "make bench".
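
The tools above appear to parse one packet file 1 000 000 times and report the total. A minimal C sketch of such a timing loop follows, assuming a POSIX system with clock_gettime. The names parse_stub and bench_ms are hypothetical, and the stub only measures the packet length with strlen, roughly in the spirit of the strlen baseline column; it does not reproduce what the libpsyc or xmlbench programs actually do.

#+BEGIN_SRC c
#define _POSIX_C_SOURCE 199309L
#include <string.h>
#include <time.h>

/* Hypothetical stand-in for the parser under test: it only scans the
   packet once to find its length. */
static size_t parse_stub(const char *packet) {
    return strlen(packet);
}

/* Parses `packet` `count` times and returns the elapsed wall-clock time
   in milliseconds; the total number of bytes seen is stored in *bytes. */
static double bench_ms(const char *packet, long count, size_t *bytes) {
    /* Reading the pointer through a volatile variable on every iteration
       discourages the compiler from hoisting the call out of the loop. */
    const char *volatile src = packet;
    struct timespec start, stop;
    size_t total = 0;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < count; i++)
        total += parse_stub(src);
    clock_gettime(CLOCK_MONOTONIC, &stop);

    *bytes = total;
    return (double)(stop.tv_sec - start.tv_sec) * 1e3 +
           (double)(stop.tv_nsec - start.tv_nsec) / 1e6;
}
#+END_SRC

Called as bench_ms(packet, 1000000, &bytes), it returns a total in the same form as the results tables; the actual numbers depend entirely on the machine and the compiler flags.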