libpsyc Performance Benchmarks
In this document we present the results of performance benchmarks of libpsyc compared to json-c, libjson-glib, rapidxml and libxml2.

1 PSYC, JSON, XML Syntax Benchmarks

First we look at the raw performance of the PSYC syntax compared to equivalent XML and JSON encodings. We'll look at actual XMPP messaging later.

1.1 User Profile

In this test we'll compare the efficiency of the three syntaxes at serializing a typical user record as it might be stored in a database. Let's start with XML:

    <UserProfile>
      <Name>Silvio Berlusconi</Name>
      <JobTitle>Premier</JobTitle>
      <Country>I</Country>
      <Address>
        <Street>Via del Colosseo, 1</Street>
        <PostalCode>00100</PostalCode>
        <City>Roma</City>
      </Address>
      <Page>http://example.org</Page>
    </UserProfile>

In JSON this could look like this:

    ["UserProfile",{"Name":"Silvio Berlusconi","JobTitle":"Premier","Country":"I","Address":
    {"Street":"Via del Colosseo, 1","PostalCode":"00100","City":"Roma"},"Page":"http://example.org"}]

Here's a way to model this in PSYC (verbose mode):

    :_name Silvio Berlusconi
    :_title_job Premier
    :_country I
    :_address_street Via del Colosseo, 1
    :_address_code_postal 00100
    :_address_city Roma
    :_page http://example.org
    _profile_user
    |

1.2 A message with JSON-unfriendly characters

This message contains some characters which are impractical to encode in JSON. We should probably put a lot more of them inside to actually see an impact on performance. TODO

    <message from='romeo@example.net/orchard' to='juliet@example.com/balcony'>
    <body>"Neither, fair saint, if either thee dislike.", he said.
    And
    the
    rest
    is
    history.</body>
    </message>

    ["message",{"from":"romeo@example.net/orchard","to":"juliet@example.com/balcony"},
    "\"Neither, fair saint, if either thee dislike.\", he said.\nAnd\nthe\nrest\nis\nhistory."]

    :_source psyc://example.net/~romeo
    :_target psyc://example.com/~juliet

    _message
    "Neither, fair saint, if either thee dislike.", he said.
    And
    the
    rest
    is
    history.
    |

1.3 A message with XML-unfriendly characters

The same test with characters that are impractical in the XML syntax. Here, too, we should put more of them inside. TODO

    <message from='juliet@example.com/balcony' to='romeo@example.net'>
    <body>PročeŽ jsi ty, Romeo?</body>
    </message>

    ["message",{"from":"juliet@example.com/balcony","to":"romeo@example.net"},
    "Pro\u010de\u017d jsi ty, Romeo?"]

    :_source psyc://example.com/~juliet
    :_target psyc://example.net/~romeo

    _message
    PročeŽ jsi ty, Romeo?
    |

1.4 A message with PSYC-unfriendly strings

PSYC prefixes data with its length as soon as it exceeds a certain size or contains certain strings. For short messages this is less efficient than scanning the values without lengths. Also, lengths are harder to edit by hand.

    <message from='juliet@example.com/balcony' to='romeo@example.net'>
    <subject>I implore you with a pointless
    newline in a header variable</subject>
    <body>Wherefore art thou, Romeo?
    |
    And for practicing purposes we added a PSYC packet delimiter.</body>
    </message>

    ["message",{"from":"juliet@example.com/balcony","to":"romeo@example.net",
    "subject":"I implore you with a pointless\nnewline in a header variable"},
    "Wherefore art thou, Romeo?\n|\nAnd for practicing purposes we added a PSYC packet delimiter."]

    :_source psyc://example.com/~juliet
    :_target psyc://example.net/~romeo
    173
    :_subject 59 I implore you with a pointless
    newline in a header variable
    _message
    Wherefore art thou, Romeo?
    |
    And for practicing purposes we added a PSYC packet delimiter.
    |
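The two length prefixes in the PSYC packet above can be checked by hand. The following Python sketch assumes that a variable's length counts the bytes of its value, and that the packet's content length counts everything after the length line up to, but excluding, the final delimiter line. It also assumes a single separator byte between a variable's length and its value. These are our assumptions for illustration and should be checked against the PSYC syntax specification.

```python
# Recompute the length prefixes of the PSYC packet from section 1.4.
# Assumption: lengths are byte counts; one separator byte follows "59".

subject = (b"I implore you with a pointless\n"
           b"newline in a header variable")

data = (b"Wherefore art thou, Romeo?\n"
        b"|\n"
        b"And for practicing purposes we added a PSYC packet delimiter.")

# Entity header line: operator, name, length, separator, value, newline.
entity_header = (b":_subject " + str(len(subject)).encode()
                 + b" " + subject + b"\n")

# Content = entity header + method line + data + the newline before "|".
content = entity_header + b"_message\n" + data + b"\n"

print(len(subject))   # length prefix of _subject
print(len(content))   # length line that follows the routing header
```

Under these assumptions the sketch prints 59 and 173, matching the prefixes in the packet. It also shows why such packets are awkward to edit by hand: changing a single character of the subject requires updating both numbers.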

1.5 Packets containing binary data

We'll use a generator of random binary data to see how well the formats behave with different sizes of data. We'll consider 7000 bytes as a possible size of an icon, 70000 for an avatar, 700000 for a photograph, 7000000 for a piece of music, 70000000 for a large project and 700000000 for the contents of a CD.

2 PSYC vs XMPP Protocol Benchmarks

These tests use typical XMPP messages ("stanzas" in Jabber lingo) and compare them with equivalent JSON encodings and PSYC formats.

2.1 A presence packet

Since presence packets are by far the dominant messaging content in the XMPP network, we'll start with one of them. Here's an example from section 4.4.2 of RFC 6121.

    <presence from='juliet@example.com/balcony'
              to='benvolio@example.net'>
      <show>away</show>
    </presence>

And here's the same information in a JSON rendition:

    ["presence",{"from":"juliet@example.com/balcony","to":"benvolio@example.net"},{"show":"away"}]

Here's the equivalent PSYC packet in verbose mode (since it is a multicast, the individual recipients do not need to be mentioned):

    :_context psyc://example.com/~juliet

    =_degree_availability 4
    _notice_presence
    |

And this is the same message in PSYC's compact form. Since compact mode has been neither implemented nor deployed yet, you should only consider this for future projects:

    :c psyc://example.com/~juliet

    =da 4
    np
    |

2.2 An average chat message

    <message from='juliet@example.com/balcony' to='romeo@example.net' type='chat'>
    <body>Art thou not Romeo, and a Montague?</body>
    </message>

    ["message",{"from":"juliet@example.com/balcony","to":"romeo@example.net"},
    "Art thou not Romeo, and a Montague?"]

    :_source psyc://example.com/~juliet
    :_target xmpp:romeo@example.net

    _message
    Art thou not Romeo, and a Montague?
    |

One small difference: PSYC by default doesn't mention a "resource" in XMPP terms; instead, it allows for more addressing schemes than just PSYC.

2.3 A new status updated activity

This example is taken from http://onesocialweb.org/spec/1.0/osw-activities.html and could be called XML namespace hell. :-)

    <iq type='set'
        from='hamlet@denmark.lit/snsclient'
        to='hamlet@denmark.lit'
        id='osw1'>
      <pubsub xmlns='http://jabber.org/protocol/pubsub'>
        <publish node='urn:xmpp:microblog:0'>
          <item>
            <entry xmlns="http://www.w3.org/2005/Atom"
                   xmlns:activity="http://activitystrea.ms/spec/1.0/"
                   xmlns:osw="http://onesocialweb.org/spec/1.0/">
              <title>to be or not to be ?</title>
              <activity:verb>http://activitystrea.ms/schema/1.0/post</activity:verb>
              <activity:object>
                <activity:object-type>http://onesocialweb.org/spec/1.0/object/status</activity:object-type>
                <content type="text/plain">to be or not to be ?</content>
              </activity:object>
              <osw:acl-rule>
                <osw:acl-action permission="http://onesocialweb.org/spec/1.0/acl/permission/grant">
                  http://onesocialweb.org/spec/1.0/acl/action/view
                </osw:acl-action>
                <osw:acl-subject type="http://onesocialweb.org/spec/1.0/acl/subject/everyone"/>
              </osw:acl-rule>
            </entry>
          </item>
        </publish>
      </pubsub>
    </iq>

http://activitystrea.ms/head/json-activity.html proposes a JSON encoding of this. We'll have to add a routing header to it.

    ["activity",{"from":"hamlet@denmark.lit/snsclient"},{"verb":"post",
    "title":"to be or not to be ?","object":{"type":"status",
    "content":"to be or not to be ?","contentType":"text/plain"}}]

http://about.psyc.eu/Activity suggests a PSYC mapping for activity streams. Should a "status post" be considered equivalent to a presence description announcement or just a message in the "microblogging" channel? We'll use the latter here:

    :_context psyc://denmark.lit/~hamlet#_follow

    :_subject to be or not to be ?
    :_type_content text/plain
    _message
    to be or not to be ?
    |

The nice thing about XML namespaces is that by definition they can never collide, but this degree of engineering perfection causes us a lot of overhead. The PSYC approach is to just extend the name of the method: as long as people use differing method names, protocol extensions can happily exist next to each other. Uniqueness of method names cannot be guaranteed mathematically, but appending your company name is enough to make it unlikely for anyone else on earth to pick the same name. How this kind of safety is delivered when using the JSON syntax of ActivityStreams is unclear. Apparently it was no longer an important design criterion.
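One way to picture how extended method names can coexist is a dispatcher that falls back from a longer method name to its shorter prefixes. The Python sketch below is only an illustration under our own assumptions: the handler table, the function name `find_handler` and the method `_message_acme_poll` are hypothetical, and nothing in this document states that libpsyc dispatches this way.

```python
from typing import Callable, Dict, Optional

Handler = Callable[[str], str]

def find_handler(method: str, handlers: Dict[str, Handler]) -> Optional[Handler]:
    """Return the handler for the longest registered prefix of `method`.

    Prefixes are cut at underscore boundaries, so "_message_acme_poll"
    is tried as "_message_acme_poll", "_message_acme", then "_message".
    """
    name = method
    while name:
        if name in handlers:
            return handlers[name]
        # Drop the last "_word" component and try again.
        name = name.rpartition("_")[0]
    return None

# Hypothetical handler table for illustration only.
handlers: Dict[str, Handler] = {
    "_message": lambda body: "generic message: " + body,
    "_notice_presence": lambda body: "presence notice",
}

# An extension nobody else knows about still reaches the generic handler.
handler = find_handler("_message_acme_poll", handlers)
print(handler("to be or not to be ?") if handler else "unhandled")
```

With a scheme like this, a vendor-specific suffix only has to differ from other vendors' suffixes; receivers that do not know it can still treat the packet as an ordinary message.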

3 Results

Parsing time of 1 000 000 packets, in milliseconds. A simple strlen() scan of the respective message is provided for comparison. These tests were performed on a 2.53 GHz Intel(R) Core(TM)2 Duo P9500 CPU.

| message | strlen | libpsyc | json-c | json-glib | libxml sax | libxml | rapidxml |
|---|---|---|---|---|---|---|---|
| user profile | 55 | 608 | 4715 | 16503 | 7350 | 12377 | 2477 |
| psyc-unfriendly | 70 | 286 | 2892 | 12567 | 5538 | 8659 | 1896 |
| json-unfriendly | 49 | 430 | 2328 | 10006 | 5141 | 7875 | 1751 |
| xml-unfriendly | 37 | 296 | 2156 | 9591 | 5571 | 8769 | 1765 |

Pure syntax comparisons above, protocol performance comparisons below:

| message | strlen | libpsyc | libpsyc compact | json-c | json-glib | libxml sax | libxml | rapidxml |
|---|---|---|---|---|---|---|---|---|
| presence | 30 | 236 | 122 | 2463 | 10016 | 4997 | 7557 | 1719 |
| chat msg | 40 | 295 | 258 | 2147 | 9526 | 5911 | 8999 | 1850 |
| activity | 42 | 353 | 279 | 4666 | 16327 | 13357 | 28858 | 4356 |
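
Since each figure is the time in milliseconds for 1 000 000 packets, it can also be read directly as nanoseconds per packet. To compare parsers it may help to express each one as a multiple of the libpsyc time. The short Python sketch below does this for two rows copied from the tables above; the dictionary layout and the helper name `relative_to_libpsyc` are our own.

```python
# Times in ms per 1 000 000 packets, copied from the tables above.
results = {
    "user profile": {"libpsyc": 608, "json-c": 4715, "json-glib": 16503,
                     "libxml sax": 7350, "libxml": 12377, "rapidxml": 2477},
    "presence":     {"libpsyc": 236, "json-c": 2463, "json-glib": 10016,
                     "libxml sax": 4997, "libxml": 7557, "rapidxml": 1719},
}

def relative_to_libpsyc(row):
    """Return each parser's time as a multiple of the libpsyc time."""
    base = row["libpsyc"]
    return {name: round(ms / base, 1) for name, ms in row.items()}

for test, row in results.items():
    print(test, relative_to_libpsyc(row))
```

For the user profile, for example, this gives a factor of about 7.8 for json-c and about 4.1 for rapidxml.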

Parsing large amounts of binary data. For JSON and XML, base64 encoding was used. Note that the results below include only the parsing time; base64 decoding was not performed.

| payload size | strlen | libpsyc | json-c | json-glib | libxml sax | libxml | rapidxml |
|---|---|---|---|---|---|---|---|
| 7K | 978 | 77 | 18609 | 98000 | 11445 | 19299 | 8701 |
| 70K | 9613 | 77 | 187540 | 1003900 | 96209 | 167738 | 74296 |
| 700K | 95888 | 77 | 1883500 | 10616000 | 842025 | 1909428 | 729419 |
| 7M | 1347300 | 78 | 26359000 | 120810000 | 12466610 | 16751363 | 7581169 |
| 70M | 14414000 | 80 | 357010000 | 1241000000 | 169622110 | 296017820 | 75308906 |
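
Base64 represents every 3 bytes of input as 4 output characters, so the JSON and XML parsers had to scan payloads roughly a third larger than the raw binary data. The following Python sketch, using only the standard library, shows the encoded sizes for the smaller payloads. The use of `os.urandom` as the random data generator is our assumption, not necessarily what the benchmark used.

```python
import base64
import os

def base64_size(n: int) -> int:
    """Size in bytes of the padded base64 encoding of n raw bytes."""
    return 4 * ((n + 2) // 3)

for size in (7000, 70000, 700000):
    payload = os.urandom(size)           # random binary data
    encoded = base64.b64encode(payload)  # what JSON or XML would carry
    assert len(encoded) == base64_size(size)
    print(size, len(encoded), round(len(encoded) / size, 3))
```

A 7000 byte payload, for instance, becomes 9336 bytes of base64 text before any JSON or XML framing is added.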

In each case we compared the performance of parsing and re-rendering these messages, but consider also that the application-level processing of an XML DOM tree is more complicated than just accessing certain elements in a JSON data structure or PSYC variable mapping.

4 Explanations

As you can tell, the PSYC data format outpaces its rivals in every test above. The gap is most extreme when delivering binary data, as PSYC simply returns the starting point and the length of the given buffer while the other parsers have to scan for the end of the transmission. It also shows in many simpler operations, where PSYC quickly figures out where the data starts and ends and passes that information back to the application, while the other formats are forced to generate a copy of the data in order to process possibly embedded special character sequences. PSYC essentially operates like a binary data protocol even though it is actually text-based.
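
The difference described above can be sketched in a few lines of Python. This is not libpsyc's code or API: the function names `read_length_prefixed` and `read_quoted` are hypothetical, and the framing and escape rules are simplified to show the principle. With a length prefix the parser can hand back a view into the input buffer without inspecting the data, whereas a quoted, escaped string must be scanned byte by byte and copied.

```python
def read_length_prefixed(buf: bytes, pos: int = 0) -> memoryview:
    """Parse '<decimal length> <data>' and return a zero-copy view of data."""
    sep = buf.index(b" ", pos)           # only the short length field is scanned
    length = int(buf[pos:sep])
    start = sep + 1
    return memoryview(buf)[start:start + length]

def read_quoted(buf: bytes, pos: int = 0) -> bytes:
    """Parse a double-quoted string with backslash escapes into a new buffer.

    Simplification: a backslash just makes the next byte literal.
    """
    if buf[pos:pos + 1] != b'"':
        raise ValueError("expected opening quote")
    out = bytearray()
    i = pos + 1
    while i < len(buf) and buf[i:i + 1] != b'"':   # every byte is inspected
        if buf[i:i + 1] == b"\\":
            i += 1                       # skip the backslash, keep the next byte
        out += buf[i:i + 1]
        i += 1
    if i >= len(buf):
        raise ValueError("unterminated string")
    return bytes(out)

binary = b'a"b\\c'                        # five bytes: a " b \ c
print(bytes(read_length_prefixed(b"5 " + binary)))
print(read_quoted(b'"a\\"b\\\\c"'))
```

Both calls recover the same five bytes, but the first does a constant amount of work regardless of payload size, which is consistent with the nearly flat libpsyc column in the binary data table.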

5 Criticism

Are we comparing apples and oranges? Yes and no, depending on what you need. XML is a syntax best suited for complex structured data in well-defined formats, and especially good for text mark-up. JSON is a syntax intended to hold arbitrarily structured data suitable for immediate inclusion in JavaScript source code. The PSYC syntax is an evolved derivative of RFC 822, the syntax used by HTTP and e-mail. It is currently limited in the kind and depth of data structures that can be represented with it, but it is highly efficient in exchange.

In fact we are currently looking into suitable syntax extensions to represent generic structures and semantic signatures, but for now PSYC only provides for simple typed values and lists of typed values.

6 Ease of Implementation

Another aspect is the availability of these formats for spontaneous use. You could generate and parse JSON yourself, but you have to be careful about escaping. XML can be rendered manually if you know your data will not break the syntax, but you shouldn't dare to parse it without a bulletproof parser. PSYC is easy to render and parse yourself for simple tasks, as long as the body does not contain "\n|\n" and your variables do not contain newlines.
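
As an illustration of how little is needed for simple cases, here is a minimal Python sketch of a renderer and parser for packets like the ones shown earlier. It is not libpsyc and does not implement the full syntax: it handles only the ":" operator and no length prefixes, the function names `render` and `parse` are hypothetical, and a single space is assumed between variable name and value as printed in this document (the specification may call for a different separator). It refuses exactly the inputs named above.

```python
def render(routing: dict, entity: dict, method: str, body: str) -> str:
    """Render a simple packet; reject inputs that would need length prefixes."""
    for value in list(routing.values()) + list(entity.values()):
        if "\n" in value:
            raise ValueError("variable values must not contain newlines")
    if "\n|\n" in "\n" + body + "\n":
        raise ValueError("body must not contain a packet delimiter line")
    lines = [f":{name} {value}" for name, value in routing.items()]
    lines.append("")                      # blank line ends the routing header
    lines += [f":{name} {value}" for name, value in entity.items()]
    lines += [method, body, "|"]
    return "\n".join(lines) + "\n"

def parse(packet: str):
    """Inverse of render() for packets it produced."""
    lines = packet.split("\n")
    routing, entity = {}, {}
    while lines[0] != "":                 # routing header up to the blank line
        name, value = lines.pop(0)[1:].split(" ", 1)
        routing[name] = value
    lines.pop(0)                          # drop the blank line
    while lines[0].startswith(":"):       # entity header before the method
        name, value = lines.pop(0)[1:].split(" ", 1)
        entity[name] = value
    method = lines.pop(0)
    body = "\n".join(lines[:lines.index("|")])
    return routing, entity, method, body

routing = {"_source": "psyc://example.com/~juliet",
           "_target": "xmpp:romeo@example.net"}
packet = render(routing, {}, "_message", "Art thou not Romeo, and a Montague?")
print(packet)
print(parse(packet))
```

The usage lines reproduce the chat message from section 2.2 and read it back. Anything that trips the two checks in `render` is a case where a real implementation would switch to length prefixes instead of raising an error.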

7 Conclusions

In the end, it is up to you to find out which format best fulfils your requirements. We use PSYC for the majority of messaging where JSON and XMPP aren't efficient and opaque enough, but we employ XML and JSON as payloads within PSYC for data that doesn't fit the PSYC model. For some reason all three formats are being used for messaging, although only PSYC was actually designed for that purpose.

The Internet has developed two major breeds of protocol formats. The binary ones are extremely efficient, but in most cases you have to recompile all instances each time you change something, while the plain-text ones strive for perfection in data representation at the expense of efficiency. Some protocols such as HTTP and SIP sit between these two schools, offering both a text-based extensible syntax (it's actually easier to add a header to HTTP than to come up with a namespace for XMPP…) and the ability to deliver binary data. But these protocols do not come with native data structure support. PSYC is a protocol that combines the compactness and efficiency of binary protocols with the extensibility of text-based protocols and still provides enough data structuring to rarely require the use of other data formats.

8 Futures

After a month of development libpsyc is already performing pretty well, but we presume various optimizations, like rewriting parts in assembler, are possible.

9 Related Work

If this didn't help, you can also look into:

- Adobe AMF
- ASN.1
- BSON
- Cisco Etch
- Efficient XML
- Facebook Thrift
- Google Protocol Buffers

The drawback of these binary formats is that, unlike PSYC, JSON and XML, you can't edit them manually and you can't produce valid messages by replacing variables in a simple text template. You depend on specialized parsers and renderers being provided.

There's also

- Bittorrent's bencode

This format is formally text-based, but it is not easy to read as it doesn't have any visual separators, and it isn't easy to edit as everything is prefixed by lengths, even very short items.

10 Further Reading

http://about.psyc.eu/Spec:Syntax provides you with the ABNF grammar of the PSYC 1.0 syntax. You may also be interested in PSYC's decentralized state mechanism provided by the +/-/= operators.

See http://about.psyc.eu/XML and http://about.psyc.eu/JSON for more biased information on the respective formats.

11 Appendix

11.1 Tools used

This document and its benchmarks are distributed with libpsyc. See http://about.psyc.eu/libpsyc on how to obtain it.

The benchmarks can be run with the following command (xmlbench is needed for the XML tests):

    make bench