diff --git a/README.org b/README.org
index 4226b71..4e05131 100644
--- a/README.org
+++ b/README.org
@@ -44,6 +44,27 @@ For more information see the API documentation at [[http://lib.psyc.eu/doc/]].
 : test/ # test sourcecodes and test input files
 : d/ # the D binding
 
+* Compiling libpsyc
+
+On GNU systems, type
+: make
+
+On other systems install GNU make, then type
+: gmake
+
+and you are done.
+Other possible targets include:
+
+: make install [prefix=/usr] # install into prefix
+: make diet # compile with diet libc
+: make test # compile and run the tests
+: make doc # generate the API documentation (will be put in the doc folder)
+: make help # display the possible targets
+
+* Requirements
+
+libpsyc is currently known to compile under Linux and SunOS.
+
 * Contact
 
 If you created new bindings, patches or anything other that you think should go
diff --git a/bench/.gitignore b/bench/.gitignore
index fd2f6c3..71da57f 100644
--- a/bench/.gitignore
+++ b/bench/.gitignore
@@ -1,4 +1,4 @@
-*.html
-*.pdf
+# *.html
+# *.pdf
 results/
 packets/binary/[0-9]*
diff --git a/bench/Makefile b/bench/Makefile
index cecdfe3..67be473 100644
--- a/bench/Makefile
+++ b/bench/Makefile
@@ -10,11 +10,12 @@ INIT = (setq load-path (cons \"/usr/share/emacs/site-lisp/org-mode\" load-path) #'
 
 ORG = benchmark.org
 
-html:
+it:
 	for f in ${ORG}; do \
 	emacs -Q --batch --eval \
 	"(progn ${INIT} (find-file \"$$f\") \
-	(org-export-as-html-batch) (kill-buffer))"; \
+	(org-html-export-to-html) (kill-buffer))"; \
+# (org-export-as-html-batch) (kill-buffer))";
 	done
 
 pdf:
diff --git a/bench/benchmark.html b/bench/benchmark.html
new file mode 100644
index 0000000..537c7da
--- /dev/null
+++ b/bench/benchmark.html
@@ -0,0 +1,1104 @@
+libpsyc Performance Benchmarks
+

libpsyc Performance Benchmarks

+
+


+ +
+ + +

+In this document we present the results of performance benchmarks +of libpsyc compared to json-c, libjson-glib, rapidxml and libxml2. +

+ +
+

1 PSYC, JSON, XML Syntax Benchmarks

+
+

+First we look at the mere performance of the PSYC syntax +compared to equivalent XML and JSON encodings. We'll +look at actual XMPP messaging later. +

+
+ +
+

1.1 User Profile

+
+

+In this test we'll compare the efficiency of the three
+syntaxes at serializing a typical user record as it
+might be stored in a database. Let's start with XML:
+

+ +
+ +
<UserProfile>
+    <Name>Silvio Berlusconi</Name>
+    <JobTitle>Premier</JobTitle>
+    <Country>I</Country>
+    <Address>
+	    <Street>Via del Colosseo, 1</Street>
+	    <PostalCode>00100</PostalCode>
+	    <City>Roma</City>
+    </Address>
+    <Page>http://example.org</Page>
+</UserProfile>
+
+
+ +

+In JSON this could look like this: +

+ +
+ +
["UserProfile",{"Name":"Silvio Berlusconi","JobTitle":"Premier","Country":"I","Address":
+{"Street":"Via del Colosseo, 1","PostalCode":"00100","City":"Roma"},"Page":"http://example.org"}]
+
+
+ +

+Here's a way to model this in PSYC (verbose mode): +

+ +
+ +
:_name	Silvio Berlusconi
+:_title_job	Premier
+:_country	I
+:_address_street	Via del Colosseo, 1
+:_address_code_postal	00100
+:_address_city	Roma
+:_page	http://example.org
+_profile_user
+|
+
+
+
+
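The PSYC rendition above is simple enough to produce with a few lines of code. The following is a minimal sketch, not libpsyc's API: the function name `render_packet` is hypothetical, and it assumes that values contain no newlines and that no length prefixes are needed, which holds for this profile.

```python
def render_packet(variables, method, body=""):
    """Render a simple PSYC-style packet (hypothetical helper, no length prefixes)."""
    lines = []
    for name, value in variables.items():
        if "\n" in value:
            # Values with newlines would need a length prefix; out of scope here.
            raise ValueError(f"value of {name} contains a newline")
        lines.append(f":{name}\t{value}")
    lines.append(method)
    if body:
        if "|" in body.split("\n"):
            # A line consisting only of "|" would end the packet prematurely.
            raise ValueError("body contains a packet delimiter line")
        lines.append(body)
    lines.append("|")
    return "\n".join(lines) + "\n"


profile = {
    "_name": "Silvio Berlusconi",
    "_title_job": "Premier",
    "_country": "I",
    "_address_street": "Via del Colosseo, 1",
    "_address_code_postal": "00100",
    "_address_city": "Roma",
    "_page": "http://example.org",
}
packet = render_packet(profile, "_profile_user")
```

Note that the variable names are chosen by hand, as in the example above; nothing here derives `_title_job` from `JobTitle` automatically.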
+ +
+

1.2 A message with JSON-unfriendly characters

+
+

+This message contains some characters which are +impractical to encode in JSON. We should probably +put a lot more inside to actually see an impact +on performance. TODO +

+ +
+ +
<message from='romeo@example.net/orchard' to='juliet@example.com/balcony'>
+<body>"Neither, fair saint, if either thee dislike.", he said.
+And
+the
+rest
+is
+history.</body>
+</message>
+
+
+
+ +
["message",{"from":"romeo@example.net/orchard","to":"juliet@example.com/balcony"},
+"\"Neither, fair saint, if either thee dislike.\", he said.\nAnd\nthe\nrest\nis\nhistory."]
+
+
+
+ +
:_source	psyc://example.com/~romeo
+:_target	psyc://example.net/~juliet
+
+_message
+"Neither, fair saint, if either thee dislike.", he said.
+And
+the
+rest
+is
+history.
+|
+
+
+
+
+ +
+

1.3 A message with XML-unfriendly characters

+
+

+Same test with characters which aren't practical +in the XML syntax, yet we should put more of +them inside. TODO +

+ +
+ +
<message from='juliet@example.com/balcony' to='romeo@example.net'>
+<body>Pro&#x010D;e&#x017D; jsi ty, Romeo?</body>
+</message>
+
+
+
+ +
["message",{"from":"juliet@example.com/balcony","to":"romeo@example.net"},
+"Pro\u010de\u017d jsi ty, Romeo?"]
+
+
+
+ +
:_source	psyc://example.com/~juliet
+:_target	psyc://example.net/~romeo
+
+_message
+ Pro&#x010D;e&#x017D; jsi ty, Romeo?
+|
+
+
+
+
+ +
+

1.4 A message with PSYC-unfriendly strings

+
+

+PSYC prefixes data with length as soon as it +exceeds certain sizes or contains certain strings. +In the case of short messages this is less +efficient than scanning the values without lengths. +Also, lengths are harder to edit by hand. +

+ +
+ +
<message from='juliet@example.com/balcony' to='romeo@example.net'>
+<subject>I implore you with a pointless
+newline in a header variable</subject>
+<body>Wherefore art thou, Romeo?
+|
+And for practicing purposes we added a PSYC packet delimiter.</body>
+</message>
+
+
+
+ +
["message",{"from":"juliet@example.com/balcony","to":"romeo@example.net",
+"subject":"I implore you with a pointless\nnewline in a header variable"},
+"Wherefore art thou, Romeo?\n|\nAnd for practicing purposes we added a PSYC packet delimiter."]
+
+
+
+ +
:_source	psyc://example.com/~juliet
+:_target	psyc://example.net/~romeo
+173
+:_subject 59	I implore you with a pointless
+newline in a header variable
+_message
+Wherefore art thou, Romeo?
+|
+And for practicing purposes we added a PSYC packet delimiter.
+|
+
+
+
+
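The length prefix on `_subject` in the PSYC packet above can be reproduced with a short sketch. This is an illustration under assumptions, not libpsyc code: the helper name `render_variable` is hypothetical, the only trigger handled is an embedded newline (the size threshold mentioned above is not stated in this document, so it is left out), and the length is assumed to be counted in bytes of the UTF-8 encoded value.

```python
def render_variable(name, value, operator=":"):
    """Render one PSYC-style variable line, adding a length prefix when needed.

    Hypothetical helper: only an embedded newline triggers the prefix here.
    """
    data = value.encode("utf-8")
    if b"\n" in data:
        # Assumed layout, following the example: operator, name, space, length, tab, value.
        return f"{operator}{name} {len(data)}\t{value}"
    return f"{operator}{name}\t{value}"


subject = "I implore you with a pointless\nnewline in a header variable"
line = render_variable("_subject", subject)
```

For the subject used in this test the value is 59 bytes long, which matches the `59` shown in the packet.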
+ +
+

1.5 Packets containing binary data

+
+

+We'll use a generator of random binary data to
+see how well the formats behave with different
+sizes of data. We'll consider 7000 bytes as a possible
+size of an icon, 70000 for an avatar, 700000
+for a photograph, 7000000 for a piece of music,
+70000000 for a large project and
+700000000 for the contents of a CD.
+

+
+
+
+ +
+

2 PSYC vs XMPP Protocol Benchmarks

+
+

+These tests use typical XMPP messages ("stanzas" in
+Jabber lingo) and compare them with equivalent JSON encodings
+and PSYC formats.
+

+
+ +
+

2.1 A presence packet

+
+

+Since presence packets are by far the dominant messaging content
+in the XMPP network, we'll start with one of them.
+Here's an example from section 4.4.2 of RFC 6121.
+

+ +
+ +
<presence from='juliet@example.com/balcony'
+            to='benvolio@example.net'>
+	<show>away</show>
+</presence>
+
+
+ +

+And here's the same information in a JSON rendition: +

+ +
+ +
["presence",{"from":"juliet@example.com/balcony","to":"benvolio@example.net"},{"show":"away"}]
+
+
+ +

+Here's the equivalent PSYC packet in verbose mode +(since it is a multicast, the single recipients do not +need to be mentioned): +

+ +
+ +
:_context	psyc://example.com/~juliet
+
+=_degree_availability	4
+_notice_presence
+|
+
+
+ +

+And this is the same message in PSYC's compact form. Since compact mode
+has been neither implemented nor deployed yet, you should only consider this
+for future projects:
+

+ +
+ +
:c	psyc://example.com/~juliet
+
+=da	4
+np
+|
+
+
+
+
+ +
+

2.2 An average chat message

+
+
+ +
<message from='juliet@example.com/balcony' to='romeo@example.net' type='chat'>
+<body>Art thou not Romeo, and a Montague?</body>
+</message>
+
+
+
+ +
["message",{"from":"juliet@example.com/balcony","to":"romeo@example.net"},
+"Art thou not Romeo, and a Montague?"]
+
+
+
+ +
:_source	psyc://example.com/~juliet
+:_target	xmpp:romeo@example.net
+
+_message
+Art thou not Romeo, and a Montague?
+|
+
+
+ +

+A small difference: by default PSYC doesn't mention a "resource" in XMPP terms.
+Instead it allows for more addressing schemes than just PSYC.
+

+
+
+ +
+

2.3 A new status update activity

+
+

+Example taken from http://onesocialweb.org/spec/1.0/osw-activities.html.
+You could call this XML namespace hell. :-)
+

+ +
+ +
<iq type='set'
+    from='hamlet@denmark.lit/snsclient'
+    to='hamlet@denmark.lit'
+    id='osw1'>
+ <pubsub xmlns='http://jabber.org/protocol/pubsub'>
+    <publish node='urn:xmpp:microblog:0'>
+      <item>
+        <entry xmlns="http://www.w3.org/2005/Atom" 
+               xmlns:activity="http://activitystrea.ms/spec/1.0/" 
+               xmlns:osw="http://onesocialweb.org/spec/1.0/">
+          <title>to be or not to be ?</title>
+          <activity:verb>http://activitystrea.ms/schema/1.0/post</activity:verb>
+          <activity:object>
+            <activity:object-type>http://onesocialweb.org/spec/1.0/object/status</activity:object-type>
+            <content type="text/plain">to be or not to be ?</content>
+          </activity:object>
+          <osw:acl-rule>
+            <osw:acl-action permission="http://onesocialweb.org/spec/1.0/acl/permission/grant">
+              http://onesocialweb.org/spec/1.0/acl/action/view
+            </osw:acl-action>
+            <osw:acl-subject type="http://onesocialweb.org/spec/1.0/acl/subject/everyone"/>
+          </osw:acl-rule>
+        </entry>
+      </item>
+    </publish>
+  </pubsub>
+</iq>
+
+
+ +

+http://activitystrea.ms/head/json-activity.html proposes a JSON encoding +of this. We'll have to add a routing header to it. +

+ +
+ +
["activity",{"from":"hamlet@denmark.lit/snsclient"},{"verb":"post",
+"title":"to be or not to be ?","object":{"type":"status",
+"content":"to be or not to be ?","contentType":"text/plain"}}]
+
+
+ +

+http://about.psyc.eu/Activity suggests a PSYC mapping for activity +streams. Should a "status post" be considered equivalent to a presence +description announcement or just a message in the "microblogging" channel? +We'll use the latter here: +

+ +
+ +
:_context	psyc://denmark.lit/~hamlet#_follow
+
+:_subject	to be or not to be ?
+:_type_content	text/plain
+_message
+to be or not to be ?
+|
+
+
+ +

+The nice thing about XML namespaces is that by definition they can never collide,
+but this degree of engineering perfection causes a lot of overhead.
+The PSYC approach is to just extend the name of the method: as long as
+people use differing method names, protocol extensions can happily exist next
+to each other. Uniqueness of method names cannot be mathematically ensured,
+but appending your company name is enough to make it unlikely for anyone
+else on earth to pick the same name. How this kind of safety is delivered
+when using the JSON syntax of ActivityStreams is unclear. Apparently it was
+no longer an important design criterion.
+

+
+
+
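Extending a method name, as described in section 2.3, can be sketched as follows. Both helper names are hypothetical, and the fallback rule is an assumption made for illustration: a receiver that does not know the extended name is assumed to retry with trailing components stripped. This document itself only states that names are extended.

```python
def extend_method(base, suffix):
    """Build an extended method name, e.g. by appending a company name."""
    return f"{base}_{suffix}"


def find_handler(method, handlers):
    """Look up a handler, falling back to shorter prefixes of the method name.

    Assumption: an unknown "_message_examplecorp" is treated like "_message".
    """
    name = method
    while name:
        if name in handlers:
            return handlers[name]
        # Drop the last "_component" and try again.
        name = name.rpartition("_")[0]
    return None


handlers = {"_message": "show as chat message"}
extended = extend_method("_message", "examplecorp")
result = find_handler(extended, handlers)
```

Two vendors that pick different suffixes get different method names, which is the whole collision-avoidance mechanism described above.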
+ +
+

3 Results

+
+

+Parsing time of 1 000 000 packets, in milliseconds. +A simple strlen() scan of the respective message is provided for comparison. +These tests were performed on a 2.53 GHz Intel(R) Core(TM)2 Duo P9500 CPU. +

+ + + + +++ + +++ ++ + +++ ++ + +++ ++ ++ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
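|                 | strlen | libpsyc | json-c | json-glib | libxml sax | libxml | rapidxml |
|-----------------+--------+---------+--------+-----------+------------+--------+----------|
| user profile    |     55 |     608 |   4715 |     16503 |       7350 |  12377 |     2477 |
| psyc-unfriendly |     70 |     286 |   2892 |     12567 |       5538 |   8659 |     1896 |
| json-unfriendly |     49 |     430 |   2328 |     10006 |       5141 |   7875 |     1751 |
| xml-unfriendly  |     37 |     296 |   2156 |      9591 |       5571 |   8769 |     1765 |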
+ +

+Pure syntax comparisons above, protocol performance comparisons below: +

+ + + + +++ + +++ ++ ++ + +++ ++ + +++ ++ ++ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
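|          | strlen | libpsyc | libpsyc compact | json-c | json-glib | libxml sax | libxml | rapidxml |
|----------+--------+---------+-----------------+--------+-----------+------------+--------+----------|
| presence |     30 |     236 |             122 |   2463 |     10016 |       4997 |   7557 |     1719 |
| chat msg |     40 |     295 |             258 |   2147 |      9526 |       5911 |   8999 |     1850 |
| activity |     42 |     353 |             279 |   4666 |     16327 |      13357 |  28858 |     4356 |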
+ +

+Parsing large amounts of binary data. For JSON & XML base64 encoding was used. +Note that the results below include only the parsing time, base64 decoding was +not performed. +

+ + + + +++ + +++ ++ + +++ ++ + +++ ++ ++ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
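|      |   strlen | libpsyc |    json-c |  json-glib | libxml sax |    libxml | rapidxml |
|------+----------+---------+-----------+------------+------------+-----------+----------|
| 7K   |      978 |      77 |     18609 |      98000 |      11445 |     19299 |     8701 |
| 70K  |     9613 |      77 |    187540 |    1003900 |      96209 |    167738 |    74296 |
| 700K |    95888 |      77 |   1883500 |   10616000 |     842025 |   1909428 |   729419 |
| 7M   |  1347300 |      78 |  26359000 |  120810000 |   12466610 |  16751363 |  7581169 |
| 70M  | 14414000 |      80 | 357010000 | 1241000000 |  169622110 | 296017820 | 75308906 |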
+ +
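Because JSON and XML carry the binary payload as base64, their parsers also have more bytes to scan than the raw size suggests. The sketch below works out that expansion for the smaller payload sizes; the helper name `base64_size` is ours, and `os.urandom` merely stands in for whatever random data generator the benchmark used.

```python
import base64
import os


def base64_size(raw_size):
    """Length of the padded base64 encoding of raw_size bytes (4 chars per 3 bytes)."""
    return 4 * ((raw_size + 2) // 3)


sizes = {}
for raw_size in (7000, 70000, 700000):
    payload = os.urandom(raw_size)          # random binary test data
    encoded = base64.b64encode(payload)     # what a JSON or XML packet would carry
    assert len(encoded) == base64_size(raw_size)
    sizes[raw_size] = len(encoded)
```

The encoded form is about a third larger than the raw data, on top of the decoding step that the figures above do not include.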

+In each case we compared performance of parsing and re-rendering +these messages, but consider also that the applicative processing +of an XML DOM tree is more complicated than just accessing +certain elements in a JSON data structure or PSYC variable mapping. +

+
+
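The kind of measurement reported in this section can be approximated with a small timing loop. This is only a sketch of the method: it uses Python's standard `json` module rather than json-c or libpsyc, the function name `time_parses` is hypothetical, and the resulting numbers are not comparable to the tables above. A plain `len()` call plays the role of the strlen() baseline.

```python
import json
import time

# The JSON chat message from section 2.2, joined onto one line.
CHAT_JSON = (
    '["message",{"from":"juliet@example.com/balcony","to":"romeo@example.net"},'
    '"Art thou not Romeo, and a Montague?"]'
)


def time_parses(parse, packet, count):
    """Return the wall-clock time in milliseconds for `count` calls of parse(packet)."""
    start = time.perf_counter()
    for _ in range(count):
        parse(packet)
    return (time.perf_counter() - start) * 1000.0


COUNT = 100_000  # the benchmark above used 1 000 000 packets
baseline_ms = time_parses(len, CHAT_JSON, COUNT)
json_ms = time_parses(json.loads, CHAT_JSON, COUNT)
```

Running the same packet many times in a tight loop favors parsers that benefit from warm caches, so treat any such figure as a relative indication only.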
+ +
+

4 Explanations

+
+

+As you can tell, the PSYC data format outpaces its rivals in every test above.
+The gap is largest when delivering binary data: PSYC simply returns the starting
+point and the length of the given buffer, while the other parsers have to scan
+for the end of the transmission. It also wins on many simpler operations:
+PSYC quickly figures out where the data starts and ends and passes that
+information back to the application, while the other formats are forced to
+generate a copy of the data in order to process possibly embedded special
+character sequences. PSYC essentially operates like a binary data protocol
+even though it is actually text-based.
+

+
+
+ +
+

5 Criticism

+
+

+Are we comparing apples and oranges? Yes and no, it depends on what you
+need. XML is a syntax best suited for complex structured data in
+well-defined formats - especially good for text mark-up. JSON is a syntax
+intended to hold arbitrarily structured data suitable for immediate
+inclusion in JavaScript source code. The PSYC syntax is an evolved
+derivative of RFC 822, the syntax used by HTTP and E-Mail. It is currently
+limited in the kind and depth of data structures that can be represented
+with it, but it is highly efficient in exchange.
+

+ +

+In fact we are currently looking into suitable syntax extensions to represent +generic structures and semantic signatures, but for now PSYC only +provides for simple typed values and lists of typed values. +

+
+
+ +
+

6 Ease of Implementation

+
+

+Another aspect is the availability of these formats for spontaneous
+use. You could generate and parse JSON yourself, but you have to be
+careful about escaping. XML can be rendered manually if you know your
+data will not break the syntax, but you shouldn't dare to parse it without
+a bulletproof parser. PSYC is easy to render and parse yourself for
+simple tasks, as long as the body does not contain "\n|\n" and your
+variables do not contain newlines.
+

+
+
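To illustrate how little is needed for the simple case, here is a naive parser sketch. It is not libpsyc and makes several assumptions: the function name `parse_simple_packet` is hypothetical, only the operators `:`, `=`, `+` and `-` are recognized, empty lines before the method are skipped, length prefixes are not supported, and the restrictions named above apply (no "\n|\n" in the body, no newlines in variables).

```python
def parse_simple_packet(text):
    """Parse one simple PSYC-style packet into (variables, method, body).

    Naive sketch: no length prefixes, no newlines inside variable values.
    """
    variables = {}
    method = None
    body_lines = []
    for line in text.split("\n"):
        if method is None:
            if line == "":
                continue                      # separator between header sections
            if line == "|":
                break                         # packet without a method
            if line[0] in ":=+-":
                # Operator character, then name and value separated by a tab.
                name, _, value = line[1:].partition("\t")
                variables[name] = value
            elif line.startswith("_"):
                method = line
            else:
                raise ValueError(f"unexpected line: {line!r}")
        else:
            if line == "|":
                break                         # end of packet
            body_lines.append(line)
    return variables, method, "\n".join(body_lines)


chat = (
    ":_source\tpsyc://example.com/~juliet\n"
    ":_target\txmpp:romeo@example.net\n"
    "\n"
    "_message\n"
    "Art thou not Romeo, and a Montague?\n"
    "|\n"
)
variables, method, body = parse_simple_packet(chat)
```

The sketch discards the operator and treats all variables alike; a real parser would have to keep the distinction between routing and entity variables and honor length prefixes.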
+ +
+

7 Conclusions

+
+

+In the end it is up to you to find out which format fulfils your
+requirements best. We use PSYC for the majority of messaging where
+JSON and XMPP aren't efficient and opaque enough, but we employ XML and
+JSON as payloads within PSYC for data that doesn't fit the PSYC model.
+For some reason all three formats are being used for messaging, although
+only PSYC was actually designed for that purpose.
+

+ +

+The Internet has developed two major breeds of protocol formats.
+The binary ones are extremely efficient, but in most cases you have
+to recompile all instances each time you change something,
+while the plain-text ones strive for perfection
+in data representation at the expense of efficiency. Some
+protocols such as HTTP and SIP are in-between these two schools,
+offering both a text-based extensible syntax (it's actually easier to
+add a header to HTTP than to come up with a namespace for XMPP…)
+and the ability to deliver binary data. But these protocols do not
+come with native data structure support. PSYC is a protocol that
+combines the compactness and efficiency of binary protocols with the
+extensibility of text-based protocols and still provides for enough
+data structuring to rarely require the use of other data formats.
+

+
+
+ +
+

8 Futures

+
+

+After a month of development libpsyc is already performing pretty +well, but we presume various optimizations, like rewriting parts +in assembler, are possible. +

+
+
+ +
+

9 Related Work

+
+

+If this didn't help, you can also look into: +

+ +
    +
  • Adobe AMF +
  • +
  • ASN.1 +
  • +
  • BSON +
  • +
  • Cisco Etch +
  • +
  • Efficient XML +
  • +
  • Facebook Thrift +
  • +
  • Google Protocol Buffers +
  • +
+ +

+The drawback of these binary formats is that, unlike PSYC, JSON and XML,
+you can't edit them manually and you can't produce valid messages
+by replacing variables in a simple text template. You depend on
+specialized parsers and renderers being provided.
+

+ +

+There's also +

+ +
    +
  • Bittorrent's bencode +
  • +
+ +

+This format is formally text-based, but not easy to read as it doesn't +have any visual separators and isn't easy to edit as everything is +prefixed by lengths even for very short items. +

+
+
+ +
+

10 Further Reading

+
+

+http://about.psyc.eu/Spec:Syntax provides you with the ABNF grammar +of the PSYC 1.0 syntax. You may also be interested in PSYC's decentralized +state mechanism provided by the +/-/= operators. +

+ +

+See http://about.psyc.eu/XML and http://about.psyc.eu/JSON for more +biased information on the respective formats. +

+
+
+ +
+

11 Appendix

+
+
+

11.1 Tools used

+
+

+This document and its benchmarks are distributed with libpsyc. +See http://about.psyc.eu/libpsyc on how to obtain it. +

+ +

+The benchmarks can be run with the following command +(xmlbench is needed for the xml tests): +

+ +
+make bench
+
+
+
+
+
+
+

Created: 2015-08-14 Fri 10:43

+

Emacs 24.4.1 (Org mode 8.2.6)

+

Validate

+
+
+
diff --git a/bench/benchmark.org b/bench/benchmark.org
index 05bcf77..affdc13 100644
--- a/bench/benchmark.org
+++ b/bench/benchmark.org
@@ -1,4 +1,5 @@
 #+TITLE: libpsyc Performance Benchmarks
+#+HTML:
 
 In this document we present the results of performance benchmarks
 of libpsyc compared to json-c, libjson-glib, rapidxml and libxml2.
diff --git a/src/memmem.c b/src/memmem.c
index 3387c38..609e656 100644
--- a/src/memmem.c
+++ b/src/memmem.c
@@ -1,23 +1,3 @@
-/*
-  This file is part of libpsyc.
-  Copyright (C) 2011,2012 Carlo v. Loesch, Gabor X Toth, Mathias L. Baumann,
-  and other contributing authors.
-
-  libpsyc is free software: you can redistribute it and/or modify it under the
-  terms of the GNU Affero General Public License as published by the Free
-  Software Foundation, either version 3 of the License, or (at your option) any
-  later version. As a special exception, libpsyc is distributed with additional
-  permissions to link libpsyc libraries with non-AGPL works.
-
-  libpsyc is distributed in the hope that it will be useful, but WITHOUT
-  ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
-  FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more
-  details.
-
-  You should have received a copy of the GNU Affero General Public License and
-  the linking exception along with libpsyc in a COPYING file.
-*/
-
 /*-
  * Copyright (c) 2005 Pascal Gloor
  *