Re: [Tracker] Performance comparison between E-D-S and Tracker's RDF store for storing contact data

From: Philip Van Hoof <philip codeminded be>
To: tracker-list gnome org
Subject: Re: [Tracker] Performance comparison between E-D-S and Tracker's RDF store for storing contact data
Date: Mon, 14 Mar 2011 20:21:52 +0100


Oh hi, I forgot to mention something important:

Note that the test for E-D-S does not first delete a possibly existing
contact. So each run appends 1000 contacts.

The test for Tracker does delete existing contacts before inserting the
new one. This means that each run replaces the 1000 contacts rather
than appending them.

This of course gave the E-D-S performance number a clear advantage (and
the numbers for both were still close).

For 1000 INSERTs without any DELETE, Tracker's RDF store finishes in
about 2.30s instead of 12s - 15s (just tested on a new test run). So the
DELETEs are indeed the biggest performance penalty (this is why INSERT
OR REPLACE is much faster).

To make the E-D-S test do something closer to what the Tracker RDF
store's test did, you would need to look up each contact and then call
e_book_remove_contact before doing the e_book_add_contact and the
e_book_commit_contact. I kinda didn't realize this while making the
test.

The lookup and the remove are expected to be bad news for E-D-S's
performance number, of course. But it's evening for me now! ;-)

I invite readers to make tests that do this, and feel free to report
back with some comparisons and numbers.


Cheers,

Philip


On Mon, 2011-03-14 at 19:57 +0100, Philip Van Hoof wrote:

Hi there,

I've done a performance analysis and comparison of Tracker's RDF store
versus E-D-S (Evolution Data Server).

I'll first make a summary of what I think is necessary for readers to
know and understand, because Tracker's RDF store and E-D-S are in fact
different products that serve similar but different purposes.

o. The VCard that I'm testing with for E-D-S isn't as complex as the
   Nepomuk-ontology based contact that the tests are saving in the RDF
   example for Tracker.

   This means that the two tests aren't perfectly comparable. But I'm
   not a VCard expert myself. And results were rather clear (for me)
   early on.

   I have attached the source code of both test programs so that you
   can match the complexity to produce better numbers.

o. Given that E-D-S is a single-purpose store for PIM data we expected
   E-D-S to perform much better than Tracker's RDF store. Tracker is of
   course not a single-purpose store. It can store anything within the
   realm of Nepomuk's ontology (which covers a lot more application and
   use-case domains than PIM).

   It is not the case that E-D-S performs much better, as the report
   below will illustrate.

o. I personally expected E-D-S to scale to large amounts of contacts.

   But a simple loop of adding 2x 1000 contacts makes Evolution's UI
   fall flat on its face. The UI doesn't respond anymore at all. The
   only thing that helps is an evolution --force-shutdown.

   The Evolution UI also makes the entire Desktop unresponsive.

   This matters for a platform like MeeGo, for example: a version of
   the Evolution UI _does_ run on MeeGo devices (making it part of E-D-S
   as a solution on that platform -- the UI can and often will be
   running, my poor batteries).

   o. For a very, very long time after the test.c 1000 contacts loop has
      finished, Evolution keeps running at 95% CPU, doing who knows what
      (draining massive amounts of power, in any case).

      This makes me conclude that the API is cheating and returning
      earlier than allowed. Maybe not everything is finished, and I
      wonder what would happen if the system were to crash. Would all
      data be guaranteed to be persistently stored on the storage
      hardware?

      Tracker's RDF store guarantees that at return of the API the data
      will be stored (it has a journal for this).

o. Both after and while the data is being entered into the store,
   Tracker's RDF store can perform more complex queries, and it allows
   queries that cross multiple use-case and application domains (deeply
   link contacts to IM, E-mail, Photos, Videos, Events, GEO locations,
   and many more classes _and_ query using the links).

   This is something that E-D-S, given that it's a single-purpose store,
   can't offer at this moment. Nor does E-D-S provide a rich query
   language like SPARQL for this purpose.

   E-D-S is more a get and set system with a rather flat query language.

   We didn't test query performance this time. Given the huge
   differences in capabilities it's probably not an interesting
   comparison to make between Tracker's RDF store and E-D-S.

   With Tracker's GraphUpdated you can build an auto-updating model
   comparable to EBookView's API. QSparql has such a model implemented
   for Qt.

o. Both E-D-S and Tracker's RDF store have at this moment the same or
   comparable security capabilities.

   With E-D-S you can have multiple addressbooks; with Tracker's RDF
   store you can have either GRAPHs or grouping with a nie:DataSource,
   or any other typical Nepomuk technique for this purpose (like adding
   an addressbook class, if necessary -- there are plenty of ways for
   this).


A few somewhat larger tests:
----------------------------

o. Tracker's RDF store with its newest INSERT OR REPLACE support:

   (INSERT OR REPLACE is not yet available in master.) For reference,
   the original test that doesn't use INSERT OR REPLACE is also run.

   The support for INSERT OR REPLACE is in branch sparql-update.

   More info on INSERT OR REPLACE here:
   http://pvanhoof.be/blog/index.php/2011/03/09/a-replace-extension-for-trackers-sparqls-update

pvanhoof lors:~/repos/gnome/tracker/master/tests/functional-tests/ipc$ ./test-insert-or-replace 
REPLACE: 1000 contacts: 12.997943
ORIGINAL: 1000 contacts: 13.850410
pvanhoof lors:~/repos/gnome/tracker/master/tests/functional-tests/ipc$ ./test-insert-or-replace 
REPLACE: 1000 contacts: 15.442745
ORIGINAL: 1000 contacts: 27.525495
pvanhoof lors:~/repos/gnome/tracker/master/tests/functional-tests/ipc$ ./test-insert-or-replace 
REPLACE: 1000 contacts: 17.257888
ORIGINAL: 1000 contacts: 27.712315
pvanhoof lors:~/repos/gnome/tracker/master/tests/functional-tests/ipc$ 

The GraphUpdated signals were emitted cleanly, and dbus-daemon was not
flooded, as GraphUpdated is engineered for these volumes of data
deltas.

More info on how GraphUpdated works here:
http://pvanhoof.be/blog/index.php/2010/08/24/trackers-new-class-signal-system-being-developed


o. And now E-D-S:

   I had to kill Evolution's UI because after the second run my entire
   Desktop computer was completely unresponsive and Evolution's shell
   was using 95% CPU. I mention this because for just 2x 1000 contacts
   this behaviour is in my opinion unacceptable.

   I know I will receive a lot of typical hate for saying this, but the
   Evolution team shouldn't feel too proud of this. In my opinion NO
   amount of input should be able to make Evolution's UI hang.

pvanhoof lors:~$ gcc test.c `pkg-config libebook-1.2 --cflags --libs`
pvanhoof lors:~$ ./a.out 
EDS 1000 contacts: 10.604454
pvanhoof lors:~$ ./a.out 
EDS 1000 contacts: 11.362855
pvanhoof lors:~$ evolution --force-shutdown
No response from Evolution -- killing the process

   That "No response from Evolution" should illustrate how bad the
   situation actually became; the process is not even responding to a
   private IPC asking it to shut down cleanly, forcing the tool to fall
   back on the kernel's KILL instead. Yikes.


Some smaller tests (yielding smaller differences too):
------------------------------------------------------

Luckily Evolution didn't die or crash on these smaller tests ...

pvanhoof lors:~/repos/gnome/tracker/master/tests/functional-tests/ipc$ ./test-insert-or-replace 
REPLACE: 100 contacts: 1.348496
ORIGINAL: 100 contacts: 2.839032
pvanhoof lors:~/repos/gnome/tracker/master/tests/functional-tests/ipc$ cd /home/pvanhoof/
pvanhoof lors:~$ gcc test.c `pkg-config libebook-1.2 --cflags --libs`
pvanhoof lors:~$ ./a.out 
EDS 100 contacts: 1.000067
pvanhoof lors:~$ ./a.out 
EDS 100 contacts: 0.886793
pvanhoof lors:~$ ./a.out 
EDS 100 contacts: 0.902376
pvanhoof lors:~$ cd ~/repos/gnome/tracker/master/tests/functional-tests/ipc
pvanhoof lors:~/repos/gnome/tracker/master/tests/functional-tests/ipc$ ./test-insert-or-replace 
REPLACE: 100 contacts: 1.375554
ORIGINAL: 100 contacts: 2.631252
pvanhoof lors:~/repos/gnome/tracker/master/tests/functional-tests/ipc$ ./test-insert-or-replace 
REPLACE: 100 contacts: 1.448647
ORIGINAL: 100 contacts: 2.700024
pvanhoof lors:~/repos/gnome/tracker/master/tests/functional-tests/ipc$ ./test-insert-or-replace 
REPLACE: 100 contacts: 1.400238
ORIGINAL: 100 contacts: 2.787517
pvanhoof lors:~/repos/gnome/tracker/master/tests/functional-tests/ipc$ 


I'd love to receive performance numbers from different people, under
different circumstances. Please feel free to use Tracker's mailing list
for posting your results.


Cheers,

Philip



-- 


Philip Van Hoof
freelance software developer
Codeminded BVBA - http://codeminded.be

