Re: [Tracker] Performance comparison between E-D-S and Tracker's RDF store for storing contact data

From: Adrien Bustany <abustany gnome org>
To: Philip Van Hoof <philip codeminded be>
Cc: tracker-list gnome org
Subject: Re: [Tracker] Performance comparison between E-D-S and Tracker's RDF store for storing contact data
Date: Tue, 15 Mar 2011 00:24:10 +0200

Quick follow-up, using qtcontacts-tracker to import a batch of 1000
contacts vs Philip's EDS test:

First run was on an empty EDS DB vs. an empty Tracker DB:
qtcontacts-tracker: 12 seconds
EDS: 190 seconds (WTF!)

The very likely answer is that I'm running btrfs, and for some reason
it does not like fsyncs (at all). EDS does not have a batching API, so
I suspect it does an fsync after each save, which would explain all the
disk thrashing.
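
To make that suspicion concrete, here is a minimal Python sketch of the
two write patterns. It is not EDS or Tracker code: the file layout, the
vCard-like records and the function names are made up for illustration,
and the timings it prints depend entirely on the filesystem.

```python
import os
import tempfile
import time


def write_records(path, records, fsync_each):
    """Write records to path and return the number of fsync calls issued."""
    fsyncs = 0
    with open(path, "wb") as f:
        for record in records:
            f.write(record)
            if fsync_each:
                # Hypothetical "one save, one fsync" pattern
                f.flush()
                os.fsync(f.fileno())
                fsyncs += 1
        if not fsync_each:
            # Batched pattern: a single fsync once everything is written
            f.flush()
            os.fsync(f.fileno())
            fsyncs += 1
    return fsyncs


def time_run(records, fsync_each):
    """Run write_records on a temporary file; return (fsyncs, seconds)."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        start = time.perf_counter()
        fsyncs = write_records(path, records, fsync_each)
        return fsyncs, time.perf_counter() - start
    finally:
        os.remove(path)


if __name__ == "__main__":
    records = [b"BEGIN:VCARD\nFN:Contact %d\nEND:VCARD\n" % i
               for i in range(1000)]
    for label, each in (("fsync per save", True),
                        ("one fsync per batch", False)):
        fsyncs, elapsed = time_run(records, each)
        print("%s: %d fsyncs, %.3f s" % (label, fsyncs, elapsed))
```

On a filesystem where fsync is expensive, the first variant should be
the slow one, since it pays that cost 1000 times instead of once.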

So I decided to run the tests again, this time LD-preloading
libeatmydata (http://www.flamingspork.com/projects/libeatmydata/) under
both EDS and Tracker:
qtcontacts-tracker: 12 seconds
EDS: 5.7 seconds

For this second run the CPU (Core 2 Duo, 2 GHz) is maxed out and no
disk activity is seen.

Note that the version of qtcontacts-tracker I ran is not the fastest we
have: Matthias worked on optimizing the queries (~20% faster IIRC) and
Philip added INSERT OR REPLACE, which is not used here either. So a
follow-up to the follow-up with those features used would also be
interesting.

So now, with those numbers in mind, make a decision: do you like apples
or oranges better?

Cheers

Adrien

Disclaimer: I drink apple juice in the morning

On Mon, 14 Mar 2011 20:21:52 +0100,
Philip Van Hoof <philip codeminded be> wrote:


Oh hi, I forgot to mention something important,

Note that the test for E-D-S does not first delete a possibly existing
contact. So each run you append 1000 contacts.

The test for Tracker does delete existing contacts before inserting
the new one. This means that you replace and don't append 1000
contacts.

This is of course a clear advantage that was given to the E-D-S
performance number (and the numbers for both were still close).

For 1000 INSERTs without any DELETE, Tracker's RDF store finishes in
about 2.30s instead of 12s - 15s (just tested on a new test run). It's
indeed the case that the DELETEs create the biggest performance penalty
(this is why INSERT OR REPLACE is much faster).
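
The difference between appending, delete-then-insert and INSERT OR
REPLACE can be sketched with plain SQLite in Python. This is only an
analogy: SQLite's INSERT OR REPLACE is a SQL statement, not the SPARQL
extension in Tracker, and the table, the uids and the function names
below are invented for the example. It shows the semantics (what ends
up in the store), not the timings.

```python
import sqlite3


def make_db():
    """Create an in-memory store with one toy contact table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE contact (uid TEXT PRIMARY KEY, name TEXT)")
    return db


def append_only(db, run, n):
    # Every run adds n brand-new rows, so the store keeps growing
    rows = [("run%d-%d" % (run, i), "Contact %d" % i) for i in range(n)]
    db.executemany("INSERT INTO contact VALUES (?, ?)", rows)


def delete_then_insert(db, n):
    # Two statements per contact: remove any old row, then add the new one
    for i in range(n):
        uid = "contact-%d" % i
        db.execute("DELETE FROM contact WHERE uid = ?", (uid,))
        db.execute("INSERT INTO contact VALUES (?, ?)",
                   (uid, "Contact %d" % i))


def insert_or_replace(db, n):
    # One statement per contact; an existing row with the same uid is replaced
    rows = [("contact-%d" % i, "Contact %d" % i) for i in range(n)]
    db.executemany("INSERT OR REPLACE INTO contact VALUES (?, ?)", rows)


def count(db):
    return db.execute("SELECT COUNT(*) FROM contact").fetchone()[0]


if __name__ == "__main__":
    db = make_db()
    for run in range(2):
        append_only(db, run, 1000)
    print("append only, 2 runs:", count(db))

    db = make_db()
    for run in range(2):
        delete_then_insert(db, 1000)
    print("delete then insert, 2 runs:", count(db))

    db = make_db()
    for run in range(2):
        insert_or_replace(db, 1000)
    print("insert or replace, 2 runs:", count(db))
```

After two runs the append-only store holds twice as many rows as the
other two, which is why the two tests were not doing the same work.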

To adapt the E-D-S test so that it does something closer to what the
Tracker RDF store's test did, you would need to look up each Contact and
then use e_book_remove_contact before doing the e_book_add_contact and
the e_book_commit_contact. I kinda didn't realize this while making the
test.

The lookup and the remove are expected to be bad news for E-D-S's
performance number, of course. But it's evening for me now! ;-)

I invite readers to make tests that do this, and feel free to report
back with some comparisons and numbers.


Cheers,

Philip


On Mon, 2011-03-14 at 19:57 +0100, Philip Van Hoof wrote:

Hi there,

I've done a performance analysis and comparison of Tracker's RDF
store versus E-D-S (Evolution Data Server).

I'll first summarize what I think readers need to know and
understand, because Tracker's RDF store and E-D-S are in fact
different products that serve similar but different purposes.

o. The VCard that I'm testing with for E-D-S isn't as complex as the
   Nepomuk-ontology based contact that the tests are saving in the
   RDF example for Tracker.

   This means that this performance test isn't a perfectly fair
   comparison. But I'm not a VCard expert myself. And the results were
   rather clear (for me) early on.

   I have included the source code of both test programs as
   attachments so that you can match the complexity to produce better
   numbers.

o. Given that E-D-S is a single-purpose store for PIM data we
   expected E-D-S to perform much better than Tracker's RDF store.
   Tracker is of course not a single-purpose store. It can store
   anything within the realm of Nepomuk's ontology (which has a lot
   more application and use-case domains than PIM).

   It is not the case that E-D-S performs much better, as the report
   below will illustrate.

o. I personally expected E-D-S to scale to large numbers of
   contacts.

   But a simple loop of adding 2x 1000 contacts makes Evolution's
   UI fall flat on its face. The UI doesn't respond anymore at all.
   The only thing that helps is an evolution --force-shutdown.

   The Evolution UI also makes the entire Desktop unresponsive.

   This is important for, for example, a platform like MeeGo: a
   version of the Evolution UI _does_ run on MeeGo devices (making it
   part of E-D-S as a solution on the aforementioned platform -- the
   UI can and often will be running, my poor batteries).

   o. For a very very long time after the test.c 1000 contacts loop
      has finished, Evolution is running at 95% CPU. Doing who knows
      what (draining massive amounts of power, in any case).

      This makes me conclude that the API is cheating and returning
      earlier than allowed. Maybe not everything is finished, and I
      wonder what would happen if the system crashed. Would all data
      be guaranteed to be persistently stored on the storage
      hardware?

      Tracker's RDF store guarantees that at return of the API the
      data will be stored (it has a journal for this).

o. After, and while, the data is being entered into the store,
   Tracker's RDF store can perform more complex queries, and at the
   same time it can allow queries that cross multiple use-case and
   application domains (deeply link contacts to IM, E-mail, Photos,
   Videos, Events, GEO locations, and many more classes _and_ query
   using the links).

   This is something that E-D-S, given that it's a single-purpose
   store, can't offer at this moment. Nor does E-D-S provide a rich
   query language like SPARQL for this purpose.

   E-D-S is more of a get and set system with a rather flat query
   language.

   We didn't test query performance this time. Given the huge
   differences in capabilities it's probably not an interesting
   comparison to make between Tracker's RDF store and E-D-S.

   With Tracker's GraphUpdated you can do an auto-update-model
   comparable to EBookView's API. QSparql has such a model
   implemented for Qt.

o. Both E-D-S and Tracker's RDF store have at this moment the same
   or comparable security capabilities.

   With E-D-S you can have multiple addressbooks; with Tracker's RDF
   store you can have either GRAPHs or grouping with a
   nie:DataSource, or any other typical Nepomuk technique for this
   purpose (like adding an addressbook class, if necessary -- there
   are plenty of ways to do this).


A few somewhat larger tests:
----------------------------

o. Tracker's RDF store with its newest INSERT OR REPLACE support:

   (INSERT OR REPLACE is not yet available in master.) For reference,
   the original that doesn't use INSERT OR REPLACE is also being run.

   The support for INSERT OR REPLACE is in branch sparql-update.

   More info on INSERT OR REPLACE here:
   http://pvanhoof.be/blog/index.php/2011/03/09/a-replace-extension-for-trackers-sparqls-update

pvanhoof lors:~/repos/gnome/tracker/master/tests/functional-tests/ipc$ ./test-insert-or-replace 
REPLACE: 1000 contacts: 12.997943
ORIGINAL: 1000 contacts: 13.850410
pvanhoof lors:~/repos/gnome/tracker/master/tests/functional-tests/ipc$ ./test-insert-or-replace 
REPLACE: 1000 contacts: 15.442745
ORIGINAL: 1000 contacts: 27.525495
pvanhoof lors:~/repos/gnome/tracker/master/tests/functional-tests/ipc$ ./test-insert-or-replace 
REPLACE: 1000 contacts: 17.257888
ORIGINAL: 1000 contacts: 27.712315
pvanhoof lors:~/repos/gnome/tracker/master/tests/functional-tests/ipc$ 
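
The three 1000-contact runs above can be summarized with a few lines of
Python. The timings are copied from the transcript; the variable and
function names are only illustrative, and with three runs per variant
the averages are a rough indication at best.

```python
from statistics import mean

# Seconds per 1000 contacts, copied from the three runs above
REPLACE = [12.997943, 15.442745, 17.257888]
ORIGINAL = [13.850410, 27.525495, 27.712315]


def summarize(label, runs):
    """Return (label, mean seconds, fastest run, slowest run)."""
    return label, mean(runs), min(runs), max(runs)


def speedup(baseline, candidate):
    """How many times faster the candidate is than the baseline, on average."""
    return mean(baseline) / mean(candidate)


if __name__ == "__main__":
    for label, runs in (("REPLACE", REPLACE), ("ORIGINAL", ORIGINAL)):
        name, avg, best, worst = summarize(label, runs)
        print("%-8s mean %.3f s  (min %.3f, max %.3f)"
              % (name, avg, best, worst))
    print("ORIGINAL / REPLACE: %.2fx" % speedup(ORIGINAL, REPLACE))
```

Note that the first ORIGINAL run is much faster than the other two,
which pulls the ORIGINAL average down; looking at min and max next to
the mean makes that visible.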

The GraphUpdated signals were being emitted cleanly; dbus-daemon was
not being flooded, as GraphUpdated is engineered for these volumes
of data deltas.

More info on how GraphUpdated works here:
http://pvanhoof.be/blog/index.php/2010/08/24/trackers-new-class-signal-system-being-developed


o. And now E-D-S:

   I had to kill Evolution's UI because after the second run my
   entire Desktop computer was completely unresponsive and
   Evolution's shell was using 95% CPU. I do mention this because for
   just 2x 1000 contacts this behaviour is in my opinion unacceptable.

   I know I will receive a lot of typical hate for saying this, but
   the Evolution team shouldn't feel too proud of this. In my opinion
   NO amount of input should make it possible to let Evolution's UI
   hang.

pvanhoof lors:~$ gcc test.c `pkg-config libebook-1.2 --cflags --libs`
pvanhoof lors:~$ ./a.out 
EDS 1000 contacts: 10.604454
pvanhoof lors:~$ ./a.out 
EDS 1000 contacts: 11.362855
pvanhoof lors:~$ evolution --force-shutdown
No response from Evolution -- killing the process

   That "No response from Evolution" should illustrate how bad the
   situation actually became; the process is not even responding to
   a private IPC asking it to shut down cleanly, forcing the tool to
   use the kernel's KILL signal instead. Yikes.


Some smaller tests (yielding smaller differences too):
------------------------------------------------------

Luckily, Evolution didn't die or crash on these smaller tests ...

pvanhoof lors:~/repos/gnome/tracker/master/tests/functional-tests/ipc$ ./test-insert-or-replace 
REPLACE: 100 contacts: 1.348496
ORIGINAL: 100 contacts: 2.839032
pvanhoof lors:~/repos/gnome/tracker/master/tests/functional-tests/ipc$ cd /home/pvanhoof/
pvanhoof lors:~$ gcc test.c `pkg-config libebook-1.2 --cflags --libs`
pvanhoof lors:~$ ./a.out 
EDS 100 contacts: 1.000067
pvanhoof lors:~$ ./a.out 
EDS 100 contacts: 0.886793
pvanhoof lors:~$ ./a.out 
EDS 100 contacts: 0.902376
pvanhoof lors:~$ cd ~/repos/gnome/tracker/master/tests/functional-tests/ipc
pvanhoof lors:~/repos/gnome/tracker/master/tests/functional-tests/ipc$ ./test-insert-or-replace
REPLACE: 100 contacts: 1.375554
ORIGINAL: 100 contacts: 2.631252
pvanhoof lors:~/repos/gnome/tracker/master/tests/functional-tests/ipc$ ./test-insert-or-replace 
REPLACE: 100 contacts: 1.448647
ORIGINAL: 100 contacts: 2.700024
pvanhoof lors:~/repos/gnome/tracker/master/tests/functional-tests/ipc$ ./test-insert-or-replace 
REPLACE: 100 contacts: 1.400238
ORIGINAL: 100 contacts: 2.787517
pvanhoof lors:~/repos/gnome/tracker/master/tests/functional-tests/ipc$ 
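
For the 100-contact runs, a per-contact cost is easier to compare than
raw totals. The sketch below takes the timings from the transcript
above; the names are illustrative only. Keep in mind that the E-D-S
test appends while the Tracker tests replace, so the rows are not
measuring identical work.

```python
from statistics import mean

CONTACTS_PER_RUN = 100

# Seconds per 100 contacts, copied from the runs above
RUNS = {
    "EDS": [1.000067, 0.886793, 0.902376],
    "REPLACE": [1.348496, 1.375554, 1.448647, 1.400238],
    "ORIGINAL": [2.839032, 2.631252, 2.700024, 2.787517],
}


def per_contact_ms(runs, contacts=CONTACTS_PER_RUN):
    """Average cost of storing one contact, in milliseconds."""
    return mean(runs) / contacts * 1000.0


def relative_to(baseline, name):
    """Per-contact cost of the named variant as a multiple of the baseline."""
    return per_contact_ms(RUNS[name]) / per_contact_ms(RUNS[baseline])


if __name__ == "__main__":
    for name, runs in RUNS.items():
        print("%-8s %6.2f ms/contact  (%.2fx EDS)"
              % (name, per_contact_ms(runs), relative_to("EDS", name)))
```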


I'd love to receive performance numbers from different people, under
different circumstances. Please feel free to use Tracker's mailing
list for publicizing your results.


Cheers,

Philip
