Re: [Tracker] Performance comparison between E-D-S and Tracker's RDF store for storing contact data

From: Adrien Bustany <abustany gnome org>
To: Adrien Bustany <abustany gnome org>
Cc: Philip Van Hoof <philip codeminded be>, tracker-list gnome org
Subject: Re: [Tracker] Performance comparison between E-D-S and Tracker's RDF store for storing contact data
Date: Tue, 15 Mar 2011 00:28:48 +0200

On Tue, 15 Mar 2011 00:24:10 +0200,
Adrien Bustany <abustany gnome org> wrote:

Quick follow-up: using qtcontacts-tracker to import a batch of 1000
contacts, compared with Philip's EDS test:


A quick clarification: I used the "batch saving" benchmark from
qtcontacts-tracker with --contacts=1000 --batch-size=1000:
http://gitorious.org/qtcontacts-tracker/qtcontacts-tracker/blobs/master/benchmarks/bm_qtcontacts_trackerplugin_batchsaving.cpp


The first run was on an empty EDS DB vs. an empty Tracker DB:
qtcontacts-tracker: 12 seconds
EDS: 190 seconds (WTF!)

The most likely explanation is that I'm running btrfs, which for some
reason does not like fsyncs (at all). EDS does not have a batching
API, so I suspect it does an fsync after each save, which would
explain all the disk thrashing.

So I decided to run the tests again, this time LD-preloading
libeatmydata (http://www.flamingspork.com/projects/libeatmydata/)
under both EDS and Tracker:
qtcontacts-tracker: 12 seconds
EDS: 5.7 seconds

For this second run the CPU (Core 2 Duo, 2 GHz) is maxed out and no
disk activity is seen.

Note that the version of qtcontacts-tracker I ran is not the fastest
we have: Matthias worked on optimizing the queries (~20% faster,
IIRC), and Philip added INSERT OR REPLACE, which is not used here
either. So a follow-up to this follow-up, with those features
enabled, would also be interesting.

So now, with those numbers in mind, make a decision: do you like
apples or oranges better?

Cheers

Adrien

Disclaimer: I drink apple juice in the morning

On Mon, 14 Mar 2011 20:21:52 +0100,
Philip Van Hoof <philip codeminded be> wrote:


Oh hi, I forgot to mention something important:

Note that the test for E-D-S does not first delete a possibly
existing contact, so each run appends 1000 contacts.

The test for Tracker does delete existing contacts before inserting
the new ones. This means that each run replaces, rather than
appends, 1000 contacts.

This of course gave a clear advantage to the E-D-S performance
number (and the numbers for both were still close).

For 1000 INSERTs without any DELETE, Tracker's RDF store finishes in
about 2.30 s instead of 12-15 s (I just tested this with a new run).
The DELETEs are indeed the biggest performance penalty (this is why
INSERT OR REPLACE is much faster).

To make the E-D-S test do something closer to what the Tracker RDF
store's test does, you would need to look up each contact and then
call e_book_remove_contact before the e_book_add_contact and
e_book_commit_contact calls. I kind of didn't realize this while
writing the test.

The lookup and the remove are expected to be bad news for E-D-S's
performance number, of course. But it's evening for me now! ;-)

I invite readers to write tests that do this; feel free to report
back with comparisons and numbers.


Cheers,

Philip


On Mon, 2011-03-14 at 19:57 +0100, Philip Van Hoof wrote:

Hi there,

I've done a performance analysis and comparison of Tracker's RDF
store versus E-D-S (Evolution Data Server).

I'll first summarize what I think readers need to know and
understand, because Tracker's RDF store and E-D-S are in fact
different products serving similar but different purposes.

o. The VCard that I'm testing with for E-D-S isn't as complex as
the Nepomuk-ontology-based contact that the Tracker test saves in
its RDF example.

This means that the comparison isn't perfectly fair. But I'm not
a VCard expert myself, and the results were rather clear (to me)
early on.

I have attached the source code of both test programs so that
you can match the complexity and produce better numbers.

o. Given that E-D-S is a single-purpose store for PIM data, we
expected it to perform much better than Tracker's RDF store.
Tracker is of course not a single-purpose store: it can store
anything within the realm of Nepomuk's ontology (which covers many
more application and use-case domains than PIM).

E-D-S does not, however, perform much better, as the report below
will illustrate.

o. I personally expected E-D-S to scale to large numbers of
contacts.

But a simple loop that adds 1000 contacts, run twice, makes
Evolution's UI fall flat on its face. The UI stops responding
entirely, and the only thing that helps is running
evolution --force-shutdown.

The Evolution UI also makes the entire desktop unresponsive.

This matters for a platform like MeeGo, for example: a version of
the Evolution UI _does_ run on MeeGo devices, which makes it part of
the E-D-S solution on that platform. The UI can, and often will, be
running (my poor batteries).

o. For a very, very long time after the 1000-contact loop in test.c
has finished, Evolution keeps running at 95% CPU, doing who knows
what (draining massive amounts of power, in any case).

This leads me to conclude that the API is cheating by returning
earlier than it should. Maybe not everything is finished yet, and I
wonder what would happen if the system crashed: would all data be
guaranteed to be persistently stored on the storage hardware?

Tracker's RDF store guarantees that when an API call returns, the
data has been stored (it has a journal for this).

o. Both while and after data is entered into the store, Tracker's
RDF store can perform more complex queries, and it also allows
queries that cross multiple use-case and application domains
(deeply linking contacts to IM, e-mail, photos, videos, events,
geo locations and many more classes, _and_ querying using those
links).

This is something that E-D-S, being a single-purpose store, can't
offer at this moment. Nor does E-D-S provide a rich query language
like SPARQL for this purpose.

E-D-S is more of a get-and-set system with a rather flat query
language.

We didn't test query performance this time. Given the huge
differences in capabilities, it's probably not an interesting
comparison to make between Tracker's RDF store and E-D-S.

With Tracker's GraphUpdated signal you can build an auto-updating
model comparable to EBookView's API. QSparql has such a model
implemented for Qt.

o. At this moment, E-D-S and Tracker's RDF store have the same or
comparable security capabilities.

With E-D-S you can have multiple address books; with Tracker's RDF
store you can use GRAPHs, group with a nie:DataSource, or use any
other typical Nepomuk technique for this purpose (like adding an
address book class if necessary; there are plenty of ways to do
this).

A few somewhat larger tests:
----------------------------

o. Tracker's RDF store with its newest INSERT OR REPLACE support:

INSERT OR REPLACE is not yet available in master. For reference,
the original test, which doesn't use INSERT OR REPLACE, was also
run.

Support for INSERT OR REPLACE is in the sparql-update branch.

More info on INSERT OR REPLACE here:
http://pvanhoof.be/blog/index.php/2011/03/09/a-replace-extension-for-trackers-sparqls-update

pvanhoof lors:~/repos/gnome/tracker/master/tests/functional-tests/ipc$ ./test-insert-or-replace
REPLACE: 1000 contacts: 12.997943
ORIGINAL: 1000 contacts: 13.850410
pvanhoof lors:~/repos/gnome/tracker/master/tests/functional-tests/ipc$ ./test-insert-or-replace
REPLACE: 1000 contacts: 15.442745
ORIGINAL: 1000 contacts: 27.525495
pvanhoof lors:~/repos/gnome/tracker/master/tests/functional-tests/ipc$ ./test-insert-or-replace
REPLACE: 1000 contacts: 17.257888
ORIGINAL: 1000 contacts: 27.712315
pvanhoof lors:~/repos/gnome/tracker/master/tests/functional-tests/ipc$

The GraphUpdated signals were emitted cleanly; dbus-daemon was not
flooded, as GraphUpdated is engineered for these volumes of data
deltas.

More info on how GraphUpdated works here:
http://pvanhoof.be/blog/index.php/2010/08/24/trackers-new-class-signal-system-being-developed

o. And now E-D-S:

I had to kill Evolution's UI because after the second run my
entire desktop computer was completely unresponsive and Evolution's
shell was using 95% CPU. I mention this because, in my opinion,
this behaviour is unacceptable for just 2x 1000 contacts.

I know I will receive a lot of the usual hate for saying this,
but the Evolution team shouldn't feel too proud of this. In my
opinion, NO amount of input should be able to make Evolution's UI
hang.

pvanhoof lors:~$ gcc test.c `pkg-config libebook-1.2 --cflags --libs`
pvanhoof lors:~$ ./a.out
EDS 1000 contacts: 10.604454
pvanhoof lors:~$ ./a.out
EDS 1000 contacts: 11.362855
pvanhoof lors:~$ evolution --force-shutdown
No response from Evolution -- killing the process

That "No response from Evolution" should illustrate how bad the
situation actually became: the process did not even respond to a
private IPC asking it to shut down cleanly, which forced the tool
to use the kernel's KILL signal instead. Yikes.

Some smaller tests (these yield smaller differences too):
---------------------------------------------------------

Luckily, Evolution didn't die or crash on these smaller tests...

pvanhoof lors:~/repos/gnome/tracker/master/tests/functional-tests/ipc$ ./test-insert-or-replace
REPLACE: 100 contacts: 1.348496
ORIGINAL: 100 contacts: 2.839032
pvanhoof lors:~/repos/gnome/tracker/master/tests/functional-tests/ipc$ cd /home/pvanhoof/
pvanhoof lors:~$ gcc test.c `pkg-config libebook-1.2 --cflags --libs`
pvanhoof lors:~$ ./a.out
EDS 100 contacts: 1.000067
pvanhoof lors:~$ ./a.out
EDS 100 contacts: 0.886793
pvanhoof lors:~$ ./a.out
EDS 100 contacts: 0.902376
pvanhoof lors:~$ cd ~/repos/gnome/tracker/master/tests/functional-tests/ipc
pvanhoof lors:~/repos/gnome/tracker/master/tests/functional-tests/ipc$ ./test-insert-or-replace
REPLACE: 100 contacts: 1.375554
ORIGINAL: 100 contacts: 2.631252
pvanhoof lors:~/repos/gnome/tracker/master/tests/functional-tests/ipc$ ./test-insert-or-replace
REPLACE: 100 contacts: 1.448647
ORIGINAL: 100 contacts: 2.700024
pvanhoof lors:~/repos/gnome/tracker/master/tests/functional-tests/ipc$ ./test-insert-or-replace
REPLACE: 100 contacts: 1.400238
ORIGINAL: 100 contacts: 2.787517
pvanhoof lors:~/repos/gnome/tracker/master/tests/functional-tests/ipc$

I'd love to receive performance numbers from different people,
under different circumstances. Please feel free to use Tracker's
mailing list to publicize your results.

Cheers,

Philip

_______________________________________________
tracker-list mailing list
tracker-list gnome org
http://mail.gnome.org/mailman/listinfo/tracker-list

