Re: [Tracker] Performance comparison between E-D-S and Tracker's RDF store for storing contact data
- From: Philip Van Hoof <philip codeminded be>
- To: tracker-list gnome org
- Subject: Re: [Tracker] Performance comparison between E-D-S and Tracker's RDF store for storing contact data
- Date: Mon, 14 Mar 2011 20:21:52 +0100
Oh hi, I forgot to mention something important,
Note that the test for E-D-S does not first delete a possibly existing
contact. So each run you append 1000 contacts.
The test for Tracker does delete existing contacts before inserting the
new ones. This means that you replace the 1000 contacts instead of
appending them.
This of course gave the E-D-S performance number a clear advantage (and
the numbers for both were still close).
For 1000 INSERTs without any DELETE, Tracker's RDF store finishes in
about 2.30s instead of 12s - 15s (I just tested this in a new test
run). The DELETEs do indeed cause the biggest performance penalty
(this is why INSERT OR REPLACE is much faster).
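To make the difference between these update shapes concrete, here is a minimal sketch in Python. It only builds the SPARQL update strings and does not talk to Tracker. The URN scheme, the nco: properties chosen, the function names, the DELETE { ... a rdfs:Resource } form and sending DELETE and INSERT together are all assumptions for illustration; the attached tests store a more complex contact.

```python
# Sketch only: builds SPARQL update strings, it does not talk to Tracker.
# The URN scheme, the nco: properties used and the function names are
# assumptions for illustration, not the contents of the attached tests.

def contact_urn(index):
    return "<urn:example:contact:%d>" % index

def insert_only(index):
    # Plain INSERT: no DELETE is issued first.
    return ("INSERT { %s a nco:PersonContact ; "
            "nco:fullname 'Contact %d' }" % (contact_urn(index), index))

def delete_then_insert(index):
    # The "ORIGINAL" shape: remove the resource, then insert it again.
    return ("DELETE { %s a rdfs:Resource }\n%s"
            % (contact_urn(index), insert_only(index)))

def insert_or_replace(index):
    # The "REPLACE" shape: a single statement, no separate DELETE.
    return ("INSERT OR REPLACE { %s a nco:PersonContact ; "
            "nco:fullname 'Contact %d' }" % (contact_urn(index), index))

if __name__ == "__main__":
    for build in (insert_only, delete_then_insert, insert_or_replace):
        print(build(1))
        print()
```

The point of the sketch is only that the delete-then-insert shape costs an extra statement per contact, which is where the penalty described above comes from.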
To make the E-D-S test do something more similar to what the Tracker
RDF store's test did, you would need to look up each contact and then call
e_book_remove_contact before doing the e_book_add_contact and the
e_book_commit_contact. I kinda didn't realize this while making the
test.
The lookup and the remove are expected to be bad news for E-D-S's
performance number, of course. But it's evening for me now! ;-)
I invite readers to make tests that do this, and feel free to report
back with some comparisons and numbers.
Cheers,
Philip
On Mon, 2011-03-14 at 19:57 +0100, Philip Van Hoof wrote:
Hi there,
I've done a performance analysis and comparison of Tracker's RDF store
versus E-D-S (Evolution Data Server).
I'll first summarize what I think readers need to know and
understand, because Tracker's RDF store and E-D-S are in fact
different products that serve similar but different purposes.
o. The VCard that I'm testing with for E-D-S isn't as complex as the
Nepomuk-ontology based contact that the tests are saving in the RDF
example for Tracker.
This means that this performance test isn't a perfectly fair comparison.
But I'm not a VCard expert myself. And the results were rather clear
(for me) early on.
I have attached the source code of both test programs
so that you can match the complexity to produce better numbers.
o. Given that E-D-S is a single-purpose store for PIM data we expected
E-D-S to perform much better than Tracker's RDF store. Tracker is of
course not a single-purpose store. It can store anything within the realm
of Nepomuk's ontology (which covers a lot more application and use-case
domains than PIM).
It is not the case that E-D-S performs much better, as the report
below will illustrate.
o. I personally expected E-D-S to scale to large amounts of contacts.
But a simple loop that adds 1000 contacts twice makes Evolution's UI fall
flat on its face. The UI doesn't respond at all anymore. The only
thing that helps is an evolution --force-shutdown.
The Evolution UI also makes the entire Desktop unresponsive.
This matters for a platform like MeeGo, for example: a version of
the Evolution UI _does_ run on MeeGo devices (making it part of E-D-S
as a solution on that platform -- the UI can and often will
be running, my poor batteries).
o. For a very, very long time after the 1000-contact loop in test.c has
finished, Evolution keeps running at 95% CPU, doing who knows what
(draining massive amounts of power, in any case).
This makes me conclude that the API is cheating and returning
earlier than allowed. Maybe not everything is finished, and I
wonder what would happen if the system crashed. Would all data
be guaranteed to be persistently stored on the storage hardware?
Tracker's RDF store guarantees that at return of the API the data
will be stored (it has a journal for this).
o. While the data is being entered into the store, and afterwards,
Tracker's RDF store can perform more complex queries, and at the same time
it allows queries that cross multiple use-case and application
domains (deeply link contacts to IM, E-mail, Photos, Videos, Events,
GEO locations, and many more classes _and_ query using the links).
This is something that E-D-S, given that it's a single-purpose store,
can't offer at this moment. Nor does E-D-S provide a rich query
language like SPARQL for this purpose.
E-D-S is more a get and set system with a rather flat query language.
We didn't test query performance this time. Given the huge
differences in capabilities it's probably not an interesting
comparison to make between Tracker's RDF store and E-D-S.
With Tracker's GraphUpdated you can build an auto-updating model comparable
to EBookView's API. QSparql has such a model implemented for Qt.
o. Both E-D-S and Tracker's RDF store have at this moment the same or
comparable security capabilities.
With E-D-S you can have multiple addressbooks, with Tracker's RDF
store you can have either GRAPHs or grouping with a nie:DataSource;
or any other typical Nepomuk technique for this purpose (like adding
an addressbook class, if necessary -- there are plenty of ways for
this).
A few somewhat larger tests:
----------------------------
o. Tracker's RDF store with its newest INSERT OR REPLACE support
(INSERT OR REPLACE is not yet available in master). For reference,
the original that doesn't use INSERT OR REPLACE is also run.
The support for INSERT OR REPLACE is in branch sparql-update.
More info on INSERT OR REPLACE here:
http://pvanhoof.be/blog/index.php/2011/03/09/a-replace-extension-for-trackers-sparqls-update
pvanhoof lors:~/repos/gnome/tracker/master/tests/functional-tests/ipc$ ./test-insert-or-replace
REPLACE: 1000 contacts: 12.997943
ORIGINAL: 1000 contacts: 13.850410
pvanhoof lors:~/repos/gnome/tracker/master/tests/functional-tests/ipc$ ./test-insert-or-replace
REPLACE: 1000 contacts: 15.442745
ORIGINAL: 1000 contacts: 27.525495
pvanhoof lors:~/repos/gnome/tracker/master/tests/functional-tests/ipc$ ./test-insert-or-replace
REPLACE: 1000 contacts: 17.257888
ORIGINAL: 1000 contacts: 27.712315
pvanhoof lors:~/repos/gnome/tracker/master/tests/functional-tests/ipc$
The GraphUpdated signals were emitted cleanly; dbus-daemon was not
flooded, as GraphUpdated is engineered for these volumes of data
deltas.
More info on how GraphUpdated works here:
http://pvanhoof.be/blog/index.php/2010/08/24/trackers-new-class-signal-system-being-developed
o. And now E-D-S:
I had to kill Evolution's UI because after the second run my
entire Desktop computer was completely unresponsive and Evolution's
shell was using 95% CPU. I mention this because for just 2x 1000
contacts this behaviour is in my opinion unacceptable.
I know I will receive a lot of typical hate for saying this, but the
Evolution team shouldn't feel too proud of this. In my opinion
NO amount of input should be able to make Evolution's UI hang.
pvanhoof lors:~$ gcc test.c `pkg-config libebook-1.2 --cflags --libs`
pvanhoof lors:~$ ./a.out
EDS 1000 contacts: 10.604454
pvanhoof lors:~$ ./a.out
EDS 1000 contacts: 11.362855
pvanhoof lors:~$ evolution --force-shutdown
No response from Evolution -- killing the process
That "No response from Evolution" should illustrate how bad the
situation actually became: the process is not even responding to a
private IPC asking it to shut down cleanly, forcing the tool to use
the kernel's KILL instead. Yikes.
Some smaller tests (yielding smaller differences too):
------------------------------------------------------
Luckily, Evolution didn't die and crash on these smaller tests ...
pvanhoof lors:~/repos/gnome/tracker/master/tests/functional-tests/ipc$ ./test-insert-or-replace
REPLACE: 100 contacts: 1.348496
ORIGINAL: 100 contacts: 2.839032
pvanhoof lors:~/repos/gnome/tracker/master/tests/functional-tests/ipc$ cd /home/pvanhoof/
pvanhoof lors:~$ gcc test.c `pkg-config libebook-1.2 --cflags --libs`
pvanhoof lors:~$ ./a.out
EDS 100 contacts: 1.000067
pvanhoof lors:~$ ./a.out
EDS 100 contacts: 0.886793
pvanhoof lors:~$ ./a.out
EDS 100 contacts: 0.902376
pvanhoof lors:~$ cd ~/repos/gnome/tracker/master/tests/functional-tests/ipc
pvanhoof lors:~/repos/gnome/tracker/master/tests/functional-tests/ipc$ ./test-insert-or-replace
REPLACE: 100 contacts: 1.375554
ORIGINAL: 100 contacts: 2.631252
pvanhoof lors:~/repos/gnome/tracker/master/tests/functional-tests/ipc$ ./test-insert-or-replace
REPLACE: 100 contacts: 1.448647
ORIGINAL: 100 contacts: 2.700024
pvanhoof lors:~/repos/gnome/tracker/master/tests/functional-tests/ipc$ ./test-insert-or-replace
REPLACE: 100 contacts: 1.400238
ORIGINAL: 100 contacts: 2.787517
pvanhoof lors:~/repos/gnome/tracker/master/tests/functional-tests/ipc$
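To put the runs above side by side, the following small sketch averages the timings reported in this mail and converts them to a per-contact cost. It is only arithmetic over the numbers printed above; the dictionary layout and the function names are assumptions of the sketch. With only two to four runs per case, and with E-D-S and Tracker not storing equally complex contacts, the means are rough indications at best.

```python
# Timings in seconds, copied from the runs reported above.
RUNS = {
    ("REPLACE", 1000): [12.997943, 15.442745, 17.257888],
    ("ORIGINAL", 1000): [13.850410, 27.525495, 27.712315],
    ("EDS", 1000): [10.604454, 11.362855],
    ("REPLACE", 100): [1.348496, 1.375554, 1.448647, 1.400238],
    ("ORIGINAL", 100): [2.839032, 2.631252, 2.700024, 2.787517],
    ("EDS", 100): [1.000067, 0.886793, 0.902376],
}

def mean(values):
    return sum(values) / len(values)

def summarize(runs):
    # Map (label, contacts) -> (mean seconds, mean milliseconds per contact)
    summary = {}
    for (label, contacts), timings in runs.items():
        avg = mean(timings)
        summary[(label, contacts)] = (avg, avg / contacts * 1000.0)
    return summary

if __name__ == "__main__":
    for (label, contacts), (avg, per_contact_ms) in sorted(summarize(RUNS).items()):
        print("%-8s %4d contacts: mean %.3fs, %.2f ms/contact"
              % (label, contacts, avg, per_contact_ms))
```

Adding your own runs to RUNS is enough to include them in the summary.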
I'd love to receive performance numbers from different people, under
different circumstances. Please feel free to use Tracker's mailing list
for publicizing your results.
Cheers,
Philip
--
Philip Van Hoof
freelance software developer
Codeminded BVBA - http://codeminded.be