Re: [Tracker] Performance issue in Tracker 0.7.x due to the sparql to sql translator
- From: Philip Van Hoof <spam pvanhoof be>
- To: Florent Viard <fviard lacie com>
- Cc: tracker-list gnome org
- Subject: Re: [Tracker] Performance issue in Tracker 0.7.x due to the sparql to sql translator
- Date: Fri, 15 Jan 2010 12:28:02 +0100
On Fri, 2010-01-15 at 11:43 +0100, Florent Viard wrote:
Hi,
We are working on using Tracker and faced some performance issues.
The objective is to retrieve multiple informations about audio files.
So we do a "big" Sparql query on a test base almost 10 000 files and
we face a really slow reply causing dbus to timeout.
DBus timeouts are always for the client to handle. Using this API in
dbus-glib:
dbus_g_proxy_set_default_timeout
First we thought it come from the a limit of the complexity of the
Sparql query causing a "parser stack overflow" error that is described
here:
http://mail.gnome.org/archives/tracker-list/2009-December/msg00106.html
But after simplifying the query we discover that the problem may come
from the conversion of the Sparql to SQL query in
tracker-0.7.14/src/libtracker-data/tracker-sparql-query.vala
It seems that the SQL generated query is not totally efficient and may
be improved.
Okay, I'll leave this part for JÃrg
After further investigations, it appears that
tracker-0.7.14/src/libtracker-data/tracker-sparql-query.vala is the
source of the problem. There is an issue with the way the SQL query is
generated.
You can often use function variables instead of optionals. They are
faster.
For example
SELECT nie:title (?thing) { ?thing a nie:InformationElement) }
Perhaps try rewriting some of them to use that?
Here is the Sparql query that we did:
"SELECT ?url ?title ?len ?creadate ?artnam ?sbpm ?genre ?bpm ?albtit ?ntrack ?com ?lastaddmod WHERE "
//?nbtack
"{ ?x a nmm:MusicPiece . "
"optional { ?x nie:isStoredAs ?url } ."
"optional { ?x nie:title ?title } ."
"optional { ?x nie:comment ?com } ."
"optional { ?x nie:contentCreated ?creadate } ."
"optional { ?x tracker:added ?lastaddmod } ."
"optional { ?x nfo:genre ?genre } ."
"optional { ?x nfo:averageBitrate ?sbpm} ."
"optional { ?x nfo:sampleRate ?bpm } ."
"optional { ?x nmm:length ?len } ."
"optional { ?x nmm:performer ?artist . ?artist
nmm:artistName ?artnam } ."
"optional { ?x nmm:musicAlbum ?album . ?album
nmm:albumTitle ?albtit } ."
"optional { ?x nmm:trackNumber ?ntrack } ."
// "optional { ?x
nmm:albumTrackCount ?nbtrack} ." //Removed because of the parser
limited stack size
" } LIMIT %d OFFSET %d";
And attached to this mail [ugly-tracker-sql-sparql.txt] is the
automatically generated SQL (Reformatted to be human readable).
Trying directly on SQLite this query with the 11k files dataset takes
around 2 / 3 minutes to return a result.
In the second attached file [good-tracker-sql.txt] is the same SQL
request, simplified by me, that is like tracker is supposed to
generate the SQL from the Sparql query.
That query executed with the same 11k files dataset on SQLite, gave a
result in only 4 / 5 seconds. (And there, the complexity of the Sparql
query could be increased).
I tried to modify myself the source code for this function, but it was
too difficult for me because of lack of knowledge on the variables
used by the Tracker's internal Sparql parser.
So I think you should modify the code in tracker-sparql-query.vala to
generate a query like this by changing the creation of LEFT JOIN by
the creation of a SELECT sub-query in the function
translate_group_graph_pattern.
I know the common SQL "tips" on Internet that say that LEFT (and
RIGHT) JOIN are more efficient than sub-queries. But here, always
having ID as an index in all the tables could explain the difference.
A good idea could be to use SELECT sub-queries inside the main SELECT
for the optionals of a same subject.
Regards,
--
Philip Van Hoof, freelance software developer
home: me at pvanhoof dot be
gnome: pvanhoof at gnome dot org
http://pvanhoof.be/blog
http://codeminded.be
[
Date Prev][
Date Next] [
Thread Prev][
Thread Next]
[
Thread Index]
[
Date Index]
[
Author Index]