[Tracker] =?iso-8859-1?q?RE=A0=3A__Performance_issue_in_Tracker_0?= =?iso-8859-1?q?=2E7=2Ex_due_to_the_sparql_to_sql_translator?=



Title: RE : [Tracker] Performance issue in Tracker 0.7.x due to the sparql to sql translator

Hi,

Thank you for your replies and for the help provided on IRC.
In our situation, we don't have the choice to do a "smaller" query.

But such a request is not so hard (well formulated) for Sqlite.
For your information, here is how we had improved the performance of our query (maybe it could be useful for you on Maemo):
First using functional variables improved a lot: (With a LIMIT 3000) >From 40 seconds a query, to only 15 seconds.


                SELECT  nie:isStoredAs(?x) nie:title(?x) nmm:length(?x) nie:contentCreated(?x) nmm:artistName(?artist) nfo:averageBitrate(?x) nfo:genre(?x) nfo:sampleRate(?x)  nmm:albumTitle(?album) nmm:trackNumber(?x) nie:comment(?x) tracker:added(?x)
                WHERE
                { ?x a nmm:MusicPiece .
                optional { ?x nmm:performer ?artist } .
                optional { ?x nmm:musicAlbum ?album } .
                 } LIMIT 1000

Then putting in optionals all the items that are in the "domain"(don't know if it is the proper term) of ?x : query executed in only 5 secondes

                "SELECT  ?url ?title ?len ?creadate nmm:artistName(?artist) ?sbpm ?genre nfo:sampleRate(?x)  nmm:albumTitle(?album) ?ntrack ?com ?lastaddmod nmm:albumTrackCount(?album) WHERE "
                "{ ?x a nmm:MusicPiece . "
                "optional { ?x nmm:length ?len } ."
                "optional { ?x nmm:trackNumber ?ntrack } ."
                "optional { ?x nie:isStoredAs ?url } ."
                "optional { ?x nie:title ?title } ."
                "optional { ?x nie:comment ?com } ."
                "optional { ?x nie:contentCreated ?creadate } ."
                "optional { ?x tracker:added ?lastaddmod } ."
                "optional { ?x nfo:genre ?genre } ."
                "optional { ?x nfo:averageBitrate ?sbpm} ."
                "optional { ?x nmm:performer ?artist } ."
                "optional { ?x nmm:musicAlbum ?album } ."
                " } LIMIT 1000"

(We could even have more properties now, without parser stack overflow!)

I think that an improvement of the Sparql to SQL translator could be to automatically transform the optionals that have the same subject (ie ?x) and a property outside of the domain (ex. nfo:sampleRate) to a functional variable. Then, you could forget to change YY STACK DEPTH, that is not really interesting.

By the way, have you any idea of any other optimization that could be done on this query? (And do you know how we can insert a SQL query directly inside a Sparql query?)

Regards,

Florent Viard
Software Development Engineer
........................................
fviard lacie com
Phone: +33 1 58 49 57 20
Fax: +33 1 58 49 57 20
33, boulevard du Général Martial Valin
75015 Paris - FRANCE
www.lacie.com
Please print only if necessary




-------- Message d'origine--------
De: Ivan Frade [mailto:ivan frade gmail com]
Date: ven. 15/01/2010 16:09
À: Florent Viard
Cc: tracker-list gnome org
Objet : Re: [Tracker] Performance issue in Tracker 0.7.x due to the sparql to sql translator

Hi Florent

On Fri, Jan 15, 2010 at 12:43 PM, Florent Viard <fviard lacie com> wrote:

>  Hi,
>
> We are working on using Tracker and faced some performance issues.
> The objective is to retrieve multiple informations about audio files.
> So we do a "big" Sparql query on a test base almost 10 000 files
>
That is the first problem. Retrieve the information of 10.000 files is
always a bad idea, in Tracker, SQLite or MS SQL Server. In general, the easy
solution of "give me all and i keep it in memory" doesn't work with tracker.
You need to retrieve _only_ what you need, and better in an _incremental_
way.

Are you showing 10.000 items in the UI? Maybe you can ask the results in
windows of 100 items (check COUNT and LIMIT).
Do you need all those details in the UI? like "sampleRate"? if not, don't
ask for them. Every string you select add time to the request.

So, in general: Retrieve only what you need to show in the UI, either in
amount of items and in amount of details per-item.

Obviously our SparQL -> SQL translation is not perfect and we will fix that
also :) Thanks for the report!

BTW, in maemo we compile sqlite with extra YY STACK DEPTH to allow more
complex queries.

Regards,

Ivan


>
> First we thought it come from the a limit of the complexity of the Sparql
> query causing a "parser stack overflow" error that is described here:
> http://mail.gnome.org/archives/tracker-list/2009-December/msg00106.html
> But after simplifying the query we discover that the problem may come from
> the conversion of the Sparql to SQL query in
> tracker-0.7.14/src/libtracker-data/tracker-sparql-query.vala
> It seems that the SQL generated query is not totally efficient and may be
> improved.
>
> After further investigations, it appears that
> tracker-0.7.14/src/libtracker-data/tracker-sparql-query.vala is the source
> of the problem. There is an issue with the way the SQL query is generated.
>
> Here is the Sparql query that we did:
>                 "SELECT ?url ?title ?len ?creadate ?artnam ?sbpm ?genre
> ?bpm ?albtit ?ntrack ?com ?lastaddmod WHERE " //?nbtack
>                 "{ ?x a nmm:MusicPiece . "
>                 "optional { ?x nie:isStoredAs ?url } ."
>                 "optional { ?x nie:title ?title } ."
>                 "optional { ?x nie:comment ?com } ."
>                 "optional { ?x nie:contentCreated ?creadate } ."
>                 "optional { ?x tracker:added ?lastaddmod } ."
>                 "optional { ?x nfo:genre ?genre } ."
>                 "optional { ?x nfo:averageBitrate ?sbpm} ."
>                 "optional { ?x nfo:sampleRate ?bpm } ."
>                 "optional { ?x nmm:length ?len } ."
>                 "optional { ?x nmm:performer ?artist . ?artist
> nmm:artistName ?artnam } ."
>                 "optional { ?x nmm:musicAlbum ?album . ?album
> nmm:albumTitle ?albtit } ."
>                 "optional { ?x nmm:trackNumber ?ntrack } ."
>         //       "optional { ?x nmm:albumTrackCount ?nbtrack} ."  //Removed
> because of the parser limited stack size
>                 " } LIMIT %d OFFSET %d";
>
> And attached to this mail [ugly-tracker-sql-sparql.txt] is the
> automatically generated SQL (Reformatted to be human readable).
> Trying directly on SQLite this query with the 11k files dataset takes
> around 2 / 3 minutes to return a result.
>
> In the second attached file [good-tracker-sql.txt] is the same SQL request,
> simplified by me, that is like tracker is supposed to generate the SQL from
> the Sparql query.
> That query executed with the same 11k files dataset on SQLite, gave a
> result in only 4 / 5 seconds. (And there, the complexity of the Sparql query
> could be increased).
>
> I tried to modify myself the source code for this function, but it was too
> difficult for me because of lack of knowledge on the variables used by the
> Tracker's internal Sparql parser.
>
> So I think you should modify the code in tracker-sparql-query.vala to
> generate a query like this by changing the creation of LEFT JOIN by the
> creation of a SELECT sub-query in the function
> translate_group_graph_pattern.
>
> I know the common SQL "tips" on Internet that say that LEFT (and RIGHT)
> JOIN are more efficient than sub-queries. But here, always having ID as an
> index in all the tables could explain the difference.
>
> A good idea could be to use SELECT sub-queries inside the main SELECT for
> the optionals of a same subject.
>
> Regards,
>
>
> Florent Viard
> Software Development Engineer
> ........................................
> fviard lacie com
> www.lacie.com
>
>
>
>
>
>
> *This e-mail and any attachment are confidential and intended solely for
> the use of the individual to whom it is addressed. If you are not the
> intended recipient, please telephone or email the sender and delete this
> message and any attachment from your system. Unauthorized publication, use,
> dissemination, forwarding, printing or copying of this e-mail and its
> associated attachments is strictly prohibited.*
>
> _______________________________________________
> tracker-list mailing list
> tracker-list gnome org
> http://mail.gnome.org/mailman/listinfo/tracker-list
>
>


This e-mail and any attachment are confidential and intended solely for the use of the individual to whom it is addressed. If you are not the intended recipient, please telephone or email the sender and delete this message and any attachment from your system. Unauthorized publication, use, dissemination, forwarding, printing or copying of this e-mail and its associated attachments is strictly prohibited.


[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]