Re: [Tracker] Clues regarding improving performance of tracker-store



On 13 July 2013 09:29, Philip Van Hoof <philip codeminded be> wrote:
Ivan Frade wrote on 12/07/2013 18:52:

Hi guys,

I plan to write a more detailed guide to improving insert performance when I
have more time. This weekend I'm very busy with moving from my gf's
apartment to my newly renovated house ;), so I'll keep it short.

I hope the move goes well! That guide you're mentioning sounds very
interesting indeed. I'll keep an eye out for it!


The important thing is to use the INSERT OR REPLACE feature instead of
DELETE+INSERT. Another thing you can do is increase the LRU cache size and
tweak the various buffer sizes we have in tracker-data-update.c.
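
To make the difference concrete, here is a minimal sketch of the two update
styles. The helper names, the example URN and the choice of nie:title are
assumptions made for illustration; the helpers only build SPARQL update
strings and do not talk to tracker-store.

```python
# Hypothetical helpers (not part of Tracker): they only build update strings.

def sparql_literal(value: str) -> str:
    """Quote a Python string as a SPARQL string literal."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def delete_then_insert(urn: str, prop: str, value: str) -> str:
    """Two operations: drop any old value, then insert the new one."""
    lit = sparql_literal(value)
    return (
        f"DELETE {{ <{urn}> {prop} ?old }} WHERE {{ <{urn}> {prop} ?old }}\n"
        f"INSERT {{ <{urn}> {prop} {lit} }}"
    )


def insert_or_replace(urn: str, prop: str, value: str) -> str:
    """One operation using the INSERT OR REPLACE form mentioned above."""
    lit = sparql_literal(value)
    return f"INSERT OR REPLACE {{ <{urn}> {prop} {lit} }}"


if __name__ == "__main__":
    # urn:example:song1 is a made-up resource used only for this sketch.
    print(delete_then_insert("urn:example:song1", "nie:title", "Hello"))
    print(insert_or_replace("urn:example:song1", "nie:title", "Hello"))
```

The second form sends one statement where the first sends two, which is the
saving being suggested here.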

Interesting, I will have a look at this.


Finally, changing the ontology could help. But because of the decomposed
schema, an insert mostly won't touch the tables of ontology domains that
aren't related to the data you are inserting.

Except indeed when there are hierarchies. So if a total ontology rewrite is
fine, try to reduce inheritance. Aggregation over inheritance ('has a'
instead of 'is a') in the ontology will often be faster, but it also depends
on a variety of things for which you should study the insert queries that we
generate for a given insert v. ontology situation. Aggregation will often
make your SELECT queries more complicated (and if you need the data,
probably slower too). We optimized first for read speed, then write speed.
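
As a rough illustration of why hierarchies matter for inserts, the toy model
below gives every class its own table and writes one row per class in a
resource's 'is a' chain. This is only a sketch under that assumption: the
class names are made up, and the statements Tracker really generates for a
given ontology should be studied directly, as suggested above.

```python
import sqlite3


def insert_resource(conn, class_chain, resource_id):
    """Insert one resource into every class table of its 'is a' chain.

    Returns the number of INSERT statements issued.
    """
    statements = 0
    for cls in class_chain:
        conn.execute(f'CREATE TABLE IF NOT EXISTS "{cls}" (id INTEGER PRIMARY KEY)')
        conn.execute(f'INSERT INTO "{cls}" (id) VALUES (?)', (resource_id,))
        statements += 1
    return statements


conn = sqlite3.connect(":memory:")

# Hypothetical hierarchies, one deep and one shallow.
deep = ["Resource", "InformationElement", "Media", "Audio", "MusicPiece"]
shallow = ["Resource", "Track"]

deep_inserts = insert_resource(conn, deep, 1)
shallow_inserts = insert_resource(conn, shallow, 2)
print(deep_inserts, shallow_inserts)
```

In this model the deep chain costs five INSERTs per resource and the shallow
one costs two. It says nothing about the extra joins a flatter, aggregated
ontology may need on the SELECT side.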

It would be very unfortunate to kill the reading speed in favor of
writing. I will have a good look at re-working some ontologies,
however, and see how speeds are affected overall. Perhaps it is
possible to find a good balance. If I come up with anything
interesting I will of course let you guys know.


The inserting, updating and deleting on the SQL layer itself is by the way
not the only thing that influences insert performance. The SPARQL parsing,
buffering and grouping into transactions among other things (like IPC
overhead) also play a role. Although I must say that after so many years of
being plagued by Nokians who didn't like Tracker because it was Not Invented
Here (not by their own team) and somewhat enforced upon them, we did ensure
that it's really really optimized (and teams were challenged to find
performance improvements and open bugs on them, instead of making empty
arguments that it's not). It would surprise me if you'd find a single strdup
or malloc that shouldn't be there, for example. But I'll be more than happy
if you eliminate one.
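
The effect of grouping into transactions can be sketched at the plain SQLite
level. The example below is an assumption-laden illustration, not
tracker-store's own batching code: the table layout and the helper name are
invented, and it only counts commits rather than measuring time.

```python
import sqlite3


def insert_rows(conn, rows, batch_size):
    """Insert rows in batches, one transaction per batch.

    Returns the number of commits performed.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS triples (subject TEXT, predicate TEXT, object TEXT)"
    )
    commits = 0
    for start in range(0, len(rows), batch_size):
        with conn:  # commits when the block exits without an exception
            conn.executemany(
                "INSERT INTO triples VALUES (?, ?, ?)",
                rows[start:start + batch_size],
            )
        commits += 1
    return commits


rows = [(f"urn:example:{i}", "nie:title", f"Title {i}") for i in range(1000)]

one_by_one = insert_rows(sqlite3.connect(":memory:"), rows, 1)
batched = insert_rows(sqlite3.connect(":memory:"), rows, 250)
print(one_by_one, batched)
```

Both runs store the same 1000 rows, but the batched run commits 4 times
instead of 1000. On a database file each commit normally involves a sync to
disk, which is why grouping tends to matter for insert throughput.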

Next you have indexes and domain specific indexes that'll slow down
inserting. And you have the signals on changes that you can turn off on a
class, which will have a memory usage and performance impact while inserting
(not doing something is always faster than doing something).

Also interesting. I assume these indexes are crucial for good lookup
speeds however? I will definitely have a look at them and again try to
find a good balance here.


If you don't need FTS, then disabling FTS should make a huge performance
improvement, for the same reason: a lot of things won't be done anymore,
which is always faster than doing them. But FTS is also a nice feature to
have, so make your choice.

Good point! I remember this being mentioned earlier on the list. It
might be possible to work around FTS not being available if
performance is motivating enough. Thanks for all the tips!


--
Regards,
Jonatan Pålsson

Pelagicore AB
Ekelundsgatan 4, 6th floor, SE-411 18 Gothenburg, Sweden

