Re: [Tracker] Disk usage optimization



On 13/06/2013 1:22, Ivan Frade wrote:

Hi Ivan,

Some other ideas, if your use cases are limited:

 You could disable the indices you don't need. They use some space in SQLite.

True. Minimizing the use of tracker:indexed and especially tracker:domainIndex is a good way to reduce storage, although it will have a serious impact on the performance of queries that use those fields. So I wouldn't recommend this for everybody.
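
To get a feel for the storage side of that trade-off, here is a minimal sketch in plain SQLite (not Tracker's actual schema). The table `resource`, the column `title` and the index name are made up for illustration; it only compares the size of the database file with and without one secondary index.

```python
import os
import sqlite3
import tempfile


def db_size(with_index: bool) -> int:
    """Build a small throwaway database and return its file size in bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        conn = sqlite3.connect(path)
        # Hypothetical table; Tracker's real schema is generated from its ontology.
        conn.execute("CREATE TABLE resource (id INTEGER PRIMARY KEY, title TEXT)")
        conn.executemany(
            "INSERT INTO resource (title) VALUES (?)",
            ((f"document title number {i}",) for i in range(5000)),
        )
        if with_index:
            # The index stores its own copy of every title plus the rowid.
            conn.execute("CREATE INDEX resource_title ON resource (title)")
        conn.commit()
        conn.close()
        return os.path.getsize(path)


plain_size = db_size(with_index=False)
indexed_size = db_size(with_index=True)
print(plain_size, indexed_size)
```

The indexed file comes out larger because the index duplicates the column data; how much this matters for a real meta.db depends on which properties are indexed and how long their values are.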

 For some properties, we store its value and collation to sort correctly in different locales. If you don't need that sorting, you could remove this duplication.
Correct. I almost forgot about that one. Will this, however, mean that it's no longer possible to sort correctly on that field? Ideally, if we remove the collation column, we can still sort correctly, only slower. Afaik that should be possible and/or is already the case, no?
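
As a rough illustration of "still correct, only slower": SQLite can sort through a collation callback at query time instead of reading a stored collation key. The sketch below uses Python's sqlite3 module with a made-up collation name (`demo_collation`) and a simple case-folding comparison standing in for a real locale-aware collator. Whether Tracker already works this way is not confirmed in this thread.

```python
import sqlite3


def casefold_collation(a: str, b: str) -> int:
    """Stand-in for a locale-aware comparison; returns -1, 0 or 1."""
    ka, kb = a.casefold(), b.casefold()
    return (ka > kb) - (ka < kb)


conn = sqlite3.connect(":memory:")
# Hypothetical collation name; the callback runs for every comparison
# the sort needs, which is where the extra cost comes from.
conn.create_collation("demo_collation", casefold_collation)
conn.execute("CREATE TABLE resource (title TEXT)")
conn.executemany(
    "INSERT INTO resource (title) VALUES (?)",
    [("banana",), ("Cherry",), ("apple",)],
)

binary_order = [
    row[0] for row in conn.execute("SELECT title FROM resource ORDER BY title")
]
collated_order = [
    row[0]
    for row in conn.execute(
        "SELECT title FROM resource ORDER BY title COLLATE demo_collation"
    )
]
print(binary_order, collated_order)
```

No extra column is stored here; the price is one callback invocation per comparison during the sort.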

 You could also prune the extractors to get *only* the information you need... especially text properties.
Right. I wonder if it's worthwhile to try to make this possible upstream by making it configurable per extractor. For example, in the .rule file of an extractor module we could specify which properties to extract (if they are available), and then have some infrastructure to avoid huge amounts of if-then-else in the extractor modules' code.
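
A minimal sketch of what such infrastructure could look like, written in Python only to show the idea. The `Properties` key, the module name `libextract-example.so` and both helper functions are hypothetical; nothing like this exists in the .rule format as far as this thread says. The extractor would produce everything it can, and one generic filter would drop what the rule does not list.

```python
import configparser

# Illustrative rule file; the "Properties" key is an assumption of this sketch.
RULE_TEXT = """
[ExtractorRule]
ModulePath=libextract-example.so
MimeTypes=text/plain;
Properties=nie:title;nfo:wordCount;
"""


def allowed_properties(rule_text: str) -> set:
    """Read the hypothetical semicolon-separated Properties list from a rule."""
    parser = configparser.ConfigParser()
    parser.read_string(rule_text)
    raw = parser.get("ExtractorRule", "Properties", fallback="")
    return {name for name in raw.split(";") if name}


def filter_metadata(metadata: dict, allowed: set) -> dict:
    """Keep only the listed properties; an empty list means 'keep everything'."""
    if not allowed:
        return dict(metadata)
    return {key: value for key, value in metadata.items() if key in allowed}


extracted = {
    "nie:title": "Notes",
    "nfo:wordCount": 1200,
    "nie:plainTextContent": "a very long body of text",
}
kept = filter_metadata(extracted, allowed_properties(RULE_TEXT))
print(kept)
```

With a single filter like this, the extractor modules themselves would not need a conditional per property. A real implementation would preferably skip the expensive extraction work up front rather than discard the result afterwards.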

Thanks for the tips, especially the one about the collator column, which I had forgotten about myself.

Kind regards,

Philip





On Wed, Jun 12, 2013 at 2:50 AM, Martyn Russell <martyn lanedo com> wrote:
On 12/06/13 09:00, Philip Van Hoof wrote:
Hi guys,

Hello Philip,


For one of my customers I'm getting the question how to reduce the disk
usage.

Do you have a requirement here?
How much are you looking to reduce it by?
What is it now?
What are your limits, etc?


I wrote the journalling and periodic backup of meta.db myself so I of
course know how to disable these, what the consequences are and how to
ensure that all still works and all that ;)

My question to the team is to think with me on how we can further reduce
disk space usage for products where this is a consideration (for example
embedded appliances where additional storage is an expensive component
if it has to be large).

Next to disabling journaling and using synchronous mode in SQLite after
putting meta.db in .local and adapting the Backup/Restore to operate on
the main meta.db instead of the journal or periodic backup, I was
thinking about disabling fts, but also disabling extracting and mining
of nie:plainTextContent.
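
For reference, the "synchronous mode in SQLite" part can be sketched with the standard `PRAGMA synchronous` statement. The file name `demo.db` is a placeholder, not Tracker's meta.db, and the sketch does not touch the journal or backup mentioned above, which appear to be Tracker's own mechanisms rather than SQLite's.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Placeholder database file for the demonstration.
    conn = sqlite3.connect(os.path.join(tmp, "demo.db"))
    # FULL makes SQLite sync to disk at the critical moments of each commit.
    conn.execute("PRAGMA synchronous = FULL")
    # Reading the pragma back returns the numeric level (2 means FULL).
    synchronous_level = conn.execute("PRAGMA synchronous").fetchone()[0]
    conn.close()

print(synchronous_level)
```

The setting is per connection, so it has to be applied each time the database is opened.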

Absolutely, this should make quite some difference to the DB size.

  ./configure --disable-tracker-fts

I would start here.


Another, perhaps crazy, idea would be to implement a virtual table for
SQLite that can compress the columns of certain literals. Kind of the
opposite of an indexed property: it'll be very slow, but as it is rarely
queried, it's fine that it is slow. The property's value must still be
stored, though, for the times when it is needed.

Do you have a real use case in mind here?


For example, for properties like nie:plainTextContent, where it would be
decided per resource whether the cell is stored compressed or not (and all
SQLite access to it would decompress it; collation, for example, would).

The problem is that many users want nie:plainTextContent to be there,
but they don't want it to consume so much diskspace (and it can be slow
to access it).

Another idea could be filesystem specific: pointing in SQLite, somehow,
to the inode of the FS straight to the contents of the file whenever the
file is a plain text one. This might be even more crazy. I don't know.

You may sacrifice speed here. We would also need to consider how to cater for cases where the file is not tracker:available, of course.


Putting all of meta.db on a compressed filesystem is also an idea.

We need more information about what your limits are first, I would say.

--
Regards,
Martyn

Founder and CEO of Lanedo GmbH.

_______________________________________________
tracker-list mailing list
tracker-list gnome org
https://mail.gnome.org/mailman/listinfo/tracker-list


