Re: [Tracker] Disk usage optimization

From: Philip Van Hoof <philip codeminded be>
To: Ivan Frade <ivan frade gmail com>
Cc: Jürg Billeter <juerg billeter codethink co uk>, Philip Van Hoof <philip codeminded be>, Tracker mailing list <tracker-list gnome org>
Subject: Re: [Tracker] Disk usage optimization
Date: Fri, 14 Jun 2013 23:24:01 +0200

Op 14/06/2013 20:13, Ivan Frade schreef:

Hi Philip,

My ideas are more along the lines of "fine tune Tracker for your specific use case". I don't think they apply to master.

Hmm. Given that Tracker is fit and designed for embedded use cases, I think the project should allow compile-time and/or configure-time configurability of behaviour.

For example the --disable-journal and the --disable-fts are also behaviour changes. We could equally easily add a --disable-collator-column and a --disable-plaintext-extraction so that system integrators can easily build a Tracker package that is optimized for storage instead of optimized for performance.

For the longer term I'd even go as far as to allow easily replacing the entire ontology. Although for that I think we should rather bring libtracker-sparql and tracker-store together as libsparql-store, have a semantic-nepomuk-desktop package that installs the ontology, and let libtracker-miner and tracker-miner-fs be packages that depend on semantic-nepomuk-desktop and libsparql-store.

And then on tracker-miner-fs have a --disable-plaintext-extraction and on libsparql-store have a --disable-journal, --disable-fts and --disable-collator-column.

This would effectively mean splitting the project. But I've always felt that in the long term this should happen. It would also allow tracker-miner-fs to focus more on the mining and indexing of files, and libsparql-store on being an embedded and/or highly efficient and reliable SPARQL endpoint and SPARQL INSERT store.

I also think that libtracker-extract should probably move towards a truly publicly usable libmetadata-extract which exposes buffer- and stream-based metadata extraction, not just for tracker-extract but for any program that needs it. The tracker-extract binary would use libmetadata-extract just as it uses libtracker-extract now, but after that it should be an implementation detail of tracker-miner-fs. The problem I see with the current architecture, with tracker-extract as the service that does metadata extraction, is that it can only work well for file-based metadata extraction, while the world of metadata is massively, insanely larger than just the files on your filesystem. If you just open your eyes to see it.

On Fri, Jun 14, 2013 at 6:39 AM, Philip Van Hoof <philip codeminded be> wrote:

Op 13/06/2013 1:22, Ivan Frade schreef:

Hi Ivan,

For some properties, we store its value and collation to sort correctly in different locales. If you don't need that sorting, you could remove this duplication.

Correct. I almost forgot about that one. This will, however, mean that it's not possible to sort correctly on that field anymore? Ideally if we remove the collation column we can still sort correctly but then only slower. Afaik that should be possible and/or is already the case, no?

Without the collation the order can be wrong in some locales. It is not about speed, IIRC.

Can it be made to be correct without the collation column? Surely the collation column got created out of the same data the current property's column stores? Meaning that collation data can be made in alloca() buffers on the fly (which is of course going to be a lot slower).
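To make the trade-off concrete, here is a minimal sketch of the two options: keeping a precomputed collation key next to each value, versus deriving the key from the value at sort time. The `collation_key` function is a hypothetical stand-in (accent stripping plus case folding), not Tracker's actual collator, and the sample titles are made up for illustration.

```python
import unicodedata


def collation_key(text: str) -> str:
    """Hypothetical stand-in for a locale collation key.

    Strips combining accents and folds case. A real collator
    (locale- or ICU-based) would produce different keys.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


titles = ["Zebra", "élan", "apple", "Éclair", "banana"]

# Option A: store the value plus a precomputed key (the extra column).
# Costs disk space, but sorting only compares stored keys.
stored_rows = [(title, collation_key(title)) for title in titles]
sorted_with_column = [title for title, _ in sorted(stored_rows, key=lambda row: row[1])]

# Option B: store only the value and derive the key while sorting.
# No extra storage, but every sort pays for computing the keys again.
sorted_on_the_fly = sorted(titles, key=collation_key)

# Plain code-point order, i.e. no collation at all.
sorted_naive = sorted(titles)

print(sorted_with_column)
print(sorted_on_the_fly)
print(sorted_naive)
```

Both options give the same order because the key is derived from the same stored value; only where the cost is paid differs (disk versus CPU at query time). The plain code-point sort is what ends up wrong when no collation is applied at all.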

Kind regards.

