Re: [Tracker] Running tracker on an Ubuntu server box?



On Tue, 2009-04-21 at 17:30 +0200, tigerf wrote:
> Philip,
> thanks for the insights.

> Philip Van Hoof wrote:

> [CUT]


> >  - We have internal caching, too. Direct access to the database will
> >    often simply yield incorrect and inconsistent results.

> Well, that alone rules this idea out. Results have to be reliable.
>
> Just a thought: doesn't SQLite itself already offer table caching?

The caching is not done much (if at all) at the level of pure storage,
but rather at the level of 'what we are going to do in the near future'.

For example, we might group together scans of several files, or the
triples of one or of several resources, in a cache. See the
update_buffer in tracker-data-update.c if you are interested in the
implementation itself.

> I have no experience with SQLite, but in most other DBs one can
> specify tables to be kept in RAM or parametrize behaviours.
> Multi-level caching strategies (hard disk, filesystem, back-, middle-
> and front-end) can slow down the overall performance.

True. We are also planning to fine-tune SQLite's cache strategy to get
rid of the pausing/unpausing of processes, so that another process can
gain access to transactions more easily. This is rather too technical
to explain in a single e-mail, but if you are interested I can explain
the technical details to you. Just ask. We're here to answer people who
are interested in getting knee-deep into Tracker's code.
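
For what it's worth, SQLite's per-connection cache is already
parametrizable. A minimal sketch in PHP (illustration only: the
database path and the values are made up, and these are not Tracker's
actual settings):

    <?php
    // Hypothetical illustration: tuning SQLite's cache via PRAGMAs.
    $db = new PDO('sqlite:/tmp/example.db');
    $db->exec('PRAGMA cache_size = 10000');   // pages kept in RAM per connection
    $db->exec('PRAGMA temp_store = MEMORY');  // temp tables/indices in RAM
    ?>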

The reason we keep long-standing transactions open is that this
dramatically improves SQLite's INSERT performance. Put differently: if
we didn't do this, SQLite would be painfully slow.
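
The effect is easy to reproduce outside of Tracker. A rough sketch,
nothing Tracker-specific (table and path are made up):

    <?php
    // Without an explicit transaction, SQLite gives every INSERT its
    // own implicit transaction (and its own fsync); batching the
    // INSERTs into one transaction amortizes that cost.
    $db = new PDO('sqlite:/tmp/example.db');
    $db->exec('CREATE TABLE IF NOT EXISTS t (id INTEGER, val TEXT)');
    $db->beginTransaction();
    $stmt = $db->prepare('INSERT INTO t VALUES (?, ?)');
    for ($i = 0; $i < 100000; $i++) {
        $stmt->execute(array($i, 'x'));
    }
    $db->commit();   // one fsync for the whole batch
    ?>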

> > Instead we provide you with a SPARQL query interface:
> >
> >  - Currently over DBus

> I've googled for a while for a working DBus <-> PHP interface, with
> not very promising results. That would complicate the solution quite
> a bit, from what I understood.
> ---
> Old-school approach: isn't there a DBus (or even better: Tracker)
> command-line application available which could be fed and executed by
> PHP, and whose results (= the matching filenames found) are then
> parsed, beautified and sent to the end user?

system ("tracker-sparql -q \"SELECT ?o WHERE { ?o a nmo:Email }\"");

> Not really elegant, but doable and rather fast to implement and
> debug. If such a mechanism exists, I could start coding right away,
> because what's "behind" this tool doesn't matter much any more, as
> long as it delivers the right results.
>
> ... if I had Tracker working on my Ubuntu server.
>
> My application won't have more than 10 concurrent users at a time,
> currently, btw.
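
A minimal PHP sketch of that approach (with one assumption: the parsing
below pretends tracker-sparql prints one result per line, so adjust it
to whatever your version actually emits):

    <?php
    // Sketch: shell out to tracker-sparql and collect its output lines.
    // escapeshellarg() keeps the query from breaking the shell command.
    function tracker_query($sparql)
    {
        $lines = array();
        exec('tracker-sparql -q ' . escapeshellarg($sparql), $lines, $status);
        return ($status === 0) ? array_map('trim', $lines) : array();
    }

    foreach (tracker_query('SELECT ?o WHERE { ?o a nmo:Email }') as $hit) {
        echo htmlspecialchars($hit), "<br/>\n";
    }
    ?>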

> >  - We might someday provide a thin API to call SPARQL queries,
> >    avoiding DBus involvement. This would improve performance by a
> >    small amount, mostly because no DBus marshalling would have to
> >    be performed.
> >
> >    DBus is indeed not a very good IPC for transferring a lot of
> >    data between processes. Such a thin API would be most beneficial
> >    for use-cases where your queries yield large result sets, for
> >    which, to be honest, Tracker is at the moment in general not
> >    designed (Tracker aims more at desktop usage, which means
> >    fetching pages of results per round trip instead of the entire
> >    result set in one round trip; see the paging sketch after this
> >    list).
> >
> >  - We have a mechanism in place that'll tell you about changes that
> >    might (or will) require your client side to synchronize itself.
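
As for the paging mentioned above: in SPARQL that is just LIMIT/OFFSET
on the query. With the hypothetical tracker_query() helper sketched
earlier:

    <?php
    // Fetch results 51..100 instead of the whole result set at once.
    $page = tracker_query(
        'SELECT ?o WHERE { ?o a nmo:Email } LIMIT 50 OFFSET 50');
    ?>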

> My application would be stateless, more or less.

> > Instead of an SQL schema, we provide you with Nepomuk as the
> > ontology to use in your SPARQL queries. We have also added a few
> > specialized ontologies, and we plan to make it possible for
> > application developers to extend Nepomuk's ontologies with their
> > own application-specific custom ontologies.
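
To give a flavour, a query for documents and their titles might look
like this (a sketch: nfo:Document and nie:title are standard Nepomuk
terms, but double-check the exact names against the ontology files
shipped with Tracker):

    SELECT ?doc ?title
    WHERE {
        ?doc a nfo:Document ;
             nie:title ?title .
    }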

> Based on my experience (...read: age), I'm a bit reluctant to
> introduce brand-new, cutting-edge technologies into production
> environments. Apologies for being somewhat old-fashioned here.

Neither Nepomuk nor SPARQL is a cutting-edge technology. SPARQL is the
W3C-recommended query language for RDF, and Nepomuk is going to be used
by both the KDE and GNOME desktops as their primary RDF schema.

You'd better get used to it ;-)


> If a reader of this has a better-matching solution for my problem,
> I'd appreciate your hints. Tracker was clearly intended and designed
> more as a personal desktop search application; my needs don't match
> this scenario 100%, I know.

Tracker is being redesigned to not only serve as a desktop search
application, but also to implement efficient storage & querying of any
kind of RDF metadata.

It allows you to INSERT, UPDATE and DELETE metadata, and it'll find &
extract certain metadata from file resources if its indexer is enabled.
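
For illustration, an update looks roughly like this (a sketch only: the
URI and the title are made up, and the exact update syntax accepted may
differ between Tracker versions):

    INSERT { <urn:example:note1> a nie:InformationElement ;
                                 nie:title "My note" }

    DELETE { <urn:example:note1> nie:title "My note" }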

> How are all these NAS applications out there solving this problem?
> They do need an index; otherwise terabytes of documents are more or
> less lost after a while.

No idea.

Those usually aren't free software, and they are aimed specifically at
that use-case.

Tracker is designed more for a single user on a desktop or mobile
device.

A different market area, actually. But any contribution that improves
Tracker's core use-case, or even adds another core use-case, is
welcome.


-- 
Philip Van Hoof, freelance software developer
home: me at pvanhoof dot be 
gnome: pvanhoof at gnome dot org 
http://pvanhoof.be/blog
http://codeminded.be



