Re: [Tracker] Running tracker on an Ubuntu server box?



Philip,
thanks for the insights.

Philip Van Hoof wrote:
On Tue, 2009-04-21 at 14:20 +0200, tigerf wrote:

I like the SQLite idea, because PHP offers a proven interface to SQLite,
and SQL is widely known nowadays. Is it thinkable that I query the
database via PHP in a read-only manner while the tracker daemon is
updating it "from the other side"?

No, this is unthinkable. 

...too bad, but I understand your arguments below.

The reasons are:

 - We have a decomposed schema; this isn't at all what you expect if you
   are used to normalized database schemas. Your SQL queries would be
   insanely, hideously difficult.

OK, makes sense to me, even if my own db schemas often aren't fully
normalized either, for performance reasons ;)

 - We have long-running transactions. This means that your process will
   very often see its sqlite3_step() yield SQLITE_BUSY. In fact, it'll
   yield that result the majority of the time. This means that your
   webserver (if you use the SQLite API in-process with Apache) will be
   constantly waiting for us to release the transaction. And we hold
   transactions open for as long as possible.

 - We have internal caching, too. Direct access to the database will
   often simply yield incorrect and inconsistent results.

Well, that alone rules this idea out. Results have to be reliable.
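
For the record, this is roughly what I had been picturing on the PHP
side, and where it would apparently stall. A sketch only; the database
path and table name are pure guesses on my part, not Tracker's actual
layout:

  <?php
  // Sketch only: the direct-access approach rejected above. Path and
  // table name are guesses, not Tracker's actual on-disk layout.
  $db = new PDO('sqlite:/home/user/.cache/tracker/meta.db');

  // PDO::ATTR_TIMEOUT maps to SQLite's busy timeout: how long a query
  // waits on SQLITE_BUSY before giving up.
  $db->setAttribute(PDO::ATTR_TIMEOUT, 2);

  // While the tracker daemon holds its long-running write transaction,
  // this query keeps hitting SQLITE_BUSY, so the Apache worker simply
  // sits here waiting, exactly as you describe.
  $result = $db->query('SELECT * FROM "SomeDecomposedTable" LIMIT 10');
  if ($result === false) {
      print_r($db->errorInfo()); // typically "database is locked"
  }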

Just a thought: doesn't SQLite itself already offer table caching? I
have no experience with SQLite, but in most other DBs one can specify
tables to be kept in RAM or parametrize such behaviour. Multi-level
caching strategies (hard disk, filesystem, back end, middle and front
end) can slow down the overall performance.
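
If I understand the SQLite docs correctly, its own page cache is per
connection and can be parametrized via PRAGMAs, along these lines (the
values below are arbitrary examples):

  <?php
  // Illustrative only: SQLite's page cache is configured per connection
  // via PRAGMAs; the values here are arbitrary examples.
  $db = new PDO('sqlite:/tmp/example.db');
  $db->exec('PRAGMA cache_size = 10000;');  // pages held in RAM
  $db->exec('PRAGMA temp_store = MEMORY;'); // temp tables/indices in RAM

But I take it from your point above that Tracker's caching sits at the
application level on top of that, which is why outside readers would see
inconsistent data anyway.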

The reason why we have long-running transactions is that this
aggressively improves SQLite's INSERT performance. Put differently, if
we didn't do this, SQLite would be aggressively slow.

Instead we provide you with a SPARQL query interface:

 - Currently over DBus

I've googled for a while for a working DBus <-> PHP interface, with
not very promising results. That would complicate the solution quite a
bit, from what I understand.
---
Old school approach:
Isn't there a DBus (or, even better, Tracker) command-line application
available which could be fed and executed by PHP, and whose results
(= the matching filenames found) are then parsed, beautified and sent
to the end user?

Not really elegant, but doable and rather fast to implement and debug
(see the sketch below). If such a mechanism exists, I could start coding
right away, because what's "behind" this tool doesn't matter much
anymore, as long as it delivers the right results.

... if I had tracker working on my Ubuntu server.
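
For concreteness, the glue I have in mind would be no more than this
sketch. I'm assuming a command-line client named tracker-search exists
and prints one matching file per line; both the tool name and its output
format are assumptions on my part:

  <?php
  // Sketch of the "old school" approach: shell out to an (assumed)
  // command-line search client and beautify its output for the user.
  $q = isset($_GET['q']) ? $_GET['q'] : '';
  $term = escapeshellarg($q); // never pass user input to a shell raw
  exec("tracker-search $term 2>/dev/null", $lines, $status);

  if ($status !== 0) {
      die('Search backend unavailable');
  }

  echo "<ul>\n";
  foreach ($lines as $line) {
      $line = trim($line);
      if ($line === '') {
          continue;
      }
      // Assumed output format: one matching filename per line.
      echo '<li>' . htmlspecialchars($line) . "</li>\n";
  }
  echo "</ul>\n";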

My application won't have more than 10 concurrent users currently, btw.

 - We might someday provide a thin API to call SPARQL queries avoiding
   DBus involvement. This would improve performance a very small amount,
   mostly because of DBus marshaling not having to be performed.

   DBus is indeed not a very good IPC to transfer a lot of data between
   processes. Such a thin API would be most beneficial for use-cases
   where you do queries that'll yield large result sets. To be honest,
   Tracker is in general not designed for that at the moment (Tracker is
   aiming more towards desktop usage, which means fetching pages of
   results per round trip instead of the entire result set in one round
   trip).

 - We have a mechanism in place that'll tell you about changes that
   might (or will) require your client-side to synchronize itself.

My application would be stateless, more or less.

Instead of a SQL schema we provide you with Nepomuk as ontology to use
in your SPARQL queries. We have also added a few specialized ontologies
and we have plans to make it possible for application developers to
extend Nepomuk's ontologies with their application specific custom
ontologies.
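
For the record, such a query would apparently look something like the
sketch below, shelled out through a tracker-sparql command-line client.
The client name, its -q option and the exact Nepomuk terms are my
assumptions, untested:

  <?php
  // Illustrative only: a Nepomuk-flavoured SPARQL query for files whose
  // name contains "invoice", run through an (assumed) tracker-sparql
  // command-line client. nfo:/nie: are Nepomuk ontology prefixes.
  $sparql = '
      SELECT ?url WHERE {
          ?f a nfo:FileDataObject ;
             nie:url ?url ;
             nfo:fileName ?name .
          FILTER regex(?name, "invoice", "i")
      }';
  exec('tracker-sparql -q ' . escapeshellarg($sparql), $rows, $status);
  if ($status === 0) {
      print_r($rows); // one result row per line
  }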

Based on my experience (...read: age), I'm a bit reluctant to introduce
brand-new, cutting-edge technologies in production environments.
Apologies for being somewhat old-fashioned here.

--

If a reader of this has a better-matching solution for my problem, I'd
appreciate your hints. Tracker was clearly intended and designed more as
a personal desktop search application; my needs don't match this
scenario 100%, I know.

How are all these NAS appliances out there solving this problem? They
do need an index, otherwise terabytes of documents are more or less lost
after a while.

Thanks
Tiger


