Re: [Tracker] Running tracker on an Ubuntu server box?



Hi Martyn,

Thanks for the quick response.

Martyn Russell wrote:
tigerf wrote:
Hello from Germany,

Hi there :)

I'm looking for a way to use Tracker on an Ubuntu 8.10 server
edition (!) to index huge numbers (60,000+) of .doc and .pdf files.

I should mention that for Tracker 0.6.x there is a hard limit on
the amount of data you can store for full-text searching. We currently
use QDBM for the index, and it has a 2 GB file size limit.

High time to overcome this limitation, now that 2 TB drives
cost less than 250 euros.

This means that once you get to a sizeable index, it will not index any
further. This has recently led to the "Can not index word" error we
have been seeing in bug reports.

In the 0.7 branch which we are working on in parallel, we are using
SQLite instead of QDBM. This should extend the possibilities here not to
mention add partial match searching (i.e. "foo*" finds "foobar") which
is another feature missing.
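
As a rough illustration of the partial match searching mentioned above, here is a minimal sketch of a "foo*" style prefix lookup in SQLite. The `words` table and the `prefix_search` helper are hypothetical and made up for this example; nothing in this thread describes the actual schema used by the 0.7 branch.

```python
import sqlite3


def prefix_search(conn, prefix):
    """Return all indexed words that start with `prefix`, sorted."""
    # Escape LIKE wildcards so user input such as "50%" is matched literally.
    escaped = (
        prefix.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    )
    rows = conn.execute(
        "SELECT word FROM words WHERE word LIKE ? ESCAPE '\\' ORDER BY word",
        (escaped + "%",),
    )
    return [row[0] for row in rows]


# Hypothetical stand-in for a word index; NOT Tracker's real schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (word TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO words VALUES (?)",
    [("foo",), ("foobar",), ("food",), ("bar",)],
)

print(prefix_search(conn, "foo"))
```

Note that SQLite's LIKE is case-insensitive for ASCII characters by default, so this sketch would also match "Foobar".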

Unfortunately, I couldn't say if 60k files would actually be reaching
the limit or not because it really does depend on how many words are in
those files. My estimate is that you wouldn't be far off the limit with
that many files though. Perhaps others who have had this QDBM error
can comment on how many files they have, to give a rough estimate here.

Thanks for mentioning this. It doesn't hurt too much because for the
moment it's not more than 13,000 files, but the solution needs room
for much more during the server's lifetime of 5-20 years.

I like the SQLite idea, because PHP offers a proven interface to SQLite,
and SQL is widely known nowadays. Would it be feasible for me to query
the database via PHP in a read-only manner while the tracker daemon is
updating it "from the other side"?

This would cut the overhead of accessing tracker's index quite
a bit.
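
To make the question concrete, here is a minimal sketch of the read-only access pattern, written in Python rather than PHP for brevity. It uses a throwaway database as a stand-in for the indexer's file; the `documents` table, the file name and the `open_read_only` helper are all hypothetical. Whether Tracker 0.7's database can safely be read this way while the daemon is writing is exactly the open question, so this only shows the SQLite side.

```python
import os
import pathlib
import sqlite3
import tempfile


def open_read_only(path):
    """Open an SQLite file so that writes through this connection fail."""
    # mode=ro is part of SQLite's URI filename syntax; uri=True enables it.
    uri = pathlib.Path(path).resolve().as_uri() + "?mode=ro"
    # timeout: how long to wait if another process holds a lock.
    return sqlite3.connect(uri, uri=True, timeout=5.0)


# Throwaway database standing in for the indexer's file (hypothetical schema).
db_path = os.path.join(tempfile.mkdtemp(), "demo-index.db")
writer = sqlite3.connect(db_path)
writer.execute("CREATE TABLE documents (path TEXT, title TEXT)")
writer.execute("INSERT INTO documents VALUES ('/srv/docs/a.pdf', 'Report A')")
writer.commit()

# The "web front end" side: read-only queries work.
reader = open_read_only(db_path)
titles = [row[0] for row in reader.execute("SELECT title FROM documents")]

# Any attempt to write through the read-only connection is rejected.
try:
    reader.execute("INSERT INTO documents VALUES ('x', 'y')")
    write_blocked = False
except sqlite3.OperationalError:
    write_blocked = True

reader.close()
writer.close()
print(titles, write_blocked)
```

A reader opened like this can still run into "database is locked" while the writer is committing; the timeout makes it wait for a while instead of failing immediately.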

Is there a how-to somewhere, or is the whole idea simply unrealistic?

I don't think so, Tracker just might not be able to cope with the
volumes for now. Of course, the said limit I mentioned above is per
user, if you are doing this on a multiple user level, things get
trickier but the QDBM limit is less of a problem.

When will 0.7 be (or is it already) reliable enough for tests? I'm not
in a hurry here because I have this half-baked solution with find, as I
mentioned.

I like the tracker -> SQLite -> PHP/Apache -> web browser -> client
filesystem approach, because it is very flexible and modular. It would
allow very nice, platform-independent document retrieval solutions with
little effort, once it runs. ;)

As I mentioned, I'm a Linux noob but I have some C/SQL/PHP/HTML/JS
knowledge. So the first step for me would be to install tracker and
get it indexing on my Ubuntu server. Any idea where to start?

Regards
Tiger


